The Search Term Tug-of-War: What About Machine Learning?

Keyword search continues to dominate the eDiscovery process, and it is almost a certainty that the parties will disagree on the appropriateness of certain keywords. If they cannot reach an agreement, the matter may need to be resolved by the court.

The decision in Jim Hawk Truck-Trailers of Sioux Falls, Inc. v. Crossroads Trailer Sales & Serv., Inc., 2022 WL 3010143 (D.S.D. 2022) is another example of a keyword dispute. The defendant ran 92 of the 99 requested keywords and then applied a Continuous Active Learning process to the records hitting those keywords. The plaintiff filed a motion to compel the defendant to run the 7 disputed keywords as well. The disputed keywords used wildcards and, as is common when wildcards are used to expand a query, the terms returned a substantial number of documents.

While the case presents an everyday pattern in eDiscovery, a few points in the decision are curious.

First, the court used a "reasonable accessibility" analysis to decide the keyword dispute rather than the proportionality test. If the requested discovery was within the scope of discovery defined in Rule 26(b)(1), it is difficult to understand how a keyword dispute involving searchable active data would reach the reasonable accessibility limitation in Rule 26(b)(2)(B).

To be within the scope of discovery under Rule 26(b)(1), the discovery must be "proportional to the needs of the case," and one of the enumerated considerations is whether "the burden or expense of the proposed discovery outweighs its likely benefit." If the burden of a keyword search on active data does not outweigh the likely benefit of the discovery, why would that search fail because of reasonable accessibility?

Second, the court pointed to the low prevalence rate (7 percent) in the previously reviewed data and assumed that the entire set of data that hit the new terms would take 600 hours of attorney time to review. There was no mention of recall, that is, the share of all relevant documents that the review had actually found.
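To see how a figure like 600 hours can follow from a hit count, here is a rough back-of-the-envelope sketch. Only the 7 percent prevalence and the 600-hour estimate come from the decision as described above; the hit count of 30,000 documents and the review rate of 50 documents per hour are hypothetical values chosen purely so the arithmetic lands on 600 hours, and the function name is illustrative.

```python
def estimate_review_burden(hit_count, prevalence, docs_per_hour):
    """Estimate linear-review hours and the expected number of relevant documents.

    hit_count and docs_per_hour are hypothetical inputs, not figures from the case.
    """
    hours = hit_count / docs_per_hour
    expected_relevant = round(hit_count * prevalence)
    return hours, expected_relevant


# Hypothetical: 30,000 hits reviewed at 50 documents per hour, 7% prevalence.
hours, relevant = estimate_review_burden(30_000, 0.07, 50)
print(f"{hours:.0f} review hours for about {relevant} relevant documents")
```

Under these assumed numbers, every document is reviewed even though roughly 93 percent of them are expected to be irrelevant, which is the burden the court was weighing.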

Machine learning provides an important tool for applying proportionality. Where search terms return results with a very low percentage of relevant documents, applying machine learning and limiting the scope of review to documents with a high probability of relevance reduces the cost burden, and it provides more protection against missing important evidence than refusing to run the terms at all. Machine learning can reduce keyword disputes and provide a viable solution that balances the burden of review with the need to produce important information in discovery.
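The idea of limiting review to high-probability documents can be sketched in a few lines. This is a simplified illustration, not the workflow used in the case or by any particular review platform: it assumes a model has already assigned each document a relevance score, and that a labeled validation sample is available so recall can be measured. The function name, the scores, and the sample are all hypothetical.

```python
import math


def review_depth_for_recall(scored_docs, target_recall):
    """Return how many documents must be reviewed, in descending score order,
    to find the target share of the relevant documents in a labeled sample.

    scored_docs: list of (score, is_relevant) pairs (hypothetical sample data).
    """
    total_relevant = sum(1 for _, rel in scored_docs if rel)
    if total_relevant == 0:
        return 0
    # Small tolerance guards against floating-point noise before rounding up.
    needed = math.ceil(target_recall * total_relevant - 1e-9)
    ranked = sorted(scored_docs, key=lambda d: d[0], reverse=True)
    found = 0
    for depth, (_, rel) in enumerate(ranked, start=1):
        if rel:
            found += 1
        if found >= needed:
            return depth
    return len(ranked)


# Hypothetical ten-document validation sample with model scores.
sample = [
    (0.90, True), (0.80, True), (0.70, False), (0.60, True), (0.50, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, True), (0.05, False),
]
depth = review_depth_for_recall(sample, 0.8)
print(f"Review the top {depth} of {len(sample)} documents to reach 80% recall")
```

In this toy sample, reviewing in ranked order reaches the recall target well before the end of the set. How much review is saved on real data depends on how well the model separates relevant from irrelevant documents, which is why reporting recall alongside prevalence matters.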

 