As law firms invest more heavily in artificial intelligence, we're seeing a growing challenge: separating fact from fiction when it comes to AI technology. Complex technical concepts often get misinterpreted as they move between developers, marketers, non-technical thought leaders, and legal practitioners.
The Origin of Legal Tech Myths
Technical myths are bound to arise with any cutting-edge technology. When explaining complex technical concepts to non-technical audiences, some details can get oversimplified. The risk is especially great for legal AI, where we're dealing with both the complexity of artificial intelligence systems and the intricacies of the law.
Marketing teams craft stories intended to resonate with lawyers and executives, but these simplified explanations of sophisticated systems can inadvertently create misconceptions. Often, the mistaken concepts are amplified by well-intentioned industry "thought leaders" who lack the technical expertise or grounding to render accurate opinions. As these partial explanations spread through law firms and legal departments, they can become accepted facts that are difficult to unseat.
Case Study: Frontier Models vs. Task-Specific Models
Let's look at a recent example of a technical myth. Many people assume that the most advanced large language models (LLMs) are always the best choice for complex legal work. These “frontier” models, such as GPT-4 or Claude Sonnet, are certainly impressive, but do they reliably perform specific legal tasks?
We recently saw marketing explanations equate frontier models with "senior partner" tasks and smaller models with "junior associate" tasks. If the analogy is meant to convey that the appropriately sized model should be selected based on the nature of the task, it is helpful. However, "thought leaders" can pick up the comparison to support the assertion that a frontier model is simply better at performing legal tasks.
A recent academic research paper from a team at Johns Hopkins and the University of Maryland Law School, “BLT: Can Large Language Models Handle Basic Legal Text?”, studied the effectiveness of frontier models on simple legal tasks. The following is the abstract from the paper:
We find that the best publicly available LLMs like GPT-4 and Claude currently perform poorly on basic legal text handling. This motivates the creation of a benchmark consisting of examples that lawyers and paralegals would expect LLMs to handle zero-shot, such as looking up the text at a line of a witness deposition or at a subsection of a contract. LLMs’ poor performance on this benchmark casts into doubt their reliability as-is for legal practice. However, fine-tuning on our training set brings even a small model to near-perfect performance. This benchmark will be useful for fine-tuning LLMs for downstream legal tasks, as well as for tracking LLMs’ reliability as-is for basic legal tasks.
In contrast to the technical myth that frontier models reliably perform simple legal tasks out of the box, smaller models trained specifically for a legal task often deliver significantly better accuracy and reliability on that task, along with lower latency and cost.
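To make the paper's lookup task concrete, here is a minimal sketch (not the BLT authors' code) of the kind of zero-shot evaluation it describes: the model sees a numbered, deposition-style transcript, is asked for the exact text at a given line, and is scored on exact-match accuracy. The synthetic transcript generator and the call_model() placeholder are illustrative assumptions; you would wire call_model() to whichever frontier or fine-tuned model you want to test.

```python
import random

def make_example(num_lines: int = 30) -> tuple[str, int, str]:
    """Build a synthetic numbered transcript, a target line number, and the expected answer."""
    lines = [f"Witness statement sentence {i}." for i in range(1, num_lines + 1)]
    transcript = "\n".join(f"{i}  {text}" for i, text in enumerate(lines, start=1))
    target = random.randint(1, num_lines)
    return transcript, target, lines[target - 1]

def call_model(prompt: str) -> str:
    # Placeholder: substitute a call to whichever model you are evaluating
    # (a hosted frontier model or a locally fine-tuned small model).
    raise NotImplementedError

def evaluate(n_examples: int = 100) -> float:
    """Exact-match accuracy on the line-lookup task."""
    correct = 0
    for _ in range(n_examples):
        transcript, target, expected = make_example()
        prompt = (
            f"{transcript}\n\n"
            f"What is the exact text of line {target}? Reply with the line text only."
        )
        if call_model(prompt).strip() == expected:
            correct += 1
    return correct / n_examples
```

Running the same harness against a small model fine-tuned on examples like these is how you would reproduce the paper's contrast between out-of-the-box and task-specific performance.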
Lessons from eDiscovery: The Random Sampling Myth
We’ve been here before. Take predictive coding in eDiscovery as a cautionary tale. Early on, conventional wisdom emerged among some eDiscovery "thought leaders" that random sampling was the optimal approach for selecting training examples for classification models. This seemed logical on its face: random sampling appeared to give the model a representative view of the data set.
But as teams gained more experience with the technology, they discovered that continuous feedback loops focused on likely relevant documents produced much better results than training on random samples.
"Continuous Active Learning" eventually became the accepted process for predictive coding in eDiscovery. Technical myths, however, slowed the adoption as the market worked through the misinformation.
Advice for Informed AI Adoption
Technical myths emerge because they offer simple, appealing explanations of complex technology. Filter any claim you come across in this area by asking whether it has a technical basis, and look for concrete evidence that the basis has been validated.
Make sure that vendors provide empirical evidence of their AI's capabilities. This might include research studies and performance metrics (not to be confused with marketing case studies).
Get different perspectives. Follow independent experts, including academics with technical expertise, data scientists, and legal technologists who can provide unbiased assessments based on valid studies. They can help you understand what's realistic and what's overstated or misstated.
Don't take conventional wisdom at face value. What sounds good in theory needs to prove itself in practice. Help your team understand AI's basic principles. When people understand the technology's real capabilities and limitations, they make better decisions about using it.
Remember that the technology is changing quickly. Understandings and beliefs held today will likely be out of date next year. Remain open to new ideas and constantly challenge existing assumptions.
Looking Forward
Being skeptical doesn't mean resisting progress – it means making informed decisions. As AI continues to change legal practice, understanding its real capabilities becomes increasingly important.
By carefully validating claims and basing strategy decisions on evidence rather than technical myths, we can accelerate the adoption of AI in the legal industry. The key is maintaining a balanced perspective – recognizing AI's significant potential while staying grounded in technical reality.