Namaste Yogis. Welcome to the Blockchain & AI Forum, where your blockchain and artificial intelligence technology questions are answered! Here no question is too mundane. As a bonus, a proverb is also included. Today’s question, was submitted by Esquire Nave and he wants to know if he should be worried if his lawyer uses artificial?

Attorney Nava, you came to the right place. Great question I found an interesting article on this subject, titled AI on Trial, Legal Models Hallucinate in 1 out of 6 Queries. https://hai.stanford.edu/news/ai-trial-legal-models-hallucinate-1-out-6-queries. The report was produced by Stanford’s Human AI Center. Let’s examine their findings and the implications for the legal profession, beginning with basic findings of fact.
Large language models such as ChatGPT, are not ready for prime time in the courthouse, is the headline. Stanford’s HAI Center study reports error rates were as high as 69%. Holy inadmissible evidence Batman! U.S. Supreme Court Chief Justice felt compelled to make a point regarding this problem in his 2023 Annual Report on the Judiciary https://www.supremecourt.gov/publicinfo/year-end/2023year-endreport.pdf
Is there a solution? People working on retrieval augmented generation (RAG) say yes, and they have it. According to RAG advocates, RAG reduces hallucinations in domain-specific contexts. RAG proponents claim they can avoid hallucinations and can guarantee legal citations. Essentially, RAG systems promise to deliver more accurate and trustworthy legal information by integrating a language model with a database of legal documents. Can RAG advocates deliver on these bold claims? Not yet says, Stanford’s HAI Center.
Stanford put two RAG providers to the test, Lexus-Nexus and Thomson-Reuters. Stanford found RAG providers are less prone to hallucinations when compared to ChatGPT, etc. However, given that non-RPG models produced an error rate of 69% that wasn’t much of an accomplishment. According to Stanford, RAG models hallucinated 17% (1 of 6) when prompted. Which brings us to the next question, what types of queries were asked of the RAG models. Answer, there were four basic types.
The Stanford HAI Center says they prompted the models with the following four types of queries:
1) general research questions (questions about doctrine, case holdings, or the bar exam);
2) jurisdiction or time-specific questions (questions about circuit splits and recent changes in the law);
3) false premise questions (questions that mimic a user having a mistaken understanding of the law); and
4) factual recall questions (questions about simple, objective facts that require no legal interpretation).
The RAG models fell into one of two types of hallucination errors.
The first was to describe the law incorrectly or make a straight factual error. The second type of error was mis-grounded. When mis-grounded errors were produced, the AI model describes the law correctly, but cited source which did not support its claims. Either way, a problem; a big problem. Mis-grounded hallucinations might be more troublesome because they undermine the point of AI in legal research; namely, AI reduces the time required to conduct research.
Stanford identified three challenges facing RAG model developers. First, legal retrieval is difficult because the “law” is more than verifiable facts; it also consists of rulings and interpretations. Second, the complexity of American jurisprudence means retrieval of inappropriate authoritative documents happens often. Last, sycophancy is a concern. Sycophancy is the tendency of RAG models to agree with the user’s incorrect assumptions.
Bottom line–RAG models are superior to ChatGPT but require greater transparency, refinement, and not yet ready for Broadway.
We end with a proverb from Estonia: One fool can ask more questions than 10 men can answer.
Until next time,
Yogi Nelson
