In May 2023, a New York attorney submitted a brief citing six cases that did not exist; the citations had been generated by ChatGPT. The attorney was sanctioned, and the incident became the most-cited cautionary tale in legal AI adoption.
That story — and the dozen similar ones that followed — is not an argument against AI in legal research. It is an argument for understanding how different AI systems handle citations, and choosing tools that ground their output in verifiable sources.
What hallucination actually means
Hallucination in large language models refers to the generation of plausible-sounding but factually incorrect content. In general-purpose AI, this might mean a wrong date or a misattributed quote. In legal research, it means case citations that don't exist, holdings that misrepresent actual decisions, or statutory provisions that have been amended or repealed.
The problem is structural: generative models are trained to produce fluent, coherent text that fits the context of the prompt. They are not trained to be factually accurate in the way a database query is accurate. When asked for a case on a specific legal point, a model that has not been specifically constrained will produce a citation that fits the expected format — whether or not the underlying case exists.
The difference: retrieval-augmented vs. pure generation
The solution to hallucination in legal research is retrieval-augmented generation (RAG). In a RAG system, the model first retrieves relevant documents from a verified legal database, then generates its response grounded in those retrieved documents.
The distinction matters enormously in practice. A pure generative response to 'find me Swiss case law on vicarious liability' produces citations from the model's weights — which may be outdated, misremembered, or entirely fabricated. A retrieval-augmented response first queries a verified corpus of Swiss Federal Court decisions, then summarises the relevant cases it actually found.
- RAG systems can show you the source document for every claim
- Citations in RAG output correspond to real, retrievable documents
- Generative-only responses should never be cited in legal submissions without independent verification
- The gap in quality between RAG and pure generation is largest for jurisdiction-specific and recent case law
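The retrieval-first workflow described above can be sketched in a few lines of Python. Everything here is illustrative: the corpus, the case IDs, and the keyword matching are placeholders (a production system would use vector search over a real, current legal database), but the structural point carries over — every citation in the output comes from a document that was actually retrieved, never from the model's weights.

```python
# Stand-in for a verified legal database: citation ID -> document text.
# IDs and summaries are hypothetical placeholders, not real decisions.
VERIFIED_CORPUS = {
    "case_001": "Hypothetical summary: employer liability for auxiliary persons.",
    "case_002": "Hypothetical summary: formal requirements for contract conclusion.",
}

def retrieve(query: str, corpus: dict[str, str]) -> list[tuple[str, str]]:
    """Naive keyword retrieval: return (citation, text) pairs whose text
    shares a term with the query. Real systems use semantic search."""
    terms = set(query.lower().split())
    return [(cid, text) for cid, text in corpus.items()
            if terms & set(text.lower().split())]

def grounded_answer(query: str) -> dict:
    """Answer only from retrieved documents, so every citation in the
    output is traceable to a source the reader can inspect."""
    hits = retrieve(query, VERIFIED_CORPUS)
    return {
        "citations": [cid for cid, _ in hits],  # only real, retrievable IDs
        "sources": dict(hits),                  # full text for verification
        "found": bool(hits),
    }

result = grounded_answer("auxiliary persons liability")
print(result["citations"])   # only IDs present in VERIFIED_CORPUS
```

Note what the sketch cannot do: invent a citation. If nothing relevant is retrieved, `found` is `False` and the citation list is empty — the honest failure mode, rather than a fabricated case in the expected format.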
Evaluating a legal AI tool: the right questions
When evaluating any AI legal research tool, the critical question is not 'how good is the AI?' but 'how is the AI grounded?'
- What legal database does the system query? Is it current?
- Are citations traceable back to source documents?
- Can you view the underlying document for any case mentioned?
- Does the interface clearly distinguish between retrieved facts and generated commentary?
- What is the cutoff date for the training data and the legal database?
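Part of that checklist — are citations traceable back to source documents? — can be enforced mechanically before any human review. The sketch below is a hypothetical first-pass filter, not any tool's real API: the `[[...]]` citation markup, the known-citation set, and the function names are assumptions for illustration.

```python
import re

# Hypothetical set of citation IDs known to the verified database.
KNOWN_CITATIONS = {"case_001", "case_002"}

# Assumed draft markup for citations, e.g. [[case_001]].
CITATION_PATTERN = re.compile(r"\[\[(\w+)\]\]")

def verify_draft(draft: str, known: set[str]) -> dict[str, list[str]]:
    """Split citations found in a draft into verified (present in the
    database) and unverifiable (must be checked by hand or removed)."""
    cited = CITATION_PATTERN.findall(draft)
    return {
        "verified": [c for c in cited if c in known],
        "unverifiable": [c for c in cited if c not in known],
    }

report = verify_draft("As held in [[case_001]] and [[case_999]] ...",
                      KNOWN_CITATIONS)
print(report["unverifiable"])  # citations that demand human attention
```

A check like this does not replace reading the cited decision — it only guarantees the document exists and is retrievable, which is the minimum bar a submission-ready citation must clear.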
Professional responsibility implications
In Switzerland, as in most jurisdictions, the lawyer bears professional responsibility for the accuracy of submissions. The fact that an AI tool produced an incorrect citation is not a defence. Swiss Bar Association guidelines on the use of AI tools are still developing, but the underlying principle is clear: AI is a tool, and the lawyer using it is responsible for verifying its output.
This is not an argument for avoiding AI research tools — it is an argument for using them correctly. A well-designed, retrieval-grounded AI research tool, used with appropriate review, reduces research time and improves coverage. The key is selecting tools where verification is built into the workflow, not an afterthought.