LawEngine reference run
Deterministic verification lane
Deterministic verification against primary sources. No AI sits in the verification path.
A public citations benchmark built around one simple question: does the quoted passage match the cited authority? Each row admits exactly four responses, and the grading key stays server-side.
The current release is published on GitHub as cite-bench-v1. More real-world benchmark releases are planned.
Reference runs
The LawEngine reference lane and the frontier-model baseline answer the same public benchmark differently. That is the point of the benchmark.
Deterministic verification lane
Deterministic verification against primary sources. No AI sits in the verification path.
GPT-5.4-mini on the same public pack
Roughly 3 out of 10 citation checks wrong on this task.
How it works
Pull the 500-row public citations benchmark and the submission template directly from the public repo.
Emit one exact response per row from the same four-label contract used by the public scorer.
LawEngine scores against the protected key and returns accuracy, F1, label counts, and a confusion matrix.
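The scorer itself runs server-side against the protected key, but the reported metrics can be sketched with nothing beyond the standard library. Everything below (the function name, the per-label F1 formulation, the dict-based inputs) is an illustrative assumption, not the actual scorer:

```python
from collections import Counter

# The four exact code values from the label contract.
LABELS = ["VERIFIED", "NOT_FOUND", "MISATTRIBUTED", "CITATION_UNRESOLVED"]

def score(predicted, gold):
    """Return accuracy, per-label F1, predicted-label counts, and a
    confusion matrix.

    predicted, gold: dicts mapping row id -> label, over the same ids.
    """
    # confusion maps (gold_label, predicted_label) -> count
    confusion = Counter()
    for row_id, gold_label in gold.items():
        confusion[(gold_label, predicted[row_id])] += 1

    accuracy = sum(confusion[(l, l)] for l in LABELS) / len(gold)

    f1 = {}
    for l in LABELS:
        tp = confusion[(l, l)]
        pred_total = sum(confusion[(g, l)] for g in LABELS)  # predicted as l
        gold_total = sum(confusion[(l, p)] for p in LABELS)  # truly l
        denom = pred_total + gold_total
        f1[l] = 2 * tp / denom if denom else 0.0

    counts = Counter(predicted.values())
    return accuracy, f1, counts, confusion
```

A submission that labels every row VERIFIED, for instance, would score perfect recall on VERIFIED rows but zero F1 on the other three labels, which is why the confusion matrix is reported alongside raw accuracy.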
Submission contract
Each benchmark row contains a citation and a quoted passage. Your system should output a CSV with exactly two columns, id and predicted_status. The code values below must be emitted exactly as written; the plain-English descriptions explain what each one means.
Use VERIFIED when the quoted passage appears in the cited authority and the citation is substantively correct.
Use NOT_FOUND when the quoted passage cannot be found in the cited authority or in the current public benchmark corpus.
Use MISATTRIBUTED when the quoted language is real but belongs to a different authority than the one provided.
Use CITATION_UNRESOLVED when the citation string itself cannot be tied to a live authority in the benchmark corpus.