The Legal Bench Initiative
A community of legal experts defining the standard for evaluating AI agents — across legal practice, in-house teams, and commercial business.
An independent, community-led research initiative. Rigorous. Transparent. Profession-driven.
The Problem No One Is Solving
AI agents are already drafting contracts, summarising case law, and producing legal memoranda. Firms are adopting them. Clients are expecting them. Regulators are watching.
But here is the uncomfortable question: who decides whether the output is actually any good? Right now, the answer is largely the technology companies themselves. The legal profession — the very community whose standards of care, professional judgment, and client obligations are at stake — has had almost no role in defining what “good” looks like for AI agents operating in legal and commercial contexts.
This is not a technology problem. It is a quality and accountability problem. And it belongs to the lawyers who understand the work.
What We’re Building
The Legal Bench Initiative is an open, community-led effort to create rigorous, lawyer-defined evaluation standards for AI agents used in legal work. This covers legal practice, in-house legal teams, and commercial business contexts — anywhere AI agents are generating, reviewing, or advising on legal documents. Think of it as the profession writing its own exam paper for AI — rather than letting AI grade itself.
As a participating evaluator, you will:
Review Real AI-Agent Output
Review documents generated by AI legal agents across practice areas and apply your expert judgment using a structured evaluation framework.
Contribute to Published Research
Contribute directly to published research that will become a reference point for how the profession evaluates AI agents.
Shape the Evaluation Criteria
Your insights on what matters in legal quality will inform how AI agent performance is measured across the industry.
This is not a training exercise. It is a genuine research collaboration in which your professional expertise is the essential ingredient.
Who We’re Looking For
We are assembling a deliberately small, carefully selected group of qualified legal professionals. The quality and credibility of the benchmark depend entirely on the calibre of the lawyers involved. We welcome qualified lawyers from any jurisdiction.
International firms, boutique practices, in-house legal teams, barristers’ chambers, and legal academics with practice experience are all welcome. What matters is the quality of your professional judgment, not the size of your practice.
Qualified and Practising
Active legal drafting experience — whether in transactional, advisory, contentious, or regulatory work.
Professional Scepticism
The ability to identify not only obvious errors but subtler failures: misplaced emphasis, legally plausible but practically dangerous phrasing, or silently omitted protections.
Independence of Judgment
We need honest assessment, not validation. If the output is poor, evaluators must be willing to score it accordingly.
Interest in AI's Role in Legal Practice
You do not need to be a technologist, but you should be engaged with the question of how AI agents are being adopted in the profession.
What matters is your judgment, not your letterhead.
The Process
Application
Submit an expression of interest. We are selecting a small, curated cohort.
Orientation
Attend an online briefing session covering the research objectives, the evaluation framework, and practical instructions.
Evaluation
Receive AI-generated commercial contracts to assess independently at your own pace. Estimated time: approx. 2 hours.
Submission
Return completed scoring templates. Results are aggregated, analysed, and prepared for publication.
Publication
The benchmark paper is published with named credit for all contributing evaluators.
Time commitment
2–3 hours total
Format
Online orientation + structured evaluation
Cohort size
Strictly limited, curated group
Your role
Expert evaluator — your judgment is the data
What You’ll Gain
This is a professional contribution — and we believe the value to participants is genuine and lasting.
Named Credit in Published Research
Every contributing evaluator is individually credited in the published benchmark paper. This is a permanent, citable record of your contribution to the field.
Practical AI Evaluation Expertise
Hands-on experience assessing AI-generated legal documents under a rigorous framework — not a curated demo or marketing presentation.
Founding Membership in a Professional Community
Early contributors become founding members of an ongoing initiative shaping how the legal profession holds AI agents to account.
Unfiltered Insight into AI Agent Capabilities
See how current AI legal agents actually perform when assessed by qualified lawyers under controlled conditions.
Frequently Asked Questions
TheLegalBench is an independent research initiative that recruits qualified lawyers to evaluate documents produced by AI legal agents under a structured framework. The results are published as benchmark research, establishing a profession-led standard for AI agent quality across legal practice, in-house teams, and commercial business.
TheLegalBench is led by a team of legal and AI professionals. Visit the Contributors page for details on the people behind the initiative. The evaluation framework, scoring methodology, and published findings are driven entirely by the participating legal professionals.
No. We are looking for legal expertise and professional judgment. If you can critically assess a legal document, you have everything required.
Approximately 2–3 hours in total, completed at your own pace within an agreed evaluation window.
This is a professional research contribution rather than a paid engagement. Evaluators receive named credit in the published paper, practical AI evaluation experience, and founding membership in the community.
No. Because transparency is central to the research, contributing evaluators are credited by name in the published paper.
Yes. We welcome qualified lawyers from any jurisdiction worldwide. Cross-jurisdictional representation strengthens the benchmark.
All evaluators participate in an initial calibration session, a proportion of documents are scored by multiple evaluators independently, and inter-rater reliability is measured statistically using Cohen's kappa.
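For reference, Cohen's kappa compares the observed agreement between raters (p_o) with the agreement expected by chance (p_e): kappa = (p_o − p_e) / (1 − p_e). A value of 1 indicates perfect agreement, while 0 indicates no more agreement than chance would produce.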
No. Only aggregated findings are published. Your individual scores are used for analysis but are not attributed to you individually in the paper.
The benchmark paper will be published openly and will be freely accessible. Publication details will be confirmed with evaluators before release.
Ready to Contribute?
Places are limited and selection is based on the quality and relevance of your professional experience.
Express Your Interest →
Not ready to contribute?
Be the first to know when results are published.
Join the waitlist for updates on the initiative and early access to published findings.