Multilingual Legal AI Benchmarks & Datasets

Multilingual legal datasets and human-audited benchmark infrastructure designed for enterprise AI evaluation, multilingual legal retrieval, regulatory grounding, and the development of trustworthy legal AI systems.


Available now

EU AI & Data Governance — Gold Dataset & Benchmark Suite


  • 17 EU Digital Regulatory Frameworks

  • 12 Aligned EU Languages

  • ~60,000 Multilingual Legal Text Units

  • Human-Audited Benchmark Infrastructure

Why it matters

Most legal AI systems are still evaluated on general-purpose NLP benchmarks that were not designed for multilingual legal reasoning, cross-language regulatory consistency, or enterprise legal retrieval.


QA & Validation


  • Human-audited benchmark validation

  • Dual-stage annotation workflows

  • Multilingual alignment review

  • 92.54% audited accuracy

  • Structured QA governance

Upcoming Releases


  • CJEU Case Law

  • EU Competition Law

  • International Arbitration

  • ESG & Sustainability

  • Financial Regulation

About

THT Legal Data was created by François-Olivier Manson, PhD in Law.

The project was developed to support multilingual legal AI evaluation through structured legal datasets, human-audited benchmark infrastructure, and cross-language regulatory alignment workflows.

The objective is to contribute to more reliable, auditable, and trustworthy legal AI systems operating across multilingual regulatory environments.

© François-Olivier MANSON

14, rue des Malapets

65400 Beaucens

France


Hosting


Framer B.V.

Rozengracht 207B

1016 LZ Amsterdam