
Revolutionizing Legal AI Benchmarks with “AI Law”
In the evolving landscape of legal technology, artificial intelligence (AI) is proving to be a transformative force. AI is increasingly being integrated into the legal profession to improve efficiency and accuracy on complex tasks. However, measuring the true potential and performance of AI in legal applications requires rigorous benchmarking. This is where "AI Law" by Instant.Lawyer comes in: a framework specifically designed to evaluate the capabilities of Large Language Models (LLMs) on real-world legal tasks.
At Instant.Lawyer, integrating cutting-edge AI solutions is essential to streamlining services for clients. The innovations offered by frameworks like AI Law provide significant insight into how AI can be evaluated and deployed to enhance legal services.
"AI Law has the potential to radically improve the efficiency of lawyers, clients and third parties by completing real world tasks - Instantly. "
What Is AI Law?
AI Law is a benchmarking framework developed by Instant.Lawyer that evaluates LLMs by focusing on real-world, billable tasks performed by lawyers. Unlike traditional benchmarks that rely on multiple-choice questions or one-size-fits-all evaluations, AI Law is designed to capture the complexity and nuances of legal work. It uses tasks such as drafting legal documents, assessing risk, and advising clients—tasks that require deep legal reasoning and domain expertise.
The framework was developed by legal professionals with extensive experience in practising law, ensuring that the tasks reflect real-world challenges. The benchmarks are structured around "time entries," which are records of the billable work lawyers perform for clients. These time entries are then converted into tasks on which AI and ML models can be tested.
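To make the idea of turning a time entry into a testable task more concrete, here is a minimal sketch in Python. It is an illustration only: the class names, field names, and prompt wording are hypothetical and are not taken from the AI Law framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeEntry:
    """A record of billable work. All field names are hypothetical."""
    lawyer: str
    practice_area: str
    narrative: str  # what the lawyer recorded doing for the client
    hours: float


@dataclass(frozen=True)
class BenchmarkTask:
    """A task a model can be asked to complete. Hypothetical structure."""
    practice_area: str
    prompt: str
    billed_hours: float


def time_entry_to_task(entry: TimeEntry) -> BenchmarkTask:
    """Convert a time entry into a prompt-style task (assumed wording)."""
    prompt = (
        f"Acting as a {entry.practice_area} lawyer, "
        f"complete this task: {entry.narrative.strip()}"
    )
    return BenchmarkTask(entry.practice_area, prompt, entry.hours)


# Invented example entry, for illustration only.
entry = TimeEntry(
    lawyer="A. Example",
    practice_area="employment",
    narrative=" Draft a termination letter for client review ",
    hours=1.5,
)
task = time_entry_to_task(entry)
print(task.prompt)
```

The point of the sketch is simply that each task stays tied to a piece of work a lawyer actually billed for, so the benchmark inherits the shape of real practice.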
"By abstracting AI Law to a plurality of layers, Instant.Lawyer's proprietary technology is able to generate AIML models that review AIML models. Like compound interest but for AI/ML models. The composability is a complete gamechanger. "

By emphasizing real-world applicability, AI Law allows for a more comprehensive assessment of how AI systems can support or supplement human lawyers. This is particularly valuable for practices that are looking to incorporate AI in ways that genuinely improve service delivery rather than offering superficial gains.
The Importance of Real-World Task Benchmarking
In the legal profession, tasks like drafting contracts or litigation arguments involve more than just knowledge retrieval—they require deep reasoning, creativity, and the ability to navigate ambiguity. Existing AI benchmarks often fail to evaluate these subtleties, which is why AI Law goes beyond simple multiple-choice questions or structured queries.
The framework evaluates models on tasks related to both litigation and transactional law. These tasks are categorized by practice area and task type, providing a well-rounded understanding of how AI can assist in different types of legal work. For example, litigation tasks may focus on argument generation, while transactional tasks could involve drafting contracts with precise legal terminology.
Such specific evaluations help in determining how AI can be tailored to meet the needs of different legal scenarios. Whether it’s automating document review or assisting in legal research, understanding the strengths and limitations of AI through frameworks like AI Law can inform more effective integrations into legal workflows.
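As a rough illustration of what reporting by practice area and task type could look like, the sketch below groups scores by category and averages them. The category labels, the 0-to-1 score scale, and the numbers are all invented for the example and do not come from AI Law results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical results: (practice_area, task_type, score between 0 and 1).
results = [
    ("litigation", "argument generation", 0.5),
    ("litigation", "argument generation", 0.75),
    ("transactional", "contract drafting", 0.5),
]


def average_by_category(rows):
    """Average scores for each (practice_area, task_type) pair."""
    buckets = defaultdict(list)
    for area, task_type, score in rows:
        buckets[(area, task_type)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


summary = average_by_category(results)
for (area, task_type), avg in sorted(summary.items()):
    print(f"{area} / {task_type}: {avg:.3f}")
```

Breaking results out this way makes it easier to see where a model is strong (say, one task type) and where it still needs a lawyer's close supervision.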
How Does AI Law Evaluate AI Performance?
AI Law evaluates AI models based on two key metrics: Answer Score and Source Score. These scores measure not only the correctness of an AI-generated response but also how well it can justify its answers with relevant legal sources.
Answer Score: This metric assesses how close the AI-generated response is to the quality of work a human lawyer would produce. Factors like accuracy, completeness, and relevance are considered in calculating this score. Instant.Lawyer's proprietary models have been shown to complete approximately 74% of a final, expert-level lawyer's work product, indicating that AI is making significant strides but still has room for improvement, both in reaching the expert-level benchmark and in eventually going beyond it.
Source Score: This score measures how well the AI can provide traceable sources for its answers, ensuring that the information is reliable and verifiable. This aspect is crucial for legal work, where the accuracy of cited materials is of utmost importance. Instant.Lawyer emphasizes that, while models like ChatGPT can provide answers, they often struggle with sourcing, which is a critical gap in legal AI performance.
At Instant.Lawyer, ensuring that AI not only produces accurate legal documents but also backs them up with credible sources is key to maintaining high standards of trust and reliability in client services.
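The two metrics can be pictured with a small sketch. The formulas below are assumptions made for illustration: the article does not say how AI Law actually calculates either score, so the equal weighting of accuracy, completeness, and relevance, and the "fraction of cited sources that were verified" rule, are stand-ins rather than the framework's method.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """A reviewer's assessment of one response. Hypothetical structure."""
    accuracy: float       # each rating is assumed to lie between 0 and 1
    completeness: float
    relevance: float
    cited_sources: list[str]
    verified_sources: set[str]  # citations the reviewer could trace


def answer_score(review: Review) -> float:
    """Assumed formula: unweighted mean of the three quality ratings."""
    return (review.accuracy + review.completeness + review.relevance) / 3


def source_score(review: Review) -> float:
    """Assumed formula: share of cited sources that were verified."""
    if not review.cited_sources:
        return 0.0  # a response with no citations earns no sourcing credit
    verified = sum(1 for s in review.cited_sources if s in review.verified_sources)
    return verified / len(review.cited_sources)


# Placeholder source labels, not real citations.
review = Review(
    accuracy=1.0,
    completeness=0.5,
    relevance=0.75,
    cited_sources=["source-1", "source-2", "source-3", "source-4"],
    verified_sources={"source-1", "source-2", "source-3"},
)
print(f"answer score: {answer_score(review):.2f}")
print(f"source score: {source_score(review):.2f}")
```

Keeping the two scores separate reflects the distinction drawn above: a response can read well and still fail on sourcing, and that gap only shows up if sourcing is measured on its own.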
AI's Future in Legal Services
The results from AI Law highlight the growing potential for AI to augment legal work. Instant.Lawyer's models, for example, outperform general-purpose AI models on domain-specific tasks like those required in the legal field. However, as the framework shows, even the most advanced models still have significant room for growth, especially when it comes to complex, high-stakes legal matters.
Adopting AI-driven solutions powered by frameworks like AI Law offers several advantages:
Improved Efficiency: AI can help lawyers save time on repetitive, routine tasks like document drafting or due diligence, allowing them to focus on more complex legal work.
Enhanced Accuracy: Using AI models trained on real-world legal tasks can improve the accuracy of work product, reducing human error and raising the quality of service.
Cost-Effectiveness: With the automation of labor-intensive tasks, law firms can reduce operational costs while offering faster and more efficient service to clients.
As Instant.Lawyer continues to develop its benchmarks and improve AI’s capabilities in legal applications, we stand to benefit from cutting-edge technology that transforms how legal services are delivered.
The composable nature of AI/ML models reviewing AI/ML-generated models (which are in turn reviewed by experts and third parties) offers a radical advancement in the ability to reason, learn and provide feedback, well above what humans could achieve on their own. Overlay this with a deep understanding of complex laws, precedents and legislation, and this compound ability starts to create paths that benefit us as a whole.
We are about to enter an exciting era where our learning is compounded by learning.
"People x Tech x People"
A very exciting era indeed.
Stay tuned.
Peter Toumbourou