Concept · Provenance and grounding

Every claim, traceable to a page and a bounding box.

Source traceability is what separates a usable legal-AI answer from a liability. KAOS treats it as an architectural property rather than a per-tool feature. The page number, bounding box, character offsets, and extractor confidence captured when a parser reads a source are preserved through retrieval, into the language-model call, and into the typed answer the model returns — so a downstream reviewer can walk a claim back to the line that justified it.

The problem worth solving

Most platforms keep source-location metadata on parsed content. The link breaks the moment a language model reads that content and writes a new output. The prompt reads the text; the response comes back as plain JSON with no back-reference. Re-establishing the link is either prompt-engineered per tool — fragile and duplicative — or built into the architecture.

KAOS builds it in. Every element in the document tree carries a typed source-location record. Every search hit carries a pointer back to the element it matched. Every language-model output that needs source grounding is declared with a typed wrapper that requires the model to attach the source spans it relied on, and a typed "insufficient evidence" state for when it can't. Every legal citation extracted from text is then verified against the actual source body — first by exact substring match, then by a textual-entailment model when the wording shifts. The trail does not break.
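The two-stage citation check described above can be sketched in a few lines. This is a minimal illustration, not the KAOS citations module: `Verdict`, `verify_quote`, and the `entails` callable are hypothetical names, and the entailment model is injected as a plain function so the sketch runs without any model dependency.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    """Outcome of checking one quoted span against its source (hypothetical type)."""
    supported: bool
    method: str  # "exact_substring", "textual_entailment", or "none"


def verify_quote(
    quote: str,
    source_body: str,
    entails: Optional[Callable[[str, str], bool]] = None,
) -> Verdict:
    """Check a quote by exact substring first, then by an optional entailment fallback.

    `entails(premise, hypothesis)` stands in for a textual-entailment model;
    its signature is an assumption made for this sketch.
    """
    # Stage 1: the cheap, unambiguous check.
    if quote and quote in source_body:
        return Verdict(supported=True, method="exact_substring")

    # Stage 2: only reached when the wording has shifted.
    if entails is not None:
        return Verdict(
            supported=entails(source_body, quote),
            method="textual_entailment",
        )

    # No fallback available, so the quote stays unverified.
    return Verdict(supported=False, method="none")
```

Recording the method alongside the verdict lets a reviewer tell a verbatim match apart from a model-judged paraphrase.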

Metadata accumulates at every layer

Each layer adds without losing what came before. By the time an answer reaches the reader, every claim points at a specific page of a specific source.

Extraction: page number, bounding box, character offsets, extractor confidence
Retrieval: plus the pointer back to the matched element and a relevance score
Language-model call: plus the typed cited-answer wrapper that carries the source spans
Verification: plus a verdict and the method (exact substring or textual entailment)
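One way to picture the layering is as nested records, where each layer wraps the previous one instead of copying or discarding it. The sketch below is illustrative only: every class and field name is hypothetical and chosen to mirror the four layers listed above, not taken from the KAOS codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceLocation:
    """Extraction layer: where the text sits in the source."""
    page: int
    bbox: Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)
    char_span: Tuple[int, int]
    extractor_confidence: float


@dataclass(frozen=True)
class Element:
    """A node in the document tree, carrying its location."""
    element_id: str
    text: str
    location: SourceLocation


@dataclass(frozen=True)
class SearchHit:
    """Retrieval layer: pointer back to the element, plus a score."""
    element: Element
    score: float


@dataclass(frozen=True)
class CitedSpan:
    """Language-model layer: the quote the model relied on."""
    hit: SearchHit
    quote: str


@dataclass(frozen=True)
class VerifiedSpan:
    """Verification layer: verdict and the method that produced it."""
    span: CitedSpan
    supported: bool
    method: str


def trace(verified: VerifiedSpan) -> str:
    """Walk from a verified claim back to its page and bounding box."""
    loc = verified.span.hit.element.location
    start, end = loc.char_span
    return f"page {loc.page}, bbox {loc.bbox}, chars {start}-{end} ({verified.method})"
```

Because each record holds a reference to the one beneath it, nothing captured at extraction time has to be re-derived later.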

Verifiable answers sit at the intersection

A grounded answer requires evidence from all three sides — a parser that recorded the source location, a retriever that kept the link, and a language-model output type that demands cited spans. Refusal when evidence is insufficient is a typed state, not a prompt-engineering hope.

Diagram: three overlapping regions. Extraction (page + bounding box), Retrieval (pointer + score), Language-model call (cited answer). Overlap labels: tree-grounded retrieval; extracted and cited; retrieval with grounding. Center: verifiable grounded answer.

What it looks like in code

A grounded answer is a typed value. Either the model has enough source evidence to justify the claim and returns it with the cited spans attached, or it returns an explicit insufficient-evidence value with the spans it tried. There is no third state.

from kaos_llm_core.signatures import (
    Answer, GroundedAnswer, InputField, InsufficientEvidence,
    OutputField, Signature,
)
from kaos_llm_core.programs import Call


class ChangeOfControlTrigger(Signature):
    """Identify the change-of-control trigger in the supplied contract."""

    question: str = InputField()
    contract: str = InputField()
    result: GroundedAnswer[str] = OutputField()


call = Call(ChangeOfControlTrigger, model="anthropic:claude-haiku-4-5")

# `await` needs an async context (a coroutine or a notebook cell);
# msa_text is the contract text, loaded elsewhere.
result = await call(
    question="What event triggers the counterparty's termination right on a change of control?",
    contract=msa_text,
)

match result.result:
    # Grounded case: the value arrives with the spans that justify it.
    case Answer(value=trigger, claims=claims, confidence=conf):
        print(f"trigger: {trigger} (confidence {conf:.2f})")
        for claim in claims:
            for span in claim.supporting_spans:
                start, end = span.char_span
                print(f"  cited: {span.source_uri} @ {start}-{end}")
                print(f"  quote: {span.quote!r}")
    # Refusal case: an explicit value, not an empty or guessed answer.
    case InsufficientEvidence(reason=reason, missing=missing):
        print(f"insufficient evidence: {reason}")
        print(f"would resolve with: {missing}")
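A reviewer can also check a cited span mechanically: slice the source text at the span's offsets and compare the result with the quote. The sketch below assumes `char_span` is a zero-based, end-exclusive `(start, end)` pair, which the example above does not state. `Span` is a hypothetical stand-in with the same three fields the example prints, not the library's own type.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a cited span (source_uri, char_span, quote)."""
    source_uri: str
    char_span: Tuple[int, int]  # assumed zero-based, end-exclusive
    quote: str


def span_matches_source(span: Span, source_text: str) -> bool:
    """Return True when the quote is exactly the text at the cited offsets."""
    start, end = span.char_span
    # Reject offsets that fall outside the source or describe an empty range.
    if not (0 <= start < end <= len(source_text)):
        return False
    return source_text[start:end] == span.quote
```

A mismatch here means the offsets and the quote disagree, which is worth flagging before any entailment check runs.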

Read next

The legal-intelligence page shows how the citations module closes the loop: it matches extracted citations against the source first by exact substring, then with a textual-entailment model when the wording shifts. The benchmark coverage card lists the legal-AI tasks that source traceability supports.

On learn-kaos: grounding-and-verification · the-audit-trail.