Concept · Provenance and grounding

Every claim, traceable to a page and a bounding box.

Source traceability is what separates a usable legal-AI answer from a liability. KAOS treats it as an architectural property rather than a per-tool feature. The page number, bounding box, character offsets, and extractor confidence captured when a parser reads a source are preserved through retrieval, into the language-model call, and into the typed answer the model returns — so a downstream reviewer can walk a claim back to the line that justified it.

The problem worth solving

Most platforms keep source-location metadata on parsed content. The link back to the source breaks the moment a language model reads that content and writes a new output: the prompt carries only the text, and the response comes back as plain JSON with no back-reference. Re-establishing the link is either prompt-engineered per tool, which is fragile and duplicative, or built into the architecture.

KAOS builds it in. Every element in the document tree carries a typed source-location record. Every search hit carries a pointer back to the element it matched. Every language-model output that needs source grounding is declared with a typed wrapper that requires the model either to attach the source spans it relied on or to return a typed "insufficient evidence" state when it cannot. Every legal citation extracted from text is then verified against the actual source body, first by exact substring match, then by a textual-entailment model when the wording shifts. The trail does not break.
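The two-stage citation check described above can be sketched roughly as follows. This is a minimal illustration and not the KAOS implementation: `Verification`, `verify_quote`, and the `threshold` parameter are hypothetical names, and `toy_entails` is a crude word-overlap placeholder standing in for a real textual-entailment model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verification:
    """Hypothetical result record; not a KAOS type."""
    verified: bool
    method: str             # "exact", "entailment", or "none"
    offset: Optional[int]   # start of the exact match in the source body


def toy_entails(premise: str, hypothesis: str) -> float:
    """Placeholder scorer, NOT a real entailment model: the fraction of
    hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().split())
    words = hypothesis.lower().split()
    if not words:
        return 0.0
    return sum(w in premise_words for w in words) / len(words)


def verify_quote(
    quote: str,
    source_body: str,
    entails: Callable[[str, str], float] = toy_entails,
    threshold: float = 0.9,  # assumed cut-off, chosen for illustration only
) -> Verification:
    if not quote:
        return Verification(False, "none", None)
    # Stage 1: exact substring match against the source body.
    offset = source_body.find(quote)
    if offset != -1:
        return Verification(True, "exact", offset)
    # Stage 2: the wording shifted, so fall back to the entailment scorer.
    if entails(source_body, quote) >= threshold:
        return Verification(True, "entailment", None)
    return Verification(False, "none", None)


body = "Either party may terminate upon a Change of Control of the other party."
exact = verify_quote("terminate upon a Change of Control", body)
reworded = verify_quote("upon a change of control either party may terminate", body)
unsupported = verify_quote("governing law is Delaware", body)
```

Keeping the exact match first means the cheap, deterministic check settles most cases, and the model-based check is only consulted when the quote does not appear verbatim.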

Metadata accumulates at every layer

Each layer adds its own metadata without discarding what earlier layers recorded. By the time an answer reaches the reader, every claim points at a specific page of a specific source.
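One way to picture the layering is as nested records, where each layer wraps the previous one instead of copying selected fields out of it. The sketch below is illustrative only: every class name here is hypothetical, and the field names are assumptions loosely modeled on the metadata listed earlier (page, bounding box, character offsets, extractor confidence).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceLocation:
    """Parser layer: where the text physically sits in the source."""
    source_uri: str
    page: int
    bbox: Tuple[float, float, float, float]  # assumed order: x0, y0, x1, y1
    char_span: Tuple[int, int]
    extractor_confidence: float


@dataclass(frozen=True)
class Element:
    """Document-tree layer: text plus its source location."""
    element_id: str
    text: str
    location: SourceLocation


@dataclass(frozen=True)
class SearchHit:
    """Retrieval layer: adds a score and keeps the whole element."""
    element: Element
    score: float


@dataclass(frozen=True)
class CitedSpan:
    """Answer layer: adds the quoted text and keeps the whole hit."""
    quote: str
    hit: SearchHit

    @property
    def location(self) -> SourceLocation:
        # Nothing was dropped on the way up, so the parser's record is
        # still reachable from the final answer.
        return self.hit.element.location


loc = SourceLocation("msa.pdf", 12, (72.0, 540.0, 523.0, 566.0), (1840, 1874), 0.98)
element = Element("sec-9.2", "terminate upon a Change of Control", loc)
hit = SearchHit(element, score=0.87)
span = CitedSpan("terminate upon a Change of Control", hit)
```

Because each record holds the layer beneath it, a reviewer holding only `span` can still reach the page and bounding box via `span.location`.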

Verifiable answers sit at the intersection

A grounded answer requires evidence from all three sides — a parser that recorded the source location, a retriever that kept the link, and a language-model output type that demands cited spans. Refusal when evidence is insufficient is a typed state, not a prompt-engineering hope.
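As a rough sketch of the three-sided requirement, the gate below accepts a claim only when every cited span still carries the parser's location and the retriever's link, and otherwise returns a refusal value. `Span`, `Grounded`, `Refused`, and `gate` are hypothetical stand-ins written for this illustration; they are not the `kaos_llm_core` types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Span:
    quote: str
    source_uri: Optional[str]             # recorded by the parser
    char_span: Optional[Tuple[int, int]]  # recorded by the parser
    element_id: Optional[str]             # link kept by the retriever


@dataclass(frozen=True)
class Grounded:
    value: str
    spans: List[Span]


@dataclass(frozen=True)
class Refused:
    reason: str
    tried: List[Span]


def gate(value: str, spans: List[Span]) -> Union[Grounded, Refused]:
    """Return Grounded only if all three sides supplied their evidence."""
    # Side 3: the model output must cite at least one span.
    if not spans:
        return Refused("model cited no spans", [])
    for span in spans:
        # Side 1: the parser must have recorded where the text came from.
        if span.source_uri is None or span.char_span is None:
            return Refused("span lacks a parser source location", spans)
        # Side 2: the retriever must have kept the link to the element.
        if span.element_id is None:
            return Refused("span lost its retrieval link", spans)
    return Grounded(value, spans)


good = Span("terminate upon a Change of Control", "msa.pdf", (1840, 1874), "sec-9.2")
orphan = Span("terminate upon a Change of Control", "msa.pdf", (1840, 1874), None)

accepted = gate("Change of Control of the other party", [good])
rejected = gate("Change of Control of the other party", [orphan])
uncited = gate("Change of Control of the other party", [])
```

The return type is a two-member union, so a caller that handles both cases has handled every outcome; no unstructured "maybe" value can slip through.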

What it looks like in code

A grounded answer is a typed value. Either the model has enough source evidence to justify the claim and returns it with the cited spans attached, or it returns an explicit insufficient-evidence value with the spans it tried. There is no third state.

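import asyncio
import sys
from pathlib import Path

from kaos_llm_core.signatures import (
    Answer, GroundedAnswer, InputField, InsufficientEvidence,
    OutputField, Signature,
)
from kaos_llm_core.programs import Call


class ChangeOfControlTrigger(Signature):
    """Identify the change-of-control trigger in the supplied contract."""
    question: str = InputField()
    contract: str = InputField()
    # The grounded wrapper: the output is either an Answer with cited spans
    # or an InsufficientEvidence value.
    result: GroundedAnswer[str] = OutputField()


async def find_trigger(msa_text: str) -> None:
    call = Call(ChangeOfControlTrigger, model="anthropic:claude-haiku-4-5")
    # The call is awaited, so it has to run inside a coroutine.
    result = await call(
        question="What event triggers the counterparty's termination right on a change of control?",
        contract=msa_text,
    )

    match result.result:
        # Grounded case: print the value and every span that supports it.
        case Answer(value=trigger, claims=claims, confidence=conf):
            print(f"trigger: {trigger}  (confidence {conf:.2f})")
            for claim in claims:
                for span in claim.supporting_spans:
                    start, end = span.char_span
                    print(f"  cited: {span.source_uri} @ {start}-{end}")
                    print(f"  quote: {span.quote!r}")
        # Refusal case: report why, and what evidence would resolve it.
        case InsufficientEvidence(reason=reason, missing=missing):
            print(f"insufficient evidence: {reason}")
            print(f"would resolve with: {missing}")


if __name__ == "__main__":
    # Assumption for this example: the contract text is read from a file
    # whose path is passed as the first command-line argument.
    msa_text = Path(sys.argv[1]).read_text(encoding="utf-8")
    asyncio.run(find_trigger(msa_text))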
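A reviewer, or an automated check, could then walk each cited span back to the source text. The sketch below assumes that `char_span` is a half-open `(start, end)` pair of character offsets into the source text and that `quote` is meant to equal that slice exactly; neither assumption is confirmed by the example above. `Span` here is a local stand-in carrying only the three fields the example reads, and `check_spans` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Span:
    """Local stand-in with the three fields read in the example above."""
    source_uri: str
    char_span: Tuple[int, int]  # assumed half-open: [start, end)
    quote: str


def check_spans(spans: Iterable[Span], sources: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every quote matches."""
    problems: List[str] = []
    for span in spans:
        body = sources.get(span.source_uri)
        if body is None:
            problems.append(f"{span.source_uri}: unknown source")
            continue
        start, end = span.char_span
        if not (0 <= start <= end <= len(body)):
            problems.append(f"{span.source_uri} @ {start}-{end}: offsets out of range")
        elif body[start:end] != span.quote:
            problems.append(f"{span.source_uri} @ {start}-{end}: quote does not match source")
    return problems


text = "Either party may terminate upon a Change of Control of the other party."
quote = "terminate upon a Change of Control"
start = text.index(quote)

sources = {"msa.txt": text}
good = Span("msa.txt", (start, start + len(quote)), quote)
shifted = Span("msa.txt", (start + 1, start + 1 + len(quote)), quote)
missing = Span("other.txt", (0, 5), "Either")

clean_report = check_spans([good], sources)
full_report = check_spans([good, shifted, missing], sources)
```

A check like this is deliberately strict: a quote that is off by one character is reported instead of being silently accepted.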