Surface · LLM

One way to call any major model — with cost, citations, and types.

One Python interface speaks to OpenAI, Anthropic, Google, xAI, Groq, Mistral, and OpenRouter. Above it sits a typed-program layer: declare what comes in and what goes out, get back validated objects with cost, token usage, and source citations attached. Tune prompts and few-shot examples against a metric instead of editing strings by hand.

pip install kaos-llm-client kaos-llm-core

Typed programs, not raw prompts

A Signature declares the inputs and outputs of an LLM call as a Pydantic model. The docstring becomes the instruction; the field types do the validation; you never parse free-text JSON by hand. A Call is one validated invocation; a Program chains several together; an Optimizer searches over instructions and few-shot examples until a metric — accuracy, cost, refusal rate — improves.
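To make the idea concrete, here is a minimal standard-library sketch of a signature: the class docstring is used as the instruction and the declared field types check the reply. It is an illustration only, not how kaos-llm-core implements Signature; `ClauseSummary`, `build_prompt`, and `parse_reply` are hypothetical names, and the type check handles only plain types such as `str` and `list`.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ClauseSummary:
    """Summarise a contract clause in one sentence and list its defined terms."""

    summary: str
    defined_terms: list


def build_prompt(schema: type, **inputs: str) -> str:
    """Use the class docstring as the instruction and name the expected keys."""
    keys = ", ".join(f.name for f in fields(schema))
    body = "\n".join(f"{name}: {value}" for name, value in inputs.items())
    return f"{schema.__doc__}\n\n{body}\n\nReply with a JSON object with keys: {keys}"


def parse_reply(schema: type, reply: str):
    """Check a JSON reply against the declared field types before building the object."""
    data = json.loads(reply)
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        if not isinstance(data[f.name], f.type):
            raise TypeError(f"{f.name} should be {f.type.__name__}")
    # Ignore any extra keys the model may have added.
    return schema(**{f.name: data[f.name] for f in fields(schema)})
```

A real implementation would lean on Pydantic for nested and generic types; the point here is only that the caller never touches free-text JSON directly.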

The composition library ships the patterns most legal-AI work needs: a single call, chain-of-thought, best-of-N sampling with a selector, ensembles that vote across models, a judge for evaluative scoring, retrieval-augmented generation, a Grounded pattern that returns claims with citations, and a program-of-thought form for code-aided reasoning. The optimizer family ports the canonical algorithms (Bootstrap, Instruction, MIPROv2) and adds a reflective optimizer, a Pareto multi-objective optimizer, and a model-selection optimizer that picks a cheaper model when accuracy holds.
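As a rough illustration of the ensemble pattern, the sketch below asks several models the same question and keeps the majority answer. The "models" are plain offline functions standing in for real clients, and `ensemble_vote` is a hypothetical helper, not the library's Ensemble class.

```python
from collections import Counter
from typing import Callable


def ensemble_vote(models: list[Callable[[str], str]], prompt: str) -> tuple[str, float]:
    """Ask every model the same question; return the majority answer and its vote share."""
    # Normalise case and whitespace so trivially different answers count as one vote.
    answers = [model(prompt).strip().lower() for model in models]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


# Stand-in "models": fixed replies so the sketch runs without network access.
def model_a(prompt: str) -> str:
    return "Capped"


def model_b(prompt: str) -> str:
    return "capped "


def model_c(prompt: str) -> str:
    return "uncapped"


answer, agreement = ensemble_vote([model_a, model_b, model_c], "Is liability capped?")
print(answer, round(agreement, 2))
```

The vote share is a cheap disagreement signal: a low value may be a reason to escalate the question to a judge step or a human reviewer.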

What's different from DSPy: every answer carries provenance. Cost, token usage, and step-by-step traces are pulled through every layer. The same typed programs are exposed as MCP tools so an agent like Claude Code can invoke them over the wire. The full head-to-head with DSPy is in the kaos-llm-core repository.
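One way to picture cost and traces being carried through every layer is a wrapper that travels with each value. The sketch below is an assumption-laden illustration: `Traced` is a hypothetical type (it is not the library's own provenance type), and the cost and token figures are made up.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Traced(Generic[T]):
    """A value plus the accounting for every step that produced it (hypothetical)."""

    value: T
    cost_usd: float = 0.0
    tokens: int = 0
    steps: list[str] = field(default_factory=list)

    def then(self, name: str, value, cost_usd: float, tokens: int) -> "Traced":
        """Record a follow-on step, carrying the running totals forward."""
        return Traced(
            value=value,
            cost_usd=self.cost_usd + cost_usd,
            tokens=self.tokens + tokens,
            steps=[*self.steps, name],
        )


# Illustrative numbers only.
extracted = Traced(value="cap: 10% of purchase price", cost_usd=0.002, tokens=410, steps=["extract"])
judged = extracted.then("judge", value="acceptable", cost_usd=0.001, tokens=180)
print(judged.steps, judged.tokens)
```

Because each step returns a new object, the earlier result stays intact and the final answer still knows which steps it passed through.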

One API, every major model provider

create_client("provider:model") returns a typed client; chat, chat_stream_async, json, pydantic, and embed work the same way across OpenAI, Anthropic, Google, xAI, Groq, Mistral, and OpenRouter.

[Diagram: OpenAI, Anthropic, Google, xAI, Groq, Mistral, and OpenRouter all connect through kaos-llm-client, one async API.]
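The "provider:model" string suggests a simple dispatch step. The sketch below shows one plausible shape for it, assuming a registry keyed by provider name; `make_client` and `_PROVIDERS` are hypothetical, and the factories return descriptive strings instead of real clients. It is not the source of create_client.

```python
from typing import Callable

# Hypothetical registry: provider name -> factory that takes a model name.
# The stand-in factories return a string so the sketch runs offline.
_PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": lambda model: f"<openai client for {model}>",
    "anthropic": lambda model: f"<anthropic client for {model}>",
}


def make_client(spec: str) -> str:
    """Split 'provider:model' on the first colon and dispatch to the matching factory."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    if provider not in _PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return _PROVIDERS[provider](model)


print(make_client("anthropic:claude-haiku-4-5"))
```

Splitting only on the first colon is a deliberate choice in this sketch, so a model name that itself contains a colon would pass through unchanged.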

Composition patterns ready to use

Mix any pattern with any optimizer against any supported provider. Verified against kaos_llm_core/programs/ and kaos_llm_core/optimization/.

Call: one invocation
ChainOfThought: native thinking
ReAct: reason → tool → loop
BestOfN: parallel + selector
Ensemble: multi-model vote
Refine: produce → critique
RAG: retrieve → ground
Judge: evaluative scoring
Grounded: verifiable claims
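To show what "parallel + selector" can mean in practice, here is a small best-of-N sketch: candidates are drawn concurrently and a scoring function picks one. `best_of_n`, the canned drafts, and the toy selector are all hypothetical stand-ins, not the library's BestOfN.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def best_of_n(generate: Callable[[int], str], score: Callable[[str], float], n: int) -> str:
    """Draw n candidates in parallel and return the one the selector scores highest."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so candidate i comes from generate(i).
        candidates = list(pool.map(generate, range(n)))
    return max(candidates, key=score)


# Stand-in generator: canned drafts instead of sampled model outputs.
DRAFTS = [
    "Cap is 10%.",
    "The cap is 10% of the purchase price, with a 1% basket.",
    "Unclear.",
]


def generate(i: int) -> str:
    return DRAFTS[i]


def count_figures(text: str) -> float:
    """Toy selector: prefer the draft that cites the most percentage figures."""
    return float(text.count("%"))


best = best_of_n(generate, count_figures, n=3)
print(best)
```

In a real program the selector would more likely be a judge call or a metric; the structure stays the same.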

A taste

A typed Signature executed as a Call: read an indemnification clause, decide whether the cap and basket leave the buyer over-exposed, and return a structured opinion the diligence memo can quote. From there, swap Call for ChainOfThought to add native thinking, swap it for Ensemble to vote across three models, or run BootstrapOptimizer.optimize(call, train, val) against a labelled set of past deals to lift accuracy without touching the Signature.

import asyncio
from pathlib import Path

from kaos_llm_core import Call, InputField, OutputField, Signature


class IndemnificationReview(Signature):
    """Review an SPA indemnification clause for buyer-side risk."""

    # Inputs supplied by the caller
    clause_text: str = InputField(description="The indemnification clause, verbatim")
    deal_value: float = InputField(description="Total deal consideration in USD")
    # Outputs the model must return, validated against these types
    cap: str = OutputField(description="Stated cap, or 'none' if uncapped")
    basket: str = OutputField(description="Basket / deductible, or 'none'")
    survival: str = OutputField(description="Survival period for general reps")
    risk_flags: list[str] = OutputField(description="Issues a buyer-side reviewer should raise")


async def main() -> None:
    call = Call(IndemnificationReview, model="anthropic:claude-haiku-4-5")
    # The call is awaited, so it has to run inside an event loop
    result = await call(
        clause_text=Path("article-IX.txt").read_text(),
        deal_value=850_000_000,
    )
    for flag in result.risk_flags:
        print("•", flag)


asyncio.run(main())
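For readers new to bootstrapped few-shot optimization, the sketch below shows the general idea in miniature: keep the training examples the program already gets right, reuse them as demonstrations, and compare validation accuracy before and after. It is written in the spirit of that technique and is not BootstrapOptimizer's implementation; `bootstrap_demos`, `toy_predict`, and the tiny dataset are hypothetical.

```python
from typing import Callable

Example = tuple[str, str]  # (input text, expected label)


def bootstrap_demos(
    predict: Callable[[str, list[Example]], str],
    train: list[Example],
    val: list[Example],
    max_demos: int = 2,
) -> tuple[list[Example], float, float]:
    """Keep training examples the program already solves, then score with and without them."""

    def accuracy(demos: list[Example]) -> float:
        return sum(predict(x, demos) == y for x, y in val) / len(val)

    demos = [(x, y) for x, y in train if predict(x, []) == y][:max_demos]
    return demos, accuracy([]), accuracy(demos)


def toy_predict(text: str, demos: list[Example]) -> str:
    """Stand-in for a model call: copy the label of a demo sharing a word, else guess."""
    words = set(text.lower().split())
    for demo_text, label in demos:
        if words & set(demo_text.lower().split()):
            return label
    return "capped" if "cap" in words else "unknown"


train = [("cap at 10%", "capped"), ("no limit stated", "uncapped")]
val = [("cap of 5%", "capped"), ("liability limited at 10%", "capped")]
demos, before, after = bootstrap_demos(toy_predict, train, val)
print(demos, before, after)
```

The Signature is never edited in this loop; only the demonstrations attached to the call change, which is what lets the metric drive the improvement.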

How it compares

vs. DSPy. kaos-llm-core is the typed-program layer of KAOS, and a deliberate philosophical descendant of Stanford DSPy. The two abstractions every DSPy user recognizes — Signature and Program — are first-class here, and the optimizer family ports the canonical algorithms (Bootstrap, Instruction, MIPROv2). What's different: a pickle-free schema-validated JSON program format, an MCP server so the typed programs are agent-callable, a batch runner with a checkpointed workspace, and per-step provenance through Cited[T]. Pick DSPy for GEPA, SIMBA, and multi-predictor MIPROv2; pick kaos-llm-core for crash-safe batch over thousands of inputs with cost caps. The full head-to-head is in the kaos-llm-core repository.

vs. Pydantic-AI, Mirascope, Instructor. Those frameworks own one layer (typed structured-output prompting); the KAOS LLM packages own two: a transport layer (kaos-llm-client, roughly the scope of DSPy's dspy.LM) and a programming layer (kaos-llm-core: Signatures, Programs, optimizers, an MCP server, batch, and a workspace). Useful when the program is multi-step, when batch matters, or when MCP-served typed programs are the deployment shape.

vs. proprietary "legal-tuned LLM" claims. kaos-llm-core is provider-agnostic by design. Bring your own model. The grounding and verification machinery does not depend on which weights answered. See /compare.
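The DSPy comparison above mentions crash-safe batch runs with cost caps. A minimal sketch of that idea, assuming a JSONL checkpoint file and a per-item cost reported by each step, could look like the following. `run_batch` and `fake_step` are hypothetical and do not reflect the library's batch runner or workspace format.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_batch(
    items: dict[str, str],
    step: Callable[[str], tuple[str, float]],
    checkpoint: Path,
    max_cost_usd: float,
) -> dict[str, str]:
    """Process each item once, appending results to a JSONL checkpoint as they finish."""
    done: dict[str, str] = {}
    spent = 0.0
    if checkpoint.exists():
        # Reload earlier results so a rerun resumes instead of starting over.
        for line in checkpoint.read_text().splitlines():
            record = json.loads(line)
            done[record["id"]] = record["output"]
            spent += record["cost_usd"]
    with checkpoint.open("a") as log:
        for item_id, text in items.items():
            if item_id in done:
                continue  # finished in an earlier run
            if spent >= max_cost_usd:
                break  # cost cap reached; rerun with a higher cap to continue
            output, cost = step(text)
            spent += cost
            done[item_id] = output
            log.write(json.dumps({"id": item_id, "output": output, "cost_usd": cost}) + "\n")
            log.flush()
    return done


calls: list[str] = []


def fake_step(text: str) -> tuple[str, float]:
    """Stand-in for a model call: uppercase the text and charge a fixed, made-up cost."""
    calls.append(text)
    return text.upper(), 0.01


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "batch.jsonl"
    inputs = {"a": "x", "b": "y", "c": "z"}
    first = run_batch(inputs, fake_step, path, max_cost_usd=0.02)   # stops at the cap
    second = run_batch(inputs, fake_step, path, max_cost_usd=1.00)  # resumes from the checkpoint
```

Appending one record per finished item means a crash loses at most the item in flight, and the cap is checked before each call so a rerun never repeats paid work.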

Get started

See the quickstart, browse all 18 packages, or read the docs.