System Architecture
Attest is a testing framework for AI agents built on a three-layer architecture: language-specific SDKs capture agent execution as traces, a Go engine evaluates assertions against those traces, and adapters bridge the gap between LLM providers or agent frameworks and the canonical trace format.
Architecture Diagram
```mermaid
flowchart TB
    subgraph SDKs["Language SDKs"]
        direction LR
        PY["Python SDK<br/><code>attest-ai</code>"]
        TS["TypeScript SDK<br/><code>@attest-ai/core</code>"]
    end
    subgraph Adapters["Trace Capture Adapters"]
        direction LR
        subgraph Providers["Provider Adapters"]
            OAI[OpenAI]
            ANT[Anthropic]
            GEM[Gemini]
            OLL[Ollama]
        end
        subgraph Frameworks["Framework Adapters"]
            LC[LangChain]
            LI[LlamaIndex]
            ADK[Google ADK]
            CR[CrewAI]
        end
        subgraph Special["Special"]
            MAN[Manual]
            OTEL[OTel]
        end
    end
    subgraph Engine["Go Engine (subprocess)"]
        direction TB
        INIT["initialize"]
        EVAL["evaluate_batch"]
        PIPE["8-Layer Assertion Pipeline"]
        SHUT["shutdown"]
        INIT --> EVAL --> PIPE --> SHUT
    end
    subgraph Pipeline["Assertion Layers"]
        direction LR
        L1["L1 Schema"]
        L2["L2 Constraint"]
        L3["L3 Trace"]
        L4["L4 Content"]
        L5["L5 Embedding"]
        L6["L6 Judge"]
        L7["L7 Trace Tree"]
        L8["L8 Plugin"]
    end
    Adapters --> SDKs
    SDKs -->|"JSON-RPC 2.0<br/>NDJSON / stdio"| Engine
    Engine --> Pipeline
```

Design Principles

Seven principles govern the framework’s design and API surface.
1. Test Outcomes, Not Paths
Agents are non-deterministic. The same input can produce different tool call sequences that are all correct. Attest asserts on what was achieved and what constraints were respected, not on the specific execution path.
```python
# Asserts that both tools were called and that eligibility
# was checked before processing — regardless of other ordering
expect(result).to_call_tool("lookup_order")
expect(result).to_call_tool("check_eligibility")
expect(result).tool_called_before("check_eligibility", "process_refund")
```

2. Cheapest Valid Assertion First
The 8-layer pipeline is ordered by cost. Layers 1-4 and 7 are free and deterministic. Layer 5 costs ~$0.001. Layer 6 costs ~$0.01+. The API is designed so developers naturally reach for cheaper layers first.
In a well-engineered agent, 60-70% of the testable surface is deterministic — tool call schemas, guardrail enforcement, structured output validation, state machine transitions, cost budgets. LLM-as-judge is the last resort, not the default.
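For example, a single test can keep most of its checks in the free layers and reserve the judge for one genuinely fuzzy quality (the criterion name below is illustrative):

```python
# Free, deterministic checks (Layers 1-4)
expect(result).to_call_tool("process_refund")
expect(result).output_matches_schema(refund_schema)
expect(result).output_contains("ORD-123456")

# LLM-as-judge (Layer 6) only for what code cannot verify
expect(result).judge_score("empathy", above=0.7)
```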
3. Soft Failures Are First-Class
LLM-based assertions produce scores, not booleans. Attest introduces soft failures:
| Score Range | Classification | CI Behavior |
|---|---|---|
| < 0.5 | Hard fail | Block merge |
| 0.5 - 0.8 | Soft fail | Warn, allow merge if within budget |
| > 0.8 | Pass | Continue |
Soft failure budgets are configurable per test suite and per CI pipeline.
4. Code-First, Not Config-First
Tests are written in code (Python or TypeScript), not YAML or UI dashboards. The test file is the source of truth — version-controlled, reviewed in PRs, run in CI.
```python
async def test_processes_eligible_refund(result):
    expect(result).output_matches_schema(refund_schema)
    expect(result).to_call_tool("process_refund")
    expect(result).output_contains("ORD-123456")
    expect(result).cost_under(0.10)
```

5. Simulation Is Built-In
Set ATTEST_SIMULATION=1 to run tests without an engine binary or real LLM API calls. The SDK returns deterministic pass results for all assertions, enabling rapid iteration and CI pipeline validation without provider credentials.
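For example, reusing the pytest marker shown in the tier examples further down this page:

```sh
# Every assertion returns a deterministic pass: no binary, no credentials
ATTEST_SIMULATION=1 pytest -m attest
```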
6. Framework-Agnostic
The integration surface is the trace. Any system that can produce a trace in Attest’s canonical format is testable. Adapters handle the translation from provider-specific or framework-specific events into the canonical Trace structure.
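For systems without an adapter, the SDKs expose a TraceBuilder (see Language SDKs below) for constructing traces by hand. A rough sketch, with the caveat that the builder method names here are assumptions rather than the documented API:

```python
# Hypothetical sketch of producing a canonical trace by hand.
# TraceBuilder is part of the SDK surface, but the method names used
# below are assumptions, not the documented API.
from attest import TraceBuilder

trace = (
    TraceBuilder(agent_id="refund-agent")
    .with_input({"query": "Refund order ORD-123456"})
    .add_step(
        type="tool_call",
        name="lookup_order",
        args={"order_id": "ORD-123456"},
        result={"status": "delivered"},
    )
    .with_output({"text": "Refund approved for ORD-123456"})
    .build()
)
```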
7. Cost Is a Test Metric
Every test run tracks token consumption, API costs, and latency. Cost assertions sit alongside correctness assertions:
```python
expect(result).cost_under(0.05)
expect(result).total_tokens_under(3000)
```

Component Overview
Language SDKs
| SDK | Package | Runtime | Status |
|---|---|---|---|
| Python | attest-ai (PyPI) | Python 3.10+ | Stable (v0.4.2) |
| TypeScript | @attest-ai/core (npm) | Node 18+ | Stable (v0.4.2) |
Both SDKs provide the same core API surface: expect() fluent DSL, TraceBuilder, TraceTree, adapter base classes, tier system, and engine lifecycle management via EngineManager.
Go Engine
A statically compiled Go binary (attest-engine) that runs as a subprocess. The SDK communicates with it over JSON-RPC 2.0, transported as NDJSON over stdin/stdout.
The engine handles all assertion evaluation — from JSON Schema validation (Layer 1) through LLM-as-judge scoring (Layer 6). This architecture separates the hot evaluation path (Go) from the developer-facing API (Python/TypeScript).
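A minimal sketch of that exchange from the client side; the three method names come from the architecture diagram, while the launch details and parameter shapes are assumptions:

```python
import json
import subprocess

# Sketch only: frame JSON-RPC 2.0 requests as NDJSON over stdin/stdout.
# Method names match the architecture diagram; params shapes are assumed.
proc = subprocess.Popen(
    ["attest-engine"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def call(method: str, params: dict, request_id: int) -> dict:
    request = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    proc.stdin.write(json.dumps(request) + "\n")  # one request per line
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())     # one response per line

call("initialize", {"client": "attest-ai", "version": "0.4.2"}, 1)
call("evaluate_batch", {"traces": [], "assertions": []}, 2)
call("shutdown", {}, 3)
```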
Binary discovery chain:
1. ATTEST_ENGINE_PATH environment variable
2. PATH lookup
3. ~/.attest/bin/ shared cache (version-checked)
4. Monorepo dev layout
5. Local ./bin/
6. Auto-download from GitHub Releases
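For example, a CI job can pin the binary explicitly and skip the rest of the chain (the path below is illustrative):

```sh
ATTEST_ENGINE_PATH=/opt/attest/bin/attest-engine pytest -m attest
```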
Assertion Pipeline
Eight layers evaluated in order, cheapest first. Layers 1-4 and 7 are free and deterministic. Layer 5 uses embeddings ($0.001/call). Layer 6 uses an LLM judge ($0.01+/call). Layer 8 runs custom plugin logic.
See Assertion Pipeline for the full deep dive.
Adapter System
Two tiers of adapters capture traces from different integration points:
- Provider adapters (OpenAI, Anthropic, Gemini, Ollama) wrap LLM client SDKs to capture individual API calls, token counts, and tool use.
- Framework adapters (LangChain, LlamaIndex, Google ADK, CrewAI) hook into agent orchestration frameworks to capture tool call sequences, agent delegation trees, and multi-step reasoning.
See Adapter System for the full architecture.
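As a purely illustrative sketch of the provider-adapter flow (the import path, class name, and wrap/get_trace methods below are assumptions, not the documented API):

```python
from openai import OpenAI
from attest.adapters.openai import OpenAIAdapter  # assumed import path

adapter = OpenAIAdapter()        # assumed class name
client = adapter.wrap(OpenAI())  # assumed wrapping API

# Calls made through the wrapped client are captured as llm_call steps,
# with token counts and cost recorded in the trace metadata.
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Refund order ORD-123456"}],
)

result = adapter.get_trace()     # assumed accessor for the captured trace
```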
Trace Data Model
The Trace is the central data structure. Every agent interaction produces a trace containing input, execution steps, output, and metadata.
```mermaid
classDiagram
    class Trace {
        +str trace_id
        +int schema_version
        +str agent_id
        +dict input
        +list~Step~ steps
        +dict output
        +TraceMetadata metadata
        +str parent_trace_id
    }
    class Step {
        +str type
        +str name
        +dict args
        +dict result
        +Trace sub_trace
        +dict metadata
        +int started_at_ms
        +int ended_at_ms
        +str agent_id
        +str agent_role
    }
    class TraceMetadata {
        +int total_tokens
        +float cost_usd
        +int latency_ms
        +str model
        +str timestamp
    }
    Trace "1" --> "*" Step : steps
    Trace "1" --> "0..1" TraceMetadata : metadata
    Step "0..1" --> "0..1" Trace : sub_trace
```

Step Types
| Type | Constant | Description |
|---|---|---|
| LLM Call | llm_call | A call to an LLM provider API |
| Tool Call | tool_call | Execution of a tool/function |
| Retrieval | retrieval | Document or vector search operation |
| Agent Call | agent_call | Delegation to a sub-agent (contains sub_trace) |
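Assembled, a small trace might serialize along these lines; field names follow the diagram above, and the concrete values are illustrative:

```python
trace = {
    "trace_id": "tr_01",
    "schema_version": 1,
    "agent_id": "refund-agent",
    "input": {"query": "Refund order ORD-123456"},
    "steps": [
        {
            "type": "llm_call",
            "name": "gpt-4o-mini",
            "result": {"text": "Looking up the order..."},
            "started_at_ms": 1700000000000,
            "ended_at_ms": 1700000000450,
        },
        {
            "type": "tool_call",
            "name": "lookup_order",
            "args": {"order_id": "ORD-123456"},
            "result": {"status": "delivered"},
        },
        {
            "type": "agent_call",
            "name": "eligibility-checker",
            "agent_id": "eligibility-agent",
            "agent_role": "checker",
            "sub_trace": {"trace_id": "tr_02", "parent_trace_id": "tr_01", "steps": []},
        },
    ],
    "output": {"text": "Refund approved for ORD-123456"},
    "metadata": {
        "total_tokens": 1420,
        "cost_usd": 0.012,
        "latency_ms": 2300,
        "model": "gpt-4o-mini",
        "timestamp": "2025-01-15T12:00:00Z",
    },
}
```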
Temporal Fields
Steps carry optional temporal metadata for multi-agent analysis:
- started_at_ms / ended_at_ms — wall-clock timestamps for ordering and overlap detection
- agent_id / agent_role — stable identifiers for agent-level assertions
Provider adapters populate timestamps from wall-clock readings taken around each LLM call. Framework adapters populate timestamps and agent identity from framework event metadata.
Tier System
Tests are tagged with tiers that map to assertion cost:
| Tier | Layers | Cost | Use Case |
|---|---|---|---|
| TIER_1 | L1-L4, L7 | Free | Schema, constraint, trace, content, trace tree |
| TIER_2 | L5 | ~$0.001/assertion | Embedding similarity |
| TIER_3 | L6 | ~$0.01+/assertion | LLM-as-judge scoring |
```python
from attest import tier, TIER_1, TIER_2, TIER_3

@tier(TIER_1)
async def test_schema_compliance(result):
    expect(result).output_matches_schema(schema)

@tier(TIER_3)
async def test_response_empathy(result):
    expect(result).judge_score("empathy", above=0.7)
```

Run only free tests in development, escalate to full suite in CI:
```sh
# Development: fast, free tests only
ATTEST_MAX_TIER=1 pytest -m attest

# CI pipeline: full suite including LLM judge
pytest -m attest
```

Multi-SDK Strategy
Both SDKs share the same engine binary and protocol. The engine is language-agnostic — any client that speaks JSON-RPC 2.0 over NDJSON/stdio can drive it.
```mermaid
flowchart LR
    PY["Python SDK"] -->|"JSON-RPC 2.0"| ENG["Go Engine<br/>v0.4.0"]
    TS["TypeScript SDK"] -->|"JSON-RPC 2.0"| ENG
    FUTURE["Future SDKs<br/>(Go, Rust, ...)"] -.->|"JSON-RPC 2.0"| ENG
```

SDK versions and engine versions are decoupled. Both SDKs at v0.4.2 work with engine v0.4.0. The engine binary is cached in ~/.attest/bin/ with a .engine-version marker file for version validation.