Multi-Agent Testing
Attest provides first-class support for testing multi-agent systems through hierarchical trace trees, cross-agent assertions, and delegation tracking. All multi-agent assertions are Layer 7 — deterministic, free, and instant.
Trace Tree Model
Section titled “Trace Tree Model”A trace tree represents the execution hierarchy of a multi-agent system. Each agent produces a Trace. When an agent delegates to a sub-agent, the sub-agent’s trace is nested under the parent via an agent_call step with a sub_trace field.
orchestrator (root trace)├── llm_call: plan├── tool_call: fetch_context└── agent_call: researcher (sub_trace) ├── tool_call: search_web └── agent_call: writer (sub_trace) └── tool_call: write_docEach node in the tree is a Trace with its own agent_id, steps, output, and metadata. The tree is constructed automatically when using delegate().
delegate() Context Manager
Section titled “delegate() Context Manager”The delegate() context manager creates a child TraceBuilder linked to the current parent. On exit, it adds an agent_call step to the parent with the child’s built trace.
import attestfrom attest.delegate import delegatefrom attest.trace import TraceBuilderfrom attest.simulation._context import _active_builder
# Inside an agent's run context:parent = TraceBuilder(agent_id="orchestrator")parent.set_input(task="Process refund")
# Set parent as the active buildertoken = _active_builder.set(parent)
with delegate("researcher") as child: child.add_tool_call("search_web", args={"q": "refund policy"}) child.set_output(message="Policy found: 30-day window.")
with delegate("writer") as grandchild: grandchild.add_tool_call("write_doc", args={"title": "Refund Report"}) grandchild.set_output(message="Report drafted.")
parent.set_output(message="Refund processed.")_active_builder.reset(token)
trace = parent.build()How It Works
Section titled “How It Works”delegate(agent_id)reads the activeTraceBuilderfrom_active_buildercontext variable.- Creates a child
TraceBuilderwith the givenagent_idand setsparent_trace_idto the parent’s trace ID. - Sets the child as the new active builder (so nested
delegate()calls chain correctly). - On exit, resets the active builder to the parent and appends an
agent_callstep with the child’s built trace.
Error Handling
Section titled “Error Handling”delegate() raises RuntimeError if called outside an active TraceBuilder context:
# This raises RuntimeError:# "delegate() must be used within an Agent.run() context. No active TraceBuilder found."with delegate("sub-agent") as child: passTraceTree API
Section titled “TraceTree API”Build a TraceTree from an AgentResult or directly from a Trace:
from attest.trace_tree import TraceTree
# From AgentResulttree = result.trace_tree()
# From Trace directlytree = TraceTree(root=trace)Properties and Methods
Section titled “Properties and Methods”agents -> list[str]
Section titled “agents -> list[str]”All agent_id values in the tree, depth-first order:
tree.agents # ["orchestrator", "researcher", "writer"]depth -> int
Section titled “depth -> int”Maximum nesting depth. Root with no sub-agents = 0:
tree.depth # 2 (orchestrator -> researcher -> writer)delegations -> list[tuple[str, str]]
Section titled “delegations -> list[tuple[str, str]]”All (parent_agent_id, child_agent_id) delegation pairs:
tree.delegations# [("orchestrator", "researcher"), ("researcher", "writer")]find_agent(agent_id) -> Trace | None
Section titled “find_agent(agent_id) -> Trace | None”Find a sub-trace by agent_id:
researcher = tree.find_agent("researcher")researcher.output # {"message": "Policy found: 30-day window."}flatten() -> list[Trace]
Section titled “flatten() -> list[Trace]”All traces in depth-first order:
all_traces = tree.flatten()# [orchestrator_trace, researcher_trace, writer_trace]all_tool_calls() -> list[Step]
Section titled “all_tool_calls() -> list[Step]”All tool_call steps across the entire tree:
tools = tree.all_tool_calls()[t.name for t in tools] # ["fetch_context", "search_web", "write_doc"]Aggregate Metrics
Section titled “Aggregate Metrics”Sum metrics across all agents in the tree:
tree.aggregate_tokens # Total tokens across all agentstree.aggregate_cost # Total cost_usd across all agentstree.aggregate_latency # Total latency_ms across all agentsCross-Agent Assertions
Section titled “Cross-Agent Assertions”All multi-agent assertions use TYPE_TRACE_TREE (Layer 7) and are evaluated by the engine’s TraceTreeEvaluator.
agent_called(agent_id)
Section titled “agent_called(agent_id)”Assert a specific agent was invoked somewhere in the trace tree:
expect(result).agent_called("researcher")expect(result).agent_called("writer")delegation_depth(max_depth)
Section titled “delegation_depth(max_depth)”Assert the trace tree does not exceed a maximum nesting depth:
expect(result).delegation_depth(3) # Max 3 levels deepagent_output_contains(agent_id, value)
Section titled “agent_output_contains(agent_id, value)”Assert a sub-agent’s output contains a specific string:
expect(result).agent_output_contains("researcher", "policy found")expect(result).agent_output_contains("writer", "report", case_sensitive=False)cross_agent_data_flow(from_agent, to_agent, field)
Section titled “cross_agent_data_flow(from_agent, to_agent, field)”Assert that a field from one agent’s output appears in another agent’s input:
expect(result).cross_agent_data_flow( from_agent="researcher", to_agent="writer", field="findings",)The engine extracts field from from_agent’s output JSON, serializes it, and checks that the serialized value appears as a substring of to_agent’s input.
follows_transitions(transitions)
Section titled “follows_transitions(transitions)”Assert that all agent delegations in the trace tree match the allowed transition pairs:
expect(result).follows_transitions([ ("orchestrator", "researcher"), ("orchestrator", "writer"), ("researcher", "writer"),])The engine walks the trace tree depth-first, collects every (parent_agent_id, child_agent_id) delegation pair, and verifies each pair exists in the allowed list. Any delegation not in the list causes a failure.
This is useful for enforcing choreography — ensuring agents only delegate to approved sub-agents.
aggregate_cost_under(max_cost)
Section titled “aggregate_cost_under(max_cost)”Assert the total cost across all agents is under a threshold:
expect(result).aggregate_cost_under(0.10)aggregate_tokens_under(max_tokens)
Section titled “aggregate_tokens_under(max_tokens)”Assert the total token usage across all agents is under a threshold:
expect(result).aggregate_tokens_under(5000)Soft Failures
Section titled “Soft Failures”All multi-agent assertions support the soft parameter. Soft failures produce soft_fail status instead of hard_fail, allowing CI to warn without blocking:
expect(result).delegation_depth(2, soft=True)expect(result).aggregate_cost_under(0.05, soft=True)Choreography Validation
Section titled “Choreography Validation”Choreography validation ensures agents interact according to a defined protocol. Combine follows_transitions with other assertions to fully validate multi-agent workflows:
from attest import expect
def test_research_pipeline(result): tree = result.trace_tree()
# Structural validation expect(result).delegation_depth(3) expect(result).agent_called("orchestrator") expect(result).agent_called("researcher") expect(result).agent_called("writer")
# Choreography — only allowed delegation paths expect(result).follows_transitions([ ("orchestrator", "researcher"), ("orchestrator", "writer"), ("researcher", "writer"), ])
# Data flow — researcher findings reach writer expect(result).cross_agent_data_flow( from_agent="researcher", to_agent="writer", field="findings", )
# Output quality expect(result).agent_output_contains("writer", "report")
# Cost governance expect(result).aggregate_cost_under(0.10) expect(result).aggregate_tokens_under(5000)Complete Example
Section titled “Complete Example”End-to-end multi-agent test with simulation, delegation, and assertions:
import attestfrom attest.delegate import delegatefrom attest.trace import TraceBuilderfrom attest.result import AgentResultfrom attest.simulation import scenario, repeat, FRIENDLY_USER, MockToolRegistryfrom attest.simulation._context import _active_builderfrom attest import expect
def run_research_pipeline(query: str) -> AgentResult: """Simulate a 3-agent research pipeline.""" parent = TraceBuilder(agent_id="orchestrator") parent.set_input(query=query) token = _active_builder.set(parent)
try: parent.add_llm_call("plan", result={"plan": "research then write"})
with delegate("researcher") as researcher: researcher.add_tool_call( "search_web", args={"q": query}, result={"findings": "Test frameworks improve reliability."}, ) researcher.set_output( message="Research complete.", findings="Test frameworks improve reliability.", )
with delegate("writer") as writer: writer.add_tool_call( "write_doc", args={"title": "Report", "content": "Test frameworks improve reliability."}, ) writer.set_output(message="Report drafted successfully.")
parent.set_output(message="Pipeline complete. Report ready.") parent.set_metadata(total_tokens=1500, cost_usd=0.015, latency_ms=3000) finally: _active_builder.reset(token)
return AgentResult(trace=parent.build())
@repeat(n=5)@scenario(persona=FRIENDLY_USER, seed=42)def test_research_pipeline(persona): result = run_research_pipeline("AI testing frameworks")
# Layer 7: Multi-agent assertions expect(result).agent_called("researcher") expect(result).agent_called("writer") expect(result).delegation_depth(2) expect(result).follows_transitions([ ("orchestrator", "researcher"), ("orchestrator", "writer"), ]) expect(result).agent_output_contains("writer", "report") expect(result).aggregate_cost_under(0.10)
return result
# Run the testrepeat_result = test_research_pipeline()assert repeat_result.all_passed
# Inspect the trace tree from any individual runtree = repeat_result.results[0].trace_tree()print(f"Agents: {tree.agents}")print(f"Delegations: {tree.delegations}")print(f"Tool calls: {[t.name for t in tree.all_tool_calls()]}")print(f"Depth: {tree.depth}")print(f"Total cost: ${tree.aggregate_cost:.4f}")Expected output:
Agents: ['orchestrator', 'researcher', 'writer']Delegations: [('orchestrator', 'researcher'), ('orchestrator', 'writer')]Tool calls: ['search_web', 'write_doc']Depth: 1Total cost: $0.0150Temporal Assertions
Section titled “Temporal Assertions”Temporal assertions verify the execution order and timing of agents within a trace tree. They are Layer 7 — deterministic, free, and instant. Each assertion requires that Step objects carry started_at and ended_at timestamps (populated automatically by framework adapters and TraceBuilder methods that accept timing arguments).
agent_ordered_before(a, b)
Section titled “agent_ordered_before(a, b)”Assert that agent a completed before agent b started. Use this to enforce sequential execution guarantees.
from attest import expect
# researcher must complete before writer beginsexpect(result).agent_ordered_before("researcher", "writer")Fails if:
- Either agent is not present in the trace tree.
researcher.ended_at > writer.started_at(they overlapped or writer ran first).
agents_overlap(a, b)
Section titled “agents_overlap(a, b)”Assert that agents a and b executed concurrently — their time ranges intersect. Use this to verify parallel execution in fan-out pipelines.
# data-fetcher and context-loader must run in parallelexpect(result).agents_overlap("data-fetcher", "context-loader")Fails if:
- Either agent is not present in the trace tree.
- The agents ran sequentially with no overlap.
agent_wall_time_under(agent_id, max_ms)
Section titled “agent_wall_time_under(agent_id, max_ms)”Assert that a specific agent’s wall-clock duration is under max_ms milliseconds. Use this to enforce per-agent performance budgets.
# researcher must complete within 3 secondsexpect(result).agent_wall_time_under("researcher", max_ms=3000)
# writer must complete within 2 seconds (soft — warn, don't block CI)expect(result).agent_wall_time_under("writer", max_ms=2000, soft=True)Wall time is ended_at - started_at for the agent’s root trace.
ordered_agents(groups)
Section titled “ordered_agents(groups)”Assert that agents executed in a defined pipeline order. groups is a list of lists; agents within the same group ran concurrently, and each group completed before the next group started.
# Sequential pipeline: orchestrator → [researcher, context-loader] → writerexpect(result).ordered_agents([ ["orchestrator"], ["researcher", "context-loader"], # these two run in parallel ["writer"],])This combines ordering and overlap assertions in a single call:
- Agents within a group must all overlap each other (or be the only member).
- The last agent in group
nmust finish before the first agent in groupn+1starts.
Temporal Assertion Example
Section titled “Temporal Assertion Example”End-to-end test combining structural, choreography, and temporal assertions:
from attest import expectfrom attest.simulation import scenario, repeat, FRIENDLY_USER
@repeat(n=3)@scenario(persona=FRIENDLY_USER, seed=42)def test_pipeline_timing(persona): result = run_research_pipeline("AI testing frameworks")
# Structural expect(result).agent_called("researcher") expect(result).agent_called("context-loader") expect(result).agent_called("writer") expect(result).delegation_depth(2)
# Choreography expect(result).follows_transitions([ ("orchestrator", "researcher"), ("orchestrator", "context-loader"), ("orchestrator", "writer"), ])
# Temporal — execution order expect(result).agent_ordered_before("orchestrator", "researcher") expect(result).agent_ordered_before("orchestrator", "context-loader") expect(result).agent_ordered_before("researcher", "writer") expect(result).agent_ordered_before("context-loader", "writer")
# Temporal — parallel execution expect(result).agents_overlap("researcher", "context-loader")
# Temporal — performance budgets expect(result).agent_wall_time_under("researcher", max_ms=5000) expect(result).agent_wall_time_under("writer", max_ms=3000)
# Temporal — pipeline ordering (concise form) expect(result).ordered_agents([ ["orchestrator"], ["researcher", "context-loader"], ["writer"], ])
# Cost governance expect(result).aggregate_cost_under(0.10)
return result
repeat_result = test_pipeline_timing()assert repeat_result.all_passed