# Python Expect DSL
Fluent assertion API for testing AI agent outputs across 8 validation layers.
## Overview

The `ExpectChain` class provides a fluent interface for chaining assertions. Each assertion returns the chain, so calls can be strung together.
```python
from attest import expect

result = agent.run("question")

# Assertions can be chained
(expect(result)
    .output_contains("expected")
    .cost_under(0.05)
    .latency_under(2000)
    .passes_judge("Is correct?"))
```

## Basic Usage
### Creating an Expect Chain
```python
from attest import expect

result = agent_function()
chain = expect(result)
```

The result object contains:

- `output` — The agent’s text output
- `cost` — Token cost in dollars
- `latency_ms` — Execution time in milliseconds
- `trace` — Full execution trace
- `metadata` — Custom metadata
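Those fields map naturally onto a simple record type. A hypothetical sketch of the shape of such a result object (attest's actual class may differ):

```python
from dataclasses import dataclass, field

@dataclass
class AgentResult:
    """Hypothetical shape of the result object described above."""
    output: str                 # the agent's text output
    cost: float                 # token cost in dollars
    latency_ms: int             # execution time in milliseconds
    trace: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

result = AgentResult(output="hello world", cost=0.003, latency_ms=420)
print(result.cost < 0.05)        # True: what cost_under(0.05) would check
print("hello" in result.output)  # True: what output_contains("hello") would check
```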
## Assertion Layers

Attest assertions work across 8 layers:
| Layer | Methods | What it validates |
|---|---|---|
| 1. Schema | `matches_schema()` | JSON Schema validation |
| 2. Constraints | `cost_under()`, `latency_under()` | Performance metrics |
| 3. Trace | `trace_contains_model()`, `trace_contains_tool()` | Execution path |
| 4. Content | `output_contains()`, `output_matches()` | Text content |
| 5. Embedding | `semantically_similar_to()` | Semantic meaning |
| 6. LLM Judge | `passes_judge()` | Domain-specific evaluation |
| 7. Trace Tree | `trace_tree_valid()`, `tool_calls_valid()` | Execution structure |
| 8. Simulation | `all_pass()`, `success_rate_above()` | Multi-agent scenarios |
### Layer 1: Schema Validation

Validate output structure against a schema.
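For intuition, this kind of check reduces to a recursive walk over types, required keys, and nested properties. A minimal hand-rolled sketch, not the library's implementation (which validates full JSON Schema):

```python
# Map JSON Schema type names to Python types (subset, for illustration).
TYPES = {"object": dict, "string": str, "number": (int, float), "array": list}

def check(value, schema):
    """Return True if value satisfies the (simplified) schema."""
    if not isinstance(value, TYPES[schema["type"]]):
        return False
    for key in schema.get("required", []):
        if key not in value:
            return False
    for key, sub in schema.get("properties", {}).items():
        if key in value and not check(value[key], sub):
            return False
    return True

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    "required": ["name", "age"],
}
print(check({"name": "Ada", "age": 36}, schema))  # True
print(check({"name": "Ada"}, schema))             # False: "age" missing
```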
```python
expect(result).matches_schema({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"}
    },
    "required": ["name", "age"]
})
```

### Layer 2: Constraints
Check performance metrics and limits.
#### Cost Constraints
```python
# Cost in dollars
expect(result).cost_under(0.10)
expect(result).cost_equals(0.05)
expect(result).cost_between(0.01, 0.10)
```

#### Latency Constraints
```python
# Latency in milliseconds
expect(result).latency_under(5000)
expect(result).latency_equals(1000)
expect(result).latency_between(100, 5000)
```

### Layer 3: Trace Content
Inspect what models and tools the agent used.
```python
# Check model usage
expect(result).trace_contains_model("gpt-4o-mini")
expect(result).trace_contains_model("claude-3-sonnet")

# Check tool usage
expect(result).trace_contains_tool("google_search")
expect(result).trace_contains_tool("calculator")

# Verify no unexpected tools
expect(result).trace_contains_only_tools(["calculator", "wikipedia"])
```

### Layer 4: Content Matching
Check the output text.
#### Exact Content
```python
# Contains substring
expect(result).output_contains("hello")

# Does not contain
expect(result).output_not_contains("error")

# Exact match
expect(result).output_equals("exact output")

# Case-insensitive
expect(result).output_contains("HELLO", case_sensitive=False)
```

#### Pattern Matching
```python
import re

# Regex pattern
expect(result).output_matches(r"^\d{4}-\d{2}-\d{2}$")  # Date format

# Contains all substrings
expect(result).output_contains_all(["hello", "world"])

# Contains any substring
expect(result).output_contains_any(["yes", "correct"])

# Starts / ends with
expect(result).output_starts_with("The")
expect(result).output_ends_with("?")
```

#### Word Count
```python
expect(result).word_count_equals(100)
expect(result).word_count_between(50, 200)
expect(result).word_count_under(500)
```

### Layer 5: Semantic Similarity
Check semantic meaning using embeddings.
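Semantic checks compare embedding vectors rather than raw strings, and the threshold is a similarity cutoff, typically cosine similarity. A minimal sketch of the underlying comparison, with toy vectors standing in for a real embedding model:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d "embeddings" — real ones come from an embedding model.
greeting = [0.9, 0.1, 0.0]
hello = [0.8, 0.2, 0.1]
error_msg = [0.0, 0.1, 0.9]

print(cosine_similarity(greeting, hello) >= 0.85)      # True: similar, passes
print(cosine_similarity(greeting, error_msg) >= 0.85)  # False: different, fails
```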
```python
# Semantically similar to reference text
expect(result).semantically_similar_to(
    "This is a greeting",
    threshold=0.85
)

# Semantically different from reference
expect(result).semantically_different_from(
    "This is an error message",
    threshold=0.85
)
```

### Layer 6: LLM-as-Judge
Use an LLM to evaluate domain-specific correctness.
#### Basic Judge
```python
expect(result).passes_judge(
    prompt="Is this response helpful?"
)
```

#### With Custom Model
```python
expect(result).passes_judge(
    prompt="Is the math correct?",
    model="gpt-4o",
    scoring="binary"  # binary, scale_0_10, or enum
)
```

#### Multiple Judges
```python
expect(result).passes_judges([
    ("Is this helpful?", "gpt-4o-mini"),
    ("Is this accurate?", "gpt-4o"),
])
```

#### Judge with Rubric
```python
expect(result).passes_judge(
    prompt="Grade this response",
    rubric={
        "clarity": "Is the explanation clear?",
        "accuracy": "Are the facts correct?",
        "completeness": "Does it answer fully?"
    },
    threshold=0.8
)
```

### Layer 7: Trace Tree Validation
Validate the structure of the execution trace.
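A depth bound like `trace_depth_under()` implies a recursive walk over the trace tree. A sketch of that recursion, assuming a hypothetical node structure where each span carries a list of child spans:

```python
def trace_depth(node):
    """Depth of a trace tree; a leaf span counts as depth 1."""
    children = node.get("children", [])
    if not children:
        return 1
    return 1 + max(trace_depth(child) for child in children)

# Hypothetical trace: agent -> tool call, and agent -> llm -> retry
trace = {"span": "agent", "children": [
    {"span": "google_search", "children": []},
    {"span": "llm", "children": [{"span": "llm_retry", "children": []}]},
]}
print(trace_depth(trace))  # 3, well under a trace_depth_under(10) bound
```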
```python
# Verify trace structure is valid
expect(result).trace_tree_valid()

# Verify specific tool calls
expect(result).tool_calls_valid()

# Check tool call count
expect(result).tool_call_count_equals(3)
expect(result).tool_call_count_between(1, 5)

# Verify no infinite loops
expect(result).trace_depth_under(10)
```

### Layer 8: Simulation Results
Validate multi-agent scenario results.
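The aggregate checks are simple reductions over per-run results; for example, a success-rate check amounts to the fraction of passing runs. Sketched with plain booleans, assuming each run reports pass/fail:

```python
runs = [True, True, True, True, False]  # e.g. 5 repeats, one failed run

success_rate = sum(runs) / len(runs)
print(success_rate)         # 0.8
print(success_rate > 0.95)  # False: a success_rate_above(0.95) check would fail
```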
```python
from attest import simulate

scenario = simulate.scenario()
scenario.add_agent(agent1)
scenario.add_agent(agent2)
results = scenario.run(repeat=5)

# All agents passed
expect(results).all_pass()

# Success rate
expect(results).success_rate_above(0.95)
expect(results).success_rate_equals(1.0)

# Average metrics
expect(results).avg_cost_under(0.10)
expect(results).avg_latency_under(2000)

# Agent-specific
expect(results).agent_success_rate("agent_1", above=0.90)
```

## Soft Failures
Collect all failures instead of stopping at the first one.
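Conceptually, this switches assertions from raise-immediately to record-and-continue. One possible implementation sketch of that mechanism, using hypothetical names rather than attest's actual code:

```python
from contextlib import contextmanager

_failures = None  # when set to a list, checks record instead of raising

def check(condition, message):
    """Assertion helper: raises normally, records inside collect_failures()."""
    if condition:
        return
    if _failures is None:
        raise AssertionError(message)
    _failures.append(message)

@contextmanager
def collect_failures():
    """Hypothetical stand-in for a soft-failure context manager."""
    global _failures
    _failures = []
    try:
        yield _failures
    finally:
        recorded, _failures = _failures, None
        if recorded:
            print(f"{len(recorded)} assertion(s) failed: {recorded}")

with collect_failures():
    check("hello" in "hello world", "output_contains failed")  # passes
    check(0.05 < 0.01, "cost_under failed")  # recorded, does not raise
    check(False, "judge failed")             # still executed
# prints: 2 assertion(s) failed: ['cost_under failed', 'judge failed']
```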
```python
from attest import soft_fail

with soft_fail():
    expect(result).output_contains("hello")  # Failure recorded
    expect(result).cost_under(0.01)          # Failure recorded
    expect(result).passes_judge("...")       # Still executes
# After the context exits, all three failures are reported
```

## Error Messages
When an assertion fails, you get detailed error information:
```text
✗ Assertion failed: output_contains("goodbye")
  Expected output to contain: goodbye
  Actual output: hello world
  Suggestion: Check if the prompt was clear
```

## Common Patterns
### Soft Failures
Continue testing after failures to collect all issues:
```python
from attest import soft_fail

with soft_fail():
    expect(result).output_contains("hello")  # May fail
    expect(result).cost_under(0.01)          # May fail
# Both will run, collecting all failures
```

### Custom Judges
Use LLM evaluation for semantic correctness:
```python
(expect(result)
    .passes_judge(
        prompt="Is the response grammatically correct?",
        model="gpt-4o",
        scoring="binary"  # binary, scale_0_10, or enum
    ))
```

## Framework Integration
Test agents built with popular frameworks:
```python
from attest.adapters import langchain, crewai, llamaindex

# LangChain agents
agent = langchain.create_agent(...)

# CrewAI tasks
task = crewai.create_task(...)

# LlamaIndex query engines
engine = llamaindex.create_query_engine(...)
```

## Related
- Python Adapters Reference — Provider integrations
- Framework Adapters Guide — Adapter architecture