# Migrating from Manual Testing

Automate agent QA with Attest’s assertion framework.
## Why Automate?

Manual QA is slow and error-prone. Attest lets you encode QA checklists as executable tests:
- **Repeatability** — Run the same checks instantly
- **Regression detection** — Catch changes after model updates
- **CI/CD integration** — Fail builds if quality drops
- **Historical tracking** — See quality trends over time
- **Cost visibility** — Monitor token usage and latency
## Common Manual Checks to Attest Assertions

### Check 1: Output Contains Specific Text

#### Manual QA Checklist

- [ ] Response mentions the answer
- [ ] Response doesn't mention errors
- [ ] Response is not empty

#### Attest Assertions

```python
from attest import expect

result = agent.run("question")

(expect(result)
    .output_contains("answer")
    .output_not_contains("error"))

# Every string contains "", so output_not_contains("") could never pass;
# check for a non-empty response directly instead.
assert result.output.strip()
```

### Check 2: Response Structure
#### Manual QA Checklist

- [ ] Response is valid JSON
- [ ] Contains required fields (name, email)
- [ ] No null values in important fields

#### Attest Assertions

```python
from attest import expect

result = agent.run("Generate user")

(expect(result)
    .matches_schema({
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"}
        },
        "required": ["name", "email"]
    }))
```

### Check 3: Performance Metrics
#### Manual QA Checklist

- [ ] Response completes in under 5 seconds
- [ ] API costs less than $0.10
- [ ] Uses correct model/provider

#### Attest Assertions

```python
from attest import expect

result = agent.run("question")

(expect(result)
    .latency_under(5000)  # milliseconds
    .cost_under(0.10)     # dollars
    .trace_contains_model("gpt-4o-mini"))
```

### Check 4: Semantic Correctness
#### Manual QA Checklist

- [ ] Answer is relevant to question
- [ ] Answer is grammatically correct
- [ ] Answer is coherent and well-written

#### Attest Assertions

```python
from attest import expect

result = agent.run("question")

(expect(result)
    .semantically_similar_to("expected answer", threshold=0.85)
    .passes_judge("Is this grammatically correct?")
    .passes_judge("Is this coherent and helpful?"))
```

## Checklist to Test Suite Conversion
### Step 1: List All Manual Checks

Document what you currently check:

**Response Quality:**
- [ ] Contains answer to question
- [ ] No error messages
- [ ] Proper capitalization

**Performance:**
- [ ] Completes in <5s
- [ ] Costs <$0.10
- [ ] Uses GPT-4o-mini

**Semantic:**
- [ ] Answers the actual question
- [ ] Grammatically correct
- [ ] Professional tone

### Step 2: Map to Assertions

Convert each check to an assertion:
```python
from attest import expect

result = agent.run("What is 2+2?")

# Response quality
(expect(result)
    .output_contains("4")
    .output_not_contains("error")
    .output_starts_with(result.output[0].upper()))  # first character is capitalized

# Performance
expect(result).latency_under(5000)
expect(result).cost_under(0.10)
expect(result).trace_contains_model("gpt-4o-mini")

# Semantic
(expect(result)
    .semantically_similar_to("The answer is 4")
    .passes_judge("Is this grammatically correct?")
    .passes_judge("Does this sound professional?"))
```

### Step 3: Organize into Test Cases
Group assertions by scenario:

```python
def test_math_question():
    """Agent should answer math questions."""
    result = agent.run("What is 2+2?")

    (expect(result)
        .output_contains("4")
        .output_not_contains("error")
        .latency_under(5000)
        .cost_under(0.10)
        .passes_judge("Is the answer correct?"))

def test_history_question():
    """Agent should answer history questions."""
    result = agent.run("When was the Declaration of Independence signed?")

    (expect(result)
        .output_contains("1776")
        .passes_judge("Is this historically accurate?")
        .semantically_similar_to("July 4, 1776"))
```

### Step 4: Run Tests Locally

```shell
python -m pytest test_agent.py -v
```

### Step 5: Integrate into CI/CD
Add to GitHub Actions:

```yaml
name: Agent QA

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
      - run: pip install attest-ai pytest
      - run: pytest test_agent.py
```

## Complete Example
### Manual QA Workflow (Before)

1. Run agent manually in notebook
2. Read output line by line
3. Check:
   - Does it answer the question?
   - Any errors or warnings?
   - Is the output grammatical?
   - How long did it take?
   - How much did it cost?
4. Document findings in Google Sheet
5. Notify team if anything broke

### Automated Test Suite (After)
```python
from attest import expect
import pytest

@pytest.fixture
def agent():
    from app import create_agent
    return create_agent()

# Parametrize so each case runs (and reports) as its own test.
@pytest.mark.parametrize("question,expected", [
    ("What is the capital of France?", "Paris"),
    ("When was WWII?", "1939-1945"),
    ("Who is the President of the US?", "Biden"),
], ids=["Paris", "1939-1945", "Biden"])
def test_factual_questions(agent, question, expected):
    """Test agent on factual questions."""
    result = agent.run(question)

    (expect(result)
        .output_contains(expected)
        .output_not_contains("error")
        .latency_under(5000)
        .cost_under(0.10)
        .passes_judge("Is this factually correct?"))

def test_response_quality(agent):
    """Test response quality metrics."""
    result = agent.run("What is 2+2?")

    assert result.output.strip()  # response is not empty

    (expect(result)
        .passes_judge("Is this grammatically correct?")
        .passes_judge("Is this professionally written?"))

def test_performance(agent):
    """Test performance constraints."""
    result = agent.run("complex question")

    (expect(result)
        .latency_under(5000)
        .cost_under(0.25))
```

Run it:

```shell
pytest test_agent.py -v
```

Output:
```
test_agent.py::test_factual_questions[Paris] PASSED
test_agent.py::test_factual_questions[1939-1945] PASSED
test_agent.py::test_factual_questions[Biden] PASSED
test_agent.py::test_response_quality PASSED
test_agent.py::test_performance PASSED

======================== 5 passed in 12.34s ========================
```

## Benefits
### 1. Consistency

Same checks every run, no human error:

```python
# This always passes or fails consistently
expect(result).output_contains("hello")
```

### 2. Speed
Run full QA in seconds instead of hours:

```shell
# Runs 100 test cases in <60s
pytest test_agent.py --tb=short
```

### 3. Regression Detection
Fail immediately if quality drops:

```python
# Catches unexpected changes
expect(result).cost_under(0.10)     # Fails if cost increased
expect(result).latency_under(5000)  # Fails if agent is slow
```

### 4. Documentation
Tests serve as specification:

```python
def test_agent_responds_to_math():
    """Agent should correctly answer math questions."""
    result = agent.run("What is 2+2?")
    expect(result).output_contains("4")
```

### How do I migrate an existing QA spreadsheet?
- Export checklist items from your spreadsheet
- Map each item to an assertion
- Organize into test functions
- Run `pytest` to validate
### What about edge cases I haven’t tested?

Add them as you find them:

```python
def test_edge_case_empty_input(agent):
    """Agent should handle empty input gracefully."""
    result = agent.run("")
    expect(result).output_not_contains("error")
```

### Can I test in the cloud?
Yes, run `pytest` in CI/CD. Attest works everywhere Python runs.

### How do I monitor quality over time?

Check CI logs to see pass rates by date. For dashboards, integrate with your monitoring platform.
## Related

- Expect DSL Reference — All assertion methods
- Quickstart — Quick start guide
- Migration from PromptFoo — Migrate from another framework