EchoStash

Test prompts with eval suites

Prompt Evaluation

The eval system lets you write test suites for your prompts. Define assertions about rendered output, LLM responses, performance, and semantic similarity — then run them from the CLI or via PLP.

Quick Example

Create eval/tests/smoke.eval:

yaml
suite: "Welcome Email Quality"
config:
  target: prompts/welcome-email.pdk
  model: gpt-4o-mini
tests:
  - name: "Includes greeting"
    given:
      name: "Alice"
      tier: "pro"
    expect_render:
      - contains: "Alice"
      - length: { min: 50, max: 500 }

  - name: "Helpful response"
    given:
      name: "Bob"
      tier: "free"
    expect_llm:
      - llm_judge: "Is the response welcoming and professional?"
      - sentiment: positive
      - token_count: { max: 200 }

Running Evals

bash
echopdk eval eval/tests/smoke.eval               # Run a single suite file
echopdk eval --filter "polite*" --reporter json  # Run only what matches the pattern; report as JSON
echopdk eval --record                            # Record LLM responses as golden baseline

Assertion Types

Text assertions (in expect_render):

  • contains, not_contains, equals, matches (regex)
  • starts_with, ends_with
  • length, word_count, json_valid
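
To make these checks concrete, here is a minimal Python sketch of how text assertions like the ones above could be applied to a rendered string. It is not the tool's implementation: the helper name `check_text` and the exact rules (substring match for `contains`, `re.search` for `matches`, character count for `length`, whitespace-split words for `word_count`) are assumptions for illustration.

```python
import json
import re


def check_text(output: str, assertion: dict) -> bool:
    """Evaluate one single-key assertion, e.g. {"contains": "Alice"}.

    Hypothetical helper: the semantics below are assumed, not documented.
    """
    (kind, expected), = assertion.items()
    if kind == "contains":
        return expected in output
    if kind == "not_contains":
        return expected not in output
    if kind == "equals":
        return output == expected
    if kind == "matches":
        return re.search(expected, output) is not None
    if kind == "starts_with":
        return output.startswith(expected)
    if kind == "ends_with":
        return output.endswith(expected)
    if kind in ("length", "word_count"):
        # Assumed: length counts characters, word_count counts
        # whitespace-separated words; min and max are both optional.
        size = len(output) if kind == "length" else len(output.split())
        return expected.get("min", 0) <= size <= expected.get("max", float("inf"))
    if kind == "json_valid":
        try:
            json.loads(output)
        except json.JSONDecodeError:
            return not expected
        return bool(expected)
    raise ValueError(f"unknown assertion: {kind}")


# The two expect_render checks from the Quick Example, applied to a made-up render.
rendered = "Hi Alice, welcome to the Pro tier! We are glad to have you on board."
checks = [{"contains": "Alice"}, {"length": {"min": 50, "max": 500}}]
results = [check_text(rendered, c) for c in checks]
```

Representing each assertion as a single-key mapping mirrors how the entries appear in the YAML list, which keeps the sketch close to the suite file's shape.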

Semantic assertions (in expect_llm):

  • llm_judge — LLM answers a yes/no question about the response
  • similar_to — Embedding similarity to a golden response
  • sentiment — positive, negative, neutral, or helpful
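
Embedding similarity is commonly scored with cosine similarity between two vectors. The sketch below shows that calculation on toy vectors. The 0.8 threshold, the function names, and the vectors themselves are illustrative assumptions. A real check would obtain embeddings from an embedding model, and the tool's actual metric and threshold may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_similar(response_vec, golden_vec, threshold=0.8):
    """Hypothetical pass rule: similarity at or above the threshold passes."""
    return cosine_similarity(response_vec, golden_vec) >= threshold


golden = [0.9, 0.1, 0.3]        # toy stand-in for the golden response embedding
close = [0.8, 0.2, 0.3]         # toy stand-in for a similar response
unrelated = [-0.1, 0.9, -0.4]   # toy stand-in for an unrelated response

print(is_similar(close, golden), is_similar(unrelated, golden))
```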

Performance assertions (the Quick Example places token_count under expect_llm):

  • latency — Max milliseconds
  • token_count — Token count range
  • cost — Max estimated USD
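
As a rough illustration of what these three checks involve, the sketch below compares measured latency, token usage, and an estimated cost against limits. Everything here is an assumption: the `Usage` structure, the limit keys, the choice to count only completion tokens for `token_count`, and the per-1K-token prices, which are placeholders rather than real rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Measurements for one LLM call (hypothetical structure)."""
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int


def estimate_cost_usd(usage: Usage, prompt_per_1k: float, completion_per_1k: float) -> float:
    """Estimated cost: tokens / 1000 * price per 1K tokens, summed over both directions."""
    return (usage.prompt_tokens / 1000 * prompt_per_1k
            + usage.completion_tokens / 1000 * completion_per_1k)


def check_performance(usage: Usage, limits: dict, prices: dict) -> dict:
    """Return a pass/fail flag per performance assertion."""
    cost = estimate_cost_usd(usage, **prices)
    return {
        "latency": usage.latency_ms <= limits["latency_ms"],
        # Assumed: token_count applies to the completion only.
        "token_count": usage.completion_tokens <= limits["max_tokens"],
        "cost": cost <= limits["max_cost_usd"],
    }


usage = Usage(latency_ms=850.0, prompt_tokens=120, completion_tokens=180)
prices = {"prompt_per_1k": 0.001, "completion_per_1k": 0.002}  # placeholder prices
limits = {"latency_ms": 1000, "max_tokens": 200, "max_cost_usd": 0.001}
outcome = check_performance(usage, limits, prices)
```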

Datasets (.dset)

Dataset files provide reusable test data and golden responses:

yaml
name: "Customer Feedback Dataset"
golden:
  response: |
    The customer's concern has been acknowledged
    with a helpful solution provided.
  model: gpt-4o-mini
  recorded_at: "2024-02-12T10:00:00Z"
parameters:
  - name: case_1
    customer_msg: "I can't log in"
  - name: case_2
    customer_msg: "Your product broke my computer"
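
One way to picture how such a file could drive tests: each entry under parameters becomes one case, with name as the case label and the remaining keys as template variables, all compared against the shared golden response. This is an assumed reading of the format, not documented behavior. The dict below mirrors the YAML above as it would look once parsed, so the sketch runs without a YAML library; `expand_cases` is a hypothetical helper.

```python
# Parsed form of the dataset shown above (written by hand for this sketch).
dataset = {
    "name": "Customer Feedback Dataset",
    "golden": {
        "response": "The customer's concern has been acknowledged\n"
                    "with a helpful solution provided.\n",
        "model": "gpt-4o-mini",
        "recorded_at": "2024-02-12T10:00:00Z",
    },
    "parameters": [
        {"name": "case_1", "customer_msg": "I can't log in"},
        {"name": "case_2", "customer_msg": "Your product broke my computer"},
    ],
}


def expand_cases(dataset: dict) -> list[dict]:
    """Turn each parameters entry into one test case (assumed semantics)."""
    cases = []
    for entry in dataset["parameters"]:
        # Everything except the label is treated as a template variable.
        variables = {k: v for k, v in entry.items() if k != "name"}
        cases.append({
            "label": f'{dataset["name"]} / {entry["name"]}',
            "given": variables,
            "golden": dataset["golden"]["response"],
        })
    return cases


cases = expand_cases(dataset)
for case in cases:
    print(case["label"], "->", case["given"])
```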

Reporters

  • console (default) — Human-readable with colors
  • json — Structured JSON for programmatic use
  • junit — XML format for CI/CD integration
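
In CI, a JUnit-style report is typically consumed by counting testcase elements and their failure or error children. The sketch below does this with Python's standard library on a hand-written sample. The sample XML follows the common JUnit layout and is not actual echopdk output, whose exact structure may differ; `summarize` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the common JUnit layout (not real tool output).
SAMPLE_REPORT = """\
<testsuites>
  <testsuite name="Welcome Email Quality" tests="2" failures="1">
    <testcase name="Includes greeting" time="0.01"/>
    <testcase name="Helpful response" time="1.20">
      <failure message="token_count exceeded max"/>
    </testcase>
  </testsuite>
</testsuites>
"""


def summarize(junit_xml: str) -> dict:
    """Count test cases and collect the names of those that failed or errored."""
    root = ET.fromstring(junit_xml)
    total = 0
    failed = []
    for case in root.iter("testcase"):
        total += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(case.get("name"))
    return {"total": total, "failed": failed}


summary = summarize(SAMPLE_REPORT)
print(summary)
```

A CI step could then exit with a non-zero status whenever the failed list is non-empty.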