Eval Viewer¶
A CLI tool to run evaluations and display results in a TUI (Terminal User Interface).
Installation¶
From Local Repository (Recommended)¶
cd cmd/evalviewer
go install
After installation, the evalviewer command will be available in your PATH.
Usage¶
Run Command (Recommended)¶
Runs go test with the GOEVALS=1 environment variable set, then automatically finds and displays the evaluation results in a TUI.
All arguments after ‘run’ are passed directly to ‘go test’.
# Run evaluations for conversations package
evalviewer run ./conversations
# Run all evaluations
evalviewer run -v ./...
# Run with test coverage
evalviewer run -cover ./conversations
The run command will:
1. Clean up any existing evals.jsonl file
2. Execute go test with GOEVALS=1
3. Search for evals.jsonl in the current and parent directories
4. Launch the TUI to display results
View Command¶
Displays evaluation results from an existing evals.jsonl file in a TUI.
# View existing results (defaults to evals.jsonl in current directory)
evalviewer view
# View results from specific file
evalviewer view -file evals.jsonl
evalviewer view -f /path/to/evals.jsonl
# Show only failed evaluations
evalviewer view -failures-only
Check Command (CI Mode)¶
Runs evaluations and checks the results without the interactive TUI. Exits with status code 1 if any evaluations fail. This is designed for CI/CD pipelines.
All arguments after ‘check’ are passed directly to ‘go test’.
# Run and check evaluations (no TUI)
evalviewer check ./conversations
# Run and check all evaluations
evalviewer check ./...
# Run with verbose output
evalviewer check -v ./...
The check command will:
1. Clean up any existing evals.jsonl file
2. Execute go test with GOEVALS=1
3. Display test output in real time
4. Print a summary of evaluation results
5. Exit with status code 1 if any evaluations failed
Environment Variables¶
The evalviewer and evaluation tests support multiple LLM providers. Configure which providers to use and their settings with these environment variables:
Provider Selection¶
LLM_PROVIDER: Choose which provider(s) to run evaluations with.
Values: openai, anthropic, azure, all, or a comma-separated list (e.g., openai,azure)
Default: all (runs all providers)
OpenAI Configuration¶
OPENAI_API_KEY: Your OpenAI API key (required for OpenAI)
OPENAI_MODEL: Model to use (overrides code default)
Anthropic Configuration¶
ANTHROPIC_API_KEY: Your Anthropic API key (required for Anthropic)
ANTHROPIC_MODEL: Model to use (overrides code default)
Azure OpenAI Configuration¶
AZURE_OPENAI_API_KEY: Your Azure OpenAI API key (required for Azure)
AZURE_OPENAI_ENDPOINT: Your Azure OpenAI endpoint URL (required for Azure)
AZURE_OPENAI_MODEL: Model deployment name to use (overrides code default)
Grader Configuration¶
The grader LLM is used to evaluate the quality of responses from the main LLM. By default, it uses OpenAI. You can configure a different provider or model for grading:
GRADER_LLM_PROVIDER: Provider to use for grading (e.g., openai, anthropic, azure).
Default: openai. Uses the same API key environment variables as the main LLM (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY).
GRADER_LLM_MODEL: Model to use for grading. For each provider, the code's default model is used unless specified.
Examples¶
# Run with only OpenAI
LLM_PROVIDER=openai OPENAI_API_KEY=sk-... evalviewer run ./conversations
# Run with only Anthropic
LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=sk-ant-... evalviewer run ./conversations
# Run with OpenAI and Azure (skip Anthropic)
LLM_PROVIDER=openai,azure evalviewer run ./conversations
# Run with all providers (default)
evalviewer run ./conversations
# Use a specific model for OpenAI
OPENAI_MODEL=gpt-4-turbo evalviewer run ./conversations
# Use Anthropic for the main LLM and OpenAI for grading (default grader)
LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=sk-ant-... OPENAI_API_KEY=sk-... evalviewer run ./conversations
# Use OpenAI for main LLM and Anthropic for grading
LLM_PROVIDER=openai GRADER_LLM_PROVIDER=anthropic OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... evalviewer run ./conversations
# Use a specific model for grading
GRADER_LLM_MODEL=gpt-4o evalviewer run ./conversations
If a provider’s API key is not set, the run fails with an error for that provider.