EvalView - AI Agent Testing
Version updated for https://github.com/hidai25/eval-view to version v0.5.4.
- This action is used across all versions by 0 repositories.
Action Type
This is a Composite action.
Go to the GitHub Marketplace to find the latest changes.
Action Summary
EvalView is a GitHub Action designed for regression testing of AI agents by automating behavior snapshotting, change detection, and quality monitoring. It identifies regressions such as altered tool paths, degraded output quality, or silent errors after model updates by comparing execution traces against golden baselines. This tool streamlines testing workflows and ensures AI agents maintain consistent and reliable performance before deployment.
Release notes
What’s New
Python API
gate()andgate_async()— programmatic regression checks, no CLI neededgate(quick=True)— skip LLM judge for free, sub-second checksfrom evalview import gate, DiffStatus— clean top-level imports- Typed results:
GateResult,TestDiff,GateSummary
Terminal Dashboard
- Scorecard panel with colored health bar, streak tracker, and gauge
- Unicode sparkline trends from drift history
- Confidence scoring on each verdict (z-score based signal vs noise)
- Smart accept suggestions when changes look intentional
HTML Report Dashboard
- SVG health gauge with pass/fail breakdown
- Chart.js trend lines for output similarity over time
- Confidence badges on diff rows
- Accept suggestion boxes with copy-paste commands
OpenClaw Integration
evalview openclaw install— install gate skill into claw workspaceevalview openclaw check— run gate with auto-revertgate_or_revert()/check_and_decide()Python helpers- Built-in SKILL.md for autonomous agent loops
MCP Server
run_checkrewired to callgate()directly (no subprocess)- Fallback to subprocess on error
Other
evalview snapshot --preview— dry-run before saving baselinespython -m evalviewsupport- Centralized model defaults (
DEFAULT_MODELS,DEFAULT_JUDGE_MODEL) - Updated all defaults from gpt-4o-mini to gpt-5.4-mini
- 22 new API tests (1147 total passing)
- mypy clean (166 source files, 0 errors)
Install
pip install evalview==0.5.4