EvalView - AI Agent Testing

March 29, 2026

Updated https://github.com/hidai25/eval-view to version v0.6.1.

This action is used across all versions by 0 repositories.

Action Type

This is a Composite action.


Action Summary

EvalView is an open-source GitHub Action that detects and prevents silent regressions in AI agents: behavior changes that break neither health checks nor code. It automates regression testing by tracking drift in outputs, tool usage, model updates, and runtime fingerprints, and it distinguishes provider or model changes from actual system regressions. It can also classify changes, inspect drift, and auto-heal flaky failures so that agents keep behaving correctly.

What’s Changed

What’s new

Full MCP feature parity — all CLI flags now exposed via MCP tools (heal, strict, statistical, budget, tags, variants, and more)
New MCP tools: compare_agents (A/B test two endpoints) and replay (trajectory diff viewer)
33 MCP regression tests — protocol, schema contracts, flag wiring, routing, timeouts

Fixes

Stable JSON response contract on run_check regardless of flags
--report no longer opens browser from MCP server
Replay timeout increased to 120s
Subprocess calls use stdin=DEVNULL to prevent hangs
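
The stdin fix above relies on a general Python pattern, sketched here for illustration. This is not EvalView's actual code: the helper name run_noninteractive is hypothetical, and the 120-second default only echoes the replay timeout mentioned above. When a child process inherits the parent's stdin, as it can under an MCP server that speaks over stdio, a child that tries to read input may block indefinitely. Pointing stdin at the null device makes the child see end-of-file immediately.

```python
import subprocess
import sys


def run_noninteractive(cmd, timeout=120.0):
    """Run cmd with stdin attached to the null device and return its stdout.

    Hypothetical helper for illustration only; not part of EvalView.
    """
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # child reads EOF instead of waiting on the parent's stdin
        capture_output=True,       # collect stdout/stderr rather than inheriting them
        text=True,                 # decode output as str
        timeout=timeout,           # raises subprocess.TimeoutExpired if the child runs too long
        check=True,                # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout


# A child that reads all of stdin. With DEVNULL it returns at once with no data.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
output = run_noninteractive(child, timeout=30)
print(output.strip())
```

Without stdin=subprocess.DEVNULL, the same child would wait for whatever the parent's stdin delivers, which in a server context may never arrive.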

Install / Upgrade

pip install --upgrade evalview

Full changelog: https://github.com/hidai25/eval-view/blob/main/CHANGELOG.md