EvalView - AI Agent Testing
Updated https://github.com/hidai25/eval-view to v0.5.3.
Action Type
This is a Composite action.
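A composite action bundles several workflow steps into a single reusable unit declared in an action.yml with `runs.using: "composite"`. As a rough illustration only (the name, input, and step below are assumptions, not EvalView's actual metadata):

```yaml
# Hypothetical sketch of a composite action's action.yml structure;
# EvalView's real inputs and steps may differ.
name: 'EvalView - AI Agent Testing'
description: 'Regression-test AI agents against golden baselines'
inputs:
  config:
    description: 'Path to an EvalView config file (assumed input name)'
    required: false
runs:
  using: 'composite'   # marks this as a composite action
  steps:
    - name: Run evaluation
      shell: bash       # composite run steps must declare a shell
      run: echo "placeholder for the action's real steps"
```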
Action Summary
EvalView is a GitHub Action for regression testing AI agents by automatically detecting behavioral changes and performance degradations. It captures agent outputs, tool usage, and execution sequences and compares them against golden baselines, so developers can spot silent regressions such as altered tool paths or reduced output quality. The goal is to catch broken agents before they reach production while keeping agent behavior observable and traceable.
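For example, a CI workflow could run EvalView on every pull request. This is a minimal sketch: the `uses` reference matches the repository and version above, but the input names (`config`, `baseline-dir`) are assumptions and should be checked against the action's documentation.

```yaml
# Hypothetical CI workflow invoking EvalView on pull requests.
name: agent-regression-tests
on: [pull_request]

jobs:
  evalview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Compare agent behavior against golden baselines
        uses: hidai25/eval-view@v0.5.3
        with:
          # Input names below are assumptions, not confirmed by the docs.
          config: evalview.yaml
          baseline-dir: tests/baselines
```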
Release notes
HTML Report Redesign
Overview tab — Compact KPI strip replaces the 6-card hero grid. Removed the duplicate Agent Model/Token Usage cards and the Distribution donut. Full-width score chart. A no-judge notice appears when hallucination/safety checks are skipped.
Execution Trace tab — Adaptive collapse: with ≤4 tests, all are expanded; with 5+, only the first is. Larger chevron buttons.
Diffs tab — Collapsible items (passed tests collapsed, changed tests expanded). Removed duplicate tool tags. Trajectory diagrams are lazy-rendered behind a toggle. Baseline→current score display (e.g., 86.0 → 87.5, +1.5). Tooltips explain lexical/semantic similarity.
Timeline tab — KPI summary strip. Side-by-side latency + cost charts. Color-coded bars by test.
All tabs — Larger Mermaid diagram fonts. Removed SVG max-height cap.