EvalView - AI Agent Testing
Version v0.3.2 of https://github.com/hidai25/eval-view.
Action Type
This is a Composite action.
Action Summary
EvalView is an open-source testing framework for AI agents that automates regression detection by comparing changes in agent behavior against a saved golden baseline. It helps developers identify and prevent unintended behavioral changes when modifying prompts, models, or tools, making it a critical CI/CD layer for AI development. Key features include automatic behavioral drift detection, compatibility with various frameworks and APIs, and the ability to operate fully offline without requiring LLM-based evaluations.
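To make the golden-baseline idea concrete, here is a minimal, hypothetical sketch of snapshot-diff checking; the function and file names are illustrative, not EvalView's actual API:

```python
import json
from pathlib import Path

def check_against_baseline(agent_output: dict, baseline_path: str = "golden.json") -> list[str]:
    """Compare one agent run against a saved golden baseline.

    First run saves the baseline; later runs report field-level drift.
    Fully deterministic -- no LLM judge involved, so it works offline.
    """
    path = Path(baseline_path)
    if not path.exists():
        # No baseline yet: treat this run as the golden snapshot.
        path.write_text(json.dumps(agent_output, indent=2, sort_keys=True))
        return []
    baseline = json.loads(path.read_text())
    drift = []
    for key in sorted(baseline.keys() | agent_output.keys()):
        if baseline.get(key) != agent_output.get(key):
            drift.append(f"{key}: {baseline.get(key)!r} -> {agent_output.get(key)!r}")
    return drift
```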
Release notes
What’s fixed
claude-code adapter: auth failure in MCP context
The adapter was failing immediately (~3-4s) with “Invalid API key” when invoked through the MCP chain. Root cause: Claude Code sets ANTHROPIC_API_KEY to a session-scoped token in its subprocess environment, which the inner claude --print inherited and the Anthropic API rejected.
Fix: Strip ANTHROPIC_API_KEY from the adapter’s env so the inner claude falls back to ~/.claude.json credentials (stored by claude auth login).
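A minimal sketch of the fix, assuming a Python adapter that shells out to the CLI (names are illustrative):

```python
import os
import subprocess

def run_inner_claude(prompt: str) -> str:
    # Copy the parent environment but drop the session-scoped token that
    # Claude Code injects into its subprocesses; the inner CLI then falls
    # back to the credentials stored in ~/.claude.json by claude auth login.
    env = os.environ.copy()
    env.pop("ANTHROPIC_API_KEY", None)
    result = subprocess.run(
        ["claude", "--print", prompt],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```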
custom adapter: works for OAuth users (no API key needed)
The demo runner.py used the Anthropic SDK directly, which requires ANTHROPIC_API_KEY. Claude Code OAuth users don’t have this env var set.
Fix: Rewrote the runner to invoke claude --print as a subprocess (the same auth path as the claude-code adapter).
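A rough before/after of the runner change (hypothetical shape; the demo's actual code may differ):

```python
import subprocess

# Before: the Anthropic SDK, which hard-requires an API key:
#   from anthropic import Anthropic
#   client = Anthropic()  # errors if ANTHROPIC_API_KEY is unset
#   client.messages.create(...)
#
# After: shell out to the claude CLI, which resolves OAuth credentials
# from ~/.claude.json on its own, so no API key is needed.
def run_agent(prompt: str) -> str:
    result = subprocess.run(
        ["claude", "--print", prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```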
MCP server: skill test timeout raised to 600s
Multi-test suites (10 tests × ~15s each) were hitting the previous 120s timeout.
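The arithmetic: ten tests at ~15s each already need ~150s, so the old 120s ceiling could not cover even a nominal run. A sketch of the pattern, assuming an asyncio-based runner (names are illustrative):

```python
import asyncio

SKILL_TEST_TIMEOUT_S = 600  # was 120; 10 tests x ~15s each already need ~150s

async def run_skill_suite(tests, run_test):
    # One budget for the whole suite: wait_for cancels the remaining tests
    # and raises TimeoutError if the total run exceeds 600s.
    async def _run_all():
        return [await run_test(t) for t in tests]

    return await asyncio.wait_for(_run_all(), timeout=SKILL_TEST_TIMEOUT_S)
```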
Other improvements
- Non-interactive mode for generate-tests (--auto, or no TTY detected; see the sketch after this list)
- Better first-snapshot and first-check celebration panels with CI integration steps
- 60s asyncio timeout on LLM calls in test generator
- Actionable hints when skill dependencies (e.g. mcporter) are missing
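For the non-interactive mode item above, a plausible detection sketch (the --auto flag comes from the notes; the rest is assumed):

```python
import sys

def should_skip_prompts(auto_flag: bool) -> bool:
    # Run without interactive prompts when the user passed --auto or when
    # stdin is not a terminal (e.g. a CI runner), covering the "no TTY" case.
    return auto_flag or not sys.stdin.isatty()
```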