sensei-eval
https://github.com/CodeJonesW/sensei-eval has been updated to version v0.8.0.
- This action is used by 0 repositories across all versions.
Action Type
This is a Composite action.
Action Summary
The sensei-eval GitHub Action and TypeScript library streamline the evaluation of AI-generated educational content by performing deterministic checks and leveraging LLM scoring. It automates the detection of content quality regressions in CI workflows, enabling teams to maintain consistent prompt quality. Key features include baseline generation, regression detection, deterministic quick checks, and integration with CI pipelines to ensure scalable and cost-efficient quality control.
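To illustrate the "deterministic quick checks" the summary mentions, here is a minimal sketch of what an LLM-free content check might look like. The criteria shape, function name, and result type are illustrative assumptions, not sensei-eval's actual API:

```typescript
// Hypothetical sketch of a deterministic quick check: cheap, LLM-free
// assertions on generated content, run before any judge scoring.
// The names and criteria shape here are assumptions for illustration.

interface QuickCheckResult {
  pass: boolean;
  failures: string[];
}

function quickCheck(
  content: string,
  criteria: { minWords?: number; requiredSections?: string[] }
): QuickCheckResult {
  const failures: string[] = [];

  // Word-count floor: catches truncated or empty generations.
  const words = content.trim().split(/\s+/).filter(Boolean).length;
  if (criteria.minWords !== undefined && words < criteria.minWords) {
    failures.push(`expected >= ${criteria.minWords} words, got ${words}`);
  }

  // Required section headings must appear verbatim in the content.
  for (const section of criteria.requiredSections ?? []) {
    if (!content.includes(section)) {
      failures.push(`missing required section: ${section}`);
    }
  }

  return { pass: failures.length === 0, failures };
}
```

Because checks like this are deterministic and free, they can gate every CI run, reserving LLM judge calls for content that passes the cheap filters.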
Release notes
Summary
- Judge usage tracking: `Judge.score()` now returns optional `usage` (`input_tokens`, `output_tokens`) alongside score results. `createJudge` passes through `response.usage` from the Anthropic API.
- EvalResult aggregation: `EvalRunner` aggregates token usage across all judge-scored criteria (including inline rubrics) into `EvalResult.usage`. Omitted when no judge calls are made (e.g. `quickCheck`).
- Default model change: `createJudge`'s default model changed from `claude-sonnet-4` to `claude-haiku-4-5-20251001`, a better default for cost-sensitive eval workloads. Callers can still override via `opts.model`.
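The aggregation behavior described above can be sketched as follows. The `Usage` shape mirrors the Anthropic API's `input_tokens`/`output_tokens` fields named in the release notes; the `aggregateUsage` function itself is an illustrative assumption, not `EvalRunner`'s actual internals:

```typescript
// Sketch of rolling per-criterion judge usage up into a single total,
// as EvalResult.usage is described in the v0.8.0 notes. The helper name
// and signature are assumptions for illustration.

interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Judge-scored criteria may each report usage; deterministic checks report none.
function aggregateUsage(perCriterion: (Usage | undefined)[]): Usage | undefined {
  const reported = perCriterion.filter((u): u is Usage => u !== undefined);
  // Omitted entirely when no judge calls were made (e.g. quickCheck-only runs).
  if (reported.length === 0) return undefined;
  return reported.reduce(
    (acc, u) => ({
      input_tokens: acc.input_tokens + u.input_tokens,
      output_tokens: acc.output_tokens + u.output_tokens,
    }),
    { input_tokens: 0, output_tokens: 0 }
  );
}
```

Returning `undefined` rather than a zeroed total lets callers distinguish "no LLM cost incurred" from "an LLM run that happened to use zero tokens".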
Test plan
- Existing judge tests updated with `usage` in mock responses
- New runner tests verify usage aggregation from LLM criteria
- New runner test verifies usage aggregation from inline rubrics
- New runner test verifies `usage` is undefined for deterministic-only evals
- All 216 tests pass
🤖 Generated with Claude Code