Skill Probe - AI Agent Skill Auditor
Version updated for https://github.com/HystonKayange/skill-probe to version v0.9.0.
- This action is used across all versions by 0 repositories.
Action Type
This is a Composite action.
Go to the GitHub Marketplace to find the latest changes.
What’s Changed
skill-probe in your pipeline, gating on statistics instead of vibes
The CI question isn’t “are my skills perfect?” — it’s “did this PR make any skill worse?” Activation is stochastic, so raw-rate comparisons make CI flaky. v0.9.0 makes the gate honest.
Baseline regression gating
skill-probe --config probe.config.json --save-baseline baselines/main.json # once, on main
skill-probe --config probe.config.json --baseline baselines/main.json # every PR
Each case is compared against the baseline with Fisher’s exact test, Benjamini-Hochberg corrected — noise passes, significant drops fail (exit 1). Drift in config/runtime/model since the baseline warns (matched cases still gated); new/missing cases are reported, not gated; a side with no valid probes is NOT COMPARABLE, never a silent pass. Significant improvements are flagged as a nudge to re-save the baseline.
Run manifest
Every --json result now stamps who/what/when: tool version, command, ISO date, runtime, model, k/threshold/conf, and a config hash (cwd excluded — the same library on a laptop and a CI runner is the same experiment). Two runs are comparable; a report is citable.
GitHub Action
- uses: HystonKayange/skill-probe@main
with:
config: probe.config.json
args: --baseline baselines/main.json
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
Installs skill-probe + the runtime CLI (Claude Code by default, swappable) and runs any subcommand (audit/context/diagnose/doctor).
Plus: repo CI (typecheck + tests, no API calls). 83 tests. Baseline save→gate loop proven live.