Evaluation Skill 🧠

Community

Enables systematic testing of agent performance using outcome-focused approaches that account for non-determinism. Covers rubric design, LLM-as-judge evaluation, human review processes, test set construction, and continuous evaluation pipelines that serve as quality gates and detect regressions.
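
To make the description concrete, here is a minimal Python sketch of how a weighted rubric, repeated runs, and a quality gate might fit together. It is an illustration only and is not taken from the skill's SKILL.md: the rubric criteria and weights, the function names (rubric_score, evaluate, passes_gate), and the threshold and tolerance values are all hypothetical, and the judge is assumed to be any callable (an LLM-as-judge wrapper or a human review form) that returns a score between 0 and 1 per criterion.

```python
from statistics import mean
from typing import Callable, Dict

# Hypothetical rubric: criterion name -> weight. Weights are assumed to sum to 1.0.
RUBRIC: Dict[str, float] = {"correctness": 0.5, "completeness": 0.3, "clarity": 0.2}


def rubric_score(criterion_scores: Dict[str, float],
                 rubric: Dict[str, float] = RUBRIC) -> float:
    """Combine per-criterion scores (each in [0, 1]) into one weighted score."""
    return sum(weight * criterion_scores[name] for name, weight in rubric.items())


def evaluate(agent: Callable[[str], str],
             judge: Callable[[str, str], Dict[str, float]],
             task: str,
             runs: int = 5) -> float:
    """Run the agent several times on the same task and average the judged scores.

    Repeating the run is one simple way to account for non-determinism:
    a single sample can be unusually good or bad.
    """
    return mean(rubric_score(judge(task, agent(task))) for _ in range(runs))


def passes_gate(score: float, baseline: float,
                threshold: float = 0.7, tolerance: float = 0.05) -> bool:
    """Quality gate: meet an absolute bar and stay within tolerance of the baseline."""
    return score >= threshold and score >= baseline - tolerance
```

In a continuous evaluation pipeline, evaluate would be called for every task in the test set, and passes_gate would compare the aggregate against the score recorded for the previous release. The judge only sees the task and the final output, which keeps the check focused on outcomes rather than on the steps the agent took.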

Source: muratcankoylan/Agent-Skills-for-Context-Engineering
Added: 2025-12-22

How to Use This Skill

  1. Click "View SKILL.md" to see the full skill definition
  2. Copy the contents of the SKILL.md file
  3. In Claude, go to Project Knowledge and paste the skill
  4. Start a new conversation and Claude will use the skill automatically
