Judges
Automated evaluators that score prompt quality.
helpfulness-score
Scores responses on clarity, actionability, and completeness.
Range: 0-100
Used in 12 projects
safety-check
Checks for harmful content, disallowed requests, or policy violations.
Range: pass/fail
Used in 8 projects
conciseness-meter
Rates how concise the response is relative to prompt complexity.
Range: 1-5
Used in 5 projects
factuality-audit
Compares claims in the response against the facts in the provided context.
Range: 0-100
Used in 3 projects
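
The four judges above report results on different scales, so code that consumes their output may want to check a value against the judge's declared range first. The following is a minimal sketch of that check, not the product's actual API: the judge names and ranges come from the list above, while `NumericRange`, `PassFail`, `JUDGES`, and `is_valid_score` are hypothetical names introduced for illustration. It also assumes numeric judges accept any number within their bounds, which the list does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericRange:
    """Inclusive numeric range, e.g. 0-100 or 1-5 (hypothetical helper)."""
    low: int
    high: int

    def contains(self, value) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        return self.low <= value <= self.high


@dataclass(frozen=True)
class PassFail:
    """Binary verdict expressed as the strings "pass" or "fail" (assumed encoding)."""

    def contains(self, value) -> bool:
        return isinstance(value, str) and value in ("pass", "fail")


# Judge names and ranges as listed above; the mapping itself is illustrative.
JUDGES = {
    "helpfulness-score": NumericRange(0, 100),
    "safety-check": PassFail(),
    "conciseness-meter": NumericRange(1, 5),
    "factuality-audit": NumericRange(0, 100),
}


def is_valid_score(judge: str, value) -> bool:
    """Return True if `value` falls within the declared range of `judge`."""
    if judge not in JUDGES:
        raise KeyError(f"unknown judge: {judge}")
    return JUDGES[judge].contains(value)
```

For example, `is_valid_score("helpfulness-score", 87)` is accepted, while `is_valid_score("conciseness-meter", 0)` is rejected because 0 is below the 1-5 range.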