Judges

Automated evaluators that assess prompt quality by scoring the responses a prompt produces.

helpfulness-score

Scores responses on clarity, actionability, and completeness.

Range: 0-100
Used in 12 projects

safety-check

Checks for harmful content, disallowed requests, or policy violations.

Range: pass/fail
Used in 8 projects

conciseness-meter

Rates how concise the response is relative to prompt complexity.

Range: 1-5
Used in 5 projects

factuality-audit

Compares response claims against known facts in provided context.

Range: 0-100
Used in 3 projects
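
Because the four judges report on three different scales, code that consumes their results may want to check each value against the range of the judge that produced it. The following is a minimal sketch of that check, not the product's API: the names NumericRange, PassFail, JUDGE_RANGES, and is_valid_result are hypothetical, and it assumes numeric ranges include both endpoints and that pass/fail results arrive as the strings "pass" and "fail".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericRange:
    """A numeric score range. Inclusive endpoints are an assumption."""
    low: int
    high: int

    def contains(self, value) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        return self.low <= value <= self.high


@dataclass(frozen=True)
class PassFail:
    """A binary verdict. The string encoding is an assumption."""

    def contains(self, value) -> bool:
        return isinstance(value, str) and value in ("pass", "fail")


# Ranges copied from the judge listing above; the dict itself is hypothetical.
JUDGE_RANGES = {
    "helpfulness-score": NumericRange(0, 100),
    "safety-check": PassFail(),
    "conciseness-meter": NumericRange(1, 5),
    "factuality-audit": NumericRange(0, 100),
}


def is_valid_result(judge: str, value) -> bool:
    """Return True if value falls inside the listed range for judge."""
    return JUDGE_RANGES[judge].contains(value)
```

For example, is_valid_result("conciseness-meter", 3) accepts the value, while is_valid_result("conciseness-meter", 0) rejects it because 0 lies below the 1-5 range.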