AI Judge

Some Things You Can't Write a Rule For

AI Judge evaluates your data against plain-English criteria using an LLM. It is the third validation layer, running only after schema validation and business rules have passed.

LLM-powered evaluation
Human-in-the-loop fallback
Prompt injection defense
Three-Layer Validation

AI Judge Only Runs When It Should

Schema validation and business rules run first. AI Judge only evaluates data that already passed structural and deterministic checks — no wasted LLM cost on invalid payloads.

Schema

Structural — types, constraints, required fields

Business Rules

Deterministic — cross-field logic, lookups

AI Judge

Semantic — LLM evaluation of plain-English criteria
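The three-layer ordering can be sketched as a short pipeline. This is an illustrative sketch, not the Rynko API: the function names and return shape are assumptions, but the behavior mirrors the documented ordering, where a payload that fails an earlier layer never reaches the LLM.

```python
# Minimal sketch of the three-layer ordering: schema, then business
# rules, then AI Judge. Names here are illustrative, not the real API.

def validate(payload, schema_check, business_rules, ai_judge):
    """Run layers in order; stop at the first failure so the LLM
    is never invoked on structurally or logically invalid data."""
    if not schema_check(payload):      # Layer 1: structural
        return ("rejected", "schema")
    if not business_rules(payload):    # Layer 2: deterministic
        return ("rejected", "business_rules")
    return ai_judge(payload)           # Layer 3: semantic (LLM)

# A payload that fails schema validation never incurs LLM cost.
result = validate(
    {"amount": "abc"},
    schema_check=lambda p: isinstance(p.get("amount"), (int, float)),
    business_rules=lambda p: p["amount"] > 0,
    ai_judge=lambda p: ("approved", "ai_judge"),
)
# → ("rejected", "schema")
```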

Criteria

Plain English, Not Regular Expressions

Write criteria in natural language. Assign specialist personas for different checks. The LLM evaluates each criterion independently.

Simple Criteria
Just a string — the LLM uses its general knowledge.
[
  "Functions should follow SRP",
  "No hardcoded credentials",
  "Variable names are descriptive"
]
Criteria with Personas
Different specialist lenses for different checks.
[
  {
    "criterion": "Functions follow SRP",
    "persona": "Senior software architect"
  },
  {
    "criterion": "No SQL injection risks",
    "persona": "Application security engineer"
  },
  "Variable names are descriptive"
]
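Since criteria can mix bare strings and persona objects, a consumer has to normalize both into one shape before evaluation. A minimal sketch, assuming a gate-level default persona as described below (the helper name is hypothetical, not the Rynko SDK):

```python
# Normalize the mixed criteria format (bare strings and
# {"criterion", "persona"} objects) into a uniform shape.

def normalize_criteria(criteria, default_persona=None):
    normalized = []
    for item in criteria:
        if isinstance(item, str):
            # Bare string: falls back to the gate-level default persona
            # (or the LLM's general knowledge if none is set).
            normalized.append({"criterion": item, "persona": default_persona})
        else:
            # Object form: per-criterion persona overrides the default.
            normalized.append({
                "criterion": item["criterion"],
                "persona": item.get("persona", default_persona),
            })
    return normalized

criteria = [
    {"criterion": "Functions follow SRP", "persona": "Senior software architect"},
    "Variable names are descriptive",
]
normalize_criteria(criteria, default_persona="Code reviewer")
```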
How It Evaluates
1

Each criterion is evaluated independently by the LLM

2

Per-criterion persona overrides the gate-level default persona

3

Each criterion returns a verdict (pass/fail), confidence (0-1), and reasoning

4

Configurable threshold (default 1.0 = all must pass)
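The threshold rule reduces to a single comparison: the gate passes when the fraction of passing criteria meets the configured threshold. A sketch under that assumption (the function name is illustrative):

```python
# The gate passes when the share of passing criteria meets the
# threshold. The default of 1.0 means every criterion must pass.

def gate_passes(results, threshold=1.0):
    passes = sum(1 for r in results if r["verdict"] == "pass")
    return passes / len(results) >= threshold

results = [
    {"criterion": "Functions follow SRP", "verdict": "fail", "confidence": 0.85},
    {"criterion": "No hardcoded credentials", "verdict": "pass", "confidence": 0.97},
]
gate_passes(results)                 # False: one failure under the default 1.0
gate_passes(results, threshold=0.5)  # True: half the criteria pass
```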

Example Output
{
  "criterion": "Functions follow SRP",
  "verdict": "fail",
  "confidence": 0.85,
  "reasoning": "The processOrder() function handles validation, payment, and notification — three distinct responsibilities."
}
Confidence & Routing

When AI Is Uncertain, Humans Decide

Every criterion gets a confidence score. Low confidence automatically routes to human review. You control the threshold and the fallback behavior.

High Confidence
All criteria pass with high confidence. The run is automatically approved. No human intervention needed.
Low Confidence
One or more criteria have low confidence. The run is routed to a human reviewer via magic link for a final decision.
Clear Failure
Criteria fail with high confidence. Configurable per gate: onFail: "reject" or onFail: "review" for human override.
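The three routing outcomes above can be sketched as one decision function. The confidence cutoff value and the on_fail parameter name are assumptions for illustration; the branching mirrors the documented behavior.

```python
# Route a gate's results: uncertain → human review, confident pass →
# approve, confident fail → whatever the gate's onFail setting says.

def route(results, confidence_cutoff=0.8, on_fail="reject"):
    if any(r["confidence"] < confidence_cutoff for r in results):
        return "human_review"        # low confidence: a human decides
    if all(r["verdict"] == "pass" for r in results):
        return "approved"            # high-confidence pass: auto-approve
    return "human_review" if on_fail == "review" else "rejected"

route([{"verdict": "pass", "confidence": 0.95}])                   # approved
route([{"verdict": "pass", "confidence": 0.55}])                   # human_review
route([{"verdict": "fail", "confidence": 0.9}], on_fail="review")  # human_review
```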
Human-in-the-Loop

AI Verdict + Human Judgment

When AI Judge is uncertain, it routes to a human reviewer. The reviewer sees the AI's verdict, confidence, and reasoning — then makes the final call.

Magic Links

Reviewers don't need a Rynko account. They receive a magic link via email, click it, and see the full review context.

  • One-click access to review interface
  • 2-hour link expiry for security
  • Resend capability if link expires
Audit Trail

Every decision — AI and human — is permanently logged. Both judgments are captured side by side.

  • AI verdict, confidence, and reasoning
  • Reviewer identity, timestamp, comment
  • EU AI Act Article 14 compliance
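A logged decision might pair the two judgments side by side like this. The field names are illustrative, not the actual log schema:

```json
{
  "ai": {
    "verdict": "fail",
    "confidence": 0.62,
    "reasoning": "Possible SQL injection in buildQuery()."
  },
  "human": {
    "reviewer": "reviewer@example.com",
    "decision": "approved",
    "comment": "Query is parameterized upstream; false positive.",
    "timestamp": "2025-01-15T10:32:00Z"
  }
}
```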
Security

Three-Layer Prompt Injection Defense

AI Judge processes untrusted data from your agents. The security model assumes every payload is adversarial.

Input Sanitization
Payloads are sanitized before reaching the LLM. Known injection patterns are stripped. Payload size is capped at 50KB.
Structured Output
The LLM is constrained to return a specific JSON structure. Only the expected fields are extracted — everything else is discarded.
Type Coercion
Verdicts are coerced to boolean. Confidence is coerced to a number between 0 and 1. No string injection can escape the output schema.
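The coercion layer can be sketched in a few lines. This is a minimal illustration of the idea, not the actual implementation: whatever the LLM returns, only a boolean verdict and a clamped numeric confidence survive.

```python
# Coerce raw LLM output: only the expected fields are kept, the
# verdict becomes a boolean, and confidence is clamped to [0, 1].

def coerce_output(raw):
    verdict = str(raw.get("verdict", "")).strip().lower() == "pass"
    try:
        confidence = float(raw.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(1.0, max(0.0, confidence))
    # Unexpected fields are discarded; no string can escape this shape.
    return {"verdict": verdict, "confidence": confidence}

coerce_output({"verdict": "PASS", "confidence": "1.7", "injected": "<script>"})
# → {"verdict": True, "confidence": 1.0}
```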
Pricing & Limits

Pay for What You Use

AI Judge runs cost 5x standard run credits, reflecting the underlying LLM cost. Available on paid tiers only.

Tier      AI Judge Runs / Month   Max Criteria / Gate
Free      Not available           -
Starter   500/mo                  10 per gate
Growth    5,000/mo                20 per gate
Scale     25,000/mo               50 per gate

Each AI Judge run consumes 5 standard run credits. Cost visibility is available in the dashboard.
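The credit arithmetic is straightforward: multiply AI Judge runs by five. For example, a Starter gate exhausting its 500 monthly runs consumes 2,500 standard run credits.

```python
# Each AI Judge run consumes 5 standard run credits.
CREDITS_PER_AI_JUDGE_RUN = 5

def monthly_credit_cost(ai_judge_runs):
    return ai_judge_runs * CREDITS_PER_AI_JUDGE_RUN

monthly_credit_cost(500)  # Starter tier at full usage → 2500 credits
```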

Use Cases

What Can AI Judge Evaluate?

Anything that requires judgment, context, or domain expertise that you cannot reduce to a deterministic rule.

Code Review
Evaluate functions for SRP adherence, security vulnerabilities, naming conventions, and documentation quality.
Trade Document Validation
Check that product descriptions are specific enough for customs classification and HS codes are plausible for the goods described.
Content Quality
Evaluate marketing copy, product descriptions, and user-facing text for clarity, tone, and brand consistency.
Compliance Checks
Screen outputs for regulatory language adherence, policy compliance, and disclosure requirements.

Ready to Add AI Evaluation?

Set up criteria in plain English, let the LLM evaluate, and route uncertain cases to humans. Three layers of validation, one API call.