AI Judge

Some Things You Can't Write a Rule For

AI Judge evaluates your data against plain-English criteria using an LLM. It is the third validation layer, running only after schema validation and business rules have passed.

LLM-powered evaluation
Human-in-the-loop fallback
Prompt injection defense
Three-Layer Validation

AI Judge Only Runs When It Should

Schema validation and business rules run first. AI Judge only evaluates data that already passed structural and deterministic checks — no wasted LLM cost on invalid payloads.

Schema

Structural — types, constraints, required fields

Business Rules

Deterministic — cross-field logic, lookups

AI Judge

Semantic — LLM evaluation of plain-English criteria
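The three-layer ordering can be sketched as a short pipeline. This is an illustrative sketch, not the Rynko API: the function names and return shape are assumptions, but the behavior mirrors the documented ordering, where a payload that fails an earlier layer never reaches the LLM.

```python
# Minimal sketch of the three-layer ordering: schema, then business
# rules, then AI Judge. Names here are illustrative, not the real API.

def validate(payload, schema_check, business_rules, ai_judge):
    """Run layers in order; stop at the first failure so the LLM
    is never invoked on structurally or logically invalid data."""
    if not schema_check(payload):      # Layer 1: structural
        return ("rejected", "schema")
    if not business_rules(payload):    # Layer 2: deterministic
        return ("rejected", "business_rules")
    return ai_judge(payload)           # Layer 3: semantic (LLM)

# A payload that fails schema validation never incurs LLM cost.
result = validate(
    {"amount": "abc"},
    schema_check=lambda p: isinstance(p.get("amount"), (int, float)),
    business_rules=lambda p: p["amount"] > 0,
    ai_judge=lambda p: ("approved", "ai_judge"),
)
# → ("rejected", "schema")
```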

Criteria

Plain English, Not Regular Expressions

Write criteria in natural language. Assign specialist personas for different checks. The LLM evaluates each criterion independently.

Simple Criteria
Just a string — the LLM uses its general knowledge.
[
  "Functions should follow SRP",
  "No hardcoded credentials",
  "Variable names are descriptive"
]
Criteria with Personas
Different specialist lenses for different checks.
[
  {
    "criterion": "Functions follow SRP",
    "persona": "Senior software architect"
  },
  {
    "criterion": "No SQL injection risks",
    "persona": "Application security engineer"
  },
  "Variable names are descriptive"
]
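Since criteria can mix bare strings and persona objects, a consumer has to normalize both into one shape before evaluation. A minimal sketch, assuming a gate-level default persona as described below (the helper name is hypothetical, not the Rynko SDK):

```python
# Normalize the mixed criteria format (bare strings and
# {"criterion", "persona"} objects) into a uniform shape.

def normalize_criteria(criteria, default_persona=None):
    normalized = []
    for item in criteria:
        if isinstance(item, str):
            # Bare string: falls back to the gate-level default persona
            # (or the LLM's general knowledge if none is set).
            normalized.append({"criterion": item, "persona": default_persona})
        else:
            # Object form: per-criterion persona overrides the default.
            normalized.append({
                "criterion": item["criterion"],
                "persona": item.get("persona", default_persona),
            })
    return normalized

criteria = [
    {"criterion": "Functions follow SRP", "persona": "Senior software architect"},
    "Variable names are descriptive",
]
normalize_criteria(criteria, default_persona="Code reviewer")
```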
How It Evaluates
1

Each criterion is evaluated independently by the LLM

2

Per-criterion persona overrides the gate-level default persona

3

Each criterion returns a verdict (pass/fail), confidence (0-1), and reasoning

4

Configurable threshold (default 1.0 = all must pass)
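The threshold rule reduces to a single comparison: the gate passes when the fraction of passing criteria meets the configured threshold. A sketch under that assumption (the function name is illustrative):

```python
# The gate passes when the share of passing criteria meets the
# threshold. The default of 1.0 means every criterion must pass.

def gate_passes(results, threshold=1.0):
    passes = sum(1 for r in results if r["verdict"] == "pass")
    return passes / len(results) >= threshold

results = [
    {"criterion": "Functions follow SRP", "verdict": "fail", "confidence": 0.85},
    {"criterion": "No hardcoded credentials", "verdict": "pass", "confidence": 0.97},
]
gate_passes(results)                 # False: one failure under the default 1.0
gate_passes(results, threshold=0.5)  # True: half the criteria pass
```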

Example Output
{
  "criterion": "Functions follow SRP",
  "verdict": "fail",
  "confidence": 0.85,
  "reasoning": "The processOrder() function handles validation, payment, and notification — three distinct responsibilities."
}
Confidence & Routing

When AI Is Uncertain, Humans Decide

Every criterion gets a confidence score. Low confidence automatically routes to human review. You control the threshold and the fallback behavior.

High Confidence
All criteria pass with high confidence. The run is automatically approved. No human intervention needed.
Low Confidence
One or more criteria have low confidence. The run is routed to a human reviewer via magic link for a final decision.
Clear Failure
Criteria fail with high confidence. Configurable per gate: onFail: "reject" or onFail: "review" for human override.
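The three routing outcomes above can be sketched as one decision function. The confidence cutoff value and the on_fail parameter name are assumptions for illustration; the branching mirrors the documented behavior.

```python
# Route a gate's results: uncertain → human review, confident pass →
# approve, confident fail → whatever the gate's onFail setting says.

def route(results, confidence_cutoff=0.8, on_fail="reject"):
    if any(r["confidence"] < confidence_cutoff for r in results):
        return "human_review"        # low confidence: a human decides
    if all(r["verdict"] == "pass" for r in results):
        return "approved"            # high-confidence pass: auto-approve
    return "human_review" if on_fail == "review" else "rejected"

route([{"verdict": "pass", "confidence": 0.95}])                   # approved
route([{"verdict": "pass", "confidence": 0.55}])                   # human_review
route([{"verdict": "fail", "confidence": 0.9}], on_fail="review")  # human_review
```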
Human-in-the-Loop

AI Verdict + Human Judgment

When AI Judge is uncertain, it routes to a human reviewer. The reviewer sees the AI's verdict, confidence, and reasoning — then makes the final call.

Magic Links

Reviewers don't need a Rynko account. They receive a magic link via email, click it, and see the full review context.

  • One-click access to review interface
  • 2-hour link expiry for security
  • Resend capability if link expires
Audit Trail

Every decision — AI and human — is permanently logged. Both judgments are captured side by side.

  • AI verdict, confidence, and reasoning
  • Reviewer identity, timestamp, comment
  • EU AI Act Article 14 compliance
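A logged decision might pair the two judgments side by side like this. The field names are illustrative, not the actual log schema:

```json
{
  "ai": {
    "verdict": "fail",
    "confidence": 0.62,
    "reasoning": "Possible SQL injection in buildQuery()."
  },
  "human": {
    "reviewer": "reviewer@example.com",
    "decision": "approved",
    "comment": "Query is parameterized upstream; false positive.",
    "timestamp": "2025-01-15T10:32:00Z"
  }
}
```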
Security

Three-Layer Prompt Injection Defense

AI Judge processes untrusted data from your agents. The security model assumes every payload is adversarial.

Input Sanitization
Payloads are sanitized before reaching the LLM. Known injection patterns are stripped. Payload size is capped at 50KB.
Structured Output
The LLM is constrained to return a specific JSON structure. Only the expected fields are extracted — everything else is discarded.
Type Coercion
Verdicts are coerced to boolean. Confidence is coerced to a number between 0 and 1. No string injection can escape the output schema.
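The coercion layer can be sketched in a few lines. This is a minimal illustration of the idea, not the actual implementation: whatever the LLM returns, only a boolean verdict and a clamped numeric confidence survive.

```python
# Coerce raw LLM output: only the expected fields are kept, the
# verdict becomes a boolean, and confidence is clamped to [0, 1].

def coerce_output(raw):
    verdict = str(raw.get("verdict", "")).strip().lower() == "pass"
    try:
        confidence = float(raw.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(1.0, max(0.0, confidence))
    # Unexpected fields are discarded; no string can escape this shape.
    return {"verdict": verdict, "confidence": confidence}

coerce_output({"verdict": "PASS", "confidence": "1.7", "injected": "<script>"})
# → {"verdict": True, "confidence": 1.0}
```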
Pricing & Limits

Pay for What You Use

AI Judge runs cost 5x standard run credits, reflecting the underlying LLM cost. Available on paid tiers only.

Tier      AI Judge Runs / Month   Max Criteria / Gate
Free      Not available           -
Starter   500/mo                  10 per gate
Growth    5,000/mo                20 per gate
Scale     25,000/mo               50 per gate

Each AI Judge run consumes 5 standard run credits. Cost visibility is available in the dashboard.
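The credit arithmetic is straightforward: multiply AI Judge runs by five. For example, a Starter gate exhausting its 500 monthly runs consumes 2,500 standard run credits.

```python
# Each AI Judge run consumes 5 standard run credits.
CREDITS_PER_AI_JUDGE_RUN = 5

def monthly_credit_cost(ai_judge_runs):
    return ai_judge_runs * CREDITS_PER_AI_JUDGE_RUN

monthly_credit_cost(500)  # Starter tier at full usage → 2500 credits
```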

Use Cases

What Can AI Judge Evaluate?

Anything that requires judgment, context, or domain expertise that you cannot reduce to a deterministic rule.

Code Review
Evaluate functions for SRP adherence, security vulnerabilities, naming conventions, and documentation quality.
Trade Document Validation
Check that product descriptions are specific enough for customs classification and HS codes are plausible for the goods described.
Content Quality
Evaluate marketing copy, product descriptions, and user-facing text for clarity, tone, and brand consistency.
Compliance Checks
Screen outputs for regulatory language adherence, policy compliance, and disclosure requirements.

Ready to Add AI Evaluation?

Set up criteria in plain English, let the LLM evaluate, and route uncertain cases to humans. Three layers of validation, one API call.