Sgraal's preflight returns more than verdict. Eleven derived diagnostics surface the why behind every decision and reveal where memory is drifting before it becomes a failure mode.
What it measures: projected runway before memory drift forces a BLOCK decision, with confidence interval.
Why it matters: early-warning signal — schedule maintenance ahead of operational impact.
Customer action: if < 7 days, queue refresh / heal / re-source.
What it measures: whether stated confidence matches actual reliability of the underlying memory.
Why it matters: overconfident memory is the leading silent failure mode for autonomous agents.
Customer action: on OVERCONFIDENT flag, require explicit user confirmation before irreversible action.
What it measures: wall-clock age of the memory entries the decision relied on.
Why it matters: reasoning quality decays with stale data. Some domains tolerate weeks; others minutes.
Customer action: tune per-domain freshness threshold; alert when crossed.
What it measures: which single memory entry yields the largest decision improvement if healed.
Why it matters: healing is not free; the metric prioritizes the one entry where intervention buys the most safety.
Customer action: automate refresh of the flagged entry; defer others.
What it measures: how this agent's memory profile compares to the fleet aggregate distribution.
Why it matters: outliers are early signals of either compromise or genuinely novel context.
Customer action: investigate agents whose distance grows monotonically; consider quarantine.
What it measures: direction and rate of change in the structural complexity of recent memory.
Why it matters: rising complexity without rising relevance indicates accumulation without curation.
Customer action: trigger MVMem retention sweep when trend stays positive across 7+ calls.
What it measures: relative cost of false-allow vs false-block for the current action and domain.
Why it matters: a 1% false-allow on a $5M wire transfer is not the same as 1% on a chat reply.
Customer action: auto-tune your local approval workflow to match.
What it measures: the entry whose removal would change the decision most — and the score quantifying it.
Why it matters: a decision driven by one source is fragile. Diversification is the fix.
Customer action: add corroborating sources before acting; or accept the risk consciously.
What it measures: diversity of provenance + structural independence of supporting evidence.
Why it matters: three sources that all trace back to the same root agent are one source wearing three hats.
Customer action: on HIGH, require human approval before irreversible action.
What it measures: compact narrative explanation of the decision in plain English.
Why it matters: compliance teams, end users, and operators all need an explainability surface — not raw scores.
Customer action: include in audit trail; surface to end-user UI on ASK_USER decisions.
What it measures: uncertainty bounds on the days-until-block estimate.
Why it matters: a point estimate of "5 days" with ±0.5 day CI is actionable; ±10 day CI is not.
Customer action: only act on the projection when CI width is below your domain threshold.
No additional API calls. No additional latency. Open the response, read the metrics that matter for your domain.