TrustEd Nav is an AI-assisted review tool designed to accelerate expert judgment — not to replace it. This page documents what the audit actually does, where its rubric comes from, and where it can be wrong.
What the audit is — and is not
It is a structured second opinion that surfaces likely issues faster than a manual read-through.
It is not a quality certification, accreditation, or substitute for an instructional coach, content expert, or district adoption review.
Trust scores are model-generated estimates of confidence — treat them as triage signals, not verdicts.
The rubric
Each material is scored across six dimensions, and every dimension maps to one or more recognized frameworks. The exact rubric version applied to your audit is recorded with the audit record.
Standards Alignment (key: standards)
Which standards are addressed, explicitly or implicitly, and what gaps exist?
Frameworks
Common Core State Standards (CCSS)
Next Generation Science Standards (NGSS)
State frameworks
Evidence basis
Vector similarity search against an indexed standards corpus, plus LLM rationale.
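To make that retrieval step concrete, here is a minimal sketch assuming precomputed embeddings; the function name and the top-k cutoff are illustrative, not the production implementation:

```python
import numpy as np

def top_k_standards(material_vec: np.ndarray,
                    standards_vecs: np.ndarray,
                    standard_ids: list[str],
                    k: int = 5) -> list[tuple[str, float]]:
    """Rank indexed standards by cosine similarity to the material embedding."""
    # Normalize both sides so a plain dot product is cosine similarity.
    m = material_vec / np.linalg.norm(material_vec)
    s = standards_vecs / np.linalg.norm(standards_vecs, axis=1, keepdims=True)
    sims = s @ m
    # Highest-similarity standards first; these become candidate alignments.
    top = np.argsort(sims)[::-1][:k]
    return [(standard_ids[i], float(sims[i])) for i in top]
```

The top-ranked candidates are then handed to the model, which writes the alignment rationale and flags gaps.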
Cognitive Load (key: cognitive_load)
Is the material appropriately demanding for the stated grade band?
Frameworks
Cognitive Load Theory (Sweller, 1988)
Cognitive Theory of Multimedia Learning (Mayer, 2001)
How scoring works
Each rubric dimension is evaluated by a large language model with a fixed instruction prompt.
For Standards Alignment, the material is also embedded and compared against a vector index of standards to surface likely matches.
Per-dimension output includes a 0–100 score, a self-reported confidence, a summary, and granular findings with severity and (where possible) the exact location in the source.
The overall trust score is a weighted aggregate of dimension scores, biased toward dimensions where the model expressed higher confidence.
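As an illustration of that aggregation, here is a minimal sketch; the field names and the weighting scheme are simplified assumptions, not our exact schema:

```python
from dataclasses import dataclass

@dataclass
class DimensionResult:
    key: str           # e.g. "standards", "cognitive_load"
    score: float       # 0-100 rubric score for this dimension
    confidence: float  # model's self-reported confidence, 0-1

def overall_trust_score(results: list[DimensionResult]) -> float:
    """Confidence-weighted mean: confident dimensions count for more."""
    total = sum(r.confidence for r in results)
    if total == 0:
        return 0.0  # no usable signal; treat as lowest-trust triage
    return sum(r.score * r.confidence for r in results) / total

# A confident 90 outweighs a hesitant 40:
# (90*0.9 + 40*0.3) / (0.9 + 0.3) = 77.5
print(overall_trust_score([
    DimensionResult("standards", 90.0, 0.9),
    DimensionResult("cognitive_load", 40.0, 0.3),
]))
```

In this weighting, a dimension the model is unsure about moves the overall score less, which is why the trust score should be read as a triage signal rather than a verdict.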
Known limitations
Not yet independently validated. We have not published an inter-rater reliability study comparing engine output to expert reviewers. Calibration work is in progress with pilot districts.
LLM variance. Even at temperature 0, model responses can drift across versions. The model identifier is recorded with each audit so results are traceable.
Coverage gaps. Standards alignment is strongest for CCSS/NGSS and weakens for niche state frameworks not yet in our corpus.
No images yet. The current audit reasons over text. Images, diagrams, and scanned PDFs are summarized but not deeply analyzed.
Bias detection is conservative. The model is more likely to miss subtle bias than to over-flag it. Human review remains essential for equity-critical adoption decisions.
How we improve reliability
Human-in-the-loop review. Reviewers can mark findings as accurate or incorrect; aggregate agreement rates feed our calibration metrics.
Versioned rubric. Every audit records the rubric version that produced it. Changes are documented in our changelog so prior audits remain interpretable.
Pinned model settings. Temperature, system prompt, and model identifier are recorded per audit run; a sketch of that recorded metadata follows this list.
External validation. We are partnering with districts to score the same materials with both expert reviewers and the engine. Results will be published here.
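To make the record-keeping above concrete, here is a sketch of what the metadata pinned to a single run could look like; every field name and value here is illustrative, not our exact schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class AuditRunRecord:
    """Everything needed to interpret an audit after models and rubrics change."""
    audit_id: str
    rubric_version: str        # ties findings to a documented rubric release
    model_id: str              # exact model identifier used for the run
    temperature: float         # pinned sampling setting
    system_prompt_sha256: str  # hash of the fixed instruction prompt

record = AuditRunRecord(
    audit_id="audit-0042",
    rubric_version="1.3.0",
    model_id="example-llm-2024-06",
    temperature=0.0,
    system_prompt_sha256="<sha256 of prompt text>",
)
print(json.dumps(asdict(record), indent=2))
```

Freezing this record alongside each audit is what lets a prior score stay interpretable even after the rubric or underlying model has been updated.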