Rubric v0.3 · 2026-05

Audit Methodology

TrustEd Nav is an AI-assisted review tool designed to accelerate expert judgment — not to replace it. This page documents what the audit actually does, where its rubric comes from, and where it can be wrong.

What the audit is — and is not

  • It is a structured second opinion that surfaces likely issues faster than a manual read-through.
  • It is not a quality certification, accreditation, or substitute for an instructional coach, content expert, or district adoption review.
  • Trust scores are model-generated estimates of confidence — treat them as triage signals, not verdicts.

The rubric

Each material is scored across six dimensions. Every dimension maps to one or more recognized frameworks, and the rubric version applied to your audit is recorded with the audit record.

Standards Alignment

Dimension key: standards

Which standards are addressed, explicitly or implicitly, and what gaps exist?

Frameworks
  • Common Core State Standards (CCSS)
  • Next Generation Science Standards (NGSS)
  • State frameworks
Evidence basis
Vector similarity search against an indexed standards corpus, plus LLM rationale.
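The retrieval step described above can be sketched as a cosine-similarity ranking over pre-computed standard embeddings. This is a minimal illustration with toy 3-dimensional vectors and illustrative standard IDs; the production index, embedding model, and dimensionality are not specified here.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_standards(chunk_vec, index, k=3):
    """Rank standards by cosine similarity between a material chunk's
    embedding and each pre-computed standard embedding."""
    sims = [(sid, cosine(chunk_vec, vec)) for sid, vec in index.items()]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]

# Toy embeddings; a real index uses hundreds of dimensions, and these
# standard IDs are purely illustrative.
index = {
    "CCSS.MATH.4.NF.1": [1.0, 0.1, 0.0],
    "CCSS.MATH.4.NF.2": [0.5, 0.8, 0.3],
    "NGSS.4-PS3-2":     [0.0, 0.1, 1.0],
}
matches = top_standards([0.95, 0.2, 0.05], index, k=2)
```

The top-ranked matches are then passed to the LLM, which writes the alignment rationale.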

Cognitive Load

Dimension key: cognitive_load

Is the material appropriately demanding for the stated grade band?

Frameworks
  • Cognitive Load Theory (Sweller, 1988)
  • Cognitive Theory of Multimedia Learning (Mayer, 2001)
  • Academic Vocabulary Tiers (Beck, McKeown & Kucan)
Evidence basis
Sentence-length distribution, vocabulary tier estimation, working-memory chunk count, and qualitative scaffolding review.
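The sentence-length distribution mentioned above can be approximated with a simple heuristic like the following. This is an assumed sketch, not the production analyzer, which also handles abbreviations and other edge cases.

```python
import re
from statistics import mean

def sentence_length_stats(text):
    """Split text into sentences and summarize the word-count
    distribution, a rough proxy for syntactic load."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        "sentences": len(lengths),
        "mean_words": round(mean(lengths), 1),
        "max_words": max(lengths),
    }

stats = sentence_length_stats(
    "Plants need light. Light from the sun lets plants make their own "
    "food through photosynthesis, a process that happens in the leaves."
)
```

A high maximum relative to the grade band would surface as a cognitive-load finding.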

Accessibility & UDL

Dimension key: accessibility

Can all learners access the material, including English language learners (ELL) and students with IEPs?

Frameworks
  • Universal Design for Learning (CAST UDL Guidelines 2.2)
  • WIDA English Language Development Standards
  • WCAG 2.2 (for described visual content)
Evidence basis
Reading-level analysis, plain-language review, multiple-representation check, alt-text gap detection.
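The alt-text gap detection step can be sketched as a pass over a material's image elements. This is a simplified illustration with an assumed data shape; the production check also distinguishes decorative from informative images.

```python
def alt_text_gaps(images):
    """Flag images whose alt text is missing or empty, per the
    WCAG non-text-content requirement."""
    return [img["src"] for img in images
            if not img.get("alt", "").strip()]

gaps = alt_text_gaps([
    {"src": "fig1.png", "alt": "Bar chart of rainfall by month"},
    {"src": "fig2.png", "alt": ""},
    {"src": "fig3.png"},
])
```

Each flagged image becomes an accessibility finding with its location in the source.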

Assessment Quality

Dimension key: assessment_quality

Are assessment items valid, aligned, and well-constructed?

Frameworks
  • Webb's Depth of Knowledge (DOK)
  • Bloom's Revised Taxonomy (Anderson & Krathwohl)
  • Standards for Educational and Psychological Testing (AERA/APA/NCME)
Evidence basis
Item-stem clarity, distractor plausibility, alignment to stated objectives, DOK estimation.
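A first-pass cognitive-level estimate can be sketched by mapping an item stem's leading verb to Bloom's Revised Taxonomy. The verb lists below are illustrative and incomplete, and this toy heuristic is not the engine's actual classifier, which reasons over the full item.

```python
BLOOM_VERBS = {
    # Illustrative verb lists; real taxonomies are much fuller.
    "remember":   {"list", "define", "identify", "recall"},
    "understand": {"explain", "summarize", "describe", "classify"},
    "apply":      {"solve", "use", "calculate", "demonstrate"},
    "analyze":    {"compare", "contrast", "differentiate", "organize"},
    "evaluate":   {"justify", "critique", "argue", "assess"},
    "create":     {"design", "construct", "compose", "formulate"},
}

def bloom_level(item_stem):
    """Estimate the Bloom level of an item stem from its leading verb."""
    first_word = item_stem.lower().split()[0]
    for level, verbs in BLOOM_VERBS.items():
        if first_word in verbs:
            return level
    return "unknown"

level = bloom_level("Compare the two characters' motivations.")
```

A bank dominated by "remember"-level stems against objectives written at "analyze" would be flagged as an alignment gap.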

Factual Reliability

Dimension key: reliability

Are factual claims accurate, current, and supported?

Frameworks
  • Independent fact verification heuristics
  • Source-of-truth comparison where domain corpora exist
Evidence basis
LLM critique with cited passages; flagged claims include the exact location in the source material.
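Attaching the exact location to a flagged claim can be sketched as a verbatim-quote lookup over the source text. This is a minimal version under the assumption that the critique quotes the passage exactly; in practice a fuzzy fallback is needed when it does not.

```python
def locate_claim(source_text, claim):
    """Return the character span of a flagged claim so reviewers can
    jump straight to it in the source material."""
    start = source_text.find(claim)
    if start == -1:
        return None  # quote didn't match verbatim; fall back to fuzzy search
    return {"start": start, "end": start + len(claim)}

doc = "The Nile is the longest river. It flows north into the sea."
span = locate_claim(doc, "It flows north")
```

The span is stored on the finding alongside the critique and severity.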

Bias & Inclusivity

Dimension key: bias

Does the material avoid stereotypes, exclusionary language, or cultural assumptions?

Frameworks
  • Culturally Responsive Teaching (Gay, 2010; Hammond, 2015)
  • Learning for Justice — Social Justice Standards
Evidence basis
Representation review, stereotype detection, inclusive-language check.

How a single audit runs

  1. The material is normalized to text and chunked.
  2. Each rubric dimension is evaluated by a large language model with a fixed instruction prompt.
  3. For Standards Alignment, the material is also embedded and compared against a vector index of standards to surface likely matches.
  4. Per-dimension output includes a 0–100 score, a self-reported confidence, a summary, and granular findings with severity and (where possible) the exact location in the source.
  5. The overall trust score is a weighted aggregate of dimension scores, biased toward dimensions where the model expressed higher confidence.
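The confidence-weighted aggregation in step 5 could look like the following. The exact weighting scheme is an assumption for illustration; the production formula may differ.

```python
def overall_trust(dimensions):
    """Aggregate per-dimension scores into one trust score, giving
    more weight to dimensions where the model reported higher
    confidence (assumed scheme: confidence-weighted mean)."""
    total_weight = sum(d["confidence"] for d in dimensions)
    return round(
        sum(d["score"] * d["confidence"] for d in dimensions) / total_weight, 1
    )

score = overall_trust([
    {"name": "standards",      "score": 82, "confidence": 0.9},
    {"name": "cognitive_load", "score": 70, "confidence": 0.6},
    {"name": "accessibility",  "score": 90, "confidence": 0.8},
])
```

Under this scheme a low-confidence dimension moves the overall score less than a high-confidence one with the same score.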

Known limitations

  • Not yet independently validated. We have not published an inter-rater reliability study comparing engine output to expert reviewers. Calibration work is in progress with pilot districts.
  • LLM variance. Even at temperature 0, model responses can drift across versions. The model identifier is recorded with each audit so results are traceable.
  • Coverage gaps. Standards alignment is strongest for CCSS/NGSS and weakens for niche state frameworks not yet in our corpus.
  • No images yet. The current audit reasons over text. Images, diagrams, and scanned PDFs are summarized but not deeply analyzed.
  • Bias detection is conservative. The model is more likely to miss subtle bias than to over-flag it. Human review remains essential for equity-critical adoption decisions.

How we improve reliability

  • Human-in-the-loop review. Reviewers can mark findings as accurate or incorrect; aggregate agreement rates feed our calibration metrics.
  • Versioned rubric. Every audit records the rubric version that produced it. Changes are documented in our changelog so prior audits remain interpretable.
  • Pinned model settings. Temperature, system prompt, and model identifier are recorded per audit run.
  • External validation. We are partnering with districts to score the same materials with both expert reviewers and the engine. Results will be published here.
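The aggregate agreement rate described in the first bullet above can be sketched as follows; the field names and verdict labels are assumptions for illustration.

```python
def agreement_rate(feedback):
    """Share of engine findings that reviewers marked accurate,
    ignoring findings they skipped (a minimal calibration metric)."""
    marked = [f for f in feedback if f["verdict"] in ("accurate", "incorrect")]
    if not marked:
        return None
    accurate = sum(1 for f in marked if f["verdict"] == "accurate")
    return accurate / len(marked)

rate = agreement_rate([
    {"finding_id": 1, "verdict": "accurate"},
    {"finding_id": 2, "verdict": "accurate"},
    {"finding_id": 3, "verdict": "incorrect"},
    {"finding_id": 4, "verdict": "skipped"},
])
```

Tracking this rate per dimension shows where the engine is most and least trustworthy.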
Have a question or want to participate in a calibration study? Contact support@exceluplab.com.