The Risk Score is one of the four observability primitives in SADAR. It is a single real-valued scalar that synthesizes the risk-significant events of an invocation chain into one number that downstream consumers and auditors can reason about. This document covers what the Risk Score is, the data model of the adjustments that drive it, how categorization through reason IRIs supports analysis, how implementations choose accumulation algorithms within bounded constraints, the floor-clamping property that protects against “goodness debt,” the two propagation forms (baggage adjustments list plus accumulated scalar in SCT), and the disambiguation against the related-but-distinct Impact Score.
The normative basis lives across multiple sections. Scope §5.1.18 defines Risk Score as a first-class concept. The 8. Risk Score Specification companion document is the full normative source — adjustments schema, the twelve starter reasons, accumulation requirements, and the IRI namespace. The Website Narrative §Risk Score provides the orientation framing this page builds on. NIST SP 800-30 Rev 1, Appendix H is referenced informatively (non-normatively) for risk-event taxonomy alignment.
Audience: implementers building accumulation algorithms or risk-policy engines; architects scoping risk-modeling integration; and audit/compliance teams using Risk Score values in governance decisions.
The Risk Score is a real-valued scalar in the closed interval [0.0, 1.0] maintained across the invocation chain. At any boundary crossing, the score reflects the cumulative risk state of the flow up to that point — a synthesis of every risk-significant event that has occurred, weighted by the implementation's accumulation algorithm and bounded by the spec's clamping properties.
The score serves two complementary roles:
• As an observability primitive: auditors, dashboards, and post-hoc analysis can read the score at each boundary to characterize how risk evolved across the chain. By walking the SCT chain of an invocation and inspecting the score at each link, an investigator can see exactly where and why risk accumulated.
• As a control primitive: downstream consumers can refuse calls whose Risk Score has crossed a configured threshold, apply additional gating (re-authorization, human review, hold-and-investigate), or escalate to a higher-trust path. The same scalar serves both roles, which is the design point — observation and action are not separate concerns when the goal is governed delegation.
The Risk Score is the synthesis: dozens or hundreds of underlying signals collapsed into a single value that consumers can act on without re-doing the underlying analysis. The value of a synthesized scalar is not that it captures everything (the adjustments list, retained for audit, captures the full history) but that it gives every consumer a uniform basis for action.
The Risk Score is built from a sequence of adjustments. Each adjustment is a structured event emitted by some component during the invocation lifecycle (an agent, a tool, the orchestrator, or application code via the Helper API). Adjustments are the atomic unit of risk recording. The data model is canonical across all SADAR-conformant implementations:
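A minimal sketch of what such an adjustment record might look like, using hypothetical field names (the normative schema lives in the 8. Risk Score Specification, and the reason IRI below is illustrative, not one of the standard set):

```python
from dataclasses import dataclass

# Hypothetical field names for illustration only; the normative
# adjustments schema is defined in the 8. Risk Score Specification.
@dataclass(frozen=True)  # frozen: adjustments are immutable once emitted
class RiskAdjustment:
    delta: float     # signed contribution toward the cumulative score
    reason: str      # categorical reason, expressed as an IRI (RFC 3987)
    emitter: str     # component that produced the adjustment
    timestamp: str   # when the event occurred (e.g. an RFC 3339 string)

adj = RiskAdjustment(
    delta=0.15,
    reason="urn:example:sadar:reason:anomaly-detected",  # hypothetical IRI
    emitter="orchestrator",
    timestamp="2025-01-01T00:00:00Z",
)
```

The `frozen=True` flag mirrors the immutability property: new information produces new adjustment records rather than amending old ones.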
Adjustments are immutable once emitted: their fields cannot be rewritten by the entity that produced them or by any downstream component. This is the same architectural property as phase-immutability for Telemetry Records (see Telemetry Record) — once an event is recorded into the audit fabric, it cannot be retroactively edited. New information produces new adjustments; it does not amend old ones.
Each adjustment carries a reason field identifying the categorical reason for the adjustment. Reasons are IRIs (Internationalized Resource Identifiers, RFC 3987), which provides extensibility, global uniqueness, and disambiguation across organizational namespaces. Implementations can register their own reasons in private namespaces while drawing on the standard set for cross-organizational interoperability.
The spec defines twelve starter reasons in a SADAR-managed IRI namespace. These cover the categorical territory commonly relevant to governed agentic flows: adversarial input handling, trust-model events, governance-driven elevations (such as human-confirmation triggers), authentication state changes, anomaly detection, and policy-related events. The full enumeration with normative IRIs lives in 8. Risk Score Specification; this page describes the structural pattern.
Reason categorization matters because accumulation algorithms typically weight by category. An adjustment categorized as a hard authentication failure is conventionally weighted more heavily than one categorized as an operational latency anomaly, even if the raw delta values are similar. By making categorization an explicit field rather than implicit in the delta, the Risk Score model lets organizations build accumulation algorithms that reflect their own risk-weighting frameworks while remaining cross-organizationally interoperable.
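As a hedged illustration of category weighting, an implementation might key weights off reason IRIs with a fallback for extension-namespace reasons it does not recognize. The IRIs and weight values below are hypothetical, not the normative starter set:

```python
# Hypothetical reason IRIs and weights -- the normative IRIs live in the
# 8. Risk Score Specification; these values are illustrative only.
CATEGORY_WEIGHTS = {
    "urn:example:sadar:reason:auth-hard-failure": 1.0,  # weighted heavily
    "urn:example:sadar:reason:latency-anomaly": 0.2,    # operational, lighter
}
DEFAULT_WEIGHT = 0.5  # fallback for unrecognized / extension-namespace reasons

def weight_for(reason_iri: str) -> float:
    """Look up the category weight for a reason IRI."""
    return CATEGORY_WEIGHTS.get(reason_iri, DEFAULT_WEIGHT)
```

Because the weighting is keyed on the explicit reason field rather than inferred from deltas, an organization can swap in its own weighting table without changing the adjustment format.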
The twelve starter reasons are aligned with the threat-event taxonomy in NIST SP 800-30 Rev 1, Guide for Conducting Risk Assessments, Appendix H. The reference is informative rather than normative — implementations are not required to use NIST's framework, and the SADAR taxonomy may diverge as the spec evolves. The reference exists to ground the conceptual foundation in established risk-management practice and to give implementers a starting point for organization-specific extension reasons.
Implementations using other established risk frameworks — FAIR, ISO 27005, or organization-internal taxonomies — remain conformant. The taxonomy alignment is a convenience, not a constraint. What the spec mandates is the structural pattern: reasons are IRIs, the standard set is enumerated, extension namespaces are permitted, and accumulation algorithms may use reason as a categorization input.
How adjustments combine into the cumulative scalar is implementer-chosen, not specified. Different organizations have different risk models — a financial services firm may want strictly conservative accumulation; a research environment may tolerate more latitude with steeper decay; a healthcare deployment may weight authentication events much more heavily than operational ones. A spec-mandated single algorithm would impose one risk model on all implementers; leaving the choice to implementations preserves their domain expertise.
What the spec mandates regardless of accumulation algorithm:
• Range. The resulting score is in the closed interval [0.0, 1.0]. Out-of-range scores are non-conformant.
• Floor and ceiling clamping. Computations that would produce values outside [0.0, 1.0] are clamped to the nearest endpoint. See the next section.
• Cryptographic parity. The accumulated scalar attested in the SCT chain link claim at any boundary MUST agree with what would be computed from the baggage adjustments at that boundary using the implementation's algorithm. Tampering with either the adjustments list or the attested scalar produces a parity violation.
• Reproducibility within an implementation. Given the same adjustments list, the same algorithm, and the same evaluation point, any two evaluators produce the same scalar. This is what makes the score auditable.
Within those constraints, common accumulation patterns include simple summation with category weights, weighted averaging, soft-max combination, exponentially-weighted moving averages, and category-specific clamping. Implementations document their algorithm so consuming parties can interpret scores correctly.
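As a sketch of one conformant choice, assuming simple summation with category weights and per-step clamping (the data shapes and weight function are hypothetical):

```python
# A minimal accumulation sketch: weighted summation with per-step
# floor/ceiling clamping. The algorithm itself is implementer-chosen;
# the spec mandates only the range, clamping, parity, and
# reproducibility properties. Field names are hypothetical.

def accumulate(adjustments, weight_for):
    """Fold an ordered adjustments list into a scalar in [0.0, 1.0].

    Deterministic: the same list and weight function always produce
    the same scalar, satisfying the reproducibility requirement.
    """
    score = 0.0
    for adj in adjustments:
        score += adj["delta"] * weight_for(adj["reason"])
        # Clamp at every step so the score can never leave [0.0, 1.0].
        score = min(1.0, max(0.0, score))
    return score

# A negative delta clamps at the floor rather than "banking goodness":
adjs = [{"delta": -0.3, "reason": "r"}, {"delta": 0.4, "reason": "r"}]
# accumulate(adjs, lambda r: 1.0) yields 0.4, not 0.1
```

Other patterns from the list above (soft-max, EWMA, category-specific clamping) would replace the loop body while keeping the same clamped, deterministic shape.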
The floor at 0.0 is enforced by clamping. Any computation that would produce a negative score is clamped to 0.0. This is the property informally called “no banking goodness”: adjustments cannot accumulate negative state that a future risk event would have to overcome before producing an actionable signal.
Consider what would happen without floor clamping. A long stretch of low-risk operations with negative adjustments would push the score deeply negative. A subsequent risky event would emit a positive adjustment — but the score would still be negative, perhaps far below 0.0, and a control system thresholding on (say) Risk Score > 0.7 would not trigger. The risky event has occurred, but the accumulated “goodness debt” from the prior clean operations is masking it. Risk visibility is destroyed by the very mechanism intended to preserve favorable history.
Floor clamping prevents this. Each new risk event is evaluated against a baseline of 0.0, not against arbitrarily negative state. A positive adjustment after a long clean stretch produces a score that reflects that single event, not a number reduced by years of “nothing happened.” Risk signals are immediate, not delayed by accumulated favorable history.
The ceiling at 1.0 is symmetrical. Once the score reaches 1.0, additional risk-increasing adjustments do not push it higher — though they remain in the adjustments list for audit purposes. The score saturates at 1.0; further adjustments preserve their information in the audit trail without escalating the headline value beyond the bounded range.
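A worked contrast, with hypothetical numbers, makes the difference concrete: fifty clean operations each emitting -0.1, followed by one risky event of +0.8.

```python
# Hypothetical deltas illustrating the "goodness debt" scenario.
clean = [-0.1] * 50
risky = 0.8

# Without floor clamping: the score after the risky event is still deeply
# negative (about -4.2), so a threshold such as score > 0.7 never fires.
unclamped = sum(clean) + risky

# With per-step floor clamping: every clean op clamps back to 0.0, so the
# risky event is evaluated against a 0.0 baseline and lands at 0.8.
score = 0.0
for delta in clean + [risky]:
    score = min(1.0, max(0.0, score + delta))
```

The clamped score of 0.8 reflects the single risky event; the unclamped value hides it behind years of “nothing happened.”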
The Risk Score travels in two complementary forms during invocation execution:
• As an adjustments list in OTel baggage. The full list of adjustments propagates in-flight along with the call, carried in OTel baggage under the key urn:sadar:baggage:v1:risk_adjustments. Each adjustment is the structured object described in the data model above. Receivers can iterate the list and recompute the score themselves using their own accumulation algorithm — preserving the option for receivers with different risk models or weighting choices.
• As an accumulated scalar in the SCT chain link claim. At each chain link issuance, the issuer computes the current Risk Score and attests to the resulting scalar as a claim in the new SCT chain link. The scalar is the issuer's authoritative computation at that boundary; downstream parties that trust the issuer's algorithm can use the scalar directly without iterating the adjustments list.
The two forms are bound by cryptographic parity (§5.1.17). The accumulated scalar attested in the SCT MUST agree with what the issuer's accumulation algorithm would produce from the baggage adjustments list at that boundary. Disagreement is a parity violation, reported as a structured error in the urn:sadar:error:v1 namespace.
Why two forms? The list lets receivers preserve their own algorithmic autonomy — a receiving organization may have its own risk model and may want to interpret raw events under its own weights. The scalar lets receivers that trust the issuer's algorithm avoid recomputation overhead — a reasonable choice for receivers with no independent risk-modeling stake. Cryptographic parity ties the two together so that receivers that recompute can detect tampering with the attested scalar, and receivers that consume the scalar can detect tampering with the adjustments list.
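A minimal parity-check sketch for a receiver that recomputes, assuming it shares the issuer's accumulation algorithm. Function and field names are hypothetical; in SADAR, disagreement is reported as a structured error in the urn:sadar:error:v1 namespace:

```python
# Hypothetical names throughout; the toy clamped_sum stands in for the
# issuer's documented accumulation algorithm.

def clamped_sum(adjustments):
    """Toy stand-in for an implementation's accumulation algorithm."""
    score = 0.0
    for adj in adjustments:
        score = min(1.0, max(0.0, score + adj["delta"]))
    return score

def check_parity(baggage_adjustments, attested_scalar, accumulate, tol=1e-9):
    """Recompute the score from the baggage adjustments list and compare
    it to the scalar attested in the SCT chain link claim."""
    return abs(accumulate(baggage_adjustments) - attested_scalar) <= tol

adjs = [{"delta": 0.2}, {"delta": 0.3}]
ok = check_parity(adjs, attested_scalar=0.5, accumulate=clamped_sum)        # parity holds
tampered = check_parity(adjs, attested_scalar=0.1, accumulate=clamped_sum)  # violation
```

The small tolerance absorbs floating-point noise in recomputation; tampering with either the list or the attested scalar shifts the result far beyond it.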
Risk Score is sometimes confused with the related-but-distinct Impact Score, which is a separate Protocol NFR. The two concepts share a domain (risk modeling) but answer different questions and operate at different points in the invocation lifecycle.
The two concepts compose in governance decisions but do so at different points. Impact Score is evaluated at discovery: a requester comparing candidate targets considers the Impact Score of each as part of bilateral matching. Risk Score is evaluated at every boundary during execution: at each step, a consumer considers the running Risk Score in deciding whether to proceed, gate, or refuse.
A simple rule of thumb captures the relationship. Impact Score asks “how bad could this be?” — a property of the operation type and the entity performing it, set in advance. Risk Score asks “how risky has it been so far?” — a property of what has actually happened in this specific flow. A high-impact, low-risk-so-far operation is one to proceed with caution; a low-impact, high-risk-so-far operation may warrant investigation despite low potential damage; a high-impact, high-risk-so-far operation typically warrants strong gating or refusal.
Both inform governance, but neither replaces the other. The spec keeps them as distinct concepts so that operational policies can reason about each independently.
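The rule of thumb above can be sketched as a simple policy function. The threshold and action labels are hypothetical, not spec-defined:

```python
# Hypothetical gating policy combining the two scores; SADAR does not
# mandate thresholds or actions, and these labels are illustrative only.

HIGH = 0.7  # hypothetical policy threshold

def gate(impact_score: float, risk_score: float) -> str:
    """Map (impact, risk-so-far) to an illustrative governance action."""
    if impact_score >= HIGH and risk_score >= HIGH:
        return "refuse-or-strong-gate"  # high impact, high risk-so-far
    if risk_score >= HIGH:
        return "investigate"            # risky history despite low potential damage
    if impact_score >= HIGH:
        return "proceed-with-caution"   # high potential damage, clean history so far
    return "proceed"
```

Because the two scores enter the decision independently, an organization can tune its thresholds for each without conflating potential damage with observed history.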
This page covers Risk Score as an architectural concept. Several adjacent topics are intentionally out of scope here and live in dedicated documents:
• Observability Overview — the cross-cutting summary of SADAR observability; how Risk Score composes with Telemetry Records, OTel baggage, and Repatriation as one of the four observability primitives.
• SADAR Context Token — the SCT chain link in which the accumulated Risk Score scalar is attested.
• Telemetry Record — the per-invocation audit artifact whose risk_score_state section records Risk Score evolution across the lifecycle.
• Helper API — the application-facing surface; add_risk_adjustment is how application code contributes adjustments to the Risk Score.
• Repatriation — the cross-trust-boundary mechanism; Risk Score values and adjustments are among the artifacts that travel via Repatriation.
• 2. Scope §5.1.18 — Risk Score as a first-class concept at the spec level.
• 2. Scope §5.1.17 — Cryptographic Parity; the requirement that ties the baggage adjustments list to the SCT chain link scalar.
• 3. NFR Schema — full normative content for Protocol NFRs including the Impact Score NFR.
• 8. Risk Score Specification — the full normative source: adjustments schema, the twelve starter reasons, IRI namespace structure, accumulation requirements.
• NIST SP 800-30 Rev 1, Appendix H — informative reference for risk-event taxonomy alignment; the conceptual grounding for the standard reason set.