Risk Score
May 1, 2026
Draft

Purpose

The Risk Score is one of the four observability primitives in SADAR. It is a single real-valued scalar that synthesizes the risk-significant events of an invocation chain into one number that downstream consumers and auditors can reason about. This document covers what the Risk Score is, the data model of the adjustments that drive it, how categorization through reason IRIs supports analysis, how implementations choose accumulation algorithms within bounded constraints, the floor-clamping property that protects against “goodness debt,” the two propagation forms (baggage adjustments list plus accumulated scalar in the SCT), and the disambiguation from the related-but-distinct Impact Score.

The normative basis lives across multiple sections. Scope §5.1.18 defines Risk Score as a first-class concept. The 8. Risk Score Specification companion document is the full normative source — adjustments schema, the twelve starter reasons, accumulation requirements, and the IRI namespace. The Website Narrative §Risk Score provides the orientation framing this page builds on. NIST SP 800-30 Rev 1, Appendix H is referenced informatively (non-normatively) for risk-event taxonomy alignment.

Audience: implementers building accumulation algorithms or risk-policy engines; architects scoping risk-modeling integration; and audit/compliance teams using Risk Score values in governance decisions.

What the Risk Score Is

The Risk Score is a real-valued scalar in the closed interval [0.0, 1.0] maintained across the invocation chain. At any boundary crossing, the score reflects the cumulative risk state of the flow up to that point — a synthesis of every risk-significant event that has occurred, weighted by the implementation's accumulation algorithm and bounded by the spec's clamping properties.

The score serves two complementary roles:

•       As an observability primitive: auditors, dashboards, and post-hoc analysis can read the score at each boundary to characterize how risk evolved across the chain. By walking the SCT chain of an invocation and inspecting the score at each link, an investigator can see exactly where and why risk accumulated.

•       As a control primitive: downstream consumers can refuse calls whose Risk Score has crossed a configured threshold, apply additional gating (re-authorization, human review, hold-and-investigate), or escalate to a higher-trust path. The same scalar serves both roles, which is the design point — observation and action are not separate concerns when the goal is governed delegation.
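
To make the control role concrete, here is a minimal threshold-gating sketch in Python. The threshold values, the action names, and the gate_call function are all hypothetical; the spec defines the score, and operational policy determines thresholds and actions (see Boundaries below).

    # Hypothetical gating policy: thresholds and action names are
    # illustrative assumptions, not spec-defined values.
    REFUSE_AT = 0.9   # assumed policy threshold
    REVIEW_AT = 0.7   # assumed policy threshold

    def gate_call(risk_score: float) -> str:
        """Map an accumulated Risk Score in [0.0, 1.0] to a gating action."""
        if risk_score >= REFUSE_AT:
            return "refuse"            # decline the call outright
        if risk_score >= REVIEW_AT:
            return "hold-for-review"   # e.g. re-authorization or human review
        return "proceed"

    assert gate_call(0.95) == "refuse"
    assert gate_call(0.75) == "hold-for-review"
    assert gate_call(0.10) == "proceed"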

The Risk Score is the synthesis: dozens or hundreds of underlying signals collapsed into a single value that consumers can act on without re-doing the underlying analysis. The value of a synthesized scalar is not that it captures everything (the adjustments list, retained for audit, captures the full history) but that it gives every consumer a uniform basis for action.

The Adjustments Data Model

The Risk Score is built from a sequence of adjustments. Each adjustment is a structured event emitted by some component during the invocation lifecycle (an agent, a tool, the orchestrator, or application code via the Helper API). Adjustments are the atomic unit of risk recording. The data model is canonical across all SADAR-conformant implementations:

adjustment_id: Unique identifier for this adjustment. Enables deduplication, idempotency, and cross-reference from other audit artifacts (Telemetry Records, SCT chain link audit logs) back to the specific adjustment that drove a Risk Score change.
delta: The change applied to the cumulative score. Positive values increase risk; negative values decrease it, subject to floor clamping at 0.0 (see Accumulation). Magnitude is the implementer's domain; the spec constrains the resulting score, not the individual delta.
ttl_seconds: How long the adjustment remains in effect before aging out of contribution to the score. After the TTL elapses, the adjustment no longer affects the running score (though it remains in the audit record).
timestamp: When the adjustment was emitted. Basis for decay computation, TTL evaluation, and temporal correlation across audit artifacts. Cryptographic parity requirements apply: the timestamp here must be consistent with the timestamp on the corresponding event in spans, SCT chain links, and the Telemetry Record.
decay: How the adjustment's contribution changes over time within its TTL — for example, linear decay, exponential decay, or step (constant until the TTL elapses, then zero). The decay specification is part of the adjustment so receivers can reproduce the score evolution without separate metadata.
reason: An IRI identifying the categorical reason for this adjustment. Reasons are drawn from the standard set defined in the spec (twelve starter reasons; see below) or from extension namespaces. Reason categorization enables accumulation algorithms to weight by category.
context: Structured domain-specific metadata accompanying the adjustment. Free-form within the schema's permitted shape; used by accumulation algorithms that consider context-dependent weighting and by auditors investigating specific events.

Adjustments are immutable once emitted: their fields cannot be rewritten by the entity that produced them or by any downstream component. This is the same architectural property as phase-immutability for Telemetry Records (see Telemetry Record) — once an event is recorded into the audit fabric, it cannot be retroactively edited. New information produces new adjustments; it does not amend old ones.
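
To make the shape concrete, here is a single adjustment sketched as a Python dict. Every value is invented for illustration, and the reason IRI is a placeholder; the normative field types, wire form, and real IRIs live in 8. Risk Score Specification.

    # Hypothetical adjustment instance. Field names follow the data model
    # above; every value, including the reason IRI, is illustrative only.
    adjustment = {
        "adjustment_id": "3f9a2c1e-0b7d-4e5a-9c2f-1a2b3c4d5e6f",  # unique; enables dedup and cross-reference
        "delta": 0.15,                                 # positive: increases risk
        "ttl_seconds": 3600,                           # ages out of the score after one hour
        "timestamp": "2026-05-01T12:00:00Z",           # must be consistent with spans, SCT links, Telemetry Record
        "decay": {"kind": "exponential", "half_life_seconds": 600},    # assumed decay encoding
        "reason": "https://risk.example.org/reasons/anomaly-detected", # placeholder IRI, not a normative one
        "context": {"detector": "latency-watchdog"},   # free-form domain metadata
    }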

Reasons and the Reason IRI Namespace

Each adjustment carries a reason field identifying the categorical reason for the adjustment. Reasons are IRIs (Internationalized Resource Identifiers, RFC 3987), a choice that provides extensibility, global uniqueness, and disambiguation across organizational namespaces. Implementations can register their own reasons in private namespaces while drawing on the standard set for cross-organizational interoperability.

The spec defines twelve starter reasons in a SADAR-managed IRI namespace. These cover the categorical territory commonly relevant to governed agentic flows: adversarial input handling, trust-model events, governance-driven elevations (such as human-confirmation triggers), authentication state changes, anomaly detection, and policy-related events. The full enumeration with normative IRIs lives in 8. Risk Score Specification; this page describes the structural pattern.
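
The structural pattern can be illustrated with invented IRIs. The identifiers below are placeholders for this page only; the twelve normative starter-reason IRIs are enumerated in 8. Risk Score Specification.

    # Placeholder IRIs illustrating the namespace pattern only.
    STANDARD_REASON = "urn:sadar:risk-reason:v1:auth-state-change"        # invented stand-in for a starter reason
    EXTENSION_REASON = "https://risk.example.com/reasons/vendor-flagged"  # private extension namespace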

Reason categorization matters because accumulation algorithms typically weight by category. An adjustment categorized as a hard authentication failure is conventionally weighted more heavily than one categorized as an operational latency anomaly, even if the raw delta values are similar. By making categorization an explicit field rather than implicit in the delta, the Risk Score model lets organizations build accumulation algorithms that reflect their own risk-weighting frameworks while remaining cross-organizationally interoperable.

Informative Reference: NIST SP 800-30 Rev 1, Appendix H

The twelve starter reasons are aligned with the threat-event taxonomy in NIST SP 800-30 Rev 1, Guide for Conducting Risk Assessments, Appendix H. The reference is informative rather than normative — implementations are not required to use NIST's framework, and the SADAR taxonomy may diverge as the spec evolves. The reference exists to ground the conceptual foundation in established risk-management practice and to give implementers a starting point for organization-specific extension reasons.

Implementations using other established risk frameworks — FAIR, ISO 27005, or organization-internal taxonomies — remain conformant. The taxonomy alignment is a convenience, not a constraint. What the spec mandates is the structural pattern: reasons are IRIs, the standard set is enumerated, extension namespaces are permitted, and accumulation algorithms may use reason as a categorization input.

Accumulation: From Adjustments to Score

How adjustments combine into the cumulative scalar is implementer-chosen, not specified. Different organizations have different risk models — a financial services firm may want strictly conservative accumulation; a research environment may tolerate more latitude with steeper decay; a healthcare deployment may weight authentication events much more heavily than operational ones. A spec-mandated single algorithm would impose one risk model on all implementers; leaving the choice to implementations preserves their domain expertise.

What the spec mandates regardless of accumulation algorithm:

•       Range. The resulting score is in the closed interval [0.0, 1.0]. Out-of-range scores are non-conformant.

•       Floor and ceiling clamping. Computations that would produce values outside [0.0, 1.0] are clamped to the nearest endpoint. See the next section.

•       Cryptographic parity. The accumulated scalar attested in the SCT chain link claim at any boundary MUST agree with what would be computed from the baggage adjustments at that boundary using the implementation's algorithm. Tampering with either the adjustments list or the attested scalar produces a parity violation.

•       Reproducibility within an implementation. Given the same adjustments list, the same algorithm, and the same evaluation point, any two evaluators produce the same scalar. This is what makes the score auditable.

Within those constraints, common accumulation patterns include simple summation with category weights, weighted averaging, soft-max combination, exponentially-weighted moving averages, and category-specific clamping. Implementations document their algorithm so consuming parties can interpret scores correctly.
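
As one possible shape within those constraints, here is a sketch of a simple accumulation algorithm in Python: summation with per-category weights, TTL aging, decay, and per-step endpoint clamping. Everything in it (the weight table, the decay encoding, the default weight, timestamps as epoch seconds) is an illustrative assumption; the spec mandates the constraints above, not this or any other algorithm.

    import math

    # Hypothetical category weights keyed by reason-IRI prefix; a real
    # deployment would derive weights from its own risk framework.
    CATEGORY_WEIGHTS = {
        "urn:sadar:risk-reason:v1:auth": 1.0,      # invented prefix
        "urn:sadar:risk-reason:v1:anomaly": 0.5,   # invented prefix
    }
    DEFAULT_WEIGHT = 0.25                          # assumed fallback weight

    def contribution(adj: dict, now: float) -> float:
        """Weighted, decayed contribution of one adjustment at time `now`
        (timestamps here are epoch seconds for simplicity)."""
        age = now - adj["timestamp"]
        if age >= adj["ttl_seconds"]:
            return 0.0                             # aged out: no longer contributes
        decay = adj.get("decay", {"kind": "step"})
        if decay["kind"] == "exponential":         # assumed decay encoding
            factor = math.exp(-age * math.log(2) / decay["half_life_seconds"])
        elif decay["kind"] == "linear":
            factor = 1.0 - age / adj["ttl_seconds"]
        else:                                      # "step": constant until TTL, then zero
            factor = 1.0
        weight = next((w for prefix, w in CATEGORY_WEIGHTS.items()
                       if adj["reason"].startswith(prefix)), DEFAULT_WEIGHT)
        return adj["delta"] * factor * weight

    def accumulate(adjustments: list[dict], now: float) -> float:
        """Apply contributions in timestamp order, clamping the running
        score to [0.0, 1.0] after each step so negative state never banks."""
        score = 0.0
        for adj in sorted(adjustments, key=lambda a: a["timestamp"]):
            score = min(1.0, max(0.0, score + contribution(adj, now)))
        return score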

Floor Clamping at 0.0 — “No Banking Goodness”

The floor at 0.0 is enforced by clamping. Any computation that would produce a negative score is clamped to 0.0. This is the property informally called “no banking goodness”: adjustments cannot accumulate negative state that a future risk event would have to overcome before producing an actionable signal.

Consider what would happen without floor clamping. A long stretch of low-risk operations with negative adjustments would push the score deeply negative. A subsequent risky event would emit a positive adjustment — but the score would still be negative, perhaps far below 0.0, and a control system thresholding on (say) Risk Score > 0.7 would not trigger. The risky event has occurred, but the accumulated “goodness debt” from the prior clean operations is masking it. Risk visibility is destroyed by the very mechanism intended to preserve favorable history.

Floor clamping prevents this. Each new risk event is evaluated against a baseline of 0.0, not against arbitrarily negative state. A positive adjustment after a long clean stretch produces a score that reflects that single event, not a number reduced by years of “nothing happened.” Risk signals are immediate, not delayed by accumulated favorable history.
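
A small worked example, with invented deltas and with decay and weighting ignored for simplicity, shows both behaviors:

    # Invented deltas: three clean operations, then one risky event.
    deltas = [-0.3, -0.3, -0.3, +0.8]

    # Without floor clamping, the clean stretch banks "goodness debt":
    unclamped = sum(deltas)          # -0.9 + 0.8 = -0.1; a 0.7 threshold never fires

    # With per-step floor clamping, the risky event is immediately visible:
    score = 0.0
    for d in deltas:
        score = max(0.0, score + d)  # stays at 0.0 through the clean stretch
    # score == 0.8 after the risky event, crossing a 0.7 threshold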

The ceiling at 1.0 is symmetrical. Once the score reaches 1.0, additional risk-increasing adjustments do not push it higher — though they remain in the adjustments list for audit purposes. The score saturates at 1.0; further adjustments preserve their information in the audit trail without escalating the headline value beyond the bounded range.

Propagation

The Risk Score travels in two complementary forms during invocation execution:

•       As an adjustments list in OTel baggage. The full list of adjustments propagates in-flight along with the call, carried in OTel baggage under the key urn:sadar:baggage:v1:risk_adjustments. Each adjustment is the structured object described in the data model above. Receivers can iterate the list and recompute the score themselves using their own accumulation algorithm — preserving the option for receivers with different risk models or weighting choices.

•       As an accumulated scalar in the SCT chain link claim. At each chain link issuance, the issuer computes the current Risk Score and attests to the resulting scalar as a claim in the new SCT chain link. The scalar is the issuer's authoritative computation at that boundary; downstream parties that trust the issuer's algorithm can use the scalar directly without iterating the adjustments list.
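
To illustrate the in-flight form, here is a sketch using the Python OpenTelemetry API to carry the adjustments list in baggage. The JSON serialization is an assumption made for this sketch; the normative wire form of adjustments is defined in 8. Risk Score Specification.

    import json
    from opentelemetry import baggage, context

    BAGGAGE_KEY = "urn:sadar:baggage:v1:risk_adjustments"

    def attach_adjustments(adjustments: list[dict]):
        """Stow the adjustments list in OTel baggage so it propagates
        in-flight with the call (JSON encoding assumed here)."""
        ctx = baggage.set_baggage(BAGGAGE_KEY, json.dumps(adjustments))
        return context.attach(ctx)

    def read_adjustments() -> list[dict]:
        """Recover the adjustments list at a receiving boundary so the
        receiver can recompute the score with its own algorithm."""
        raw = baggage.get_baggage(BAGGAGE_KEY)
        return json.loads(raw) if raw else []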

The two forms are bound by cryptographic parity (§5.1.17). The accumulated scalar attested in the SCT MUST agree with what the issuer's accumulation algorithm would produce from the baggage adjustments list at that boundary. Disagreement is a parity violation, reported as a structured error in the urn:sadar:error:v1 namespace.

Why two forms? The list lets receivers preserve their own algorithmic autonomy — a receiving organization may have its own risk model and may want to interpret raw events under its own weights. The scalar lets receivers that trust the issuer's algorithm avoid recomputation overhead — a reasonable choice for receivers with no independent risk-modeling stake. Cryptographic parity ties the two together so that receivers that recompute can detect tampering with the attested scalar, and receivers that consume the scalar can detect tampering with the adjustments list.
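
A receiver that recomputes can check parity with a comparison along these lines. The accumulate function is the illustrative one sketched under Accumulation above, the tolerance is an assumed floating-point allowance, and the plain exception stands in for the structured error that a conformant implementation would report in the urn:sadar:error:v1 namespace.

    def check_parity(adjustments: list[dict], attested_scalar: float,
                     now: float, tolerance: float = 1e-9) -> None:
        """Recompute the score from the baggage adjustments list and compare
        it against the scalar attested in the SCT chain link claim."""
        recomputed = accumulate(adjustments, now)   # illustrative algorithm from above
        if abs(recomputed - attested_scalar) > tolerance:
            # A real implementation would emit a structured error in the
            # urn:sadar:error:v1 namespace; a plain exception stands in here.
            raise ValueError(
                f"Risk Score parity violation: attested {attested_scalar}, "
                f"recomputed {recomputed}")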

Risk Score vs Impact Score

Risk Score is sometimes confused with the related-but-distinct Impact Score, which is a separate Protocol NFR. The two concepts share a domain (risk modeling) but answer different questions and operate at different points in the invocation lifecycle.

Nature: Risk Score is dynamic, cumulative real-valued state; Impact Score is a static potential-severity property.
Range: Risk Score lies in [0.0, 1.0]; Impact Score's range is implementation-defined (typically a scalar or a category).
Where it lives: Risk Score travels as the OTel baggage adjustments list plus the accumulated scalar in the SCT chain link; Impact Score is declared in the entity manifest as a Protocol NFR.
When evaluated: Risk Score is evaluated continuously, at every boundary crossing during the invocation chain; Impact Score is evaluated at discovery, as part of bilateral NFR matching.
Question answered: Risk Score answers “How risky has this flow been so far?”; Impact Score answers “How bad could it get if this entity misbehaved?”
Updates over time: Risk Score updates as adjustments accumulate, decay, and age out; Impact Score does not change without manifest republication.

The two concepts compose in governance decisions but do so at different points. Impact Score is evaluated at discovery: a requester comparing candidate targets considers the Impact Score of each as part of bilateral matching. Risk Score is evaluated at every boundary during execution: at each step, a consumer considers the running Risk Score in deciding whether to proceed, gate, or refuse.

A simple rule of thumb captures the relationship. Impact Score asks “how bad could this be?” — a property of the operation type and the entity performing it, set in advance. Risk Score asks “how risky has it been so far?” — a property of what has actually happened in this specific flow. A high-impact, low-risk-so-far operation calls for proceeding with caution; a low-impact, high-risk-so-far operation may warrant investigation despite the low potential damage; a high-impact, high-risk-so-far operation typically warrants strong gating or refusal.
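
The rule of thumb can be sketched as a hypothetical policy function combining the two signals. The threshold and the action names are invented for illustration; gating policy is explicitly out of scope for the spec (see Boundaries below).

    def governance_action(impact_high: bool, risk_score: float) -> str:
        """Hypothetical composition of Impact Score and Risk Score;
        the threshold and action names are illustrative assumptions."""
        risk_high = risk_score > 0.7           # assumed threshold
        if impact_high and risk_high:
            return "strong-gate-or-refuse"     # high potential damage, risky history
        if impact_high:
            return "proceed-with-caution"      # high potential damage, clean history so far
        if risk_high:
            return "investigate"               # low potential damage, risky history
        return "proceed"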

Both inform governance, but neither replaces the other. The spec keeps them as distinct concepts so that operational policies can reason about each independently.

Boundaries — What's Not Here

This page covers Risk Score as an architectural concept. Several adjacent topics are intentionally out of scope here and live in dedicated documents:

Canonical Risk Score field schemas: the wire form of adjustments, exact field types, validation rules, the full enumeration of the twelve starter reason IRIs, and the IRI namespace structure for extension reasons. See 8. Risk Score Specification.
Accumulation algorithm requirements: the spec-level requirements any implementer's accumulation algorithm must satisfy (parity correctness, floor and ceiling clamping behavior, monotonicity properties at the extremes). See 8. Risk Score Specification.
Impact Score Protocol NFR specification: the full schema for the Impact Score NFR, including how it is declared in manifests and how it participates in bilateral matching. See 3. NFR Schema and 2. Scope §5.1.18.
Cryptographic parity for Risk Score: the normative requirement that the accumulated scalar in the SCT chain link claim must agree with what would be computed from the baggage adjustments at the same boundary. See 2. Scope §5.1.17.
Risk Score thresholds and gating policy: how downstream consumers should interpret Risk Score values, how thresholds are set, and how gating decisions (refuse, hold, escalate, proceed) are governed. The spec defines the score; operational policy determines what to do with it.
Helper API contributions to Risk Score: how application code contributes risk_adjustments via the Helper API's add_risk_adjustment method. See Helper API (sibling page).

Where to Learn More

•       Observability Overview — the cross-cutting summary of SADAR observability; how Risk Score composes with Telemetry Records, OTel baggage, and Repatriation as one of the four observability primitives.

•       SADAR Context Token — the SCT chain link in which the accumulated Risk Score scalar is attested.

•       Telemetry Record — the per-invocation audit artifact whose risk_score_state section records Risk Score evolution across the lifecycle.

•       Helper API — the application-facing surface; add_risk_adjustment is how application code contributes adjustments to the Risk Score.

•       Repatriation — the cross-trust-boundary mechanism; Risk Score values and adjustments are among the artifacts that travel via Repatriation.

•       2. Scope §5.1.18 — Risk Score as a first-class concept at the spec level.

•       2. Scope §5.1.17 — Cryptographic Parity; the requirement that ties the baggage adjustments list to the SCT chain link scalar.

•       3. NFR Schema — full normative content for Protocol NFRs including the Impact Score NFR.

•       8. Risk Score Specification — the full normative source: adjustments schema, the twelve starter reasons, IRI namespace structure, accumulation requirements.

•       NIST SP 800-30 Rev 1, Appendix H — informative reference for risk-event taxonomy alignment; the conceptual grounding for the standard reason set.