AI Agent Risk Assessment Checklist

Before any agent reaches production, someone needs to ask: what can go wrong, and have we addressed it? This checklist provides a structured answer. It scores the agent across 9 risk dimensions, produces a composite risk score, and maps it to a deployment readiness tier. It then runs a binary go/no-go check — a set of required controls where any gap is a blocker regardless of the overall risk score.

Run this checklist before initial production deployment, and re-run it when the agent's scope, data access, or trust level changes significantly.


When to use this template

  • As a formal gate before any agent is promoted from staging to production
  • When adding new tools or data access to an existing deployed agent
  • When a compliance or security audit asks for evidence of AI governance controls
  • When an agent incident (wrong output, data leak, policy violation) triggers a post-incident review
  • When onboarding a new team that is deploying its first agent and needs a governance baseline

The blank checklist (copyable)

# AI Agent Risk Assessment

**Agent name:**
**Agent ID:**
**Assessed by:**
**Date:**
**Workflow(s):**

## Risk scoring (0–3 per dimension)

| # | Dimension | Score (0–3) | Notes |
|---|---|---|---|
| 1 | Data sensitivity | | |
| 2 | External action risk | | |
| 3 | Financial risk | | |
| 4 | Legal and compliance risk | | |
| 5 | Security risk | | |
| 6 | Customer impact | | |
| 7 | Reversibility | | |
| 8 | Human oversight | | |
| 9 | Auditability | | |
| | **Total** | | |

**Risk tier:**
- 0–6: Low risk
- 7–12: Medium risk
- 13–18: High risk
- 19–27: Critical risk

## Go / No-go checklist

| # | Required control | Status (Yes / No / N/A) | Notes |
|---|---|---|---|
| 1 | Agent scope documented and approved by named owner | | |
| 2 | Forbidden actions list defined in system prompt | | |
| 3 | Approval gates configured for all HIGH and CRITICAL actions | | |
| 4 | Data access scoped to minimum necessary (least privilege) | | |
| 5 | Audit logging enabled for all agent runs | | |
| 6 | Agent tested against adversarial inputs (prompt injection test) | | |
| 7 | Credentials stored in the secrets vault (not in system prompt or workflow) | | |
| 8 | Named human assigned as approval reviewer for each risk category | | |
| 9 | Rollback plan exists for incorrect outputs at scale | | |
| 10 | Risk assessment reviewed by someone who did not build the agent | | |

**Any "No" = No-Go. All required items must be "Yes" before production deployment.**

**Decision:** Go / No-Go
**Signed off by:**

Scoring level descriptors

1. Data sensitivity — what data does the agent access?

| Score | Descriptor |
|---|---|
| 0 | Public data only — no internal systems |
| 1 | Internal business data with no personal data (process docs, KB articles, internal metrics) |
| 2 | Employee or partner personally identifiable information (PII) |
| 3 | Customer PII, financial records, health data, or data regulated under GDPR, HIPAA, or PCI-DSS |

2. External action risk — can the agent communicate externally?

| Score | Descriptor |
|---|---|
| 0 | No external communication of any kind |
| 1 | Internal notifications only (Slack to internal channels, internal email) |
| 2 | External notifications with mandatory human review before send |
| 3 | Autonomous external communication — sends email, DMs, or API calls to external systems without human review |

3. Financial risk — can the agent trigger financial transactions?

| Score | Descriptor |
|---|---|
| 0 | No access to financial systems |
| 1 | Read-only access to financial data (invoices, reports) |
| 2 | Can initiate transactions under $500 with approval gate |
| 3 | Can initiate transactions over $500, recurring charges, or refunds |

4. Legal and compliance risk — does the agent touch regulated processes or create legal commitments?

| Score | Descriptor |
|---|---|
| 0 | Internal, unregulated process with no compliance implications |
| 1 | Touches regulated data (PII) but read-only — no output enters regulated systems |
| 2 | Produces documents that could be legally binding (contracts, commitments, policy statements) |
| 3 | Regulatory submissions, compliance attestations, legal commitments, or actions on regulated data |

5. Security risk — does the agent have privileged system access?

| Score | Descriptor |
|---|---|
| 0 | No system credentials or privileged access |
| 1 | Read-only API access to internal systems using scoped service credentials |
| 2 | Write access to internal systems |
| 3 | Access to production infrastructure, secrets store, security controls, or identity systems |

6. Customer impact — how directly does the agent affect customers?

| Score | Descriptor |
|---|---|
| 0 | No customer-facing output — agent operates entirely internally |
| 1 | Agent output is reviewed and edited by a human before any customer sees it |
| 2 | Customer-facing output with a human review gate but no editing step |
| 3 | Direct autonomous interaction with customers — output reaches customers without human review |

7. Reversibility — can agent actions be undone?

| Score | Descriptor |
|---|---|
| 0 | All actions are fully reversible with no data loss |
| 1 | Most actions reversible with some effort (e.g. ticket reassignment, draft deletion) |
| 2 | Some actions are irreversible in practice (sent messages, deleted records) |
| 3 | Actions are largely irreversible — financial charges, legal submissions, permanently deleted data |

8. Human oversight — is a human in the loop?

| Score | Descriptor |
|---|---|
| 0 | Human reviews and approves every action before it executes |
| 1 | Human reviews agent outputs periodically (e.g. daily digest review) |
| 2 | Automated with exception-based human review (human only sees failures or anomalies) |
| 3 | Fully autonomous — no regular human review of agent outputs or actions |

9. Auditability — are actions logged and traceable?

| Score | Descriptor |
|---|---|
| 0 | All actions logged with full trace, linked to run ID, step ID, and user context |
| 1 | Most actions logged with partial trace; some tool calls lack detailed records |
| 2 | Limited logging — run-level records exist but step-level and tool-call records are incomplete |
| 3 | No meaningful audit trail — actions cannot be reliably traced after the fact |

Scoring matrix and deployment tiers

| Total score | Risk tier | Deployment requirements |
|---|---|---|
| 0–6 | Low risk | Standard deployment. Configure approval gates for any high or critical individual actions. |
| 7–12 | Medium risk | Approval required for agent actions. Enable audit digest. Schedule quarterly review. |
| 13–18 | High risk | Legal and compliance sign-off required. All agent actions require approval gates. Monthly review. Security assessment required. |
| 19–27 | Critical risk | Do not deploy without: legal review, CISO sign-off, SOC2 evidence, data privacy impact assessment, and full approval gates on all non-read actions. |

A high score in a single dimension can warrant additional controls even if the total is low. A score of 3 on Financial risk or Security risk should always trigger a dedicated review of those dimensions, regardless of total score.
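
The tier thresholds and the single-dimension override described above can be expressed as a small helper. This is a sketch: the thresholds and dimension names come from this template, while the function and constant names are illustrative.

```python
# Map a completed risk-scoring table to a deployment tier.
# Thresholds mirror the scoring matrix above; names are illustrative.

DIMENSIONS = [
    "Data sensitivity", "External action risk", "Financial risk",
    "Legal and compliance risk", "Security risk", "Customer impact",
    "Reversibility", "Human oversight", "Auditability",
]

# A score of 3 here always triggers a dedicated review, whatever the total.
ALWAYS_REVIEW_AT_3 = {"Financial risk", "Security risk"}

def risk_tier(scores: dict) -> tuple:
    """Return (tier, dimensions needing dedicated review) for a
    dict mapping each dimension name to a 0-3 score."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("score every dimension exactly once")
    if any(not 0 <= s <= 3 for s in scores.values()):
        raise ValueError("each score must be between 0 and 3")
    total = sum(scores.values())
    if total <= 6:
        tier = "Low risk"
    elif total <= 12:
        tier = "Medium risk"
    elif total <= 18:
        tier = "High risk"
    else:
        tier = "Critical risk"
    # Single-dimension extremes warrant review regardless of the total.
    extra = sorted(d for d in ALWAYS_REVIEW_AT_3 if scores[d] == 3)
    return tier, extra
```

Feeding in the worked example's scores below (total 8) would return the Medium risk tier with no extra reviews, since neither Financial risk nor Security risk scores a 3 there.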


Go / no-go checklist — expanded descriptions

Each item is a hard requirement. "N/A" is only acceptable if the item genuinely does not apply — for example, item 3 (approval gates for high/critical actions) is N/A only if the agent has no high or critical tools, which must be confirmed against the tool permission matrix.

| # | Required control | Why it matters |
|---|---|---|
| 1 | Agent scope documented and approved by named owner | Without a named owner, there is no one accountable when the agent behaves unexpectedly |
| 2 | Forbidden actions list defined in system prompt | Absence of prohibition is not prohibition. The agent will not infer what it should not do |
| 3 | Approval gates configured for all HIGH and CRITICAL actions | Approval gates are the primary enforcement mechanism for high-stakes actions |
| 4 | Data access scoped to minimum necessary | Broad data access increases blast radius when the agent makes a mistake |
| 5 | Audit logging enabled for all agent runs | Without logs, you cannot investigate incidents or demonstrate compliance |
| 6 | Adversarial input testing completed | Prompt injection and edge-case inputs reveal failure modes that normal testing misses |
| 7 | Credentials in the secrets vault only | A credential in a system prompt or workflow definition is visible in logs and to API users |
| 8 | Named human assigned as approval reviewer | A role-based assignee with no named person means no one takes responsibility |
| 9 | Rollback plan exists | At scale, a bad agent output affects many records. Know how to undo it before you deploy |
| 10 | Assessment reviewed by someone outside the build team | Self-assessment produces systematically optimistic scores. Independent review catches blind spots |
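
The hard-stop rule — any "No" is a blocker, and "N/A" passes only when it genuinely applies — can be sketched as a small helper. The function name and return shape are illustrative, not part of the template.

```python
# Evaluate a completed go/no-go table. Any "No" blocks deployment
# regardless of the overall risk score. Names here are illustrative.

def go_no_go(items: dict) -> tuple:
    """items maps each required control to "Yes", "No", or "N/A".
    Returns ("Go", []) or ("No-Go", [blocking controls])."""
    allowed = {"Yes", "No", "N/A"}
    invalid = [c for c, status in items.items() if status not in allowed]
    if invalid:
        raise ValueError(f"invalid status for: {invalid}")
    blockers = [c for c, status in items.items() if status == "No"]
    return ("Go", []) if not blockers else ("No-Go", blockers)
```

A single unmet control is enough to flip the decision: `go_no_go({"Audit logging enabled": "Yes", "Rollback plan exists": "No"})` returns a No-Go listing the rollback plan as the blocker.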

Worked example: customer support triage agent

Agent name: Support Triage Agent v2
Agent ID: agt_cx_triage_001
Assessed by: Priya Menon (Engineering Manager), reviewed by Rahul Iyer (Compliance)
Date: 2026-05-01
Workflow(s): Inbound ticket classification, response drafting

Risk scoring

| # | Dimension | Score | Notes |
|---|---|---|---|
| 1 | Data sensitivity | 2 | Accesses customer support tickets which may contain customer PII (name, email, account number) |
| 2 | External action risk | 1 | All response drafts require human approval before any external send; agent only posts internally |
| 3 | Financial risk | 0 | No access to billing, payment, or financial systems |
| 4 | Legal and compliance risk | 1 | Touches customer PII in ticket content but read-only; no output enters regulated systems autonomously |
| 5 | Security risk | 1 | Read-only CRM lookup and KB search via scoped service credentials; no write access |
| 6 | Customer impact | 1 | All drafts reviewed and edited by a CX agent before the customer sees them |
| 7 | Reversibility | 1 | Ticket classification can be changed; no irreversible actions |
| 8 | Human oversight | 1 | CX agents review outputs on a ticket-by-ticket basis; not fully automated |
| 9 | Auditability | 0 | Full run logs, step logs, and tool-call records enabled in ProvenanceOne |
| | **Total** | **8** | |

Risk tier: Medium risk (7–12)

Required controls: Approval required before response send; audit digest enabled; quarterly review scheduled.

Go / no-go checklist

| # | Required control | Status | Notes |
|---|---|---|---|
| 1 | Agent scope documented and approved by named owner | Yes | Priya Menon documented; approved by CX Director on 2026-04-28 |
| 2 | Forbidden actions list defined in system prompt | Yes | Includes: no direct send, no billing access, no commitments |
| 3 | Approval gates configured for all HIGH and CRITICAL actions | Yes | Approval step before email send; no critical tools attached |
| 4 | Data access scoped to minimum necessary | Yes | CRM read scoped to current ticket's account ID only |
| 5 | Audit logging enabled for all agent runs | Yes | ProvenanceOne audit enabled; connection.accessed events for CRM and KB |
| 6 | Adversarial input testing completed | Yes | 20 adversarial inputs tested including prompt injection attempts; all handled correctly |
| 7 | Credentials in the secrets vault | Yes | CRM and KB credentials stored in the secrets vault; not in system prompt or workflow definition |
| 8 | Named human assigned as approval reviewer | Yes | CX team lead (Amara Osei) assigned; backup: CX operations manager |
| 9 | Rollback plan exists | Yes | Incorrect drafts can be deleted; ticket status can be reset; no irreversible actions |
| 10 | Assessment reviewed independently | Yes | Reviewed by Rahul Iyer (Compliance), who was not on the build team |

Decision: Go
Signed off by: Priya Menon (Engineering), Rahul Iyer (Compliance) — 2026-05-01


How to customise this template

Adjust scoring to your risk appetite. Some organisations treat any external communication as critical rather than high. If your regulatory environment warrants it, move the thresholds.

Add domain-specific dimensions. A healthcare organisation may add a dimension for clinical risk (does the agent's output influence clinical decisions?). A financial services firm may add market risk (does the agent's output influence trading or pricing decisions?).

Set internal thresholds higher than the defaults. If your organisation requires all medium-risk agents to undergo the same review as high-risk agents, document that policy here and apply it consistently.

Schedule re-assessment triggers. Define the conditions that require a re-run of this checklist without waiting for the quarterly cycle: adding a new tool, changing the model, a production incident, or a change in the regulatory environment.


Common mistakes

Scoring dimension 8 (Human oversight) as 0 because approval gates exist. Approval gates are specific to individual actions; dimension 8 asks about oversight of the agent's overall output and behaviour. If a human only reviews the pre-approved action and never reviews the agent's full run, the score is not 0.

Marking go/no-go item 6 (adversarial testing) as Yes without a documented test log. "We tested it informally" is not evidence. Document the adversarial inputs you tested, the expected behaviour, and the actual output. If an audit occurs, that record needs to exist.

Using a total risk score to override a single-dimension red flag. A total score of 8 (medium) with a score of 3 on Security risk still warrants a dedicated security review. The composite score is a guide; single-dimension extremes warrant targeted investigation.

Completing the assessment alone. The checklist explicitly requires that item 10 be completed by someone outside the build team. An assessment completed entirely by the builder is not a valid assessment — it lacks the independent review that catches scope creep, wishful scoring, and blind spots.


Is this checklist a substitute for a formal DPIA or security assessment?

No. This checklist is a structured starting point for governance decisions. For agents that score in the high or critical tier, it identifies the need for a formal Data Protection Impact Assessment (DPIA) or security assessment — it does not replace them. Use the checklist to determine whether those formal assessments are required and to prepare for them.

How often should I re-run the risk assessment for a deployed agent?

At minimum: quarterly for medium-risk agents, monthly for high-risk agents. Additionally, re-run whenever the agent's system prompt changes substantially, when new tools are added, when the model changes, or following a production incident. Do not wait for the scheduled review cycle if a material change occurs.

What counts as 'adversarial input testing' for go/no-go item 6?

At minimum: test inputs designed to elicit out-of-scope actions (e.g. asking the agent to ignore its system prompt), inputs containing obvious prompt injection attempts (e.g. 'Ignore previous instructions and...'), and inputs at the boundary of the agent's defined scope. Document each test input, the expected behaviour, and the actual output. 20 adversarial inputs is a reasonable minimum for a medium-risk agent.
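
One way to keep the documented test log this answer calls for is a small record-and-summarise helper. This is a sketch: the record fields and function names are illustrative, and pass/fail judgement is supplied by the tester, not inferred automatically.

```python
# Minimal adversarial-test log for go/no-go item 6. Each record captures
# the input, expected behaviour, and actual output so the evidence exists
# if an audit occurs. Field and function names are illustrative.
import datetime

def record_test(log: list, test_input: str, expected: str,
                actual: str, passed: bool) -> dict:
    """Append one documented adversarial test case and return the record."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "input": test_input,
        "expected_behaviour": expected,
        "actual_output": actual,
        "passed": passed,
    }
    log.append(record)
    return record

def summarise(log: list) -> tuple:
    """Return (total tests, failing records) for the checklist notes."""
    failures = [r for r in log if not r["passed"]]
    return len(log), failures
```

The summary line ("20 adversarial inputs tested; all handled correctly") then comes straight from the log rather than from memory.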

What if the agent scores differently on different runs?

Variance across runs is itself a risk signal. If the behaviours those scores capture, such as task success or output correctness, vary significantly between runs, the agent is not stable enough for the deployment tier it is targeting. Score the agent on a representative sample (at minimum 50 runs) and use the worst-case score for each dimension, not the average.
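
The worst-case aggregation described here can be sketched as follows; the function name is illustrative, and each run is assumed to have been scored on the same dimensions.

```python
# Aggregate per-run dimension scores using the worst case (the maximum,
# since higher scores mean higher risk), not the average.

def worst_case_scores(runs: list) -> dict:
    """runs is a list of dicts, each mapping dimension -> 0-3 score
    for one run. Returns the worst observed score per dimension."""
    if not runs:
        raise ValueError("need at least one scored run")
    dims = set(runs[0])
    if any(set(r) != dims for r in runs):
        raise ValueError("every run must score the same dimensions")
    return {d: max(r[d] for r in runs) for d in dims}
```

For example, if Customer impact scores 1 on most runs but 2 on one run in the sample, the assessment uses 2.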

Can I use this template for agents not built on ProvenanceOne?

Yes. The scoring dimensions are platform-agnostic. You will need to adapt the go/no-go checklist items that reference ProvenanceOne-specific features (audit logging, the secrets vault integration, approval gates) to the equivalent controls in your platform.

What should I do if the agent scores in the critical risk tier?

Do not deploy until you have completed all four of the critical tier requirements: legal review, CISO sign-off, SOC2 evidence, and a data privacy impact assessment. Additionally, ensure that full approval gates are configured on every non-read action. If any of those requirements cannot be met within your deployment timeline, delay the deployment — critical risk agents without those controls in place are a liability.


Related resources

  • Agent Evaluation Rubric — performance-focused evaluation to run alongside this risk assessment
  • Tool Permission Matrix — define and review the tool access that drives several scoring dimensions here
  • Approval Policy Template — configure the approval gates required by medium, high, and critical tier deployments
  • System Prompt Template — structure the system prompt that supports go/no-go items 1 and 2
  • Agents — ProvenanceOne agent configuration including trust levels and audit events
  • Audit — audit event types referenced in this checklist