Workflow Runs
A run is a single execution of a published workflow. When a trigger fires — a webhook arrives, a schedule ticks, a user clicks Run — ProvenanceOne creates a run record, snapshots the workflow version that is active at that moment, and begins executing the steps in the DAG. The run is the live instance; the workflow is the blueprint it executes against.
Runs are the primary debugging surface in ProvenanceOne. Every run exposes per-step inputs, outputs, logs, tool calls, and error detail so you can understand exactly what happened, where it went wrong, and what to fix.
When to use this page
Refer to this page when you need to:
- Understand the status of a run and what it means
- Debug a failed or stuck run
- Cancel a run that is no longer needed
- Replay a run after fixing an underlying issue
- Understand token and cost accounting
Key concepts
Run vs. workflow — the workflow defines what should happen; the run records what did happen. A workflow can have many runs. Each run is immutable once it ends.
Run ID — every run has a unique identifier prefixed with run_ (e.g. run_01abc...). Use this ID when calling the API or when searching logs.
Workflow version — each run captures the workflowVersion active when the run started. If you publish a new version while a run is in progress, that run continues on the old version.
Step ID — step identifiers are zero-padded and include the step kind: 0001_st_trigger, 0002_st_agent, etc. The zero-padding ensures steps sort in execution order.
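Because the numeric prefix is zero-padded, a plain string sort of step IDs produces the same order as sorting by the number itself. The Python sketch below assumes IDs follow the 0001_st_kind pattern shown above. parse_step_id is a hypothetical helper, not a ProvenanceOne API.

```python
import re

# Hypothetical pattern for IDs such as "0001_st_trigger": digits, "_st_", then the step kind.
STEP_ID_PATTERN = re.compile(r"^(\d+)_st_([a-z_]+)$")


def parse_step_id(step_id: str) -> tuple[int, str]:
    """Split a step ID into its numeric position and its step kind."""
    match = STEP_ID_PATTERN.match(step_id)
    if match is None:
        raise ValueError(f"unexpected step ID format: {step_id!r}")
    return int(match.group(1)), match.group(2)


step_ids = ["0010_st_approval", "0002_st_agent", "0001_st_trigger"]

# With padding, "0002" < "0010" as strings, so string order matches numeric order.
# Unpadded IDs would break this, because "2" > "10" as strings.
by_string = sorted(step_ids)
by_number = sorted(step_ids, key=lambda s: parse_step_id(s)[0])
print(by_string == by_number)  # True
```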
Run statuses
| Status | Meaning |
|---|---|
| running | The run is actively executing. At least one step has started and has not yet completed. |
| succeeded | All steps completed without error. The run is finished. |
| failed | One or more steps encountered an unrecoverable error, or a step's error was not handled by a downstream branch. |
| approval | A step of kind approval is waiting for a human decision. The run is paused. Execution resumes when the approval is actioned. |
| canceled | The run was stopped by a user in the UI or by a POST /runs/{runId}/cancel call before it completed. |
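When polling a run, it helps to separate statuses that can still change from those that are final. This is a minimal sketch that assumes the status strings in the table above. RunStatus, is_terminal, and needs_human_decision are hypothetical helper names, not part of a ProvenanceOne SDK.

```python
from enum import Enum


class RunStatus(str, Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    APPROVAL = "approval"
    CANCELED = "canceled"


# Statuses after which the run will not execute any further steps.
TERMINAL_STATUSES = {RunStatus.SUCCEEDED, RunStatus.FAILED, RunStatus.CANCELED}


def is_terminal(status: str) -> bool:
    """True once the run has finished; False while it is running or paused for approval."""
    return RunStatus(status) in TERMINAL_STATUSES  # raises ValueError on an unknown status


def needs_human_decision(status: str) -> bool:
    """True when an approval step is waiting for someone to action it."""
    return RunStatus(status) is RunStatus.APPROVAL
```

A poller would keep checking while is_terminal returns False, and notify someone when needs_human_decision returns True.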
How it works
Run creation
A run is created by:
- A trigger event matching a deployed workflow
- A POST /runs API call referencing the workflow ID
- The Run button in the UI (for manual trigger workflows)
On creation, ProvenanceOne records workflowId, workflowVersion, trigger, actor, environment, startedAt, and sets status to running.
Step execution
Each step in the DAG executes in the order defined by the workflow edges. Parallel branches execute concurrently. For each step, ProvenanceOne records:
- stepId: zero-padded identifier
- name and kind
- status: pending, running, succeeded, failed, approval, or skipped
- startedAt, endedAt, durationMs
- inputs: the data passed into the step
- outputs: the data the step produced
- logs: structured log messages emitted during execution
- toolCalls: for agent steps, the list of tool calls the agent made
- error: if the step failed, the structured error record
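If you post-process run data in your own tooling, modelling the step record explicitly can help. This is a minimal Python sketch that assumes the field names above map directly to JSON keys. The Python types, StepRecord, and first_failed are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepRecord:
    # Field names follow the list above; the Python types are assumptions.
    stepId: str
    name: str
    kind: str
    status: str  # pending, running, succeeded, failed, approval, or skipped
    startedAt: Optional[str] = None
    endedAt: Optional[str] = None
    durationMs: Optional[int] = None
    inputs: dict[str, Any] = field(default_factory=dict)
    outputs: dict[str, Any] = field(default_factory=dict)
    logs: list[dict[str, Any]] = field(default_factory=list)
    toolCalls: list[dict[str, Any]] = field(default_factory=list)  # populated for agent steps
    error: Optional[dict[str, Any]] = None  # structured StepError when status is "failed"


def first_failed(steps: list[StepRecord]) -> Optional[StepRecord]:
    """Return the failed step with the lowest stepId, or None if nothing failed."""
    failed = [step for step in steps if step.status == "failed"]
    return min(failed, key=lambda step: step.stepId) if failed else None


steps = [
    StepRecord("0001_st_trigger", "Webhook received", "trigger", "succeeded"),
    StepRecord("0003_st_skill", "Create ticket", "skill", "failed", error={"code": "ERR_TIMEOUT"}),
    StepRecord("0002_st_agent", "Classify request", "agent", "succeeded"),
]
print(first_failed(steps))
```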
Tool calls
Agent steps emit a toolCalls array. Each tool call records:
| Field | Description |
|---|---|
| name | The tool name the agent invoked |
| args | The arguments the agent passed to the tool |
| result | The result returned to the agent |
| durationMs | How long the tool call took |
| ok | Whether the call succeeded |
| error | Error detail if ok is false |
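When an agent step produces a long toolCalls array, a short summary makes failures easier to spot. This sketch assumes each entry carries the fields in the table above. The tool names and the plain-string error value in the sample are hypothetical.

```python
from typing import Any


def summarize_tool_calls(tool_calls: list[dict[str, Any]]) -> dict[str, Any]:
    """Count an agent step's tool calls, total their duration, and collect the failures."""
    return {
        "calls": len(tool_calls),
        "totalDurationMs": sum(call["durationMs"] for call in tool_calls),
        "failures": [
            {"name": call["name"], "error": call.get("error")}
            for call in tool_calls
            if not call["ok"]
        ],
    }


# Hypothetical toolCalls array; tool names and the error shape are illustrative only.
tool_calls = [
    {"name": "search_crm", "args": {"query": "acme"}, "result": ["acct_1"], "durationMs": 420, "ok": True},
    {"name": "create_ticket", "args": {"title": "Renewal"}, "result": None, "durationMs": 3000,
     "ok": False, "error": "upstream timeout"},
]

print(summarize_tool_calls(tool_calls))
```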
Step errors
When a step fails, the error field is populated with a structured StepError:
| Field | Description |
|---|---|
| code | Machine-readable error code |
| message | Human-readable description of the error |
| probableCause | The platform's best assessment of what caused the error |
| suggestedFix | A suggested remediation action |
| retryable | Whether retrying the step (or replaying the run) is likely to succeed |
If retryable is true, the error is likely transient (network timeout, rate limit, temporary service unavailability). If false, the error requires a configuration or logic change before retrying will help.
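The retryable rule above can be turned into a one-line next step. This is a minimal sketch that assumes StepError arrives as a JSON object with the fields in the table. The TypedDict, triage, and the message and probableCause strings in the example are illustrative, not a ProvenanceOne API.

```python
from typing import TypedDict


class StepError(TypedDict):
    # Mirrors the StepError fields above; the Python types are assumptions.
    code: str
    message: str
    probableCause: str
    suggestedFix: str
    retryable: bool


def triage(error: StepError) -> str:
    """Turn a StepError into a one-line next step based on the retryable flag."""
    if error["retryable"]:
        action = "likely transient; wait, then replay"
    else:
        action = "change the workflow, configuration, or input before re-running"
    return f"[{error['code']}] {action}. Hint: {error['suggestedFix']}"


example: StepError = {
    "code": "ERR_TIMEOUT",
    "message": "Skill call timed out",
    "probableCause": "Upstream service did not respond in time",
    "suggestedFix": "Retry in a few minutes.",
    "retryable": True,
}
print(triage(example))
```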
Step logs
Each step emits structured log messages accessible in the run debugger:
| Field | Description |
|---|---|
| level | One of info, warn, error, or debug |
| at | ISO timestamp of the log message |
| message | The log message text |
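When a step emits many log lines, filtering by severity and sorting by time narrows down where the failure happened. This sketch assumes the level names above and ISO 8601 strings in at. The severity ranking and the helper names are assumptions.

```python
from datetime import datetime
from typing import Any

# Lower number = less severe. Level names come from the table above; the ranking is assumed.
SEVERITY = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def parse_at(timestamp: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def logs_at_or_above(logs: list[dict[str, Any]], min_level: str = "warn") -> list[dict[str, Any]]:
    """Keep entries at min_level or more severe, ordered oldest first."""
    threshold = SEVERITY[min_level]
    kept = [entry for entry in logs if SEVERITY[entry["level"]] >= threshold]
    return sorted(kept, key=lambda entry: parse_at(entry["at"]))


logs = [
    {"level": "info", "at": "2024-05-01T10:00:00Z", "message": "calling skill"},
    {"level": "error", "at": "2024-05-01T10:00:30Z", "message": "skill timed out"},
    {"level": "warn", "at": "2024-05-01T10:00:10Z", "message": "slow response"},
]
for entry in logs_at_or_above(logs):
    print(entry["at"], entry["level"], entry["message"])
```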
Run fields reference
| Field | Type | Description |
|---|---|---|
| runId | string | Unique run identifier (run_*) |
| workflowId | string | ID of the workflow that generated this run |
| workflowName | string | Display name of the workflow at run time |
| workflowVersion | string | Version active when the run started (e.g. v3) |
| status | enum | One of running, succeeded, failed, approval, or canceled |
| trigger | enum | How the run was initiated |
| actor | string | User or system that triggered the run |
| environment | enum | One of production, staging, or development |
| stepsTotal | integer | Total number of steps in the workflow version |
| stepsComplete | integer | Number of steps that have finished (any terminal status) |
| tokensUsed | integer | Total LLM tokens consumed across all agent steps in this run |
| costUsd | number | Estimated cost in USD for all LLM calls in this run |
| startedAt | timestamp | When the run started |
| endedAt | timestamp | When the run ended (null if still running) |
| durationMs | integer | Total elapsed time in milliseconds |
Canceling and replaying runs
Cancel
Send POST /runs/{runId}/cancel or click Cancel in the run detail view. The run transitions to canceled. Steps currently executing may take a short time to terminate. Steps that have not yet started are marked skipped.
Warning: Canceling a run does not undo actions already taken by completed steps. If a step has already created a record in an external system, that action is not reversed.
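The cancel call can be scripted with the Python standard library. This is a sketch only: the base URL, the token, and bearer-token authentication are hypothetical placeholders, because this page documents only the path POST /runs/{runId}/cancel.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical values: use your workspace's real API base URL and credentials.
API_BASE = "https://api.example.com/v1"
API_TOKEN = "YOUR_TOKEN"


def cancel_run_request(run_id: str) -> urllib.request.Request:
    """Build (without sending) a POST /runs/{runId}/cancel request."""
    return urllib.request.Request(
        f"{API_BASE}/runs/{quote(run_id, safe='')}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {API_TOKEN}"},  # bearer auth is an assumption
    )


req = cancel_run_request("run_123")
print(req.get_method(), req.full_url)
# Once API_BASE and API_TOKEN are real: urllib.request.urlopen(req)
```

Sending this request only stops further execution. Side effects from steps that already completed remain in place.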
Replay
Send POST /runs/{runId}/replay or click Replay in the run detail view. Replay creates a new run using the same input payload and the currently deployed workflow version. It does not re-use the run ID of the original run.
Note: If you replay a failed run after publishing a new workflow version, the replay will use the new version, not the version the original run used. Review the diff between versions before replaying.
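Replay runs against the currently deployed version, so comparing versions first catches the situation this note describes. The sketch assumes you already have the original run record and know the deployed version string, for example from the workflow's detail view. replay_version_warning is a hypothetical helper.

```python
from typing import Any, Optional


def replay_version_warning(original_run: dict[str, Any], deployed_version: str) -> Optional[str]:
    """Return a warning if replaying would use a different workflow version than the original run."""
    original_version = original_run["workflowVersion"]
    if original_version == deployed_version:
        return None
    return (
        f"{original_run['runId']} ran on {original_version}, but a replay will use "
        f"{deployed_version}. Review the diff between versions before replaying."
    )


original = {"runId": "run_123", "workflowVersion": "v3"}
print(replay_version_warning(original, "v4"))
```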
Configuration options
Run behavior is primarily determined by the workflow definition. The following fields are set at run creation and are immutable:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| workflowId | string | Yes | none | The workflow to execute |
| environment | enum | Yes | none | Environment context for the run |
| Trigger payload | object | Varies | {} | Input data for the run, passed to the trigger step |
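A POST /runs call needs the fields in this table. This is a sketch under stated assumptions: the base URL, bearer-token auth, the wf_123 workflow ID, and the JSON key payload for the trigger payload are all hypothetical, because this page does not specify the request body schema.

```python
import json
import urllib.request
from typing import Any, Optional

# Hypothetical values: use your workspace's real API base URL and credentials.
API_BASE = "https://api.example.com/v1"
API_TOKEN = "YOUR_TOKEN"
ENVIRONMENTS = {"production", "staging", "development"}


def create_run_request(
    workflow_id: str, environment: str, payload: Optional[dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (without sending) a POST /runs request."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"environment must be one of {sorted(ENVIRONMENTS)}")
    body = {
        "workflowId": workflow_id,
        "environment": environment,
        "payload": payload or {},  # key name is an assumption; defaults to {} per the table
    }
    return urllib.request.Request(
        f"{API_BASE}/runs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {API_TOKEN}", "Content-Type": "application/json"},
    )


req = create_run_request("wf_123", "staging", {"ticketId": 42})
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Validating environment on the client side surfaces a typo before the request reaches the API.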
Examples
Inspecting what an agent did
Open a run in the debugger. Navigate to the agent step. The toolCalls panel lists every tool the agent invoked during that step, including the arguments it passed and the result it received. Cross-reference with the outputs panel to see what the agent ultimately returned to the next step.
Diagnosing a retryable error
A run failed on a skill step with error code ERR_TIMEOUT. The StepError shows retryable: true and suggestedFix: "The upstream service may be temporarily unavailable. Retry in a few minutes." Wait for the service to recover, then click Replay.
Tracking cost across runs
The costUsd field on each run records the estimated cost of all LLM calls in that run. Use GET /runs with date filters to sum costs across a time window and attribute spend to specific workflows.
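After you have fetched run records, attributing spend is a simple group-and-sum. This sketch assumes the records are already retrieved from GET /runs; the date filter parameter names and any pagination are not documented here, so the window is also applied client-side. Workflow IDs and amounts are illustrative. Totals are estimates, not invoice amounts.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any


def parse_ts(timestamp: str) -> datetime:
    # Normalize a trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def cost_by_workflow(runs: list[dict[str, Any]], start: str, end: str) -> dict[str, float]:
    """Sum estimated costUsd per workflowId for runs whose startedAt is in [start, end)."""
    window_start, window_end = parse_ts(start), parse_ts(end)
    totals: dict[str, float] = defaultdict(float)
    for run in runs:
        if window_start <= parse_ts(run["startedAt"]) < window_end:
            totals[run["workflowId"]] += run["costUsd"]
    return dict(totals)


runs = [
    {"workflowId": "wf_a", "startedAt": "2024-05-01T10:00:00Z", "costUsd": 0.25},
    {"workflowId": "wf_a", "startedAt": "2024-05-15T08:30:00Z", "costUsd": 0.5},
    {"workflowId": "wf_b", "startedAt": "2024-05-20T12:00:00Z", "costUsd": 1.0},
    {"workflowId": "wf_a", "startedAt": "2024-06-02T09:00:00Z", "costUsd": 9.0},  # outside window
]
print(cost_by_workflow(runs, "2024-05-01T00:00:00Z", "2024-06-01T00:00:00Z"))
```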
Common mistakes
- Replaying a run without checking whether the underlying issue is fixed. If the error is not retryable and you have not changed the workflow or the external service, replay will fail again.
- Canceling a run that is paused in approval. Canceling is final. If you want the run to continue, action the approval instead.
- Assuming costUsd is the invoice amount. costUsd is an estimate based on token counts and list pricing at the time of the run. Actual billing depends on your plan and any negotiated rates.
- Not checking stepsComplete vs. stepsTotal for a stuck run. If a run has been in running status for longer than expected, the ratio of stepsComplete to stepsTotal shows how far it has progressed.
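The progress check described in the last item can be computed directly from a run record. This is a minimal sketch that assumes the field names from the run fields reference. describe_progress is a hypothetical helper.

```python
from typing import Any


def describe_progress(run: dict[str, Any]) -> str:
    """Summarize a run's progress from stepsComplete and stepsTotal."""
    total = run["stepsTotal"]
    done = run["stepsComplete"]
    percent = 100 * done / total if total else 0.0
    return f"{run['runId']}: {done}/{total} steps finished ({percent:.0f}%), status {run['status']}"


print(describe_progress({"runId": "run_123", "stepsTotal": 8, "stepsComplete": 2, "status": "running"}))
# run_123: 2/8 steps finished (25%), status running
```

If stepsComplete stays the same over several checks, the next place to look is the step that is still in running status.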
Troubleshooting
Run stuck in approval — an approval step is waiting for a human decision. Navigate to the Approvals page, find the pending approval linked to this run ID, and action it. If the SLA has already breached, the platform has emitted an approval.sla_breach event and notified the assignees.
Run stuck in running for an unusually long time — open the run debugger and find the step in running status. Check the step logs for timeout messages. If the step is an agent step, it may be waiting for a tool call to return. Check whether the downstream service (a skill or MCP server) is healthy.
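To locate the step that is holding up a long-running run, you can list the steps still in running status along with how long each has been running. This sketch assumes step records shaped like the step execution fields above, with ISO 8601 startedAt values. running_steps is a hypothetical helper, and the sample data is illustrative.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_ts(timestamp: str) -> datetime:
    # Normalize a trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def running_steps(
    steps: list[dict[str, Any]], now: Optional[datetime] = None
) -> list[tuple[str, str, float]]:
    """Return (stepId, kind, seconds running) for steps still in running, longest first."""
    now = now or datetime.now(timezone.utc)
    found = [
        (step["stepId"], step["kind"], (now - parse_ts(step["startedAt"])).total_seconds())
        for step in steps
        if step["status"] == "running"
    ]
    return sorted(found, key=lambda item: item[2], reverse=True)


steps = [
    {"stepId": "0001_st_trigger", "kind": "trigger", "status": "succeeded", "startedAt": "2024-05-01T09:59:00Z"},
    {"stepId": "0002_st_agent", "kind": "agent", "status": "running", "startedAt": "2024-05-01T10:00:00Z"},
    {"stepId": "0003_st_skill", "kind": "skill", "status": "running", "startedAt": "2024-05-01T10:05:00Z"},
]
now = datetime(2024, 5, 1, 10, 10, tzinfo=timezone.utc)
for step_id, kind, seconds in running_steps(steps, now=now):
    hint = " (check its tool calls and the skill or MCP server they depend on)" if kind == "agent" else ""
    print(f"{step_id} has been running for {seconds:.0f}s{hint}")
```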
Run failed with retryable: true — the error is likely transient. Check the suggestedFix field for guidance, wait for the external service to recover if applicable, and replay the run.
Run failed with retryable: false — a configuration, logic, or data error requires attention. Read the probableCause and suggestedFix fields, correct the workflow or input, and start a new run.
Security and permissions
- editor and admin can cancel, replay, and view all run details, including inputs, outputs, and logs.
- viewer has read-only access to run status, high-level metrics, and step-level detail.
- Run data, including inputs, outputs, and logs, may contain sensitive information. Ensure your workspace members have appropriate roles.
- Audit events run.started, run.completed, run.failed, run.canceled, and run.deleted are emitted for each lifecycle transition.
FAQ
Can I replay a failed run?
Yes. Use POST /runs/{runId}/replay or the Replay button in the run detail view. Replay creates a new run with the same input payload against the currently deployed workflow version. If you have published a new version since the original run, the replay uses the new version.
What does costUsd measure?
costUsd is the estimated cost in US dollars for all LLM API calls made by agent steps in the run. It is calculated from token counts and list pricing at the time of execution. It is an estimate — actual billing depends on your plan.
Why is my run stuck in approval?
An approval step in the workflow is waiting for a human decision. Go to the Approvals page, find the pending approval associated with your run ID, and either approve or reject it. If the SLA timer has expired, the platform has already emitted an approval.sla_breach event and notified the assignees.
How do I see what an agent did during a run?
Open the run in the debugger and navigate to the agent step. The toolCalls panel shows every tool the agent invoked, the arguments it passed, and the result it received. The outputs panel shows what the agent returned to the workflow.
What is the difference between cancel and delete?
POST /runs/{runId}/cancel stops a run that is in progress and marks it canceled. DELETE /runs/{runId} removes the run record entirely. Canceling cannot be undone (the run does not resume), but the run record is retained for inspection. Deletion permanently removes the record.