Workflow Runs

A run is a single execution of a published workflow. When a trigger fires — a webhook arrives, a schedule ticks, a user clicks Run — ProvenanceOne creates a run record, snapshots the workflow version that is active at that moment, and begins executing the steps in the DAG. The run is the live instance; the workflow is the blueprint it executes against.

Runs are the primary debugging surface in ProvenanceOne. Every run exposes per-step inputs, outputs, logs, tool calls, and error detail so you can understand exactly what happened, where it went wrong, and what to fix.

When to use this page

Refer to this page when you need to:

Understand the status of a run and what it means
Debug a failed or stuck run
Cancel a run that is no longer needed
Replay a run after fixing an underlying issue
Understand token and cost accounting

Key concepts

Run vs. workflow — the workflow defines what should happen; the run records what did happen. A workflow can have many runs. Each run is immutable once it ends.

Run ID — every run has a unique identifier prefixed with run_ (e.g. run_01abc...). Use this ID when calling the API or when searching logs.

Workflow version — each run captures the workflowVersion active when the run started. If you publish a new version while a run is in progress, that run continues on the old version.

Step ID — step identifiers are zero-padded and include the step kind: 0001_st_trigger, 0002_st_agent, etc. The zero-padding ensures steps sort in execution order.
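
To illustrate why the padding matters, here is a minimal sketch that builds step IDs and sorts them as plain strings. The helper `format_step_id`, the width of 4, and the `st_` prefix are assumptions inferred from the two examples above, not a published format.

```python
def format_step_id(index: int, kind: str, width: int = 4) -> str:
    """Build an ID such as 0002_st_agent (format inferred from the examples)."""
    return f"{index:0{width}d}_st_{kind}"


# Hypothetical steps, deliberately listed out of order.
ids = [format_step_id(i, k) for i, k in [(10, "skill"), (2, "agent"), (1, "trigger")]]

# A plain string sort matches numeric order because of the padding.
ordered = sorted(ids)

# Without padding, "10" sorts before "1_" and "2", which breaks the order.
unpadded = sorted(["10_st_skill", "2_st_agent", "1_st_trigger"])
```

With padding, a lexicographic sort in any log viewer or script keeps steps in sequence without parsing the number.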

Run statuses

Status	Meaning
`running`	The run is actively executing. At least one step has started and has not yet completed.
`succeeded`	All steps completed without error. The run is finished.
`failed`	One or more steps encountered an unrecoverable error, or a step's error was not handled by a downstream branch.
`approval`	A step of kind `approval` is waiting for a human decision. The run is paused. Execution resumes when the approval is actioned.
`canceled`	The run was stopped before it completed, either by a user in the UI or by a `POST /runs/{runId}/cancel` call.
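
A client that polls runs usually needs to know whether a status can still change. The sketch below groups the statuses from the table; treating `succeeded`, `failed`, and `canceled` as terminal is a reading of the table above, and the helper names are hypothetical.

```python
# Grouping is an interpretation of the status table, not an official enum.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}
WAITING_ON_HUMAN = {"approval"}


def is_finished(status: str) -> bool:
    """True when the run will not change state again."""
    return status in TERMINAL_STATUSES


def needs_human_action(status: str) -> bool:
    """True when the run is paused on an approval step."""
    return status in WAITING_ON_HUMAN
```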

How it works

Run creation

A run is created by:

A trigger event matching a deployed workflow
A POST /runs API call referencing the workflow ID
The Run button in the UI (for manual trigger workflows)

On creation, ProvenanceOne records workflowId, workflowVersion, trigger, actor, environment, startedAt, and sets status to running.

Step execution

Each step in the DAG executes in the order defined by the workflow edges. Parallel branches execute concurrently. For each step, ProvenanceOne records:

stepId — zero-padded identifier
name and kind
status — pending, running, succeeded, failed, approval, or skipped
startedAt, endedAt, durationMs
inputs — the data passed into the step
outputs — the data the step produced
logs — structured log messages emitted during execution
toolCalls — for agent steps, the list of tool calls the agent made
error — if the step failed, the structured error record
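
As a rough sketch of how these step records can be used when debugging, the function below returns the earliest failed step from a list of step records. It assumes each record is a dict keyed by the field names above; the sample data is invented for illustration.

```python
from typing import Optional


def first_failed_step(steps: list) -> Optional[dict]:
    """Return the failed step with the lowest stepId, or None if none failed."""
    failed = [s for s in steps if s.get("status") == "failed"]
    # Zero-padded IDs sort correctly as strings.
    return min(failed, key=lambda s: s["stepId"]) if failed else None


# Invented sample records using the documented field names.
sample_steps = [
    {"stepId": "0001_st_trigger", "status": "succeeded"},
    {"stepId": "0003_st_skill", "status": "failed", "error": {"code": "ERR_TIMEOUT"}},
    {"stepId": "0002_st_agent", "status": "succeeded"},
    {"stepId": "0004_st_agent", "status": "skipped"},
]

culprit = first_failed_step(sample_steps)
```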

Tool calls

Agent steps emit a toolCalls array. Each tool call records:

Field	Description
`name`	The tool name the agent invoked
`args`	The arguments the agent passed to the tool
`result`	The result returned to the agent
`durationMs`	How long the tool call took
`ok`	Whether the call succeeded
`error`	Error detail if `ok` is false
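
The following sketch summarizes a `toolCalls` array: how many calls were made, which ones failed, and how much time they took in total. The tool names and the assumption that `error` is a plain string are illustrative only.

```python
def summarize_tool_calls(tool_calls: list) -> dict:
    """Aggregate a toolCalls array into a small summary dict."""
    failed = [c for c in tool_calls if not c.get("ok", False)]
    return {
        "count": len(tool_calls),
        "failed": [(c["name"], c.get("error")) for c in failed],
        "totalDurationMs": sum(c.get("durationMs", 0) for c in tool_calls),
    }


# Invented sample data; tool names are hypothetical.
calls = [
    {"name": "search_tickets", "args": {"q": "refund"}, "result": [], "durationMs": 120, "ok": True},
    {"name": "send_email", "args": {"to": "a@example.com"}, "durationMs": 900, "ok": False,
     "error": "upstream timeout"},
]

summary = summarize_tool_calls(calls)
```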

Step errors

When a step fails, the error field is populated with a structured StepError:

Field	Description
`code`	Machine-readable error code
`message`	Human-readable description of the error
`probableCause`	The platform's best assessment of what caused the error
`suggestedFix`	A suggested remediation action
`retryable`	Whether retrying the step (or replaying the run) is likely to succeed

If retryable is true, the error is likely transient (network timeout, rate limit, temporary service unavailability). If false, the error requires a configuration or logic change before retrying will help.
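
The `retryable` rule can be expressed as a small decision helper. This is a sketch under the assumption that a StepError arrives as a dict with the fields listed above; the return labels are made up for the example.

```python
def next_action(step_error: dict) -> str:
    """Suggest a next step based on the retryable flag."""
    if step_error.get("retryable"):
        # Likely transient: wait for recovery, then replay.
        return "replay"
    # Needs a configuration or logic change before another attempt.
    return "fix-first"


timeout_error = {
    "code": "ERR_TIMEOUT",
    "message": "Upstream call timed out",
    "retryable": True,
}
```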

Step logs

Each step emits structured log messages accessible in the run debugger:

Field	Description
`level`	`info` | `warn` | `error` | `debug`
`at`	ISO timestamp of the log message
`message`	The log message text
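
When a step emits many log lines, filtering by severity is a quick way to find the relevant ones. The sketch below assumes the conventional ordering debug < info < warn < error, which the page does not state, and sorts by the `at` string, which only works if all timestamps share one ISO format and timezone.

```python
# Severity ranking is an assumption; the docs only list the level names.
LEVEL_ORDER = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def logs_at_or_above(logs: list, min_level: str = "warn") -> list:
    """Return log entries at or above min_level, oldest first."""
    threshold = LEVEL_ORDER[min_level]
    picked = [e for e in logs if LEVEL_ORDER.get(e["level"], 0) >= threshold]
    return sorted(picked, key=lambda e: e["at"])


# Invented sample entries.
entries = [
    {"level": "error", "at": "2024-01-01T12:00:05Z", "message": "tool call timed out"},
    {"level": "info", "at": "2024-01-01T12:00:00Z", "message": "step started"},
    {"level": "warn", "at": "2024-01-01T12:00:03Z", "message": "slow response"},
]

important = logs_at_or_above(entries)
```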

Run fields reference

Field	Type	Description
`runId`	string	Unique run identifier (`run_*`)
`workflowId`	string	ID of the workflow that generated this run
`workflowName`	string	Display name of the workflow at run time
`workflowVersion`	string	Version active when the run started (e.g. `v3`)
`status`	enum	`running` | `succeeded` | `failed` | `approval` | `canceled`
`trigger`	enum	How the run was initiated
`actor`	string	User or system that triggered the run
`environment`	enum	`production` | `staging` | `development`
`stepsTotal`	integer	Total number of steps in the workflow version
`stepsComplete`	integer	Number of steps that have finished (any terminal status)
`tokensUsed`	integer	Total LLM tokens consumed across all agent steps in this run
`costUsd`	number	Estimated cost in USD for all LLM calls in this run
`startedAt`	timestamp	When the run started
`endedAt`	timestamp	When the run ended (null if still running)
`durationMs`	integer	Total elapsed time in milliseconds

Canceling and replaying runs

Cancel

Send POST /runs/{runId}/cancel or click Cancel in the run detail view. The run transitions to canceled. Steps currently executing may take a short time to terminate. Steps that have not yet started are marked skipped.

Warning: Canceling a run does not undo actions already taken by completed steps. If a step has already created a record in an external system, that action is not reversed.
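
A cancel call could be prepared as shown in this sketch, which builds the request without sending it. The base URL and the bearer-token header are placeholders and assumptions; only the `POST /runs/{runId}/cancel` path comes from this page.

```python
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder, not a real endpoint


def build_cancel_request(run_id: str, token: str) -> urllib.request.Request:
    """Prepare POST /runs/{runId}/cancel. Auth scheme is an assumption."""
    return urllib.request.Request(
        f"{BASE_URL}/runs/{run_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_cancel_request("run_example", "TOKEN")
# To send it: urllib.request.urlopen(req)
```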

Replay

Send POST /runs/{runId}/replay or click Replay in the run detail view. Replay creates a new run using the same input payload and the currently deployed workflow version. It does not re-use the run ID of the original run.

Note: If you replay a failed run after publishing a new workflow version, the replay will use the new version, not the version the original run used. Review the diff between versions before replaying.
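
Because replay always targets the currently deployed version, a pre-flight check can flag a version change before you replay. In this sketch, `deployed_version` is supplied by the caller; how you look it up is outside what this page documents, and the function name is hypothetical.

```python
from typing import Optional


def replay_version_warning(original_run: dict, deployed_version: str) -> Optional[str]:
    """Return a warning if replay would run a different workflow version."""
    used = original_run.get("workflowVersion")
    if used == deployed_version:
        return None
    return (
        f"Run {original_run.get('runId')} used {used}, but replay will "
        f"execute {deployed_version}. Review the diff before replaying."
    )


warning = replay_version_warning(
    {"runId": "run_example", "workflowVersion": "v3"}, deployed_version="v4"
)
```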

Configuration options

Run behavior is primarily determined by the workflow definition. The following fields are set at run creation and are immutable:

Field	Type	Required	Default	Description
`workflowId`	string	Yes	—	The workflow to execute
`environment`	enum	Yes	—	Environment context for the run
Trigger payload	object	Varies	`{}`	Input data for the run, passed to the trigger step
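
A `POST /runs` request carrying these fields might be assembled as in the sketch below. The JSON key `payload`, the base URL, and the content-type header are assumptions for illustration; the page names the fields but does not specify the request body shape.

```python
import json
import urllib.request

ENVIRONMENTS = {"production", "staging", "development"}


def build_create_run_request(workflow_id, environment, payload=None,
                             base_url="https://api.example.invalid"):
    """Prepare POST /runs. Body shape and base URL are hypothetical."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    body = {
        "workflowId": workflow_id,
        "environment": environment,
        "payload": payload or {},  # trigger payload defaults to {}
    }
    return urllib.request.Request(
        f"{base_url}/runs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


create_req = build_create_run_request("wf_example", "staging")
```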

Examples

Inspecting what an agent did

Open a run in the debugger. Navigate to the agent step. The toolCalls panel lists every tool the agent invoked during that step, including the arguments it passed and the result it received. Cross-reference with the outputs panel to see what the agent ultimately returned to the next step.

Diagnosing a retryable error

A run failed on a skill step with error code ERR_TIMEOUT. The StepError shows retryable: true and suggestedFix: "The upstream service may be temporarily unavailable. Retry in a few minutes." Wait for the service to recover, then click Replay.

Tracking cost across runs

The costUsd field on each run records the estimated cost of all LLM calls in that run. Use GET /runs with date filters to sum costs across a time window and attribute spend to specific workflows.
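
Once the runs for a time window have been fetched, attributing spend is a simple aggregation. This sketch assumes the runs are already loaded as a list of dicts with `workflowId` and `costUsd`; the fetch itself and the sample values are not taken from the API.

```python
from collections import defaultdict


def cost_by_workflow(runs: list) -> dict:
    """Sum estimated costUsd per workflowId, treating missing costs as 0."""
    totals = defaultdict(float)
    for run in runs:
        totals[run["workflowId"]] += run.get("costUsd") or 0.0
    return dict(totals)


# Invented sample runs.
runs = [
    {"runId": "run_a", "workflowId": "wf_support", "costUsd": 0.25},
    {"runId": "run_b", "workflowId": "wf_support", "costUsd": 0.5},
    {"runId": "run_c", "workflowId": "wf_billing", "costUsd": 1.0},
    {"runId": "run_d", "workflowId": "wf_billing", "costUsd": None},
]

totals = cost_by_workflow(runs)
```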

Common mistakes

Replaying a run without checking if the underlying issue is fixed. If the error is not retryable and you have not changed the workflow or the external service, replay will fail again.
Canceling a run that is paused in approval. Canceling is final. If you want the run to continue, action the approval instead.
Assuming costUsd is the invoice amount. costUsd is an estimate based on token counts and list pricing at the time of the run. Actual billing depends on your plan and any negotiated rates.
Not checking stepsComplete vs. stepsTotal for a stuck run. If a run has been in running status for longer than expected, the ratio of stepsComplete to stepsTotal shows you how far it has progressed.
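
The progress check from the last item can be computed directly from the run record. A minimal sketch, assuming the run is available as a dict with the documented `stepsComplete` and `stepsTotal` fields:

```python
def progress(run: dict) -> float:
    """Fraction of steps finished, from 0.0 to 1.0 (0.0 if total is unknown)."""
    total = run.get("stepsTotal") or 0
    done = run.get("stepsComplete") or 0
    return done / total if total else 0.0


ratio = progress({"status": "running", "stepsComplete": 3, "stepsTotal": 12})
```

A ratio that stays flat across several checks points at the step currently in running status.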

Troubleshooting

Run stuck in approval — an approval step is waiting for a human decision. Navigate to the Approvals page, find the pending approval linked to this run ID, and action it. If the SLA has already breached, the platform has emitted an approval.sla_breach event and notified the assignees.

Run stuck in running for an unusually long time — open the run debugger and find the step in running status. Check the step logs for timeout messages. If the step is an agent step, it may be waiting for a tool call to return. Check whether the downstream service (a skill or MCP server) is healthy.
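
To find the step that is holding a run up, you can compare each running step's `startedAt` against the current time. The sketch below assumes ISO 8601 timestamps with a `Z` or explicit offset; the 60-second threshold is arbitrary and the sample data is invented.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO timestamp, accepting a trailing Z for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def long_running_steps(steps: list, now: datetime, threshold_ms: int = 60_000) -> list:
    """Return (stepId, elapsedMs) for running steps older than the threshold."""
    stuck = []
    for step in steps:
        if step.get("status") != "running" or not step.get("startedAt"):
            continue
        elapsed_ms = (now - parse_ts(step["startedAt"])).total_seconds() * 1000
        if elapsed_ms > threshold_ms:
            stuck.append((step["stepId"], int(elapsed_ms)))
    return stuck


now = datetime(2024, 1, 1, 12, 10, 0, tzinfo=timezone.utc)
steps = [
    {"stepId": "0002_st_agent", "status": "running", "startedAt": "2024-01-01T12:00:00Z"},
    {"stepId": "0003_st_skill", "status": "running", "startedAt": "2024-01-01T12:09:30Z"},
    {"stepId": "0001_st_trigger", "status": "succeeded", "startedAt": "2024-01-01T11:59:00Z"},
]

stuck = long_running_steps(steps, now)
```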

Run failed with retryable: true — the error is likely transient. Check the suggestedFix field for guidance, wait for the external service to recover if applicable, and replay the run.

Run failed with retryable: false — a configuration, logic, or data error requires attention. Read the probableCause and suggestedFix fields, correct the workflow or input, and start a new run.

Security and permissions

editor and admin can cancel, replay, and view all run details including inputs, outputs, and logs.
viewer can view run status, high-level metrics, and step-level detail, all read-only.
Run data including inputs, outputs, and logs may contain sensitive information. Ensure your workspace members have appropriate roles.
Audit events run.started, run.completed, run.failed, run.canceled, and run.deleted are emitted for each lifecycle transition.

FAQ

Can I replay a failed run?

Yes. Use POST /runs/{runId}/replay or the Replay button in the run detail view. Replay creates a new run with the same input payload against the currently deployed workflow version. If you have published a new version since the original run, the replay uses the new version.

What does costUsd measure?

costUsd is the estimated cost in US dollars for all LLM API calls made by agent steps in the run. It is calculated from token counts and list pricing at the time of execution. It is an estimate — actual billing depends on your plan.

Why is my run stuck in approval?

An approval step in the workflow is waiting for a human decision. Go to the Approvals page, find the pending approval associated with your run ID, and either approve or reject it. If the SLA timer has expired, the platform has already emitted an approval.sla_breach event and notified the assignees.

How do I see what an agent did during a run?

Open the run in the debugger and navigate to the agent step. The toolCalls panel shows every tool the agent invoked, the arguments it passed, and the result it received. The outputs panel shows what the agent returned to the workflow.

What is the difference between cancel and delete?

POST /runs/{runId}/cancel stops a run that is in progress and marks it canceled; the run record is retained. DELETE /runs/{runId} removes the run record entirely and is permanent. Neither operation can be undone: a canceled run cannot be resumed, and a deleted run cannot be recovered.