Forced dry-run / simulate-first for Agents | Policy-as-Code Control

Forced dry-run (simulate-first) means an agent must produce a preview of its intended tool actions and outputs before it is allowed to execute any irreversible step. In Claw EA, for agents running on OpenClaw, you implement this by enforcing a Work Policy Contract (WPC) that initially permits only read-only and planning tools. An explicit policy change is then required before write, exec, or external side-effect tools are enabled.

This has to live in a permissioned execution layer (policy-as-code), not in prompts, because prompts are not a security boundary. Tool invocation, sandbox mode, and model egress must be machine-enforced so prompt injection or mis-specification cannot silently escalate to real changes.

Step-by-step runbook

Define two WPC states: DRY_RUN (read-only) and EXECUTE (write enabled). Keep DRY_RUN as the default for new jobs and new agents.

In DRY_RUN, deny tools that write files, run shell commands, apply patches, send messages, or call external change APIs. Allow only tools that read or inspect, and require a structured plan as the run's output.

Publish the WPC to the WPC registry (served by clawcontrols) and use its hash as the stable identifier. Distribute only the hash to runtime components so agents cannot swap policies by name.

Configure your OpenClaw baseline runtime with a restrictive tool profile and sandboxing so the local boundary matches DRY_RUN intent (for example, Docker sandbox for tools and a minimal tool allowlist).
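A hash only works as a stable identifier if every component computes it the same way. The sketch below assumes the WPC is hashed as SHA-256 over canonical JSON (sorted keys, compact separators, UTF-8); the canonicalization clawcontrols actually uses may differ, and `wpc_hash` and `resolve_policy` are hypothetical helper names. The runtime receives only the hash, fetches the policy body, and re-hashes it before use.

```python
import hashlib
import json


def wpc_hash(policy: dict) -> str:
    # Hypothetical helper. Assumed canonical form: sorted keys, compact
    # separators, UTF-8. The real registry may canonicalize differently.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resolve_policy(registry: dict, pinned_hash: str) -> dict:
    # Look the policy up by hash, never by name, and re-verify the body so a
    # swapped or edited registry entry is rejected instead of being enforced.
    policy = registry.get(pinned_hash)
    if policy is None:
        raise LookupError(f"unknown WPC hash: {pinned_hash}")
    if wpc_hash(policy) != pinned_hash:
        raise ValueError("registry entry does not match its hash")
    return policy
```

Because keys are sorted before hashing, two semantically identical policies serialized with different key order produce the same identifier, while any change to a tool list produces a new one.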
Issue a CST (scoped token) from clawscope for each job. Use the CST scope hash, and enable the optional policy hash pinning so the token is valid only for the intended WPC hash.

Bind the CST to the job to prevent replay across runs (marketplace anti-replay binding). Treat “job” as the unit of audit and the unit of rollback.
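The job binding and policy pin only help if something checks them before each tool call. This is a minimal sketch of that check: the `CST` fields, `check_cst`, and the in-memory revocation set are hypothetical stand-ins, not clawscope's token format or API. It treats policy hash pinning as mandatory even though CST makes it optional, and it omits token signature verification.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CST:
    # Hypothetical token shape for illustration; not clawscope's wire format.
    token_id: str
    job_id: str
    scope_hash: str
    wpc_hash: Optional[str]  # None when policy hash pinning is not enabled
    expires_at: float        # Unix timestamp


class TokenRejected(Exception):
    pass


def check_cst(token: CST, *, job_id: str, scope_hash: str, wpc_hash: str,
              revoked: set, now: Optional[float] = None) -> None:
    """Raise TokenRejected unless every binding matches; nothing is allowed by default."""
    now = time.time() if now is None else now
    if token.token_id in revoked:
        raise TokenRejected("token revoked")
    if now >= token.expires_at:
        raise TokenRejected("token expired")
    if token.job_id != job_id:
        raise TokenRejected("token is bound to another job (possible replay)")
    if token.scope_hash != scope_hash:
        raise TokenRejected("scope hash mismatch")
    # This sketch requires pinning; an unpinned token is rejected outright.
    if token.wpc_hash is None or token.wpc_hash != wpc_hash:
        raise TokenRejected("token is not pinned to the active WPC hash")
```

A dry-run token presented during an execution run fails the `wpc_hash` comparison, and a token copied into another job fails the `job_id` comparison.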
Route model traffic through clawproxy so you get Gateway receipts for each model call. If you use OpenRouter via fal, keep it routed through clawproxy so receipts remain consistent across providers.

Fail closed if receipts are missing for calls that could influence tool selection or output. This keeps “simulate-first” verifiable instead of a best-effort convention.
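Failing closed on missing receipts reduces to a set difference: every decision-relevant model call must have a Gateway receipt recorded for the same job. In this sketch, `ModelCall`, `Receipt`, and `require_receipts` are hypothetical names, and the `influences_tools` flag is an assumption; how your harness decides which calls can influence tool selection or output is up to you.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModelCall:
    call_id: str
    influences_tools: bool  # could this call affect tool selection or output?


@dataclass(frozen=True)
class Receipt:
    call_id: str
    job_id: str


class MissingReceipt(Exception):
    pass


def require_receipts(calls: Iterable[ModelCall], receipts: Iterable[Receipt], job_id: str) -> None:
    """Block the run if any decision-relevant call lacks a receipt for this job."""
    # Receipts recorded under a different job do not count as coverage.
    covered = {r.call_id for r in receipts if r.job_id == job_id}
    missing = [c.call_id for c in calls if c.influences_tools and c.call_id not in covered]
    if missing:
        raise MissingReceipt("no Gateway receipt for: " + ", ".join(missing))
```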
Run the agent in DRY_RUN and require a structured preview output: intended tool calls, target paths, diffs, and any external identifiers (ticket id, repo, tenant). The preview must be specific enough that a reviewer can compare “preview” to “execution.”

Store the preview as an artifact attached to the job so it can be checked later against evidence.
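A preview is only reviewable if it is complete and cannot be edited after the fact. The sketch below checks the four fields named in the `must_include` list of the WPC example on this page and computes a digest to record alongside the job. The `tool_plan` entry shape (`{"tool": ..., "target": ...}`) and both function names are assumptions for illustration.

```python
import hashlib
import json

# Field names from the "must_include" list in the example WPC on this page.
REQUIRED_FIELDS = ("tool_plan", "targets", "expected_outputs", "risk_notes")


def validate_preview(preview: dict) -> list:
    """Return a list of problems; an empty list means the preview is reviewable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in preview]
    # Assumed shape: tool_plan is a list of {"tool": ..., "target": ...} entries.
    for i, step in enumerate(preview.get("tool_plan", [])):
        if not step.get("tool"):
            problems.append(f"tool_plan[{i}] has no tool")
        if not step.get("target"):
            problems.append(f"tool_plan[{i}] has no target")
    return problems


def preview_digest(preview: dict) -> str:
    """SHA-256 over canonical JSON, stored with the job so later edits are detectable."""
    canonical = json.dumps(preview, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```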
Promote to EXECUTE by swapping to a new WPC hash (or a controlled WPC revision) that enables only the specific write tools needed. Re-issue a CST pinned to the EXECUTE WPC hash, and start a new job run for the execution phase.

Keep the execution phase narrow: the shortest practical CST TTL, the fewest tools, and the sandbox still on. If something goes wrong, rollback means “disable the EXECUTE policy and revoke the CST,” not “ask the agent to stop.”
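The promotion step and its rollback can be sketched as two small functions: promotion refuses to proceed without a clean preview, always mints a fresh job and token pinned to the EXECUTE hash, and caps the TTL; rollback is plain revocation. `ExecutionGrant`, `promote_to_execute`, `rollback`, and the 15-minute ceiling are hypothetical choices for illustration, not Claw EA APIs or defaults.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Assumed ceiling for execute-phase tokens; choose one that fits your change window.
MAX_EXECUTE_TTL_S = 15 * 60


@dataclass(frozen=True)
class ExecutionGrant:
    dry_run_job_id: str  # links the execution run back to the reviewed preview
    job_id: str          # fresh job id: execution never reuses the dry-run job
    token_id: str
    wpc_hash: str        # EXECUTE policy hash the new token is pinned to
    expires_at: float


def promote_to_execute(dry_run_job_id: str, preview_problems: list, execute_wpc_hash: str,
                       ttl_s: int = MAX_EXECUTE_TTL_S, now: Optional[float] = None) -> ExecutionGrant:
    if preview_problems:
        raise PermissionError("dry-run preview is incomplete; refusing to promote")
    if not 0 < ttl_s <= MAX_EXECUTE_TTL_S:
        raise ValueError("execute TTL must be positive and within the ceiling")
    now = time.time() if now is None else now
    return ExecutionGrant(
        dry_run_job_id=dry_run_job_id,
        job_id="job-" + secrets.token_hex(8),
        token_id=secrets.token_hex(16),
        wpc_hash=execute_wpc_hash,
        expires_at=now + ttl_s,
    )


def rollback(grant: ExecutionGrant, revoked: set) -> None:
    """Rollback is revocation: the token stops working regardless of what the agent does."""
    revoked.add(grant.token_id)
```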

Threat model

Threat	What happens	Control (simulate-first)
Prompt injection via retrieved content	The agent is tricked into running a destructive command or making an external change while “explaining” it as harmless.	DRY_RUN WPC denies exec/write tools; OpenClaw tool policy and sandboxing enforce the boundary even if the model asks for escalation.
Unreviewed file edits	The agent edits configs, infra code, or policy files and causes an outage or opens access.	DRY_RUN requires a diff preview but blocks apply/patch tools; EXECUTE phase only enables write tools after explicit WPC promotion.
Silent external side effects	The agent posts messages, triggers deployments, or updates records without a clear change-request trail.	DRY_RUN denies outbound “send” tools; execution requires a WPC change plus a job-scoped CST, so actions are attributable to a specific approved run.
Policy bypass by prompt-only “rules”	The model follows a new instruction that conflicts with enterprise policy, and nothing stops tool calls.	WPC plus CST policy hash pinning turns “rules” into enforced permissions; prompts become guidance, not authorization.
Disputed audit trail	After an incident, teams cannot prove what the model saw or which model calls drove decisions.	Gateway receipts (from clawproxy) plus a proof bundle tie model calls to the job, token scope, and WPC hash for verification.

OpenClaw already treats tool policy and sandboxing as the local safety boundary. Forced dry-run composes with that boundary by making “write and exec” an explicit permission transition instead of a conversational suggestion.

Policy-as-code example

This is a JSON-like sketch of a WPC that enforces dry-run first. The key idea is that the DRY_RUN policy denies side-effect tools and requires a preview artifact, while the EXECUTE policy is a separate hash with narrowly enabled tools.

{
  "wpc_version": "v1",
  "policy_name": "forced-dry-run-simulate-first",
  "modes": {
    "DRY_RUN": {
      "openclaw": {
        "sandbox": { "mode": "all", "workspaceAccess": "ro" },
        "tools": {
          "allow": ["read_file", "list_dir", "search", "http_get_metadata"],
          "deny": ["exec", "write_file", "apply_patch", "send_message", "http_post", "http_put", "http_delete"]
        }
      },
      "requirements": {
        "preview": {
          "must_include": ["tool_plan", "targets", "expected_outputs", "risk_notes"],
          "format": "structured"
        }
      }
    },
    "EXECUTE": {
      "openclaw": {
        "sandbox": { "mode": "all", "workspaceAccess": "rw" },
        "tools": {
          "allow": ["read_file", "apply_patch"],
          "deny": ["exec", "send_message", "http_delete"]
        }
      }
    }
  }
}
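Before publishing a policy like the one above, you can mechanically check that its DRY_RUN mode cannot cause side effects. This sketch evaluates the `tools` lists from a trimmed copy of the example with two assumed rules: an explicit deny always wins, and any tool not on the allowlist is denied. Those evaluation semantics are an assumption about how the policy is enforced, and `is_tool_allowed` and `check_dry_run_is_read_only` are hypothetical names.

```python
# Side-effect tools named in the DRY_RUN deny list of the example WPC.
SIDE_EFFECT_TOOLS = {"exec", "write_file", "apply_patch", "send_message",
                     "http_post", "http_put", "http_delete"}

# Trimmed copy of the example WPC (only the fields this check reads).
WPC = {
    "modes": {
        "DRY_RUN": {"openclaw": {
            "sandbox": {"mode": "all", "workspaceAccess": "ro"},
            "tools": {"allow": ["read_file", "list_dir", "search", "http_get_metadata"],
                      "deny": ["exec", "write_file", "apply_patch", "send_message",
                               "http_post", "http_put", "http_delete"]}}},
        "EXECUTE": {"openclaw": {
            "sandbox": {"mode": "all", "workspaceAccess": "rw"},
            "tools": {"allow": ["read_file", "apply_patch"],
                      "deny": ["exec", "send_message", "http_delete"]}}},
    }
}


def is_tool_allowed(wpc: dict, mode: str, tool: str) -> bool:
    tools = wpc["modes"][mode]["openclaw"]["tools"]
    if tool in tools.get("deny", []):
        return False                        # an explicit deny always wins
    return tool in tools.get("allow", [])   # anything not allowlisted is denied


def check_dry_run_is_read_only(wpc: dict) -> None:
    """Reject a WPC whose DRY_RUN mode could produce side effects."""
    leaked = sorted(t for t in SIDE_EFFECT_TOOLS if is_tool_allowed(wpc, "DRY_RUN", t))
    if leaked:
        raise ValueError(f"DRY_RUN allows side-effect tools: {leaked}")
    sandbox = wpc["modes"]["DRY_RUN"]["openclaw"]["sandbox"]
    if sandbox.get("mode") != "all" or sandbox.get("workspaceAccess") != "ro":
        raise ValueError("DRY_RUN must sandbox every tool with a read-only workspace")
```

Under these rules, `write_file` is unavailable in EXECUTE even though that mode's deny list does not name it, because it is not on the EXECUTE allowlist.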

Operationally, you do not “toggle a flag” inside the same trust context. You run DRY_RUN under one WPC hash and CST, then start a new execution run under the EXECUTE WPC hash and a new CST pinned to that policy.

What proof do you get?

Each run produces Gateway receipts for model calls emitted by clawproxy. These receipts let you verify which model was called (as seen by the gateway), with what request metadata, and under which job context.

Claw EA then packages receipts, the WPC hash used for the run, and CST scope hash details into a proof bundle. You can store and view the resulting artifact in Trust Pulse, so audit teams can inspect the dry-run preview alongside the execution-phase evidence.

Because the CST can be job-scoped and optionally pinned to a specific WPC hash, you can prove that “execution permissions” were not available during dry-run. This is the core property auditors usually want: the agent could not have performed the side effect until the policy transition occurred.
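An auditor-side check of that property can be sketched as: confirm the policy body matches the WPC hash recorded for the dry-run, confirm its effective tool set contains no side-effect tool, and confirm every receipt belongs to the bundle's job. The bundle and policy shapes, the hashing scheme, and `verify_dry_run_bundle` are all assumptions for illustration, not the Claw EA proof bundle format; any cryptographic verification of the receipts themselves is out of scope here.

```python
import hashlib
import json

SIDE_EFFECT_TOOLS = {"exec", "write_file", "apply_patch", "send_message",
                     "http_post", "http_put", "http_delete"}


def canonical_sha256(obj) -> str:
    # Assumed canonicalization: sorted keys, compact separators, UTF-8.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def verify_dry_run_bundle(bundle: dict, policy: dict) -> list:
    """Return findings; an empty list supports 'no execution permissions during dry-run'."""
    findings = []
    if canonical_sha256(policy) != bundle["wpc_hash"]:
        findings.append("policy body does not match the bundle's WPC hash")
    tools = policy["tools"]
    effective = set(tools.get("allow", [])) - set(tools.get("deny", []))
    leaked = sorted(effective & SIDE_EFFECT_TOOLS)
    if leaked:
        findings.append(f"side-effect tools were allowed: {leaked}")
    for receipt in bundle["receipts"]:
        if receipt.get("job_id") != bundle["job_id"]:
            findings.append(f"receipt {receipt.get('id')} belongs to another job")
    return findings
```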

Rollback posture

Action	Safe rollback	Evidence you can point to
Stop further executions immediately	Revoke the job CST in clawscope and require new issuance for any follow-on run.	Token revocation record plus the timestamp of the last Gateway receipt in the proof bundle.
Prevent write tools from being used again	Move back to the DRY_RUN WPC hash for subsequent jobs; keep EXECUTE policy unreferenced or tightly controlled.	WPC hashes in proof bundles show which runs had write permission and which did not.
Investigate whether execution matched the preview	Compare dry-run preview artifact to tool outputs recorded by your harness and correlate with Gateway receipts for the decision-making calls.	Proof bundle (receipts plus metadata) plus stored preview artifact in Trust Pulse.
Reduce future blast radius	Narrow the EXECUTE WPC tool allowlist, keep sandbox mode “all,” and shorten CST TTL for execute runs.	Updated WPC hash and new proof bundles showing tighter permissions over time.
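For the “did execution match the preview?” investigation, a multiset comparison of (tool, target) pairs is a reasonable starting point. This sketch assumes both the preview's `tool_plan` and your harness's execution log use `{"tool": ..., "target": ...}` entries; that shape and `diff_preview_vs_execution` are hypothetical. Using `collections.Counter` keeps multiplicity, so a patch applied twice when the plan listed it once shows up as unplanned.

```python
from collections import Counter


def diff_preview_vs_execution(tool_plan: list, executed: list) -> dict:
    """Compare planned (tool, target) pairs with the calls the harness recorded."""
    planned = Counter((step["tool"], step["target"]) for step in tool_plan)
    actual = Counter((call["tool"], call["target"]) for call in executed)
    return {
        # Counter subtraction drops non-positive counts, leaving only the excess.
        "unplanned": sorted((actual - planned).elements()),
        "not_executed": sorted((planned - actual).elements()),
    }
```

Any non-empty `unplanned` list is the starting point for correlating the extra calls with Gateway receipts in the proof bundle.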

If you need network egress allowlists enforced outside clawproxy, that can be implemented as an enterprise buildout. Treat it as a separate control, because forced dry-run is about permission transitions and verifiable intent, not full network containment.

FAQ

How is “dry-run” different from just asking the agent to explain its plan?

A plan in chat is not enforceable. Forced dry-run makes “no side effects” a policy decision enforced by WPC, CST restrictions, and OpenClaw tool policy, so the agent cannot execute even if prompted to.

Do I need two separate runs for dry-run and execution?

For high assurance, yes. Use one job run with a DRY_RUN WPC and a job-scoped CST, then start a second run with an EXECUTE WPC hash and a new CST pinned to that policy.

What prevents the agent from reusing an execution token later?

Use marketplace anti-replay binding with job-scoped CST binding so a CST cannot be replayed across jobs. If you also pin the CST to a WPC hash, the token is not valid under a different policy.

How does this work with OpenClaw’s sandbox and tool policy?

OpenClaw provides the local enforcement point: sandbox decides where tools run and tool policy decides which tools are callable. Forced dry-run supplies the portable, signed WPC and token binding so the same intent is enforced consistently across environments.

What do auditors actually get at the end of a run?

You get Gateway receipts for model calls, plus a proof bundle that ties receipts to the job context, WPC hash, and CST scope hash. Store that in Trust Pulse so reviewers can open the artifact and verify the run’s permission posture.
