Step-up approvals (human-in-the-loop) are a policy-as-code gate that pauses an OpenClaw-run agent right before a high-risk tool action and requires an explicit human “approve/deny” decision. The approval is enforced at the execution layer, so the agent cannot bypass it with prompting, tool rephrasing, or multi-step indirection.
In Claw EA, the gate is expressed as a WPC and bound to the run using a CST. When the agent calls models through clawproxy, gateway receipts and a proof bundle capture what was requested, what was approved, and what was actually executed.
Step-by-step runbook
This runbook assumes OpenClaw is your baseline agent runtime and you want approvals on specific tool classes (payments, identity changes, production deploys, outbound email, destructive file actions). You can implement the approver UI on top of your existing ticketing or chat workflow, connected via official API, via MCP server, or via enterprise buildout.
- Pick the “step-up” boundary. Define which tool calls require approval (for example: exec with write access, outbound network posts, “send email”, “create user”, “delete”, “refund”). Keep the rule tool-centric, not prompt-centric, because tools are the point where impact happens.
- Write the approval gate into a WPC. Store the WPC in the WPC registry so it is signed and hash-addressed. Treat the WPC hash as the immutable control reference you can point audits at.
- Issue a CST that pins the policy. Use a CST (issued by clawscope) with scope hash and optional policy hash pinning so the run is cryptographically bound to the WPC. If the policy changes later, old runs still verify against the pinned hash.
- Enforce the gate inside the tool boundary. In OpenClaw, combine tool policy and sandboxing: restrict which tools exist and where they run, then add a step-up hook for the “dangerous” tools (a sketch of such a hook follows this list). Avoid “prompt-only approvals” (for example: “Ask the user before deleting”) because an injected prompt can suppress or simulate consent.
- Route model calls through clawproxy. When the agent uses OpenRouter via fal routed through clawproxy, you get gateway receipts for model calls. These receipts let you prove what the model saw and returned at each decision point that led up to an approval request.
- Record approval decisions as run artifacts. Store the approver identity, timestamp, decision, and the exact “approval payload” (the structured request describing what will happen if approved). Include a hash of the payload in the run log so approvals are bound to specific actions and cannot be replayed for different actions.
- Bundle and verify after the run. Emit a proof bundle that includes the WPC reference, CST metadata, gateway receipts, and the approval artifacts. Optionally publish the resulting audit view as a Trust Pulse for reviewers.
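The enforcement step is where this pattern lives or dies, so here is a minimal sketch of a tool-boundary step-up hook. It is illustrative only: OpenClaw's real hook API is not shown, and the `ToolCall` shape, the `POLICY` dictionary, and the `request_approval` callback are assumptions made for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical policy shape that mirrors the WPC sketch later on this page.
POLICY = {
    "default": "deny",
    "allow": {"read", "search", "safe_http_get"},
    "step_up_approval_required": {"exec", "write", "http_post", "send_email", "identity_mutation"},
}


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)


def payload_hash(call: ToolCall) -> str:
    """Hash a canonical JSON form of the action so an approval binds to one exact action."""
    canonical = json.dumps({"tool": call.tool, "args": call.args}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def execute_with_step_up(
    call: ToolCall,
    run_tool: Callable[[ToolCall], object],
    request_approval: Callable[[dict], bool],
) -> object:
    """Deny-by-default gate enforced at the tool boundary, not in the prompt."""
    if call.tool in POLICY["allow"]:
        return run_tool(call)

    if call.tool in POLICY["step_up_approval_required"]:
        # The structured approval payload is exactly what the approver will see.
        approval_payload = {
            "action_type": call.tool,
            "target": call.args.get("target"),
            "parameters": call.args,
            "payload_hash": payload_hash(call),
        }
        if not request_approval(approval_payload):
            raise PermissionError(f"step-up approval denied for {call.tool!r}")
        return run_tool(call)

    # Anything not explicitly allowed or gated is denied outright.
    raise PermissionError(f"tool {call.tool!r} is not permitted by policy")
```

The detail that matters is that `request_approval` is invoked by the runtime, outside the model's reach: the model cannot fabricate the return value or skip the call the way it can ignore a prompt instruction.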
Threat model
Step-up approvals are not about making agents “polite”. They are about preventing a model, a plugin, or an operator mistake from turning into an irreversible side effect without human intent.
| Threat | What happens | Control |
|---|---|---|
| Prompt injection triggers a destructive tool action | The agent is convinced to run delete, rotate secrets, exfiltrate data, or message external parties. | Execution-layer approval gate on the tool call, not on the prompt. OpenClaw tool policy restricts tool availability; sandboxing reduces blast radius. |
| Tool indirection and multi-step “approval bypass” | The agent avoids the “delete” tool by using exec to run a deletion command, or uses a different API path. | Gate by capability class (exec, write, network post, identity mutation) and enforce at the tool boundary. Prefer deny-by-default tool policy plus explicit allowlist. |
| Replay of an old approval on a new action | An attacker reuses a prior “approved” decision blob to authorize a different action. | Bind approvals to a hashed approval payload and job-scoped CST binding (marketplace anti-replay binding). Store the payload hash in the proof bundle; a verification sketch follows this table. |
| Human approves without seeing the real delta | Approver clicks “yes” on a vague request, missing sensitive scope such as production target or recipient list. | Require structured approval payloads: exact command, exact recipients, target resource identifiers, and a computed diff preview when possible. Reject approvals that lack required fields. |
| Dispute after the fact | Team cannot prove what the model was asked, what it responded, and why an approval was requested. | Gateway receipts from clawproxy for model calls plus a proof bundle that ties receipts, WPC hash, and approval artifacts together. |
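The replay and “vague approval” rows both come down to binding the approval artifact to one exact action. Below is a minimal verification sketch; the artifact field names mirror the WPC example in the next section and are assumptions for this sketch, not a documented Claw EA schema.

```python
import hashlib
import json
import time


def canonical_payload_hash(action: dict) -> str:
    """Hash the action being executed, using the same canonical form the approver saw."""
    return hashlib.sha256(json.dumps(action, sort_keys=True).encode()).hexdigest()


def verify_approval(approval: dict, action: dict, job_id: str, ttl_seconds: int = 900) -> None:
    """Refuse to execute unless the approval binds to this exact action, job, and time window."""
    payload = approval["payload"]

    # Reject vague approvals that lack the required structured fields.
    required = ("action_type", "target", "parameters", "payload_hash")
    missing = [f for f in required if f not in payload]
    if missing:
        raise ValueError(f"approval payload missing required fields: {missing}")

    # Reject replays of an approval against a different action.
    if payload["payload_hash"] != canonical_payload_hash(action):
        raise ValueError("approval was granted for a different action (payload hash mismatch)")

    # Reject approvals carried over from another run (job-scoped binding).
    if approval["job_id"] != job_id:
        raise ValueError("approval is bound to a different job")

    # Reject stale approvals.
    if time.time() - approval["decided_at"] > ttl_seconds:
        raise ValueError("approval has expired; request step-up again")
```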
Policy-as-code example
This is a compact JSON-like sketch of an approval gate WPC. It shows a deny-by-default approach where certain tool classes require step-up approval, and approvals must reference a specific payload hash.
```json
{
  "wpc_version": "v1",
  "policy_name": "approval-gates-prod",
  "tool_policy": {
    "default": "deny",
    "allow": ["read", "search", "safe_http_get"],
    "step_up_approval_required": [
      { "tool": "exec", "constraints": { "workspaceAccess": "rw" } },
      { "tool": "write" },
      { "tool": "http_post" },
      { "tool": "send_email" },
      { "tool": "identity_mutation" }
    ]
  },
  "approval_gate": {
    "required_fields": ["action_type", "target", "parameters", "payload_hash"],
    "approval_ttl_seconds": 900,
    "bind_to_job": true
  }
}
```
In practice, you store the signed artifact as a WPC and distribute only the hash reference. The CST can optionally pin the WPC hash so the runtime must fetch and verify the exact policy artifact before executing.
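Operationally, “pinning” reduces to a hash check before the run starts. A minimal sketch, assuming the CST exposes a `policy_hash` field and the WPC artifact can be fetched as bytes (both are assumptions, not a documented clawscope interface; `fetch_wpc_bytes` is a hypothetical helper):

```python
import hashlib
import json


def verify_pinned_policy(policy_bytes: bytes, pinned_hash: str) -> dict:
    """Accept the WPC artifact only if it matches the hash pinned in the CST."""
    actual = hashlib.sha256(policy_bytes).hexdigest()
    if actual != pinned_hash:
        raise ValueError(f"policy hash mismatch: expected {pinned_hash}, got {actual}")
    return json.loads(policy_bytes)


# Usage sketch: the runtime refuses to start if the fetched WPC does not
# match the pin carried by the CST (field names assumed for illustration).
# cst = {"scope_hash": "...", "policy_hash": "3f1a..."}
# policy = verify_pinned_policy(fetch_wpc_bytes(cst["policy_hash"]), cst["policy_hash"])
```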
What proof do you get?
For every gated action, you want evidence that (1) the policy required approval, (2) a human made a decision on a specific payload, and (3) the runtime executed only what was approved. Claw EA focuses on artifacts you can verify later, without trusting a single app log.
Concretely, you get gateway receipts for model calls emitted by clawproxy, which let you reconstruct the model interaction leading to an approval request. You also get a proof bundle that ties together the WPC reference, CST metadata, approval request and decision artifacts, and the relevant gateway receipts for audit and verification.
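As a rough illustration of how those pieces can be linked, the sketch below assembles a bundle and digests its canonical form so later tampering is detectable. The field layout is an assumption for this sketch, not the actual Claw EA bundle format.

```python
import hashlib
import json


def build_proof_bundle(wpc_hash: str, cst_metadata: dict,
                       gateway_receipts: list, approvals: list) -> dict:
    """Assemble one artifact that links policy, token, model receipts, and approvals."""
    bundle = {
        "wpc_hash": wpc_hash,
        "cst": cst_metadata,
        "gateway_receipts": gateway_receipts,
        "approvals": approvals,
    }
    # A digest over the canonical form lets a reviewer detect later tampering
    # with any of the linked artifacts.
    canonical = json.dumps(bundle, sort_keys=True).encode()
    bundle["bundle_digest"] = hashlib.sha256(canonical).hexdigest()
    return bundle
```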
If you publish to Trust Pulse, reviewers get a consistent place to view the policy hash, the approval trail, and the verified receipt chain. This helps answer operational questions like “which approver authorized this outbound message?” and “did the model output change after approval was granted?”.
Rollback posture
Approvals reduce the chance of an incident, but you still need a clean rollback plan. The key is to design your tools so high-impact operations are reversible or at least bounded, then require step-up approval for the irreversible ones.
| Action | Safe rollback | Evidence |
|---|---|---|
| Outbound email or chat message | Prefer “draft only” mode by default; require approval to send. If sent, send a follow-up correction and open an incident ticket. | Approval payload includes recipients and message body hash; proof bundle shows the approval decision and the tool call that actually sent the message. |
| File write or config change | Write to a branch or staging path; require approval to merge or apply. If applied, revert via known-good snapshot. | Approval payload includes path list and diff hash; OpenClaw sandbox mode plus tool policy reduces host impact; proof bundle ties it together. |
| Exec on host or elevated mode | Avoid by default; require explicit step-up and narrow command allowlist. If executed, rotate credentials and isolate affected hosts. | Approval payload captures command, args, working directory, and environment redaction policy; gateway receipts show the model plan that led to escalation. |
| Identity or permission mutation | Use time-bound changes where possible; otherwise log the “before” state and provide a revert playbook. | Approval payload includes target principal and permission identifiers; proof bundle records who approved and when. |
FAQ
Why not just tell the agent “always ask before doing X”?
Because that is a prompt instruction, not a permission boundary. Prompt injection and tool indirection can cause the agent to skip, fake, or reinterpret the request, while an execution-layer gate can block the tool call until a real approval artifact exists.
Where should the approval gate be enforced?
Enforce it at the point where the tool would execute, not in the chat UI. OpenClaw already distinguishes tool policy and sandboxing from “skills” text, and that separation is where approval checks belong.
How do you bind an approval to the exact action being taken?
Use a structured approval payload and include a payload hash in both the approval artifact and the run record. Pair that with job-scoped CST binding to reduce replay risk across runs.
What do auditors get that they can independently verify?
They can verify the WPC hash reference, validate gateway receipts for model calls emitted by clawproxy, and review a proof bundle that links receipts and approval decisions to the run. This is stronger than a screenshot or a single system log because it preserves the chain of custody across the model boundary.
Can this work with Microsoft approval experiences?
Yes, you can route the human decision through a Microsoft approval flow via official API, then store the decision artifact and payload hash back into the run record. Keep the enforcement in the agent execution layer so the external approval system is an input, not the control boundary.