A kill switch is a policy-enforced emergency halt that stops an agent run from taking new actions, even if the agent keeps generating text. In Claw EA, the practical control is to deny tool execution and model egress at the execution layer, enforced through WPC and CST checks, rather than relying on a “please stop” prompt.
OpenClaw is the baseline agent runtime: it already separates tool policy (what tools are callable) from sandboxing (where tools run). The kill switch works when the runtime and proxy are permissioned by policy-as-code, so a compromised prompt cannot talk its way around the control.
Step-by-step runbook
- Define what “halt” means for your agents. Most teams choose: no tool calls, no outbound model calls, and a visible “halted” status for operators.
  Be explicit about whether read-only tools (for example, file read) are allowed during halt. Default to deny-all tools to stop data movement during incidents.
- Author a WPC that can express a global deny for tools and a proxy-side deny for model calls. Store the WPC in the WPC registry so it is signed and hash-addressed.
  Use a “policy hash” you can pin to runs, so the runtime can fetch and verify the exact WPC content via clawcontrols.
- Issue CSTs from clawscope with optional policy hash pinning. Pinning makes the token unusable if the policy changes, which is helpful during incident response.
  For high-risk agents, keep CST TTL short so operator actions take effect quickly without waiting on long-lived credentials.
- Route model calls through clawproxy so you get gateway receipts for every model request. In practice, this is where you can fail closed if the WPC indicates “halt.”
  OpenRouter via fal routed through clawproxy is available if you want a consistent receipting path for model egress.
- Enforce tool policy inside OpenClaw so tool execution stops locally even if the agent is still “thinking.” Keep sandboxing on for non-main sessions or for all sessions, depending on your exposure surface.
  Do not rely on skills or prompts for enforcement. Skills are documentation injected into the prompt; they are not a security boundary.
- Trigger the kill switch by updating which WPC a job uses (or switching the WPC to deny-all), and require that new CSTs pin to the “halt” policy hash. For in-flight jobs, revoke CSTs to force the next authorization check to fail.
  When you need additional containment, you can also disable inbound chat triggers using OpenClaw channel policies (for example, lock rooms down to allowlists or require mention).
- Collect evidence. Export gateway receipts and assemble a proof bundle for each impacted job run, then store the relevant artifacts in Trust Pulse for review.
  Use the proof bundle as the definitive record of what model calls occurred before the halt and what was blocked after.
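The hash pinning, short TTL, and revocation steps above can be sketched in a few lines. This is an illustration only: the `Token` class, the `authorize` function, and the canonical-JSON SHA-256 hashing are assumptions made for the example, not the actual CST format or the hashing scheme used by the WPC registry.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Set, Tuple


def policy_hash(policy: dict) -> str:
    """Hash a canonical JSON encoding so identical content always yields the same hash.

    Assumption: the real WPC hashing scheme may differ from this one.
    """
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a CST; the real token format is not shown here."""
    token_id: str
    pinned_policy_hash: str
    expires_at: float  # Unix timestamp; a short TTL limits how long a token can outlive a halt


def authorize(token: Token, active_policy: dict, revoked: Set[str], now: float) -> Tuple[bool, str]:
    """Return (allowed, reason). Any failed check results in a deny."""
    if token.token_id in revoked:
        return False, "revoked"
    if now >= token.expires_at:
        return False, "expired"
    if token.pinned_policy_hash != policy_hash(active_policy):
        return False, "policy_mismatch"
    if active_policy.get("mode") == "halt":
        return False, "halted"
    return True, "ok"


normal = {"name": "normal-ops", "mode": "run"}
halt = {"name": "emergency-halt", "mode": "halt"}
token = Token("t-1", policy_hash(normal), expires_at=1_000.0)
revoked: Set[str] = set()

before = authorize(token, normal, revoked, now=10.0)      # token matches the active policy
after_switch = authorize(token, halt, revoked, now=10.0)  # job switched to the halt policy
revoked.add("t-1")
after_revoke = authorize(token, normal, revoked, now=10.0)
```

In this sketch a token pinned to the normal policy stops working as soon as the active policy changes, and revocation blocks it even if the policy is later switched back.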
Threat model
The main risk is that a prompt-only “stop” instruction is not binding. If the agent is compromised by prompt injection, it can ignore instructions and keep calling tools or exfiltrating data via model egress unless execution itself is permissioned.
Kill switches should assume the model can be adversarial. The control point must be outside the model, enforced by the runtime and the proxy using WPC and CST checks.
| Threat | What happens | Control |
|---|---|---|
| Prompt injection attempts to keep acting | The agent continues issuing tool calls or “just one more” action despite an operator halt request. | Tool calls denied by policy-as-code at the runtime boundary (OpenClaw tool policy). Model egress denied by clawproxy using WPC and CST checks. |
| Model egress used for data exfiltration | Sensitive snippets are sent out in model prompts, even if local tools are locked down. | Route model calls through clawproxy to fail closed and to emit gateway receipts for review. Use CST policy hash pinning so “halt” is enforced consistently. |
| Operator confusion during incident response | Multiple teams disagree on whether an agent was halted, when, and under which policy version. | WPC is hash-addressed and signed, so “what policy was in force” is unambiguous. Proof bundles tie together receipts and run metadata for later verification. |
| Replay or cross-job token reuse | A CST captured from one job is reused to keep making calls after a halt elsewhere. | Marketplace anti-replay binding (job-scoped CST binding) reduces reuse across jobs. Revoke CSTs during the incident to stop authorization checks from passing. |
| Sandbox escape or host-level blast radius | Misconfiguration allows elevated execution on the host, widening impact during a compromise. | Use OpenClaw sandboxing for tool execution and minimize “elevated” tool paths. Audit configuration regularly and treat elevated execution as an exception. |
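The first rows of the table depend on a gate that decides from policy alone and fails closed. A minimal sketch of that idea follows; `tool_call_allowed` and the `load_policy` callback are hypothetical names, and the policy shape mirrors the `enforcement.tools` fields used elsewhere on this page rather than a confirmed OpenClaw interface.

```python
from typing import Callable


def tool_call_allowed(tool_name: str, load_policy: Callable[[], dict]) -> bool:
    """Decide from policy alone; the model's output is never consulted."""
    try:
        tools = load_policy()["enforcement"]["tools"]
        if tools["default"] == "allow":
            return True
        # Default is treated as deny: only explicitly allowlisted tools may run.
        return tool_name in list(tools.get("allow", []))
    except Exception:
        # Fail closed: a missing, unreachable, or malformed policy means deny.
        return False


def unavailable() -> dict:
    """Simulates a policy source that cannot be reached."""
    raise ConnectionError("policy source unreachable")


halt_policy = {"enforcement": {"tools": {"default": "deny", "allow": []}}}
normal_policy = {"enforcement": {"tools": {"default": "deny", "allow": ["file_read"]}}}

during_halt = tool_call_allowed("file_read", lambda: halt_policy)
during_outage = tool_call_allowed("file_read", unavailable)
normal_read = tool_call_allowed("file_read", lambda: normal_policy)
```

The design choice worth noting is the broad `except`: an error while loading or reading the policy is treated the same as an explicit deny, so an attacker cannot reopen tool access by breaking the policy fetch.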
Policy-as-code example
This example shows the shape of a kill switch policy as code. The WPC is the signed, hash-addressed artifact; the important part is that enforcement happens outside the prompt, and the CST can pin to the WPC hash.
```json
{
  "wpc": {
    "name": "emergency-halt",
    "version": "2026-02-11",
    "mode": "halt"
  },
  "enforcement": {
    "tools": {
      "default": "deny",
      "allow": []
    },
    "model_egress": {
      "default": "deny"
    }
  },
  "token_requirements": {
    "cst": {
      "require_scope_hash": true,
      "optional_policy_hash_pinning": true
    }
  },
  "audit": {
    "require_gateway_receipts": true,
    "emit_proof_bundle": true
  }
}
```
Operationally, you roll this out by making new jobs reference the WPC hash for “emergency-halt,” and by revoking CSTs associated with runs you must stop. If your process requires approvals, you can gate WPC publication and CST issuance through your existing change control.
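If WPC publication goes through change control, one check you might add is a small lint that confirms a candidate halt policy really denies everything before it is published. The sketch below assumes the field names from the example above and a deny-all halt definition; `halt_policy_problems` is a hypothetical helper, not part of Claw EA.

```python
import json
from typing import List


def halt_policy_problems(policy: dict) -> List[str]:
    """Return a list of reasons the policy is not a strict deny-all halt policy."""
    problems = []
    if policy.get("wpc", {}).get("mode") != "halt":
        problems.append("wpc.mode must be 'halt'")
    enforcement = policy.get("enforcement", {})
    tools = enforcement.get("tools", {})
    if tools.get("default") != "deny":
        problems.append("enforcement.tools.default must be 'deny'")
    if tools.get("allow") != []:
        problems.append("enforcement.tools.allow must be an empty list")
    if enforcement.get("model_egress", {}).get("default") != "deny":
        problems.append("enforcement.model_egress.default must be 'deny'")
    if policy.get("audit", {}).get("require_gateway_receipts") is not True:
        problems.append("audit.require_gateway_receipts must be true")
    return problems


candidate = json.loads("""
{
  "wpc": {"name": "emergency-halt", "version": "2026-02-11", "mode": "halt"},
  "enforcement": {
    "tools": {"default": "deny", "allow": []},
    "model_egress": {"default": "deny"}
  },
  "audit": {"require_gateway_receipts": true, "emit_proof_bundle": true}
}
""")
problems = halt_policy_problems(candidate)  # empty list: safe to publish as a halt policy

# A policy that quietly leaves one tool enabled is flagged.
candidate["enforcement"]["tools"]["allow"] = ["shell"]
leaky_problems = halt_policy_problems(candidate)
```

Teams that allow read-only tools during a halt would relax the `allow` check to compare against their approved read-only list instead of an empty one.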
What proof do you get?
When model calls go through clawproxy, you receive gateway receipts for each call. Those receipts are designed for later verification: they let you show what was called, when, and under which authorization context, without asking engineers to reconstruct logs by hand.
Claw EA can bundle receipts plus run metadata into a proof bundle. You can store the artifact in Trust Pulse so security and audit teams have a stable object to view during review and post-incident writeups.
In an emergency halt scenario, the proof you want is both positive and negative. Positive proof shows the last allowed calls before the kill switch; negative proof shows subsequent calls were blocked due to WPC and CST enforcement (or that the agent was unable to obtain a valid CST after revocation).
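The positive and negative proof described above can be pulled out of a list of receipts with a simple partition around the halt time. The receipt fields used here (`ts`, `decision`, `policy_hash`) are illustrative assumptions; the actual gateway receipt schema is not shown in this document.

```python
from typing import Dict, List


def summarize_halt_window(receipts: List[dict], halt_at: float) -> Dict[str, object]:
    """Split receipts into evidence before and after the halt timestamp."""
    ordered = sorted(receipts, key=lambda r: r["ts"])
    allowed_before = [r for r in ordered if r["ts"] < halt_at and r["decision"] == "allowed"]
    blocked_after = [r for r in ordered if r["ts"] >= halt_at and r["decision"] == "denied"]
    allowed_after = [r for r in ordered if r["ts"] >= halt_at and r["decision"] == "allowed"]
    return {
        # Positive proof: the last call that was permitted before the halt.
        "last_allowed": allowed_before[-1] if allowed_before else None,
        # Negative proof: calls that were attempted and blocked after the halt.
        "blocked_after_halt": blocked_after,
        # Should be empty; anything here means the halt did not hold.
        "allowed_after_halt": allowed_after,
    }


receipts = [
    {"ts": 100.0, "decision": "allowed", "policy_hash": "aaa"},
    {"ts": 130.0, "decision": "denied", "policy_hash": "bbb"},
    {"ts": 110.0, "decision": "allowed", "policy_hash": "aaa"},
]
summary = summarize_halt_window(receipts, halt_at=120.0)
```

A non-empty `allowed_after_halt` list is the signal to investigate, since it would indicate a call that passed authorization after the kill switch was supposed to be in force.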
Rollback posture
A kill switch is only useful if “halt” is reversible without guesswork. Rollback should be a controlled re-authorization: publish or select a non-halt WPC, mint new CSTs pinned to that policy, and re-enable the minimum tool profile needed to resume work.
Do not “unhalt” by telling the agent it is safe again. Prompt-only unhalts are indistinguishable from attacker instructions, and they do not restore the permission boundary cleanly.
| Action | Safe rollback | Evidence |
|---|---|---|
| Halt all tools for a job | Switch job to a normal-operation WPC hash and re-issue CSTs for that job. Re-enable only the required OpenClaw tool allowlist profile. | WPC hashes before and after, plus proof bundle showing the halt window and subsequent allowed calls. |
| Halt model egress | Update WPC to allow model egress again and issue new CSTs pinned to that policy hash. Validate clawproxy path is still emitting gateway receipts. | Gateway receipts resume with the new policy context; the proof bundle boundary demonstrates when egress reopened. |
| Revoke tokens during incident | After root cause is addressed, mint fresh CSTs with short TTL and job binding. Avoid reusing older tokens even if they are unexpired. | Token issuance and revocation audit events from clawscope, linked to the job metadata in the proof bundle. |
| Lock down inbound triggers | Restore OpenClaw channel policies gradually, starting with allowlists and mention requirements. Keep high-risk channels gated until monitoring is stable. | OpenClaw configuration diff plus runtime audit output demonstrating the new effective policy. |
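Two of the rollback rules in the table lend themselves to a mechanical check before resuming: the resumed tool profile should not be wider than the pre-incident one, and tokens issued before the halt should not come back. The sketch below uses a hypothetical helper, `rollback_problems`, with plain lists of tool names and token IDs as inputs.

```python
from typing import Iterable, List


def rollback_problems(
    pre_incident_tools: Iterable[str],
    resumed_tools: Iterable[str],
    pre_halt_token_ids: Iterable[str],
    resumed_token_ids: Iterable[str],
) -> List[str]:
    """Flag rollback plans that widen tool access or reuse pre-halt tokens."""
    problems = []
    widened = sorted(set(resumed_tools) - set(pre_incident_tools))
    if widened:
        problems.append(f"tools not in the pre-incident profile: {widened}")
    reused = sorted(set(resumed_token_ids) & set(pre_halt_token_ids))
    if reused:
        problems.append(f"pre-halt tokens reused: {reused}")
    return problems


clean = rollback_problems(
    pre_incident_tools=["file_read", "file_write", "http_get"],
    resumed_tools=["file_read"],          # minimum profile needed to resume
    pre_halt_token_ids=["t-1", "t-2"],
    resumed_token_ids=["t-3"],            # freshly minted after the incident
)
risky = rollback_problems(
    pre_incident_tools=["file_read"],
    resumed_tools=["file_read", "shell"],
    pre_halt_token_ids=["t-1"],
    resumed_token_ids=["t-1"],
)
```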
FAQ
Why isn’t a prompt-based “stop” good enough?
Because prompts are not enforcement. If the agent is manipulated, it can ignore “stop” while continuing to call tools or send data to a model endpoint.
A kill switch must be enforced at the execution layer using policy-as-code, so the runtime and proxy deny actions regardless of what the model says.
What is the smallest useful kill switch in production?
Deny tool execution in the runtime and deny model egress in the proxy. This prevents both local side effects and outbound data movement, even if the agent keeps generating text.
OpenClaw’s separation of tool policy from sandboxing makes the runtime half explicit: tool policy governs which tools can run, and sandboxing governs where they run.
How do WPC and CST work together during an emergency halt?
The WPC defines the binding policy (for example, halt mode) and is signed and hash-addressed. The CST is issued by clawscope, can carry a scope hash, and can optionally pin to the WPC hash so it cannot be used under a different policy.
During an incident, you typically do both: switch to a halt WPC and revoke CSTs for in-flight runs, so that re-authorization fails.
What do I show an auditor after we hit the kill switch?
Show gateway receipts for model calls and a proof bundle per job run that ties receipts to run metadata. Store the incident-relevant artifacts in Trust Pulse so reviewers have stable references.
This avoids “trust us” narratives and supports a concrete timeline: last allowed call, halt activation, and post-halt blocks.
Does this replace OpenClaw’s built-in security audit and sandboxing?
No. OpenClaw’s local controls reduce blast radius on the host and help prevent common misconfigurations, and you should still run its security audit regularly.
Claw EA adds a permissioned execution and verification layer around runs, including WPC-based policy fetch/verify, CST enforcement, and receipted model egress via clawproxy.