A System Prompt Report is a policy artifact that makes an agent’s “prompt commitments” auditable and enforceable during execution, not just readable in a text file. In Claw EA, the report is bound to a WPC and enforced through CST scoping plus gateway receipts and a proof bundle, so approvals can be checked before the run proceeds and verified after it completes.
OpenClaw is the baseline agent runtime here, but prompt text alone is not a control plane. Permissioned execution requires policy-as-code with fail-closed verification, because prompts can be overridden, injected, or simply ignored by tools and plugins at runtime.
Step-by-step runbook
This runbook is written for teams that need execution approvals tied to what the agent was actually allowed to do. It assumes you already run agents in OpenClaw and want a durable approval record that survives prompt edits and model changes.
- Define the intended system prompt commitments as a stable “report template.” Include what must be true (constraints), what may vary (parameters), and what is disallowed (explicit prohibitions).
- Encode the enforcement intent in a WPC = Work Policy Contract (signed, hash-addressed policy artifact; served by clawcontrols). Treat the WPC hash as the approval target, not a mutable prompt string.
- Issue a CST = scoped token (issued by clawscope) for the job. Use the CST scope hash and, when you need strict approvals, pin the policy hash so the run is bound to a specific WPC.
- Route model calls through clawproxy so you receive gateway receipts for model calls. If you use OpenRouter, run it via fal routed through clawproxy so the same receipt mechanism applies.
- At start-of-run, verify the WPC fetched by hash and verify CST claims. If verification fails, block tool execution and model egress rather than continuing on a “best-effort” basis.
- At end-of-run, collect the proof bundle (a harness artifact bundling receipts and related metadata for audit/verification). Store it where your auditors can access it, and optionally publish a Trust Pulse (a marketplace-stored artifact for audit and viewing).
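The start-of-run step in the runbook can be sketched as a single gate that either returns the verified WPC hash or raises before anything else runs. This is a minimal sketch under stated assumptions, not the Claw EA implementation: the function and exception names, the claim keys `scope_hash` and `policy_hash`, and the choice of SHA-256 rendered as unpadded base64url are all hypothetical, and WPC signature verification is left out.

```python
import base64
import hashlib
import hmac


class PolicyVerificationError(Exception):
    """Raised when a run must be blocked before any tool or model call."""


def content_hash(data: bytes) -> str:
    # Assumption: SHA-256 rendered as unpadded base64url. The real WPC
    # hashing scheme is not specified in this document.
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def start_of_run_gate(wpc_bytes: bytes, approved_wpc_hash: str,
                      cst_claims: dict, expected_scope_hash: str) -> str:
    """Return the verified WPC hash, or raise to block the run (fail closed)."""
    actual = content_hash(wpc_bytes)
    if not hmac.compare_digest(actual, approved_wpc_hash):
        raise PolicyVerificationError("WPC hash does not match the approved hash")
    # Hypothetical claim names; a missing claim is treated as a failure.
    if cst_claims.get("scope_hash") != expected_scope_hash:
        raise PolicyVerificationError("CST scope hash missing or unexpected")
    # Strict approvals: the CST must pin the same policy hash.
    if cst_claims.get("policy_hash") != actual:
        raise PolicyVerificationError("CST is not pinned to the approved WPC")
    return actual
```

The caller would invoke the gate before starting tools or opening model egress, and treat any exception as a hard stop rather than a warning.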
Threat model
The System Prompt Report exists because prompt text is not a security boundary. Your real boundary is the execution layer: which tools can run, which models can be called, and which identities can mint or reuse permissions.
| Threat | What happens | Control in Claw EA and OpenClaw |
|---|---|---|
| Prompt injection overrides “commitments” | An external message convinces the agent to ignore the system prompt and perform a restricted action. | Bind execution to a WPC and enforce via CST scope hash and optional policy hash pinning. Use OpenClaw tool policy and sandboxing to reduce blast radius. |
| Silent policy drift | The prompt changes after approval, but the job still runs and looks “approved” in chat logs. | Approve the WPC hash, not the prompt string. Require fail-closed fetch/verify of the WPC at run start and record the hash in the proof bundle. |
| Token replay across jobs | A CST from a previous run is reused to execute a new run with the same privileges. | Use marketplace anti-replay binding (job-scoped CST binding) so that the token is bound to a specific job context and is rejected anywhere else. |
| Unverifiable model usage | You cannot prove which model was called, with what request, under what policy. | Gateway receipts emitted by clawproxy for model calls, then bundled into a proof bundle for later verification. |
| Over-broad tool access in the runtime | The agent has shell or filesystem access beyond what the prompt intended, so one mistake becomes a system compromise. | Use OpenClaw tool policy allow and deny lists plus Docker sandboxing. Treat “elevated” host execution as an explicit exception with separate approval. |
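To illustrate the token replay row, the sketch below shows one way a job-scoped binding check could work: the token must carry the job it was issued for, and a server-side registry remembers the first job that presented each token. The class name and the claim keys `token_id` and `job_id` are hypothetical; the actual marketplace binding mechanism is not described here.

```python
class ReplayError(Exception):
    """Raised when a CST is presented outside the job it is bound to."""


class JobBindingRegistry:
    """In-memory sketch; a real deployment would need durable storage."""

    def __init__(self) -> None:
        self._first_job: dict[str, str] = {}  # token_id -> job_id

    def check(self, cst_claims: dict, job_id: str) -> None:
        token_id = cst_claims.get("token_id")  # hypothetical claim name
        bound_job = cst_claims.get("job_id")   # hypothetical claim name
        if not token_id or bound_job != job_id:
            raise ReplayError("CST is not bound to this job")
        # Remember the first job that used the token and reject any other.
        first = self._first_job.setdefault(token_id, job_id)
        if first != job_id:
            raise ReplayError("CST was already used by a different job")
```

Repeated calls within the same job pass, so the check constrains where a token is used rather than how often.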
Policy-as-code example
This example shows a minimal schema-style System Prompt Report that can be referenced by, or embedded into, a WPC. The point is not the exact fields, but that the report is deterministic, reviewable, and bindable to execution via hashes.
```json
{
  "artifact_type": "system_prompt_report",
  "version": "1",
  "system_prompt_sha256": "base16...",
  "commitments": [
    {
      "id": "no_secret_exfil",
      "statement": "Do not output secrets or raw credentials.",
      "enforcement": "fail_closed"
    },
    {
      "id": "tools_minimal",
      "statement": "Only use allowlisted tools; no host elevated execution.",
      "enforcement": "fail_closed"
    }
  ],
  "approvals": [
    {
      "approver": "security-team",
      "scope": "job_class:finance-recon",
      "approved_wpc_hash": "b64u...",
      "expires_utc": "2026-12-31T00:00:00Z"
    }
  ],
  "runtime_bindings": {
    "cst_scope_hash_required": true,
    "policy_hash_pinning": "required",
    "receipts_required": true
  }
}
```
Validation rules should be simple and strict. If the WPC hash does not match what was approved, or if a CST is missing the expected scope hash, the run should stop before any external model calls or tool actions occur.
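One way to keep such rules simple and strict is a lookup that either returns a matching, unexpired approval or raises. The sketch below reads the report fields from the example above; the function names are hypothetical, and treating `scope` as an exact `job_class:<name>` match is an assumption about how scopes are compared.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    # Accept the trailing "Z" used in the example report on any Python 3.7+.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def approval_for(report: dict, wpc_hash: str, job_class: str,
                 now: datetime) -> dict:
    """Return the approval covering this WPC hash, or raise to stop the run."""
    if report.get("artifact_type") != "system_prompt_report":
        raise ValueError("not a System Prompt Report")
    for approval in report.get("approvals", []):
        # A malformed approval (for example a missing expires_utc) raises,
        # which also blocks the run.
        if (approval.get("approved_wpc_hash") == wpc_hash
                and approval.get("scope") == f"job_class:{job_class}"
                and parse_utc(approval["expires_utc"]) > now):
            return approval
    raise PermissionError("no unexpired approval covers this WPC hash and job class")
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the check deterministic and easy to test.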
What proof do you get?
You get two layers of evidence: what policy was supposed to apply, and what actually happened. The WPC provides the signed, hash-addressed policy artifact; the proof bundle provides execution evidence tied to the run.
Concretely, the proof bundle can include the WPC hash reference, CST claim details (including scope hash and any policy hash pinning), and gateway receipts for model calls. Gateway receipts are signed receipts emitted by clawproxy, which means you can later verify that model calls occurred through the controlled path rather than an unlogged side channel.
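A verifier for such a bundle might look like the sketch below. It is illustrative only: the bundle layout (`run_id`, `receipts`, `body`, `signature`) is hypothetical, and HMAC-SHA256 over canonical JSON stands in for whatever signature scheme clawproxy actually uses, which this document does not specify.

```python
import hashlib
import hmac
import json


def sign_receipt(body: dict, key: bytes) -> str:
    # Stand-in signature: HMAC-SHA256 over canonical (sorted-key) JSON.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_bundle(bundle: dict, key: bytes) -> int:
    """Return the number of verified receipts, or raise if unverifiable."""
    receipts = bundle.get("receipts", [])
    if not receipts:
        raise ValueError("no gateway receipts: the run is unverifiable")
    for receipt in receipts:
        body = receipt["body"]
        # Each receipt must belong to the run the bundle describes.
        if body.get("run_id") != bundle.get("run_id"):
            raise ValueError("receipt belongs to a different run")
        if not hmac.compare_digest(sign_receipt(body, key), receipt["signature"]):
            raise ValueError("receipt signature is invalid")
    return len(receipts)
```

An empty receipt list is treated as a failure, so a run that bypassed the gateway cannot pass verification by simply omitting receipts.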
For cross-team reviews, publish the proof bundle to a Trust Pulse when you need a marketplace-stored artifact for audit/viewing. That gives auditors a stable handle to inspect what was approved and what was executed, without relying on mutable chat transcripts.
Rollback posture
Rollback is mainly about stopping unsafe repetition and returning to a known-good policy state. With policy hash pinning, you can roll back by switching the approved WPC hash and rejecting jobs that present an unapproved hash.
| Action | Safe rollback | Evidence to check |
|---|---|---|
| Prompt or policy change found to be unsafe | Revert to last approved WPC hash and require CST policy hash pinning to that hash. | Proof bundle shows the WPC hash used for each run and whether pinning was present in CST claims. |
| Suspected token misuse | Invalidate the job path and require a fresh CST per job with anti-replay binding. | Proof bundle plus job metadata show whether a CST was job-scoped and whether reuse occurred. |
| Model routing bypass suspected | Block runs that do not produce gateway receipts and require clawproxy routing for model calls. | Gateway receipts presence, signature validity, and matching run identifiers inside the proof bundle. |
| Tool blast radius too wide | Tighten OpenClaw tool policy profile and enable sandboxing for the affected sessions. | OpenClaw security audit output and configuration diffs for tool allow and deny lists plus sandbox mode. |
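Rollback by switching the approved WPC hash can be sketched as a small history of approved hashes with an admission check against the current one. The class and the `policy_hash` claim name are hypothetical and only illustrate the posture; they are not a Claw EA API.

```python
class ApprovedPolicy:
    """Tracks the currently approved WPC hash and allows rollback."""

    def __init__(self, initial_hash: str) -> None:
        self._history = [initial_hash]

    @property
    def current(self) -> str:
        return self._history[-1]

    def approve(self, wpc_hash: str) -> None:
        self._history.append(wpc_hash)

    def rollback(self) -> str:
        # Return to the last known-good hash; refuse if there is none.
        if len(self._history) < 2:
            raise RuntimeError("no earlier approved hash to roll back to")
        self._history.pop()
        return self.current

    def admit(self, cst_claims: dict) -> bool:
        # Jobs whose CST pins any other hash (or none) are rejected.
        return cst_claims.get("policy_hash") == self.current
```

After `rollback()`, a job still pinned to the withdrawn hash fails `admit`, which gives the rejection behavior described above without editing any prompt text.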
Some orgs also pair this with enterprise identity controls. For Microsoft environments, you can align approvals to Entra ID groups and record operator actions in Microsoft Purview audit logs, but the enforcement still needs to happen in the agent execution path.
FAQ
Why isn’t a system prompt enough for permissioned execution approvals?
A system prompt is advisory text to a model, and it can be subverted by injection, tool misuse, or runtime configuration changes. Permissioned execution approvals need machine-enforced constraints, which is why the approval target should be a WPC hash plus CST scoping and receipt-backed verification.
What exactly is approved when we “approve a System Prompt Report”?
You are approving an immutable reference: the WPC hash that encodes the enforcement rules and the bindings required for the run. The report is the readable summary, but the approval should point at the hash-addressed artifact to prevent silent drift.
How do we verify the agent really used the approved model routing?
Require gateway receipts for model calls emitted by clawproxy and include them in the proof bundle. A verifier can then validate signatures and confirm calls were made through the controlled gateway path.
Can we integrate approvals with Entra ID or Microsoft Graph?
Yes. Through the official API or an enterprise buildout, you can map “who may approve” to Entra ID groups and enforce operator workflows outside the agent. Keep the enforcement decision inside the run boundary by binding execution to the WPC plus CST, not to a UI-only approval checkbox.
What should “fail-closed” mean in practice?
If the WPC cannot be fetched and verified by hash, if CST claims do not match the expected scope hash, or if gateway receipts are missing, the job should not proceed to external model calls or tool execution. This prevents “partial runs” that look compliant but are not verifiable.