SOX control testing evidence runs are safest when the agent is constrained by a permissioned execution layer, not just instructions in a prompt. Claw EA runs OpenClaw as the baseline agent runtime and adds machine-checkable controls using WPC, CST, gateway receipts, and proof bundles.
This workflow is designed for the parts auditors care about: repeatable evidence collection, step-up approvals before irreversible actions (export reports, file audit evidence), and a rollback posture with verifiable artifacts you can re-check later.
Step-by-step runbook
1) Define the evidence job as a WPC. Write a Work Policy Contract (WPC) that lists allowed tools, permitted data classes, redaction rules, and which actions require human approval. Publish the signed WPC to the WPC registry so it can be fetched and verified at runtime.
2) Issue a CST that is pinned to the policy. Create a CST (scoped token) via clawscope with a scope hash that matches the intended job boundaries. When you need strong guarantees, pin the CST to the WPC hash so a copied token cannot be reused under a looser policy.
3) Run the agent under OpenClaw with sandbox and tool policy tightened. Configure OpenClaw so evidence collection tools run in a sandbox where practical, and explicitly allow only the tools needed for the test. Use OpenClaw’s built-in security audit checks before you enable any external channels or shared workspaces.
4) Pull evidence via official APIs or an MCP server, not ad hoc browsing. For system-of-record evidence (ERP, ticketing, GRC, IAM), retrieve artifacts via the official API or an MCP server you control, then store them as immutable inputs to the run. If Microsoft 365 is in scope, treat Microsoft Graph permissions/scopes and Conditional Access as part of the run’s prerequisites, and keep privileged access behind PIM-based activation.
5) Enforce step-up approvals on irreversible actions. When the agent reaches an irreversible step like exporting a report or filing audit evidence, require a human approval checkpoint that includes a preview of what will be exported and where it will be stored. The approval should be recorded as part of the run metadata so you can show who authorized the action and under which WPC.
6) Apply DLP and redaction before any outbound write. Treat generated narratives, exported PDFs, and attachments as potentially sensitive even if the source is “internal.” Redact secrets and identifiers per the WPC, and keep a hash of the unredacted original only if your retention policy allows it.
7) Close the run with a proof bundle and retention decisions. At completion, produce a proof bundle that includes gateway receipts for model calls, policy identifiers, and run metadata needed for later verification. Store the bundle in your evidence repository and optionally publish a Trust Pulse for easier audit viewing, while keeping the underlying artifacts in your controlled storage.
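Step 6 above (redact before any outbound write) can be sketched roughly as follows. This is a minimal illustration, not Claw EA's DLP engine: the regex patterns, the `redact` function, and the placeholder format are all assumptions made for the example, and a real rule set would be driven by the WPC. The sketch fails closed when a data class has no rule, so an unrecognized class cannot slip through unredacted.

```python
import hashlib
import re

# Illustrative patterns only (assumption); real rules would come from the WPC.
PATTERNS = {
    "access_tokens": re.compile(r"Bearer\s+[A-Za-z0-9._\-]+"),
    "government_id": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text, classes):
    """Replace every match of each listed data class; return (text, counts)."""
    counts = {}
    for name in classes:
        pattern = PATTERNS.get(name)
        if pattern is None:
            # Fail closed: refuse to write if a required rule is missing.
            raise ValueError(f"no redaction rule for data class: {name}")
        text, hits = pattern.subn(f"[REDACTED:{name}]", text)
        counts[name] = hits
    return text, counts


def sha256_hex(text):
    """Hash of the text, suitable for recording in run metadata."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


draft = "Authorization: Bearer abc.def-123 for SSN 123-45-6789"
clean, counts = redact(draft, ["access_tokens", "government_id"])
outbound_hash = sha256_hex(clean)
```

Only `clean` and `outbound_hash` would leave the workspace; whether a hash of `draft` is kept depends on your retention policy, as noted in step 6.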
Threat model
SOX evidence runs fail in predictable ways: over-broad access, replayed runs, and evidence tampering after the fact. The goal is to make the run deterministic enough to repeat, and bounded enough to stop damage when the agent is wrong.
| Threat | What happens | Control |
|---|---|---|
| Prompt-only “policy” drift | The agent follows updated chat instructions that expand scope, export extra data, or change the evidence narrative. | Use a signed WPC and pin the CST to the WPC hash; the runtime enforces what tools and actions are allowed, regardless of what a prompt says. |
| Replay of a prior token or run context | A CST is reused to rerun evidence collection later, potentially against different data or in a different environment. | Use marketplace anti-replay binding with job-scoped CST binding; require per-job tokens and fail closed if the token is not bound to the job. |
| Irreversible evidence actions without review | The agent exports a report, uploads attachments, or files audit evidence with incorrect period, entity, or population. | Step-up approvals for export and filing actions; the WPC should mark these as deny-by-default unless a human approval flag is present. |
| Data leakage in narratives and attachments | Sensitive data is embedded in a generated summary or an exported file, then shared externally or stored too broadly. | DLP and redaction rules in the WPC; keep outputs restricted and redact before outbound writes. Treat evidence packaging as a separate step with its own controls. |
| Disputed evidence provenance | You cannot show which model calls were made, what policy was in effect, or whether outputs were edited after generation. | Route model calls through clawproxy to emit gateway receipts, then bundle them into a proof bundle with policy hash references for later verification. |
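The replay row in the table above can be illustrated with a small admission check. This is a sketch under stated assumptions: the claim names `job_id` and `token_id`, the `ReplayGuard` class, and the single-use tracking are hypothetical and are not the clawscope token format. It only shows the fail-closed shape: a token with no job binding, the wrong job, or a prior use is rejected.

```python
class ReplayGuard:
    """Admit a token once, and only for the job it is bound to."""

    def __init__(self):
        self._used_token_ids = set()

    def admit(self, cst_claims, job_id):
        bound_job = cst_claims.get("job_id")      # hypothetical claim name
        token_id = cst_claims.get("token_id")     # hypothetical claim name
        if bound_job is None or token_id is None:
            return False  # fail closed: unbound tokens are never admitted
        if bound_job != job_id:
            return False  # token was issued for a different job
        if token_id in self._used_token_ids:
            return False  # token already consumed by an earlier run
        self._used_token_ids.add(token_id)
        return True


guard = ReplayGuard()
claims = {"token_id": "cst-001", "job_id": "job-2026-01"}
first = guard.admit(claims, "job-2026-01")
second = guard.admit(claims, "job-2026-01")
```

In this sketch the first call is admitted and the second is refused, because the token id has already been consumed. A production guard would need durable storage for used token ids rather than an in-memory set.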
Policy-as-code example
This is a compact, JSON-like policy sketch you can adapt into a WPC. The important parts are: explicit tool boundaries, step-up approvals on irreversible actions, DLP and redaction rules, and evidence retention and replay posture.
{
  "wpc_version": "1",
  "job_type": "sox_control_testing_evidence_run",
  "scope": {
    "control_ids": ["ITGC-AC-01", "ITGC-CHG-03"],
    "period": {"from": "2026-01-01", "to": "2026-01-31"},
    "entities": ["US-Prod"]
  },
  "tools": {
    "allow": [
      "read_only_api_client",
      "evidence_packager",
      "model_via_clawproxy"
    ],
    "deny": ["shell", "write_filesystem_outside_workspace", "browser_remote_control"]
  },
  "approvals": {
    "required_for": [
      {"action": "export_report", "reason": "irreversible"},
      {"action": "file_audit_evidence", "reason": "irreversible"}
    ],
    "approval_payload_must_include": ["destination", "period", "entity", "artifact_list_hash"]
  },
  "dlp": {
    "redact": ["secrets", "access_tokens", "government_id", "bank_account"],
    "allow_pii": "deny_by_default",
    "output_label": "SOX-EVIDENCE-CONFIDENTIAL"
  },
  "replay_and_retention": {
    "token_policy": {"cst_scope_hash_required": true, "policy_hash_pinning": true},
    "anti_replay": {"job_scoped_binding": true},
    "retain_proof_bundle_days": 365,
    "retain_raw_artifacts_days": 90,
    "rollback": {"do_not_delete_prior_evidence_packages": true}
  }
}
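To make the policy sketch concrete, here is one way a runtime could evaluate its `tools` and `approvals` sections. This is an illustrative sketch, not the OpenClaw or Claw EA enforcement code: the functions `tool_allowed` and `action_allowed` are hypothetical, and the policy is trimmed to the two sections being evaluated. Tools are deny-by-default, and gated actions are refused unless the approval payload carries every required field.

```python
POLICY = {
    "tools": {
        "allow": ["read_only_api_client", "evidence_packager", "model_via_clawproxy"],
        "deny": ["shell", "write_filesystem_outside_workspace", "browser_remote_control"],
    },
    "approvals": {
        "required_for": [
            {"action": "export_report", "reason": "irreversible"},
            {"action": "file_audit_evidence", "reason": "irreversible"},
        ],
        "approval_payload_must_include": [
            "destination", "period", "entity", "artifact_list_hash",
        ],
    },
}


def tool_allowed(policy, tool):
    """Deny by default: a tool must be allow-listed and not deny-listed."""
    tools = policy["tools"]
    return tool in tools["allow"] and tool not in tools["deny"]


def action_allowed(policy, action, approval=None):
    """Gated actions need an approval payload with every required field set."""
    approvals = policy["approvals"]
    gated = {item["action"] for item in approvals["required_for"]}
    if action not in gated:
        return True  # in this sketch, ungated actions need no step-up approval
    if not approval:
        return False
    required = approvals["approval_payload_must_include"]
    return all(approval.get(name) for name in required)


approval = {
    "destination": "evidence-repo/2026-01",
    "period": "2026-01",
    "entity": "US-Prod",
    "artifact_list_hash": "sha256:placeholder",
}
can_export = action_allowed(POLICY, "export_report", approval)
```

The approval values above are placeholders. The point of the design is that a missing or partial approval payload produces a refusal rather than a warning.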
What proof do you get?
Every model call routed through clawproxy produces a gateway receipt. Those receipts are designed to be verified later, so you can show that a specific set of calls occurred under a specific job context instead of relying on screenshots of chat logs.
Claw EA packages gateway receipts, policy references (including the WPC hash), token binding metadata, and run identifiers into a proof bundle. The proof bundle is the unit you hand to audit, or re-verify internally, when someone asks “what exactly happened in this evidence run?”
If you need a shareable audit view, you can store a Trust Pulse artifact for the run so reviewers can find the proof bundle quickly. Treat Trust Pulse as the index and viewer, and keep your raw evidence files in your own controlled repository with your retention rules.
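As a rough picture of what re-verifying a proof bundle can involve, the sketch below checks that the receipts in a bundle are unchanged since packaging and that each one references the bundle's run id and WPC hash. The bundle layout, the field names (`run_id`, `wpc_hash`, `receipts_digest`), and both functions are assumptions for illustration only; the actual Claw EA bundle format and its verification procedure are not described here, and a real check would also validate receipt signatures.

```python
import hashlib
import json


def digest(obj):
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_bundle(run_id, wpc_hash, receipts):
    """Package receipts with the run id, policy hash, and a digest of the receipts."""
    return {
        "run_id": run_id,
        "wpc_hash": wpc_hash,
        "receipts": receipts,
        "receipts_digest": digest(receipts),
    }


def check_bundle(bundle):
    """Return a list of problems; an empty list means the bundle is consistent."""
    problems = []
    if digest(bundle["receipts"]) != bundle["receipts_digest"]:
        problems.append("receipts changed after bundling")
    for index, receipt in enumerate(bundle["receipts"]):
        if receipt.get("run_id") != bundle["run_id"]:
            problems.append(f"receipt {index}: run id mismatch")
        if receipt.get("wpc_hash") != bundle["wpc_hash"]:
            problems.append(f"receipt {index}: policy hash mismatch")
    return problems


receipts = [
    {"run_id": "run-1", "wpc_hash": "wpc-hash-a", "call": 1},
    {"run_id": "run-1", "wpc_hash": "wpc-hash-a", "call": 2},
]
bundle = build_bundle("run-1", "wpc-hash-a", receipts)
problems = check_bundle(bundle)
```

Returning a list of problems rather than a single boolean makes it easier to record exactly why a bundle failed re-verification.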
Rollback posture
Rollback for SOX evidence runs is mostly about preventing destructive changes and making corrections traceable. You should assume any export or filing step is irreversible and design the run so the “rollback” action is to supersede, not erase.
| Action | Safe rollback | Evidence |
|---|---|---|
| Draft evidence narrative generated | Regenerate under the same WPC and period scope; keep the prior draft as superseded, not deleted. | Proof bundle containing gateway receipts and run metadata, plus hashes of drafts stored in your repository. |
| Export report (high risk) | Export a corrected report with a new run id; mark the prior export as voided in your evidence index. | Human approval record plus proof bundle; artifact hashes for both exports and the voiding note. |
| File audit evidence (high risk) | File an amendment package that references the prior submission; do not attempt deletion unless your GRC system supports it with an audit trail. | Approval payload, destination metadata, and proof bundle linking the filing step to the WPC and job-scoped CST binding. |
| DLP redaction applied | Re-run packaging with updated redaction rules; preserve the old package with restricted access and a deprecation note. | Two packages with distinct hashes; WPC revisions recorded so you can explain why redaction changed. |
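The "supersede, not erase" rule in the table above can be sketched as an append-only evidence index. The `EvidenceIndex` class, its field names, and the status values are hypothetical, chosen only to show the shape: a correction adds a new entry and marks the old one voided with a pointer and a note, and nothing is ever removed.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceIndex:
    """Append-only index: entries are voided and linked, never deleted."""

    entries: dict = field(default_factory=dict)

    def add(self, package_id, sha256, run_id):
        if package_id in self.entries:
            raise ValueError(f"package already indexed: {package_id}")
        self.entries[package_id] = {
            "sha256": sha256,
            "run_id": run_id,
            "status": "current",
            "superseded_by": None,
            "note": None,
        }

    def supersede(self, old_id, new_id, sha256, run_id, note):
        old = self.entries[old_id]  # raises KeyError if the old package is unknown
        if old["status"] != "current":
            raise ValueError(f"package is not current: {old_id}")
        self.add(new_id, sha256, run_id)
        # Keep the old entry and its hash; only its status and links change.
        old.update(status="voided", superseded_by=new_id, note=note)


index = EvidenceIndex()
index.add("export-001", "sha256:aaa", "run-1")
index.supersede("export-001", "export-002", "sha256:bbb", "run-2",
                note="wrong period in original export")
```

After the correction both packages remain in the index with distinct hashes, and the voiding note explains why the first export was replaced.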
Optional controls that can be implemented in an enterprise buildout include egress allowlists enforced outside clawproxy and automatic cost budget enforcement. Keep them separate from the core proof path so verification still works when those controls evolve.
FAQ
Why is prompt-only governance not enough for SOX evidence runs?
A prompt is editable and hard to enforce, especially when the agent encounters new data or conflicting instructions. A WPC makes the allowed actions and data handling rules machine-checkable and stable across reruns, and CST policy hash pinning helps prevent “same token, different behavior.”
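A minimal sketch of policy hash pinning, assuming the WPC is hashed over a canonical JSON encoding and the token carries that hash in a claim. The claim name `policy_hash`, both function names, and the canonicalization are assumptions for illustration; the actual clawscope and WPC registry formats may differ.

```python
import hashlib
import json


def wpc_hash(wpc):
    """SHA-256 of the WPC under a canonical JSON encoding (assumed scheme)."""
    canonical = json.dumps(wpc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def token_matches_policy(cst_claims, wpc):
    """Fail closed: a token with no pinned hash never matches any policy."""
    pinned = cst_claims.get("policy_hash")  # hypothetical claim name
    if not isinstance(pinned, str):
        return False
    return pinned == wpc_hash(wpc)


strict = {"tools": {"allow": ["read_only_api_client"]}}
loose = {"tools": {"allow": ["read_only_api_client", "shell"]}}

claims = {"policy_hash": wpc_hash(strict)}
ok_under_strict = token_matches_policy(claims, strict)
ok_under_loose = token_matches_policy(claims, loose)
```

Here the token is accepted under the policy it was pinned to and rejected under the looser one, which is the "same token, different behavior" case the pinning is meant to block.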
What makes an action “irreversible” in this workflow?
Exports and filings create records outside the agent’s workspace and often trigger retention or distribution rules you cannot undo cleanly. Treat exporting reports and filing audit evidence as high-risk actions, require step-up approval for both, and record the approval inputs in the run metadata.
Can we run this without giving the agent broad system access?
Yes, that is the point. Keep data access read-only where possible, restrict tools via OpenClaw tool policy, and isolate execution with sandboxing for collection and packaging steps.
How do auditors verify the run?
They review the WPC that defined permitted actions, then check the proof bundle produced by the run. Gateway receipts provide a tamper-evident record of model calls routed through clawproxy, and the job-scoped CST binding reduces replay risk.
Where do we store evidence files versus proof artifacts?
Store raw evidence files and exports in your controlled evidence repository with your retention schedule. Store proof bundles and any Trust Pulse entries as the verification layer that explains how the evidence was produced and under which constraints.