Prevent agent exfiltration in tool runs | Secure Agent Workflow

Proof-first summary

To prevent agent exfiltration during tool runs, you need permissioned execution that is enforced by the runtime, not a prompt instruction that can be bypassed. In Claw EA, run OpenClaw as the baseline agent runtime, then bind every job to a WPC and a CST so the agent can only use the tools, files, and outbound paths that the policy allows.

When model traffic is routed through clawproxy, you also get Gateway receipts for model calls and a proof bundle you can verify later. This makes exfiltration attempts easier to detect and easier to prove after the fact, especially for webhook-style and changefeed-style workflows, where replay and idempotency bugs are common.

Step-by-step runbook

1. Classify tool outputs as reversible vs irreversible. Treat uploads, outbound webhooks, and any cross-boundary send as high risk. Decide upfront which tools are allowed to emit network requests, and which can only write to a local workspace.
2. Define a WPC that enumerates tool permissions, file scopes, and outbound destinations. Use a minimal allowlist, then widen it intentionally when you see real failures. Store the WPC as a signed, hash-addressed artifact and reference it by hash in your run configuration.
3. Issue a CST for the job and pin it to the WPC hash. The CST is issued by clawscope and should include a scope hash that matches your intended permissions. Use job-scoped CST binding to reduce replay risk, so a token from one job cannot be reused to perform the same webhook send in another job.
4. Route model calls through clawproxy and require Gateway receipts. This is the easiest place to get verifiable evidence of what the model was asked and what it returned. If you use OpenRouter via fal, route it through clawproxy so every model call emits receipts.
5. Enforce file path read and write scopes inside the tool runner. In OpenClaw, prefer sandboxed tool execution and mount only the minimum workspace paths as read-only or read-write. Keep secrets and credentials outside the mounted workspace, and treat any tool that can read arbitrary paths as a privileged capability.
6. Implement DLP and redaction at two points: logs and outbound payloads. Configure OpenClaw logging redaction so tool logs do not spill secrets, and add explicit payload filters in your outbound tools (webhook sender, HTTP client, file uploader). This is where you enforce “no raw document text”, “no credential-like strings”, and “only these fields may leave”.
7. Harden webhooks and changefeeds with idempotency and replay controls. Require idempotency keys on outbound webhook sends, store a dedupe record for a bounded time window, and refuse replays that do not match the expected job id and policy hash. Sign webhook payloads (HMAC or asymmetric) and verify responses strictly, since webhook destinations are a common exfil channel disguised as normal integration traffic.
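The payload-signing part of the last step can be sketched with Python's standard library. This is a minimal illustration of the HMAC option only, not the signing scheme of any particular webhook tool: the function names, the canonical JSON serialization, and the example secret are all assumptions made for the sketch.

```python
import hashlib
import hmac
import json


def canonical_body(payload: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_payload(secret: bytes, payload: dict) -> str:
    """Return a hex HMAC-SHA256 signature over the canonical body."""
    return hmac.new(secret, canonical_body(payload), hashlib.sha256).hexdigest()


def verify_payload(secret: bytes, payload: dict, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, signature)


# Hypothetical values for illustration; load real secrets from a secret store.
secret = b"example-shared-secret"
body = {"event_type": "record.updated", "record_id": "r-123", "status": "ok"}
sig = sign_payload(secret, body)
```

The receiver calls `verify_payload` with the same shared secret and rejects the request on a mismatch. Signing the canonical serialization rather than a pretty-printed body avoids spurious failures when key order or whitespace differs between sender and receiver.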

Threat model

Exfiltration in tool runs usually looks like a normal integration. The attacker goal is to get the agent to copy sensitive data into an outbound request, a file upload, or a durable changefeed record that an external system later exports.

Threat	What happens	Control
Prompt injection triggers an “innocent” tool to send secrets	The agent is instructed to include credentials or document text in a webhook payload or HTTP request.	Permissioned execution via WPC tool allowlists plus outbound payload DLP and schema checks; do not rely on “never send secrets” prompt text.
Over-broad filesystem access	A tool reads `~/.ssh`, cloud creds, or other host files and then stages them for upload.	Sandboxed tool execution, minimal bind mounts, and explicit file path read scopes; deny “read all” tools unless they are isolated and reviewed.
Webhook replay and idempotency gaps	An attacker repeats a previously valid send, or forces double-sends that leak more data over time.	Job-scoped CST binding to reduce cross-job replay, plus idempotency keys and dedupe storage at the webhook tool layer.
Changefeed as a covert channel	The agent writes sensitive fields into a record stream that later syncs to external systems.	WPC field-level constraints for write tools (allowed fields, max payload size), and outbound redaction before commits.
Model output laundering	Sensitive content appears in the model output and is forwarded by tooling without validation.	Require outbound tools to re-validate and redact payloads even if the model produced them; attach receipts so you can correlate output to send attempts.
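The outbound DLP and re-validation controls in the table can be sketched as a filter that every outbound tool calls before sending. The field names, patterns, and size limit mirror the `dlp` section of the policy example later on this page; the function name, the exception type, and the rule that only scalar values may leave are assumptions for illustration.

```python
import json
import re

ALLOWED_FIELDS = {"event_type", "record_id", "status", "summary"}
REDACT_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN PRIVATE KEY-----"),
    re.compile(r"(?i)password\s*[:=]"),
]
MAX_OUTBOUND_BYTES = 8192


class OutboundBlocked(Exception):
    """Raised when a payload cannot be made safe to send."""


def filter_outbound(payload: dict) -> dict:
    """Keep only allowlisted fields, redact secret-like strings, enforce a size cap."""
    cleaned = {}
    for key, value in payload.items():
        if key not in ALLOWED_FIELDS:
            continue  # drop anything not explicitly allowed to leave
        if isinstance(value, str):
            for pattern in REDACT_PATTERNS:
                value = pattern.sub("[REDACTED]", value)
        elif value is not None and not isinstance(value, (bool, int, float)):
            # Assumption: nested structures are rejected so they cannot hide data.
            raise OutboundBlocked(f"non-scalar value in field {key!r}")
        cleaned[key] = value
    size = len(json.dumps(cleaned).encode("utf-8"))
    if size > MAX_OUTBOUND_BYTES:
        raise OutboundBlocked(f"payload is {size} bytes, limit is {MAX_OUTBOUND_BYTES}")
    return cleaned
```

Because the filter runs inside the tool rather than in the prompt, it applies even when the model itself produced the sensitive text. Pattern-based redaction only catches known secret shapes, so the field allowlist and size cap do most of the work.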

Policy-as-code example

The goal of policy-as-code is to make the execution layer fail closed even when the agent is manipulated. A prompt can be overridden by a single malicious message, but a WPC can be verified and enforced consistently for every tool call.

Below is a compact, JSON-like example of a WPC-style policy shape you can use in your organization. Treat it as a template: your WPC should specify the exact tool IDs you run in OpenClaw, the file scopes the sandbox can mount, and the only outbound destinations that can receive data.

{
  "policy_name": "no-exfiltration-tool-run",
  "version": 1,

  "tools": {
    "allow": [
      "fs.read_scoped",
      "fs.write_scoped",
      "webhook.send_scoped",
      "http.fetch_scoped"
    ],
    "deny": [
      "fs.read_all",
      "shell.exec_host",
      "uploader.send_unscoped"
    ]
  },

  "filesystem": {
    "read_paths": [
      "/workspace/input/",
      "/workspace/context/"
    ],
    "write_paths": [
      "/workspace/output/",
      "/workspace/tmp/"
    ],
    "deny_paths": [
      "/home/",
      "/root/",
      "/etc/",
      "/proc/",
      "/var/run/docker.sock"
    ]
  },

  "egress": {
    "domain_allowlist": [
      "hooks.example.com",
      "api.example.com"
    ],
    "ip_allowlist": [
      "203.0.113.10/32"
    ],
    "notes": "Egress allowlists are typically enforced at the network layer. This can be implemented outside clawproxy (optional) and should be treated as required for high-risk tools."
  },

  "dlp": {
    "redact_patterns": [
      "AKIA[0-9A-Z]{16}",
      "-----BEGIN PRIVATE KEY-----",
      "(?i)password\\s*[:=]"
    ],
    "max_outbound_bytes": 8192,
    "allowed_outbound_fields": [
      "event_type",
      "record_id",
      "status",
      "summary"
    ]
  },

  "webhooks": {
    "require_idempotency_key": true,
    "idempotency_ttl_seconds": 86400,
    "require_payload_signature": true
  }
}
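To show what "fail closed" could look like for the `tools` and `filesystem` sections above, here is a minimal sketch of the checks a tool runner might apply before each call. It is an illustration under assumptions, not OpenClaw's enforcement code: the function names are hypothetical, and the path check is purely lexical, so a real runner would still need the sandbox to handle symlinks and mounts.

```python
import posixpath

POLICY = {
    "tools": {
        "allow": ["fs.read_scoped", "fs.write_scoped", "webhook.send_scoped", "http.fetch_scoped"],
        "deny": ["fs.read_all", "shell.exec_host", "uploader.send_unscoped"],
    },
    "filesystem": {
        "read_paths": ["/workspace/input/", "/workspace/context/"],
        "write_paths": ["/workspace/output/", "/workspace/tmp/"],
        "deny_paths": ["/home/", "/root/", "/etc/", "/proc/", "/var/run/docker.sock"],
    },
}


def tool_allowed(policy: dict, tool_id: str) -> bool:
    """Deny wins over allow, and anything unlisted is denied."""
    tools = policy.get("tools", {})
    if tool_id in tools.get("deny", []):
        return False
    return tool_id in tools.get("allow", [])


def _within(path: str, root: str) -> bool:
    """True if path equals root or sits underneath it."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def path_allowed(policy: dict, path: str, mode: str) -> bool:
    """Check a normalized absolute path against deny paths, then the mode's scope."""
    if mode not in ("read", "write") or not path.startswith("/"):
        return False
    fs = policy.get("filesystem", {})
    normalized = posixpath.normpath(path)  # collapses ".." without touching disk
    if any(_within(normalized, denied) for denied in fs.get("deny_paths", [])):
        return False
    roots = fs.get("read_paths" if mode == "read" else "write_paths", [])
    return any(_within(normalized, root) for root in roots)
```

Both functions return `False` for anything the policy does not mention, so a missing section or an unknown tool ID results in a refusal rather than an implicit grant.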

What proof do you get?

When model calls are routed through clawproxy, each call yields Gateway receipts that can be verified later. These receipts are the backbone for post-incident questions like “which model response led to the outbound send?” and “did the model output contain the secret, or did the tool add it?”.

For an operational audit trail, package the receipts into a proof bundle together with run metadata you control, such as the WPC hash, the CST scope hash, tool invocation summaries, and hashes of produced artifacts. Store or publish the resulting artifact to Trust Pulse when you need a durable review surface for internal audit or external verification.
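As a rough illustration of the packaging idea, the sketch below hashes produced artifacts and seals the run metadata under a single digest so later tampering is detectable. The actual proof bundle format is defined by the platform and is not shown in this article; every field name and function here is a hypothetical stand-in.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _digest(body: dict) -> str:
    """Digest over a deterministic serialization of the bundle body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


def build_bundle(wpc_hash: str, cst_scope_hash: str, receipt_ids: list, artifacts: dict) -> dict:
    """Assemble run metadata and artifact hashes, then seal them with a digest.

    `artifacts` maps an artifact name to its raw bytes; only hashes are stored.
    """
    body = {
        "wpc_hash": wpc_hash,
        "cst_scope_hash": cst_scope_hash,
        "receipt_ids": list(receipt_ids),
        "artifact_hashes": {name: sha256_hex(data) for name, data in artifacts.items()},
    }
    return {**body, "bundle_digest": _digest(body)}


def verify_bundle(bundle: dict) -> bool:
    """Recompute the digest over everything except the digest field itself."""
    body = {key: value for key, value in bundle.items() if key != "bundle_digest"}
    return bundle.get("bundle_digest") == _digest(body)
```

A plain digest only detects accidental or naive modification. For evidence that must hold up against someone who can rewrite the bundle, the digest would also need to be signed or anchored somewhere the writer cannot alter.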

Rollback posture

Exfiltration controls should assume some actions cannot be undone. The rollback plan is therefore about making irreversible sends hard to execute, easy to detect, and attributable to a specific job configuration.

Action	Safe rollback	Evidence
Outbound webhook send with sensitive payload	Disable the webhook tool in the WPC, rotate destination secrets, and invalidate job CST issuance for that workflow until reviewed.	Proof bundle containing the WPC hash, CST scope hash, and correlated Gateway receipts around the decision point.
File upload to external system	Remove upload capability from the policy, revoke external credentials, and request deletion via official API if the vendor supports it.	Artifact hashes recorded in the proof bundle, plus tool invocation logs you attach for chain-of-custody.
Changefeed write of sensitive fields	Stop the writer tool, run a compensating scrub job via official API, and tighten field allowlists to only non-sensitive fields.	Proof bundle metadata showing which tool wrote, with timestamps aligned to Gateway receipts for the triggering model output.
Over-broad filesystem mount discovered after the fact	Reconfigure sandbox mounts to minimal scopes, rotate any secrets that were accessible, and re-run the OpenClaw security audit.	Configuration diffs plus a new proof bundle for the remediated run; keep the prior bundle for incident review.

FAQ

Why is prompt-only “do not exfiltrate data” not sufficient?

Because prompts are not an enforcement boundary. A single injected instruction can cause the agent to call a tool with real side effects, so the execution layer must be permissioned using policy-as-code that the runtime can fail closed on.

How do WPC and CST reduce webhook replay risk?

A WPC defines what is allowed, and a CST can be pinned to the WPC hash so the job can only act under that policy. With job-scoped CST binding, a captured token is less useful outside the original job context, which helps reduce cross-job replay.
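The binding check described here can be sketched as a comparison between the token's claims and the job that is trying to use it. The claim names (`job_id`, `wpc_hash`, `exp`) are assumptions for illustration; the real CST format and its signature verification belong to clawscope and are not reproduced here.

```python
import time

REQUIRED_CLAIMS = ("job_id", "wpc_hash", "exp")


def cst_binding_ok(claims: dict, job_id: str, wpc_hash: str, now=None) -> bool:
    """Accept a token only for the job and policy hash it was issued for.

    Assumes `claims` came from a token whose signature was already verified.
    """
    if any(name not in claims for name in REQUIRED_CLAIMS):
        return False  # fail closed on incomplete tokens
    current = time.time() if now is None else now
    try:
        expired = float(claims["exp"]) <= current
    except (TypeError, ValueError):
        return False
    if expired:
        return False
    return claims["job_id"] == job_id and claims["wpc_hash"] == wpc_hash
```

A token captured from job A fails the `job_id` comparison when replayed in job B, and a token issued under an older policy fails the `wpc_hash` comparison after the WPC changes.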

Where should egress allowlists be enforced?

For high assurance, enforce egress allowlists at the network layer close to the sandbox or runner. This enforcement sits outside clawproxy and is an optional, separately implemented control, but for exfiltration prevention you should treat it as mandatory for any tool that sends data outbound.
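An outbound tool can also run a cheap pre-check before it opens a connection. The sketch below is a complement to network-layer enforcement, not a replacement for it: the allowlist values are taken from the policy example, while the function name and the HTTPS-only rule are assumptions added for illustration.

```python
import ipaddress
from urllib.parse import urlsplit

DOMAIN_ALLOWLIST = {"hooks.example.com", "api.example.com"}
IP_ALLOWLIST = [ipaddress.ip_network("203.0.113.10/32")]


def egress_allowed(url: str) -> bool:
    """Allow only exact allowlisted hostnames or IPs inside allowlisted networks."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, fail closed
    if parts.scheme != "https" or not host:  # assumption: plaintext HTTP is refused
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: require an exact hostname match, no suffix matching.
        return host.rstrip(".") in DOMAIN_ALLOWLIST
    return any(address in network for network in IP_ALLOWLIST)
```

Exact matching is deliberate: `hooks.example.com.evil.net` and `hooks.example.com@evil.net` both fail. A hostname check does not pin the IP address the name resolves to, which is one reason the network layer remains the authoritative enforcement point.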

How do I make webhooks idempotent in an agent workflow?

Require an idempotency key per outbound send, persist a dedupe record for a fixed TTL, and reject duplicates even if the model retries. Bind the idempotency key format to job identifiers so a copied key cannot be replayed in a different run.
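A minimal sketch of that pattern follows. The key derivation, class name, and in-memory store are assumptions for illustration; a production webhook tool would keep dedupe records in a durable store shared by all workers.

```python
import hashlib
import json
import time


def idempotency_key(job_id: str, destination: str, payload: dict) -> str:
    """Derive a key bound to the job, the destination, and the exact payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    material = "\n".join([job_id, destination, body])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class DedupeStore:
    """In-memory dedupe records that expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 86400, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}

    def claim(self, key: str) -> bool:
        """Return True the first time a key is seen within the TTL, else False."""
        now = self._clock()
        # Drop expired records so the store stays bounded.
        self._expires_at = {k: t for k, t in self._expires_at.items() if t > now}
        if key in self._expires_at:
            return False
        self._expires_at[key] = now + self._ttl
        return True
```

The webhook tool calls `claim` before sending and skips the send when it returns `False`, so a model retry with the same payload is a no-op. Because the job identifier is part of the key material, the same payload sent from a different job produces a different key and is evaluated on its own.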

What should I audit after changing tool permissions?

Run the OpenClaw security audit and confirm the effective sandbox and tool policy configuration, not just the intended config. Then run a canary job and review the proof bundle to ensure the WPC hash and CST scope hash match what you expect.
