Inbox phishing triage with safe links policy | Secure Agent Workflow

This workflow runs inbox phishing triage under a safe-links-first posture, where the agent is not allowed to click, fetch, or forward anything unless a policy allows it. OpenClaw is the baseline agent runtime, and Claw EA adds permissioned execution so the agent can only take actions covered by a Work Policy Contract (WPC) and a scoped token (CST).

Prompt-only controls fail because an attacker can embed instructions inside an email, PDF, or landing page and push the model to take irreversible actions. Policy-as-code is the execution layer that stays in force even when the prompt is compromised, and it produces evidence (gateway receipts and a proof bundle) you can verify after the fact.

Step-by-step runbook

Define the triage scope and outputs. Decide what the agent is allowed to produce (for example: a verdict, indicators list, and a recommended containment action). Decide what is forbidden without human approval: clicking links, fetching external content, forwarding messages, and quarantining messages.
Publish a WPC for “phishing triage: read-only”. The WPC is a signed, hash-addressed policy artifact served by clawcontrols, and it should express tool limits, data handling rules, and the safe-links policy requirement. Treat the WPC hash as a release artifact, and change it only through your normal change control.
Issue a CST pinned to the WPC. Use clawscope to mint a CST for this job class. Policy hash pinning is optional in clawscope; enable it here so the token cannot be reused under a different policy. Where possible, also bind the CST to a single triage job to reduce its replay value.
Run the agent in a sandbox with minimal tools. In OpenClaw, enable sandboxing for the session and keep the tool profile narrow. Prefer a read-only workspace and deny elevated execution for this workflow so “analyze” cannot turn into “execute”.
Integrate mailbox access with least privilege. For Microsoft 365, mailbox access typically goes through the official Microsoft Graph API, constrained by Graph permissions/scopes, with authentication governed by Entra ID. If you use Conditional Access and PIM, apply the same controls to the service principal or delegated user that backs the triage runner.
Enforce safe link handling at the policy boundary. The agent must not dereference URLs directly. If a link must be evaluated, the workflow should rely on Microsoft Defender for Office 365 Safe Links policy outcomes when they are available. When they are not, the only permitted alternative is submitting the URL to an approved reputation service endpoint through allowlisted egress (optional and environment-specific).
Capture and store verification artifacts. Route model calls through clawproxy so you get gateway receipts for each model call. Store the resulting proof bundle and, when needed, publish a Trust Pulse for audit viewing and later review.
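The "do not dereference" rule in the runbook still leaves room for extracting indicators from message text. The sketch below is one minimal way to pull URLs out of a body and defang them for case notes without making any network call. The function names and the regular expression are illustrative assumptions, not part of OpenClaw or Claw EA.

```python
import re

# Hypothetical helpers for illustration; not an OpenClaw or Claw EA API.
# The pattern is deliberately simple and will miss obfuscated links.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_urls(body: str) -> list:
    """Return unique URLs in order of first appearance. Nothing is fetched."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(body):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def defang(url: str) -> str:
    """Make a URL non-clickable for case notes (hxxp and [.] convention)."""
    scheme, _, rest = url.partition("://")
    scheme = scheme.lower().replace("http", "hxxp")
    return scheme + "[://]" + rest.replace(".", "[.]")


body = "Verify at https://login.example.com/reset. Or http://evil.test/a?b=1, thanks"
indicators = [defang(u) for u in extract_urls(body)]
```

Defanged strings can go into the indicators list safely, because a reader or downstream tool cannot follow them by accident.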

Threat model

This workflow assumes inbound email content is hostile. The main risk is that the content manipulates the model into taking irreversible actions or exfiltrating data, especially through link clicking, external fetches, or message forwarding.

Threat	What happens	Control
Prompt injection inside the email body	The model is instructed to “verify the link” or “forward to security” and attempts prohibited actions.	Permissioned execution via WPC tool restrictions; deny forward/quarantine tools; enforce “no URL fetch” rule at the tool layer, not in the prompt.
Malicious link and drive-by content	A click or fetch pulls attacker-controlled content and can cascade into more instructions or credential harvesting.	Safe-links-first handling; do not dereference URLs; only consult Safe Links policy results or allowlisted reputation endpoints (optional) using controlled egress.
Data exfiltration through model calls	Email content, headers, or PII gets sent to a model provider without a record, or gets mixed into logs.	Route model traffic through clawproxy to generate gateway receipts; enforce redaction and “only required fields” extraction in the agent plan; keep mailbox reads scoped and minimal.
Over-permissioned Graph access	The runner can delete, forward, or quarantine broadly, turning a single compromise into a tenant-wide incident.	Least-privilege Microsoft Graph permissions/scopes; require Entra ID governance (Conditional Access, PIM where applicable); separate read-only triage identity from remediation identity.
Replay of a prior authorization	A token is reused later to rerun triage or perform unintended actions.	Marketplace anti-replay binding (job-scoped CST binding); short-TTL CST; optional policy hash pinning to the WPC.

Policy-as-code example

This is a simplified, JSON-like WPC sketch for read-only triage with a safe links policy. It shows the parts that matter operationally: deny irreversible actions, restrict network egress, and add explicit handling for URL artifacts.

{
  "wpc_version": "1",
  "workflow": "phishing_response.inbox_triage",
  "mode": "read_only",
  "tools": {
    "allow": [
      "mail.read_headers",
      "mail.read_body",
      "mail.read_attachments_metadata",
      "ioc.extract",
      "report.write_case_notes"
    ],
    "deny": [
      "mail.forward",
      "mail.quarantine",
      "mail.delete",
      "browser.open",
      "http.fetch",
      "shell.exec",
      "tools.elevated"
    ]
  },
  "network": {
    "egress_allowlist": {
      "domains": [
        "graph.microsoft.com",
        "security.microsoft.com"
      ],
      "ips": [
        "0.0.0.0/32"
      ]
    }
  },
  "url_handling": {
    "do_not_dereference": true,
    "allowed_resolution": [
      "defender_safe_links_verdict_only"
    ]
  },
  "input_validation": {
    "treat_external_text_as_untrusted": true,
    "strip_or_quote": [
      "instructions_in_email_body",
      "instructions_in_attachment_text",
      "instructions_in_web_previews"
    ],
    "require_explicit_user_intent_for": [
      "forward",
      "quarantine",
      "external_fetch"
    ]
  }
}

The key point is that the deny list blocks the irreversible operations even if the model tries to call them. The egress allowlist is shown explicitly because a safe-links posture breaks down if the agent can fetch arbitrary URLs. Egress control is an optional hardening layer that you implement in your own environment and enforce outside clawproxy.
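To make the enforcement semantics concrete, here is a minimal sketch of how a tool layer might evaluate a policy shaped like the WPC sketch above. It assumes deny-wins ordering, default-deny for unlisted tools, and exact hostname matching over HTTPS. Those are design choices made for this illustration, not documented Claw EA behavior, and the function names are hypothetical.

```python
import json
from urllib.parse import urlsplit

# Trimmed copy of the WPC sketch; field names follow the example above.
WPC = json.loads("""
{
  "tools": {
    "allow": ["mail.read_headers", "mail.read_body", "ioc.extract"],
    "deny": ["mail.forward", "mail.quarantine", "http.fetch", "browser.open"]
  },
  "network": {
    "egress_allowlist": {
      "domains": ["graph.microsoft.com", "security.microsoft.com"]
    }
  }
}
""")


def tool_allowed(wpc: dict, tool: str) -> bool:
    """Deny wins over allow; anything unlisted is denied (assumed default)."""
    tools = wpc["tools"]
    if tool in tools["deny"]:
        return False
    return tool in tools["allow"]


def egress_allowed(wpc: dict, url: str) -> bool:
    """Allow only HTTPS requests whose hostname exactly matches the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    domains = wpc["network"]["egress_allowlist"]["domains"]
    return parts.scheme == "https" and host in domains
```

Exact matching is intentional: a suffix or substring check would let a lookalike host such as graph.microsoft.com.evil.test pass.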

What proof do you get?

Each model call routed through clawproxy emits gateway receipts that are signed and can be verified later. These receipts let you answer basic questions with evidence: which model was called, when, with what policy context, and whether the call was bound to the expected job-scoped CST.

At the end of a run, Claw EA packages the receipts and related metadata into a proof bundle for audit and verification. If you need a durable, reviewable artifact for stakeholders, you can store and view it as a Trust Pulse, and tie it back to the exact WPC hash and CST scope hash used for the run.
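As an illustration of what "verify later" can mean in practice, the sketch below checks a proof bundle for signature validity and for binding to the expected WPC hash and job. The bundle layout, the field names, and the use of HMAC-SHA256 are all assumptions made for this example. The actual receipt format and signature scheme are not described here, and a production scheme would more likely use asymmetric signatures.

```python
import hashlib
import hmac
import json


def sign_receipt(key: bytes, receipt: dict) -> str:
    """Stand-in signer: HMAC-SHA256 over canonical JSON (assumed scheme)."""
    payload = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_bundle(key: bytes, bundle: dict, wpc_hash: str, job_id: str) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for index, entry in enumerate(bundle["receipts"]):
        receipt = entry["receipt"]
        expected_sig = sign_receipt(key, receipt)
        # compare_digest avoids leaking information through timing.
        if not hmac.compare_digest(expected_sig, entry["signature"]):
            problems.append(f"receipt {index}: signature mismatch")
        if receipt.get("wpc_hash") != wpc_hash:
            problems.append(f"receipt {index}: unexpected WPC hash")
        if receipt.get("job_id") != job_id:
            problems.append(f"receipt {index}: unexpected job binding")
    return problems
```

Returning a list of problems rather than a single boolean gives an auditor the specific receipt and check that failed.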

Rollback posture

Phishing triage has a safer rollback posture if you keep the agent read-only and push containment into a separate, approval-gated workflow. That separation reduces the chance that a single injected email can trigger quarantines or message forwarding at scale.

Action	Safe rollback	Evidence
Classify and summarize a suspected phish (read-only)	Discard the result; rerun with a newer WPC if needed.	Proof bundle linking run inputs, WPC hash, CST scope hash, and gateway receipts.
Write analyst notes into a case system	Revert the case note entry via the case system’s audit history.	Gateway receipts for the model calls that produced the note; proof bundle tying the note text to the run.
Quarantine messages (high risk)	Move back or release from quarantine, depending on the mail system’s capabilities.	Should be in a separate workflow with its own WPC; proof bundle proves whether quarantine was even permitted.
Forward emails (high risk)	Not reliably reversible once delivered externally.	WPC deny list should block it in this workflow; proof bundle shows attempted tool calls if the agent tried.
Fetch external content from URLs (high risk)	Not reversible; can trigger secondary payload delivery or tracking.	WPC and egress allowlist prevent dereference; the proof bundle shows whether any fetch tool call was attempted.

FAQ

Why is policy-as-code required instead of “tell the agent not to click links”?

Because the email is untrusted input, and it can contain instructions engineered to override the prompt. Policy-as-code enforces the rule at execution time, so even if the model tries to call a browser or fetch tool, the call is denied by the WPC.

How do Safe Links policies fit into this workflow?

Safe Links is the Microsoft Defender for Office 365 feature that rewrites URLs in messages and checks them at time of click. This workflow treats Safe Links outcomes as the only acceptable “link evaluation” signal, and it blocks direct URL dereferencing by the agent.
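Because rewritten links wrap the original destination, triage can recover the underlying URL by parsing the wrapper instead of following it. The sketch below assumes the commonly observed wrapper form, a host under safelinks.protection.outlook.com with the destination in a `url` query parameter. Confirm that form against your own tenant's mail before relying on it. The function name is hypothetical, and the code only parses strings. It does not return a Safe Links verdict.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumed wrapper host suffix; verify against links seen in your tenant.
SAFE_LINKS_SUFFIX = ".safelinks.protection.outlook.com"


def unwrap_safe_link(url: str) -> Optional[str]:
    """Return the original URL from a Safe Links wrapper, or None if not wrapped.

    This is pure string parsing; no request is made to either URL.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host.endswith(SAFE_LINKS_SUFFIX):
        return None
    values = parse_qs(parts.query).get("url")  # parse_qs percent-decodes values
    return values[0] if values else None


wrapped = (
    "https://nam02.safelinks.protection.outlook.com/"
    "?url=https%3A%2F%2Fexample.com%2Flogin%3Fid%3D1&data=abc&reserved=0"
)
original = unwrap_safe_link(wrapped)
```

The recovered URL is an indicator for the case notes. It is still subject to the same no-dereference rule as any other link.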

Can this triage run against Microsoft 365 mailboxes?

Yes, typically through the official Microsoft Graph API, using Graph permissions/scopes with Entra ID authentication. Keep triage read-only and use a separate identity and WPC for any remediation actions like quarantining.
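One way to keep mailbox reads minimal is to request only the fields triage needs. The sketch below builds a Microsoft Graph v1.0 URL for a single message with a `$select` clause and makes no network call. The field list and helper name are assumptions for illustration, and the exact permission to grant (for example, a read-only mail permission such as Mail.Read) should be checked against current Graph documentation for your access model.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

# Assumed minimal field set for triage; adjust to what your verdict logic needs.
TRIAGE_FIELDS = ("subject", "from", "receivedDateTime", "internetMessageHeaders")


def message_read_url(user_id: str, message_id: str, fields=TRIAGE_FIELDS) -> str:
    """Build the GET URL for one message, limited to the selected fields."""
    select = ",".join(fields)
    user = quote(user_id, safe="@")        # keep the UPN readable
    message = quote(message_id, safe="")   # message IDs may contain '=' or '/'
    return f"{GRAPH_BASE}/users/{user}/messages/{message}?$select={select}"


url = message_read_url("analyst@contoso.com", "AAMk=")
```

The runner would send this as an authenticated GET under the read-only triage identity. Token acquisition is omitted here because it depends on the Entra ID setup.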

What stops token reuse or replay across jobs?

CSTs are issued by clawscope and can be bound to a job using marketplace anti-replay binding (job-scoped CST binding). You can also use optional policy hash pinning so the CST is only valid under the intended WPC.
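The binding logic can be pictured as three checks made before any action runs: the token has not expired, it was issued for this job, and it is pinned to the policy in force. The sketch below is a simplified model with hypothetical names and fields. It does not reflect the actual CST format or the clawscope API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical stand-in for a CST; field names are assumptions."""
    job_id: str        # the single triage job this token was minted for
    wpc_hash: str      # policy hash the token is pinned to
    expires_at: float  # Unix timestamp; a short TTL limits replay value


def token_valid_for(token: ScopedToken, job_id: str, wpc_hash: str,
                    now: Optional[float] = None) -> bool:
    """Reject expired tokens and tokens presented under another job or policy."""
    current = time.time() if now is None else now
    if current >= token.expires_at:
        return False
    if token.job_id != job_id:
        return False
    return token.wpc_hash == wpc_hash


token = ScopedToken(job_id="job-1", wpc_hash="sha256:abc", expires_at=1000.0)
```

With these checks, a token captured from one triage run is useless for a different job, under a different WPC, or after its TTL has passed.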

What does an auditor actually review after an incident?

They review the WPC hash that defined allowed actions, the CST scope hash used for the run, and the gateway receipts for model calls emitted by clawproxy. The proof bundle packages these artifacts so verification does not rely on screenshots or application logs alone.
