Token and cost budgets only work when they are enforced by the execution layer, not by a prompt. In Claw EA, you express budgets in a WPC and bind a job to that policy using a CST, so the runtime can prove what policy governed the run and what was actually spent.

OpenClaw is the baseline agent runtime: it already separates sandboxing from tool policy and provides an operational boundary where budget checks can be implemented. Gateway receipts and a proof bundle let you reconcile usage after the fact, and checks at that boundary can, where you choose, fail closed when a call would exceed the budget.

Step-by-step runbook

  1. Define a budgeted policy as a WPC. Include limits such as max tokens per call, max tokens per job, and an optional max cost per job in USD.

    Keep the policy narrow: include model allowlists and tool allowlists so a budget exception cannot be “fixed” by switching to a higher-cost model or by invoking extra tools.

  2. Publish the WPC to the WPC registry and record its policy hash. Treat the policy hash as the deployment artifact you promote through environments.

    In change control, review the diff of the policy payload, not just a human description.

  3. Issue a CST for the job and pin it to the budget policy hash. This makes the job’s authorization and its budget contract inseparable in downstream verification.

    Use job-scoped CST binding to reduce replay risk, so a token from one job cannot be reused to run another job under a different context.

  4. Run the agent in OpenClaw with tool policy and sandbox settings aligned to the WPC. Use OpenClaw tool policy to control which tools exist, and sandboxing to limit blast radius when tools execute.

    For model calls, route traffic through clawproxy so each call produces gateway receipts tied to the job and policy context.

  5. Implement budget checks at the runtime boundary. Today this is typically done in the OpenClaw provider/tool plugin (or via an MCP server) by reading the pinned policy hash and maintaining a per-job usage ledger.

    Automatic cost budget enforcement inside clawproxy is planned, so design for both “monitor” mode (audit and alert) and “enforce” mode (fail closed), and choose between them based on your risk tolerance.

  6. Collect gateway receipts and emit a proof bundle at the end of the job. Store the proof bundle in your evidence system, or publish it to Trust Pulse when you need a stable artifact for audit viewing.

    As a separate finance control, reconcile token totals and approximate cost from receipts against billing exports pulled from the provider's official API.
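The per-job usage ledger from step 5 can be sketched as follows. This is a minimal illustration, not the OpenClaw plugin API: the `JobLedger` class, its method names, and the `BudgetExceeded` exception are hypothetical, and the limit fields simply mirror the budget names used in this article.

```python
from dataclasses import dataclass
from typing import List


class BudgetExceeded(Exception):
    """Raised in enforce mode when the next call would break the budget."""


@dataclass
class JobLedger:
    # Hypothetical ledger; limits mirror the budget fields named in this article.
    max_tokens_per_call: int
    max_tokens_per_job: int
    max_cost_usd_per_job: float
    enforcement_mode: str = "monitor"  # "monitor" or "enforce"
    tokens_used: int = 0
    cost_usd: float = 0.0

    def check(self, requested_tokens: int, estimated_cost_usd: float) -> List[str]:
        """Return the names of the limits the next call would violate."""
        violations = []
        if requested_tokens > self.max_tokens_per_call:
            violations.append("max_tokens_per_call")
        if self.tokens_used + requested_tokens > self.max_tokens_per_job:
            violations.append("max_tokens_per_job")
        if self.cost_usd + estimated_cost_usd > self.max_cost_usd_per_job:
            violations.append("max_cost_usd_per_job")
        return violations

    def authorize(self, requested_tokens: int, estimated_cost_usd: float) -> List[str]:
        """Fail closed in enforce mode; in monitor mode, report violations only."""
        violations = self.check(requested_tokens, estimated_cost_usd)
        if violations and self.enforcement_mode == "enforce":
            raise BudgetExceeded(", ".join(violations))
        return violations

    def record(self, tokens: int, cost_usd: float) -> None:
        """Add the actual usage of a completed call to the job totals."""
        self.tokens_used += tokens
        self.cost_usd += cost_usd
```

A caller would invoke `authorize` before each model call and `record` after it. In monitor mode the returned violations can feed alerts without blocking the job.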

Threat model

Budgets fail when they are only “instructions” to the model. A model can ignore the instruction, split work into more calls, switch models, or route work through unmetered tools unless the execution layer blocks those paths.

| Threat | What happens | Control |
| --- | --- | --- |
| Prompt-only budget bypass | The agent keeps calling the model after “reaching the limit” because nothing enforces the limit at the call boundary. | Put budgets in a WPC and enforce at the provider/tool boundary; route model calls through clawproxy so every call is receipted. |
| Model switching to increase spend | The agent moves from a cheap model to an expensive one to “finish faster”, defeating cost assumptions. | WPC model allowlist plus CST policy hash pinning; reject calls for models not permitted by the pinned policy. |
| Token fragmentation | The agent splits a task into many smaller calls to evade per-call token limits and inflate total usage. | Use both per-call limits and per-job limits in the WPC; track cumulative totals in the runtime ledger, and fail closed when exceeded. |
| Replay of an authorized token | A CST captured from logs or a misconfigured host is reused to run additional calls outside the intended job window. | Marketplace anti-replay binding with job-scoped CST binding; keep CST TTL short and rotate frequently. |
| Tool side-channel spend | The agent offloads work to tools that call external LLMs or services directly, outside your metering. | OpenClaw tool policy to restrict tools; require that model access occurs only via the clawproxy-routed provider in the approved profile. |
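Several of these controls reduce to checks made before each model call. The sketch below shows one possible pre-call gate covering CST expiry, job-scoped binding, and the model allowlist. The `Cst` fields and the `gate_model_call` function are assumptions made for illustration; the real CST format is not described here.

```python
import time
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Cst:
    # Hypothetical CST claims; the actual token format is an assumption.
    job_id: str
    policy_hash: str
    expires_at: float  # Unix timestamp in seconds


def gate_model_call(
    cst: Cst,
    job_id: str,
    model: str,
    allowed_models: Set[str],
    now: Optional[float] = None,
) -> Optional[str]:
    """Return None if the call may proceed, otherwise a denial reason."""
    now = time.time() if now is None else now
    if now >= cst.expires_at:
        return "cst_expired"  # short TTLs limit the replay window
    if cst.job_id != job_id:
        return "cst_job_mismatch"  # token was issued for a different job
    if model not in allowed_models:
        return "model_not_allowed"  # blocks switching to a costlier model
    return None
```

Returning a reason string rather than a boolean keeps denials explainable in logs, which matters when a blocked call has to be justified later.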

Policy-as-code example

This is a compact, JSON-like sketch of a WPC payload that expresses token and cost budgets. Enforcement can be “monitor” (audit-only) or “enforce” (fail closed), depending on what you wire into your OpenClaw provider plugin or MCP server.

{
  "wpc_version": "1",
  "policy_name": "agent-budgets-prod",
  "job": {
    "max_runtime_seconds": 1800,
    "budget": {
      "max_tokens_per_call": 4096,
      "max_tokens_per_job": 120000,
      "max_cost_usd_per_job": 25.00,
      "enforcement_mode": "monitor"
    }
  },
  "model_policy": {
    "allowed_models": [
      "openrouter:.../gpt-4o-mini",
      "openrouter:.../claude-3.5-sonnet"
    ]
  },
  "tool_policy": {
    "profile": "allowlist",
    "allow": ["read", "write", "exec", "http"],
    "deny": ["elevated.exec"]
  },
  "receipt_policy": {
    "require_gateway_receipts": true,
    "require_proof_bundle_on_completion": true
  }
}

Operationally, you pin the WPC policy hash into the CST at job start, and you reject any model call whose effective policy hash does not match. That makes “which budget applied” verifiable later, even if the agent output is messy.
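To make the pin check concrete, here is a minimal sketch. It assumes the policy hash is a SHA-256 digest over a canonical JSON encoding of the payload; the WPC registry's actual hashing scheme is not specified in this article, and the function names are hypothetical.

```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    """SHA-256 over canonical JSON (an assumed scheme, not the registry's)."""
    # Sorted keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def call_matches_pin(pinned_hash: str, effective_policy: dict) -> bool:
    """True only if the policy in effect hashes to the value pinned in the CST."""
    return pinned_hash == policy_hash(effective_policy)
```

Any change to a limit or an allowlist produces a different hash, so a job cannot quietly run under a looser policy than the one its CST names. Number formatting (for example `25.00` versus `25.0`) is one reason a real scheme needs a precisely defined canonical form.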

What proof do you get?

You get gateway receipts for model calls emitted by clawproxy. Receipts are the durable unit you can count and verify: model, timestamps, and accounting-relevant metadata that can be rolled up into totals for a job.
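Rolling receipts up into job totals might look like the sketch below. The receipt fields (`model`, `input_tokens`, `output_tokens`) and the price table are assumptions for illustration; the actual clawproxy receipt schema and your provider's pricing will differ.

```python
from collections import defaultdict
from typing import Dict, Iterable


def roll_up(receipts: Iterable[dict], usd_per_1k_tokens: Dict[str, float]) -> dict:
    """Sum tokens per model and derive an approximate cost for one job."""
    tokens_by_model: Dict[str, int] = defaultdict(int)
    for receipt in receipts:
        # Hypothetical receipt fields; adapt to the real receipt schema.
        tokens_by_model[receipt["model"]] += (
            receipt["input_tokens"] + receipt["output_tokens"]
        )
    approx_cost = sum(
        tokens / 1000 * usd_per_1k_tokens[model]
        for model, tokens in tokens_by_model.items()
    )
    return {
        "tokens_by_model": dict(tokens_by_model),
        "total_tokens": sum(tokens_by_model.values()),
        "approx_cost_usd": round(approx_cost, 6),
    }
```

The cost here is only an estimate from a flat per-token price, which is why the totals should still be reconciled against provider billing.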

At the end of a run, you produce a proof bundle that packages receipts and related metadata needed for audit and verification. If you need a stable artifact for sharing or review, you can store and view it via Trust Pulse.

Budgets become auditable in two directions. You can show the intended limits (the WPC hash and content) and the actual usage (receipts plus the derived totals), and you can explain any “fail closed” event as a policy decision rather than an operator guess.

Rollback posture

Budget controls should have a safe rollback that does not widen privileges. Treat rollbacks as policy changes: create a new WPC, pin the new hash, and re-run with explicit evidence.

| Action | Safe rollback | Evidence |
| --- | --- | --- |
| Job hits token or cost cap mid-run | Stop the job and re-run with a new WPC that raises limits, plus a tighter tool/model allowlist if needed. | Old and new WPC hashes, CST policy hash pinning for each job, proof bundles showing where the cap triggered. |
| Unexpected model spend spike | Roll back to a cheaper allowed model set by publishing a new WPC, then pin and re-issue CSTs for subsequent jobs. | Gateway receipts showing the model used per call, plus the WPC model allowlist for the run. |
| Tool path bypasses metering | Disable or deny the tool in OpenClaw tool policy and re-run in sandboxed mode; require model access via clawproxy-routed provider only. | OpenClaw config snapshot, WPC tool policy, and proof bundle demonstrating all model calls were receipted. |
| Credential exposure concerns | Revoke CSTs, rotate upstream provider keys, and re-run with shorter CST TTL and stricter job scoping. | Revocation logs (your system), job-scoped CST binding evidence, and proof bundles for affected jobs. |
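One way to check that a rollback policy does not widen privileges is to compare allowlists before publishing. The sketch below assumes the payload layout from the JSON sketch in this article (`model_policy.allowed_models` and `tool_policy.allow`); the function name is hypothetical.

```python
from typing import List


def widened_privileges(old: dict, new: dict) -> List[str]:
    """List the allowlist entries the new policy adds relative to the old one."""
    added: List[str] = []
    old_models = set(old["model_policy"]["allowed_models"])
    new_models = set(new["model_policy"]["allowed_models"])
    added += [f"model:{m}" for m in sorted(new_models - old_models)]
    old_tools = set(old["tool_policy"]["allow"])
    new_tools = set(new["tool_policy"]["allow"])
    added += [f"tool:{t}" for t in sorted(new_tools - old_tools)]
    return added
```

An empty result means the new policy permits no model or tool that the old one did not. A non-empty result is a prompt for explicit review, not necessarily a rejection.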

FAQ

Why can’t I just tell the agent “do not exceed $10” in the prompt?

Because prompts are not a permission boundary. Budgets need a call-site decision: allow or deny the next model call, enforced by the runtime or proxy under a pinned policy hash.

What is the minimum I need to deploy to get budget evidence?

Use a WPC to define the limits, issue a CST pinned to that policy hash, and route model calls through clawproxy to generate gateway receipts. Then package receipts into a proof bundle for the job.

Do you enforce cost budgets automatically at the proxy?

Automatic cost budget enforcement is planned. Today you can implement enforcement in your OpenClaw provider/tool plugin (or via MCP server) and use gateway receipts plus the pinned WPC to prove what happened.

How does this relate to Microsoft 365 agents and Graph permissions?

Budgets are separate from Graph permissions. Use Entra ID and least-privilege Microsoft Graph scopes to limit which actions are possible through the official API, then use WPC budgets to limit how much model work is permitted while performing those actions.

What happens if I need to raise limits for a single incident?

Create a new WPC with the temporary limits, pin it in a job-scoped CST, and run a separate job. That gives you a clean audit trail showing the exception policy and the proof bundle for the exception run.
