Cloudflare Workers 10ms CPU Limit Optimization Guide

The 10ms Wall Is Real, and It Will Find You

Cloudflare Workers on the Free Plan enforces a hard 10ms CPU time limit per request. Cross it, and the runtime terminates your Worker with Error 1102 Worker exceeded resource limits. No retry, no grace period. The client gets a 5xx and your logs show exceeded CPU in the Outcome field.

The part that trips up most engineers: CPU time is not wall-clock time. It counts only the time your JavaScript is actually executing on the CPU — parsing, serializing, object allocation, and anything else that keeps the thread busy. The time spent waiting on fetch() to return does not count. You can wait 800ms for an upstream API response and that entire wait costs you maybe 0.1ms of CPU time. So a Worker that makes an external HTTP call and relays the result sounds trivially cheap.

It is not, and measuring one in production proved that.

A Naive Relay Worker Burned 5.6ms, Most of It on JSON

Consider a Worker with a narrow job: receive an encrypted payload from a mobile client, forward it to a transactional email API, and report back the result. No decryption, no transformation, no business logic. Just relay.

Here is roughly what the first implementation looked like:

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const body = await req.json<{ to: string; ciphertext: string; nonce: string }>();

    const upstream = await fetch("https://api.transactional-mail.example/emails", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${env.MAIL_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        from: "noreply@example.com",
        to: body.to,
        subject: "Encrypted Note",
        text: `${body.nonce}\n${body.ciphertext}`,
      }),
    });

    const result = await upstream.json();
    return new Response(JSON.stringify({ ok: upstream.ok, id: result.id ?? null }), {
      status: upstream.status,
      headers: { "Content-Type": "application/json" },
    });
  },
};

About two dozen lines. Reads clean. Works fine in local testing.

In production, with a 1.6KB ciphertext payload, performance.now() checkpoints placed around each logical step showed this breakdown:

Operation	CPU time
`await req.json()`	1.9ms
`JSON.stringify` to build upstream body	1.4ms
`await upstream.json()` (parsing ~400B response)	0.6ms
`JSON.stringify` to build response	0.4ms
Object construction, header wiring, miscellaneous	1.3ms
Total	5.6ms
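A breakdown like the one above can be collected with a small checkpoint helper. The following is a minimal sketch, not the instrumentation actually used for these numbers; the `Checkpoints` class and its method names are hypothetical. One caveat to verify against the current Workers documentation: in deployed Workers, timers such as performance.now() are documented to advance only after I/O, so readings around purely synchronous code can be coarse there.

```typescript
// Hypothetical helper: records the elapsed time between named checkpoints.
class Checkpoints {
  private readonly now: () => number;
  private last: number;
  readonly spans: Array<{ label: string; ms: number }> = [];

  // The clock is injectable so the helper can be tested deterministically.
  constructor(now: () => number = () => performance.now()) {
    this.now = now;
    this.last = now();
  }

  // Close the current span under `label` and start timing the next one.
  mark(label: string): void {
    const t = this.now();
    this.spans.push({ label, ms: t - this.last });
    this.last = t;
  }

  // Sum of all recorded spans.
  total(): number {
    return this.spans.reduce((sum, s) => sum + s.ms, 0);
  }
}
```

In the handler, this would look like creating `const cp = new Checkpoints()` at the top, calling `cp.mark("req.json")` after the parse, and so on for each step, then logging `cp.spans` once at the end.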

At a 5.6ms median you might think you have comfortable headroom. But that median hides the tail. When users write longer notes, the ciphertext grows and JSON serialization scales with it. When the upstream mail API returns a verbose error body, parsing that response takes longer too. Both conditions can coincide. A 1.5ms serialization spike on top of the 5.6ms baseline already puts you at 7.1ms (the baseline includes the 1.3ms of miscellaneous overhead, so it is not added a second time). From there, one upstream hiccup that returns a fatter payload on the same request is enough to push you over 10ms.

Over two weeks of Free Plan operation, exceeded CPU outcomes appeared three to six times per day. The client sees a failed send. Intermittent failures on a "just send this note" action are not acceptable.

Decision 1: Stop Routing Everything Through JSON

The first and highest-leverage fix was eliminating req.json() entirely.

JSON deserialization is expensive in the Workers V8 isolate because it does not just decode bytes — it allocates JavaScript strings, creates objects, and triggers garbage collection. For a 1.6KB payload, that process costs nearly 2ms of CPU time.

The alternative: switch the client to send a simple binary frame instead of JSON. A fixed layout like this works well:

[2 bytes: length N of email address]
[N bytes: email address, UTF-8]
[12 bytes: nonce]
[remaining bytes: ciphertext]
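For illustration, a client could assemble that frame as follows. This is a sketch under assumptions: the function name `encodeFrame` is hypothetical, and the length field is written big-endian, since the layout itself does not state a byte order.

```typescript
// Frame layout: [u16 address length][address, UTF-8][12-byte nonce][ciphertext]
const NONCE_LEN = 12;

function encodeFrame(to: string, nonce: Uint8Array, ciphertext: Uint8Array): Uint8Array {
  const addr = new TextEncoder().encode(to);
  if (addr.length > 0xffff) {
    throw new RangeError("address does not fit in a 2-byte length field");
  }
  if (nonce.length !== NONCE_LEN) {
    throw new RangeError("nonce must be exactly 12 bytes");
  }

  const frame = new Uint8Array(2 + addr.length + NONCE_LEN + ciphertext.length);
  // Assumption: big-endian length; the reader must use the same byte order.
  new DataView(frame.buffer).setUint16(0, addr.length, false);
  frame.set(addr, 2);
  frame.set(nonce, 2 + addr.length);
  frame.set(ciphertext, 2 + addr.length + NONCE_LEN);
  return frame;
}
```

The resulting `Uint8Array` can be passed directly as the `body` of a `fetch` call from the client.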

On the Worker side:

const rawBuf = new Uint8Array(await req.arrayBuffer());
const view   = new DataView(rawBuf.buffer);
const addrLen = view.getUint16(0, false); // big-endian
const to      = new TextDecoder().decode(rawBuf.subarray(2, 2 + addrLen));
const nonce   = rawBuf.subarray(2 + addrLen, 2 + addrLen + 12);
const payload = rawBuf.subarray(2 + addrLen + 12);

arrayBuffer() returns the request body as raw bytes without parsing it. DataView reads the length field. subarray creates views over the same buffer without copying. The only string allocation is the single TextDecoder call for the email address.
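The snippet above trusts the length field. A slightly more defensive variant might reject truncated frames before slicing. This is a sketch: `parseFrame` and `ParsedFrame` are hypothetical names, and treating an empty address or empty payload as invalid is an assumption, not something the original protocol states.

```typescript
const NONCE_BYTES = 12;

interface ParsedFrame {
  to: string;
  nonce: Uint8Array;
  payload: Uint8Array;
}

// Returns null when the frame is too short to hold every field.
function parseFrame(raw: Uint8Array): ParsedFrame | null {
  if (raw.length < 2) return null;
  // Pass byteOffset and byteLength so this also works when `raw` is a subarray.
  const view = new DataView(raw.buffer, raw.byteOffset, raw.byteLength);
  const addrLen = view.getUint16(0, false); // big-endian
  const nonceStart = 2 + addrLen;
  const payloadStart = nonceStart + NONCE_BYTES;
  // Assumption: an empty address or an empty payload is never valid.
  if (addrLen === 0 || raw.length <= payloadStart) return null;

  return {
    to: new TextDecoder().decode(raw.subarray(2, nonceStart)),
    nonce: raw.subarray(nonceStart, payloadStart),
    payload: raw.subarray(payloadStart),
  };
}
```

A null result would typically map to a 400 response, which costs far less CPU than letting an out-of-range read throw.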

Result: that 1.9ms dropped to 0.4ms. The JSON parser was spending most of its time walking the ciphertext field character by character to validate UTF-8 and build a JavaScript string. Bypassing that entirely cuts the work to almost nothing.

A side effect: the request size also shrank by roughly 10%, because the binary frame does not need JSON escaping or the key names "ciphertext", "nonce", "to" as string overhead.

Takumi's Take: Switching to a binary wire format is a meaningful API contract change. You need versioning discipline from day one — a two-byte magic number or version prefix at the start of the frame costs you nothing and saves you from silent breakage when the format evolves. The Workers runtime itself will not help you detect mismatched clients; that is your job.
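As a sketch of what such a prefix check might look like: the marker byte, the version number, and the name `stripFramePrefix` below are all hypothetical choices, not part of the deployed protocol.

```typescript
const FRAME_MAGIC = 0x4e; // hypothetical marker byte
const FRAME_VERSION = 1; // hypothetical current format version

// Returns the frame body after the 2-byte prefix, or null if the prefix
// is missing or belongs to a version this Worker does not understand.
function stripFramePrefix(raw: Uint8Array): Uint8Array | null {
  if (raw.length < 2 || raw[0] !== FRAME_MAGIC || raw[1] !== FRAME_VERSION) {
    return null;
  }
  return raw.subarray(2); // a view, no copy
}
```

Two byte comparisons and one subarray call are negligible next to the parsing work they protect.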

Decision 2: Build the Upstream Body as a String Template, Not via JSON.stringify

The upstream mail API only accepts JSON. So JSON construction is unavoidable, but the bottleneck was specific: taking the ciphertext bytes, converting them to a string representation, and then re-encoding that string inside a JSON value. Each step is CPU work.

The fix here required a design decision at the client: send the nonce and ciphertext already Base64-encoded, not as raw bytes. If the client pre-encodes them, the Worker receives Base64 text in the binary frame, and the nonce and payload slices above become ASCII strings. Note that the nonce field then occupies 16 bytes rather than 12, because 12 raw bytes encode to 16 Base64 characters, so the slice offsets must change to match. Those strings contain only alphanumeric characters and +, /, =. No double-quotes, no backslashes, no newlines. That means they can be interpolated directly into a JSON template string without any escaping:

// b64nonce and b64payload are already safe Base64 strings from the client
const upstreamBody =
  '{"from":"noreply@example.com",' +
  `"to":"${to}",` +
  '"subject":"Encrypted Note",' +
  `"text":"${b64nonce}\\n${b64payload}"}`;

This is deliberately narrow. It works only because:

`to` is an email address validated on input, not an arbitrary user string
`b64nonce` and `b64payload` are Base64 and structurally cannot contain JSON-breaking characters
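Both conditions can be enforced in code rather than assumed. The sketch below is one way to do that; `buildUpstreamBody` and the two patterns are hypothetical, and the address pattern is deliberately stricter than RFC 5322, which permits quoted local parts that would break the template.

```typescript
// Conservative allow-list: no quotes, backslashes, spaces, or control characters.
const SAFE_EMAIL = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
// Standard Base64 alphabet with optional trailing padding.
const BASE64 = /^[A-Za-z0-9+\/]+={0,2}$/;

// Returns the JSON body, or null if any input could break the template.
function buildUpstreamBody(to: string, b64nonce: string, b64payload: string): string | null {
  if (!SAFE_EMAIL.test(to) || !BASE64.test(b64nonce) || !BASE64.test(b64payload)) {
    return null;
  }
  return (
    '{"from":"noreply@example.com",' +
    `"to":"${to}",` +
    '"subject":"Encrypted Note",' +
    `"text":"${b64nonce}\\n${b64payload}"}`
  );
}
```

The checks are a linear scan over the inputs, so they are not free. Whether they cost less than JSON.stringify for a given payload size is something to measure rather than assume.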

Do not generalize this pattern. Any field with arbitrary user content requires JSON.stringify. Hand-rolled JSON template strings are a footgun outside of tightly constrained contexts.

This change dropped the body-construction step from 1.4ms to 0.3ms.

Decision 3: Return 202 Immediately, Do Result Checking in waitUntil

The last 1ms of avoidable CPU came from reading and parsing the upstream API's response. The Worker was calling await upstream.json(), extracting the delivery ID, then building a response JSON with that ID. That two-step process cost 0.6ms plus 0.4ms.

The fix: stop waiting for the upstream response to formulate the client response. Return 202 Accepted as soon as the upstream fetch has been dispatched, and handle success or failure logging in ctx.waitUntil:

const upstreamPromise = fetch("https://api.transactional-mail.example/emails", {
  method: "POST",
  headers: { Authorization: `Bearer ${env.MAIL_API_KEY}`, "Content-Type": "application/json" },
  body: upstreamBody,
});

ctx.waitUntil(
  (async () => {
    const upstream = await upstreamPromise;
    if (!upstream.ok) {
      const errText = await upstream.text();
      await env.DELIVERY_LOG.send({
        event: "mail_api_failure",
        httpStatus: upstream.status,
        detail: errText.slice(0, 512),
        ts: Date.now(),
      });
    }
  })()
);

return new Response(null, { status: 202 });

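The waitUntil call registers a Promise with the Workers runtime. The runtime keeps the Worker alive until that Promise settles, within the platform's time limit for waitUntil work, but the client gets its response immediately. The upstream API response handling and logging happen after the response has been sent. That work still counts as CPU time for the invocation; it is simply off the client's critical path.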

The tradeoff is real and worth naming clearly. The client now knows only that the Worker accepted the request, not that delivery succeeded. If the upstream API rejects the message — rate limit, invalid address, whatever — the client sees a 202 and has no idea. You need a separate mechanism to surface that failure: a polling endpoint the client checks on next open, a push notification, an in-app banner on the next session. For a personal note-sending app that only delivers to the user's own address, that deferred feedback is acceptable. For a general-purpose transactional mailer, it probably is not.

The Principle Behind All Three Decisions

Each of the three changes follows the same underlying idea: a Worker that exists only to pass data from one I/O boundary to another should consume as little synchronous CPU as possible. All serialization and deserialization work should happen on the edges — in the client before sending, or in the receiver after delivery — not inside the Worker.

The Workers pricing model makes this especially important on the Free Plan, but the design principle applies regardless of plan. Synchronous CPU in a Worker is always the constrained resource. I/O wait is essentially free. Any design that trades synchronous CPU for I/O time is a good trade in this environment.

You can think about it this way: if your Worker is supposed to be a pipe, make sure it is not secretly acting as a parser.

What the Numbers Looked Like After All Three Changes

Measured over 45 days and roughly 21,000 requests after the changes were deployed:

Metric	Value
Mean CPU time	3.8ms
Median CPU time	3.5ms
p95	5.9ms
p99	7.2ms
Requests exceeding 10ms	~0.008% (1 to 2 of roughly 21,400)

Those rare exceptions all came from the waitUntil path, in cases where the upstream API returned an unusually large error body. The main relay path has not exceeded 10ms since the changes went live.
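Since the remaining outliers trace back to large error bodies in the waitUntil path, one possible mitigation is to stop reading after a fixed number of bytes instead of decoding the whole body with text(). The sketch below illustrates the idea; `readCapped` is a hypothetical helper and was not part of the measured deployment.

```typescript
// Reads at most maxBytes from a body stream, then cancels the remainder.
async function readCapped(
  body: ReadableStream<Uint8Array> | null,
  maxBytes: number,
): Promise<string> {
  if (body === null) return "";
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let received = 0;
  let text = "";
  while (received < maxBytes) {
    const { done, value } = await reader.read();
    if (done || value === undefined) break;
    // Keep only as many bytes as the remaining budget allows.
    const chunk = value.subarray(0, maxBytes - received);
    received += chunk.length;
    text += decoder.decode(chunk, { stream: true });
  }
  await reader.cancel(); // stop pulling the rest of the body
  return text + decoder.decode(); // flush any partial multi-byte sequence
}
```

In the waitUntil block, `await readCapped(upstream.body, 512)` would take the place of `(await upstream.text()).slice(0, 512)`. The cap is in bytes rather than characters, which only matters for non-ASCII error text.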

wrangler tail showed zero exceeded CPU Outcomes in the four weeks following deployment.

When These Optimizations Are Not Worth the Complexity

Three caveats for anyone considering the same approach:

First, the binary frame protocol is a permanent client-server contract. If you ever need to add a field, you need a versioning strategy. JSON is self-describing; your binary format is not. For a single-developer project with one client, this is manageable. For an API with multiple clients or third-party integrations, the operational overhead of a custom binary protocol is substantial.

Second, the hand-rolled JSON template string is not generalizable. It is safe only because the values being interpolated are structurally guaranteed to be safe. Introduce one field sourced from arbitrary user input and you have an injection vulnerability. Treat this pattern as an exception that requires justification at code review, not a default approach.

Third, the 10ms limit is a Free Plan constraint. Workers Paid raises the default CPU time limit to 30 seconds per request. If your request volume outgrows the Free Plan's 100,000 daily requests, you will move to Paid, where the original naive implementation would have been fine. The Free Plan constraint forced good design, but that design is not obviously better than the simple version for a Paid Plan workload. Keep that context in mind before reaching for these techniques on every Worker you write.

One More Architecture to Consider

If your request volume grows significantly, a two-Worker pipeline becomes viable. Worker A receives the binary frame, validates the email address, and writes the payload to a Queue or KV entry; that entire operation runs in under 1ms of CPU time. Worker B is a consumer that reads from the Queue and handles the upstream API call. It runs in its own invocation with its own CPU budget, so its parsing and logging costs never count against the client-facing request.

This approach gives you natural backpressure handling, retry logic at the Queue layer, and a completely decoupled relay path. At small scale it is overengineered. At ten times the current request volume, it starts to look like the right call. Queue-based architectures also tend to be more resilient to upstream API trouble such as latency spikes and oversized error bodies; the latter is where the original naive implementation was most likely to breach the 10ms limit in the first place.
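A rough sketch of the two halves is shown below. The interfaces are minimal local stand-ins for the Queue binding types (a real project would take them from @cloudflare/workers-types), and `MailJob`, `handleIngest`, and `handleBatch` are hypothetical names. Retrying every non-2xx response is a simplification; a production consumer would probably not retry permanent failures such as an invalid address.

```typescript
// Minimal stand-ins for the Queue producer and consumer shapes.
interface QueueProducer<T> { send(body: T): Promise<void>; }
interface QueueMessage<T> { body: T; ack(): void; retry(): void; }
interface QueueBatch<T> { messages: readonly QueueMessage<T>[]; }

interface MailJob { upstreamBody: string; }

// Worker A: enqueue the prepared body and answer immediately.
async function handleIngest(
  upstreamBody: string,
  queue: QueueProducer<MailJob>,
): Promise<Response> {
  await queue.send({ upstreamBody });
  return new Response(null, { status: 202 });
}

// Worker B: deliver each job, acknowledging successes and retrying failures.
// `doFetch` is injectable so the consumer can be exercised without a network.
async function handleBatch(
  batch: QueueBatch<MailJob>,
  apiKey: string,
  doFetch: typeof fetch = fetch,
): Promise<void> {
  for (const msg of batch.messages) {
    const res = await doFetch("https://api.transactional-mail.example/emails", {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: msg.body.upstreamBody,
    });
    if (res.ok) msg.ack();
    else msg.retry(); // simplification: every failure is treated as retryable
  }
}
```

In a deployed consumer, `handleBatch` would be called from the Worker's `queue(batch, env)` handler, and `handleIngest` from the `fetch` handler after the frame has been parsed.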