The same anatomy AWS publishes, scoped to v9-worker-pipe-*.
Y axis is invocations / second, capped at 16.67/s (the
L-A1AFA3CF account-wide spawn ceiling, 1000/min). Live values polled from
lambda:get_function_concurrency +
list_provisioned_concurrency_configs + CloudWatch
ConcurrentExecutions.
Per-minute MAX invocations/second across all 100 v9 shards (10-second resolution: peak of the six 10s sub-buckets ÷ 10). This is a 10-second peak spawn rate, not a per-minute average: a 50-invocation burst that lands inside one sub-bucket reads as 5/s (50 ÷ 10), not 0.83/s (50 ÷ 60). Bars turn orange at 70% of ceiling, red at 100%.
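The bar computation described above can be sketched as follows. This is a minimal illustration, assuming invocation counts arrive as six 10-second sums per minute; the function names, the input layout, and the "default" colour label are hypothetical, not the dashboard's actual code.

```python
from typing import Sequence

CEILING_PER_S = 1000 / 60  # L-A1AFA3CF: 1,000 spawns per minute
BUCKET_SECONDS = 10        # sub-bucket width stated in the text


def peak_rate_per_minute(sub_buckets: Sequence[int]) -> float:
    """Peak invocations/second for one minute, from six 10 s invocation counts."""
    if len(sub_buckets) != 6:
        raise ValueError("expected six 10-second sub-buckets per minute")
    return max(sub_buckets) / BUCKET_SECONDS


def bar_colour(rate: float) -> str:
    """Orange at 70% of the ceiling, red at 100% (thresholds from the text)."""
    if rate >= CEILING_PER_S:
        return "red"
    if rate >= 0.7 * CEILING_PER_S:
        return "orange"
    return "default"  # hypothetical name for the un-highlighted state
```

Because each sub-bucket is divided by its full 10-second width, a burst shorter than 10 seconds is averaged over that window rather than reported at its true instantaneous rate.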
| Layer | Limit | Adjustable? | What it means here |
|---|---|---|---|
| Per-shard reserved concurrency | 0–1,000 instantaneous | yes (per fn) | blast-radius cap; one shard cannot drain the account |
| L-B99A9384 concurrent executions | 20,000 account | yes (support) | only matters at very heavy load |
| L-A1AFA3CF scaling rate | 1,000 / minute = 16.67/s | NO | actual ceiling: how fast cold containers can be created |
| SQS messages/s | 3,000 unbatched | yes | not binding for v9 |
| S3 GET / prefix | ~5,500/s | shard prefixes | not binding for v9 |
Translating fires/sec into useful sustained work, with measured per-leaf compute (Snake at 4 GB warm, ~700 ms/leaf):
| Sustained spawn rate | Lambda-seconds / hour | Equivalent constant compute | 5K×5L jobs / hour ceiling |
|---|---|---|---|
| 0.17/s (10/min) | 252 | 0.07 vCPU continuous | ~95 |
| 1.7/s (100/min) | 2,520 | 0.7 vCPU continuous | ~950 |
| 8.3/s (500/min) | 12,600 | 3.5 vCPU continuous | ~4,750 |
| 16.7/s (cap) | 25,200 | ~7 vCPU continuous | ~9,500 |
Empirical observation, 2026-05-21: a 1K×25L training run fired exactly 75 leaf invocations (3 leaves × 25 layers) in ~6 seconds of useful work. If those 75 leaves had fired in parallel over a 5-second window, we'd have seen a 15/s burst. That is 90% of L-A1AFA3CF's 16.67/s ceiling: no throttles yet, but a tiny 1K-row job would already sit within 10% of the cap.
What we actually saw on CloudWatch: 75 invocations spread across a full minute = 1.25/s sustained, peak ~3–5/s in a 10s window. That's roughly 3–5× below the AWS ceiling (16.67 ÷ 5 ≈ 3.3, 16.67 ÷ 3 ≈ 5.6). The leaves are not all firing in parallel; they're queueing behind their own parents.
| Where the chain serializes | Cost |
|---|---|
| Each layer = independent SQS-dispatched root divide | 5–25 root divides, each lands on its own cold shard via hash(jid, layer) |
| Per-layer chain: divide → conquer → leaf | Conquer polls DDB with time.sleep(PARENT_POLL_MS) until leaf acks; divide does the same waiting on conquer. Three cold-start hops billed serially per layer. |
| _pick_least_loaded_shard on every divide | 100-key DDB BatchGetItem, 30–80 ms on the critical path |
The remediation is structural and free: when n ≤ τ,
divide should directly enqueue the leaf payload — conquer is doing no
useful work in that path, only forwarding indices and burning a Lambda slot
on a poll. With that one change, layer chains collapse from 3 hops to 1,
and 25 layers can spawn in parallel instead of stretched across a minute.
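A minimal sketch of that short-circuit, assuming divide receives a slice of row indices and two enqueue callbacks. The signature, the callback names, and the return labels are hypothetical; the real divide handler is not shown here.

```python
from typing import Callable, List, Sequence


def divide(
    indices: Sequence[int],
    tau: int,
    enqueue_leaf: Callable[[List[int]], None],
    enqueue_conquer: Callable[[List[int]], None],
) -> str:
    """Route a slice: small slices go straight to a leaf, large ones to conquer."""
    rows = list(indices)
    if len(rows) <= tau:
        # n <= tau: skip the conquer hop, so no parent Lambda sits polling for this slice.
        enqueue_leaf(rows)
        return "leaf"
    # n > tau: keep the existing recursive path.
    enqueue_conquer(rows)
    return "conquer"
```

The callbacks stand in for whatever SQS send the pipeline uses; passing them in keeps the routing decision testable without AWS access.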
bucket-many rows to chew on, and the recursion was the only
mechanism dictating fan-out. Today's conquer takes the entire
n ≤ τ slice and hands it to one leaf, which then
re-partitions internally via build_bucket_chain. That works
correctness-wise, but it collapses the fan-out from "one Lambda per bucket"
back to "one Lambda per ≤τ slice", erasing the whole point of having
100 shards. Restoring bucket-grained leaves — one Lambda per
bucket-sized atom, picked up directly off SQS — is what
unlocks the mesh.
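One way to picture bucket-grained leaves is the sketch below. It assumes a bucket is a contiguous, fixed-size run of rows; how build_bucket_chain actually partitions is not described here, so both `bucket_atoms` and `bucket_size` are hypothetical.

```python
from typing import Iterator, List, Sequence


def bucket_atoms(rows: Sequence[int], bucket_size: int) -> Iterator[List[int]]:
    """Split a <=tau slice into bucket-sized atoms, one per leaf message."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    for start in range(0, len(rows), bucket_size):
        # Each yielded atom would become its own SQS message, hence its own Lambda.
        yield list(rows[start:start + bucket_size])
```

Under this assumption a slice of n rows fans out to ceil(n / bucket_size) leaf invocations instead of one.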
See /economics §5.4 on why the provisioning toggle is a wall-clock product, not a cost product, until chain-serialization is removed: 14× the bill for 2× the speed is a bad trade. Fix the chain first, then the toggle becomes a real lever.
v9's wall-clock scaling is empirically O(n^0.77), sublinear, up to ~7.5K rows × 25 layers on synthetic data. Past that, the L-A1AFA3CF on-demand ramp gates spawn rate before the tree finishes its useful fan-out, and the model diverges from the regression: the wall clock isn't compute-bound anymore, it's queueing for warm containers. The fix that costs nothing is the chain-serialization removal in §3.1; the fix that costs some dollars is provisioning a thin slice of bypass capacity, toggled on for the duration of a job and off afterwards.
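To make the 0.77 exponent concrete, here is a small sketch that extrapolates wall clock from a reference run under the power law. The function name and arguments are illustrative, and the model only applies below the ~7.5K × 25L break described above.

```python
def scaled_wall_clock(t_ref: float, n_ref: int, n: int, exponent: float = 0.77) -> float:
    """Predict wall clock at n rows from a reference run, assuming t ~ n**exponent."""
    if n_ref <= 0 or n <= 0:
        raise ValueError("row counts must be positive")
    return t_ref * (n / n_ref) ** exponent


# Doubling the row count multiplies wall clock by 2**0.77 (about 1.7x), not 2x.
doubling_factor = scaled_wall_clock(1.0, 1000, 2000)
```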
Provisioned slots bill per second. Toggle on (≈60–90 s warm-up, not billed during spin-up; we charge from READY) and off (instant), with no minimums. So you pay for the wall-clock window the slots are active, nothing more.
To raise effective spawn rate from 16.67/s to ~33/s (a 2× lift), we don't need 100 slots; we need enough provisioned capacity to absorb the second 16.67/s of leaf invocations the on-demand pool can't warm. With ~700 ms/leaf at 4 GB:
That's 12 slots fully shared across the 100-shard mesh — SQS pulls them from whichever shard the next leaf hashes to, so the bypass is global, not pinned. Above the 7.5K×25L break this is what keeps the N^0.77 line from bending.
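The 12-slot figure is consistent with Little's law (concurrency = arrival rate × service time). The derivation is not spelled out here, so the sketch below is one plausible reading rather than the author's calculation, and `bypass_slots` is a hypothetical helper.

```python
import math


def bypass_slots(extra_rate_per_s: float, leaf_seconds: float) -> int:
    """Slots needed to absorb extra_rate_per_s invocations, each busy for leaf_seconds."""
    return math.ceil(extra_rate_per_s * leaf_seconds)


# Second 16.67/s of leaves at ~0.7 s each: 16.67 * 0.7 is about 11.7, rounded up.
slots_for_2x = bypass_slots(1000 / 60, 0.7)
```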
Pricing eu-west-3, 4 GB, 2026-05-21:
| N slots | Reservation only (idle) | At 100% utilization (active on top) |
|---|---|---|
| 1 | $0.070/hr | $0.234/hr |
| 5 | $0.350/hr | $1.169/hr |
| 12 (= 2× bypass) | $0.841/hr | $2.804/hr |
| 25 | $1.752/hr | $5.842/hr |
| 50 | $3.504/hr | $11.683/hr |
| 100 | $7.008/hr | $23.367/hr |
Formula at 4 GB:
idle floor = N × 4 × 3600 × $0.0000048673 ≈ $0.0701 × N / hr

at 100% util = N × 4 × 3600 × $0.0000162242 ≈ $0.2336 × N / hr
Concrete: flip 12 slots on for the 5-second window of a single training burst → 12 × 4 × 5 × $0.0000162242 = $0.0039. The reservation floor only bites when slots sit idle. Toggle-on-demand sidesteps the 24/7 floor entirely.
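The same arithmetic as a small helper, using the two per-GB-second rates from the formula above. The function and constant names are illustrative only.

```python
MEMORY_GB = 4
IDLE_RATE = 0.0000048673    # $/GB-second, reservation only (rate from the formula above)
ACTIVE_RATE = 0.0000162242  # $/GB-second at 100% utilization (rate from the formula above)


def slot_cost(n_slots: int, seconds: float, rate: float, memory_gb: float = MEMORY_GB) -> float:
    """Dollar cost of n_slots provisioned slots held for `seconds` at a per-GB-second rate."""
    return n_slots * memory_gb * seconds * rate


burst_cost = slot_cost(12, 5, ACTIVE_RATE)          # 12 slots, 5-second burst, fully utilized
idle_hour_cost = slot_cost(12, 3600, IDLE_RATE)     # 12 slots sitting idle for one hour
```

Comparing `burst_cost` with `idle_hour_cost` shows why toggling per job matters: the idle floor for an hour costs far more than a fully utilized 5-second window.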
The full-mesh framing in /economics §5 ($168/day at 100 shards always-on) is the always-warm scenario, useful for SLA-class predict throughput. For training-burst bypass, you only need 12 slots because:
Measured 2026-05-21: training runs follow N^0.77 wall-clock through n=5K, n=7.5K at 25 layers. At n=7.5K, L=25, the run hits the on-demand spawn ceiling on its inner layers and the regression breaks — subsequent layers stretch out behind queue depth instead of finishing in parallel. Before the break we have a clean sublinear curve; after, the wall clock fans into "as many seconds as the ceiling will give us." The 12-slot bypass closes that gap exactly.
See also: /paper for the 1M-row claim that implicitly assumes free L-A1AFA3CF headroom (it does past the bypass); /economics §5 for full-mesh always-warm economics; /architecture §8 for why sharding gives isolation, not throughput.