v9 uses ONE Lambda binary at 4096 MB, deployed to 100 functions.
v9-worker (4096 MB, ~3 vCPUs):
invoke : $0.0000002 per request
compute : 4 GB × duration × $0.0000166667/GB-s
= $0.0000667/s
divide (~80ms warm) : $0.0000053 + $0.0000002 ≈ $0.0000055
conquer (~150ms warm) : $0.000010 + $0.0000002 ≈ $0.0000102
leaf (~400ms warm) : $0.0000267 + $0.0000002 ≈ $0.0000269
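A minimal Python sketch of the per-invocation arithmetic above. It assumes billed duration equals the listed warm duration (it ignores the 1 ms rounding Lambda applies to billed duration) and uses the request fee and GB-second rate quoted in the block. The function and constant names are illustrative, not part of v9.

```python
# Per-invocation Lambda cost, using the on-demand rates quoted above.
REQUEST_FEE = 0.0000002          # $ per invocation
GB_SECOND_RATE = 0.0000166667    # $ per GB-second of billed duration

def invocation_cost(memory_mb: int, duration_s: float) -> float:
    """Request fee plus memory (GB) x billed duration (s) x GB-second rate."""
    gb = memory_mb / 1024        # Lambda bills memory in GB (1 GB = 1024 MB)
    return REQUEST_FEE + gb * duration_s * GB_SECOND_RATE

# Warm durations for the three v9-worker roles listed above.
WARM_DURATIONS_S = {"divide": 0.080, "conquer": 0.150, "leaf": 0.400}

if __name__ == "__main__":
    for role, secs in WARM_DURATIONS_S.items():
        print(f"{role:8s} ${invocation_cost(4096, secs):.7f}")
```

The same helper covers the planned 1 GB, ~5 ms predict function: `invocation_cost(1024, 0.005)` comes to roughly $0.00000028 per call.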
v9 partitions input via binary divide, then hands each ≤τ slice to one leaf. The tree shape is
L layers, each with ~⌈n/τ⌉ leaves, ~⌈n/τ⌉ conquers,
and ~⌈n/τ⌉−1 divides.
Snake at the leaf does its own internal IF/ELIF/ELSE partition with
build_bucket_chain(bucket=user_b). Snake's partition is class-aware,
not size-fixed: a slice of 500 mixed-class rows at bucket=32 emits
~4 buckets (one per class plus catch-all), not ⌈500/32⌉ = 16. So the
assembled JSON is dramatically lighter than naive arithmetic suggests.
Production default τ = 1,000 (locked; not a tuning knob). The cost numbers below assume this.
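The tree-shape counts and the bucket comparison above reduce to a few lines of integer arithmetic. This is a hypothetical Python sketch: `tree_shape` follows the ~⌈n/τ⌉ leaf, conquer and divide counts stated per layer, and `class_aware_bucket_estimate` is only a rough stand-in for Snake's class-aware split, assuming the "one bucket per class plus a catch-all" behaviour described for the 500-row example (three classes gives 4). It does not reproduce `build_bucket_chain` itself.

```python
import math

def tree_shape(n: int, tau: int = 1_000, layers: int = 1) -> dict[str, int]:
    """Approximate invocation counts for `layers` rounds of divide -> leaf -> conquer."""
    leaves = math.ceil(n / tau)              # one leaf per <= tau slice
    per_layer = {
        "leaves": leaves,
        "conquers": leaves,                  # ~one conquer per leaf, per the text
        "divides": max(leaves - 1, 0),       # binary-splitting into k slices takes k - 1 divides
    }
    return {role: count * layers for role, count in per_layer.items()}

def naive_bucket_count(rows: int, bucket: int) -> int:
    """What a size-fixed partition would emit: ceil(rows / bucket)."""
    return math.ceil(rows / bucket)

def class_aware_bucket_estimate(n_classes: int) -> int:
    """Rough model of a class-aware split: one bucket per class plus a catch-all."""
    return n_classes + 1

if __name__ == "__main__":
    print(tree_shape(1_000_000, tau=1_000, layers=5))
    print(naive_bucket_count(500, 32), "vs", class_aware_bucket_estimate(3))
```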
The cost rows below are split into compute (the leaf bill, linear in N) and reservation (provisioned-slot bypass needed to keep peak invocations/s under the L-A1AFA3CF ceiling, super-linear past N≈30K). At τ=1000, all rows above 7.5K×25L need bypass to be feasible — see /paper §7 for the derivation.
| Scale | L | Wall clock | Slots | $ compute | $ reservation | $ total |
|---|---|---|---|---|---|---|
| 1K | 5 | 3.8s | 0 | $0.0006 | — | $0.0006 |
| 5K | 5 | ~5s | 0 | $0.006 | — | $0.006 |
| 15K | 5 | ~14s | 0 | $0.017 | — | $0.017 |
| 150K | 5 | ~80s | 9 | $0.17 | $0.01 | $0.18 |
| 1M | 5 | 6.5 min | 81 | $1.13 | $0.62 | $1.75 |
| 1M | 1 | 78s | 7 | $0.23 | $0.01 | $0.24 |
| 1M | 25 | 33 min | 449 | $5.65 | $17.16 | $22.81 |
| 10M | 5 | ~38 min | ~470 | $11.30 | $60.10 | $71.40 |
Costs split: leaves are linear in N (the compute column); reservation is super-linear in N because it is billed as slots × memory × wall-clock seconds, and both factors grow with N: the slot count needed to sustain peak invocations/s grows as O(L · N / log N) at fixed τ (see /paper §7.3), and the wall clock those slots stay reserved grows as well. Past N≈30K at L=25, the reservation term overtakes compute and dominates total cost.
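To make the compute/reservation split concrete, here is a hedged Python sketch that assumes reservation is billed as slots × 4 GB × wall-clock seconds × the eu-west-3 provisioned rate ($0.0000048673 per GB-second) quoted in the provisioning section. Under that assumption it reproduces the 150K×5L, 1M×1L and 1M×5L reservation figures to within rounding, and the 1M×25L figure to within about 1%. Names are illustrative.

```python
PROVISIONED_RATE = 0.0000048673   # $ per GB-second while READY (eu-west-3 x86)
SHARD_MEMORY_GB = 4               # v9-worker memory

def reservation_cost(slots: int, wall_clock_s: float,
                     memory_gb: float = SHARD_MEMORY_GB,
                     rate: float = PROVISIONED_RATE) -> float:
    """Provisioned slots held READY for the whole job: slots x GB x seconds x rate."""
    return slots * memory_gb * wall_clock_s * rate

if __name__ == "__main__":
    # (label, slots, wall-clock seconds, reservation $ from the table)
    rows = [("150K x 5L", 9, 80, 0.01), ("1M x 1L", 7, 78, 0.01),
            ("1M x 5L", 81, 390, 0.62), ("1M x 25L", 449, 33 * 60, 17.16)]
    for label, slots, secs, table in rows:
        print(f"{label:10s} model ${reservation_cost(slots, secs):6.2f}  table ${table:6.2f}")
```

Because the bill is a product of slot count and wall clock, and both grow with N, the reservation column rises faster than the compute column.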
Live benchmark: 3 datasets (binary, 3-class, regression), each trained twice:
once on the full 5K (perfect-fit), once on 4K with 1K held out. All predicts
are forced through the cloud (cloud_threshold = 1). 6 model trains in
total, ~30K predict round trips, single CloudWatch window:
| Pass | n_train | n_predict | Train (s) | Predict (s) | Quality |
|---|---|---|---|---|---|
| binary — perfect-fit | 5,000 | 5,000 | 4.80 | 6.09 | acc 100.00% |
| binary — held-out | 4,000 | 1,000 | 4.01 | 4.76 | acc 100.00% |
| 3-class — perfect-fit | 5,000 | 5,000 | 4.66 | 6.26 | acc 100.00% |
| 3-class — held-out | 4,000 | 1,000 | 3.72 | 4.44 | acc 100.00% |
| regression — perfect-fit | 5,000 | 5,000 | 24.93 | 3.74 | R² = 1.0000 |
| regression — held-out | 4,000 | 1,000 | 20.39 | 4.20 | R² = 0.9845 |
CloudWatch delta over the entire bench: $0.0434 for 387 Lambda invocations. Total work performed: 6 trainings × 4–5K rows = 27,000 training rows, plus ~24,000 inference round trips.
Trainings dominate compute time (52s of 73s wall clock, 71%); predicts dominate count. If we ascribe cost proportional to wall clock, training accounts for ~71% of the $0.0434 and predicts for the remaining ~29%.
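The wall-clock attribution can be written out as a short Python sketch. It assumes, as the text does, that cost splits in proportion to wall clock (52 s of 73 s for training) and takes the bench totals stated above: $0.0434, 27,000 training rows and ~24,000 predict round trips. The helper name is illustrative.

```python
def per_million_rates(total_cost: float, train_s: float, total_s: float,
                      train_rows: int, predict_round_trips: int) -> tuple[float, float]:
    """Split a bill by wall-clock share, then normalise to $ per million rows."""
    train_cost = total_cost * train_s / total_s      # training's share of the bill
    predict_cost = total_cost - train_cost           # everything else goes to predicts
    return (train_cost / train_rows * 1e6,
            predict_cost / predict_round_trips * 1e6)

train_per_m, predict_per_m = per_million_rates(0.0434, 52, 73, 27_000, 24_000)
print(f"train ${train_per_m:.2f}/M, predict ${predict_per_m:.2f}/M")
```

This gives roughly $1.1 per million training rows and $0.52 per million predict round trips, close to the $1.13/M train and $0.51/M predict usage rates in the cost-bucket table.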
Cloud predict's per-row cost is not a constant — it scales with the assembled bucket-chain length (training size × layer depth). A second empirical anchor from a Nature consumer running a 22K-row × 5-layer model:
The lever for >>1M-prediction workloads is the local handoff (§6.1):
download the model JSON once with m.to_algorithmeai() and run
algorithmeai.Snake in-process. algorithmeai per-row inference
is sub-millisecond regardless of model size, with no Lambda billing,
no S3 fetch, no HTTP RTT. Cloud predict's pricing is for the case where
the user can't or doesn't want to ship the model to the caller (multi-tenant,
audit trail, central enforcement) — not for high-volume batch.
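`m.to_algorithmeai()` is the documented route for the handoff. Purely to illustrate the "download once" idea, here is a hypothetical Python helper that fetches the model from the `/v9/model/{id}` route over plain HTTP and caches it on disk, assuming that route returns the model JSON as the response body. The helper name, `base_url` and cache layout are assumptions, not part of the monceai or algorithmeai APIs.

```python
import os
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path

def fetch_model_once(base_url: str, model_id: str, cache_dir: str) -> Path:
    """Download /v9/model/{id} the first time; afterwards return the cached file.

    Hypothetical helper: base_url and the cache layout are illustrative only.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    safe_id = urllib.parse.quote(model_id, safe="")  # keep the id filesystem/URL safe
    target = cache / f"{safe_id}.json"
    if target.exists():                               # already downloaded: no network call
        return target
    url = f"{base_url.rstrip('/')}/v9/model/{safe_id}"
    with urllib.request.urlopen(url) as resp:
        payload = resp.read()
    # Write to a temp file first so a half-written download is never mistaken for a model.
    fd, tmp = tempfile.mkstemp(dir=cache, suffix=".part")
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
    os.replace(tmp, target)
    return target

# Then, per the handoff described above (not executed here):
#   model = algorithmeai.Snake(str(path))   # in-process inference, no Lambda billing
```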
Numbers persist in v9/bench/results_3x5k.json; rerun via
v9_smoke_economics.py.
72-hour CloudWatch snapshot of the eu-west-3 account, all Lambdas:
| Function | Memory | Invocations (72h) | Duration billed (72h) | Est. cost |
|---|---|---|---|---|
| v9 mesh (100 shards) | 4 GB | ~14,000 | 3.4 hours | ~$0.21 |
| v6-worker (legacy) | 10 GB | 7,485,628 | 5,992 hours | ~$914 |
| v6-divide (legacy) | 4 GB | 172,387 | 2,391 hours | ~$143 |
| Property | v6 (one fn) | v9 (100 fns) |
|---|---|---|
| Per-shard reserved concurrency | 1000 cap (single fn) | 1000 × 100 shards (isolation, not aggregation) |
| Account spawn ceiling (L-A1AFA3CF) | 1000/min, shared with all retries | 1000/min, sharded by hash; a misbehaving job hits one shard's wall |
| 1K cost | $0.003 | $0.0006 |
| 1K wall clock | 2.4s | 3.8s |
| 15K wall clock | 7.3s | ~8s (parity) |
| 150K wall clock | 19.8s | ~80s (peak rate gated) |
| 1M×5L wall clock | concurrency-capped, fails | 6.5 min with 81-slot bypass |
| 10M×5L wall clock | concurrency-capped, fails | ~38 min with ~470-slot bypass |
v9 is the version that completes at scale — v6's recursive self-invokes wedged under L-A1AFA3CF tightening. v9 routes around that with sharding (isolation) plus toggled provisioned slots (peak-rate bypass). The mesh is an availability mechanism; the slots are a throughput mechanism; neither alone replaces the other.
The per-million numbers above are compute-only. The real cost of running v9 has three buckets, and only one of them scales with rows processed:
| Bucket | What you pay | When billed |
|---|---|---|
| Fixed | ~$98/month flat | 24/7, identical at 0 jobs or 1M rows |
| Activation | $0.0117–$0.117/min | only while provisioned-concurrency toggle is on |
| Usage | $1.13/M train + $0.51/M predict | per row processed (compute itself) |
EC2 t4g.2xlarge (gatherer) $96.77/month (on-demand)
$65.30/month (reserved 1y, 32% off)
S3 storage (~5 GB) $0.12/month
DynamoDB on-demand $0.06/month (~50K writes)
CloudWatch log retention $0.30/month
Route53 + ACM cert $0.50/month
────────────
$97.75/month total fixed
The gatherer is 8 vCPU / 32 GB at 99% idle: massively overprovisioned for a single user, but sized for multi-tenant use. At 100K rows/month total throughput, the fixed bucket alone amortizes to $978/M rows, which dwarfs every other cost. At 100M rows/month it's $0.98/M and disappears into the noise.
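The amortization is a single division. A minimal Python sketch, assuming the itemized fixed costs above (EC2 at on-demand pricing) and a 30-day month, which is the convention that reproduces the $325/M, $32.50/M and $3.25/M fixed figures in the usage-profile table for 10K, 100K and 1M rows/day. Names are illustrative.

```python
# Itemized fixed monthly costs from the list above (EC2 at on-demand pricing).
FIXED_MONTHLY_USD = {
    "ec2_t4g_2xlarge": 96.77,
    "s3_storage": 0.12,
    "dynamodb_on_demand": 0.06,
    "cloudwatch_logs": 0.30,
    "route53_acm": 0.50,
}
TOTAL_FIXED_USD = sum(FIXED_MONTHLY_USD.values())   # about $97.75
DAYS_PER_MONTH = 30   # assumption: matches the per-day profile figures

def fixed_per_million(rows_per_month: float, monthly_usd: float = TOTAL_FIXED_USD) -> float:
    """Fixed bucket spread over a month's throughput, in $ per million rows."""
    return monthly_usd / rows_per_month * 1e6

if __name__ == "__main__":
    for label, rows in [("100K rows/month", 100_000), ("100M rows/month", 100_000_000),
                        ("10K rows/day", 10_000 * DAYS_PER_MONTH),
                        ("1M rows/day", 1_000_000 * DAYS_PER_MONTH)]:
        print(f"{label:16s} ${fixed_per_million(rows):,.2f}/M")
```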
L-A1AFA3CF (Lambda concurrency scaling rate) is a 1000/min account-wide ceiling AWS does not adjust. Cold starts on a chain-serialized invoke graph cap v9 at <5× parallelism for small jobs — the 100-shard mesh is structurally underutilized at the spawn-rate ceiling. Provisioned concurrency bypasses this for the provisioned fraction:
Provisioned reservation rate (eu-west-3 x86): $0.0000048673 per GB-second (paid while READY, not while invoking)
Spin-up: ~84s, NOT billed
Tear-down: ~0.8s, instant
Min billing granularity: 1 second
4 GB × 1 shard × 60s = $0.00117 per minute warm
4 GB × 10 shards × 60s = $0.0117 per minute warm
4 GB × 100 shards × 60s = $0.117 per minute warm ($7.01/hour)
The spin-up tax means the toggle is a session primitive, not a per-job one. You toggle on once, run a stream of jobs against the warm mesh, and toggle off when done. Provisioning is billed for as long as it stays enabled, whether or not anything invokes the function, so a reliable toggle-off (EventBridge one-shot + 60s sweep) is mandatory: a leaked toggle on 100 shards burns $168/day.
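The $168/day figure follows directly from the reservation rate, and the sweep half of the toggle-off is a simple deadline check. Both are hedged Python sketches: `warm_cost` assumes the $0.0000048673 per GB-second rate and 4 GB shards quoted above and does not model any billing minimums, while `WarmToggle` and `sweep` are hypothetical names illustrating a periodic backstop, not v9's actual EventBridge wiring.

```python
from dataclasses import dataclass

PROVISIONED_RATE = 0.0000048673   # $ per GB-second while READY, as quoted above

def warm_cost(shards: int, seconds: float, memory_gb: float = 4.0) -> float:
    """Reservation cost of keeping `shards` functions provisioned for `seconds`."""
    return shards * memory_gb * seconds * PROVISIONED_RATE

@dataclass
class WarmToggle:            # hypothetical record of one toggle-on
    shards: int
    off_deadline: float      # epoch seconds when the one-shot toggle-off is due

def sweep(toggles: list[WarmToggle], now: float) -> list[WarmToggle]:
    """Periodic backstop (e.g. every 60 s): return toggles whose toggle-off is overdue."""
    return [t for t in toggles if now >= t.off_deadline]

if __name__ == "__main__":
    print(f"full mesh per minute: ${warm_cost(100, 60):.3f}")
    print(f"leaked for a day:     ${warm_cost(100, 86_400):.2f}")
```

Running `sweep` every 60 s caps a missed one-shot on the full mesh at roughly one extra minute of reservation, about `warm_cost(100, 60)` ≈ $0.12, instead of $168/day.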
Define friction as the fraction of warm-window time the mesh sat ready but doing nothing useful. 0% friction = perfectly back-to-back jobs. 50% friction = half your warm window was idle.
provisioned $/min $0.117 (full mesh)
on-demand $/min $0.0083 (measured, 5K×5L bench)
───────
ratio 14× provisioning is 14× the price of plain Lambda
So provisioning only pays for itself when the warm window crams enough back-to-back compute to dilute the reservation tax. Concretely, for a 5K×5L job (one warm session, full 100-shard mesh, 60s budget):
| Scenario | Compute cost | Provisioning cost | Per job | Per million train rows |
|---|---|---|---|---|
| On-demand only (today) | $0.005 | — | $0.005 | $1.13/M |
| Toggle, 0% friction (perfect) | $0.005 | $0.0117 | $0.017 | $3.34/M |
| Toggle, 50% friction (realistic) | $0.005 | $0.117 | $0.122 | $24.40/M |
| Toggle, 5% friction (heavy session) | $0.005 | $1.17 | $1.18 | $235/M |
The session profile that comes closest to on-demand is 20+ back-to-back jobs through one warm window. One job per session is strictly worse than on-demand: you're paying 14× the compute cost for ~2× the wall-clock improvement.
Provisioning is a session primitive, not a per-job one. The 84s spin-up plus the 14× reservation rate means a single isolated job gets crushed by the toggle. But a session — a stretch of clock time during which you fire many jobs back to back — can amortize the warm cost down to fractions of a cent per job. Concrete numbers, full mesh (100 shards × 4 GB), 60s of spin-up amortized into the budget:
| Session shape | Wall clock | Jobs run | Provisioning $ | Compute $ | $/job | vs on-demand |
|---|---|---|---|---|---|---|
| Spin up, run 1 job | ~90 s | 1 | $0.105 | $0.005 | $0.110 | 22× worse |
| Spin up, run 5 jobs | ~120 s | 5 | $0.140 | $0.025 | $0.033 | 6.6× worse |
| Spin up, run 20 jobs | ~210 s | 20 | $0.245 | $0.100 | $0.0173 | 3.5× worse |
| 1-hour fully-saturated session | 3,600 s | ~600 | $7.01 | $3.00 | $0.0167 | 3.3× worse |
| Compute-saturated infinity (theoretical floor) | ∞ | ∞ | $/sec ratio | $/sec ratio | $0.005 | parity |
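The session table is amortization arithmetic: the session's provisioning bill plus per-job compute, divided by the number of jobs. A minimal Python sketch, assuming the $0.005 on-demand compute per 5K×5L job used in the table; the function names are illustrative.

```python
ON_DEMAND_COMPUTE_PER_JOB = 0.005   # $ per 5K x 5L job, from the table above

def per_job_cost(provisioning_usd: float, jobs: int,
                 compute_per_job: float = ON_DEMAND_COMPUTE_PER_JOB) -> float:
    """Whole-session bill (reservation + compute), spread over the jobs it ran."""
    return (provisioning_usd + jobs * compute_per_job) / jobs

def vs_on_demand(provisioning_usd: float, jobs: int,
                 compute_per_job: float = ON_DEMAND_COMPUTE_PER_JOB) -> float:
    """How many times the on-demand price each job effectively costs."""
    return per_job_cost(provisioning_usd, jobs, compute_per_job) / compute_per_job

if __name__ == "__main__":
    sessions = [("1 job", 0.105, 1), ("5 jobs", 0.140, 5),
                ("20 jobs", 0.245, 20), ("1-hour saturated", 7.01, 600)]
    for label, prov, jobs in sessions:
        print(f"{label:17s} ${per_job_cost(prov, jobs):.4f}/job  {vs_on_demand(prov, jobs):.1f}x")
```

Algebraically the ratio is 1 + provisioning / (jobs × compute per job), so a warm session only approaches on-demand parity as the provisioning bill per job shrinks toward zero.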
Provisioning trades dollars for predictable latency. The 14× tax is the price of skipping cold starts and the L-A1AFA3CF spawn-rate ceiling on the provisioned fraction. Three regimes where that trade is worth taking:
| Regime | Why warm helps | Cost-frame |
|---|---|---|
| Interactive demo | 5K×5L drops from ~6 s to ~2 s wall clock; live audience doesn't see latency | $0.10 per 60s warm window — eat it as a sales cost |
| SLA predict throughput | Cloud predict at scale needs 100s of leaves spawning together; cold-start tail blows P99 | $7/hour during business hours — charge it to the customer's SLA tier |
| Continuous training pipeline | Hourly model rebuild on fresh data, jobs back-to-back, want each rebuild <15 s | $84/day if always warm during work hours; tolerable if model rebuild is critical-path |
For everything else — ad-hoc training, exploratory benches, the "morning bruv let me try this dataset" flow — on-demand is the right default. v9's small-job wall clock (~6 s for 5K×5L) is bad enough to justify the toggle when latency is the product, fine to live with when correctness is the product.
| Profile | Volume | Toggle? | $/M train rows | What dominates |
|---|---|---|---|---|
| Infrequent | 1–5 jobs/month | off | ~$5,000/M | EC2 fixed cost — per-row math meaningless |
| Active | ~10K rows/day | off | $1.13/M + $325/M fixed | EC2 still 290× usage |
| Production | ~100K rows/day, spot toggle | 50% friction | $24.40/M + $32.50/M fixed | activation & fixed roughly even |
| Heavy | 1M rows/day, session toggle | 10% friction | $11.70/M + $3.25/M fixed | activation dominates compute |
| Industrial | 10M rows/day, always warm | 0% friction (saturated) | $3.34/M + $0.33/M fixed | compute ≈ activation, fixed gone |
The blue dashed line is on-demand — the toggle curve approaches 14× that and stops. Provisioning is a wall-clock product, not a cost product.
One t4g.2xlarge in eu-west-3, 8 vCPU / 32 GB:
On-demand   : $0.1344/hour = $96.77/month
Reserved 1y : $0.0907/hour = $65.30/month (32% off)
The gatherer hosts the FastAPI app (/v9/train, /v9/status,
/v9/model, /grid/v9) and runs the SQS drainer thread.
Idle CPU 99%, RAM 0.85 / 30 GB at current load — massively overprovisioned for
a single user, perfect for multi-tenant scale.
Two paths:
monceai.Snake.to_algorithmeai(): download /v9/model/{id}, instantiate algorithmeai.Snake(path),
predict locally. $0 per prediction, sub-millisecond per row on commodity hardware,
zero network roundtrip.
This is the recommended path for v9 because the model JSON is semantically equivalent to a locally trained Snake. There is no operational reason to keep inference in the cloud.
/v9/predict/{id}: not yet implemented. When wired, expected:
v9-predict (1 GB, ~5ms warm):
invoke  : $0.0000002
compute : 1 GB × 0.005s × $0.0000166667 = $0.000000083
Total per Lambda ≈ $0.0000003 per call ≈ $0.30 per million predictions
Comparing v9 to closed-source classification APIs at 10K-row training scale:
| Provider | 10K-row train cost | Wall clock | Inference |
|---|---|---|---|
| SnakeBatch v9 | ~$0.01 | ~8s | $0 local / $0.30/M cloud |
| Vertex AI Tabular | $1–3 (sustained) | ~20m minimum | per-call billed |
| SageMaker Autopilot | $5–10 | ~1h | endpoint $/hr |
| OpenAI fine-tune classifier | $10+ per 10K | ~30m | token-billed |
The SnakeBatch number isn't a discount; it's a different cost class. SAT-by-construction means we don't pay for backprop, we don't pay for hyperparameter search, we don't pay for retries. The work is polynomial in n and linear in spot Lambda time. There is no model to "fit" beyond constructing the formula.
© 2026 Charles Dana · Monce SAS · SnakeBatch v9 · /paper · /architecture · /math