v9 spreads training across 100 identical Lambda functions (v9-worker-pipe-{0..99}).
Sharding is for blast-radius isolation, not throughput multiplication: AWS gates
on-demand spawn at the account-level L-A1AFA3CF rate (1000/min = 16.67/s), so 100 shards
share one budget. Three roles are dispatched by SQS message type: divide recursively halves the
population until each slice holds ≤ τ indices, conquer hands the slice to one leaf, and the leaf calls
algorithmeai.Snake(local_pop, n_layers=1, bucket=user_bucket, noise=0) and emits
literals to a single report queue. An EC2 c7g.large gatherer assembles the model from
report messages.
The load-bearing invariant: every assembled bucket carries a unique complete-AND
condition. Snake's native build_bucket_chain is honored at the leaf, so
the v9 model.json is structurally a flat IF/ELIF/ELSE chain that
algorithmeai.Snake rehydrates and traverses byte-for-byte identically to
a locally-trained model.
The earlier conquer-then-fan design chunked indices into ⌈n/bucket⌉
slices and fired one leaf per chunk, all sharing the same cond_prefix. After
assembly, the layer contained multiple buckets with identical conditions:
layer 0 (broken):
bkt[0]: cond=[score < 50], members={32 of B}, lookalikes={0..31: ...}
bkt[1]: cond=[score < 50], members={32 of B}, lookalikes={0..31: ...}
bkt[2]: cond=[score < 50], members={32 of B}, lookalikes={0..31: ...}
...
bkt[10]: cond=[score ≥ 50, x < -3], members={32 of A}, ...
algorithmeai's traverse_chain walks top-to-bottom and returns the
first matching bucket. So a row routed to score < 50 landed in
bkt[0], saw 32 lookalikes, and the other 9 buckets carrying
165+ additional B-members became dead weight — their lookalikes never voted.
This produced phantom probability mass: a row with lookalike_tally = {A: 50}
(unanimous) returned {A: 0.76, B: 0.24}. The 0.24 of B came from thin-pool
fallback — a vote starved by undersampling.
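To make the starvation concrete, here is a toy first-match traversal in Python. It is a stand-in for traverse_chain, not algorithmeai's implementation: conditions are modeled as plain predicates on a dict row, and the bucket sizes are illustrative rather than the measured ones.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict[str, float]


@dataclass
class Bucket:
    name: str
    cond: Callable[[Row], bool]  # the bucket's complete-AND condition, as a predicate
    lookalikes: int              # lookalikes this bucket would contribute to the vote


def first_match(chain: list[Bucket], row: Row) -> Optional[Bucket]:
    """IF/ELIF/ELSE semantics: return the first bucket whose condition holds."""
    for bucket in chain:
        if bucket.cond(row):
            return bucket
    return None


def low_score(row: Row) -> bool:
    return row["score"] < 50


# Broken layer (toy sizes): ten buckets share cond=[score < 50], 32 lookalikes each.
broken = [Bucket(f"bkt[{i}]", low_score, 32) for i in range(10)]
broken.append(Bucket("bkt[10]", lambda r: r["score"] >= 50 and r["x"] < -3, 32))

row = {"score": 12.0, "x": 0.0}
hit = first_match(broken, row)
voting = hit.lookalikes if hit else 0
# Lookalikes sitting in buckets that also match the row but can never be reached.
stranded = sum(b.lookalikes for b in broken if b.cond(row) and b is not hit)
print(hit.name if hit else None, voting, stranded)  # bkt[0] 32 288
```

Only the first of the identical buckets ever votes; everything behind it is unreachable, no matter how many members it holds.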
The bucket parameter was being misused as a slicing knob at the conquer level, when it is supposed to be Snake's native partition size inside the leaf. The fix: conquer fans one leaf with all n ≤ τ indices. That leaf calls Snake(local_pop, bucket=user_bucket) and lets Snake's
build_bucket_chain do the IF/ELIF/ELSE partition with native per-bucket conditions. _globalize then prepends cond_prefix to each Snake-emitted condition, giving
[*divide_path, *snake_subcondition]. Every assembled bucket gets a unique
complete-AND, by construction.
Before vs after, on an identical 1000-row 3-class Gaussian (well-separated, σ=1, centers ≥6):
| Variant | Fit | max_p mean | max_p min | p ≥ 0.99 |
|---|---|---|---|---|
| v9 cloud (broken, chunked) | 100% | 0.9978 | 0.5963 | 989 / 1000 |
| v9 cloud (one-leaf-per-conquer) | 100% | 1.000 | 1.000 | 1000 / 1000 |
| algorithmeai (local twin) | 100% | 1.000 | 1.000 | 1000 / 1000 |
The tree shrank from 166 leaves/job to 23 leaves/job for the same 5-layer (5L) training, because Snake's native partition is class-aware (4 buckets for 500 mixed-class rows, not 16). Wall time was 6.06 s: slower per leaf, but the tree is dramatically simpler.
Fetch the model from /v9/model/{id}, load it with Snake(path), and inference is
identical to local training. No reconciliation logic, no special-casing, no fallback votes.
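The prefixing step can be sketched as below. This is a hypothetical re-implementation, not the v9 source: `globalize` and `all_unique` are illustrative names, and conditions are modeled as tuples of literal strings. It shows why one leaf per conquer preserves uniqueness while the chunked design produced duplicates.

```python
def globalize(divide_path: list[str], snake_subconditions: list[list[str]]) -> list[tuple[str, ...]]:
    """Prepend the divide path to every leaf-local condition: [*divide_path, *snake_subcondition]."""
    return [tuple([*divide_path, *sub]) for sub in snake_subconditions]


def all_unique(conditions: list[tuple[str, ...]]) -> bool:
    """The load-bearing invariant: no two assembled buckets share a complete-AND."""
    return len(set(conditions)) == len(conditions)


# One leaf per conquer: Snake's own chain has distinct sub-conditions,
# so prefixing all of them with the same path keeps them distinct.
leaf_chain = [["x < -3"], ["x >= -3", "y < 1"], ["x >= -3", "y >= 1"]]
assembled = globalize(["score < 50"], leaf_chain)

# Old chunked design: every chunk's bucket carried only the shared prefix.
chunked = globalize(["score < 50"], [[], [], []])

print(all_unique(assembled), all_unique(chunked))  # True False
```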
Client (monceai SDK)
↓ HTTPS
EC2 gatherer (c7g.large): FastAPI /v9/train, /v9/status, /v9/model
↓ SQS-style invoke
v9-worker-pipe-{0..99}: one binary, three roles by message
role=divide: if n ≤ τ, fire 1 conquer; otherwise oppose → left/right and fire 2 divides. Emit None.
role=conquer: fan ONE leaf with all indices (≤ τ), wait for the DDB ack, emit None.
role=leaf: Snake(local_pop, n_layers=1, bucket=user_bucket, noise=0), then SQS report → gatherer.
Gatherer: SQS drainer thread → assembles layers → model.json
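The three-role recursion can be simulated locally in a few lines. This is a sketch under stated assumptions, not handler_snake.py: an in-process deque stands in for the SQS-style invokes, a median split on one feature stands in for oppose, the DDB ack is omitted, and the leaf only records its slice instead of calling Snake. It shows that every leaf receives a single slice of at most τ indices and that the slices partition the population.

```python
import random
from collections import deque


def run_mesh(indices: list[int], values: list[float], tau: int) -> list[list[int]]:
    """Drain divide/conquer/leaf messages until the queue is empty; return the leaf slices."""
    queue = deque([{"role": "divide", "idx": indices}])
    leaves: list[list[int]] = []
    while queue:
        msg = queue.popleft()
        idx = msg["idx"]
        if msg["role"] == "divide":
            if len(idx) <= tau:
                queue.append({"role": "conquer", "idx": idx})  # yes: fire 1 conquer
            else:
                # Stand-in for oppose (assumption): split at the median of one feature.
                ordered = sorted(idx, key=lambda i: values[i])
                mid = len(ordered) // 2
                queue.append({"role": "divide", "idx": ordered[:mid]})  # no: fire 2 divides
                queue.append({"role": "divide", "idx": ordered[mid:]})
        elif msg["role"] == "conquer":
            queue.append({"role": "leaf", "idx": idx})  # ONE leaf with all <= tau indices
        elif msg["role"] == "leaf":
            # The real leaf runs Snake(local_pop, n_layers=1, bucket=..., noise=0) and reports.
            leaves.append(idx)
        else:
            raise ValueError(f"unknown role {msg['role']!r}")
    return leaves


random.seed(0)
values = [random.random() for _ in range(5000)]
slices = run_mesh(list(range(5000)), values, tau=1000)
print(len(slices), max(len(s) for s in slices))  # 8 625
```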
v6 was one Lambda function (recursive). v9 is one Lambda binary deployed to 100 functions, dispatched by SQS message type. The reason isn't aggregate throughput (L-A1AFA3CF caps that at 1000/min account-wide regardless of how many functions you declare); it's blast-radius isolation. Under the May-2026 rate-limit tightening, v6's recursive self-invocation burned $1057 in 19 hours retrying into its own throttle. v9's 100 functions cap each shard's spawn at 1% of the budget, so a misbehaving job hits its own shard's wall instead of draining the account. Fan-out parallelism is governed by L-A1AFA3CF (free) plus toggled provisioned slots (paid); see §7 and /concurrency §4.
The other change: v6 had a tautology shim in the leaf (_tautological_layer)
to handle 1-class slices. v9 removed it. Snake handles the 1-class case natively
(emits empty clauses + position-keyed lookalikes), and the tautology shim was emitting
target-name lookalike keys incompatible with get_lookalikes_fast's
int(key) cast. Removing the shim fixed inference at zero cost.
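The incompatibility is easy to reproduce in isolation. `lookalike_votes` below is a hypothetical stand-in that only mimics the int(key) cast attributed to get_lookalikes_fast; the real function's signature and lookalike payloads are not shown in this document, so the values here are illustrative.

```python
def lookalike_votes(lookalikes: dict[str, object], targets: list[str]) -> dict[str, int]:
    """Tally votes, assuming each key is a row position serialized as a string."""
    tally: dict[str, int] = {}
    for key in lookalikes:
        label = targets[int(key)]  # the int(key) cast: only position keys survive it
        tally[label] = tally.get(label, 0) + 1
    return tally


targets = ["A", "A", "A"]             # a 1-class slice
native = {"0": [], "1": [], "2": []}  # position-keyed, as Snake emits natively
shim = {"A": [0, 1, 2]}               # target-name keyed, as the removed shim emitted

print(lookalike_votes(native, targets))  # {'A': 3}
try:
    lookalike_votes(shim, targets)
except ValueError as err:
    print("shim layout breaks the cast:", err)
```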
Live run, 2026-05-21, all predicts via /v9/predict-sync (cloud_threshold = 1, so every row goes through the mesh); training config n_layers=5, bucket=250, noise=0.25.
Two passes per dataset: perfect-fit (train on 5K, predict on the same 5K) and held-out 80/20 (train on 4K, predict on 1K unseen).
| Dataset | Task | Pass | Train (s) | Predict (s) | Metric |
|---|---|---|---|---|---|
| Binary Gaussian (5K) | classification | perfect-fit (5K→5K) | 4.80 | 6.09 | acc 100.00% |
| Binary Gaussian (5K) | classification | held-out 80/20 (4K→1K) | 4.01 | 4.76 | acc 100.00% |
| 3-class numeric+text (5K) | classification | perfect-fit (5K→5K) | 4.66 | 6.26 | acc 100.00% |
| 3-class numeric+text (5K) | classification | held-out 80/20 (4K→1K) | 3.72 | 4.44 | acc 100.00% |
| y = 2.5a + b² + N(0,0.5) (5K) | regression | perfect-fit (5K→5K) | 24.93 | 3.74 | R² = 1.0000, RMSE = 0.00 |
| y = 2.5a + b² + N(0,0.5) (5K) | regression | held-out 80/20 (4K→1K) | 20.39 | 4.20 | R² = 0.9845, RMSE = 0.96, MAE = 0.70 |
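The regression metrics use the standard definitions of R², RMSE and MAE. A minimal stdlib sketch for reference; the four-point input is a toy example, not bench data.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """R², RMSE and MAE with their textbook definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

A perfect fit drives ss_res to zero, which is why the perfect-fit row reads R² = 1.0000 and RMSE = 0.00.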
Numbers persist in v9/bench/results_3x5k.json with model_id and
JSON path for every model trained, so any of these can be reloaded with
algorithmeai.Snake(path) for offline replay.
The 5K bench in §6 demonstrates correctness. This section extrapolates to 1M rows under the empirical wall-clock fit, prices the AWS bypass needed to hold that curve, and gives a closed-form upper bound. The default training config holds τ = 1000 throughout; we do not tune it for cost.
Six sequential runs at L=25 (synthetic regression, N from 2K to 4K) fit a power law with R² = 0.994:
$$T_{\text{wall}}(N) \propto N^{0.777} \qquad (L = 25)$$
This is the same regime observed in /economics §5.4: mesh amortization across overlapping layers, parent-poll cost diluted by depth, leaves dominating per-row time. We treat it as the v9 wall-clock model up to the point where peak invocations/second crosses the AWS spawn ceiling; past that, the run does not slow down, it fails (see §7.3).
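The fit itself is an ordinary least-squares line in log-log space, log T = log a + b·log N. A minimal stdlib sketch follows; the six (N, T) points are synthetic, generated from an assumed prefactor a = 0.05 and the reported exponent, because the measured run times are not reproduced in this section. This sketch reports R² in log space.

```python
import math


def fit_power_law(ns: list[float], ts: list[float]) -> tuple[float, float, float]:
    """Fit T = a * N**b by least squares on (log N, log T); return (a, b, R² in log space)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope in log-log space = exponent
    log_a = my - b * mx           # intercept = log of the prefactor
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Synthetic check (assumed a = 0.05, b = 0.777), N from 2K to 4K like the six L=25 runs.
ns = [2000, 2400, 2800, 3200, 3600, 4000]
ts = [0.05 * n ** 0.777 for n in ns]
a, b, r2 = fit_power_law(ns, ts)
print(round(a, 4), round(b, 4), round(r2, 6))
```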
L-A1AFA3CF is binary, not soft. A 1-minute window at peak rate > 16.67/s without bypass causes AWS to reject the excess invocations; the divide tree's parent-poll waits never resolve, the gatherer sees zero leaves return, and the job wedges. CloudWatch's 10-second resolution shows this; the 1-minute SUM of throttles can read 0 while children downstream are silently dying. Empirically observed at N=7500, L=25, τ=1000: peak 17.40/s on the 15:38Z bar, full mesh stall.
So the feasibility predicate is binary:
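$$\mathrm{FEASIBLE}(N, L, \tau, N_{\text{slots}}) \iff \frac{L \cdot \lceil N/\tau \rceil}{\text{spawn\_window}(N, \tau)} \le 16.67 + \frac{N_{\text{slots}}}{t_{\text{leaf}}}$$
with
$$\text{spawn\_window}(N, \tau) \approx 3.8\,\text{s} \cdot \lceil \log_2(N/\tau) \rceil \ \text{(measured)}, \qquad t_{\text{leaf}} \approx 0.7\,\text{s} \ \text{(4 GB warm)}$$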
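Solving the predicate for the minimum bypass slots needed to keep the peak rate under the ceiling at τ=1000:
$$N_{\text{slots}}^{\min} = \max\!\left(0,\ \left\lceil t_{\text{leaf}} \cdot \left(\frac{L \cdot \lceil N/\tau \rceil}{\text{spawn\_window}(N, \tau)} - 16.67\right) \right\rceil\right)$$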
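Slot demand grows nearly linearly in N: the log-depth dilution of the spawn window is much weaker than the linear growth of the leaf count, so slots scale as N / log N asymptotically (and faster than N just above the ceiling, where the free 16.67/s headroom stops absorbing the excess). This is the single most important fact about v9 economics at τ=1000: holding the N^0.777 wall-clock curve through 1M rows requires bypass capacity that scales nearly with N itself.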
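The predicate and the slot solve fit in a few lines of Python. A minimal sketch using the constants above (1000/min ceiling, 3.8 s per recursion level, t_leaf = 0.7 s); the function names are hypothetical, and clamping the spawn window to at least one level (so N ≤ τ does not divide by zero) is our assumption.

```python
import math

CEILING_PER_S = 1000 / 60  # L-A1AFA3CF: 1000 on-demand spawns per minute (about 16.67/s)
SECONDS_PER_LEVEL = 3.8    # measured spawn-window cost per recursion level
T_LEAF_S = 0.7             # warm 4 GB leaf


def spawn_window(n: int, tau: int) -> float:
    # Assumption: clamp to one level so n <= tau does not divide by zero.
    levels = max(1, math.ceil(math.log2(n / tau)))
    return SECONDS_PER_LEVEL * levels


def peak_rate(n: int, layers: int, tau: int) -> float:
    """Peak invocations per second: L * ceil(N / tau) spawns over one spawn window."""
    return layers * math.ceil(n / tau) / spawn_window(n, tau)


def feasible(n: int, layers: int, tau: int, slots: int) -> bool:
    return peak_rate(n, layers, tau) <= CEILING_PER_S + slots / T_LEAF_S


def min_slots(n: int, layers: int, tau: int) -> int:
    """Smallest whole number of provisioned slots that satisfies the predicate."""
    excess = peak_rate(n, layers, tau) - CEILING_PER_S
    return max(0, math.ceil(excess * T_LEAF_S))


for layers in (1, 5, 25):
    print(layers, round(peak_rate(10**6, layers, 1000), 1), min_slots(10**6, layers, 1000))
```

Run as-is, the loop reproduces the peak-rate and slot columns of the 1M table below (7, 81 and 449 slots). At N = 7500, L = 25 the model gives about 17.5/s, just over the ceiling and close to the 17.40/s observed, so feasible(7500, 25, 1000, 0) is False.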
Two terms: leaf compute (linear in total leaves × layers) and provisioned-slot reservation (linear in slot count × wall clock). Compute is calibrated against the 5K bench at $1.13/M training rows (/economics §3.1). Reservation is the eu-west-3 4 GB rate, $0.0000048673/GB-s, paid per second the toggle is on:
$$\$_{\text{reservation}} = N_{\text{slots}} \cdot 4\,\text{GB} \cdot \$0.0000048673/\text{GB-s} \cdot T_{\text{wall}}$$
Substituting the asymptotic forms:
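$$\$_{\text{compute}} = O(L \cdot N) \qquad \text{(linear leaves × per-leaf bill)}$$
$$\$_{\text{reservation}} = O\!\left(\frac{L^2 \cdot N^{1.777}}{\log N}\right) \qquad \text{(slots × wall clock)}$$
$$\$_{\text{total}} = O\!\left(\frac{L^2 \cdot N^{1.777}}{\log N}\right) \quad \text{for large } N$$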
The cost is super-linear in N once bypass kicks in. This is the counterpart to the sub-linear N^0.777 time complexity: holding wall clock sub-linear at fixed τ costs super-linear money. The two are internally consistent: compute bills total Lambda-seconds, the integral of work, and is linear; reservation bills the wall-clock window times the slot count, and the slot count itself grows nearly with N.
The crossover at L=25 is around N≈30K: below it, compute dominates and the toggle is a rounding error; above it, the reservation term takes over and each doubling of N more than doubles total cost.
Plugging N = 10^6, τ = 1000, t_leaf = 0.7 s into the equations:
| L | peak rate | slots needed | wall clock | $ compute | $ reservation | $ total |
|---|---|---|---|---|---|---|
| 1 | 26.3/s | 7 | 78 s | $0.23 | $0.01 | $0.24 |
| 5 | 132/s | 81 | 6.5 min | $1.13 | $0.62 | $1.75 |
| 25 | 658/s | 449 | 33 min | $5.65 | $17.16 | $22.81 |
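Both cost columns can be recomputed from the stated rates. A minimal sketch with two labeled assumptions: the $1.13/M compute calibration is taken to be for the 5-layer bench configuration and to scale linearly in L (which is what the compute column implies), and the wall clock is passed in from the table rather than derived, since the power-law prefactor is not given here.

```python
GB_PER_SLOT = 4
RESERVATION_USD_PER_GB_S = 0.0000048673  # eu-west-3, 4 GB provisioned rate quoted above
COMPUTE_USD_PER_M_ROWS_5L = 1.13         # 5K-bench calibration (assumed: at L = 5)


def compute_cost(n_rows: int, layers: int) -> float:
    # Linear in rows and (by assumption) in layers.
    return COMPUTE_USD_PER_M_ROWS_5L * (n_rows / 1e6) * (layers / 5)


def reservation_cost(slots: int, wall_s: float) -> float:
    # Paid per second the provisioned toggle is on, for every reserved slot.
    return slots * GB_PER_SLOT * RESERVATION_USD_PER_GB_S * wall_s


for layers, slots, wall_s in [(1, 7, 78), (5, 81, 6.5 * 60), (25, 449, 33 * 60)]:
    c = compute_cost(10**6, layers)
    r = reservation_cost(slots, wall_s)
    print(f"L={layers:>2}  compute ${c:.2f}  reservation ${r:.2f}  total ${c + r:.2f}")
```

The L = 1 and L = 5 rows match the table. For L = 25 the rounded 33 min gives about $17.31 of reservation instead of $17.16; the table's figure corresponds to roughly 1963 s (about 32.7 min), so the gap is presumably rounding in the wall-clock column.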
The naive extrapolation "5K cost × 200 = 1M cost" gives $0.05, which is the number that lived in /economics §3 for months. That number is wrong by 35×. It assumes free L-A1AFA3CF headroom: true at 5K (peak rate ~7/s, under the ceiling), false at 1M (peak rate 132/s, nearly 8× the ceiling). The compute piece extrapolates linearly; the reservation piece is what makes the run feasible at all, and it is not in the 5K bench because the 5K bench did not need it.
The honest /economics-line-69 number for "1M×5L" at τ=1000 is $1.75, not $0.05. Past 1M, the L²·N^1.777 term dominates and SnakeBatch becomes a paid product on AWS rather than a free one. The lever to bend this back to linear is fixing the parent-poll chain-serialization (see /architecture §8), which dilutes the spawn-window per recursion level and lowers the slot requirement at fixed wall-clock. Until then, the τ=1000 invariant pins us to this curve.
Snake's noise parameter samples from local_pop, which after divide is just
the routed slice. Injecting that as noise inside the leaf adds no
diversity, because routed members already satisfy the path's AND. Worse, if Snake ever
picked noise members violating the path, it would reintroduce the Bayesian
contradiction. So v9 hard-codes noise=0 at the leaf.
True regularization noise = members of the global population that satisfy the path's AND but were not selected by oppose for routing. That's a gatherer-side or pre-leaf-augment problem, not Snake's internal noise. Reserved for the next iteration.
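The proposed definition is a set difference, sketched below under stated assumptions: the path is a list of hypothetical (feature, op, threshold) literals over dict rows, and `routed` is the index set oppose actually sent down this path. None of these names come from the v9 code.

```python
import operator
from typing import Iterable

OPS = {"<": operator.lt, ">=": operator.ge}

Literal = tuple[str, str, float]  # (feature, op, threshold), e.g. ("score", "<", 50.0)


def satisfies(row: dict[str, float], path: Iterable[Literal]) -> bool:
    """True if the row satisfies every literal of the path's complete-AND."""
    return all(OPS[op](row[feat], thr) for feat, op, thr in path)


def regularization_pool(population: list[dict[str, float]], path: list[Literal],
                        routed: set[int]) -> list[int]:
    """Global members that satisfy the path's AND but were not routed down it."""
    return [i for i, row in enumerate(population) if i not in routed and satisfies(row, path)]


population = [{"score": 10.0}, {"score": 40.0}, {"score": 70.0}, {"score": 20.0}]
path = [("score", "<", 50.0)]
print(regularization_pool(population, path, routed={0, 1}))  # [3]
```

Because every candidate satisfies the path by construction, drawing noise from this pool cannot reintroduce the Bayesian contradiction that in-leaf noise risks.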
| Path | Contents |
|---|---|
| v9/worker/handler_snake.py | one binary, three roles |
| v9/gatherer/v9_train.py | FastAPI: /v9/train /v9/status /v9/model |
| v9/gatherer/grid_v9.py | live grid: /grid/v9 |
| v9/esquisses/ | perfect-fit probes + Bayesian-contradiction writeup |
© 2026 Charles Dana · Monce SAS · SnakeBatch v9 · /architecture · /economics · /concurrency · /math