The Math — SnakeBatch

Charles Dana · Monce SAS · May 2026

/paper/v9 · /architecture/v9 · /economics/v9

One page. Equations only. The way Charles thinks about it.


1. Snake as Indicator-to-CNF

Let P be a finite population, T a discrete target, and 1C(x) the indicator of class C ⊂ P. The Dana Theorem (2024):

∀ C ⊂ P, ∃ φC ∈ CNF :  1C(x) = ¬φC(x)

constructed in polynomial time by:

φC = ∧f ∉ Ck oppose(tk, f)

where oppose(t, f) returns a literal L with L(t) = 1, L(f) = 0 for some t ∈ C. Each clause excludes at least one non-member without falsifying any member. The construction is O(|P|2 · m) where m = |features|.


2. Bucketing Linearizes

Partition P into n/b buckets via an IF/ELIF/ELSE chain. Each bucket holds ≤ b members, so SAT construction inside it is O(b2m).

Ttrain(n, m, L, b) = O(L · n · b · m)

Linear in n, linear in m, L = layers, b = bucket size (constant; default 250 in algorithmeai, user-controlled in v9).


3. Lookalike Probability

For a query X routed to bucket B at layer l, let negated(X) = {i : clausei(X) = 0}. The lookalike pool is:

Λl(X) = { m ∈ B.members : Λ-condm ⊆ negated(X) }

and predicted class probability is:

P(c | X) = ( ∑l=1..L |{m ∈ Λl(X) : T(m) = c}| ) / ( ∑l=1..Ll(X)| )

Continuous targets: each unique y is its own class. Perfect fit on training data is by construction (every row is its own singleton class with at least one defining clause).


4. v9 Tree Topology

Binary divide until slice size ≤ τ; then conquer hands the slice to one leaf. Per layer:

depth = ⌈ log2(n / τ) ⌉  ·  leaves = conquers = ⌈ n / τ ⌉  ·  divides = ⌈ n / τ ⌉ − 1

Total per training:

W(n, L) = L · (3 · ⌈ n/τ ⌉ − 1)

5. Wall Clock

Divides chain through depth, conquers wait once for their leaf, leaves run in parallel:

Twall(n, L) ≈ depth · λinvoke + Tleaf + bus_drain

With λinvoke ≈ 100ms warm, Tleaf ≈ 0.5s, bus_drain ≈ 1s:

Twall(107, 5) ≈ 13 · 0.1 + 0.5 + 1 = 2.8s  per layer = ~14s for 5L parallel-invoked layers

6. The Routing Invariant (Bayesian)

Let path(X) be the divide-tree leaf X reaches. Each bucket carries a condition — a conjunction of literals along its path:

cond(B) = ∧ii,   X ∈ B ⇔ X satisfies cond(B)
Invariant. For any two assembled buckets B1, B2:
cond(B1) ≠ cond(B2)

This is unique complete-AND per bucket. Equivalent: the assembled chain is a flat IF/ELIF/ELSE where at most one bucket matches any X. traverse_chain's "first match wins" is then order-independent.


7. The Bayesian Contradiction (what we removed)

If cond(B1) = cond(B2), both buckets fight for the same X. traverse_chain returns B1; B2's lookalikes never vote. Worse:

For X with Ptrue(c | X) = 1 and unanimous lookalikes in B2,
the model returns Pobs(c | X) = |Λ1(X)| / total
which can be < 1 because B2's mass is missing.
The thin pool then hallucinates opposing class probability mass.

Cure. One leaf per route ⇒ Snake's build_bucket_chain is the only source of bucket conditions, and its output respects unique complete-AND by construction.


8. The Noise Trap

Snake's noise samples from local_pop = the routed slice. Adding noise this way still respects cond(B) (the noise members are already on the path). True regularization noise samples from:

Ncond(B) = { p ∈ Pglobal : p satisfies cond(B), p ∉ routed(B) }

If we sampled any p violating cond(B):

P(c | X) would integrate over members violating cond(B) — a Bayesian contradiction: voters must satisfy the path's AND.

Hence v9: noise = 0 at the leaf, until post-routing global injection lands.


9. Cost

$(n, L) = L · ( ⌈ n/τ ⌉ · Cleaf + (⌈ n/τ ⌉ − 1) · Cdiv + ⌈ n/τ ⌉ · Ccnq )

With Cleaf ≈ $2.7×10-5, Cdiv ≈ $5.4×10-6, Ccnq ≈ $10-5:

$(107, 5) ≈ 5 · (313 · 2.7×10-5 + 312 · 5.4×10-6 + 313 · 10-5) ≈ $0.50

10. Empirical Time Complexity (3 datasets × 5K, May 2026)

We have two training sizes per dataset (n=4000 held-out, n=5000 perfect-fit). Two points fit a linear extrapolation in the form T(n) = a + b·n. The mesh fan-out and SQS round-trip are baked into a; b is the true per-row leaf cost amortized across shards.

TaskT(4000)T(5000)a (s)b (ms / row)Predict T (1K rows)
binary4.01s4.80s0.850.794.76s
3-class3.72s4.66s−0.040.944.44s
regression20.39s24.93s2.234.544.20s

Two-point fits are honest about what they are. Reading the slopes:

Asymptotic form holds. Ttrain(n,L,b,m) = O(L · n · b · m) predicted by §2; measured per-row slopes are within the expected range for the leaf cost model. The mesh's fan-out keeps the multiplier flat across shards rather than serial, so n grows in the horizontal axis where the algorithm is linear.
Train wall clock vs n — measured slopes n (training rows) wall clock (s) 0 2K 4K 5K 6K 0 10 20 30 regression T = 2.23 + 4.54·n/1000 3-class T = -0.04 + 0.94·n/1000 binary T = 0.85 + 0.79·n/1000

Predict slope — depends on the model, not just n

Per-row cloud predict cost is not a constant. It traverses the assembled bucket chain, whose length grows with training size and layer depth. Two empirical points:

Model trained onLayersPredict 1K rowsRows/s
Toy ~800 rows3~3.4s~1400
Nature 22,147 rows5~18s~250

So:

Tpredict(n; M) ≈ RTT + n · rc(M)   where rc(M) scales with assembled-bucket-chain length

The dispatch threshold (§11) is therefore a function of which model the user is predicting against, not just how many rows they ship. The SDK's default conservatively assumes the lighter end of this range.


11. SDK Dispatch Threshold (when local beats cloud)

Let n = batch size at predict time, rl(M) = local algorithmeai per-row latency on assembled model M, rc(M) = cloud per-row latency on the same model at the leaf, RTT = HTTPS + Lambda warm + drainer overhead:

Tlocal(n) ≈ n · rl(M)  ·   Tcloud(n) ≈ RTT + n · rc(M)

Cloud wins when Tcloud(n) < Tlocal(n). Solve:

n* = RTT / (rl(M) − rc(M))

Critically, both rl(M) and rc(M) scale together with the same model. Their difference is what governs the crossover, and the difference is small — the cloud isn't "faster per row," it's "more cores at once." A few empirical anchors:

Modelrl(M)rc(M)RTTn*
Toy ~800/3L~0.01ms~0.7ms~1s~1500
Nature 22K/5L~0.05ms~4ms~1s~250

n* is lower on heavier models because the cloud's "n more parallel shards" advantage shrinks: each shard does more work per row, so adding 18 chunks doesn't 18× the throughput — the slowest leaf still gates wall clock.

The SDK default cloud_threshold = 500 (since v2.2.1) is the no-information default; 500 rows guarantees the user sees the mesh activate (a demo property) without sandbagging local speed. For production workloads on a known model, override:

Snake(model_id=…, cloud_threshold=N)

where N is sized to the empirical rl / rc of that model. Big-train, deep-layer models prefer cloud_threshold closer to 100; toy models prefer 5000+.

Population modes are always cloud. get_audit, get_lookalikes, get_augmented, get_candle read the population dictionary. The SDK ships a stripped model locally (no population); the mesh has the full one. Dispatch ignores n for these modes — cloud unconditionally.

11. Headline

nLWallCostFit on D
10353.8s$0.0006100%
1055~20s~$0.005100%
1065~30s~$0.05100% (predicted)
1075~50s~$0.50100% (predicted)
Perfect fit on D is not aspirational. It's a theorem of the construction. The Dana Theorem guarantees φC exists in poly time; v9 produces it distributed, with unique complete-AND per bucket; algorithmeai inference resolves it identically to a local Snake. The 100% column is what the math says, every time.

© 2026 Charles Dana · Monce SAS · /paper/v9 · /architecture/v9 · /economics/v9