One page. Equations only. The way Charles thinks about it.
Let P be a finite population, T a discrete target, and 1_C(x) the indicator of class C ⊂ P. The Dana Theorem (2024):
constructed in polynomial time by:
where oppose(t, f) returns a literal L with L(t) = 1, L(f) = 0
for some t ∈ C. Each clause excludes at least one non-member without
falsifying any member. The construction is O(|P|² · m) where
m = |features|.
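The clause construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithmeai implementation: rows are numeric tuples, a literal is encoded as a hypothetical (feature index, threshold, direction) triple, and every helper name other than oppose is invented for the example.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]
# Hypothetical literal encoding: (feature index, threshold, True for ">" / False for "<").
Literal = Tuple[int, float, bool]
Clause = List[Literal]  # a clause is a disjunction (OR) of literals


def holds(lit: Literal, x: Row) -> bool:
    i, thr, greater = lit
    return x[i] > thr if greater else x[i] < thr


def clause_true(clause: Clause, x: Row) -> bool:
    return any(holds(lit, x) for lit in clause)


def oppose(t: Row, f: Row) -> Literal:
    """Return a literal L with L(t) = 1 and L(f) = 0."""
    for i, (a, b) in enumerate(zip(t, f)):
        if a != b:
            # The midpoint threshold separates t from f on feature i.
            return (i, (a + b) / 2.0, a > b)
    raise ValueError("t and f are identical on every feature")


def build_clause(f: Row, members: List[Row]) -> Clause:
    """OR of literals that is false on f and true on every member."""
    clause: Clause = []
    uncovered = list(members)
    while uncovered:
        lit = oppose(uncovered[0], f)  # true on uncovered[0], false on f
        clause.append(lit)
        uncovered = [t for t in uncovered if not holds(lit, t)]
    return clause


def build_sat(members: List[Row], non_members: List[Row]) -> List[Clause]:
    """AND of clauses: true on all members, false on all non-members."""
    clauses: List[Clause] = []
    remaining = list(non_members)
    while remaining:
        clause = build_clause(remaining[0], members)
        clauses.append(clause)
        # Keep only the non-members this clause failed to exclude.
        remaining = [f for f in remaining if clause_true(clause, f)]
    return clauses
```

Each pass of the outer loop excludes at least one non-member, and each pass of the inner loop covers at least one member, so both loops terminate after at most |P| iterations with O(m) work per literal.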
Partition P into n/b buckets via an IF/ELIF/ELSE chain. Each bucket holds ≤ b members, so SAT construction inside it is O(b²m).
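Summing the per-bucket cost over all buckets gives the scaling stated next. The derivation below is a sketch that assumes each of the L layers repeats the same bucketed construction; the source does not spell this step out.

$$
\underbrace{\frac{n}{b}}_{\text{buckets}} \cdot O(b^{2} m) = O(n\,b\,m) \ \text{per layer},
\qquad
L \cdot O(n\,b\,m) = O(L\,n\,b\,m).
$$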
The total is linear in n and linear in m. Here L is the number of layers and b is the bucket size (a constant: default 250 in algorithmeai, user-controlled in v9).
For a query X routed to bucket B at layer l, let negated(X) = {i : clause_i(X) = 0}. The lookalike pool is:
and predicted class probability is:
Continuous targets: each unique y is its own class. Perfect fit on training data is by construction (every row is its own singleton class with at least one defining clause).
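One plausible reading of the predicted class probability is a vote share over the lookalike pool. The sketch below assumes exactly that and takes the pool's target values as given; how the pool itself is selected from negated(X) is not reconstructed here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_probabilities(lookalike_targets: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Assumed rule: the probability of a class is its share of lookalike votes."""
    if not lookalike_targets:
        return {}
    counts = Counter(lookalike_targets)
    total = len(lookalike_targets)
    return {label: count / total for label, count in counts.items()}
```

Under this reading a continuous target needs no special handling: each distinct y value is simply one more key in the returned dictionary.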
Binary divide until slice size ≤ τ; then conquer hands the slice to one leaf. Per layer:
Total per training:
Divides chain through depth, conquers wait once for their leaf, leaves run in parallel:
With λ_invoke ≈ 100ms warm, T_leaf ≈ 0.5s, bus_drain ≈ 1s:
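The critical path described above can be sketched as a small calculator. This is one reading of the sentence, not a confirmed formula: it assumes the divide invokes chain one after another through the tree depth, the conquer adds no latency beyond waiting for its leaf, and a single bus drain closes the layer. The function names and the even-halving rule are illustrative.

```python
import math


def divide_depth(n: int, tau: int) -> int:
    """Number of binary halvings until every slice holds at most tau rows."""
    if tau < 1:
        raise ValueError("tau must be at least 1")
    depth = 0
    size = n
    while size > tau:
        size = math.ceil(size / 2)  # the larger half bounds the slice size
        depth += 1
    return depth


def layer_wall_clock(n: int, tau: int,
                     invoke_s: float = 0.1,
                     leaf_s: float = 0.5,
                     drain_s: float = 1.0) -> float:
    """Assumed critical path: chained divide invokes, one leaf run, one bus drain."""
    return divide_depth(n, tau) * invoke_s + leaf_s + drain_s
```

Because leaves run in parallel, n enters this estimate only through the logarithmic depth term.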
Let path(X) be the divide-tree leaf X reaches. Each bucket carries a condition — a conjunction of literals along its path:
This is unique complete-AND per bucket. Equivalent: the assembled chain
is a flat IF/ELIF/ELSE where at most one bucket matches any X. traverse_chain's
"first match wins" is then order-independent.
If cond(B_1) = cond(B_2), both buckets fight for the
same X. traverse_chain returns B_1; B_2's lookalikes
never vote. Worse:
Cure. One leaf per route ⇒ Snake's build_bucket_chain is the
only source of bucket conditions, and its output respects unique complete-AND
by construction.
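The unique complete-AND property, and the order independence it buys, can be checked on a toy chain. The sketch below is illustrative only: the traverse_chain here is a stand-in written for the example rather than Snake's own function, and the split literals and bucket names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Sequence[float]
Literal = Callable[[Row], bool]
Condition = List[Literal]          # conjunction (AND) of the literals along a path
Chain = List[Tuple[str, Condition]]


def matches(cond: Condition, x: Row) -> bool:
    return all(lit(x) for lit in cond)


def traverse_chain(chain: Chain, x: Row) -> Optional[str]:
    """First match wins: name of the first bucket whose condition holds."""
    for name, cond in chain:
        if matches(cond, x):
            return name
    return None


def matching_buckets(chain: Chain, x: Row) -> List[str]:
    return [name for name, cond in chain if matches(cond, x)]


# Two-level divide tree on hypothetical splits x[0] <= 5 and x[1] <= 2.
chain: Chain = [
    ("B1", [lambda x: x[0] <= 5, lambda x: x[1] <= 2]),
    ("B2", [lambda x: x[0] <= 5, lambda x: x[1] > 2]),
    ("B3", [lambda x: x[0] > 5]),
]

# A broken chain in which B2 repeats B1's condition, so B2 is shadowed.
broken: Chain = [chain[0], ("B2", chain[0][1]), chain[2]]
```

On the well-formed chain exactly one bucket matches any X, so reversing the chain does not change the routing. On the broken chain two buckets match the same X and the answer depends on order.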
Snake's noise samples from local_pop = the routed slice.
Adding noise this way still respects cond(B) (the noise members are already on
the path). True regularization noise samples from:
If we sampled any p violating cond(B):
Hence v9: noise = 0 at the leaf, until post-routing global injection lands.
With C_leaf ≈ $2.7×10⁻⁵, C_div ≈ $5.4×10⁻⁶, C_cnq ≈ $10⁻⁵:
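These unit costs can be combined with the divide tree's invocation counts to estimate a training bill. The sketch below is an assumption-laden illustration: it assumes even binary splits, one conquer per leaf, purely additive costs, and identical counts on every layer. The function names are hypothetical, and tau must be supplied by the caller because the deployed threshold is not stated here.

```python
from typing import Dict

C_LEAF = 2.7e-5  # dollars per leaf invocation (from the text)
C_DIV = 5.4e-6   # dollars per divide invocation (from the text)
C_CNQ = 1e-5     # dollars per conquer invocation (from the text)


def layer_counts(n: int, tau: int) -> Dict[str, int]:
    """Count divide, conquer and leaf invocations for one layer over n rows."""
    if tau < 1:
        raise ValueError("tau must be at least 1")
    if n <= tau:
        # Slice is small enough: one conquer hands it to one leaf.
        return {"divide": 0, "conquer": 1, "leaf": 1}
    left = n // 2
    a = layer_counts(left, tau)
    b = layer_counts(n - left, tau)
    return {
        "divide": 1 + a["divide"] + b["divide"],
        "conquer": a["conquer"] + b["conquer"],
        "leaf": a["leaf"] + b["leaf"],
    }


def training_cost(n: int, tau: int, layers: int) -> float:
    """Assumed total: per-layer invocation cost multiplied by the layer count."""
    c = layer_counts(n, tau)
    per_layer = c["leaf"] * C_LEAF + c["divide"] * C_DIV + c["conquer"] * C_CNQ
    return layers * per_layer
```

Since the number of leaves grows like n/τ, the estimate is linear in n for a fixed τ and a fixed number of layers.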
We have two training sizes per dataset (n=4000 held-out, n=5000 perfect-fit). Two points fit a linear extrapolation in the form T(n) = a + b·n. The mesh fan-out and SQS round-trip are baked into a; b is the true per-row leaf cost amortized across shards.
| Task | T(4000) | T(5000) | a (s) | b (ms / row) | Predict T (1K rows) |
|---|---|---|---|---|---|
| binary | 4.01s | 4.80s | 0.85 | 0.79 | 4.76s |
| 3-class | 3.72s | 4.66s | −0.04 | 0.94 | 4.44s |
| regression | 20.39s | 24.93s | 2.23 | 4.54 | 4.20s |
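The a and b columns follow directly from the two measured points per task. The short sketch below recomputes them from the T(4000) and T(5000) columns; the function name is illustrative.

```python
from typing import Tuple


def two_point_fit(n1: float, t1: float, n2: float, t2: float) -> Tuple[float, float]:
    """Return (a, b) such that T(n) = a + b*n passes through both points."""
    b = (t2 - t1) / (n2 - n1)
    a = t1 - b * n1
    return a, b


# (T(4000), T(5000)) in seconds, copied from the table above.
measurements = {
    "binary": (4.01, 4.80),
    "3-class": (3.72, 4.66),
    "regression": (20.39, 24.93),
}

for task, (t4000, t5000) in measurements.items():
    a, b = two_point_fit(4000, t4000, 5000, t5000)
    print(f"{task}: a = {a:.2f} s, b = {1000 * b:.2f} ms/row")
```

With only two points the line passes through both exactly, so the fit carries no residual and no error estimate.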
Two-point fits are honest about what they are. Reading the slopes:
Per-row cloud predict cost is not a constant. It traverses the assembled bucket chain, whose length grows with training size and layer depth. Two empirical points:
| Model trained on | Layers | Predict 1K rows | Rows/s |
|---|---|---|---|
| Toy ~800 rows | 3 | ~3.4s | ~1400 |
| Nature 22,147 rows | 5 | ~18s | ~250 |
So:
The dispatch threshold (§11) is therefore a function of which model the user is predicting against, not just how many rows they ship. The SDK's default conservatively assumes the lighter end of this range.
Let n = batch size at predict time, r_l(M) = local algorithmeai per-row latency on assembled model M, r_c(M) = cloud per-row latency on the same model at the leaf, RTT = HTTPS + Lambda warm + drainer overhead:
Cloud wins when T_cloud(n) < T_local(n). Solve:
Critically, both r_l(M) and r_c(M) scale together with the same model. Their difference is what governs the crossover, and the difference is small: the cloud isn't "faster per row," it's "more cores at once." A few empirical anchors:
| Model | r_l(M) | r_c(M) | RTT | n* |
|---|---|---|---|---|
| Toy ~800/3L | ~0.01ms | ~0.7ms | ~1s | ~1500 |
| Nature 22K/5L | ~0.05ms | ~4ms | ~1s | ~250 |
n* is lower on heavier models because the cloud's "n more parallel shards" advantage shrinks: each shard does more work per row, so adding 18 chunks doesn't 18× the throughput — the slowest leaf still gates wall clock.
The SDK default cloud_threshold = 500 (since v2.2.1) is the
no-information default; 500 rows guarantees the user sees the
mesh activate (a demo property) without sandbagging local speed. For
production workloads on a known model, override:
Snake(model_id=…, cloud_threshold=N)
where N is sized to the empirical r_l / r_c
of that model. Big-train, deep-layer models prefer
cloud_threshold closer to 100; toy models prefer 5000+.
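The dispatch rule can be sketched as a single predicate. This is an illustration, not the SDK's code: the function name and the "predict" mode label are hypothetical, the comparison is assumed to be inclusive because 500 rows is said to activate the mesh, and the population-reading modes listed just below are treated as always going to the cloud.

```python
# Modes that need the population dictionary, which only the mesh holds.
POPULATION_MODES = {"get_audit", "get_lookalikes", "get_augmented", "get_candle"}


def use_cloud(n_rows: int, mode: str = "predict", cloud_threshold: int = 500) -> bool:
    """Assumed dispatch rule: population modes always go to the cloud;
    otherwise the batch goes to the cloud once it reaches the threshold."""
    if mode in POPULATION_MODES:
        return True
    return n_rows >= cloud_threshold
```

A caller working against a heavy model would pass a lower cloud_threshold, and one working against a toy model a higher one, in line with the guidance above.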
get_audit, get_lookalikes, get_augmented, get_candle read
the population dictionary. The SDK ships a stripped model locally (no population);
the mesh has the full one. Dispatch ignores n for these modes: cloud
unconditionally.

| n | L | Wall | Cost | Fit on D |
|---|---|---|---|---|
| 10³ | 5 | 3.8s | $0.0006 | 100% |
| 10⁵ | 5 | ~20s | ~$0.005 | 100% |
| 10⁶ | 5 | ~30s | ~$0.05 | 100% (predicted) |
| 10⁷ | 5 | ~50s | ~$0.50 | 100% (predicted) |
© 2026 Charles Dana · Monce SAS · /paper/v9 · /architecture/v9 · /economics/v9