SnakeBatch × Excel

Charles Dana · Monce SAS · May 2026

A predictive layer for the spreadsheet. SAT classification, missing-data inference, lookalike audit — native cell formulas.

1. The pitch in one sentence

Excel ships =TREND, =FORECAST.LINEAR, =FORECAST.ETS. None of them touch text, mixed types, categories, or industrial vocabulary. SnakeBatch does. We add five formulas to Excel and the spreadsheet becomes a predictive workbench:

=PREDICT(train, target, test)
=PREDICT.CANDLE(train, target, test)
=FILL(table)
=AUDIT(train, target, test)
=POTENTIAL(table, target)

Train via snakebatch.aws.monce.ai. Model lives inside the .xlsx as a Custom XML Part. Predictions spill across rows the same way =FILTER does. The workbook becomes a self-contained ML artifact — email it, share it, the model goes with it.

2. Three problems Excel doesn't solve today

| Problem | Formula | What Excel does | What SnakeBatch does |
|---|---|---|---|
| Missing data | =FILL | Leaves #N/A, drops the row from charts, breaks =AVERAGE. | Trains a Snake on the columns we DO have, predicts the blank from the rest of the row, fills it with the most logical value plus a confidence. |
| Mixed-type classify | =PREDICT | LINEST only takes numbers. Categorical features need one-hot manually. Test column with stray text? Crashes. | Auto-detects N/I/B/T per column from the training set. Test rows with messier formatting are coerced via the same datatype map. No crash, no manual encoding. |
| Why this prediction? | =AUDIT | Trendline R² tells you "fit is OK overall". Nothing tells you why row 47 got the answer it got. | Snake stores lookalikes per bucket. Every prediction comes with the 5-10 training rows that look most like this one. Auditor-grade explainability, native to the cell. |

3. =FILL — missing data, magically

User cost: select the table. Hit one button (or type =FILL(A1:E120)). Done.
What happens: for every column with blanks, SnakeBatch trains one Snake using the other columns as features, then predicts each missing cell.

The cell view

| # | Client | Article | Qty | Délai | Prix |
|---|---|---|---|---|---|
| 1 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |
| 2 | VME | 33.1 LowE | 80 | | 2640€ |
| 3 | ACTIF | 44.2 Silence | 200 | 7j | |
| 4 | | 33.1 LowE | 80 | 4j | 2640€ |
| 5 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |

=FILL(A1:E5)

| # | Client | Article | Qty | Délai | Prix |
|---|---|---|---|---|---|
| 1 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |
| 2 | VME | 33.1 LowE | 80 | 4j (0.91) | 2640€ |
| 3 | ACTIF | 44.2 Silence | 200 | 7j | 7140€ (0.83) |
| 4 | VME (0.78) | 33.1 LowE | 80 | 4j | 2640€ |
| 5 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |

Under the hood

# For each column with blanks:
for col_with_blanks in table:
    train = rows_where(col_with_blanks is not blank)
    test  = rows_where(col_with_blanks is blank)
    snake = Snake(train, target_index=col_with_blanks)   # 1.8s for 891 rows
    fill[col_with_blanks] = [snake.get_prediction(r) for r in test]

One Snake per column with blanks, all trained in parallel on Lambda. A 5-column / 1000-row table with 3 sparse columns → 3 Lambdas fire concurrently → ~2s wall clock.
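A local sketch of that fan-out, with ThreadPoolExecutor standing in for the concurrent Lambda invocations and a trivial most-frequent-value predictor standing in for Snake (both are illustrative stand-ins, not the production code):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def fill_column(table, col):
    """Train on rows where `col` is present, predict the rows where it is blank."""
    train  = [r for r in table if r[col] is not None]
    blanks = [i for i, r in enumerate(table) if r[col] is None]
    # Stand-in predictor: most frequent training value (a Snake in production).
    mode = Counter(r[col] for r in train).most_common(1)[0][0]
    return col, {i: mode for i in blanks}

def fill(table):
    sparse = [c for c in range(len(table[0])) if any(r[c] is None for r in table)]
    with ThreadPoolExecutor() as pool:            # one "Lambda" per sparse column
        results = list(pool.map(lambda c: fill_column(table, c), sparse))
    for col, preds in results:                    # write back in the main thread
        for i, value in preds.items():
            table[i][col] = value
    return table

rows = [["LGB", "5j"], ["VME", None], [None, "5j"]]
fill(rows)   # both blanks filled from their column's training values
```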

4. =PREDICT — train / target / test

| Piece | Range |
|---|---|
| Train range | A1:E120 |
| Target col | "Délai" |
| Test range | G1:K30 |
| Spilled predictions | L1:L30 |
' Excel cell:
=PREDICT(A1:E120, "Délai", G1:K30)     → spills 30 predictions down L
=PREDICT.PROB(A1:E120, "Délai", G1:K30) → spills 30 × |classes| matrix
=PREDICT.TOP(A1:E120, "Délai", G1:K30, 3) → top-3 candidates per row

Datatype handling — the test-trickier-than-train problem

Real spreadsheets are messy. Train says "5j", test says "5 jours". Train says 120, test says "120,00". Train says "LGB", test says "LGB Menuiserie SAS". SnakeBatch handles this in two layers:

| Layer | What it does | Where it lives |
|---|---|---|
| Datatype lock | Train infers N/I/B/T per column. Test rows are coerced through the same map — "120,00" → 120.0; blanks → 0.0 for numeric, "" for text. | _preprocess() in app.py |
| Reliability overlay (monce-suite) | For factory-bound columns (client, verre, intercalaire): canonical-ID matching first, Snake second. "LGB Menuiserie SAS" → 9232 before classification. | claude.aws/stage_0, snake.aws/query |
Why the overlay matters: a Snake trained on "LGB" doesn't recognize "LGB Menuiserie SAS". The matching layer normalizes both to 9232 before the Snake sees them. Industrial knowledge inside the predictor. This is the part Mathieu cares about.
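A minimal sketch of the datatype-lock idea, reduced to a two-type world (numeric vs. text) for brevity; the infer_types / coerce helpers are hypothetical illustrations, not the actual _preprocess() in app.py:

```python
def infer_types(train_rows):
    """Per-column type map from the TRAINING set: 'N' if every non-blank value
    parses as a number (accepting the French comma decimal), else 'T'."""
    def numeric(v):
        try:
            float(str(v).replace(",", ".").replace(" ", ""))
            return True
        except ValueError:
            return False
    cols = range(len(train_rows[0]))
    return ["N" if all(numeric(r[c]) for r in train_rows if r[c] not in ("", None))
            else "T"
            for c in cols]

def coerce(value, col_type):
    """Push a (possibly messier) TEST value through the locked type."""
    if col_type == "N":
        if value in ("", None):
            return 0.0                               # blank numeric → 0.0
        return float(str(value).replace(",", ".").replace(" ", ""))
    return "" if value is None else str(value)       # blank text → ""

types = infer_types([["LGB", "120"], ["VME", "80"]])
coerce("120,00", types[1])   # → 120.0, no crash
```

Because the type map comes only from the training range, the test range can be arbitrarily messier and still lands in the same feature space.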

5. =AUDIT — the killer feature

Every prediction comes with its 5 closest training rows.

This is what no other Excel ML tool can do. LINEST gives you coefficients. FORECAST.ETS gives you a confidence band. Snake gives you receipts.

For row 47's prediction "Délai = 7j (0.83)", the audit cell shows the 5 historical orders Snake routed into the same SAT bucket — same article family, same factory, similar quantity. The user sees WHY before they trust the answer.

The cell view

| # | Client | Article | Qty | Prédit | Audit (lookalikes) |
|---|---|---|---|---|---|
| 47 | VITRO | 44.2 Silence | 180 | 7j (0.83) | 5 similar past orders: LGB / 44.2 Silence / 200 → 7j · VITRO / 44.2 rTherm / 150 → 7j · ACTIF / 44.2 Silence / 200 → 7j · LGB / 44.2 Silence / 160 → 6j · VITRO / 44.2 Silence / 220 → 8j |

Three audit modes

| Formula | Returns | Use case |
|---|---|---|
| =AUDIT(...) | Top-5 lookalikes as a single cell of multi-line text. | Glanceable. One column added next to predictions. |
| =AUDIT.SPILL(...) | Spills 5 rows × train_columns next to each test row. | Side-by-side comparison. Power users. |
| =AUDIT.WHY(...) | The SAT clauses that fired, e.g. article startswith "44.2" AND qty >= 150. | Reasoning trace. For compliance / debugging. |

Why the (c)/(n) split matters

Snake v5+ labels every lookalike as core (c) — routed to the bucket by SAT condition — or noise (n) — randomly injected for regularization. The audit cell can show both, with origin tags. Core lookalikes answer "what does the rule say?". Noise lookalikes answer "what does the broader population say?". Two independent signals on the same prediction.
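A sketch of how an audit cell could be rendered from origin-tagged lookalikes (the tuple layout is illustrative; Snake's actual bucket-member format may differ):

```python
def render_audit(lookalikes, top=5):
    """Format tagged lookalikes into one multi-line audit cell.
    Each lookalike: (origin, client, article, qty, target), where origin is
    'c' (core, SAT-routed) or 'n' (noise, injected for regularization)."""
    shown = lookalikes[:top]
    lines = [f"{len(shown)} similar past orders:"]
    for origin, client, article, qty, target in shown:
        lines.append(f"· ({origin}) {client} / {article} / {qty} → {target}")
    return "\n".join(lines)

cell = render_audit([
    ("c", "LGB",   "44.2 Silence", 200, "7j"),
    ("c", "VITRO", "44.2 rTherm",  150, "7j"),
    ("n", "ACTIF", "33.1 LowE",     80, "4j"),
])
```

The (c)/(n) tag survives into the cell text, so the user can read the rule-driven and population-driven signals separately.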

6. =PREDICT.CANDLE — regression as a financial candle

Don't collapse the distribution. Show it.

Snake regression doesn't return one number — every prediction is backed by N lookalikes, each with their own target value. That's a distribution. Collapsing it to a mean throws away the most useful information: how confident, how wide, how the SAT-routed peers compare to the population baseline.

Borrow the OHLC candle from finance. Map the lookalike distribution onto Open / High / Low / Close. Drop it straight into Excel's built-in Stock chart. The user reads predicted ranges the same way a trader reads a chart — in seconds.

The mapping — (c)/(n) becomes body/wick

| Candle part | Source | Meaning |
|---|---|---|
| Open | Mean of noise lookalikes (n) | What the population baseline says — "if I knew nothing about this row". |
| Close | Mean of core lookalikes (c) | What the SAT-routed peers say — "this row is similar to those". |
| High | P95 of all lookalikes | Upper plausible value. |
| Low | P05 of all lookalikes | Lower plausible value. |
| Color | Close > Open → green; Close < Open → red | Green = SAT specificity pulls the value up from baseline. Red = pulls it down. Doji = no signal beyond baseline. |

The candle tells you what the model knows: the body is the specificity premium (c minus n), the wicks are the residual uncertainty across all 30+ peers. A tall body + short wicks = strong, narrow signal. A short body + long wicks = weak signal, hedge the decision.
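The mapping is mechanical enough to sketch. Assuming each lookalike carries an origin tag and a numeric target value (and a simple linear-interpolation percentile standing in for whatever the backend actually uses):

```python
from statistics import mean

def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 1]."""
    xs = sorted(values)
    k = (len(xs) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def candle(lookalikes):
    """lookalikes: list of (origin, target_value), origin 'c' core or 'n' noise."""
    vals  = [v for _, v in lookalikes]
    core  = [v for o, v in lookalikes if o == "c"]
    noise = [v for o, v in lookalikes if o == "n"]
    open_, close = mean(noise), mean(core)
    color = "green" if close > open_ else "red" if close < open_ else "doji"
    return {"open": open_, "close": close,
            "high": percentile(vals, 0.95), "low": percentile(vals, 0.05),
            "color": color}

# Four illustrative peers whose means land on row 47's Open (3,100) / Close (4,720):
ohlc = candle([("n", 3000), ("n", 3200), ("c", 4600), ("c", 4840)])
```

The body (close − open) is the specificity premium; the wicks come straight from the empirical P05/P95 of all peers, with no distributional assumption.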

The cell view — predicting "Prix" in euros

| # | Client | Article | Qty | Open | High | Low | Close | Glyph |
|---|---|---|---|---|---|---|---|---|
| 47 | VITRO | 44.2 Silence | 180 | 3,100 | 5,400 | 2,890 | 4,720 | △ |
| 48 | LGB | 33.1 LowE | 80 | 2,640 | 2,790 | 2,510 | 2,650 | ▮ |
| 49 | ACTIF | 44.2 rTherm | 250 | 7,200 | 7,400 | 5,100 | 5,890 | ▽ |

Row 47: SAT specificity adds +1,620€ over baseline → tall green candle, narrow wicks. Row 48: tiny body, almost a doji → standard pricing, no surprises. Row 49: SAT pulls -1,310€ below baseline → red candle — this client / article combination historically discounts.

The formula family

' Single cell — spills 4 columns: Open | High | Low | Close
=PREDICT.CANDLE(train, "Prix", test_row)

' Spilled across many test rows — (n × 4) array, drop straight into Excel Stock chart
=PREDICT.CANDLE(train, "Prix", G1:K30)

' One-cell glyph — Unicode candle, color-coded
=PREDICT.CANDLE.GLYPH(train, "Prix", test_row)     → △ / ▮ / ▽

' Distribution endpoint — spill all lookalike values for histogram / sparkline
=PREDICT.DIST(train, "Prix", test_row)             → spills [v1, v2, ..., vk]

' Confidence interval — alternative to candle when chart isn't needed
=PREDICT.CI(train, "Prix", test_row, 0.9)          → { mean, lo, hi }

Why this plugs into Excel natively

| Excel feature | How =PREDICT.CANDLE uses it |
|---|---|
| Stock chart (Insert → Chart → Stock) | Excel's built-in OHLC chart accepts exactly the four-column layout we spill. Zero glue code. |
| Conditional formatting | Color cells by (Close − Open) / Close. Strong-signal rows light up. |
| Sparkline | =PREDICT.DIST + a sparkline column → histogram of lookalike values right next to the prediction. |
| Pivot / Slicer | A slicer filters the test range → candles recompute live. The TCD-style flow we already pitched. |

What the user does with it — three real workflows

  1. Quote pricing. Candle's Low = floor (don't price below this without a discount approval). High = ceiling (above this, expect customer pushback). Body = negotiation room. The salesperson reads the candle and prices in seconds.
  2. Lead-time risk. For =PREDICT.CANDLE(orders, "Délai", upcoming): a long-wick green candle means "expected fast, but tail risk". Order it early.
  3. Anomaly screen. If the candle's color + body shape on a quote disagrees with what was just typed, flag for review. The model becomes a second pair of eyes on every line.

Why it's better than a confidence interval

| Confidence interval | Candle |
|---|---|
| Two numbers (lo, hi). Symmetric assumption. | Four numbers. Asymmetric tails captured naturally. |
| Doesn't show "what specificity buys you". | Body = specificity premium; you see the gap. |
| Renders as text or a single bar. | Renders as a real chart Excel ships natively. |
| Requires distributional assumptions to compute honestly. | Empirical: just percentiles of lookalike values. No parametric assumptions. |
Why this is on-brand for Snake: the candle isn't bolted on — it's the natural rendering of what Snake already produces. Every bucket already stores its members with origins (c/n) and target values. =PREDICT.CANDLE just reshapes that into OHLC. The model didn't need to change. The output did.

7. =POTENTIAL — can this column even be predicted?

Before training a real model, ask the data a question: "is there signal here at all?" =POTENTIAL runs an 80/20 split, trains a Snake on 80%, scores the held-out 20%, returns AUROC and optimal accuracy. Two cells. Five seconds. The user knows whether to invest in this prediction or pick a different target column.

The cell view

' Excel cell:
=POTENTIAL(A1:E1000, "Délai")     → spills two cells: AUROC | Optimal accuracy

' Result spilled into B12:C12
AUROC: 0.91   |   Opt accuracy: 84% (threshold 0.55)

' Multi-target sweep — spill across rows
=POTENTIAL.SCAN(A1:E1000)            → one row per candidate target

' Result
Délai     0.91   84%   (predictable)
Prix      0.78   71%   (some signal)
Client    0.52   53%   (essentially random — don't bother)

Why this matters more than R²

| Question | What LINEST R² tells you | What =POTENTIAL tells you |
|---|---|---|
| Does this column have signal? | Linear signal only. Categorical target? Useless. Mixed text? Crashes. | SAT-detectable signal in any datatype. Held-out validation, not in-sample fit. |
| How well will it generalize? | Doesn't know — R² is in-sample. | AUROC on a held-out 20%. An honest number. |
| What's the right threshold? | N/A. | Optimal accuracy plus the threshold that achieves it. Drops straight into a downstream =IF. |
| Which target is most predictable? | Run regression 5 times manually. | =POTENTIAL.SCAN sweeps every candidate column. One formula. |

The pitch in a phrase

=POTENTIAL is the data-quality probe. Before you trust =PREDICT, run =POTENTIAL. If AUROC < 0.6 the column doesn't have signal — no model will fix that.

Under the hood

# server-side, one Lambda call
def potential(table, target):
    train, test = stratified_split(table, target, ratio=0.8, seed=42)
    snake = Snake(train, target_index=target, mode="fast")
    probs = [snake.get_probability(r) for r in test]
    auroc = roc_auc(probs, test_labels)
    opt_acc, opt_thr = sweep_threshold(probs, test_labels)
    return {"auroc": auroc, "accuracy": opt_acc, "threshold": opt_thr}

Stratified split keeps class balance honest. Seeded for reproducibility — same workbook, same number, every time. fast mode (25 layers / bucket 16) is the right tradeoff: ~1s per call on snakebatch, AUROC within 1-2pp of full heavy mode.
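The sweep_threshold helper the server code calls is left unspecified above; a plausible minimal version, assuming binary 0/1 labels and an exhaustive sweep over the observed probabilities:

```python
def sweep_threshold(probs, labels):
    """Return (best_accuracy, best_threshold) over candidate cutoffs.
    probs: predicted P(positive) per held-out row; labels: 0/1 ground truth."""
    best_acc, best_thr = 0.0, 0.5
    for thr in sorted(set(probs)):               # every observed prob is a candidate
        preds = [p >= thr for p in probs]
        acc = sum(pred == bool(y) for pred, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_acc, best_thr

acc, thr = sweep_threshold([0.9, 0.8, 0.55, 0.4, 0.2], [1, 1, 1, 0, 0])
# perfect separation at thr = 0.55 → acc = 1.0
```

O(n²) in the size of the held-out set, which is fine at 20% of a spreadsheet; a sorted single pass would do it in O(n log n) if it ever mattered.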

8. Storing the model safely in the .xlsx

The .xlsx format is a zip of XML parts. There are three real places we can park a Snake model. Only one is built for it.

| Storage | Capacity | Survives | Visible | Verdict |
|---|---|---|---|---|
| Custom XML Part (recommended) | Multiple MB. No practical limit. | Save, Save As, copy, email, OneDrive sync, Google Sheets import. | Hidden from the user. Visible to programmatic readers. | The right answer. Built by Microsoft for exactly this use case. |
| Defined Name | ~32 KB max. | Save, Save As, email. | Visible in Name Manager. | Tiny models only. Discoverable by curious users (good or bad). |
| Very Hidden Sheet | Cell limit ≈ 32K chars; tile across many cells. | Save, email. Wiped if a user runs "Inspect Document". | Invisible from the UI but detectable by VBA. | Crude. Only if Custom XML Parts aren't reachable. |

The Custom XML Part layout

workbook.xlsx (zip)
├─ /xl/worksheets/sheet1.xml          — user data
├─ /xl/customXml/item1.xml            — model_stripped.json
├─ /xl/customXml/itemProps1.xml       — schema id
├─ /xl/customXml/_rels/item1.xml.rels
└─ /xl/_rels/workbook.xml.rels        — binds the part

Namespace: urn:monceai:snake:v1
Schema id: {ec2db94a-snake-monceai-2026}
Payload:   gzip+base64 of model_stripped.json   # 30-80% smaller than raw JSON
Header:    {"v":"5.4.5", "id":"v6-...", "sha256":"...", "trained_at":"..."}
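A sketch of the payload packing (gzip + base64, SHA-256 in the header); the field names follow the header shown above, everything else is illustrative:

```python
import base64
import gzip
import hashlib
import json

def pack_model(model_json: str) -> dict:
    """Compress the stripped model and wrap it with an integrity header."""
    raw = model_json.encode("utf-8")
    payload = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"v": "5.4.5",
            "sha256": hashlib.sha256(raw).hexdigest(),
            "payload": payload}

def unpack_model(part: dict) -> str:
    """Decode, decompress, verify before use — a hash mismatch means #CORRUPT."""
    raw = gzip.decompress(base64.b64decode(part["payload"]))
    if hashlib.sha256(raw).hexdigest() != part["sha256"]:
        raise ValueError("#CORRUPT")
    return raw.decode("utf-8")

part = pack_model(json.dumps({"layers": 25}))
```

The SHA-256 is taken over the uncompressed JSON, so re-compression (or a different gzip level) never changes the identity of the model.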

Safety properties — what we get for free, what we add

| Property | Built-in? | How we enforce it |
|---|---|---|
| Persistence across save/share | Yes | Custom XML Parts are part of the OOXML spec. Excel preserves them blindly. |
| Tampering detection | No | SHA-256 of the payload stored in the header. =PREDICT verifies before evaluating — mismatch → #CORRUPT. |
| Confidentiality (read-protect) | No | Optional AES-256 with a key derived from the workbook's password (if set). Without a password the model is readable — same as the data. |
| Versioning | No | Header carries the Snake version + schema. =PREDICT from a v5 model in a v6 add-in still works (read-compat is held). |
| Size | Yes | gzip the JSON. A 200 KB raw model becomes ~50 KB on disk — smaller than most embedded images. |
| "Inspect Document" survival | Caveat | Office's Document Inspector lists Custom XML Parts and offers to remove them. We name our part visibly (monceai-snake-model) so the user knows what it is. If they remove it, =PREDICT falls back to cloud. |

The cloud fallback — never break the cell

def predict_cell(test_row):
    model = read_custom_xml_part("monceai-snake-model")
    if model and verify_sha256(model):
        return evaluate_local(model, test_row)        # instant, offline
    if model_id in workbook_metadata:
        return snakebatch_predict(model_id, test_row)   # cloud, ~150ms
    return "#NO_MODEL"

Three layers of safety: local first (fast, offline), cloud fallback (the model_id is also stored as a workbook custom property, so even if the embedded JSON is wiped we can re-fetch), graceful failure (#NO_MODEL instead of #REF! — tells the user to retrain).

Industrial relevance: a quote-validation workbook trained on a factory's history can be sent to the sales team. The model knows what "44.2 Silence" means in that factory. It works offline, on the train, in a meeting. The factory knowledge ships with the file — integrity-checked, version-tagged, and recoverable from cloud if the part is stripped.

9. Billing — tokens, the equation, the spec

Goal: a token unit that maps cleanly to AWS reality, leaves a healthy margin, and scales linearly — so 100 customers don't break us, and 1 customer doesn't underpay. Every cell carries its AWS pass-through cost back to the workbook. The user must be logged in to use Excel Desktop — that's the billing hook.

8.1 The token equation

One token = $0.00001 of underlying AWS cost (compute + storage + requests). Sale price: $0.00005 / token — a flat 5× markup. Bulk pack: $50 / 1M tokens.

# Authoritative cost-to-tokens function. Lives in api/billing.py.
# Constants from /economics: ARM64 Graviton2, eu-west-3, April 2026.
import math

GB_SEC     = 0.0000166667     # Lambda 10GB-class compute, $/GB·s
INVOKE     = 0.0000002        # Lambda request, $/invoke
S3_GBMO    = 0.023            # S3 standard, $/GB·month
S3_PUT     = 0.000005         # S3 PUT, $/op
S3_GET     = 0.0000004        # S3 GET, $/op
DDB_RW     = 0.00000125       # DynamoDB on-demand write, $/op

TOKEN_USD  = 0.00001          # 1 token = $0.00001 AWS cost (cost basis)
SELL_USD   = 0.00005          # 1 token sold for $0.00005 (5× markup)

def tokens_for(call):
    aws = (
        call.lambda_gb_seconds * GB_SEC
      + call.lambda_invokes    * INVOKE
      + call.s3_mb_month * 1e-3 * S3_GBMO
      + call.s3_puts           * S3_PUT
      + call.s3_gets           * S3_GET
      + call.ddb_writes        * DDB_RW
    )
    return math.ceil(aws / TOKEN_USD)        # round UP — never under-charge

9.2 What things actually cost, in tokens

| Action | AWS cost | Tokens | Sale ($) | Margin |
|---|---|---|---|---|
| =POTENTIAL on 1K rows | $0.003 | 300 | $0.015 | 5× |
| =POTENTIAL on 150K rows | $0.50 | 50,000 | $2.50 | 5× |
| =PREDICT train, 15K rows | $0.05 | 5,000 | $0.25 | 5× |
| =PREDICT, 1 inference | $0.0000007 | 1 (rounded) | $0.00005 | 71×* |
| =FILL, 5 sparse cols, 1K rows | $0.015 | 1,500 | $0.075 | 5× |
| =AUDIT, embedded model | $0 | 0 | $0 | — |
| Storing 1 model (200 KB) for 1 year | $0.000055 | 6 | $0.0003 | ~5× |
| Storing 1000 models for 1 year | $0.055 | 5,500 | $0.275 | 5× |

* The 1-token ceil() floor on inference creates a small over-charge on tiny calls — intentional, prevents abuse via spam-clicking. At 71× markup on individual cells but 5× in any realistic batch, the average user pays the headline 5× rate.
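The 71× figure is just the ceil() floor at work; a quick arithmetic check, using the constants from the token-equation block above:

```python
import math

TOKEN_USD = 0.00001     # 1 token = $0.00001 AWS cost (cost basis)
SELL_USD  = 0.00005     # 1 token sold for $0.00005 (5× markup)

aws_cost = 0.0000007                        # one inference call
tokens   = math.ceil(aws_cost / TOKEN_USD)  # 0.07 → rounds UP to 1
markup   = tokens * SELL_USD / aws_cost     # ≈ 71× on this single cell

# A 1000-cell batch: $0.0007 of AWS → 70 tokens → $0.0035 sale,
# which is back at the headline 5× — the floor only bites tiny calls.
```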

9.3 The 150K-token starter pack

cdana@monce.ai gets 150,000 tokens on signup. That maps to $1.50 of AWS cost, sells for $7.50. Concretely it allows:

| You can do | Approximate count | Tokens used |
|---|---|---|
| Full =POTENTIAL.SCAN on a 150K-row sheet | ~3 sweeps | ~150,000 |
| Train one big model (15K rows, 15L) | ~30 trains | ~150,000 |
| Train one small model (1K rows) | ~500 trains | ~150,000 |
| Inference cells | ~150,000 | ~150,000 |
| Store 50 trained models for 1 year | flat | ~300 |

Generous enough for evaluation, finite enough that we know who's serious. Starter pack is one-time. Refills are paid (or paid-in-credit for design partners).

9.4 Per-call response — the workbook sees both axes

# Every snakebatch endpoint returns this envelope.
{
  "result": ...,
  "usage": {
    "lambda_calls": 17,
    "lambda_gb_seconds": 81.2,
    "s3_puts": 2, "s3_gets": 0,
    "ddb_writes": 1,
    "cost_usd": 0.0247,
    "tokens": 2470,
    "endpoint": "/v6/train",
    "model_id": "v6-fa9e3-...",
    "user_email": "cdana@monce.ai"
  },
  "balance": { "tokens": 147530, "refills_at": "2026-06-01" }
}

9.5 The workbook usage panel

| Cell | Formula | Λ calls | GB·s | Tokens | $ cost |
|---|---|---|---|---|---|
| L2 | =POTENTIAL(A:E, "Délai") | 3 | 1.4 | 30 | $0.0003 |
| L3 | =PREDICT(A:E, "Délai", G:K) | 17 | 81.2 | 2,470 | $0.0247 |
| L4 | =FILL(A1:E120) | 34 | 5.7 | 170 | $0.0017 |
| Σ | Session total | 54 | 88.3 | 2,670 | $0.0267 |

10. Secure Lambda design — auth, balance, S3 metering

The user must be logged in to use Excel Desktop. No anonymous access on cell formulas. Every call carries an sk-snake-* key, every key is bound to an email, every charge is atomic. Designed today, built before v0.5.

10.1 New AWS resources (all in eu-west-3, account 202533543033)

| Resource | Type | Purpose |
|---|---|---|
| snake-batch-users | DynamoDB | One row per user. PK=email. Holds api_key, balance, status, monthly grant. |
| snake-batch-keys | DynamoDB GSI | Inverse index: PK=api_key → email. Used on every authenticated request. |
| snake-batch-auth | Lambda (512 MB, 10s) | Signup / magic-link / verify / rotate-key. Stateless, calls SES + DDB. |
| snake-batch-billing | Lambda (256 MB, 5s) | Atomic balance decrement via DDB ConditionExpression. Called inline by the api Lambda. |
| snake-batch-meter | Lambda (1 GB, 60s) | EventBridge cron, daily 00:30 UTC. Scans S3 model storage, charges storage tokens. |
| snake-batch-models/ | S3 prefix | Per-user prefix: jobs/<sha256(email)>/<model_id>/.... IAM scoped per key. |
| SES sender | Verified domain | noreply@monce.ai for magic links. DKIM signed. |

10.2 snake-batch-users schema

{
  "email":           "cdana@monce.ai",                    # PK
  "api_key":         "sk-snake-cdana-7Y3bF...4qV",         # 32 random bytes b58
  "api_key_sha256":  "a3f9...b1",                          # indexed, key never stored raw
  "balance_tokens":  150000,                              # NUMBER, atomic decrement
  "monthly_grant":   0,                                   # refill on 1st of month
  "created_at":      "2026-05-17T...",
  "status":          "active",                             # active | suspended | revoked
  "tier":            "design_partner",                     # maps to markup & cap
  "workbook_ids":    [...],                                # SHA-256 of customXmlPart sha
  "models_owned":    ["v6-fa9e3-..."],                    # for storage charging
  "last_call_at":    "2026-05-17T...",
  "daily_call_count": 0                                  # reset by meter Lambda
}

The raw api_key is shown to the user once at issuance, never read back. We store SHA-256 of it as a GSI — on every request we hash the bearer token, look up the user, decrement, log. Same pattern as Stripe / GitHub PAT.
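The issue-once / store-hashed pattern in miniature (token_urlsafe stands in for the base58 encoding mentioned in the schema, and a dict stands in for the snake-batch-keys GSI — both assumptions for illustration):

```python
import hashlib
import secrets

KEYS = {}   # stands in for snake-batch-keys: sha256(api_key) → email

def issue_key(email: str) -> str:
    """Generate the key, store only its SHA-256, return the raw key ONCE."""
    raw = "sk-snake-" + secrets.token_urlsafe(32)
    KEYS[hashlib.sha256(raw.encode()).hexdigest()] = email
    return raw                      # shown to the user once, never read back

def authenticate(bearer: str):
    """Hash the presented bearer token and look up its owner.
    Raw keys are never stored, so a table dump leaks nothing usable."""
    return KEYS.get(hashlib.sha256(bearer.encode()).hexdigest())

key = issue_key("cdana@monce.ai")
authenticate(key)             # → "cdana@monce.ai"
authenticate("sk-snake-bad")  # → None
```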

10.3 Atomic balance decrement

# In api/handler.py, BEFORE invoking the worker Lambda:
def charge(api_key_hash, tokens):
    try:
        users.update_item(
            Key={"email": lookup_email(api_key_hash)},
            UpdateExpression="SET balance_tokens = balance_tokens - :t, "
                              "last_call_at = :now",
            ConditionExpression="balance_tokens >= :t AND #s = :a",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":t": tokens, ":a": "active",
                                       ":now": now()},
        )
    except users.meta.client.exceptions.ConditionalCheckFailedException:
        raise HTTP402("Insufficient tokens or account suspended")

Atomic via DDB ConditionExpression — no race, no double-charge. Pre-charge happens BEFORE the worker fires; if the worker over-runs the estimate, we charge the difference at completion. If the worker under-runs (rare), we credit the difference back. Estimates from /estimate are deliberately ~10% conservative so users don't 402 mid-job.
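The pre-charge / settle flow, sketched with in-memory stand-ins (charge and credit here are illustrative wrappers around the atomic decrement, not the production handlers):

```python
BALANCES = {"cdana@monce.ai": 150_000}   # stands in for snake-batch-users

def charge(email, tokens):
    if BALANCES.get(email, 0) < tokens:
        raise RuntimeError("402: insufficient tokens")
    BALANCES[email] -= tokens

def credit(email, tokens):
    BALANCES[email] += tokens

def run_job(email, estimated_tokens, worker):
    """Pre-charge a conservative estimate, settle the delta at completion."""
    charge(email, estimated_tokens)            # BEFORE the worker fires
    actual = worker()                          # worker returns metered tokens
    if actual > estimated_tokens:
        charge(email, actual - estimated_tokens)    # over-run: charge the diff
    elif actual < estimated_tokens:
        credit(email, estimated_tokens - actual)    # under-run: credit back
    return actual

# A train call estimated at 2,700 tokens that actually meters 2,470:
run_job("cdana@monce.ai", estimated_tokens=2700, worker=lambda: 2470)
# balance settles at 150,000 − 2,470 = 147,530
```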

10.4 The auth flow — magic link, no password

║ Excel cell types =PREDICT(...) for the first time
║
║  add-in checks workbook custom property "monceai-api-key" — absent
║
║  call POST /v6/train without Authorization → 401 + www-auth-hint
║
║  add-in opens taskpane: "Sign in to monceai"
║       email field → cdana@monce.ai
║       submit → POST /auth/signup
║
║  snake-batch-auth Lambda:
║       1. validate email format + DNS MX
║       2. generate link_token = secrets.token_urlsafe(32)
║       3. write users[email].magic_link = sha256(link_token), expires=now+10min
║       4. SES send magic link → https://excel.aws.monce.ai/verify?t=<link_token>
║
║  user clicks link → GET /auth/verify?t=...
║       1. lookup users by hash(link_token), check expiry
║       2. if first-time: issue api_key, set balance_tokens=150000
║       3. respond with HTML page: "Return to Excel and click Continue"
║       4. add-in polls /auth/poll?email=... every 2s
║
║  add-in receives api_key, writes it to:
║       - workbook customXmlPart monceai-api-key  (per-workbook)
║       - Excel registry HKCU\Software\Monce\monceai\api_key  (Win, per-user)
║       - Mac: ~/Library/Containers/com.microsoft.Excel/.../monceai-key.json
║
║  =PREDICT retries with Authorization: Bearer sk-snake-... → result
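Steps 2-3 of the auth Lambda and the verify lookup, sketched with an in-memory store standing in for the users table (hashed, single-use token with a 10-minute TTL):

```python
import hashlib
import secrets
import time

LINKS = {}   # sha256(link_token) → (email, expires_at); stand-in for the users table

def send_magic_link(email: str) -> str:
    token = secrets.token_urlsafe(32)
    LINKS[hashlib.sha256(token.encode()).hexdigest()] = (email, time.time() + 600)
    return token    # in production this is SES-mailed as /verify?t=<token>, not returned

def verify(token: str):
    h = hashlib.sha256(token.encode()).hexdigest()
    entry = LINKS.pop(h, None)          # pop → single-use, replay impossible
    if entry is None or time.time() > entry[1]:
        return None                     # unknown, already consumed, or expired
    return entry[0]

t = send_magic_link("cdana@monce.ai")
verify(t)    # → "cdana@monce.ai"
verify(t)    # → None (already consumed)
```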

10.5 Why magic-link, not OAuth, not password

| Option | Pro | Con | Verdict |
|---|---|---|---|
| Magic link | No password to leak. Works on any device. SES is one Lambda away. | Email round-trip (10-30s). | Pick this. One-time cost on install. |
| Microsoft OAuth | Native to Excel via Office.context.auth.getAccessToken. | Requires Azure AD app registration, tenant approval — IT politics. | v2 add-on for enterprise customers only. |
| Password | Familiar. | We'd hold it. Reset flow. Brute force. Storage compliance. | No. |
| API key only (no email) | Simplest. | Can't recover; can't bill; can't refill; can't notify. | No — we want the email for billing. |

10.6 S3 model storage — metered daily

Every model trained gets written to s3://snake-batch-monce/jobs/<hash(email)>/<model_id>/model_stripped.json. The user-prefix is required for IAM — the api Lambda only has s3:GetObject on the user's own prefix.

# snake-batch-meter Lambda — daily cron 00:30 UTC
def handler(event, ctx):
    for user in users.scan():
        prefix = f"jobs/{sha256(user.email)}/"
        total_bytes = sum(o.size for o in s3.list(prefix))
        mb_day = total_bytes / 1e6 / 30      # 1/30th of the monthly cost per day
        tokens = math.ceil(mb_day * 23 / 10)  # S3 $0.023/GB·month → ≈0.077 tokens per MB·day
        charge(user.api_key_hash, tokens, kind="storage")
        log_storage_metering(user, total_bytes, tokens)

Storage is cheap — ≈0.077 tokens per MB·day. 50 stored models at 200 KB ≈ 10 MB, which meters at about one token a day (the daily ceil() rounds 0.77 tokens up to 1), ~30 tokens a month. Visible to the user via =ASK_BALANCE() (an Excel formula that hits /auth/balance) and on excel.aws.monce.ai/account.

10.7 Security hardening, declared upfront

| Threat | Mitigation |
|---|---|
| Stolen api_key | SHA-256 stored, never raw. Per-key rate limit (1000 calls/min). The user can rotate via the taskpane → old key revoked atomically. |
| Magic-link replay | Single-use. Hash of the token stored, deleted on first use. 10-min TTL. |
| Email spoofing | SES with DKIM + SPF. Inbound bounce protection. Disposable-email blocklist (e.g. mailinator). |
| Balance race | DDB ConditionExpression — atomic compare-and-decrement. |
| Cross-user model read | S3 IAM policy restricts the api Lambda to jobs/<hash(email)>/*. A user cannot list or fetch another user's prefix. |
| Cost-bomb attack | Per-user daily call cap (default 10K). Per-user dollar cap (default $50/day). Both override-able per tier. Hit the cap → 429 + email alert. |
| Workbook leakage | The custom property monceai-api-key is excluded from =AUDIT output and from any cell spill. The add-in scrubs it on workbook export to PDF. |
| Model exfiltration | Embedded model JSON has a SHA-256 in its header (already in §8). Optional AES-256 at rest with a key derived from the workbook password. |

10.8 Where this code lives

lambda_functions/
  api/handler.py            # existing — ADD: charge() inline, /v6/train wraps in try/charge
  auth/handler.py           # NEW — signup, verify, poll, rotate, balance
  billing/handler.py        # NEW — tokens_for(), charge(), refund()  (importable layer)
  meter/handler.py          # NEW — daily S3 scan, storage charging
terraform/
  auth.tf                   # NEW — users + keys tables, auth + meter Lambdas, SES domain
  iam.tf                    # NEW — per-user S3 prefix policies, billing Lambda invoke perms

excel-addin/
  taskpane/auth.html        # email entry, polling, key persistence
  functions.js              # adds Authorization header on every fetch

10.9 Provisioning cdana@monce.ai — the first record

When the auth Lambda is live, the bootstrap call is a single CLI:

aws dynamodb put-item --region eu-west-3 --table-name snake-batch-users \
  --item '{
    "email":          {"S": "cdana@monce.ai"},
    "api_key_sha256": {"S": "<sha256 of generated key>"},
    "balance_tokens": {"N": "150000"},
    "monthly_grant":  {"N": "0"},
    "tier":           {"S": "design_partner"},
    "status":         {"S": "active"},
    "created_at":     {"S": "2026-05-17"}
  }'

# Generated key (printed once, then discarded server-side):
# sk-snake-cdana-<32 random base58 chars>

The key is shown once, the SHA-256 lives in the table, and cdana@monce.ai signs in to the Excel add-in with email + the magic-link flow. Starter pack of 150,000 tokens (= $7.50 face / $1.50 AWS cost) covers a thorough evaluation. This row is not provisioned today — designed only.

11. The seamless install — one click on any Excel device

Goal: from "I want this" to "=PREDICT works in my sheet" in under 60 seconds, with no admin rights, on Windows Excel, Mac Excel, and Excel on the web. One install path. Three OSes covered.

The chosen path: Office Add-in (Custom Functions)

VBA .xlam is Windows-only and triggers IT alerts. Office Add-ins via the Custom Functions API are the only mechanism Microsoft supports across Windows desktop, Mac desktop, and Excel for the web. They install from a manifest URL or AppSource, sandbox in their own runtime, and register first-class functions with IntelliSense and async support.

| Path | Windows | Mac | Web | IT-friendly | Verdict |
|---|---|---|---|---|---|
| VBA .xlam | ✓ | partial | ✗ | often blocked | backup only |
| xlwings / pyxll | ✓ | partial | ✗ | requires a Python install | power users |
| Office Add-in (Custom Functions) | ✓ | ✓ | ✓ | signed manifest, no admin needed | this is the answer |

The 3-step install

1. Click monce.ai/excel
2. "Open in Excel" — the browser hands off
3. "Trust this add-in" — one click

=PREDICT is now available in every workbook.

What's behind the click

# 1. The user lands on https://monce.ai/excel
#    → serves a tiny page with three buttons: Windows / Mac / Web

# 2. Windows / Mac — the page redirects to:
ms-excel:ofe|u|https://excel.aws.monce.ai/manifest.xml
#    Excel auto-launches and shows the "Trust" dialog

# 3. Excel on the web — the page calls:
Office.context.requirements.isSetSupported("CustomFunctionsRuntime", "1.4")
#    and sideloads the manifest via Insert → Add-ins → Upload My Add-in

# 4. The manifest declares:
<CustomFunctions>
  <Function>
    <Name>PREDICT</Name>
    <Description>Snake classification on a train/test range</Description>
    <Parameters> ... </Parameters>
    <Result><Type>matrix</Type><Dimensionality>matrix</Dimensionality></Result>
  </Function>
  <!-- POTENTIAL, FILL, AUDIT same shape -->
</CustomFunctions>

What ships with the manifest

| Asset | Where it lives | Size | What it does |
|---|---|---|---|
| manifest.xml | excel.aws.monce.ai/manifest.xml | ~6 KB | Declares the four functions, points at functions.json + functions.js |
| functions.json | same host | ~3 KB | Function metadata for Excel |
| functions.js | same host | ~25 KB | The thin client — fetch() against snakebatch + Custom XML Part read/write |
| taskpane.html | same host | ~40 KB | Optional sidebar for API-key entry, usage panel, model manager |

Total payload: ~75 KB. Hosted as static files on the same EC2. Excel updates them automatically on every workbook open (HTTP HEAD revalidation) — users never download a new version manually. We ship a fix at 10 AM, every workbook in the field uses it by 10:01.

The seamless paths, ranked

| Rank | Path | Steps | Time | Caveat |
|---|---|---|---|---|
| 1 | AppSource listing | Excel → Insert → Get Add-ins → "monceai" → Add | ~30s | Requires Microsoft store review (~2 weeks, one-time) |
| 2 | Direct sideload via ms-excel:ofe\|u\|... | Click a link in an email/page | ~15s | Works today, no review needed |
| 3 | Manual manifest upload | Insert → My Add-ins → Upload | ~60s | For locked-down corp environments |
| 4 | Centralized deployment | IT admin pushes via the M365 admin center | 0s for the end user | Enterprise tier — we provide the manifest URL |

The "open in Excel" deep link, written out

# Windows + Mac Excel
<a href="ms-excel:ofe|u|https://excel.aws.monce.ai/manifest.xml">
  Install monceai for Excel
</a>

# Excel on the web
<a href="https://excel.officeapps.live.com/x/_layouts/16/AddIn.aspx?manifestUrl=https://excel.aws.monce.ai/manifest.xml">
  Install monceai for Excel Online
</a>

// Detection — one button, three OSes
if (navigator.userAgent.includes("Mac") || navigator.userAgent.includes("Windows")) {
    location.href = "ms-excel:ofe|u|https://excel.aws.monce.ai/manifest.xml";
} else {
    location.href = "https://excel.officeapps.live.com/...";
}

Backup paths, declared upfront: ranks 2-4 above — direct sideload, manual upload, centralized deployment — are the fallbacks while the AppSource listing is in review or blocked by IT.

The pitch line: "Click here. Click 'trust'. Type =PREDICT. Done." — on Windows, on Mac, in a browser, and on a tablet. One pitch, every device.

12. The pitch to Mathieu

| What this is NOT | What this IS |
|---|---|
| Yet another LLM-in-Excel wrapper. | SAT classification on factory data, with auditable reasoning per row. |
| A black-box prediction. | Five lookalikes per cell. The user sees the past before trusting the future. |
| A data-science tool for analysts. | A formula. Anyone who knows =VLOOKUP can use =PREDICT. |
| Cloud-only. | Train in the cloud, predict locally, model embedded in the file. |
| Generic. | The monce-suite reliability overlay handles "44.2 Silence", "LGB Menuiserie SAS", factory IDs — industrial vocabulary is native. |
The killer feature isn't =PREDICT. It's =AUDIT. =PREDICT is the hook that gets the formula on the sheet. =AUDIT is the reason the user trusts it the second time.

13. What we'd build first

| Phase | Scope | Time | Demo |
|---|---|---|---|
| v0.1 — =POTENTIAL | 80/20 split, AUROC + optimal accuracy. Smallest, most demo-able formula. Proves the backend round-trip. | 2 days | Score the predictability of every column in a Monce quote sheet. |
| v0.2 — =FILL | Missing-data magic. Select the table, hit a button, blanks fill in. One Snake per blank column. | 1 weekend | Live on a real Monce quote sheet. |
| v0.3 — =PREDICT + safe storage | Train/target/test with spilled output. Datatype lock. Custom XML Part with SHA-256 + cloud fallback. | 1 week | Predict délai on next week's quotes from last quarter's history. Email the file — it still predicts. |
| v0.4 — =AUDIT | Lookalikes cell, three modes. The trust layer. | 1 week | Click a prediction → see the 5 past orders behind it. |
| v0.4b — =PREDICT.CANDLE | OHLC mapping over lookalike values, drops into the Excel Stock chart. .GLYPH, .DIST, .CI siblings. | 3 days | Quote-pricing sheet with candles next to predicted prix — the salesperson reads risk visually. |
| v0.5 — Auth + billing infra | snake-batch-users + snake-batch-auth + snake-batch-meter. Magic-link signup. Atomic balance decrement. S3 per-user prefix + IAM. cdana@monce.ai with 150K tokens. | 1 week | Login flow end-to-end. Balance shown in the Excel taskpane. Insufficient-tokens 402 working. |
| v0.6 — Office Add-in shell | Manifest hosted on excel.aws.monce.ai. ms-excel:ofe\|u\| deep link. Functions registered. Auth-gated calls. | 1 week | One-click install + sign-in. Works on Win, Mac, Web. |
| v0.7 — Ribbon | "Snake" tab in Excel. Train wizard, slicer-driven re-predict, model manager, account panel. | 2 weeks | The TCD-style UX. No formulas required for non-power users. |

14. The line that ties it all together

"Excel has =TREND for what's linear and =FORECAST.ETS for what's seasonal.
SnakeBatch gives Excel =PREDICT, =FILL and =AUDIT — for everything else."