SnakeBatch × Excel

Charles Dana · Monce SAS · May 2026

A predictive layer for the spreadsheet. SAT classification, missing-data inference, lookalike audit — native cell formulas.

1. The pitch in one sentence

Excel ships =TREND, =FORECAST.LINEAR, =FORECAST.ETS. None of them touch text, mixed types, categories, or industrial vocabulary. SnakeBatch does. We add five formulas to Excel and the spreadsheet becomes a predictive workbench:

=PREDICT(train, target, test)
=PREDICT.CANDLE(train, target, test)
=FILL(table)
=AUDIT(train, target, test)
=POTENTIAL(table, target)

Train via snakebatch.aws.monce.ai. Model lives inside the .xlsx as a Custom XML Part. Predictions spill across rows the same way =FILTER does. The workbook becomes a self-contained ML artifact — email it, share it, the model goes with it.

2. Three problems Excel doesn't solve today

| Problem | Formula | What Excel does | What SnakeBatch does |
|---|---|---|---|
| Missing data | =FILL | Leaves #N/A, drops the row from charts, breaks =AVERAGE. | Trains a Snake on the columns we DO have, predicts the blank from the rest of the row, fills it with the most logical value plus a confidence. |
| Mixed-type classify | =PREDICT | LINEST only takes numbers. Categorical features need one-hot manually. Test column with stray text? Crashes. | Auto-detects N/I/B/T per column from the training set. Test rows with messier formatting are coerced via the same datatype map. No crash, no manual encoding. |
| Why this prediction? | =AUDIT | Trendline R² tells you "fit is OK overall". Nothing tells you why row 47 got the answer it got. | Snake stores lookalikes per bucket. Every prediction comes with the 5-10 training rows that look most like this one. Auditor-grade explainability, native to the cell. |

3. =FILL — missing data, magically

User cost: select the table. Hit one button (or type =FILL(A1:E120)). Done.
What happens: for every column with blanks, SnakeBatch trains one Snake using the other columns as features, then predicts each missing cell.

The cell view

| # | Client | Article | Qty | Délai | Prix |
|---|---|---|---|---|---|
| 1 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |
| 2 | VME | 33.1 LowE | 80 | | 2640€ |
| 3 | ACTIF | 44.2 Silence | 200 | 7j | |
| 4 | | 33.1 LowE | 80 | 4j | 2640€ |
| 5 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |

=FILL(A1:E5)

| # | Client | Article | Qty | Délai | Prix |
|---|---|---|---|---|---|
| 1 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |
| 2 | VME | 33.1 LowE | 80 | 4j (0.91) | 2640€ |
| 3 | ACTIF | 44.2 Silence | 200 | 7j | 7140€ (0.83) |
| 4 | VME (0.78) | 33.1 LowE | 80 | 4j | 2640€ |
| 5 | LGB | 44.2 rTherm | 120 | 5j | 4280€ |

Under the hood

# For each column with blanks:
for col_with_blanks in table:
    train = rows_where(col_with_blanks is not blank)
    test  = rows_where(col_with_blanks is blank)
    snake = Snake(train, target_index=col_with_blanks)   # 1.8s for 891 rows
    fill[col_with_blanks] = [snake.get_prediction(r) for r in test]

One Snake per column with blanks, all trained in parallel on Lambda. A 5-column / 1000-row table with 3 sparse columns → 3 Lambdas fire concurrently → ~2s wall clock.
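A local sketch of that fan-out, with ThreadPoolExecutor standing in for the concurrent Lambda invocations and a trivial most-frequent-value predictor standing in for Snake (both are illustrative stand-ins, not the production code):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def fill_column(table, col):
    """Train on rows where `col` is present, predict the rows where it is blank."""
    train  = [r for r in table if r[col] is not None]
    blanks = [i for i, r in enumerate(table) if r[col] is None]
    # Stand-in predictor: most frequent training value (a Snake in production).
    mode = Counter(r[col] for r in train).most_common(1)[0][0]
    return col, {i: mode for i in blanks}

def fill(table):
    sparse = [c for c in range(len(table[0])) if any(r[c] is None for r in table)]
    with ThreadPoolExecutor() as pool:            # one "Lambda" per sparse column
        results = list(pool.map(lambda c: fill_column(table, c), sparse))
    for col, preds in results:                    # write back in the main thread
        for i, value in preds.items():
            table[i][col] = value
    return table

rows = [["LGB", "5j"], ["VME", None], [None, "5j"]]
fill(rows)   # both blanks filled from their column's training values
```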

4. =PREDICT — train / target / test

| Piece | Range |
|---|---|
| Train range | A1:E120 |
| Target col | "Délai" |
| Test range | G1:K30 |
| Spilled predictions | L1:L30 |
' Excel cell:
=PREDICT(A1:E120, "Délai", G1:K30)     → spills 30 predictions down L
=PREDICT.PROB(A1:E120, "Délai", G1:K30) → spills 30 × |classes| matrix
=PREDICT.TOP(A1:E120, "Délai", G1:K30, 3) → top-3 candidates per row

Datatype handling — the test-trickier-than-train problem

Real spreadsheets are messy. Train says "5j", test says "5 jours". Train says 120, test says "120,00". Train says "LGB", test says "LGB Menuiserie SAS". SnakeBatch handles this in two layers:

| Layer | What it does | Where it lives |
|---|---|---|
| Datatype lock | Train infers N/I/B/T per column. Test rows are coerced through the same map — "120,00" → 120.0; blanks → 0.0 for numeric, "" for text. | _preprocess() in app.py |
| Reliability overlay (monce-suite) | For factory-bound columns (client, verre, intercalaire): canonical-ID matching first, Snake second. "LGB Menuiserie SAS" → 9232 before classification. | claude.aws/stage_0, snake.aws/query |
Why the overlay matters: a Snake trained on "LGB" doesn't recognize "LGB Menuiserie SAS". The matching layer normalizes both to 9232 before the Snake sees them. Industrial knowledge inside the predictor. This is the part Mathieu cares about.
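A minimal sketch of the datatype-lock idea, reduced to a two-type world (numeric vs. text) for brevity; the infer_types / coerce helpers are hypothetical illustrations, not the actual _preprocess() in app.py:

```python
def infer_types(train_rows):
    """Per-column type map from the TRAINING set: 'N' if every non-blank value
    parses as a number (accepting the French comma decimal), else 'T'."""
    def numeric(v):
        try:
            float(str(v).replace(",", ".").replace(" ", ""))
            return True
        except ValueError:
            return False
    cols = range(len(train_rows[0]))
    return ["N" if all(numeric(r[c]) for r in train_rows if r[c] not in ("", None))
            else "T"
            for c in cols]

def coerce(value, col_type):
    """Push a (possibly messier) TEST value through the locked type."""
    if col_type == "N":
        if value in ("", None):
            return 0.0                               # blank numeric → 0.0
        return float(str(value).replace(",", ".").replace(" ", ""))
    return "" if value is None else str(value)       # blank text → ""

types = infer_types([["LGB", "120"], ["VME", "80"]])
coerce("120,00", types[1])   # → 120.0, no crash
```

Because the type map comes only from the training range, the test range can be arbitrarily messier and still lands in the same feature space.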

5. =AUDIT — the killer feature

Every prediction comes with its 5 closest training rows.

This is what no other Excel ML tool can do. LINEST gives you coefficients. FORECAST.ETS gives you a confidence band. Snake gives you receipts.

For row 47's prediction "Délai = 7j (0.83)", the audit cell shows the 5 historical orders Snake routed into the same SAT bucket — same article family, same factory, similar quantity. The user sees WHY before they trust the answer.

The cell view

| # | Client | Article | Qty | Prédit | Audit (lookalikes) |
|---|---|---|---|---|---|
| 47 | VITRO | 44.2 Silence | 180 | 7j (0.83) | 5 similar past orders: LGB / 44.2 Silence / 200 → 7j · VITRO / 44.2 rTherm / 150 → 7j · ACTIF / 44.2 Silence / 200 → 7j · LGB / 44.2 Silence / 160 → 6j · VITRO / 44.2 Silence / 220 → 8j |

Three audit modes

| Formula | Returns | Use case |
|---|---|---|
| =AUDIT(...) | Top-5 lookalikes as a single cell of multi-line text. | Glanceable. One column added next to predictions. |
| =AUDIT.SPILL(...) | Spills 5 rows × train_columns next to each test row. | Side-by-side comparison. Power users. |
| =AUDIT.WHY(...) | The SAT clauses that fired, e.g. article startswith "44.2" AND qty >= 150. | Reasoning trace. For compliance / debugging. |

Why the (c)/(n) split matters

Snake v5+ labels every lookalike as core (c) — routed to the bucket by SAT condition — or noise (n) — randomly injected for regularization. The audit cell can show both, with origin tags. Core lookalikes answer "what does the rule say?". Noise lookalikes answer "what does the broader population say?". Two independent signals on the same prediction.
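A sketch of how an audit cell could be rendered from origin-tagged lookalikes (the tuple layout is illustrative; Snake's actual bucket-member format may differ):

```python
def render_audit(lookalikes, top=5):
    """Format tagged lookalikes into one multi-line audit cell.
    Each lookalike: (origin, client, article, qty, target), where origin is
    'c' (core, SAT-routed) or 'n' (noise, injected for regularization)."""
    shown = lookalikes[:top]
    lines = [f"{len(shown)} similar past orders:"]
    for origin, client, article, qty, target in shown:
        lines.append(f"· ({origin}) {client} / {article} / {qty} → {target}")
    return "\n".join(lines)

cell = render_audit([
    ("c", "LGB",   "44.2 Silence", 200, "7j"),
    ("c", "VITRO", "44.2 rTherm",  150, "7j"),
    ("n", "ACTIF", "33.1 LowE",     80, "4j"),
])
```

The (c)/(n) tag survives into the cell text, so the user can read the rule-driven and population-driven signals separately.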

6. =PREDICT.CANDLE — regression as a financial candle

Don't collapse the distribution. Show it.

Snake regression doesn't return one number — every prediction is backed by N lookalikes, each with their own target value. That's a distribution. Collapsing it to a mean throws away the most useful information: how confident, how wide, how the SAT-routed peers compare to the population baseline.

Borrow the OHLC candle from finance. Map the lookalike distribution onto Open / High / Low / Close. Drop it straight into Excel's built-in Stock chart. The user reads predicted ranges the same way a trader reads a chart — in seconds.

The mapping — (c)/(n) becomes body/wick

| Candle part | Source | Meaning |
|---|---|---|
| Open | Mean of noise lookalikes (n) | What the population baseline says — "if I knew nothing about this row". |
| Close | Mean of core lookalikes (c) | What the SAT-routed peers say — "this row is similar to those". |
| High | P95 of all lookalikes | Upper plausible value. |
| Low | P05 of all lookalikes | Lower plausible value. |
| Color | Close > Open → green; Close < Open → red | Green = SAT specificity pulls the value up from baseline. Red = pulls it down. Doji = no signal beyond baseline. |

The candle tells you what the model knows: the body is the specificity premium (c minus n), the wicks are the residual uncertainty across all 30+ peers. A tall body + short wicks = strong, narrow signal. A short body + long wicks = weak signal, hedge the decision.
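The mapping is mechanical enough to sketch. Assuming each lookalike carries an origin tag and a numeric target value (and a simple linear-interpolation percentile standing in for whatever the backend actually uses):

```python
from statistics import mean

def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 1]."""
    xs = sorted(values)
    k = (len(xs) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def candle(lookalikes):
    """lookalikes: list of (origin, target_value), origin 'c' core or 'n' noise."""
    vals  = [v for _, v in lookalikes]
    core  = [v for o, v in lookalikes if o == "c"]
    noise = [v for o, v in lookalikes if o == "n"]
    open_, close = mean(noise), mean(core)
    color = "green" if close > open_ else "red" if close < open_ else "doji"
    return {"open": open_, "close": close,
            "high": percentile(vals, 0.95), "low": percentile(vals, 0.05),
            "color": color}

# Four illustrative peers whose means land on row 47's Open (3,100) / Close (4,720):
ohlc = candle([("n", 3000), ("n", 3200), ("c", 4600), ("c", 4840)])
```

The body (close − open) is the specificity premium; the wicks come straight from the empirical P05/P95 of all peers, with no distributional assumption.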

The cell view — predicting "Prix" in euros

| # | Client | Article | Qty | Open | High | Low | Close | Glyph |
|---|---|---|---|---|---|---|---|---|
| 47 | VITRO | 44.2 Silence | 180 | 3,100 | 5,400 | 2,890 | 4,720 | △ |
| 48 | LGB | 33.1 LowE | 80 | 2,640 | 2,790 | 2,510 | 2,650 | ▮ |
| 49 | ACTIF | 44.2 rTherm | 250 | 7,200 | 7,400 | 5,100 | 5,890 | ▽ |

Row 47: SAT specificity adds +1,620€ over baseline → tall green candle, narrow wicks. Row 48: tiny body, almost a doji → standard pricing, no surprises. Row 49: SAT pulls -1,310€ below baseline → red candle — this client / article combination historically discounts.

The formula family

' Single cell — spills 4 columns: Open | High | Low | Close
=PREDICT.CANDLE(train, "Prix", test_row)

' Spilled across many test rows — (n × 4) array, drop straight into Excel Stock chart
=PREDICT.CANDLE(train, "Prix", G1:K30)

' One-cell glyph — Unicode candle, color-coded
=PREDICT.CANDLE.GLYPH(train, "Prix", test_row)     → △ / ▮ / ▽

' Distribution endpoint — spill all lookalike values for histogram / sparkline
=PREDICT.DIST(train, "Prix", test_row)             → spills [v1, v2, ..., vk]

' Confidence interval — alternative to candle when chart isn't needed
=PREDICT.CI(train, "Prix", test_row, 0.9)          → { mean, lo, hi }

Why this plugs into Excel natively

| Excel feature | How =PREDICT.CANDLE uses it |
|---|---|
| Stock chart (Insert → Chart → Stock) | Excel's built-in OHLC chart accepts exactly the four-column layout we spill. Zero glue code. |
| Conditional formatting | Color cells by (Close − Open) / Close. Strong-signal rows light up. |
| Sparkline | =PREDICT.DIST + a sparkline column → histogram of lookalike values right next to the prediction. |
| Pivot / Slicer | A slicer filters the test range → candles recompute live. The TCD-style flow we already pitched. |

What the user does with it — three real workflows

  1. Quote pricing. Candle's Low = floor (don't price below this without a discount approval). High = ceiling (above this, expect customer pushback). Body = negotiation room. The salesperson reads the candle and prices in seconds.
  2. Lead-time risk. For =PREDICT.CANDLE(orders, "Délai", upcoming): a long-wick green candle means "expected fast, but tail risk". Order it early.
  3. Anomaly screen. If the candle's color + body shape on a quote disagrees with what was just typed, flag for review. The model becomes a second pair of eyes on every line.

Why it's better than a confidence interval

| Confidence interval | Candle |
|---|---|
| Two numbers (lo, hi). Symmetric assumption. | Four numbers. Asymmetric tails captured naturally. |
| Doesn't show "what specificity buys you". | Body = specificity premium; you see the gap. |
| Renders as text or a single bar. | Renders as a real chart Excel ships natively. |
| Requires distributional assumptions to compute honestly. | Empirical: just percentiles of lookalike values. No parametric assumptions. |
Why this is on-brand for Snake: the candle isn't bolted on — it's the natural rendering of what Snake already produces. Every bucket already stores its members with origins (c/n) and target values. =PREDICT.CANDLE just reshapes that into OHLC. The model didn't need to change. The output did.

7. =POTENTIAL — can this column even be predicted?

Before training a real model, ask the data a question: "is there signal here at all?" =POTENTIAL runs an 80/20 split, trains a Snake on 80%, scores the held-out 20%, returns AUROC and optimal accuracy. Two cells. Five seconds. The user knows whether to invest in this prediction or pick a different target column.

The cell view

' Excel cell:
=POTENTIAL(A1:E1000, "Délai")     → spills two cells: AUROC | Optimal accuracy

' Result spilled into B12:C12
AUROC: 0.91   |   Opt accuracy: 84% (threshold 0.55)

' Multi-target sweep — spill across rows
=POTENTIAL.SCAN(A1:E1000)            → one row per candidate target

' Result
Délai     0.91   84%   (predictable)
Prix      0.78   71%   (some signal)
Client    0.52   53%   (essentially random — don't bother)

Why this matters more than R²

| Question | What LINEST R² tells you | What =POTENTIAL tells you |
|---|---|---|
| Does this column have signal? | Linear signal only. Categorical target? Useless. Mixed text? Crashes. | SAT-detectable signal in any datatype. Held-out validation, not in-sample fit. |
| How well will it generalize? | Doesn't know — R² is in-sample. | AUROC on a held-out 20%. An honest number. |
| What's the right threshold? | N/A. | Optimal accuracy plus the threshold that achieves it. Drops straight into a downstream =IF. |
| Which target is most predictable? | Run regression 5 times manually. | =POTENTIAL.SCAN sweeps every candidate column. One formula. |

The pitch in a phrase

=POTENTIAL is the data-quality probe. Before you trust =PREDICT, run =POTENTIAL. If AUROC < 0.6 the column doesn't have signal — no model will fix that.

Under the hood

# server-side, one Lambda call
def potential(table, target):
    train, test = stratified_split(table, target, ratio=0.8, seed=42)
    snake = Snake(train, target_index=target, mode="fast")
    probs = [snake.get_probability(r) for r in test]
    auroc = roc_auc(probs, test_labels)
    opt_acc, opt_thr = sweep_threshold(probs, test_labels)
    return {"auroc": auroc, "accuracy": opt_acc, "threshold": opt_thr}

Stratified split keeps class balance honest. Seeded for reproducibility — same workbook, same number, every time. fast mode (25 layers / bucket 16) is the right tradeoff: ~1s per call on snakebatch, AUROC within 1-2pp of full heavy mode.
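The sweep_threshold helper the server code calls is left unspecified above; a plausible minimal version, assuming binary 0/1 labels and an exhaustive sweep over the observed probabilities:

```python
def sweep_threshold(probs, labels):
    """Return (best_accuracy, best_threshold) over candidate cutoffs.
    probs: predicted P(positive) per held-out row; labels: 0/1 ground truth."""
    best_acc, best_thr = 0.0, 0.5
    for thr in sorted(set(probs)):               # every observed prob is a candidate
        preds = [p >= thr for p in probs]
        acc = sum(pred == bool(y) for pred, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_acc, best_thr

acc, thr = sweep_threshold([0.9, 0.8, 0.55, 0.4, 0.2], [1, 1, 1, 0, 0])
# perfect separation at thr = 0.55 → acc = 1.0
```

O(n²) in the size of the held-out set, which is fine at 20% of a spreadsheet; a sorted single pass would do it in O(n log n) if it ever mattered.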

8. Storing the model safely in the .xlsx

The .xlsx format is a zip of XML parts. There are three real places we can park a Snake model. Only one is built for it.

| Storage | Capacity | Survives | Visible | Verdict |
|---|---|---|---|---|
| Custom XML Part (recommended) | Multiple MB. No practical limit. | Save, Save As, copy, email, OneDrive sync, Google Sheets import. | Hidden from the user. Visible to programmatic readers. | The right answer. Built by Microsoft for exactly this use case. |
| Defined Name | ~32 KB max. | Save, Save As, email. | Visible in Name Manager. | Tiny models only. Discoverable by curious users (good or bad). |
| Very Hidden Sheet | Cell limit ≈ 32K chars; tile across many cells. | Save, email. Wiped if a user runs "Inspect Document". | Invisible from the UI but detectable by VBA. | Crude. Only if Custom XML Parts aren't reachable. |

The Custom XML Part layout

workbook.xlsx (zip)
├─ /xl/worksheets/sheet1.xml          — user data
├─ /xl/customXml/item1.xml            — model_stripped.json
├─ /xl/customXml/itemProps1.xml       — schema id
├─ /xl/customXml/_rels/item1.xml.rels
└─ /xl/_rels/workbook.xml.rels        — binds the part

Namespace: urn:monceai:snake:v1
Schema id: {ec2db94a-snake-monceai-2026}
Payload:   gzip+base64 of model_stripped.json   # 30-80% smaller than raw JSON
Header:    {"v":"5.4.5", "id":"v6-...", "sha256":"...", "trained_at":"..."}
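A sketch of the payload packing (gzip + base64, SHA-256 in the header); the field names follow the header shown above, everything else is illustrative:

```python
import base64
import gzip
import hashlib
import json

def pack_model(model_json: str) -> dict:
    """Compress the stripped model and wrap it with an integrity header."""
    raw = model_json.encode("utf-8")
    payload = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"v": "5.4.5",
            "sha256": hashlib.sha256(raw).hexdigest(),
            "payload": payload}

def unpack_model(part: dict) -> str:
    """Decode, decompress, verify before use — a hash mismatch means #CORRUPT."""
    raw = gzip.decompress(base64.b64decode(part["payload"]))
    if hashlib.sha256(raw).hexdigest() != part["sha256"]:
        raise ValueError("#CORRUPT")
    return raw.decode("utf-8")

part = pack_model(json.dumps({"layers": 25}))
```

The SHA-256 is taken over the uncompressed JSON, so re-compression (or a different gzip level) never changes the identity of the model.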

Safety properties — what we get for free, what we add

| Property | Built-in? | How we enforce it |
|---|---|---|
| Persistence across save/share | Yes | Custom XML Parts are part of the OOXML spec. Excel preserves them blindly. |
| Tampering detection | No | SHA-256 of the payload stored in the header. =PREDICT verifies before evaluating — mismatch → #CORRUPT. |
| Confidentiality (read-protect) | No | Optional AES-256 with a key derived from the workbook's password (if set). Without a password the model is readable — same as the data. |
| Versioning | No | Header carries the Snake version + schema. =PREDICT from a v5 model in a v6 add-in still works (read-compat is held). |
| Size | Yes | gzip the JSON. A 200 KB raw model becomes ~50 KB on disk — smaller than most embedded images. |
| "Inspect Document" survival | Caveat | Office's Document Inspector lists Custom XML Parts and offers to remove them. We name our part visibly (monceai-snake-model) so the user knows what it is. If they remove it, =PREDICT falls back to cloud. |

The cloud fallback — never break the cell

def predict_cell(test_row):
    model = read_custom_xml_part("monceai-snake-model")
    if model and verify_sha256(model):
        return evaluate_local(model, test_row)        # instant, offline
    if model_id in workbook_metadata:
        return snakebatch_predict(model_id, test_row)   # cloud, ~150ms
    return "#NO_MODEL"

Three layers of safety: local first (fast, offline), cloud fallback (the model_id is also stored as a workbook custom property, so even if the embedded JSON is wiped we can re-fetch), graceful failure (#NO_MODEL instead of #REF! — tells the user to retrain).

Industrial relevance: a quote-validation workbook trained on a factory's history can be sent to the sales team. The model knows what "44.2 Silence" means in that factory. It works offline, on the train, in a meeting. The factory knowledge ships with the file — integrity-checked, version-tagged, and recoverable from cloud if the part is stripped.

9. Billing — tokens, the equation, the spec

Goal: a token unit that maps cleanly to AWS reality, leaves a healthy margin, and scales linearly — so 100 customers don't break us, and 1 customer doesn't underpay. Every cell carries its AWS pass-through cost back to the workbook. The user must be logged in to use Excel Desktop — that's the billing hook.

8.1 The token equation

One token = $0.00001 of underlying AWS cost (compute + storage + requests). Sale price: $0.00005 / token — a flat 5× markup. Bulk pack: $50 / 1M tokens.

# Authoritative cost-to-tokens function. Lives in api/billing.py.
# Constants from /economics: ARM64 Graviton2, eu-west-3, April 2026.
import math

GB_SEC     = 0.0000166667     # Lambda 10GB-class compute, $/GB·s
INVOKE     = 0.0000002        # Lambda request, $/invoke
S3_GBMO    = 0.023            # S3 standard, $/GB·month
S3_PUT     = 0.000005         # S3 PUT, $/op
S3_GET     = 0.0000004        # S3 GET, $/op
DDB_RW     = 0.00000125       # DynamoDB on-demand write, $/op

TOKEN_USD  = 0.00001          # 1 token = $0.00001 AWS cost (cost basis)
SELL_USD   = 0.00005          # 1 token sold for $0.00005 (5× markup)

def tokens_for(call):
    aws = (
        call.lambda_gb_seconds * GB_SEC
      + call.lambda_invokes    * INVOKE
      + call.s3_mb_month * 1e-3 * S3_GBMO
      + call.s3_puts           * S3_PUT
      + call.s3_gets           * S3_GET
      + call.ddb_writes        * DDB_RW
    )
    return math.ceil(aws / TOKEN_USD)        # round UP — never under-charge

9.2 What things actually cost, in tokens

| Action | AWS cost | Tokens | Sale ($) | Margin |
|---|---|---|---|---|
| =POTENTIAL on 1K rows | $0.003 | 300 | $0.015 | 5× |
| =POTENTIAL on 150K rows | $0.50 | 50,000 | $2.50 | 5× |
| =PREDICT train, 15K rows | $0.05 | 5,000 | $0.25 | 5× |
| =PREDICT, 1 inference | $0.0000007 | 1 (rounded) | $0.00005 | 71×* |
| =FILL, 5 sparse cols, 1K rows | $0.015 | 1,500 | $0.075 | 5× |
| =AUDIT, embedded model | $0 | 0 | $0 | — |
| Storing 1 model (200 KB) for 1 year | $0.000055 | 6 | $0.0003 | ~5× |
| Storing 1000 models for 1 year | $0.055 | 5,500 | $0.275 | 5× |

* The 1-token ceil() floor on inference creates a small over-charge on tiny calls — intentional, prevents abuse via spam-clicking. At 71× markup on individual cells but 5× in any realistic batch, the average user pays the headline 5× rate.
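The 71× figure is just the ceil() floor at work; a quick arithmetic check, using the constants from the token-equation block above:

```python
import math

TOKEN_USD = 0.00001     # 1 token = $0.00001 AWS cost (cost basis)
SELL_USD  = 0.00005     # 1 token sold for $0.00005 (5× markup)

aws_cost = 0.0000007                        # one inference call
tokens   = math.ceil(aws_cost / TOKEN_USD)  # 0.07 → rounds UP to 1
markup   = tokens * SELL_USD / aws_cost     # ≈ 71× on this single cell

# A 1000-cell batch: $0.0007 of AWS → 70 tokens → $0.0035 sale,
# which is back at the headline 5× — the floor only bites tiny calls.
```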

9.3 The 150K-token starter pack

cdana@monce.ai gets 150,000 tokens on signup. That maps to $1.50 of AWS cost, sells for $7.50. Concretely it allows:

| You can do | Approximate count | Tokens used |
|---|---|---|
| Full =POTENTIAL.SCAN on a 150K-row sheet | ~3 sweeps | ~150,000 |
| Train one big model (15K rows, 15L) | ~30 trains | ~150,000 |
| Train one small model (1K rows) | ~500 trains | ~150,000 |
| Inference cells | ~150,000 | ~150,000 |
| Store 50 trained models for 1 year | flat | ~300 |

Generous enough for evaluation, finite enough that we know who's serious. Starter pack is one-time. Refills are paid (or paid-in-credit for design partners).

9.4 Per-call response — the workbook sees both axes

# Every snakebatch endpoint returns this envelope.
{
  "result": ...,
  "usage": {
    "lambda_calls": 17,
    "lambda_gb_seconds": 81.2,
    "s3_puts": 2, "s3_gets": 0,
    "ddb_writes": 1,
    "cost_usd": 0.0247,
    "tokens": 2470,
    "endpoint": "/v6/train",
    "model_id": "v6-fa9e3-...",
    "user_email": "cdana@monce.ai"
  },
  "balance": { "tokens": 147530, "refills_at": "2026-06-01" }
}

9.5 The workbook usage panel

| Cell | Formula | Λ calls | GB·s | Tokens | $ cost |
|---|---|---|---|---|---|
| L2 | =POTENTIAL(A:E, "Délai") | 3 | 1.4 | 30 | $0.0003 |
| L3 | =PREDICT(A:E, "Délai", G:K) | 17 | 81.2 | 2,470 | $0.0247 |
| L4 | =FILL(A1:E120) | 34 | 5.7 | 170 | $0.0017 |
| Σ | Session total | 54 | 88.3 | 2,670 | $0.0267 |

10. Secure Lambda design — auth, balance, S3 metering

The user must be logged in to use Excel Desktop. No anonymous access on cell formulas. Every call carries an sk-snake-* key, every key is bound to an email, every charge is atomic. Designed today, built before v0.5.

10.1 New AWS resources (all in eu-west-3, account 202533543033)

| Resource | Type | Purpose |
|---|---|---|
| snake-batch-users | DynamoDB | One row per user. PK=email. Holds api_key, balance, status, monthly grant. |
| snake-batch-keys | DynamoDB GSI | Inverse index: PK=api_key → email. Used on every authenticated request. |
| snake-batch-auth | Lambda (512 MB, 10s) | Signup / magic-link / verify / rotate-key. Stateless, calls SES + DDB. |
| snake-batch-billing | Lambda (256 MB, 5s) | Atomic balance decrement via DDB ConditionExpression. Called inline by the api Lambda. |
| snake-batch-meter | Lambda (1 GB, 60s) | EventBridge cron, daily 00:30 UTC. Scans S3 model storage, charges storage tokens. |
| snake-batch-models/ | S3 prefix | Per-user prefix: jobs/<sha256(email)>/<model_id>/.... IAM scoped per key. |
| SES sender | Verified domain | noreply@monce.ai for magic links. DKIM signed. |

10.2 snake-batch-users schema

{
  "email":           "cdana@monce.ai",                    # PK
  "api_key":         "sk-snake-cdana-7Y3bF...4qV",         # 32 random bytes b58
  "api_key_sha256":  "a3f9...b1",                          # indexed, key never stored raw
  "balance_tokens":  150000,                              # NUMBER, atomic decrement
  "monthly_grant":   0,                                   # refill on 1st of month
  "created_at":      "2026-05-17T...",
  "status":          "active",                             # active | suspended | revoked
  "tier":            "design_partner",                     # maps to markup & cap
  "workbook_ids":    [...],                                # SHA-256 of customXmlPart sha
  "models_owned":    ["v6-fa9e3-..."],                    # for storage charging
  "last_call_at":    "2026-05-17T...",
  "daily_call_count": 0                                  # reset by meter Lambda
}

The raw api_key is shown to the user once at issuance, never read back. We store SHA-256 of it as a GSI — on every request we hash the bearer token, look up the user, decrement, log. Same pattern as Stripe / GitHub PAT.
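The issue-once / store-hashed pattern in miniature (token_urlsafe stands in for the base58 encoding mentioned in the schema, and a dict stands in for the snake-batch-keys GSI — both assumptions for illustration):

```python
import hashlib
import secrets

KEYS = {}   # stands in for snake-batch-keys: sha256(api_key) → email

def issue_key(email: str) -> str:
    """Generate the key, store only its SHA-256, return the raw key ONCE."""
    raw = "sk-snake-" + secrets.token_urlsafe(32)
    KEYS[hashlib.sha256(raw.encode()).hexdigest()] = email
    return raw                      # shown to the user once, never read back

def authenticate(bearer: str):
    """Hash the presented bearer token and look up its owner.
    Raw keys are never stored, so a table dump leaks nothing usable."""
    return KEYS.get(hashlib.sha256(bearer.encode()).hexdigest())

key = issue_key("cdana@monce.ai")
authenticate(key)             # → "cdana@monce.ai"
authenticate("sk-snake-bad")  # → None
```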

10.3 Atomic balance decrement

# In api/handler.py, BEFORE invoking the worker Lambda:
def charge(api_key_hash, tokens):
    try:
        users.update_item(
            Key={"email": lookup_email(api_key_hash)},
            UpdateExpression="SET balance_tokens = balance_tokens - :t, "
                              "last_call_at = :now",
            ConditionExpression="balance_tokens >= :t AND #s = :a",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":t": tokens, ":a": "active",
                                       ":now": now()},
        )
    except users.meta.client.exceptions.ConditionalCheckFailedException:
        raise HTTP402("Insufficient tokens or account suspended")

Atomic via DDB ConditionExpression — no race, no double-charge. Pre-charge happens BEFORE the worker fires; if the worker over-runs the estimate, we charge the difference at completion. If the worker under-runs (rare), we credit the difference back. Estimates from /estimate are deliberately ~10% conservative so users don't 402 mid-job.
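The pre-charge / settle flow, sketched with in-memory stand-ins (charge and credit here are illustrative wrappers around the atomic decrement, not the production handlers):

```python
BALANCES = {"cdana@monce.ai": 150_000}   # stands in for snake-batch-users

def charge(email, tokens):
    if BALANCES.get(email, 0) < tokens:
        raise RuntimeError("402: insufficient tokens")
    BALANCES[email] -= tokens

def credit(email, tokens):
    BALANCES[email] += tokens

def run_job(email, estimated_tokens, worker):
    """Pre-charge a conservative estimate, settle the delta at completion."""
    charge(email, estimated_tokens)            # BEFORE the worker fires
    actual = worker()                          # worker returns metered tokens
    if actual > estimated_tokens:
        charge(email, actual - estimated_tokens)    # over-run: charge the diff
    elif actual < estimated_tokens:
        credit(email, estimated_tokens - actual)    # under-run: credit back
    return actual

# A train call estimated at 2,700 tokens that actually meters 2,470:
run_job("cdana@monce.ai", estimated_tokens=2700, worker=lambda: 2470)
# balance settles at 150,000 − 2,470 = 147,530
```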

10.4 The auth flow — magic link, no password

║ Excel cell types =PREDICT(...) for the first time
║
║  add-in checks workbook custom property "monceai-api-key" — absent
║
║  call POST /v6/train without Authorization → 401 + www-auth-hint
║
║  add-in opens taskpane: "Sign in to monceai"
║       email field → cdana@monce.ai
║       submit → POST /auth/signup
║
║  snake-batch-auth Lambda:
║       1. validate email format + DNS MX
║       2. generate link_token = secrets.token_urlsafe(32)
║       3. write users[email].magic_link = sha256(link_token), expires=now+10min
║       4. SES send magic link → https://excel.aws.monce.ai/verify?t=<link_token>
║
║  user clicks link → GET /auth/verify?t=...
║       1. lookup users by hash(link_token), check expiry
║       2. if first-time: issue api_key, set balance_tokens=150000
║       3. respond with HTML page: "Return to Excel and click Continue"
║       4. add-in polls /auth/poll?email=... every 2s
║
║  add-in receives api_key, writes it to:
║       - workbook customXmlPart monceai-api-key  (per-workbook)
║       - Excel registry HKCU\Software\Monce\monceai\api_key  (Win, per-user)
║       - Mac: ~/Library/Containers/com.microsoft.Excel/.../monceai-key.json
║
║  =PREDICT retries with Authorization: Bearer sk-snake-... → result
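Steps 2-3 of the auth Lambda and the verify lookup, sketched with an in-memory store standing in for the users table (hashed, single-use token with a 10-minute TTL):

```python
import hashlib
import secrets
import time

LINKS = {}   # sha256(link_token) → (email, expires_at); stand-in for the users table

def send_magic_link(email: str) -> str:
    token = secrets.token_urlsafe(32)
    LINKS[hashlib.sha256(token.encode()).hexdigest()] = (email, time.time() + 600)
    return token    # in production this is SES-mailed as /verify?t=<token>, not returned

def verify(token: str):
    h = hashlib.sha256(token.encode()).hexdigest()
    entry = LINKS.pop(h, None)          # pop → single-use, replay impossible
    if entry is None or time.time() > entry[1]:
        return None                     # unknown, already consumed, or expired
    return entry[0]

t = send_magic_link("cdana@monce.ai")
verify(t)    # → "cdana@monce.ai"
verify(t)    # → None (already consumed)
```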

10.5 Why magic-link, not OAuth, not password

| Option | Pro | Con | Verdict |
|---|---|---|---|
| Magic link | No password to leak. Works on any device. SES is one Lambda away. | Email round-trip (10-30s). | Pick this. One-time cost on install. |
| Microsoft OAuth | Native to Excel via Office.context.auth.getAccessToken. | Requires Azure AD app registration, tenant approval — IT politics. | v2 add-on for enterprise customers only. |
| Password | Familiar. | We'd hold it. Reset flow. Brute force. Storage compliance. | No. |
| API key only (no email) | Simplest. | Can't recover; can't bill; can't refill; can't notify. | No — we want the email for billing. |

10.6 S3 model storage — metered daily

Every model trained gets written to s3://snake-batch-monce/jobs/<hash(email)>/<model_id>/model_stripped.json. The user-prefix is required for IAM — the api Lambda only has s3:GetObject on the user's own prefix.

# snake-batch-meter Lambda — daily cron 00:30 UTC
def handler(event, ctx):
    for user in users.scan():
        prefix = f"jobs/{sha256(user.email)}/"
        total_bytes = sum(o.size for o in s3.list(prefix))
        mb_day = total_bytes / 1e6 / 30      # 1/30th of the monthly cost per day
        tokens = math.ceil(mb_day * 23 / 10)  # S3 $0.023/GB·month → ≈0.077 tokens per MB·day
        charge(user.api_key_hash, tokens, kind="storage")
        log_storage_metering(user, total_bytes, tokens)

Storage is cheap — ≈0.077 tokens per MB·day. 50 stored models at 200 KB ≈ 10 MB, which meters at about one token a day (the daily ceil() rounds 0.77 tokens up to 1), ~30 tokens a month. Visible to the user via =ASK_BALANCE() (an Excel formula that hits /auth/balance) and on excel.aws.monce.ai/account.

10.7 Security hardening, declared upfront

| Threat | Mitigation |
|---|---|
| Stolen api_key | SHA-256 stored, never raw. Per-key rate limit (1000 calls/min). The user can rotate via the taskpane → old key revoked atomically. |
| Magic-link replay | Single-use. Hash of the token stored, deleted on first use. 10-min TTL. |
| Email spoofing | SES with DKIM + SPF. Inbound bounce protection. Disposable-email blocklist (e.g. mailinator). |
| Balance race | DDB ConditionExpression — atomic compare-and-decrement. |
| Cross-user model read | S3 IAM policy restricts the api Lambda to jobs/<hash(email)>/*. A user cannot list or fetch another user's prefix. |
| Cost-bomb attack | Per-user daily call cap (default 10K). Per-user dollar cap (default $50/day). Both override-able per tier. Hit the cap → 429 + email alert. |
| Workbook leakage | The custom property monceai-api-key is excluded from =AUDIT output and from any cell spill. The add-in scrubs it on workbook export to PDF. |
| Model exfiltration | Embedded model JSON has a SHA-256 in its header (already in §8). Optional AES-256 at rest with a key derived from the workbook password. |

10.8 Where this code lives

lambda_functions/
  api/handler.py            # existing — ADD: charge() inline, /v6/train wraps in try/charge
  auth/handler.py           # NEW — signup, verify, poll, rotate, balance
  billing/handler.py        # NEW — tokens_for(), charge(), refund()  (importable layer)
  meter/handler.py          # NEW — daily S3 scan, storage charging
terraform/
  auth.tf                   # NEW — users + keys tables, auth + meter Lambdas, SES domain
  iam.tf                    # NEW — per-user S3 prefix policies, billing Lambda invoke perms

excel-addin/
  taskpane/auth.html        # email entry, polling, key persistence
  functions.js              # adds Authorization header on every fetch

10.9 Provisioning cdana@monce.ai — the first record

When the auth Lambda is live, the bootstrap call is a single CLI:

aws dynamodb put-item --region eu-west-3 --table-name snake-batch-users \
  --item '{
    "email":          {"S": "cdana@monce.ai"},
    "api_key_sha256": {"S": "<sha256 of generated key>"},
    "balance_tokens": {"N": "150000"},
    "monthly_grant":  {"N": "0"},
    "tier":           {"S": "design_partner"},
    "status":         {"S": "active"},
    "created_at":     {"S": "2026-05-17"}
  }'

# Generated key (printed once, then discarded server-side):
# sk-snake-cdana-<32 random base58 chars>

The key is shown once, the SHA-256 lives in the table, and cdana@monce.ai signs in to the Excel add-in with email + the magic-link flow. Starter pack of 150,000 tokens (= $7.50 face / $1.50 AWS cost) covers a thorough evaluation. This row is not provisioned today — designed only.

11. The seamless install — one click on any Excel device

Goal: from "I want this" to "=PREDICT works in my sheet" in under 60 seconds, with no admin rights, on Windows Excel, Mac Excel, and Excel on the web. One install path. Three OSes covered.

The chosen path: Office Add-in (Custom Functions)

VBA .xlam is Windows-only and triggers IT alerts. Office Add-ins via the Custom Functions API are the only mechanism Microsoft supports across Windows desktop, Mac desktop, and Excel for the web. They install from a manifest URL or AppSource, sandbox in their own runtime, and register first-class functions with IntelliSense and async support.

| Path | Windows | Mac | Web | IT-friendly | Verdict |
|---|---|---|---|---|---|
| VBA .xlam | ✓ | partial | ✗ | often blocked | backup only |
| xlwings / pyxll | ✓ | partial | ✗ | requires a Python install | power users |
| Office Add-in (Custom Functions) | ✓ | ✓ | ✓ | signed manifest, no admin needed | this is the answer |

The 3-step install

1. Click monce.ai/excel
2. "Open in Excel" — the browser hands off
3. "Trust this add-in" — one click

=PREDICT is now available in every workbook.

What's behind the click

# 1. The user lands on https://monce.ai/excel
#    → serves a tiny page with three buttons: Windows / Mac / Web

# 2. Windows / Mac — the page redirects to:
ms-excel:ofe|u|https://excel.aws.monce.ai/manifest.xml
#    Excel auto-launches and shows the "Trust" dialog

# 3. Excel on the web — the page calls:
Office.context.requirements.isSetSupported("CustomFunctionsRuntime", "1.4")
#    and sideloads the manifest via Insert → Add-ins → Upload My Add-in

# 4. The manifest declares:
<CustomFunctions>
  <Function>
    <Name>PREDICT</Name>
    <Description>Snake classification on a train/test range</Description>
    <Parameters> ... </Parameters>
    <Result><Type>matrix</Type><Dimensionality>matrix</Dimensionality></Result>
  </Function>
  <!-- POTENTIAL, FILL, AUDIT same shape -->
</CustomFunctions>

What ships with the manifest

| Asset | Where it lives | Size | What it does |
|---|---|---|---|
| manifest.xml | excel.aws.monce.ai/manifest.xml | ~6 KB | Declares the four functions, points at functions.json + functions.js |
| functions.json | same host | ~3 KB | Function metadata for Excel |
| functions.js | same host | ~25 KB | The thin client — fetch() against snakebatch + Custom XML Part read/write |
| taskpane.html | same host | ~40 KB | Optional sidebar for API-key entry, usage panel, model manager |

Total payload: ~75 KB. Hosted as static files on the same EC2. Excel updates them automatically on every workbook open (HTTP HEAD revalidation) — users never download a new version manually. We ship a fix at 10 AM, every workbook in the field uses it by 10:01.

The seamless paths, ranked

| Rank | Path | Steps | Time | Caveat |
|---|---|---|---|---|
| 1 | AppSource listing | Excel → Insert → Get Add-ins → "monceai" → Add | ~30s | Requires Microsoft store review (~2 weeks, one-time) |
| 2 | Direct sideload via ms-excel:ofe\|u\|... | Click a link in an email/page | ~15s | Works today, no review needed |
| 3 | Manual manifest upload | Insert → My Add-ins → Upload | ~60s | For locked-down corp environments |
| 4 | Centralized deployment | IT admin pushes via the M365 admin center | 0s for the end user | Enterprise tier — we provide the manifest URL |

The "open in Excel" deep link, written out

# Windows + Mac Excel
<a href="ms-excel:ofe|u|https://excel.aws.monce.ai/manifest.xml">
  Install monceai for Excel
</a>

# Excel on the web
<a href="https://excel.officeapps.live.com/x/_layouts/16/AddIn.aspx?manifestUrl=https://excel.aws.monce.ai/manifest.xml">
  Install monceai for Excel Online
</a>

// Detection — one button, three OSes
if (navigator.userAgent.includes("Mac") || navigator.userAgent.includes("Windows")) {
    location.href = "ms-excel:ofe|u|https://excel.aws.monce.ai/manifest.xml";
} else {
    location.href = "https://excel.officeapps.live.com/...";
}

Backup paths, declared upfront: ranks 2-4 above — direct sideload, manual upload, centralized deployment — are the fallbacks while the AppSource listing is in review or blocked by IT.

The pitch line: "Click here. Click 'trust'. Type =PREDICT. Done." — on Windows, on Mac, in a browser, and on a tablet. One pitch, every device.

12. The pitch to Mathieu

| What this is NOT | What this IS |
|---|---|
| Yet another LLM-in-Excel wrapper. | SAT classification on factory data, with auditable reasoning per row. |
| A black-box prediction. | Five lookalikes per cell. The user sees the past before trusting the future. |
| A data-science tool for analysts. | A formula. Anyone who knows =VLOOKUP can use =PREDICT. |
| Cloud-only. | Train in the cloud, predict locally, model embedded in the file. |
| Generic. | The monce-suite reliability overlay handles "44.2 Silence", "LGB Menuiserie SAS", factory IDs — industrial vocabulary is native. |
The killer feature isn't =PREDICT. It's =AUDIT. =PREDICT is the hook that gets the formula on the sheet. =AUDIT is the reason the user trusts it the second time.

13. What we'd build first

| Phase | Scope | Time | Demo |
|---|---|---|---|
| v0.1 — =POTENTIAL | 80/20 split, AUROC + optimal accuracy. Smallest, most demo-able formula. Proves the backend round-trip. | 2 days | Score the predictability of every column in a Monce quote sheet. |
| v0.2 — =FILL | Missing-data magic. Select the table, hit a button, blanks fill in. One Snake per blank column. | 1 weekend | Live on a real Monce quote sheet. |
| v0.3 — =PREDICT + safe storage | Train/target/test with spilled output. Datatype lock. Custom XML Part with SHA-256 + cloud fallback. | 1 week | Predict délai on next week's quotes from last quarter's history. Email the file — it still predicts. |
| v0.4 — =AUDIT | Lookalikes cell, three modes. The trust layer. | 1 week | Click a prediction → see the 5 past orders behind it. |
| v0.4b — =PREDICT.CANDLE | OHLC mapping over lookalike values, drops into the Excel Stock chart. .GLYPH, .DIST, .CI siblings. | 3 days | Quote-pricing sheet with candles next to predicted prix — the salesperson reads risk visually. |
| v0.5 — Auth + billing infra | snake-batch-users + snake-batch-auth + snake-batch-meter. Magic-link signup. Atomic balance decrement. S3 per-user prefix + IAM. cdana@monce.ai with 150K tokens. | 1 week | Login flow end-to-end. Balance shown in the Excel taskpane. Insufficient-tokens 402 working. |
| v0.6 — Office Add-in shell | Manifest hosted on excel.aws.monce.ai. ms-excel:ofe\|u\| deep link. Functions registered. Auth-gated calls. | 1 week | One-click install + sign-in. Works on Win, Mac, Web. |
| v0.7 — Ribbon | "Snake" tab in Excel. Train wizard, slicer-driven re-predict, model manager, account panel. | 2 weeks | The TCD-style UX. No formulas required for non-power users. |

14. The line that ties it all together

"Excel has =TREND for what's linear and =FORECAST.ETS for what's seasonal.
SnakeBatch gives Excel =PREDICT, =FILL and =AUDIT — for everything else."