Nicholas Borszich

I'm Nicholas, an engineer in Los Angeles. I build production systems on top of frontier LLM APIs: currently a Bayesian A/B testing app and a multi-stage Claude vision pipeline behind a CRO audit. I write here about the engineering, statistics, and design decisions behind what I ship.

My anomaly detector was asking Claude to do the statistics
July 16, 2026

The anomaly detector I shipped hands Claude 21 days of raw conversion numbers and asks 'is anything weird?' That's asking a language model to eyeball a changepoint. This is the rebuild: a deterministic pass does the detection, and Claude's job narrows to the one thing it's actually good at.
Multi-rubric LLM scoring over a noisy data stream
May 25, 2026

How I split the judgment problem inside a job-discovery pipeline into a deterministic gate, a routing classifier, a per-line rubric, and a structured requirements extractor, and why one composite score would have hidden everything that matters.
Grading a Claude pipeline I'd already shipped: two scorer types, one vision audit
May 19, 2026

How I'm grading a multi-stage Claude pipeline I'd already shipped: exact-match scorers for classification, three independent LLM-as-judge rubrics for the vision audit, and why a single composite quality score would have hidden every failure I cared about.
One A/B testing product, two very different worlds: building for Shopify and the open web
May 12, 2026

What it actually takes to run one A/B testing product on both Shopify and arbitrary HTML sites: auth, event ingestion, available data, and why a half-dozen switch statements beat the interface I almost wrote.
Three boring Claude features inside a stats app, and the patterns that made them ship
May 5, 2026

LLM features that sit quietly inside a SaaS product: a pre-launch reviewer, an async anomaly detector that returns strict JSON, and a cached post-experiment analyzer. The wrapper is 100 lines.
Bayesian A/B testing in 200 lines of Go: what 5,000 samples actually buys you
April 28, 2026

A walkthrough of a production Bayesian A/B testing engine: Beta-Binomial for conversion, Normal for revenue, Monte Carlo sampling, and the LiftDistribution trick that makes credible intervals on dollars interpretable.
Read-back is the part I don't generate
April 21, 2026

I generate most of the code for a NetSuite-to-Shopify migration with Claude. The part I write by hand is the read-back: the code that knows what 'wrong' looks like in this specific catalog, where a bad write looks exactly like a good one.
The parts of E2E tests Claude can't write for you
April 1, 2026

I generated most of a 6,000-line Playwright suite for a Shopify app with Claude. The parts that mattered, the regression assertions and the comment about not trusting non-deterministic ingestion, are the parts I had to put back by hand.
