<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/rss.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Li Tan — Essays</title><description>Essays on data, experimentation, causal inference, and AI-augmented analytics.</description><link>https://litananalytics.com/</link><language>en-us</language><copyright>Li Tan</copyright><item><title>The Hallucinations Hiding in My Pipeline</title><link>https://litananalytics.com/blog/ai-hallucinations-in-pipelines/</link><guid isPermaLink="true">https://litananalytics.com/blog/ai-hallucinations-in-pipelines/</guid><description>AI writes code that looks right. Sometimes it is. Often it is not. Your job is to catch it. The AI is only as good as you are at reviewing it.</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Last month I asked Claude to write a dbt model for me. The SQL looked clean. The join made sense. I shipped it.&lt;/p&gt;
&lt;p&gt;Three weeks later I found out it was silently dropping 8% of the rows. An &lt;code&gt;INNER JOIN&lt;/code&gt; where I needed a &lt;code&gt;LEFT JOIN&lt;/code&gt;. Nobody caught it. The dashboard kept looking fine because the missing 8% happened to not move the top-line number much.&lt;/p&gt;
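&lt;p&gt;Here is a minimal sketch of that failure mode in pandas. The tables and names are made up; the &lt;code&gt;indicator&lt;/code&gt; flag is what makes the dropped rows visible:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pandas as pd

orders = pd.DataFrame({&quot;order_id&quot;: [1, 2, 3], &quot;user_id&quot;: [10, 20, 30]})
users = pd.DataFrame({&quot;user_id&quot;: [10, 20], &quot;plan&quot;: [&quot;free&quot;, &quot;pro&quot;]})

# INNER silently drops order 3 because user 30 has no row in users
inner = orders.merge(users, on=&quot;user_id&quot;, how=&quot;inner&quot;)

# LEFT keeps it, with plan as NaN, which is what I actually wanted
left = orders.merge(users, on=&quot;user_id&quot;, how=&quot;left&quot;, indicator=True)

assert len(left) == len(orders)  # the row-count check that would have caught it
print(left[left[&quot;_merge&quot;] == &quot;left_only&quot;])  # the rows INNER eats
&lt;/code&gt;&lt;/pre&gt;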
&lt;p&gt;This is the problem with AI in data work. It does not know what it does not know. And it sounds confident either way.&lt;/p&gt;
&lt;h2&gt;Where it bites me&lt;/h2&gt;
&lt;p&gt;Two places, every week.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ETL.&lt;/strong&gt; Joins on the wrong key. Filters that look right but silently drop edge cases. Type conversions that cast a timestamp to a date and kill the timezone. Window functions that use &lt;code&gt;ORDER BY&lt;/code&gt; in a way that is &quot;correct&quot; but not what you meant. The code runs. The numbers come back. They look fine. They are not fine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Modeling.&lt;/strong&gt; Feature engineering with leakage. &quot;Cross-validation&quot; written in a way that sees future data. A regression that drops rows with nulls without telling you. A metric computed on a different grain than the label. The model trains. The AUC is high. It will fail on real data and you will find out in production.&lt;/p&gt;
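&lt;p&gt;The cross-validation version of this is worth seeing concretely. A sketch, assuming time-ordered data with made-up column names: shuffled folds let the model train on the future, while a time-ordered split cannot.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pandas as pd
from sklearn.model_selection import KFold, TimeSeriesSplit

df = pd.DataFrame({&quot;day&quot;: pd.date_range(&quot;2025-01-01&quot;, periods=100)})

# Leaky: shuffled folds mix future rows into the training set
leaky = KFold(n_splits=5, shuffle=True, random_state=0)

# Honest for time-ordered data: training indices always precede test indices
honest = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in honest.split(df):
    assert train_idx.max() &amp;lt; test_idx.min()
&lt;/code&gt;&lt;/pre&gt;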
&lt;p&gt;The common thread: &lt;strong&gt;plausibility is not correctness&lt;/strong&gt;. AI has gotten very good at plausible. A wrong answer that reads well is more dangerous than a wrong answer that reads badly, because you skip the review.&lt;/p&gt;
&lt;h2&gt;Your review is the whole job now&lt;/h2&gt;
&lt;p&gt;Here is what I keep telling junior people on my team:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The AI is only as good as you are at reviewing it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A senior person with AI ships 10x faster. A junior person with AI ships 10x the bugs. Same tool, opposite outcome. The difference is not the prompt. It is the eye.&lt;/p&gt;
&lt;p&gt;When I review AI output I do the same things I would do for any untrusted intern:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Read every line.&lt;/strong&gt; Not skim. Read. Say what it does in my own words.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Run it on a known case.&lt;/strong&gt; One I already know the answer to. Does it match?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Check row counts before and after.&lt;/strong&gt; Surprisingly often, the bug is right there.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Look for silent failures.&lt;/strong&gt; Missing keys, dropped nulls, implicit casts. These never throw. They just eat your data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ask &quot;what did this assume&quot;.&lt;/strong&gt; AI makes assumptions. It will not tell you. You have to dig them out.&lt;/li&gt;
&lt;/ol&gt;
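&lt;p&gt;Steps 3 and 4 lend themselves to a tiny helper I can drop after any transformation. A sketch with invented names, not a framework:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import pandas as pd

def audit_step(before, after, label):
    &quot;&quot;&quot;Print the checks I never skip: row deltas and newly appearing nulls.&quot;&quot;&quot;
    dropped = len(before) - len(after)
    print(f&quot;{label}: {len(before)} rows in, {len(after)} out, {dropped} lost&quot;)
    # Silent failures tend to surface as nulls that were not there before
    for col in after.columns.intersection(before.columns):
        new_nulls = after[col].isna().sum() - before[col].isna().sum()
        if new_nulls &amp;gt; 0:
            print(f&quot;  {col}: {new_nulls} new nulls&quot;)
&lt;/code&gt;&lt;/pre&gt;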
&lt;p&gt;This takes time. It is the job.&lt;/p&gt;
&lt;h2&gt;The new skill&lt;/h2&gt;
&lt;p&gt;People talk about prompt engineering like it is the thing to learn. It is not. The thing to learn is faster error detection. How quickly can you spot the lie in a page of generated code? How quickly can you feel the wrongness in a plausible-looking number?&lt;/p&gt;
&lt;p&gt;That skill is not new. It is the old skill — reading code, knowing your data, having taste. The AI just raises the stakes. You now produce more code per day, which means more chances to be wrong per day.&lt;/p&gt;
&lt;p&gt;I still use AI every day. I would not go back. But I have stopped treating its output as an answer. I treat it as a first draft from an intern who is very fast, very articulate, and sometimes completely wrong. My job is the red pen.&lt;/p&gt;
&lt;p&gt;If you are not running that red pen, the AI is not helping you. It is just helping you be confidently incorrect, faster.&lt;/p&gt;
</content:encoded><category>AI</category><category>Data Engineering</category><category>Review</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>Beyond A/B Testing: How I Analyze Real Product Impact Without an Experiment</title><link>https://litananalytics.com/blog/beyond-ab-testing/</link><guid isPermaLink="true">https://litananalytics.com/blog/beyond-ab-testing/</guid><description>When you cannot randomize, causal inference still gets you closer to the truth. Here is my playbook.</description><pubDate>Sat, 26 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In an ideal world, every product change would be tested with a randomized controlled trial. Reality is messier. Sometimes you cannot randomize — the feature already shipped to everyone, legal will not let you hold out a group, or the sample is too small for the test to have any power.&lt;/p&gt;
&lt;p&gt;When I am in that situation, I go to quasi-experimental methods. Here is my playbook.&lt;/p&gt;
&lt;h2&gt;The problem with observational data&lt;/h2&gt;
&lt;p&gt;The hard problem is &lt;strong&gt;confounding&lt;/strong&gt;. People who use a new feature are not the same as people who do not. Maybe they are more engaged, more tech-savvy, or signed up during a specific campaign. Comparing adopters and non-adopters tells you almost nothing about the feature itself.&lt;/p&gt;
&lt;p&gt;I see teams make this mistake a lot. They announce a &quot;win&quot; that is really just selection bias. Nobody checks.&lt;/p&gt;
&lt;h2&gt;Method 1 — Difference-in-Differences&lt;/h2&gt;
&lt;p&gt;When a feature rolls out at different times to different groups — say, by region or by platform — DiD can work well. The key assumption is &lt;strong&gt;parallel trends&lt;/strong&gt;: treated and control groups would have moved together without the treatment.&lt;/p&gt;
&lt;p&gt;Always plot the pre-trends. If they are not parallel, DiD will mislead you; even small violations of parallel trends produce meaningful bias in the naive comparison.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Simplified two-way fixed-effects DiD
import statsmodels.formula.api as smf

# Only the interaction enters directly; the treated and post main effects
# are absorbed by the group and time fixed effects
model = smf.ols(&quot;outcome ~ treated:post + C(group) + C(time)&quot;, data=df)
results = model.fit(cov_type=&quot;cluster&quot;, cov_kwds={&quot;groups&quot;: df[&quot;group&quot;]})
# The treated:post coefficient is your treatment effect
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Method 2 — Synthetic control&lt;/h2&gt;
&lt;p&gt;When you have one treated unit and many possible controls, synthetic control builds a weighted mix of the controls that best matches the treated unit before the intervention. Then you measure the gap after.&lt;/p&gt;
&lt;p&gt;I use this a lot for geo experiments. It handles the noise of real markets better than a simple comparison.&lt;/p&gt;
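&lt;p&gt;A rough sketch of the core mechanics on simulated data. The non-negative-least-squares shortcut and the normalization step are my simplification, not a full implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Rows are pre-period weeks, columns are candidate control markets
controls_pre = rng.normal(100, 10, size=(20, 8))
true_weights = np.array([0.5, 0.3, 0.2, 0, 0, 0, 0, 0])
treated_pre = controls_pre @ true_weights

# Non-negative weights that best reproduce the treated unit before launch
weights, _ = nnls(controls_pre, treated_pre)
weights = weights / weights.sum()  # crude normalization onto the simplex

# After launch, the gap between actual and synthetic is the estimated effect
controls_post = rng.normal(100, 10, size=(10, 8))
synthetic_post = controls_post @ weights
&lt;/code&gt;&lt;/pre&gt;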
&lt;h2&gt;Method 3 — Regression discontinuity&lt;/h2&gt;
&lt;p&gt;If treatment is assigned by a threshold (e.g., users above some engagement score get the feature), RD uses the jump at the threshold. Users just above and just below the cutoff are nearly identical, so you get local randomization for free.&lt;/p&gt;
&lt;p&gt;This one is underused. A lot of product features have natural cutoffs that nobody exploits.&lt;/p&gt;
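&lt;p&gt;A minimal sketch of the estimator on simulated data: fit a line on each side of the cutoff within a bandwidth, and the difference in intercepts at the cutoff is the effect. The bandwidth and the simulated jump are invented:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
score = rng.uniform(0, 100, 2000)  # the running variable, e.g. engagement
cutoff, bandwidth, true_jump = 50.0, 10.0, 5.0
outcome = 0.1 * score + true_jump * (score &amp;gt;= cutoff) + rng.normal(0, 1, 2000)

def value_at_cutoff(mask):
    # Center the score so the intercept is the fitted value at the cutoff
    x = sm.add_constant(score[mask] - cutoff)
    return sm.OLS(outcome[mask], x).fit().params[0]

near = np.abs(score - cutoff) &amp;lt;= bandwidth
above = np.logical_and(near, score &amp;gt;= cutoff)
below = np.logical_and(near, score &amp;lt; cutoff)
jump = value_at_cutoff(above) - value_at_cutoff(below)  # estimates true_jump
&lt;/code&gt;&lt;/pre&gt;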
&lt;h2&gt;When to use what&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Best when&lt;/th&gt;
&lt;th&gt;Key assumption&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;DiD&lt;/td&gt;
&lt;td&gt;Staggered rollout&lt;/td&gt;
&lt;td&gt;Parallel trends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Synthetic Control&lt;/td&gt;
&lt;td&gt;Single treated unit&lt;/td&gt;
&lt;td&gt;Pre-treatment fit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RD&lt;/td&gt;
&lt;td&gt;Assignment cutoff&lt;/td&gt;
&lt;td&gt;Continuity at cutoff&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;h2&gt;Bottom line&lt;/h2&gt;
&lt;p&gt;No method is perfect. The best you can do is combine several and see if they tell a consistent story. When they disagree, that is usually where the interesting learning happens — it means you are missing something about the underlying dynamics, and the disagreement is a hint about what.&lt;/p&gt;
</content:encoded><category>Causal Inference</category><category>Experimentation</category><category>Product Analytics</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>From Insights to Actions</title><link>https://litananalytics.com/blog/from-insights-to-actions/</link><guid isPermaLink="true">https://litananalytics.com/blog/from-insights-to-actions/</guid><description>The hardest part of analytics is not finding the insight. It is getting someone to act on it.</description><pubDate>Wed, 21 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The hardest part of analytics is not finding the insight. It is getting someone to act on it.&lt;/p&gt;
&lt;h2&gt;The dead-slide problem&lt;/h2&gt;
&lt;p&gt;Every analyst has been here. You spend two weeks on a careful analysis. You present. Everyone nods. Nothing changes.&lt;/p&gt;
&lt;p&gt;I have watched this happen many times. Including with my own work. Here is why it keeps happening.&lt;/p&gt;
&lt;h3&gt;The insight has no context&lt;/h3&gt;
&lt;p&gt;&quot;Retention dropped 5%.&quot; OK. So what?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is that normal for this season?&lt;/li&gt;
&lt;li&gt;How does it compare to competitors?&lt;/li&gt;
&lt;li&gt;What is it in dollars?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Data does not speak. You have to frame it in something the audience actually cares about — revenue, cost, a strategic goal.&lt;/p&gt;
&lt;h3&gt;There is no owner&lt;/h3&gt;
&lt;p&gt;&quot;Someone should look into this.&quot; This is where insights go to die.&lt;/p&gt;
&lt;p&gt;If there is no specific person accountable for doing something, nothing happens. End every analysis with a recommendation and a named owner. Even when it feels awkward to point at someone, do it.&lt;/p&gt;
&lt;h3&gt;Too many findings&lt;/h3&gt;
&lt;p&gt;Twenty bullet points dilute everything. Execs have limited bandwidth. They cannot act on all of it.&lt;/p&gt;
&lt;p&gt;Pick the 1–3 that matter most. Lead with those. Put the rest in the appendix if you have to keep it.&lt;/p&gt;
&lt;h2&gt;ARIA&lt;/h2&gt;
&lt;p&gt;I use a short checklist before I present any insight.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;ctionable — can someone actually do something about it?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R&lt;/strong&gt;elevant — does it connect to a current priority?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I&lt;/strong&gt;mpactful — is the number big enough to matter?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;ssigned — is there an owner and a timeline?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If it fails any of these, it is not ready.&lt;/p&gt;
&lt;h2&gt;Playing the long game&lt;/h2&gt;
&lt;p&gt;Driving action is not about one great analysis. It is about credibility that accumulates:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start small. Prove value on quick wins before tackling big questions.&lt;/li&gt;
&lt;li&gt;Follow up. Did the recommendation actually get implemented? What happened?&lt;/li&gt;
&lt;li&gt;Be honest about uncertainty. People trust analysts who admit what they do not know.&lt;/li&gt;
&lt;li&gt;Learn the business. The best analysts understand operations, not just data.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The goal is not to be right. The goal is to make the business better. Sometimes that means accepting a trade-off: a &quot;good enough&quot; analysis that actually changes behavior is worth more than a perfect one that gets ignored.&lt;/p&gt;
</content:encoded><category>Strategy</category><category>Communication</category><category>Leadership</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>A Few Thoughts on DMA Tests</title><link>https://litananalytics.com/blog/dma-tests/</link><guid isPermaLink="true">https://litananalytics.com/blog/dma-tests/</guid><description>Geo tests are one of the most powerful tools for measuring marketing lift. They are also easy to get wrong. Some notes from the field.</description><pubDate>Wed, 21 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;DMA tests — experiments at the Designated Market Area level — are one of the most powerful tools for measuring marketing incrementality. They are also easy to screw up. Here are the lessons I paid for.&lt;/p&gt;
&lt;h2&gt;Why DMA tests&lt;/h2&gt;
&lt;p&gt;User-level attribution is narrow. It misses a lot: word of mouth, cross-device conversions, brand effects that show up weeks later.&lt;/p&gt;
&lt;p&gt;DMA tests catch more of the full picture. You treat whole markets, not cookies, so you pick up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Brand lift that takes weeks to show&lt;/li&gt;
&lt;li&gt;Cross-device conversions&lt;/li&gt;
&lt;li&gt;Social and word-of-mouth spillover&lt;/li&gt;
&lt;li&gt;The full funnel from awareness to purchase&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is why I keep going back to them, even though they are a pain.&lt;/p&gt;
&lt;h2&gt;Common ways to mess this up&lt;/h2&gt;
&lt;h3&gt;Not enough markets&lt;/h3&gt;
&lt;p&gt;Power in geo tests comes from the number of geo units, not total users. With 20 DMAs you need a huge effect to see anything.&lt;/p&gt;
&lt;p&gt;Rule of thumb: at least 50 markets. Ideally 100. I know this limits the channels you can test. Underpowered tests are worse than no test — you get a non-answer you then defend.&lt;/p&gt;
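&lt;p&gt;That rule of thumb falls out of a power simulation you can run in a minute. A rough sketch; the 10% lift and the market-level noise are invented numbers:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from scipy import stats

def geo_power(n_markets, lift=0.10, noise_sd=0.15, sims=2000, seed=0):
    &quot;&quot;&quot;Share of simulated tests where a t-test detects the lift at alpha 0.05.&quot;&quot;&quot;
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        control = rng.normal(1.0, noise_sd, n_markets // 2)
        treated = rng.normal(1.0 + lift, noise_sd, n_markets // 2)
        hits += stats.ttest_ind(treated, control).pvalue &amp;lt; 0.05
    return hits / sims

# Even a 10% lift is a long shot with 20 markets; 100 markets changes the game
print(geo_power(20), geo_power(100))
&lt;/code&gt;&lt;/pre&gt;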
&lt;h3&gt;Spillover between markets&lt;/h3&gt;
&lt;p&gt;People travel. Digital ads ignore borders. If your &quot;holdout&quot; market is leaking treatment, your estimate shrinks toward zero.&lt;/p&gt;
&lt;p&gt;What I do: buffer zones, exclude border DMAs, or model the spillover explicitly. I have learned to be paranoid.&lt;/p&gt;
&lt;h3&gt;Ignoring seasonality&lt;/h3&gt;
&lt;p&gt;A November test tells you little about February. Marketing effects move with the calendar.&lt;/p&gt;
&lt;p&gt;Run long enough to cover a cycle. Or use methods that handle seasonality directly.&lt;/p&gt;
&lt;h2&gt;Why I prefer synthetic control&lt;/h2&gt;
&lt;p&gt;Simple treatment-vs-control comparisons work sometimes. But markets are heterogeneous and trends differ by region, so the comparison is usually noisy.&lt;/p&gt;
&lt;p&gt;Synthetic control builds a weighted mix of untreated markets that best matches the treated market &lt;em&gt;before&lt;/em&gt; the campaign. Then you measure the gap after.&lt;/p&gt;
&lt;p&gt;It handles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Different baselines across markets&lt;/li&gt;
&lt;li&gt;Region-specific trends&lt;/li&gt;
&lt;li&gt;Noisy outcomes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I get much cleaner reads from synthetic control than from naive DiD for geo tests.&lt;/p&gt;
&lt;h2&gt;What I would tell a new analyst&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;More markets beats more users per market.&lt;/li&gt;
&lt;li&gt;Design for spillover from day one. Not as an afterthought.&lt;/li&gt;
&lt;li&gt;Pre-register the analysis. Stops you from p-hacking yourself later.&lt;/li&gt;
&lt;li&gt;Try synthetic control before DiD.&lt;/li&gt;
&lt;li&gt;Run a power calculation. Every time.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you skip the power calc, you are setting yourself up for an inconclusive result you will then have to explain.&lt;/p&gt;
</content:encoded><category>Geo Experiments</category><category>Marketing Analytics</category><category>Incrementality</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>Is AI Ready to Replace Marketing Data Analysts?</title><link>https://litananalytics.com/blog/ai-replace-analysts/</link><guid isPermaLink="true">https://litananalytics.com/blog/ai-replace-analysts/</guid><description>Every week another headline says yes. I use AI every day. Here is my honest take.</description><pubDate>Tue, 20 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Every week there is a new headline about AI replacing analysts. I use AI tools every day. My own output has maybe doubled because of them. So here is my honest take.&lt;/p&gt;
&lt;h2&gt;What AI is actually good at&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Summarizing data.&lt;/strong&gt; Give it a CSV, it can tell you the trends, the outliers, the patterns — faster than I can eyeball them. I use this a lot in the first hour of any new project.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Writing code.&lt;/strong&gt; SQL, Python, viz — 30–40% faster for routine stuff, sometimes more. I have stopped writing boilerplate by hand. It was never the interesting part anyway.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Drafting writing.&lt;/strong&gt; Methodology docs, reports, slide outlines. If you can prompt well, you get a decent first draft. You still need to edit. But the blank page problem is gone.&lt;/p&gt;
&lt;h2&gt;What AI is still bad at&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Causal reasoning.&lt;/strong&gt; It will find correlations all day. Ask it &lt;em&gt;why&lt;/em&gt; a metric moved, or &lt;em&gt;what would have happened if we had not launched the feature&lt;/em&gt; — it gives you something that sounds right. Usually it is wrong in a subtle way you have to know the domain to see.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Business context.&lt;/strong&gt; AI does not know your CEO just pivoted last month. It does not know marketing and sales are not talking to each other. It does not know the reason the old metric was replaced was political, not analytical. Context matters a lot, and there is no prompt that fixes this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problems that are actually new.&lt;/strong&gt; If the problem is something that was not common in the training data, AI guesses. And the guess sounds confident. This is the category I worry about most.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stakeholder work.&lt;/strong&gt; Convincing a skeptical VP. Pushing back on a bad decision. Reading the room. Still human work. Probably always will be.&lt;/p&gt;
&lt;h2&gt;What I actually think&lt;/h2&gt;
&lt;p&gt;AI will not replace analysts. But analysts who use AI will replace those who do not. That line is a cliché now. It is also true.&lt;/p&gt;
&lt;p&gt;The winning split:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI does the speed, the scale, the routine, the first draft&lt;/li&gt;
&lt;li&gt;You do the judgment, the strategy, the relationships, the novel problems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The job title stays the same. The actual work shifts upward — better questions, better designs, more time on change management and less on pulling data.&lt;/p&gt;
&lt;h2&gt;My advice&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Learn the tools deeply. Not just prompts — know what they can and cannot do.&lt;/li&gt;
&lt;li&gt;Double down on judgment. That is the thing that does not get replaced.&lt;/li&gt;
&lt;li&gt;Build relationships. Your value is going to come more and more from influence, not output.&lt;/li&gt;
&lt;li&gt;Stay curious. The ground is moving fast.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I have bet my career on this being right. Still betting.&lt;/p&gt;
</content:encoded><category>AI</category><category>Marketing Analytics</category><category>Career</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>Between MTA and LTA</title><link>https://litananalytics.com/blog/mta-vs-lta/</link><guid isPermaLink="true">https://litananalytics.com/blog/mta-vs-lta/</guid><description>The attribution debate never ends. Here is how I think about it.</description><pubDate>Mon, 19 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The attribution debate never ends. Here is how I think about it.&lt;/p&gt;
&lt;h2&gt;The core problem&lt;/h2&gt;
&lt;p&gt;A user sees a Facebook ad, clicks a Google ad, reads a blog post, then converts. Who gets credit?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Last-Touch (LTA).&lt;/strong&gt; Google gets 100%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;First-Touch (FTA).&lt;/strong&gt; Facebook gets 100%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-Touch (MTA).&lt;/strong&gt; Some weighted split.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of these is &quot;correct.&quot; They are all models with different assumptions.&lt;/p&gt;
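&lt;p&gt;The mechanics are trivial once you pick a model; the argument is never about the code. A toy sketch with a made-up journey:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;journey = [&quot;display_ad&quot;, &quot;search_ad&quot;, &quot;email&quot;]

def attribute(touchpoints, model=&quot;last&quot;):
    &quot;&quot;&quot;Split one conversion of credit across touchpoints.&quot;&quot;&quot;
    n = len(touchpoints)
    if model == &quot;last&quot;:     # LTA
        credit = [0.0] * (n - 1) + [1.0]
    elif model == &quot;first&quot;:  # FTA
        credit = [1.0] + [0.0] * (n - 1)
    else:                   # linear MTA: equal weight per touch
        credit = [1.0 / n] * n
    return dict(zip(touchpoints, credit))

# Same journey, three different stories
for m in (&quot;last&quot;, &quot;first&quot;, &quot;linear&quot;):
    print(m, attribute(journey, m))
&lt;/code&gt;&lt;/pre&gt;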
&lt;h2&gt;Why LTA sticks around&lt;/h2&gt;
&lt;p&gt;LTA gets called simplistic a lot. It has real advantages too:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Simple. Easy to explain, easy to build.&lt;/li&gt;
&lt;li&gt;Actionable. Clear signal for what to optimize.&lt;/li&gt;
&lt;li&gt;Conservative. Tends to favor the lower-funnel, high-intent channels.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For businesses with short consideration cycles, LTA is usually enough. I have seen teams make this too complicated when LTA would have done the job.&lt;/p&gt;
&lt;h2&gt;When MTA helps&lt;/h2&gt;
&lt;p&gt;MTA is more useful when:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The consideration cycle is long (B2B, big-ticket consumer).&lt;/li&gt;
&lt;li&gt;You invest heavily upper-funnel (brand, content).&lt;/li&gt;
&lt;li&gt;Customer journeys are complex — multiple devices, channels, touchpoints.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;MTA tries to credit each touchpoint based on its contribution to the conversion.&lt;/p&gt;
&lt;h2&gt;The dirty secret&lt;/h2&gt;
&lt;p&gt;Here is the thing: &lt;strong&gt;MTA does not measure incrementality.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;MTA answers: &quot;what touchpoints appeared in the journeys that converted?&quot;&lt;/p&gt;
&lt;p&gt;It does not answer: &quot;what would have happened without those touchpoints?&quot;&lt;/p&gt;
&lt;p&gt;A user who was going to convert anyway still has touchpoints. MTA credits them regardless. That is why I always want to calibrate MTA with an actual experiment.&lt;/p&gt;
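&lt;p&gt;The calibration itself is simple arithmetic. A toy sketch with invented numbers:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Platform attribution credits the channel with 1,000 conversions
attributed = 1000

# A holdout experiment measures only 600 truly incremental conversions
incremental = 600

# Scale day-to-day attributed numbers by the experimentally measured ratio
calibration_factor = incremental / attributed  # 0.6
adjusted = attributed * calibration_factor
&lt;/code&gt;&lt;/pre&gt;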
&lt;h2&gt;A better framework&lt;/h2&gt;
&lt;p&gt;Instead of arguing about attribution models, ask:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What decision am I making?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reallocating budget across channels → you need incrementality testing.&lt;/li&gt;
&lt;li&gt;Optimizing within a channel → platform attribution is probably fine.&lt;/li&gt;
&lt;li&gt;Understanding journeys → path analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How accurate do I need it?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Directionally correct → LTA is often enough.&lt;/li&gt;
&lt;li&gt;Precise → you need experiments.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What can I actually test?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Run holdout experiments where you can.&lt;/li&gt;
&lt;li&gt;Use geo tests for channels that cannot be randomized at the user level.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Where I land&lt;/h2&gt;
&lt;p&gt;Attribution models are good for monitoring and directional optimization. For budget allocation, they should be calibrated against experimental evidence.&lt;/p&gt;
&lt;p&gt;The best measurement stack I have seen is a combination:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attribution for day-to-day monitoring&lt;/li&gt;
&lt;li&gt;Incrementality tests for calibration&lt;/li&gt;
&lt;li&gt;MMM for overall budget allocation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No single method gives you truth. Triangulating across methods gets you closer.&lt;/p&gt;
&lt;h2&gt;See it for yourself&lt;/h2&gt;
&lt;p&gt;Run the same customer journey through a handful of attribution models and you get radically different stories from identical data. That is exactly why debating models without first defining the decision is a waste of breath.&lt;/p&gt;
</content:encoded><category>Attribution</category><category>Marketing Analytics</category><category>Measurement</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>Quick Walkthrough on Anomaly Detection</title><link>https://litananalytics.com/blog/anomaly-detection/</link><guid isPermaLink="true">https://litananalytics.com/blog/anomaly-detection/</guid><description>People think anomaly detection is fancy. Most of the time, it is not.</description><pubDate>Mon, 19 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Anomaly detection sounds fancy. In my work, 80% of it is simple stuff. A z-score, a rolling window, done.&lt;/p&gt;
&lt;h2&gt;The problem&lt;/h2&gt;
&lt;p&gt;You have a metric. Revenue, signups, errors, whatever. You want to know when something is off. Not just &quot;number went up&quot; — number went up &lt;em&gt;more than it should have&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;Method 1 — Simple z-score&lt;/h2&gt;
&lt;p&gt;Compare today to the historical mean and std:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;z_score = (current_value - historical_mean) / historical_std
is_anomaly = abs(z_score) &amp;gt; 3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Use it for: stable metrics, no strong seasonality.&lt;/p&gt;
&lt;p&gt;Does not handle: trends, weekly patterns.&lt;/p&gt;
&lt;h2&gt;Method 2 — Rolling stats&lt;/h2&gt;
&lt;p&gt;Let the baseline move with the data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rolling_mean = df[&quot;metric&quot;].rolling(window=28).mean()
rolling_std = df[&quot;metric&quot;].rolling(window=28).std()
z_score = (df[&quot;metric&quot;] - rolling_mean) / rolling_std
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Use it for: metrics with gradual drift.&lt;/p&gt;
&lt;p&gt;Still bad at: seasonality.&lt;/p&gt;
&lt;h2&gt;Method 3 — Decompose the series&lt;/h2&gt;
&lt;p&gt;Split into trend, seasonal, residual. Flag the residual.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(df[&quot;metric&quot;], period=7)
# Decomposition leaves NaNs at the edges, so drop them before scoring
residuals = result.resid.dropna()
z_score = (residuals - residuals.mean()) / residuals.std()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Use it for: clear weekly or monthly patterns.&lt;/p&gt;
&lt;h2&gt;Method 4 — Prophet&lt;/h2&gt;
&lt;p&gt;Prophet gives you intervals out of the box:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from prophet import Prophet

# Prophet expects a dataframe with columns ds (date) and y (metric)
model = Prophet(interval_width=0.99)
model.fit(df)
forecast = model.predict(df)
# Anything outside yhat_upper/yhat_lower is a candidate anomaly
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Use it for: messy seasonality, holidays.&lt;/p&gt;
&lt;h2&gt;What actually matters&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Start simple.&lt;/strong&gt; Z-scores solve more problems than people want to admit. I start there every time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tune the threshold.&lt;/strong&gt; 3-sigma is a default, not a rule. If you hate false positives, move it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handle missing data.&lt;/strong&gt; Detectors hate gaps. Fill them on purpose, not by accident.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alert fatigue is real.&lt;/strong&gt; Better to miss a few than cry wolf every day. I learned this the hard way.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Check before you alert.&lt;/strong&gt; Most &quot;anomalies&quot; have boring explanations.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;The meta-problem&lt;/h2&gt;
&lt;p&gt;The hard part of anomaly detection is not the math. It is knowing what counts as an anomaly you would actually act on.&lt;/p&gt;
&lt;p&gt;Work backwards: &quot;If we saw this alert, what would we do?&quot; If the answer is nothing, do not build the alert.&lt;/p&gt;
</content:encoded><category>Anomaly Detection</category><category>Data Science</category><category>Monitoring</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>A Brief Intro to Building MMM with Agencies</title><link>https://litananalytics.com/blog/mmm-agencies/</link><guid isPermaLink="true">https://litananalytics.com/blog/mmm-agencies/</guid><description>How to work with an agency on Marketing Mix Modeling without losing control of the methodology.</description><pubDate>Sun, 18 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Building MMM with an external agency is a common path. The relationship needs care though, or the model ends up serving someone else&apos;s needs. I have been on both sides of this. Here is what I learned.&lt;/p&gt;
&lt;h2&gt;Why use an agency&lt;/h2&gt;
&lt;p&gt;Agencies bring:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Specialized skills and tools&lt;/li&gt;
&lt;li&gt;Benchmarks across clients&lt;/li&gt;
&lt;li&gt;Extra bandwidth when your team is stretched&lt;/li&gt;
&lt;li&gt;Political cover for findings nobody wants to hear&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those are real reasons.&lt;/p&gt;
&lt;h2&gt;Red flags&lt;/h2&gt;
&lt;h3&gt;Black box methodology&lt;/h3&gt;
&lt;p&gt;If the agency cannot explain the model spec, the coefficients, and how they validated it — walk away. You need to understand what is driving the results.&lt;/p&gt;
&lt;p&gt;I once inherited a model nobody internally understood. When the numbers stopped matching reality, we had nothing to debug with. It was painful and slow.&lt;/p&gt;
&lt;h3&gt;Too much precision&lt;/h3&gt;
&lt;p&gt;MMM has wide confidence intervals. Always. If an agency tells you &quot;TV drove exactly $4.2M in incremental revenue&quot; with no uncertainty bounds, they are selling. Be skeptical.&lt;/p&gt;
&lt;h3&gt;Optimizing for the CMO&apos;s mood&lt;/h3&gt;
&lt;p&gt;Some agencies tune the model until it shows what the client wants to see. Insist on pre-registered specs. If they push back hard, that tells you something.&lt;/p&gt;
&lt;h2&gt;What to ask for&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Full model documentation — functional forms, priors, variable transforms&lt;/li&gt;
&lt;li&gt;Holdout validation — out-of-sample accuracy, not just in-sample fit&lt;/li&gt;
&lt;li&gt;Sensitivity analysis across reasonable parameter ranges&lt;/li&gt;
&lt;li&gt;Raw output files, not just a polished deck&lt;/li&gt;
&lt;li&gt;Access to the code, if they will give it&lt;/li&gt;
&lt;/ol&gt;
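&lt;p&gt;Item 2 is the one I verify myself rather than take on faith. A rough sketch of the check, with an invented &lt;code&gt;fit_fn&lt;/code&gt; interface and column names; the point is that the split is by time, never random:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np

def holdout_mape(df, fit_fn, holdout_weeks=12):
    &quot;&quot;&quot;Out-of-sample error on recent weeks the model never saw.&quot;&quot;&quot;
    df = df.sort_values(&quot;week&quot;)
    train, test = df.iloc[:-holdout_weeks], df.iloc[-holdout_weeks:]
    model = fit_fn(train)  # fit on the past only
    preds = np.asarray(model.predict(test))
    actual = test[&quot;revenue&quot;].to_numpy()
    return np.mean(np.abs(preds - actual) / actual)

# I ask for this number before the in-sample R-squared slide
&lt;/code&gt;&lt;/pre&gt;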
&lt;h2&gt;How to make the relationship work&lt;/h2&gt;
&lt;p&gt;The best agency engagements feel like partnerships. You share business context. You explain why a particular result would be surprising. You push back when something does not make sense.&lt;/p&gt;
&lt;p&gt;Your job is not to accept deliverables. It is to understand the model well enough to defend or critique what it implies.&lt;/p&gt;
&lt;p&gt;The agencies I have had the best results with welcomed the pushback. The ones who got defensive about it — I did not work with them again.&lt;/p&gt;
</content:encoded><category>MMM</category><category>Marketing Analytics</category><category>Agencies</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>Experience Reporting to VPs and Above</title><link>https://litananalytics.com/blog/reporting-to-vps/</link><guid isPermaLink="true">https://litananalytics.com/blog/reporting-to-vps/</guid><description>Early in my career I made every mistake possible presenting to execs. Here is what I learned the painful way.</description><pubDate>Sun, 18 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Early in my career I made every mistake possible when presenting to execs. Here is what I learned the painful way.&lt;/p&gt;
&lt;h2&gt;Mistake 1 — Starting with the method&lt;/h2&gt;
&lt;p&gt;I used to open with data sources, model spec, validation approach. Carefully.&lt;/p&gt;
&lt;p&gt;Execs do not care. At least not yet. They want to know: &lt;strong&gt;what should we do, and why?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fix: lead with the recommendation. Method goes in the appendix or in the follow-up question.&lt;/p&gt;
&lt;h2&gt;Mistake 2 — Too much precision&lt;/h2&gt;
&lt;p&gt;&quot;Revenue will increase by 3.7%, 95% CI [2.1%, 5.3%].&quot;&lt;/p&gt;
&lt;p&gt;What they hear: &quot;probably 2–5%.&quot;&lt;/p&gt;
&lt;p&gt;Fix: round. Use ranges when you are unsure. Focus on the decision, not the decimals.&lt;/p&gt;
&lt;h2&gt;Mistake 3 — Answering too literally&lt;/h2&gt;
&lt;p&gt;VP: &quot;What caused the Q3 drop?&quot;&lt;/p&gt;
&lt;p&gt;Me, before: 20 minutes on every contributing factor.&lt;/p&gt;
&lt;p&gt;Me, now: &quot;Three things. Seasonality, a product bug we fixed in October, and more competition. The product bug was 60% of it.&quot;&lt;/p&gt;
&lt;p&gt;Fix: give the headline first. Offer to go deeper if they want it.&lt;/p&gt;
&lt;h2&gt;Mistake 4 — Not knowing the business context&lt;/h2&gt;
&lt;p&gt;I once presented a &quot;technically perfect&quot; analysis that recommended something the CEO had publicly rejected two months earlier. That meeting was awkward.&lt;/p&gt;
&lt;p&gt;Fix: before any exec presentation, talk to their team. Know what they are focused on, worried about, and have already decided.&lt;/p&gt;
&lt;h2&gt;What actually works&lt;/h2&gt;
&lt;h3&gt;Pyramid principle&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Situation.&lt;/strong&gt; One sentence of context.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complication.&lt;/strong&gt; The problem or question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resolution.&lt;/strong&gt; Your recommendation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Support.&lt;/strong&gt; 2–3 key points.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Everything else is backup material.&lt;/p&gt;
&lt;h3&gt;Anticipate their questions&lt;/h3&gt;
&lt;p&gt;Execs will ask:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&quot;What is the business impact?&quot;&lt;/li&gt;
&lt;li&gt;&quot;How sure are you?&quot;&lt;/li&gt;
&lt;li&gt;&quot;What could go wrong?&quot;&lt;/li&gt;
&lt;li&gt;&quot;What do we do next?&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Have sharp answers ready. If you do not, you are not ready to present.&lt;/p&gt;
&lt;h3&gt;Make the decision easy&lt;/h3&gt;
&lt;p&gt;Do not present five options and ask them to choose. Present a recommendation with clear reasoning. They will push back if they disagree. That is fine.&lt;/p&gt;
&lt;h3&gt;Respect the time&lt;/h3&gt;
&lt;p&gt;If you get 30 minutes, plan 15 for presentation and 15 for discussion. Execs want to engage, not just listen.&lt;/p&gt;
&lt;h2&gt;The real lesson&lt;/h2&gt;
&lt;p&gt;Technical excellence is the entry ticket. At senior levels, your value comes from:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Asking the right questions.&lt;/li&gt;
&lt;li&gt;Communicating clearly.&lt;/li&gt;
&lt;li&gt;Building trust over time.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The best analysts I know spend as much time on communication as on analysis. I wish someone had told me this earlier.&lt;/p&gt;
</content:encoded><category>Communication</category><category>Leadership</category><category>Career</category><author>li.tan83033@gmail.com (Li Tan)</author></item><item><title>Is Marketing an Art or Science?</title><link>https://litananalytics.com/blog/marketing-art-science/</link><guid isPermaLink="true">https://litananalytics.com/blog/marketing-art-science/</guid><description>I get asked this at every conference. Here is my honest answer.</description><pubDate>Thu, 15 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&quot;Is marketing an art or a science?&quot; I get asked this at every conference.&lt;/p&gt;
&lt;p&gt;My answer: both. The ratio depends on what you are measuring.&lt;/p&gt;
&lt;h2&gt;The art side&lt;/h2&gt;
&lt;p&gt;Some parts of marketing are irreducibly creative:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Brand building.&lt;/strong&gt; You cannot A/B test your way to emotional resonance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storytelling.&lt;/strong&gt; A good narrative comes from human insight, not an algorithm.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cultural timing.&lt;/strong&gt; Knowing when to say something is instinct.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Creative execution.&lt;/strong&gt; The gap between &quot;fine&quot; and &quot;iconic.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The best marketers I know have instincts data cannot replicate.&lt;/p&gt;
&lt;h2&gt;The science side&lt;/h2&gt;
&lt;p&gt;Other parts are measurable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Media buying.&lt;/strong&gt; Bids, targeting, frequency caps.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CRO.&lt;/strong&gt; Test, measure, iterate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Channel allocation.&lt;/strong&gt; Compare ROI across touchpoints.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pricing.&lt;/strong&gt; Run an experiment, get elasticity.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here, rigor beats intuition. Discipline wins.&lt;/p&gt;
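&lt;p&gt;The pricing bullet really is that mechanical. A sketch with made-up test numbers, using arc (midpoint) elasticity, one common convention:&lt;/p&gt;

```python
# Hypothetical A/B price test results (illustrative numbers only).
control_price, control_units = 20.00, 1000   # arm A
variant_price, variant_units = 22.00, 920    # arm B

# Arc (midpoint) elasticity: percent change in quantity over percent
# change in price, each measured against the midpoint of the two arms.
pct_dq = (variant_units - control_units) / ((variant_units + control_units) / 2)
pct_dp = (variant_price - control_price) / ((variant_price + control_price) / 2)
elasticity = pct_dq / pct_dp

print(f"arc elasticity: {elasticity:.2f}")
```

&lt;p&gt;An elasticity near -0.9 means demand here is mildly inelastic. The &quot;science&quot; part is getting a clean test, not the arithmetic.&lt;/p&gt;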
&lt;h2&gt;The messy middle&lt;/h2&gt;
&lt;p&gt;Most marketing lives in between:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A great creative idea &lt;em&gt;and&lt;/em&gt; smart media placement.&lt;/li&gt;
&lt;li&gt;Emotional brand messaging &lt;em&gt;and&lt;/em&gt; a conversion-optimized landing page.&lt;/li&gt;
&lt;li&gt;Intuitive campaign timing &lt;em&gt;and&lt;/em&gt; a rigorous post-mortem.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The best marketing orgs do not choose. They integrate.&lt;/p&gt;
&lt;h2&gt;What this means for measurement&lt;/h2&gt;
&lt;p&gt;Here is the important part: &lt;strong&gt;you cannot measure art the same way you measure science&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you try to prove brand ROI with the same rigor as performance marketing, you get:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Underinvestment in brand, because it is harder to measure&lt;/li&gt;
&lt;li&gt;Overfitting on short-term metrics&lt;/li&gt;
&lt;li&gt;Creative that is &quot;data-driven&quot; and forgettable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Instead:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Measure what you can measure rigorously&lt;/li&gt;
&lt;li&gt;Use proxies and judgment for what you cannot&lt;/li&gt;
&lt;li&gt;Do not let measurability decide strategy — this is where a lot of data teams go wrong&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;How I think about the mix&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Activity&lt;/th&gt;
&lt;th&gt;Art/science mix&lt;/th&gt;
&lt;th&gt;How to measure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;Brand campaigns&lt;/td&gt;
&lt;td&gt;70/30&lt;/td&gt;
&lt;td&gt;Brand tracking, long-term lift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content marketing&lt;/td&gt;
&lt;td&gt;60/40&lt;/td&gt;
&lt;td&gt;Engagement, assisted conversions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance marketing&lt;/td&gt;
&lt;td&gt;30/70&lt;/td&gt;
&lt;td&gt;Direct attribution, ROAS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing, offers&lt;/td&gt;
&lt;td&gt;20/80&lt;/td&gt;
&lt;td&gt;A/B testing, elasticity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;h2&gt;Bottom line&lt;/h2&gt;
&lt;p&gt;Art vs. science is a false choice. Good marketing needs both.&lt;/p&gt;
&lt;p&gt;The analyst&apos;s job is not to kill art with data. It is to help the org make better bets — by being clear about what we know, what we do not, and what we are guessing.&lt;/p&gt;
&lt;p&gt;Sometimes that means rigorous experiments. Sometimes it means trusting a talented marketer&apos;s instincts. Knowing when to do which is the actual skill.&lt;/p&gt;
</content:encoded><category>Marketing</category><category>Philosophy</category><category>Strategy</category><author>li.tan83033@gmail.com (Li Tan)</author></item></channel></rss>