How to A/B Test Your Email (the Right Way)

Why Most A/B Tests Tell You Nothing

1 variable per test, always

95% confidence before you call a winner

100s of conversions per variant, not per send

A/B testing replaces opinions with data. That is the whole point.

But most brands run tests that prove nothing. Two variables changed at once. A sample of 200. A "winner" called on a coin-flip gap.

That is not testing. That is guessing with extra steps.

Done right, testing compounds. Each real win becomes your new control, and the next test builds on top of it.

That is how a program gets better every month instead of drifting on gut feel.

Test One Variable at a Time

This is the whole game.

Change the subject line and the hero image in the same test, and version B wins. Now you have no idea which change did it.

You cannot ship the win because you do not know what the win was.

Here is what this looks like in practice.

[Figure: Example A/B test setup. Test one variable at a time so you know what moved the needle.]

One change. One clean read. Then the winner becomes your control and you test the next thing against it.
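One way to guard against accidental multi-variable tests is to diff the two versions before sending. The sketch below assumes each email version is described by a plain dictionary. The field names (`subject_line`, `hero_image`, `cta_copy`) and the sample values are hypothetical and not tied to any email platform.

```python
def changed_fields(control: dict, variant: dict) -> list[str]:
    """Return the names of fields whose values differ between two versions."""
    all_fields = set(control) | set(variant)
    return sorted(f for f in all_fields if control.get(f) != variant.get(f))


def is_clean_test(control: dict, variant: dict) -> bool:
    """A clean test changes exactly one field."""
    return len(changed_fields(control, variant)) == 1


# Hypothetical email versions, for illustration only.
control = {
    "subject_line": "Your cart is waiting",
    "hero_image": "hero_lifestyle.jpg",
    "cta_copy": "Shop now",
}
# Only the subject line changes: a clean test.
variant_clean = {**control, "subject_line": "Still thinking it over?"}
# Subject line and hero image both change: no clean read.
variant_muddy = {**variant_clean, "hero_image": "hero_product.jpg"}

print(changed_fields(control, variant_clean))  # ['subject_line']
print(is_clean_test(control, variant_muddy))   # False
```

If `is_clean_test` returns False, split the idea into two sequential tests rather than sending it as is.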

What to Test, and in What Order

Test the things that move the biggest numbers first.

A subject line change lifts opens across your whole send. A CTA tweak only touches the people who already opened.

Start high in the funnel and work down.

What to test	Expected impact
Subject line and preview text	High. Decides your open rate on the entire send.
Send time and day	Medium to high. Shifts opens and early clicks.
Offer or angle (discount vs value)	High. Changes who converts, not just who clicks.
Hero section and first image	Medium. Drives whether openers keep scrolling.
CTA copy and button placement	Low to medium. Refines conversion on engaged readers.

Work top to bottom. There is no point optimizing button color if half your list never opens the email.

The rule: match the metric to the variable, or you will crown a false winner.

Pick the metric that matches the variable

Judge a subject line test on open rate. Judge an offer or CTA test on clicks and revenue per recipient. Reading the wrong metric gives you a wrong winner.
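As a rough illustration of how the wrong metric crowns the wrong winner, the sketch below maps each test variable to the metric named above and compares two variants on that metric only. The `VariantStats` type, the variable names, and the sample numbers are all made up for the example. For brevity it reads revenue per recipient for offer and CTA tests and leaves out the click comparison, and it does not check significance.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    recipients: int
    opens: int
    revenue: float

    @property
    def open_rate(self) -> float:
        return self.opens / self.recipients

    @property
    def revenue_per_recipient(self) -> float:
        return self.revenue / self.recipients


# Primary metric per test variable (hypothetical keys, following the rule above).
PRIMARY_METRIC = {
    "subject_line": "open_rate",
    "offer": "revenue_per_recipient",
    "cta": "revenue_per_recipient",
}


def leader(variable: str, a: VariantStats, b: VariantStats) -> str:
    """Return which variant leads on the metric that matches the variable."""
    metric = PRIMARY_METRIC[variable]
    value_a, value_b = getattr(a, metric), getattr(b, metric)
    if value_a == value_b:
        return "tie"
    return "A" if value_a > value_b else "B"


# Made-up offer test: B gets more opens, A earns more per recipient.
a = VariantStats(recipients=10_000, opens=3_000, revenue=5_200.0)
b = VariantStats(recipients=10_000, opens=3_400, revenue=4_100.0)

print(leader("offer", a, b))         # A  (the right read for an offer test)
print(leader("subject_line", a, b))  # B  (what you would crown by reading opens)
```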

Size the Test So the Result Is Real

A gap between two variants means nothing until you have volume behind it.

A 0.4 point difference on 200 sends is noise. The same gap on 20,000 sends might be real.

The fix: aim for a few hundred conversions per variant, not a few hundred sends. Wait for your platform to report roughly 95% confidence before you call it.

If the test never reaches significance in a reasonable window, that is your answer too. Any effect is too small to detect at your volume, so the change did not matter enough to ship.
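To see why the same gap can be noise at one volume and real at another, here is a minimal sketch of a two-proportion z-test, one common way to put a number on it. Whether your email platform computes its confidence figure this way is an assumption. The 2.0% and 2.4% conversion rates are also made up, and the send counts are treated as per variant.

```python
import math


def two_sided_p_value(rate_a: float, rate_b: float, n_a: int, n_b: int) -> float:
    """Two-proportion z-test using a pooled standard error.

    Returns the two-sided p-value. A p-value below 0.05 corresponds to
    the 95% confidence threshold.
    """
    pooled = (rate_a * n_a + rate_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = abs(rate_a - rate_b) / std_err
    return math.erfc(z / math.sqrt(2))


# Same 0.4 point gap (illustrative 2.0% vs 2.4%), two sample sizes.
p_small = two_sided_p_value(0.020, 0.024, n_a=200, n_b=200)
p_large = two_sided_p_value(0.020, 0.024, n_a=20_000, n_b=20_000)

print(f"200 sends per variant:    p = {p_small:.3f}")
print(f"20,000 sends per variant: p = {p_large:.4f}")
```

With these assumed rates, the small test lands far above 0.05 and the large one falls below it. With a higher base rate, such as a 30% open rate, the same 0.4 point gap would need even more volume to clear the bar.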

Flows vs Campaigns

Campaigns are one-shot. You split the send, read the result, and the test is over. Good for subject lines, offers, and send times where you have list volume in a single blast.

Flows are always-on, so they test differently and better. You set version A against version B inside the automation, let real traffic split over weeks, and the sample builds itself.

Because a flow runs forever, even a small win keeps paying out every day.

Campaign tests

One send, one read
Best for subject line, offer, send time
Needs list volume in a single blast

Flow tests

Runs continuously, sample builds over time
Best for hero, CTA, email order, timing
Small wins compound every day

Document Every Win So It Compounds

A test you do not write down is a test you will run again in six months.

Keep a simple log: what you tested, the two versions, the numbers, the confidence level, and what you shipped.

That log does two things. It stops you from re-testing settled questions, and it turns scattered wins into a playbook.

When a new team member asks why your emails send at 8am, the answer is a line in the doc, not a shrug.
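The log can be as simple as a CSV file. The sketch below is one possible layout, assuming one row per test. The `ExperimentLogEntry` type, its field names, and the sample row are hypothetical, so adapt them to whatever your team already tracks.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExperimentLogEntry:
    date: str          # when the test was called
    variable: str      # the one thing that changed
    version_a: str     # control
    version_b: str     # challenger
    metric: str        # metric the test was judged on
    result_a: float
    result_b: float
    confidence: float  # as reported by your platform
    shipped: str       # what you rolled out


def write_log(entries: list[ExperimentLogEntry], out) -> None:
    """Write entries as CSV, with a header row, to any file-like object."""
    columns = [f.name for f in fields(ExperimentLogEntry)]
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Made-up entry, for illustration only.
entry = ExperimentLogEntry(
    date="2024-03-04", variable="send_time",
    version_a="8am", version_b="6pm", metric="open_rate",
    result_a=0.412, result_b=0.371, confidence=0.96, shipped="A (8am)",
)

buffer = io.StringIO()  # swap for open("test_log.csv", "a", newline="") on disk
write_log([entry], buffer)
log_text = buffer.getvalue()
print(log_text)
```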

When Not to Test

Low volume kills tests

If a split will not produce a few hundred conversions per variant in a reasonable window, do not run it. You will chase noise and ship random winners.

Small lists should not A/B test every send. You will never reach significance, so you end up making changes on luck.

Instead, apply proven best practice, grow the list, and save formal testing for the moments where you have the volume to get a clean read.
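A quick back-of-the-envelope check can tell you whether a test is worth running before you set it up. The sketch below assumes a target of 300 conversions per variant (one reading of "a few hundred") and an 8-week limit for "a reasonable window". Both numbers, the function names, and the sample volumes are assumptions to adjust for your own program.

```python
def weeks_to_target(
    weekly_sends_per_variant: int,
    conversion_rate: float,
    target_conversions: int = 300,  # assumed reading of "a few hundred"
) -> float:
    """Estimate how many weeks one variant needs to reach the conversion target."""
    weekly_conversions = weekly_sends_per_variant * conversion_rate
    if weekly_conversions <= 0:
        return float("inf")
    return target_conversions / weekly_conversions


def worth_testing(
    weekly_sends_per_variant: int,
    conversion_rate: float,
    max_weeks: float = 8.0,  # assumed "reasonable window"
) -> bool:
    """True if the test should reach the target inside the window."""
    return weeks_to_target(weekly_sends_per_variant, conversion_rate) <= max_weeks


# Illustrative volumes at an assumed 2% conversion rate.
small_list = worth_testing(weekly_sends_per_variant=1_500, conversion_rate=0.02)
large_list = worth_testing(weekly_sends_per_variant=10_000, conversion_rate=0.02)

print(small_list)  # False: roughly 10 weeks to reach 300 conversions
print(large_list)  # True: under 2 weeks
```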

Common Mistakes

Changing two things at once. You cannot tell which one won, so you cannot ship the win.
Calling it too early. Wait for 95% confidence and enough conversions, not a first-hour lead.
Measuring the wrong metric. Judge subject lines on opens, offers on revenue per recipient.
Testing on tiny lists. No volume means no significance means no real answer.
Not documenting results. Undocumented wins get re-tested and quietly lost.
Testing trivia first. Optimize opens and offers before you touch button color.

Get Expert Help

Our team runs disciplined tests across dozens of DTC brands, so we know which variables move revenue and which just waste a send.

If you want a program built on real data instead of guesswork, we can help.

See our pricing | Apply to work with us

