How to A/B Test Your Email (the Right Way)
Most brands A/B test wrong: too many variables, too small a sample, no real conclusion. Learn the disciplined way ZHS tests so wins actually compound.
Why Most A/B Tests Tell You Nothing
A/B testing replaces opinions with data. That is the whole point.
But most brands run tests that prove nothing. Two variables changed at once. A sample of 200. A "winner" called on a coin-flip gap.
That is not testing. That is guessing with extra steps.
Done right, testing compounds. Each real win becomes your new control, and the next test builds on top of it.
That is how a program gets better every month instead of drifting on gut feel.
Test One Variable at a Time
This is the whole game.
Change the subject line and the hero image in the same test, and version B wins. Now you have no idea which change did it.
You cannot ship the win because you do not know what the win was.
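Here is what this looks like in practice: version A keeps your current subject line, version B gets a new one, and the body, offer, and send time stay identical.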
One change. One clean read. Then the winner becomes your control and you test the next thing against it.
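If you keep your variants in a script or spreadsheet export, a quick check can catch a two-variable test before it goes out. This is a minimal Python sketch, not a feature of any email platform. The field names (`subject_line`, `hero_image`, `cta_copy`) and the sample values are made up for illustration.

```python
def changed_fields(control: dict, variant: dict) -> list[str]:
    """Return the names of the fields that differ between two email versions."""
    all_fields = sorted(set(control) | set(variant))
    return [f for f in all_fields if control.get(f) != variant.get(f)]


def is_clean_test(control: dict, variant: dict) -> bool:
    """A clean test changes exactly one field."""
    return len(changed_fields(control, variant)) == 1


# Hypothetical example data: field names and values are illustrative only.
control = {
    "subject_line": "Your cart is waiting",
    "hero_image": "lifestyle.jpg",
    "cta_copy": "Shop now",
}
variant_clean = {**control, "subject_line": "Still thinking it over?"}
variant_muddy = {**variant_clean, "hero_image": "product.jpg"}

print(changed_fields(control, variant_clean))  # one field changed
print(changed_fields(control, variant_muddy))  # two fields changed
print(is_clean_test(control, variant_muddy))
```

If `is_clean_test` comes back false, split the idea into two tests and run them one after the other.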
What to Test, and in What Order
Test the things that move the biggest numbers first.
A subject line change lifts opens across your whole send. A CTA tweak only touches the people who already opened.
Start high in the funnel and work down.
| What to test | Expected impact |
|---|---|
| Subject line and preview text | High. Decides your open rate on the entire send. |
| Send time and day | Medium to high. Shifts opens and early clicks. |
| Offer or angle (discount vs value) | High. Changes who converts, not just who clicks. |
| Hero section and first image | Medium. Drives whether openers keep scrolling. |
| CTA copy and button placement | Low to medium. Refines conversion on engaged readers. |
Work top to bottom. There is no point optimizing button color if half your list never opens the email.
The rule: match the metric to the variable, or you will crown a false winner.
Judge a subject line test on open rate. Judge an offer or CTA test on clicks and revenue per recipient.
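One way to hold yourself to that rule is to pick the deciding metric before the send, not after. The sketch below is an assumption-heavy illustration: the mapping simplifies the rule above to one metric per variable, the stat names (`delivered`, `opens`, `clicks`, `revenue`) are hypothetical, and each rate is divided by delivered emails, which may not match how your platform defines it.

```python
# Assumed mapping of test variable -> deciding metric (a simplification).
PRIMARY_METRIC = {
    "subject_line": "open_rate",
    "offer": "revenue_per_recipient",
    "cta": "click_rate",
}


def metric_value(metric: str, stats: dict) -> float:
    """Compute one metric from raw counts for a single variant."""
    delivered = stats["delivered"]
    if metric == "open_rate":
        return stats["opens"] / delivered
    if metric == "click_rate":
        return stats["clicks"] / delivered
    if metric == "revenue_per_recipient":
        return stats["revenue"] / delivered
    raise ValueError(f"Unknown metric: {metric}")


def deciding_numbers(variable: str, stats_a: dict, stats_b: dict):
    """Return (metric name, value for A, value for B) for the tested variable."""
    metric = PRIMARY_METRIC[variable]
    return metric, metric_value(metric, stats_a), metric_value(metric, stats_b)


# Hypothetical results from an offer test. The numbers are invented.
stats_a = {"delivered": 10_000, "opens": 2_500, "clicks": 300, "revenue": 4_000.0}
stats_b = {"delivered": 10_000, "opens": 2_400, "clicks": 310, "revenue": 4_600.0}

print(deciding_numbers("offer", stats_a, stats_b))
print(deciding_numbers("subject_line", stats_a, stats_b))
```

In this invented data, version B opens slightly worse but earns more per recipient. Read on opens, A looks like the winner. Read on the metric that matches an offer test, B does.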
Size the Test So the Result Is Real
A gap between two variants means nothing until you have volume behind it.
A 0.4 percentage point difference on 200 sends is noise. The same gap on 20,000 sends might be real.
The fix: aim for a few hundred conversions per variant, not a few hundred sends. Wait for your platform to report roughly 95% confidence before you call it.
If the test never reaches significance, that is your answer too. The change did not matter enough to ship.
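Your platform usually does this math for you, but it helps to see why volume matters. The sketch below uses a standard two-proportion z-test, which may not be the exact method your platform uses. The function name and all the counts are hypothetical. Both examples have the same 0.4 point gap (2.0% vs 2.4%), just at different volumes.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p_value: float, confidence: float = 0.95) -> bool:
    """True when the result clears the chosen confidence level."""
    return p_value < 1 - confidence


# Hypothetical counts: same 2.0% vs 2.4% gap, very different volume.
p_small = two_proportion_p_value(20, 1_000, 24, 1_000)
p_large = two_proportion_p_value(400, 20_000, 480, 20_000)

print(f"small test: p = {p_small:.3f}, significant: {is_significant(p_small)}")
print(f"large test: p = {p_large:.3f}, significant: {is_significant(p_large)}")
```

With these made-up numbers, the small test falls well short of 95% confidence and the large one clears it. Same gap, different answer, and the only thing that changed was volume.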
Flows vs Campaigns
Campaigns are one-shot. You split the send, read the result, and the test is over. Good for subject lines, offers, and send times where you have list volume in a single blast.
Flows are always-on, so they test differently, and often better. You set version A against version B inside the automation, let real traffic split over weeks, and the sample builds itself.
Because a flow runs forever, even a small win keeps paying out every day.
Campaigns:
- One send, one read
- Best for subject line, offer, send time
- Needs list volume in a single blast

Flows:
- Runs continuously, sample builds over time
- Best for hero, CTA, email order, timing
- Small wins compound every day
Document Every Win So It Compounds
A test you do not write down is a test you will run again in six months.
Keep a simple log: what you tested, the two versions, the numbers, the confidence level, and what you shipped.
That log does two things. It stops you from re-testing settled questions, and it turns scattered wins into a playbook.
When a new team member asks why your emails send at 8am, the answer is a line in the doc, not a shrug.
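A spreadsheet works fine for this. If you would rather keep the log as a CSV file, here is one possible shape, sketched in Python. The column names follow the list above, but the exact names, the `ExperimentLogEntry` class, and the sample row are all assumptions for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExperimentLogEntry:
    """One row in the test log. Field names are illustrative."""
    date: str
    variable: str      # what you tested
    version_a: str     # the control
    version_b: str     # the challenger
    metric: str        # the metric the test was judged on
    result_a: float
    result_b: float
    confidence: float  # as reported by your platform
    shipped: str       # what you rolled out


def write_log(entries, out) -> None:
    """Write log entries as CSV to any file-like object."""
    columns = [f.name for f in fields(ExperimentLogEntry)]
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Hypothetical entry: every value here is invented.
entry = ExperimentLogEntry(
    date="2024-03-04", variable="send_time",
    version_a="8am", version_b="6pm", metric="open_rate",
    result_a=0.31, result_b=0.27, confidence=0.96, shipped="A (8am)",
)

buffer = io.StringIO()
write_log([entry], buffer)
print(buffer.getvalue())
```

Swap `io.StringIO()` for an open file handle to write the log to disk instead of printing it.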
When Not to Test
If a split will not produce a few hundred conversions per variant in a reasonable window, do not run it. You will chase noise and ship random winners.
Small lists should not A/B test every send. You will never reach significance, so you end up making changes on luck.
Instead, apply proven best practice, grow the list, and save formal testing for the moments where you have the volume to get a clean read.
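A rough back-of-envelope check can tell you whether a test is worth running at all. This Python sketch assumes "a few hundred conversions" means 300 per variant and "a reasonable window" means 8 weeks. Both numbers are placeholders, so set your own. The function names and inputs are hypothetical.

```python
import math


def weeks_to_clean_read(recipients_per_week: int, conversion_rate: float,
                        target_conversions: int = 300, variants: int = 2) -> int:
    """Estimate how many weeks until each variant reaches the conversion target."""
    if recipients_per_week <= 0 or conversion_rate <= 0:
        raise ValueError("Need positive weekly volume and conversion rate")
    conversions_per_variant_per_week = recipients_per_week / variants * conversion_rate
    return math.ceil(target_conversions / conversions_per_variant_per_week)


def worth_testing(recipients_per_week: int, conversion_rate: float,
                  max_weeks: int = 8) -> bool:
    """True when the test can reach the target inside the allowed window."""
    return weeks_to_clean_read(recipients_per_week, conversion_rate) <= max_weeks


# Hypothetical lists, both converting at 2%.
small_list_weeks = weeks_to_clean_read(2_000, 0.02)
large_list_weeks = weeks_to_clean_read(40_000, 0.02)

print(f"small list: about {small_list_weeks} weeks, test it: {worth_testing(2_000, 0.02)}")
print(f"large list: about {large_list_weeks} weeks, test it: {worth_testing(40_000, 0.02)}")
```

If the estimate runs to months, skip the formal test and put the effort into growing the list.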
Common Mistakes
- Changing two things at once. You cannot tell which one won, so you cannot ship the win.
- Calling it too early. Wait for 95% confidence and enough conversions, not a first-hour lead.
- Measuring the wrong metric. Judge subject lines on opens, offers on revenue per recipient.
- Testing on tiny lists. No volume means no significance means no real answer.
- Not documenting results. Undocumented wins get re-tested and quietly lost.
- Testing trivia first. Optimize opens and offers before you touch button color.
Get Expert Help
Our team runs disciplined tests across dozens of DTC brands, so we know which variables move revenue and which just waste a send.
If you want a program built on real data instead of guesswork, we can help.