Alexei Chernobrovov, Consultant on Analytics and Data Monetization

AB test by Danish scientists on the effect of wearing masks to protect against Coronavirus (SARS-CoV-2)

I would like to share what happens when not everyone understands how to run an AB test and how to interpret its results.

Danish (not British) researchers conducted an experiment to assess whether recommending the use of a surgical mask outside the home reduces the risk of being infected with SARS-CoV-2 (Coronavirus) [1]. A total of 3,030 participants were in the mask group and 2,994 were in the no-mask group. Of these, 4,862 completed the study.

Test results:

  • 42 of 2,333 (1.8%) of those who wore a mask became infected
  • 53 of 2,529 (2.1%) of those who didn't wear a mask became infected


What conclusion did the scientists draw? That there was no statistically significant difference at the p=0.05 level. Which is, strictly speaking, true. The study was immediately reposted by everyone eager to say: "You don't have to wear a mask. There is no difference." The Meduza portal [2] discussed it in detail.
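
The "no difference at p=0.05" result can be checked with a few lines of code. The sketch below is not the method used in the study [1]; it assumes a plain pooled two-proportion z-test on the counts listed above, and the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (an assumed method, not the study's).

    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts from the test results above: mask group, then no-mask group.
z, p_value = two_proportion_z_test(42, 2333, 53, 2529)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
```

Under these assumptions the p-value comes out far above 0.05, which matches the "no significant difference" reading. It says nothing yet about whether the sample was large enough to see a difference of this size.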


Now let's deal with two questions related to this AB test:

  1. How many participants would the experiment need, assuming the rates of 1.8% and 2.1% stay the same, for the difference to be statistically significant?


Approximately 5 times more! In other words, about 11,900 participants in each group who reach the end of the test. Or, considering that not everyone reaches the end of the test, about 15,000 people enrolled in each group.
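
The article does not say how this figure was calculated, so the following is only one way to arrive at a number of that size. It assumes equal groups, a pooled normal approximation, and a one-sided test at the 0.05 level, and it asks at what group size the observed difference would just reach the significance threshold. The function name and the `one_sided` switch are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_significance(p1, p2, alpha=0.05, one_sided=True):
    """Group size at which the difference p2 - p1 just reaches significance.

    Solves z_crit = d / sqrt(2 * p * (1 - p) / n) for n, where p is the
    pooled rate of two equal groups and d = p2 - p1.
    """
    tail = alpha if one_sided else alpha / 2
    z_crit = NormalDist().inv_cdf(1 - tail)
    pooled = (p1 + p2) / 2
    return ceil(z_crit ** 2 * 2 * pooled * (1 - pooled) / (p1 - p2) ** 2)


p_mask, p_no_mask = 42 / 2333, 53 / 2529

n_one_sided = n_per_group_for_significance(p_mask, p_no_mask, one_sided=True)
n_two_sided = n_per_group_for_significance(p_mask, p_no_mask, one_sided=False)
print(n_one_sided, n_two_sided)
```

With the one-sided assumption the result lands near 11,800 per group, close to the figure quoted above. A two-sided test needs noticeably more. Either way, this is the size at which the observed rates would barely cross the threshold, not a design with a comfortable chance of detecting the effect.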


  2. What is the smallest difference that could be detected as statistically significant (at the p=0.05 level) with this sample size?


In Denmark, at the beginning of the experiment, wearing masks was uncommon and not one of the recommended public health measures [1]. Therefore, it is logical to assume that the second (no-mask) group gives a good estimate of the baseline infection rate (2.1%). Consequently, the rate in the first group would have had to be no higher than 1.45% (instead of 1.8%) for the difference to be statistically significant. But if that were true, it would mean that wearing a mask reduces the probability of contracting coronavirus by almost 30% on average. Such an assumption seems pretty fantastic, in my opinion.
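
The threshold can be approximated numerically. The sketch below fixes the no-mask rate at the observed 53 of 2,529, keeps the group sizes from the study, and searches for the highest mask-group rate that a pooled z-test would still call significant. The one-sided test and the bisection search are assumptions of this sketch, not something stated in the article.

```python
from math import sqrt
from statistics import NormalDist


def max_significant_rate(p_control, n_control, n_treat, alpha=0.05, one_sided=True):
    """Highest treatment-group rate that is still significantly below p_control."""
    tail = alpha if one_sided else alpha / 2
    z_crit = NormalDist().inv_cdf(1 - tail)

    def z_stat(p_treat):
        pooled = (p_treat * n_treat + p_control * n_control) / (n_treat + n_control)
        se = sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_control))
        return (p_control - p_treat) / se

    # z_stat falls as p_treat rises towards p_control, so bisection works.
    low, high = 0.0, p_control
    for _ in range(60):
        mid = (low + high) / 2
        if z_stat(mid) > z_crit:
            low = mid   # still significant: the threshold is higher
        else:
            high = mid  # no longer significant: the threshold is lower
    return low


p_control = 53 / 2529
p_min = max_significant_rate(p_control, n_control=2529, n_treat=2333)
reduction = (p_control - p_min) / p_control
print(f"threshold rate = {p_min:.2%}, relative reduction = {reduction:.0%}")
```

Under these assumptions the threshold comes out just under 1.5%, in the same range as the 1.45% mentioned above, and corresponds to a relative reduction of roughly 30%.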


What conclusions can be drawn? Unfortunately, not everyone knows how to plan an AB test well yet. And running an AB test on small samples is almost guaranteed to result in "No difference at the p=0.05 statistical significance level." That, in turn, often means the sample size was insufficient and no decision can be made, not that there is no difference between the two groups.


To carry out an AB test, you need at least an assumption about the possible effect, and on that basis you calculate the sample size for the experiment. Otherwise, the AB test will likely provide no new information.
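
As an illustration of such planning, here is a minimal sketch of the textbook sample-size formula for comparing two proportions. The 80% power, the two-sided 0.05 level, and the function name are assumptions chosen for the example; they do not come from the study.

```python
from math import ceil
from statistics import NormalDist


def planned_n_per_group(p_baseline, p_expected, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of two proportions.

    Uses the normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1*(1-p1) + p2*(1-p2)) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_expected * (1 - p_expected)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_baseline - p_expected) ** 2)


# Assumed effect: the infection rate drops from 2.1% to 1.8%.
n_planned = planned_n_per_group(0.021, 0.018)
print(n_planned)
```

With these inputs the formula asks for more than 30,000 participants per group, which shows how much the assumed effect size drives the required sample.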