Influence of Experimenters on Results Less Strong than Expected

For more than 10 years now, scientists have been discussing the so-called reproducibility crisis: often, scientific findings cannot be reproduced at a later time and/or in other laboratories, although the studies are carried out under highly standardized conditions. Standardization includes, for example, using genetically identical animals, keeping the animals in identically equipped cages, and always carrying out the experiments in the same way. To uncover sources of poor reproducibility, researchers usually try to identify potential confounding factors in the experimental conditions. The confounding factor usually considered Number One is the experimenter, in other words, the person conducting the experiment. A team headed by behavioral biologists Dr. Vanessa von Kortzfleisch and Professor Helene Richter from the University of Münster (Germany) has now studied precisely this factor in behavioral experiments on mice carried out simultaneously at three different locations. Their study has been published in the journal PLOS Biology.

To the researchers’ surprise, the influence of different experimenters on the test results was not as pronounced as earlier studies had suggested. Instead, the researchers detected other confounding factors: the factor “laboratory” played a much greater role than the experimenter. Most importantly, however, the largest share of the variation was due to unexplained differences between the individual mice. More precisely, this proportion of “unexplained variance” in the data was between 41 and 72 percent. “This is especially surprising,” says lead author von Kortzfleisch, “when you consider that the animals were tested under highly standardized conditions within the same testing cohort, in other words, by the same experimenter in the same lab and under exactly the same conditions.”
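To make the idea of variance proportions concrete, here is a minimal sketch in Python that splits the total variation of a test score into a share between laboratories, a share between experimenters within a laboratory, and an unexplained remainder between individual animals. It is an illustration only and not the authors' analysis: the function name `variance_shares`, the lab and experimenter labels, and all scores are made up, and a simple sum-of-squares decomposition stands in for whatever statistical model the study actually used.

```python
from statistics import mean

def variance_shares(data):
    """Split the total sum of squares of nested data into three shares.

    data maps lab -> experimenter -> list of scores (one per animal).
    Hypothetical helper for illustration; not taken from the study.
    """
    all_scores = [s for lab in data.values() for exp in lab.values() for s in exp]
    grand_mean = mean(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

    ss_lab = ss_exp = ss_resid = 0.0
    for lab in data.values():
        lab_scores = [s for exp in lab.values() for s in exp]
        lab_mean = mean(lab_scores)
        # Variation of lab means around the grand mean.
        ss_lab += len(lab_scores) * (lab_mean - grand_mean) ** 2
        for scores in lab.values():
            exp_mean = mean(scores)
            # Variation of experimenter means around their lab mean.
            ss_exp += len(scores) * (exp_mean - lab_mean) ** 2
            # Variation of individual animals around their experimenter mean.
            ss_resid += sum((s - exp_mean) ** 2 for s in scores)

    return {
        "laboratory": ss_lab / ss_total,
        "experimenter": ss_exp / ss_total,
        "unexplained": ss_resid / ss_total,
    }

# Made-up scores: two labs, two experimenters each, four mice per experimenter.
toy = {
    "lab_A": {"exp_1": [10, 14, 12, 16], "exp_2": [11, 15, 13, 17]},
    "lab_B": {"exp_3": [14, 18, 16, 20], "exp_4": [15, 19, 17, 21]},
}
shares = variance_shares(toy)
for factor, share in shares.items():
    print(f"{factor}: {share:.0%}")
```

In this toy data the experimenters within a lab barely differ from one another, so almost all variation falls to the laboratory and to the individual animals. This mirrors the pattern described above only qualitatively.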

The results certainly do not mean that the experimenter is not a decisive factor. What they do indicate, though, is that the differing test conditions in the labs, despite standardization, have a considerably greater influence on the outcome than the experimenter. These conditions might include, for example, small differences in ambient sounds or smells. “But what our results show above all is that biological variation plays a key role in animal research, even when the animals come from inbred lines. In future, we will need better strategies for integrating this variation in a controlled way into the experimental design,” says von Kortzfleisch.

Twelve experimenters at three locations

The background: as an alternative to the dogma of strict standardization, there are suggestions for integrating variation systematically into the experimental design to improve reproducibility. To investigate whether involving multiple experimenters in a single study can increase external validity, and hence improve the reproducibility of the outcome, this latest study was conducted by twelve different experimenters in Münster, Osnabrück and Bern, all carrying out the same behavioral test battery with mice of two inbred strains. Such phenotyping experiments are widely used in biomedical research to study the effects of different genotypes on the animals’ behavior and thereby to draw conclusions about the genetic basis of certain human diseases. For example, in a so-called open-field test, researchers check how anxious a mouse is when exploring a new environment.

Specifically, the team of researchers investigated whether a strictly standardized experimental design, in which all the animals are tested by one experimenter, differs in terms of reproducibility from an experimental design in which the animals are tested by multiple experimenters. The team compared the two designs to see which of them yielded the more consistent results across the three laboratories. In addition, the researchers investigated which other influencing factors might explain the variation in the data. One finding was that some of the results could not be reproduced across the three locations, regardless of whether the experiment was conducted by just one or by several experimenters.

Besides the team from the Department of Behavioural Biology at Münster, other researchers involved in the study are from the Universities of Osnabrück and Bern (Switzerland), the University of Veterinary Medicine in Vienna (Austria), and the AstraZeneca company in Cambridge (UK).