7 Replication/Reproducibility

One component of what makes the field of psychology in particular so susceptible to controversy, distrust and skepticism is the failure of many of its studies to be successfully replicated and reproduced. Before going further, however, it is important to note the distinction between replication and reproduction, terms which are often used interchangeably. Chris Drummond distinguishes the two by explaining that the principal difference is that replicability requires everything to be the same, while reproducibility allows conditions to change (1). To elaborate, the goal of replicating an experiment is to "obtain an identical result when [the] experiment is performed under precisely identical conditions", while reproducibility "refers to a phenomenon that can be predicted to recur even when experimental conditions may vary to some degree" (2). For some scientists, reproducibility is the more desirable of the two, because if an experiment's results recur even when the conditions of the experiment do not remain the same, the original finding appears much more legitimate. In other fields, however, replication prevails, for when results are argued to be incorrect, a successful replication of an experiment can prove such an argument misguided. Nonetheless, both reproduction and replication have proven immensely important in science, not only for aiding new discoveries but for making original ones appear more valid.

The focus of this chapter will be primarily on replication, which faces the bulk of the scrutiny. Replication is often seen as the standard by which scientific findings are assessed; in practice, however, scientists rarely replicate studies in order to confirm their validity. When replication attempts fail, they create doubt not only about the particular experiment involved, but also skepticism about the field of psychology as a whole. There are two main points to take away from this chapter. The first is that replication and reproduction do not occur as often as they should; by replicating and reproducing more experiments, results can be regarded with greater confidence (see the sketch below). The second is that a failed replication does not always indicate that the original experiment was flawed and its results false. It is important not to assume that most discoveries are inaccurate and, by extension, that psychology is an untrustworthy science. If the scientific community takes more steps toward replicating studies, then hopefully the misconception that failed replications indicate false results can be erased.
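To see why a successful replication increases confidence, consider a short illustrative calculation, sketched here in Python. The prior probability, statistical power, and false-positive rate below are assumptions chosen purely for illustration, not values taken from any study cited in this chapter:

```python
# Illustrative Bayesian update (all numbers are assumptions): how much
# should one successful replication raise our confidence that a
# published finding reflects a real effect?
prior_true = 0.5  # assumed prior probability the finding is real
power = 0.8       # assumed chance a real effect replicates (p < .05)
alpha = 0.05      # chance a false effect "replicates" by luck alone

posterior = (prior_true * power) / (
    prior_true * power + (1 - prior_true) * alpha
)
print(f"Confidence after one successful replication: {posterior:.0%}")
# With these assumptions, confidence jumps from 50% to roughly 94%.
```

Under these assumed numbers, a single successful replication raises the probability that a finding is real from an even chance to roughly 94%, which is precisely why replication is such a powerful check on published results.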

One issue that arises in the field of psychology is that replications of studies are rarely performed. Scientists are reluctant to replicate studies for many reasons, such as being more interested in making their own discoveries than in confirming old ones, and wanting to make a name for themselves (which, for the most part, does not come from double-checking the work of another scientist). The authors of the research study Estimating the Reproducibility of Psychological Science summed up this feeling, writing, "Reproducibility is not well understood because the incentives for individual scientists prioritize novelty over replication" (3). Replicating studies is often seen as tedious and boring, and while scientists largely agree that replication is very important because it is considered "the scientific gold standard" (3), they are unwilling to devote the time to it because it is not efficient. Furthermore, replicating experiments can be very time-consuming and costly. Mina Bissell writes, "it is sometimes much easier to not replicate than to replicate studies, because the techniques and reagents are sophisticated, time-consuming and difficult to master. In the past ten years, every paper published on which I have been a senior author has taken between four and six years to complete, and at times much longer. People in my labs often need months – if not a year – to replicate some of the experiments we have done" (4).

Although much of the public believes that scientists should constantly be replicating studies, doing so is much easier said than done. Despite the immense time, effort and cost that goes into replication, however, it is important that it be done, regardless of whether the attempt succeeds or fails, for both outcomes are useful. In the scientific community, the general consensus seems to be that replication should happen more often; yet scientists are reluctant to be the ones to spend their time replicating.

In November 2011, The Reproducibility Project began, with the goal of "[obtaining] an initial estimate of the reproducibility of psychological science" (5). Primary data collection was completed in December 2014, and a summary of the results was published in August 2015 in the journal Science. The Reproducibility Project consisted of replications of 100 experimental and correlational studies published in three psychology journals, with over 270 researchers contributing to the project (5). The project found that only 36% of the replicated experiments yielded the same results as the originals. News that barely a third of the studies could be replicated spread to multiple news outlets, and the public was suddenly exposed to a negative portrayal of psychology studies. Articles aimed at the general public carried sensational and blunt headlines, such as "Scientists replicated 100 recent psychology experiments. More than half of them failed" (6) and "Only a THIRD of scientific studies can be replicated: Experts fail to repeat the findings of the majority of psychology papers" (7). These two articles, which collectively were shared on social media more than 9,000 times, illustrate the small and limited glimpse that everyday individuals had into replication and reproduction. To get an idea of the public's views, the comment section of each article was read, for the comments contained ideas and opinions expressed by many individuals from around the world.

In the article written for the Daily Mail website (which is visited by more than 100 million unique visitors each month (11)) elaborating on the Reproducibility Project's findings, the vast majority of the comments displayed a negative perception of psychology and of the inability of many studies to be replicated. One example was a user who commented, "If it cannot be reproduced then it remains unproven and should remain unpublished. Then again, psychology is a made up subject and should not be labeled as a 'science'". While the first sentence presents a reasonable belief, the second shows that the individual has no confidence in the reliability of psychology, dismissing the field as fictional and unrelated to science. This user was clearly not alone, for besides receiving multiple "thumbs ups" from other users, additional comments elaborated on the same belief. One user wrote, "Psychology was 'invented' to give people who choose to behave contrary to society's standards an excuse for their behavior"; another wrote, "This isn't new. The few times I had to read studies from psych journals I simply shook my head in disbelief. The lack of rigor was incredible"; and one comment stated, "Two thirds of psychology studies can't be replicated. Not two thirds of science studies. Psychology is not a hard science". The public has approached the topic of reproducibility with distaste toward the field of psychology, for the material exposed to them about reproducibility rarely deals with successful replications. The public's apparent view is that studies should be replicated before being published and regarded as fact, coupled with the belief that if a study is replicated and the results are not consistent, then the original study has no validity.

Replication of studies is important because if the results of an experiment are never tested to see whether they can be reproduced, the public may be under the false impression that the study is correct and free of flaws. One example is a study done by Simone Schnall in 2008 claiming that "cleanliness [reduced] the severity of moral judgements" (8). To test this idea, Schnall conducted two experiments. In the first, Schnall had 40 undergraduates unscramble sentences, with half the group assigned words relating to cleanliness (such as pure or pristine) and the other half assigned neutral words. In the second, 43 undergraduates watched a disgusting scene from a film (disgusting in the sense of feces and filth), after which half the group was asked to wash their hands while the other half was not (9). All of the participants then had to "rate the moral wrongness of six hypothetical scenarios, such as falsifying one's résumé and keeping money from a lost wallet" (9). What Schnall and her colleagues found was that exposure to cleanliness affected moral judgement: participants primed with cleanliness judged the moral wrongness of the six hypothetical scenarios less harshly than the other subjects. Essentially, "the implication was that people who feel relatively pure themselves are – without realizing it – less troubled by others' impurities" (9). According to Slate, a current affairs and culture magazine, Schnall's paper was "covered by ABC News, the Economist and the Huffington Post, among other outlets, and [had] been cited nearly 200 times in scientific literature". It would be an understatement to say that Schnall's study was widely read. However, it was not until five years later that the experiment was replicated, only for the leading scientist of the replication, Brent Donnellan, to conclude that "Cleanliness primes do not influence moral judgement" (10). Donnellan's failed replication of Schnall's results, and his blog post responding to her study, caused a great deal of controversy in the scientific community. To some, it raised the issue that many studies go un-replicated, and hence false results spread widely. Others sided with Schnall, believing that her original findings should not be treated as fabricated simply because a group of researchers was unable to replicate her results. Although much of the public and many scientists regard replication as very important in determining the validity of a study's findings, this controversy also brought together many individuals who felt that replication attempts too often turned into cruel and unfair attacks on the original researchers. What can be learned from this example is that while it is important to replicate experiments, it is just as important to ensure, where possible, that researchers from the original experiment are involved in the replication process, not only to avoid mistakes but to avoid the stigma of failed replications indicating incapable scientists.

The lack of replication and the inability to replicate studies are not confined to the field of psychology, however. The hard sciences, such as biology, physics and chemistry, also fall victim to failed replications. For example, in 2011 scientists at the biotechnology company Amgen "reported that they could confirm only six of 53 landmark studies in cancer biology", and in 2012 researchers at Bayer discovered that "only 15 of 67 attempts to confirm claims in oncology, women's health and cardiovascular disease succeeded" (12). Lisa Feldman Barrett also wrote an article for the New York Times, "Psychology Is Not in Crisis", in which she brought up more examples illustrating the difficulty all fields of science have in replicating results, intending not to criticize other sciences but to reveal that there are advantages to conducting replications even when they fail. Barrett cited one experiment in which scientists believed they had "identified the gene responsible for curly wings" (13) in fruit flies. When the experiment failed to replicate, and the gene thought to be responsible in fact had no effect on the flies' wings, it was not seen as a failure. Evolutionary biologist Richard Lewontin noted that "failures like this helped teach biologists that a single gene produces different characteristics and behaviors, depending on the context" (13). To further prove her point that failed replications are not a waste of time and resources, Barrett cited an example from physics, writing, "similarly, when physicists discovered that subatomic particles didn't obey Newton's laws of motion, they didn't cry out that Newton's laws had 'failed to replicate.' Instead, they realized that Newton's laws were valid only in certain contexts, rather than being universal, and thus the science of quantum mechanics was born" (13). Evidently, other sciences are not immune to failed replications and reproductions.

Although much time, effort and money is thought to be wasted on the replication of experiments, in reality all three sacrifices allow new information to be learned, which is crucial in science, where mistakes and accidents are often essential to new discoveries. The research article Estimating the Reproducibility of Psychological Science defends the number of psychology studies that failed to replicate by noting, "If initial ideas were always correct, then there would hardly be a reason to conduct research in the first place. […] Progress occurs when existing expectations are violated and a surprising result spurs a new investigation. Replication can increase certainty when findings are reproduced and promote innovation when they are not" (14). Many scientists, such as Donnellan, have been led to believe that when an experiment does not yield the same results as the original, it is an "epic fail". The general public seems to be on the same page, as shown through their comments on articles about the Reproducibility Project. However, apart from noting that failed replications can provide valuable information, it is also important to acknowledge that replication attempts can be unsuccessful because the researchers working on the replication made errors. There are countless variables to account for when replicating an experiment, and keeping track of and successfully reproducing all of them can be a very difficult and demanding process. The researchers of the Reproducibility Project offered a few reasons that could account for their 36% successful replication rate. One was that although most of the replication teams contacted and worked with the original authors to ensure that the procedure and materials were the same, small differences may have gone undetected and affected the results. There is, however, a flaw in such close collaboration: the results may be biased, "increasing the chances of a successful replication" (6). In addition, replicated experiments sometimes have to be altered slightly to make sense to new participants, so differences in cultural context or stimuli may affect the results (6). Of course, there is always the possibility that the original results were false; however, that should not be the first assumption one makes. Johanna Cohoon, a project coordinator with the Charlottesville-based Center for Open Science, said, "The [Reproducibility Project's] findings demonstrate that reproducing original results may be more difficult than is presently assumed, and interventions may be needed to improve reproducibility" (5).
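A further, purely statistical reason deserves mention: even a flawless replication of a perfectly real effect will often miss the conventional p < .05 threshold if the study's statistical power is modest. The following simulation is a minimal sketch of this point; the effect size and sample size are illustrative assumptions, not figures drawn from the Reproducibility Project:

```python
# Minimal simulation (illustrative assumptions only): how often does a
# replication of a GENUINE effect reach p < .05 when power is modest?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

TRUE_EFFECT = 0.4   # assumed true effect size (Cohen's d)
N_PER_GROUP = 30    # assumed participants per condition
N_ATTEMPTS = 10_000

successes = 0
for _ in range(N_ATTEMPTS):
    # Every simulated replication studies a real effect of size 0.4.
    treatment = rng.normal(TRUE_EFFECT, 1.0, N_PER_GROUP)
    control = rng.normal(0.0, 1.0, N_PER_GROUP)
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value < 0.05:
        successes += 1

print(f"Replications reaching p < .05: {successes / N_ATTEMPTS:.0%}")
# Under these assumptions only about a third of attempts "succeed",
# even though the effect is genuinely real in every single one.
```

Under these assumed numbers, only about a third of replications of a real effect reach significance, a rate close to the Project's 36%. This does not show that most original findings were true, but it does show why a failed replication, taken alone, cannot settle the question.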

Multiple steps need to be taken to ensure not only that studies are replicated more often, but also that when replications are unsuccessful, the original author's work is not immediately scorned. One major fault is that "some of the most prestigious journals have maintained explicit policies against replication efforts" (9), which discourages scientists from replicating. Furthermore, although the purpose of an unsuccessful replication would be to reveal that the original experiment was perhaps faulty, some journals, such as Science, publish "'technical comments' on its own articles, but only if they are submitted within three months of the original publication, which leaves little time to conduct and document a replication attempt" (9). Moreover, when a replication fails to yield the same results, it can cause conflict and tension between the scientists of the original study and those in charge of the replication, as seen in the case of Schnall and Donnellan. It makes sense that scientists would want to avoid the drama and the potential failure of a replicated experiment; however, the many benefits and advantages that emerge from replications should encourage scientists not to shy away from reproduction and replication. To make replication easier and more accessible, scientists should become accustomed to including all details of an experiment, even ones that seem unnecessary, to ensure as much accuracy as possible. John Ioannidis, a professor at Stanford University, has acknowledged that replication science could be made easier with more transparency and better data sharing.

In the future, the public should approach the field of psychology with more confidence (for psychology has a larger image problem than the other sciences), because what people do not realize, or do not lend their attention to, is why studies are particularly difficult to reproduce and replicate. Many factors can lead a replication to produce different results from the original, and this is not necessarily because the original results were botched or fabricated. When the public learns of a study that failed to replicate, instead of jumping to the conclusion that the original study made false assertions, they should consider the other factors that could explain an unsuccessful replication. The public should be more open to the fact that regardless of whether a replication succeeds, valuable clues will emerge as a result, which is precisely what science strives for.


References

Drummond, C. (n.d.). Replicability is not Reproducibility: Nor is it Good Science.

Casadevall, A., & Fang, F. C. (n.d.). Reproducible Science. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2981311/#r9

Chong, L., Vignieri, S., Chin, G., & Jasny, B. (n.d.). Again, and Again, and Again …. Retrieved from http://www.sciencemag.org/content/334/6060/1225

Bissell, M. (2013, November 20). Reproducibility: The risks of the replication drive. Retrieved from http://www.nature.com/news/reproducibility-the-risks-of-the-replication-drive-1.14184

Estimating the reproducibility of psychological science. (2015, August 28). Retrieved from http://www.sciencemag.org/content/349/6251/aac4716

B. (2015, August 27). Scientists replicated 100 recent psychology experiments. More than half of them failed. Retrieved from http://www.vox.com/2015/8/27/9216383/irreproducibility-research

MailOnline, S. G. (2015, August 28). Only a THIRD of scientific studies can be replicated: Experts fail to repeat the findings of the majority of psychology papers. Retrieved from http://www.dailymail.co.uk/sciencetech/article-3214037/Only-scientific-studies-replicated-Experts-fail-repeat-findings-majority-psychology-papers.html

Schnall, S., Benton, J., & Harvey, S. (n.d.). With a Clean Conscience (Rep.). Retrieved from https://www.repository.cam.ac.uk/bitstream/handle/1810/239314/Schnall, Benton & Harvey (2008).pdf?sequence=1

Meyer, M. N., & Chabris, C. (n.d.). Psychologists' Food Fight Over Replication of "Important Findings". Retrieved from http://www.slate.com/articles/health_and_science/science/2014/07/replication_controversy_in_psychology_bullying_file_drawer_effect_blog_posts.html

Donnellan, B. (2013, December 11). Go Big or Go Home – A Recent Replication Attempt. Retrieved from https://traitstate.wordpress.com/2013/12/11/go-big-or-go-home-a-recent-replication-attempt/

Correction: Daily Mail website. (2013, January 05). Retrieved from http://www.economist.com/news/business/21569066-correction-daily-mail-website

Begley, S. (2014, January 27). U.S. science officials take aim at shoddy studies. Retrieved from http://uk.reuters.com/article/2014/01/27/science-reproducibility-idUKL2N0KX18S20140127

Barrett, L. F. (2015, August 31). Psychology Is Not in Crisis. Retrieved from http://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html?_r=0

Estimating the reproducibility of psychological science. (2015, August 28). Retrieved from http://www.sciencemag.org/content/349/6251/aac4716.full