US study offers a different explanation for why only 36% of psychology studies replicate
In light of the estimated replication rate of only 36% across the 100 replication attempts conducted by the Open Science Collaboration in 2015 (OSC2015), many believe that experimental psychology suffers from a severe replicability problem.
In their own study, recently published in the open-access peer-reviewed scientific journal Social Psychological Bulletin, Drs Brent M. Wilson and John T. Wixted at the University of California San Diego (USA) suggest that what has since been referred to as a “replication crisis” might not be as bad as it seems.
“No one asks a critical question,” the scientists argue. “If all were well with psychological science, what replication rate should have been observed? Intuition suggests that it should have been ~90-95%, but a figure in this range is wildly off the mark. If so, then the perception of a replication crisis rests largely on an implicit comparison between the observed replication rate of 36% and a never-specified expected replication rate that is entirely unrealistic.”
In their recent paper, the scientists note that many replication failures might be due to the replication studies not having sufficient power to detect the true effects associated with the original experimental protocols. The replication studies were very well-powered to detect the originally reported effects, but those effects were inflated, as statistically significant effects must be: only estimates large enough to clear the significance threshold make it into the literature, so reported effect sizes systematically overestimate the true ones. How much power did the replication studies have to detect the true (i.e., non-inflated) effects associated with the original studies? That is a key question, and intuition alone cannot provide the answer. The team therefore concludes that it is crucial to use a formal model, rather than relying on the current purely intuitive approach.
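A quick simulation illustrates this inflation (our sketch, not the authors' model; the effect size d = 0.3 and per-group sample of 30 are hypothetical numbers chosen for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

TRUE_D = 0.3   # assumed true standardized effect size (illustrative)
N_ORIG = 30    # assumed per-group n of an original study (illustrative)
ALPHA = 0.05

# Run many two-group experiments and keep only the ones that reach
# statistical significance, mimicking what gets published.
reported = []
for _ in range(20_000):
    a = rng.normal(0.0, 1.0, N_ORIG)
    b = rng.normal(TRUE_D, 1.0, N_ORIG)
    t, p = stats.ttest_ind(b, a)
    if p < ALPHA and t > 0:
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        reported.append((b.mean() - a.mean()) / pooled_sd)  # Cohen's d

print(f"true d = {TRUE_D}")
print(f"mean reported d among significant results = {np.mean(reported):.2f}")
```

Under these assumptions, the average effect size among significant results comes out well above the true d = 0.3, so a replication sized to detect the reported effect will be underpowered for the true one.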
“Estimating the expected replication rate requires a consideration of statistical power, which is the probability that an experiment (e.g., a replication experiment) will again detect a true effect at p < .05. Obviously, a single replication experiment with low power can easily fail even if the original experiment reported a true effect,”
the scientists explain.
Similarly, 100 replication experiments with low power will yield a low replication rate even if the original experiments all reported true positives.
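As a rough numerical sketch of this point (the specific effect sizes and sample size below are our illustrative choices, not figures from the paper), the power of a two-sample t-test can be computed from the noncentral t distribution:

```python
import numpy as np
from scipy import stats

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample t-test against standardized effect d."""
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

# Sized to give ~90% power against an inflated reported effect of d = 0.6 ...
print(two_sample_power(0.6, 60))   # ~0.90
# ... the same study has far less power against a true effect of d = 0.3:
print(two_sample_power(0.3, 60))   # ~0.37
```

A replication that looks very well-powered against the reported effect can thus have only modest power against the true effect, even when the original finding is real.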
At one extreme, with low enough power, the observed 36% replication rate in OSC2015 could mean that 64% of the replication experiments failed to detect the true positives reported in the original studies (in which case the original-science literature would be in good shape). Alternatively, if the replication experiments had high enough power, then the observed 36% replication rate would mean that 64% of the replication experiments reported false positives (in which case the original-science literature would be in bad shape).
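One way to make these two extremes concrete is a simple mixture calculation (our illustration; the authors' model is more detailed): the expected replication rate is roughly the share of true findings times the power against true effects, plus the share of false positives times the alpha rate.

```python
ALPHA = 0.05  # false positives "replicate" at roughly the alpha rate

def expected_replication_rate(frac_true, power):
    """Mixture of true effects (replicate with the given power) and
    false positives (replicate at the alpha rate)."""
    return frac_true * power + (1 - frac_true) * ALPHA

# High-power reading: ~97% power implies only ~1/3 of originals were real.
print(expected_replication_rate(0.34, 0.97))   # ~0.36
# Low-power reading: ~36% power is consistent with all originals being real.
print(expected_replication_rate(1.00, 0.36))   # 0.36
```

Both readings reproduce the observed 36%, which is exactly why the rate alone cannot settle the question without an estimate of power.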
“With few exceptions, scientists have enthusiastically embraced the latter interpretation, thereby implicitly assuming that the OSC2015 replication experiments had high power. However, this assumption must be supported by a formal model because intuition is simply not up to the task,”
say Wilson and Wixted.
According to one simple formal model, the OSC2015 replication experiments had low power, in which case the 36% replication rate would not be particularly informative, the researchers conclude.
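As a loose Monte Carlo illustration of what such a model involves (not the authors' actual model; the prior over effect sizes, the original sample size of 30, and the 90% replication power target are all assumptions of this sketch), one can simulate a literature in which only significant originals are replicated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
ALPHA = 0.05

def n_for_power(d, power=0.90, alpha=ALPHA):
    """Per-group n for a two-sample t-test (normal approximation)."""
    za, zb = stats.norm.ppf(1 - alpha / 2), stats.norm.ppf(power)
    return max(4, int(np.ceil(2 * (za + zb) ** 2 / d ** 2)))

def experiment(true_d, n):
    """One two-group experiment: returns (estimated d, significant?)."""
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(b, a)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (b.mean() - a.mean()) / pooled_sd, bool(p < ALPHA and t > 0)

successes = originals = 0
while originals < 2000:
    true_d = abs(rng.normal(0.0, 0.25))    # assumed prior over true effects
    d_hat, sig = experiment(true_d, 30)    # original study with modest n
    if not sig:
        continue                           # only significant results are published
    originals += 1
    # Replication powered at 90% for the *reported* (inflated) effect:
    _, rep_sig = experiment(true_d, n_for_power(d_hat))
    successes += rep_sig

print(f"simulated replication rate: {successes / originals:.0%}")
```

Even with every original finding being a true effect here, selection on significance inflates the reported effect sizes, the replications end up underpowered for the true effects, and the simulated replication rate falls well below intuition's ~90-95%; the exact figure depends on the assumed prior.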
Although the original-science literature may be in better shape than intuition suggests, Wilson and Wixted nevertheless argue that there is a serious replication problem that needs to be addressed.
“The replication problem may not lie so much with everyday psychological science but may instead lie primarily with a small percentage of sensational findings,” say Wilson and Wixted. “Sensational findings are likely to be false positives because they are based on theories or ideas that have low prior odds of being true.”
In conclusion, the authors argue that less focus should be placed on everyday research, which may be in better shape than intuition suggests, and more on conducting independent, large-N, pre-registered replications of the unlikely findings that differentially attract attention. Such findings, they contend, are not ready for public consumption until they have been independently replicated.
Original source:
Wilson, B. M., & Wixted, J. T. (2023). On the Importance of Modeling the Invisible World of Underlying Effect Sizes. Social Psychological Bulletin, 18, 1-16. https://doi.org/10.32872/spb.9981