25 April 2018

Bad Science, Biases, and Big Data: The Shifting Landscape of Responsible Conduct of Research

Challenges associated with the responsible conduct of research (RCR) tend to change as technological advances reshape the landscape. In the late 1990s Andrew Wakefield reported in The Lancet a purported causal link between the measles, mumps, and rubella (MMR) vaccine and the onset of autism. By the time it was discovered that he had undisclosed financial conflicts of interest, had subjected children to unnecessary and invasive procedures without approval from an institutional review board, and had fabricated data and selectively disregarded data that contradicted his hypothesis, the damage was already done: parents were electing not to vaccinate their children against potentially deadly viral infections, and a powerful anti-vaccination movement was born.

Many of these groups found new audiences for their misinformation on internet forums and message boards, and their reach later grew dramatically with the adoption of social media platforms and celebrity endorsements. During this time, the incidence of vaccine-preventable childhood diseases was rising. If one considers a counterfactual scenario in which Wakefield published his results decades earlier, before the advent of the internet, the consequences might have been far less severe. The misconduct of a single unethical researcher conducting poor medical research can therefore be linked to public health challenges amplified by the spread of misinformation. These challenges are likely to grow in frequency and severity in the coming decades as global communication becomes ever more pervasive. But above all, this case study points to a much larger problem.

Many scientific studies are not reproducible, and two factors in particular contribute to this phenomenon. First, the “file-drawer effect” arises from a systemic publication bias toward reporting positive results and shelving negative ones. When many independent studies test whether a relationship exists between two variables, some will show statistical significance even when no relationship in fact exists; at a significance threshold of 0.05, roughly one in twenty such tests is expected to do so by chance alone. In these instances, the published positive results are less informative than their unpublished negative counterparts, and researchers who repeat the study expecting a positive result will not be able to replicate it. Second, when a study initially shows no effect between two variables, researchers sometimes collect additional observations, testing again after each addition, in the hope that a significant effect will emerge. This practice often rests on the misguided notion that more observations necessarily bring one closer to the truth. Instead, repeatedly testing as data accumulate, and stopping once the result is significant, increases the likelihood of obtaining a significant effect by chance alone. It is important to note that these factors reflect implicit biases and should be distinguished from scientific misconduct. To err is human, and non-reproducible results do not necessarily point to duplicity on the part of a research team.
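
To make the second factor concrete, the Python sketch below simulates studies in which no true effect exists. It assumes normally distributed data with a known unit variance, so a simple two-sided z-test stands in for whatever test a real study would use; the function names, the batch size of 10, and the cap of 100 observations are illustrative choices, not drawn from any particular study. The fixed-n rate should land near the nominal 5% (the same arithmetic behind the file-drawer effect), while testing after every batch and stopping at the first significant result pushes the false-positive rate well above it.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided z-test p-value for H0: mean == 0, assuming known unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_n_rate(n_sims, n=100, alpha=0.05, seed=1):
    """Fraction of null studies reaching significance with a single test at fixed n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) < alpha:
            hits += 1
    return hits / n_sims


def optional_stopping_rate(n_sims, batch=10, max_n=100, alpha=0.05, seed=1):
    """Fraction of null studies declared significant when the researcher tests
    after every batch of new observations and stops as soon as p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = []
        while len(sample) < max_n:
            sample.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            if two_sided_p(sample) < alpha:
                hits += 1
                break
    return hits / n_sims


fixed = fixed_n_rate(2000)
peeking = optional_stopping_rate(2000)
print(f"false positives, fixed n:           {fixed:.3f}")
print(f"false positives, optional stopping: {peeking:.3f}")
```

Both procedures end with at most 100 observations; the only difference is how often the data are tested, which is why the inflated rate cannot be blamed on sample size.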

Nonetheless, these factors can have unintended consequences for the advancement of science. Research groups may decide not to report failed replications or deviations when they believe a priori that earlier findings must represent the true state of a given phenomenon. For example, in the early 20th century the physicist Robert Millikan measured the charge of the electron using his oil drop experiment. The experimental setup was elegant, but he did not correctly account for the viscosity of air in his calculations, and he therefore reported an electron charge that was slightly too low. In the years that followed, other researchers only gradually converged on the true value through their own experiments.

Richard Feynman offered an explanation for why it took researchers years to correct Millikan’s original estimate: when they obtained a result that deviated too much from Millikan’s, they assumed their data must be in error and discarded the result. Confirmation bias had slowed the pace of scientific progress, and it seems that standing on the shoulders of giants sometimes carries tradeoffs. Feynman cautioned against our natural tendency to fool ourselves and urged us to avoid what he called “Cargo Cult Science”:
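
Feynman’s mechanism can be illustrated with a toy model. The Python sketch below is not a model of the actual electron-charge measurements: the true value, the prior estimate, the noise level, and the tolerance are arbitrary, hypothetical numbers chosen only to show the effect. Each simulated lab measures the quantity with random error; in the filtered scenario, a lab discards any measurement that lies too far from the current consensus, which starts at the prior estimate.

```python
import random
import statistics


def run_labs(n_labs, true_value, prior, noise_sd, tolerance, filter_results, rng):
    """Return the consensus value after n_labs successive measurements.

    The consensus starts at a hypothetical prior estimate and is the mean of
    every published value so far. When filter_results is True, a lab discards
    any measurement farther than `tolerance` from the current consensus.
    """
    published = [prior]
    for _ in range(n_labs):
        consensus = statistics.fmean(published)
        measurement = rng.gauss(true_value, noise_sd)
        if filter_results and abs(measurement - consensus) > tolerance:
            continue  # "our data must be in error": the result is never reported
        published.append(measurement)
    return statistics.fmean(published)


def mean_error(filter_results, trials=500, seed=7):
    """Average distance between the final consensus and the true value."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        final = run_labs(n_labs=20, true_value=1.00, prior=0.98, noise_sd=0.02,
                         tolerance=0.02, filter_results=filter_results, rng=rng)
        errors.append(abs(final - 1.00))
    return statistics.fmean(errors)


print(f"mean error, all results reported:      {mean_error(False):.4f}")
print(f"mean error, deviant results discarded: {mean_error(True):.4f}")
```

Because the prior sits below the true value, the filter disproportionately removes measurements on the true side of the consensus, so the published record creeps toward the correct answer far more slowly than when every result is reported.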

"Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can – if you know anything at all wrong, or possibly wrong – to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. In summary, the idea is to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgement in one particular direction or another."

Technology gives us the potential to improve scientific reproducibility and integrity in the spirit Feynman described. Integrity can be compromised unintentionally at many stages of the scientific process: through failure to control for biases during hypothesis generation, poorly designed experiments (for example, inadequate randomization or blocking), lapses in quality control while an experiment is carried out, and the use of improper statistical methods during analysis. Because of these risks, study pre-registration is becoming increasingly popular in clinical medicine and in other fields. In pre-registration, a research team details all aspects of the study design, potential outcomes, and statistical analyses before conducting the experiment proper. This plan, and any data derived from the study, are made public on the internet for others to examine and critique.
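
The core idea of pre-registration, committing to an analysis plan before seeing any data, can be sketched in a few lines of Python. This is a hypothetical illustration, not how any particular registry works: the AnalysisPlan fields, the fingerprint and check_against_registration helpers, and the example values are all invented for the sketch. Publishing a cryptographic digest of the plan makes any later, quiet change to the plan detectable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class AnalysisPlan:
    """A hypothetical, minimal pre-registered analysis plan."""
    hypothesis: str
    primary_outcome: str
    test: str
    alpha: float
    sample_size: int


def fingerprint(plan: AnalysisPlan) -> str:
    """SHA-256 digest of the plan, serialized with sorted keys so it is stable."""
    canonical = json.dumps(asdict(plan), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_against_registration(plan: AnalysisPlan, registered_digest: str) -> bool:
    """True only if the analysis about to be run matches the registered plan."""
    return fingerprint(plan) == registered_digest


# Before any data are collected: write down the plan and publish its digest.
plan = AnalysisPlan(
    hypothesis="Treatment lowers mean symptom score",
    primary_outcome="symptom_score_week_12",
    test="two-sided Welch t-test",
    alpha=0.05,
    sample_size=200,
)
registered = fingerprint(plan)

# After data collection: a quietly switched primary outcome no longer matches.
switched = replace(plan, primary_outcome="symptom_score_week_8")
print(check_against_registration(plan, registered))      # True
print(check_against_registration(switched, registered))  # False
```

The frozen dataclass prevents the plan object from being mutated in place; the digest guards against the subtler problem of writing a new plan after the data are in.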

The benefits of adopting a pre-registration philosophy are multifaceted. First, it guards against the file-drawer effect: if all data are made publicly available, negative results become discoverable by other researchers. Second, it protects against the consequences of researcher bias by requiring that analytical and statistical methods be specified before the data are observed, so that those decisions remain independent of the data. An open science philosophy of honesty and transparency should help curb the negative effects of both research misconduct and implicit researcher bias. However, other aspects of RCR, including data management and analytics, are becoming increasingly important and challenging as well.
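
Why discoverable negative results matter can be shown with a short simulation. The Python sketch below assumes an effect that is truly zero, studies of 20 observations each with known unit variance, and a stylized file drawer in which only positive, significant results get published; all of these settings are hypothetical. Pooling only the published studies suggests a sizeable effect, while pooling every study recovers an estimate close to zero.

```python
import math
import random
import statistics


def simulate_literature(n_studies=2000, n_per_study=20, true_effect=0.0, seed=3):
    """Simulate many independent studies of the same (truly null) effect.

    Each study reports its sample mean and a two-sided z-test p-value.
    Returns (all_estimates, published_estimates), where 'published' keeps only
    positive, significant results: a stylized file-drawer effect.
    """
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n_per_study)
    all_estimates, published = [], []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_per_study)]
        est = statistics.fmean(sample)
        p = math.erfc(abs(est / se) / math.sqrt(2))
        all_estimates.append(est)
        if p < 0.05 and est > 0:
            published.append(est)
    return all_estimates, published


all_est, published = simulate_literature()
print(f"studies run: {len(all_est)}, studies published: {len(published)}")
print(f"pooled estimate, published only: {statistics.fmean(published):.3f}")
print(f"pooled estimate, all results:    {statistics.fmean(all_est):.3f}")
```

Because every study has the same sample size, a plain average of the estimates is equivalent to an inverse-variance (fixed-effect) pooled estimate here.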

Data are becoming more voluminous, and the ability to generate massive amounts of data can easily outstrip the ability to store and share them. For example, data storage and sharing are major problems for Large Hadron Collider (LHC) experiments, which generate enormous amounts of data in a short time: even after extensive filtering, the annual data rate exceeds 30 petabytes. The sheer volume would have made these studies impossible before technology caught up with our scientific ambitions. Thirty years ago it would have taken a stack of 21 billion floppy disks, 43 thousand miles high, to store one year’s worth of data; today it takes 11,000 servers with 100,000 processor cores at CERN plus a worldwide computing grid. Similarly, it initially took researchers many years to process and map the human genome, whereas today such data can be processed quickly at a fraction of the cost. Ambition alone cannot drive some scientific discoveries; often the technological infrastructure must be in place first. That said, the ability to generate and store large volumes of data is only one challenge of the big data era.
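
The floppy-disk comparison is easy to check. The short Python calculation below reproduces it under assumptions the essay does not state: a 3.5-inch floppy disk holds 1.44 million bytes and is 3.3 mm thick, and a petabyte is taken as 10^15 bytes.

```python
# Assumptions (not stated in the essay): 1.44 MB per 3.5-inch floppy disk,
# taken as 1.44 million bytes; 3.3 mm disk thickness; decimal petabytes.
PETABYTE = 10**15
FLOPPY_BYTES = 1.44e6
FLOPPY_THICKNESS_M = 3.3e-3
METERS_PER_MILE = 1609.344

annual_bytes = 30 * PETABYTE
disks = annual_bytes / FLOPPY_BYTES
stack_miles = disks * FLOPPY_THICKNESS_M / METERS_PER_MILE

print(f"floppy disks needed: {disks:.3g}")  # 2.08e+10
print(f"stack height: {stack_miles:,.0f} miles")
```

Under these assumptions the result is about 20.8 billion disks stacked just under 43 thousand miles high, consistent with the figures quoted above.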

The allure of big data has led some to eschew the scientific method altogether. Former editor-in-chief of Wired Magazine Chris Anderson said in his provocative essay The End of Theory: The Data Deluge Makes the Scientific Method Obsolete:

"Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

Similarly, Google’s research director Peter Norvig amended the commonly held aphorism “All models are wrong, but some are useful” by noting that we can often succeed without models at all. Anderson and Norvig believe that adhering to the scientific method will often lead to mistaken conclusions because the method itself is arbitrary: it relies on a logical framework that leaves room for subjectivity and bias. In their view, we should instead replace the scientific method with big data analytics, because “with enough data, numbers speak for themselves.” Powerful computational algorithms can explore large data sets and find regularities independent of any subjectivity or bias on the part of the researcher, and the approach supposedly becomes more effective as the data grow larger. In my opinion, this Wild West approach of shooting from the hip is dangerous; all that is missing is the tumbleweed.

Spurious correlations often multiply with the volume of data. This notion echoes the one I presented earlier in this essay: adding experimental replicates after the fact increases the likelihood of finding a significant effect caused by chance alone. The difference with big data, however, is that the sheer quantity of information is a byproduct of the process itself, not of a researcher’s intent to find positive results. Calude and Longo proved mathematically that arbitrary correlations must appear in sufficiently large data sets, that their number depends only on the size of the data set, and that they can appear even in randomly generated data. In fact, their results imply a general trend of diminishing returns: the more information one has, the more difficult it becomes to glean meaning from it. Replacing the scientific method with big data analytics is therefore inadvisable, and researchers must take great care to remember the maxim “correlation does not imply causation.”
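
The scaling of spurious correlations with data size can be illustrated, though not proved, with pure noise. The Python sketch below generates variables of independent random numbers, so no real relationship exists between any pair, and counts the pairs whose Pearson correlation exceeds an arbitrary threshold of 0.5 in magnitude. The 30 observations per variable and the threshold are hypothetical choices; Calude and Longo’s result is a mathematical theorem, and this simulation only shows the same tendency empirically.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_variables, n_observations=30, threshold=0.5, seed=11):
    """Count variable pairs whose |r| exceeds `threshold` in purely random data."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_observations)]
            for _ in range(n_variables)]
    return sum(1 for x, y in combinations(data, 2)
               if abs(pearson_r(x, y)) > threshold)


counts = {k: count_spurious(k) for k in (10, 50, 200)}
for k, c in counts.items():
    print(f"{k:>3} random variables -> {c} pairs with |r| > 0.5")
```

The number of variable pairs grows roughly with the square of the number of variables, so even a small per-pair chance of a large correlation produces many "discoveries" in a wide enough data set.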

Responsible conduct of research is like an intricate dance. It encompasses much more than abstaining from duplicity. To conduct science with utmost integrity we must understand and limit our biases during all stages of the scientific process: from hypothesis generation, to publication of results, to the analysis of big data. We can limit biases by adopting an open science policy of transparency, and learning proper experimental design and statistical methods to address our research questions. But most importantly we ought to adopt a philosophy of utmost humility – as my good friend and colleague Zachary Blount says, “Hypotheses are not children - they have no inherent worth or dignity, nor any right to exist. They have to justify themselves. Test the daylights out of them.”