QUEST Blog Post: Planes that Don’t Land, ESP and Bad Science

By Robert Nadon, Associate Professor at McGill University and Genome Quebec Innovation Centre

In 1987, I was a psychology postdoc hoping for a university tenure-track position in a tough job market. Publishing in prestigious journals was then, as it is now, the way to advance in academia. So I was really happy to see published that same year a book chapter by Daryl Bem, a world-famous social psychologist, offering advice to junior psychology researchers on how to do science and how to write a scientific journal article. What could be better for an ambitious junior academic?

I was in for a big disappointment.

Here's some of what Bem wrote:

There are two possible articles you can write: (1) the article you planned to write when you designed your study or (2) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (2). … Examine [your data] from every angle… If a datum suggests a new hypothesis, try to find further evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are subjects [participants] you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something—anything —interesting… Let us err on the side of discovery (Bem, 1987).

So why was I disappointed? What could be wrong with Bem’s vision of forward-looking, discovery-based science? On its face, nothing at all. But Bem was making a fundamental error.

Bem was conflating two key activities in science – exploration and confirmation. Early-stage science is all about exploring data creatively and trying to find unexpected links that suggest a theory or unexpected phenomena – the kind of activity that Bem was writing about. Through this creative process, important patterns can emerge that reveal something about nature that you didn’t know before. More often than not, however, these patterns are just random noise without any substantive meaning, even though they may appear to be meaningful.

This is especially true in sciences which, like psychology, study highly variable phenomena (measured with highly variable instruments) supported by weak theoretical frameworks. Everyone, scientists included, is easily fooled by random patterns that seem to make sense. We are all very good at retrospectively coming up with theories that seem to fit the data.

Patterns have also seemed to make sense in, for example, ancient Greek theological myths, astrology, mind reading, and phrenology. No one today would mistake these for scientific concepts, because they either could not be tested or were rigorously tested and found wanting. The same standard applies to patterns that are observed in exploratory scientific studies, even if the results look plausible. Astrological explanations of the Black Plague offered by medieval experts, after all, seemed plausible at the time.

This is where the confirmatory part of science comes in. Ideas generated in early-stage exploratory science are tested on new data, using rigorous procedures designed to rule out chance and other alternative explanations. This rigorous testing minimizes the chances that scientists are fooling themselves. But this two-stage process is not what Bem had in mind.
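The difference between the two stages can be made concrete with a small simulation. The Python sketch below is illustrative only and comes from neither Bem nor Feynman: the function names, the 20 outcome variables, the group size of 30, and the effect size of one standard deviation are all assumptions chosen for the example. It keeps the most striking of 20 variables from an exploratory sample, then tests only that variable on fresh data.

```python
import math
import random
import statistics


def two_sided_p(group_a, group_b):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def draw_p(rng, effect, n_per_group):
    """Simulate one treatment-vs-control comparison and return its p-value."""
    treated = [rng.gauss(effect, 1) for _ in range(n_per_group)]
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]
    return two_sided_p(treated, control)


def confirmation_rate(true_effect, n_variables=20, n_per_group=30,
                      n_studies=400, alpha=0.05, seed=1):
    """Fraction of exploratory 'discoveries' that survive a test on new data.

    Variable 0 carries `true_effect` (in standard deviations); every other
    variable is pure noise. All defaults are hypothetical.
    """
    rng = random.Random(seed)
    discoveries = confirmed = 0
    for _ in range(n_studies):
        effects = [true_effect] + [0.0] * (n_variables - 1)
        # Exploration: look at every variable and keep the most striking one.
        p_values = [draw_p(rng, e, n_per_group) for e in effects]
        best = min(range(n_variables), key=p_values.__getitem__)
        if p_values[best] >= alpha:
            continue
        discoveries += 1
        # Confirmation: new participants, and only the pre-selected variable.
        if draw_p(rng, effects[best], n_per_group) < alpha:
            confirmed += 1
    return confirmed / discoveries if discoveries else float("nan")


if __name__ == "__main__":
    print("pure noise: ", confirmation_rate(true_effect=0.0))
    print("real effect:", confirmation_rate(true_effect=1.0))
```

Under these assumptions, when nothing real is going on only about one exploratory "discovery" in twenty should survive the fresh test, which is roughly the chosen alpha. A large real effect should survive most of the time.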

In the same chapter, Bem wrote that researchers could skip the second, confirmatory part of science if their data were “strong enough to justify recentering [their] article around the new findings and subordinating or even ignoring [their] original hypotheses.” The problem is, if you look hard enough, data almost always seem “strong enough”, especially if you gather data on many variables with many ways to slice-and-dice the data, as is often done in psychology research. Or, as the maxim goes, “If you torture your data long enough, they will confess to anything.”
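A rough way to see why data "almost always seem strong enough" is to count how often pure noise produces at least one nominally significant result. The Python sketch below is an illustration under assumed conditions, not an analysis of any real study: the function names, the group size of 30, the 0.05 threshold, and the independence of the outcome variables are all assumptions.

```python
import math
import random
import statistics


def two_sided_p(group_a, group_b):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def familywise_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests of no effect."""
    return 1 - (1 - alpha) ** n_tests


def simulated_rate(n_variables, n_per_group=30, n_studies=500, alpha=0.05, seed=1):
    """Share of simulated no-effect studies in which some variable reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_variables):
            # Both groups come from the same distribution: there is nothing to find.
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_sided_p(group_a, group_b) < alpha:
                hits += 1
                break  # one "strong enough" result is all the write-up needs
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(familywise_rate(k), 3), simulated_rate(k))
```

Under these assumptions, a study with a single outcome yields a false positive about 5% of the time, while a study with 20 independent outcomes yields at least one about 64% of the time (1 - 0.95^20), before any subgroups are tried or participants dropped.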

Bem was giving permission to write up scientific articles based on the exploration part of science (when the framework for coming to tentative conclusions is relatively loose) while pretending that they were based on the confirmatory part of science (when the framework for coming to firmer conclusions is relatively demanding). Everyone knew that admitting to doing this would almost certainly get your manuscript rejected for publication. So it was best to pretend that your exploratory research was actually confirmatory.

But wishing that some idea were true after trolling the data and finding some makeshift pattern does not make it any more true than the notion that coincidental celestial events caused the Black Plague. This kind of (at best) wishful thinking leads almost inevitably to false discoveries. If everyone followed Bem's advice, the field would be inundated with nonsense and it would be impossible to distinguish the rare drops of true discoveries from floods of false ones.

What Bem was advising was very poor science and, it seemed to me, fundamentally dishonest. As I would discover, though, Bem was being surprisingly honest about how he went about his research. He was openly pushing back against a half century of methodological and statistical knowledge produced by some of the best scientific minds.

So why was Bem advocating doing science this way? A cynic might say that this was the surest way to get published. Bem was highly published and highly cited by other researchers. As will become clear, though, he was also a true believer.

Fortunately, I soon discovered while still a postdoc that all was not lost. I began to read Richard Feynman’s writings on the practice of science.

Feynman, a physicist and Nobel laureate, gave a now much-quoted commencement address to the California Institute of Technology. In it, he zeroed in on why he thought that many studies in psychology and educational research – studies following Bem’s approach – were not scientific at all. Here's part of what he said:

I think the educational and psychological studies I mentioned are examples of what I would like to call cargo cult science. In the South Seas there is a Cargo Cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.

So what was it about the psychology and education studies that Feynman thought made them examples of Cargo Cult Science? Was it because those fields often dealt with subjective topics unlike, say, in biomedical research? No, that wasn't it. According to Feynman, the studies were examples of Cargo Cult Science because they were missing

… a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty – a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid – not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked – to make sure the other fellow can tell they have been eliminated. Details that could throw doubt on your interpretation must be given, if you know them.

Feynman insisted that “The first principle [of doing science] is that you must not fool yourself” and warned that “you [the scientist] are the easiest person to fool.”

Daryl Bem fooled himself big time many years after writing his advice to junior psychology researchers. Toward the end of his prolific and influential career, Bem published nine experiments on extrasensory perception (ESP) (Bem, 2011). Those experiments, he claimed, showed that future events could influence people’s behavior in the present. As silly as the results were, they could not be readily dismissed. They were published in a prestigious scientific journal. The article followed broadly accepted research practices in psychology. But if accepted research practice led Bem to obviously wrong conclusions, then something was obviously very wrong with psychology research.

To his credit, Bem adopted a distinctly Open Science posture. He made available all of the ESP experimental data, software, and procedures and he actively encouraged replication attempts. Not surprisingly, replication attempts were not successful. Still, the paper should not have been published as a scientific paper — not because of its conclusions but because of the way the conclusions were arrived at. It is one thing to put ideas out there but it is quite another to pretend that they are based on solid scientific practice. At a minimum, a lot of effort and resources could have been put to better use had Bem followed Feynman’s simple recipe.

Bem, now 80, opened up to Daniel Engber in a recent interview published in Slate magazine under the title Daryl Bem Proved ESP Is Real: Which means science is broken. Bem is quoted as saying, “I’m all for rigor, but I prefer other people do it. I see its importance—it’s fun for some people—but I don’t have the patience for it. …If you looked at all my past experiments, they were always rhetorical devices. I gathered data to show how my point would be made. I used data as a point of persuasion, and I never really worried about, ‘Will this replicate or will this not?’”.

This damning admission made by Bem in 2018 – science as mere rhetoric – echoes the Cargo Cult advice he gave to junior scientists three decades earlier. Once again, he is simply saying out loud what many psychological researchers are thinking and doing. Plus ça change, plus c'est pareil. *

The ESP results were easily dismissed because they were not plausible. But what of studies with more plausible results that are produced by the same flawed logic and procedures? One major problem among many is that the scientific articles of those who don’t have the patience for rigor look exactly like the articles of those who do have the patience (for rigor and for what Feynman called “scientific integrity”). Both types of articles (rhetorical and scientific) get published in the same high-prestige journals because both types look scientific – but of course, only one actually is.

Using the scientific method the way it is intended, with “utter honesty”, takes time – time that researchers in highly competitive environments may not think they have. It also frequently leads to disappointing results that are generally not publishable (at least not in prestigious journals). The Cargo Cult approach, by contrast, is faster and produces more high-profile publishable results.

Perversely, researchers with low scientific integrity pay little or no price for being predictably and frequently wrong. On the contrary, they receive disproportionately more research funds to continue their Cargo Cult ways because getting research funds depends much more on getting published than on getting it right.

Young scientists I talk with are enthusiastic about a vision for science that Feynman would have approved of. They are concerned, however, that having high scientific integrity will harm their careers. A way must be found to reward scientific integrity over the raw output of dubious science.

* The more things change, the more they stay the same.

Bem, D. J. (1987). Writing the Empirical Journal Article. In M. P. Zanna & J. M. Darley (Eds.), The compleat academic: A practical guide for the beginning social scientist (pp. 171-201). Random House.

Bem, D. J. (2011). Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect. Journal of Personality and Social Psychology, 100, 407-425.