Mind-reading salmon, the law of truly large numbers, and two recent Daubert rulings
Have you ever gotten an email saying that a large amount of money will be transferred to you if you first make some initial deposit? And if you immediately recognized that message as a scam, did you also wonder why people are still sending these emails?
With today’s technology, the cost of compiling even a large mailing list is small. The cost of sending out messages is smaller still. While the probability that any given recipient will send money to a stranger is almost nil, and the senders probably know that, they also know that the more people receive the email, the better the odds that someone will be tricked. That potential payoff likely explains why such mailings continue.
In his book The Improbability Principle, David Hand, Emeritus Professor of Mathematics and a Senior Research Investigator at Imperial College London, calls this phenomenon “the Law of Truly Large Numbers.” He defines the principle succinctly: “With a large enough number of opportunities, any outrageous thing is likely to happen.”
The principle is at work more generally. In a 2011 Scientific American article, “The Mind-Reading Salmon,” Professor Charles Seife of New York University illustrated this point with drug efficacy testing. This testing compares the effectiveness of a drug with that of a placebo. Professor Seife pointed out that with a sufficiently large number of comparisons, it is almost guaranteed that at least one comparison will appear to show that a drug is “effective” when, in fact, it is not. In the same article, Professor Seife also recounted a study that a team of neuroscientists once conducted on a salmon:
When they presented the fish with pictures of people expressing emotions, regions of the salmon’s brain lit up. … However, as the researchers argued, there are so many possible patterns that a statistically significant result was virtually guaranteed, so the result was totally worthless. … There was no way that the fish could have reacted to human emotions. The salmon in the fMRI happened to be dead.
If proper care is not taken in such circumstances, statistical analysis runs the risk of what is known as “data snooping.” Other colorful names for the concept include “data dredging” and “data torturing.” The term “data mining” is sometimes used as well, although in other fields it refers to something different.
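The phenomenon is easy to see in a short simulation. The sketch below (my own illustration, not an analysis from either case discussed here) treats each comparison as a test of a true null hypothesis, so its p-value is uniform on [0, 1]; the 0.05 significance level and the 100 comparisons are illustrative choices. It also shows the effect of a simple Bonferroni-style adjustment, one of the well-known corrections for multiple comparisons:

```python
import random

# Each "comparison" tests a true null hypothesis, so under the null its
# p-value is uniformly distributed on [0, 1].
random.seed(42)
ALPHA = 0.05          # nominal significance level (illustrative choice)
N_COMPARISONS = 100   # number of comparisons performed (illustrative choice)
N_TRIALS = 10_000     # number of simulated studies

def any_false_positive(threshold):
    """True if at least one of the null comparisons falls below the threshold."""
    return any(random.random() < threshold for _ in range(N_COMPARISONS))

# Unadjusted: test every comparison at the nominal 0.05 level.
naive_rate = sum(any_false_positive(ALPHA) for _ in range(N_TRIALS)) / N_TRIALS

# Bonferroni adjustment: divide the threshold by the number of comparisons.
bonferroni_rate = (
    sum(any_false_positive(ALPHA / N_COMPARISONS) for _ in range(N_TRIALS)) / N_TRIALS
)

print(f"Unadjusted chance of at least one 'significant' result:  {naive_rate:.3f}")
print(f"Bonferroni-adjusted chance:                              {bonferroni_rate:.3f}")
print(f"Analytic unadjusted probability: {1 - (1 - ALPHA) ** N_COMPARISONS:.3f}")
```

With 100 comparisons, the unadjusted chance of at least one spurious “significant” result is 1 − 0.95¹⁰⁰ ≈ 99.4%, even though every null hypothesis is true; the Bonferroni adjustment pulls that chance back to roughly the nominal 5% level.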
This topic has received attention in two recent Daubert rulings. In In re Processed Egg Products Antitrust Litigation, the plaintiffs’ expert used a regression model to relate prices to other factors. The defendants’ expert applied the same model to “just one certain defendant’s transactions” and argued that the model was unreliable because some aspects of the regression results had changed. The plaintiffs countered that the defendants’ results were “the product of inappropriate ‘data mining.’” Judge Gene E. K. Pratter (E.D. Pa.) found the plaintiffs’ argument intuitive and convincing for the purpose of assessing reliability under Daubert.
In Karlo v. Pittsburgh Glass Works, an age discrimination case, Judge Terrence F. McVerry (W.D. Pa.) found one expert’s analysis of disparate impact to be “improper” because it did not correct for “the likelihood of a false indication of statistical significance.” He added that it was “data-snooping, plain and simple.”
On the basis of public information alone, it is impossible to comment on the merits of the arguments in these cases. But it is encouraging that important yet sometimes subtle statistical issues are being discussed in the courts. In particular, given the two rulings cited here, I would not be surprised if this topic comes up more often and in other types of litigation.
Finally, I should mention that there is also a fun side to the Law of Truly Large Numbers. Just think of the Twin Strangers project, which tries to bring together total strangers who look like identical twins.
[Disclaimer: Bates White was not and is not, at the time of writing, involved in either case cited here.]
Technically, the false inference of “effectiveness” mentioned here arises when the inference is based on an unadjusted statistical procedure. Adjustments and corrections that account for this phenomenon are well known.
 While these terms may have a negative connotation, the concept itself is just a statistical phenomenon.