Nature’s call for poorer statistical standards – An exercise on deconstructing the industrial ideology of science

“Armiamoci e partite!” is a proverbial Italian phrase that can be roughly translated as “Let’s get armed, and then you go!”. It can be found in a poem by Olindo Guerrini:

«Tell us, why do you say “Let’s go!”
and then stay at home?
Why, far away from the blows and the conflicts,
comfortably suffer from putting on weight
while inciting the poor recruits
“Let’s get armed, and then you leave!”»

The “Prince of Laughter” Antonio de Curtis a.k.a. Totò added to this:

«I will follow you later.»

That’s basically Nature’s attitude in its postmodern call for a change in the use of statistics in its journal:

Nature 567, 305, 21 March 2019

Here, completely sound and agreeable points are glued together into an overall irresponsible and self-interested position, so we have to do a bit of work to disentangle the individual agreeable facts from the ideology that hides behind them. The devil is in the details, and we will go after them.

Before digging into it, for all those interested in how scientists misuse statistics, let me suggest the great book by Alex Reinhart, Statistics Done Wrong, which is also available online, and from which I learnt most (but not all) of this subject matter.

“When was the last time you heard a seminar speaker claim there was ‘no difference’ between two groups because the difference was ‘statistically non-significant’? If your experience matches ours, there’s a good chance that this happened at the last talk you attended.”

Nice rhetorical strategy to establish a connection with the reader. However, the two experiences could not possibly be more different: the perception of a scientist who is evaluating the internal mechanisms of the work of a peer from his own or a nearby community, and that of an editor who has no specific understanding of the subject matter, and who attends conferences to capture a second-degree “overall message” and to impose his own “vision of the future” on scientists, are two completely different forms of communication. Overemphasis on derivative forms of knowledge may hinder true understanding, and the external pressure exerted by publishers on communities of scholars may bias and derail research.

“We hope that at least someone in the audience was perplexed if, as frequently happens, a plot or table showed that there actually was a difference.”

Yeah, so why do we do science at all, if things “actually” are one way or another to the naked eye? What does “actually” mean to Nature’s editors? Isn’t the whole purpose of science to find rigorous methods (among which, statistics) that allow us to “see the invisible”, and at the same time to avoid seeing what is not there? Well, it seems Nature’s editors have better criteria to propose, a new form of postmodern scientific method…

“How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see?”

Seriously? Are you kidding? Centuries after Galileo turned the telescope to the Moon and the whole question of “what it means to see” kicked in, giving rise to epistemology as a discipline and shipping us into modernity (for good and for bad), a journal that has the ambition to call itself “Nature” now reveals to us that all we need to do is take a quick look at things and they will show themselves for what they are. In an incredible twist, it is the prejudice of the old boring scientific method (which uses statistics as one of its tools) that leads us astray. We should become children again, take away those glasses that make us blind, and see things for what they are, plain and clear!

Wow you really didn’t expect this from the leading scientific journal, did you?

The epistemological slovenliness of these few lines is disconcerting.

“For several generations, researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome). Nor do statistically significant results ‘prove’ some other hypothesis.”

Definitely so, let’s keep teaching this for several more generations.

“Such misconceptions have famously warped the literature with overstated claims and, less famously, led to claims of conflicts between studies where none exists.”

So here is Nature’s main point: let’s not deal with the huge and overwhelming amount of bad literature generated by misusing statistics in an audacious and non-conservative way, giving rise to false positives, truth inflation, and overstated claims (the kind of literature that has been shown to be selected for in top journals, including Nature, which are known to have a huge problem with scientific reproducibility). No: let’s cherry-pick a few cases where an over-conservative and shallow use of statistics has hindered “truth” (whatever that is…).

After all, for a publishing company that makes a profit by selling academics the results of their own work with little to no editorial work (well, at least Nature re-draws the pictures, and they are damn nice!), it is not surprising that Nature’s interest in the whole P-value discussion is to turn it into an opportunity to loosen things up and deregulate scientific publishing a little bit further. It is this subtle logical twist that may have passed unnoticed by the 800 signatories of Nature’s appeal (see below), and that I want to make evident.

“We have some proposals to keep scientists from falling prey to these misconceptions.”

Good! So let’s see what these proposals are.

* * *

“Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P-value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not.”

None of the old boring statisticians with glasses ever taught anything like that.

(A little reminder: the P-value is the probability of observing an effect at least as extreme as the measured one, assuming that a “null hypothesis” holds, the null hypothesis being the most reasonable neutral scenario. For example, if we test whether a coin is biased, it is reasonable to assume 50%-50% as the null hypothesis; if we study the effect of a drug, we might compare it to the placebo effect, or to the best drug already available on the market. Building the null hypothesis is not easy, and it involves a lot of subjective trimmings that will hide in the statistical analysis. If the P-value falls below a conventional value (say, 5%), the study is called “significant”. This 5% can be interpreted as the rate of false positives tolerated by whoever wishes to publish results (notice that deciding whether 5% makes sense is editorial work: here Nature is re-positioning itself, which is perfectly legitimate, if only they could take some blame and explain why they have to refocus!). It would be best if this were the maximal rate of false positives tolerated; that is, if good practices beyond reporting the P-value could keep the rate of false positives well below this threshold. Unfortunately, bad practices make 5% the minimal rate of false positives. Cautious estimates show that not publishing negative results, and other systematic biases due to the pressure to publish, may boost this value until an actual 50% of published “positive” findings are false positives. So, for example, is Nature willing to publish negative results to create more reliable science?)
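To make this definition concrete, here is a minimal sketch of the coin example in plain Python (the numbers are illustrative, not from any real study): the two-sided P-value sums the probabilities, under the fair-coin null hypothesis, of all outcomes at least as unlikely as the one observed.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """Sum the probabilities of all outcomes as rare or rarer than k
    under the null hypothesis p = p0."""
    observed = binom_pmf(k, n, p0)
    return sum(binom_pmf(i, n, p0) for i in range(n + 1)
               if binom_pmf(i, n, p0) <= observed + 1e-12)

# 60 heads in 100 flips, null hypothesis = fair coin:
print(f"P = {two_sided_p_value(60, 100):.4f}")  # ~0.057
```

Note that 60 heads out of 100 gives P just above 0.05: not “significant” by the usual convention, yet hardly proof that the coin is fair.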

But if two studies on the same subject come out one significant and one not, and they only report the P-value, there is not much you can do about it. The only message you can draw is that that scientific community should come up with a better common strategy to plan their trials or experiments and give more convincing results on that same subject matter. But there is no way one can re-use those data: that is called “double-dipping”, and it is a source of systematic error (the whole exploding literature of systematic reviews is affected by this problem, therefore including this very article of Nature’s, where a ludicrous “systematic analysis” consisting of only two articles is conducted…).

“These errors waste research efforts and misinform policy decisions.”

My opinion: what wastes research effort is the pressure to publish fast, more, and in higher-impact journals, in an environment of perpetual competition between groups that are not incentivized to collaborate toward one common goal, share data, plan strategies, and produce “powerful” experiments, but rather are incentivized to atomize research into tiny fractions of under-powered experiments and trials that have insufficient sample sizes and are doomed to generate insignificant results, because each individual group, and each individual in the group, has to publish his own thing to constantly update his CV, in a system of career evaluation where Nature and other publishers retain a monopoly over such a delicate thing as “scientific reputation”.

Does Nature have anything to say about this?

“For example, consider a series of analyses of unintended effects of anti-inflammatory drugs. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome. Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).”

Here Nature adopts the strategy of going technical, so as to make it difficult to counter the arguments and to distract from the systematic error they commit, which is cherry-picking their favourite case study of a “true positive” that has been made into a “false negative”. I don’t think cherry-picking visible cases does any good to the cause of better use of statistics: maybe we should be more interested in the overwhelming number of hidden cases of “true negatives” being made into “false positives” by the pressure to publish.

I cannot really evaluate the case under examination, and that is not the point. But I am dazzled by the P-value P = 0.0003 (which is remarkably low for an epidemiological study). Is it plausible given the sample size of 32 602 and the hypothesis being tested? I don’t know: the statistical analysis in the original paper is complex and intertwined with clinical considerations. The danger of overly advanced statistics is that errors may hide here and there, for example in the hundreds of trimmings of the null hypothesis, and that one could just resort to whichever statistical tool one favours to turn things the way one wants. In any case the readership will trust the analysis, because real statisticians rarely check on the use of statistics in scientific publishing. And why should they? They are too busy creating even more advanced tools…
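One thing we can at least do is check the arithmetic behind Nature’s “our calculation”. For a risk ratio with a 95% confidence interval, the standard error on the log scale can be recovered from the interval’s width, and a two-sided P-value follows from a normal (Wald) approximation. This is a sketch of that standard back-calculation, not a re-analysis of the original papers:

```python
from math import log, sqrt, erfc

def p_from_rr_ci(rr: float, lo: float, hi: float) -> float:
    """Recover an approximate two-sided P-value from a risk ratio and
    its 95% CI, assuming the interval is symmetric on the log scale."""
    se = (log(hi) - log(lo)) / (2 * 1.96)  # CI width = 2 * 1.96 * SE
    z = log(rr) / se                       # Wald z statistic
    return erfc(z / sqrt(2))               # two-sided normal tail

print(p_from_rr_ci(1.2, 0.97, 1.48))  # ~0.09, matching the quoted P = 0.091
print(p_from_rr_ci(1.2, 1.09, 1.33))  # ~0.0003, matching P = 0.0003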

So, establishing a serious pipeline of statistical peer review would definitely be up to the editor, if he cares about his market share. Here is a constructive proposal to Nature (so that I won’t be dismissed as just a troublemaker): hire statisticians to create a third-party pipeline that systematically and double-blindly reviews submitted papers for consistency with their claimed P-values (and design power). If we have to pay to buy back our own research, let’s get some added value at least (apart from the nice figures)!

In the absence of such a serious chain of statistical checking, the P-value is the simplest thing anybody can understand and evaluate (if statistical training is so poor that even that is not understood, how do you expect reporting confidence intervals, where even possible, might help?).

But of course the P-value is not everything. Coming back to the cherry-picked controversy above, one way towards a third-party judgement would be to use the second-simplest statistical tool. The power (see Chapter 2 of Statistics Done Wrong) is a tool for meta-analysis that is hardly ever reported. It is the probability that, assuming the effect is real and of a given size, the experiment reaches a given P-value threshold. While it is a little tricky to calculate, it is not crazy difficult if one puts one’s mind to it.

This probability basically depends on the hypothesis being tested and on the sample size. So, if for example the probability of reaching P = 0.0003 comes out at 99%, you can pat the authors on the back and say: this was a well-designed experiment, congratulations! If it’s 50%, you’ll say: damn, you got lucky! And if it’s 1%, well, you know there is something fishy going on. Maybe they ran 100 such experiments and only reported the one that came out significant?! Nobody knows, because Nature & Friends certainly did not publish the other 99 papers.
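As a sketch of what such a power calculation looks like (under a simple two-sided z-test and the normal approximation; the effect and standard errors below are illustrative values read off the two studies’ intervals, not a re-analysis of their data):

```python
from statistics import NormalDist

N = NormalDist()  # standard normal distribution

def power_z_test(effect: float, se: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test: the probability that the test
    statistic crosses the critical value when the true effect is `effect`."""
    z_crit = N.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    shift = effect / se                # true standardized effect
    return (1 - N.cdf(z_crit - shift)) + N.cdf(-z_crit - shift)

# Assumed true log-risk-ratio ~0.182 (RR = 1.2), with standard errors
# recoverable from the two studies' confidence intervals:
print(power_z_test(0.182, 0.108))  # the "non-significant" study: ~0.39
print(power_z_test(0.182, 0.051))  # the more precise study: ~0.95
```

If these illustrative numbers were right, the first study had well under 50% power to detect the very effect it observed: which is exactly the kind of structural information a P-value alone does not convey.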

Thus knowledge of the power would probably resolve the controversy over these two papers, as (assuming everything was done properly) the power calculation is an indicator of the quality of the experiment. If you don’t know the power or other similar measures of quality, you cannot make any such claim. So, going back two lines, notice that Nature makes an assessment of the quality of the study based on the P-value, and not on any structural property that qualifies and quantifies the study design:

“That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).”

You cannot claim that the study was more precise because it was more significant! That could just be due to luck, and the whole point of science is that we want to gauge out luck! Unless we become children again and “take away the glasses”…

But of course Nature’s columnists cannot be required to have proper statistical training. If hiring real statisticians were “impractical” for Nature, a first good strategy on their side would be to make it mandatory that the power of the study be calculated and reported along with the P-value. This would also force people to think a bit more consciously about what a P-value is at all, and at the same time would create a higher yet simple universal standard, inducing people to design their experiments better, maybe collaborate to reach reasonable sample sizes, and decisively reducing the number of publishable papers by a considerable amount, etc.

But this is not among Nature’s proposals. They go for something more postmodern.

“It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us (see ‘Beware false conclusions’). These and similar errors are widespread. Surveys of hundreds of articles have found that statistically non-significant results are interpreted as indicating ‘no difference’ or ‘no effect’ in around half (see ‘Wrong interpretations’ and Supplementary Information).”

This is definitely a mistake: finding no significance does not imply that the hypothesis being tested is false.

But is this the real problem, after all? We have so many hypotheses to test: modern sampling tools allow entire fields to generate essentially random hypotheses by the millions (genetic correlations? neural patterns? metabolic pathways? correlations between dietary patterns? you name it…). If we don’t impose stricter standards and provide tools to thin out what is plausible, and to disqualify useless hypotheses somehow, if we keep everything always open, even those results that have been deemed insignificant by very generous standards, how are we ever to make progress?

Even high-energy physics had this problem back in the ’70s: they were generating way too many hypotheses of new particles and interactions for the then-standard 3-sigma threshold (a P-value one order of magnitude higher than the P = 0.0003 reported above…), and the new machines were “observing” lots of them simply because there were so many. What they did was not to abolish the P-value and the notion of statistical significance altogether, but to make them much stricter, establishing the present 5-sigma standard.
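For reference, the conversion between “sigmas” and P-values is a one-liner (two-sided convention here; particle physicists usually quote one-sided tails, which halves these numbers):

```python
from statistics import NormalDist

def sigma_to_p(sigma: float) -> float:
    """Two-sided tail probability of a fluctuation of at least `sigma`
    standard deviations under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(sigma))

print(f"3 sigma: P = {sigma_to_p(3):.1e}")  # ~2.7e-03
print(f"5 sigma: P = {sigma_to_p(5):.1e}")  # ~5.7e-07
```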

“In 2016, the American Statistical Association released a statement in The American Statistician warning against the misuse of statistical significance and P values. The issue also included many commentaries on the subject. This month, a special issue in the same journal attempts to push these reforms further. It presents more than 40 papers on ‘Statistical inference in the 21st century: a world beyond P < 0.05’. The editors introduce the collection with the caution “don’t say ‘statistically significant’”.

Yep, they write “don’t say ‘statistically significant’”. They don’t write “don’t say ‘statistically insignificant’”. Turning this sentence the other way around is an obvious example of the dishonest twist Nature gives to this whole issue.

“Another article with dozens of signatories also calls on authors and journal editors to disavow those terms.”

The first postmodern proposal of Nature is to rename things. This is often the case in our society, where complex social problems are dealt with by creating a linguistic taboo and changing words, not by establishing better practices. “Change everything so that nothing changes”, as another famous Italian sentence, from Il Gattopardo, goes.

“We agree, and call for the entire concept of statistical significance to be abandoned. We are far from alone. When we invited others to read a draft of this comment and sign their names if they concurred with our message, 250 did so within the first 24 hours. A week later, we had more than 800 signatories — all checked for an academic affiliation or other indication of present or past work in a field that depends on statistical modelling (see the list and final count of signatories in the Supplementary Information). These include statisticians, clinical and medical researchers, biologists and psychologists from more than 50 countries and across all continents except Antarctica. One advocate called it a “surgical strike against thoughtless testing of statistical significance” and “an opportunity to register your voice in favor of better scientific practices”.”

Of course the bad use of statistics is commonly agreed to be a huge problem, and impulsively I too would sign a petition to make things better (and to end war, poverty, carbon emissions, etc.). But I wonder whether all the signatories took time to reflect on the particular twist Nature gave to this issue.

“We are not calling for a ban on P-values. Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating for an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis.”

“We agree, and call for the entire concept of statistical significance to be abandoned”; “We are not calling for a ban on P values”. Wow, those are almost symptoms of bipolar disorder! Jokes aside, what exactly is Nature proposing? What is Nature going to do? Do something!

* * *

“The trouble is human and cognitive more than it is statistical: bucketing results into ‘statistically significant’ and ‘statistically non-significant’ makes people think that the items assigned in that way are categorically different6–8. The same problems are likely to arise under any proposed statistical alternative that involves dichotomization, whether frequentist, Bayesian or otherwise.”

Yep, humans do tend to categorize too much, especially when all social pressures push them to categorize: quantifying the impact factor of the journal they publish in, optimizing the H-index, etc. Proposal to Nature: quit your own obsession with maximizing the impact factor: become a child again! Be the first to set a good example.

“Unfortunately, the false belief that crossing the threshold of statistical significance is enough to show that a result is ‘real’ has led scientists and journal editors to privilege such results, thereby distorting the literature.”

Indeed. So, again, what is Nature going to do about it?

“Statistically significant estimates are biased upwards in magnitude and potentially to a large degree, whereas statistically non-significant estimates are biased downwards in magnitude. Consequently, any discussion that focuses on estimates chosen for their significance will be biased.”

This is a very subtle example of the logical mistake of mixing up correlation and causation. Here Nature suggests that the bias is intrinsic to the tool (the P-value), which somehow on its own pushes for more false positives and fewer true negatives. But the tool is obviously neutral (I have never seen P-values threatening PhD students). Its misuse is due to social pressures that may bias the very way we conduct scientific discussion. Using the same logic I could just rewrite the last sentence as: “Consequently, any discussion that focuses on estimates chosen for publication in Nature will be biased”. But while I do believe that Nature is part of the problem, it certainly is not all of the problem.

“On top of this, the rigid focus on statistical significance encourages researchers to choose data and methods that yield statistical significance for some desired (or simply publishable) result, or that yield statistical non-significance for an undesired result, such as potential side effects of drugs — thereby invalidating conclusions.”

Indeed: over-analysis, and choosing the statistical test at will, would create even worse trouble. That is why the P-value was agreed upon as a universal tool, simple enough for everyone. What higher standards of acceptance is Nature going to impose?

“The pre-registration of studies and a commitment to publish all results of all analyses can do much to mitigate these issues. However, even results from pre-registered studies can be biased by decisions invariably left open in the analysis plan9. This occurs even with the best of intentions.”

Pre-registration is a great tool, though not a perfect one. Where are Nature’s pre-registration protocol and requirements? Here is what we find: “Authors who wish to publish their work with us have the option of a registered report.” It’s an option, not a requirement, and there is no real incentive. It’s just a possible opportunity, like so-called ‘open access’ and all those tools of empowerment that the industry is more than willing to have a share of.

“Again, we are not advocating a ban on P-values, confidence intervals or other statistical measures — only that we should not treat them categorically. This includes dichotomization as statistically significant or not, as well as categorization based on other statistical measures such as Bayes factors. One reason to avoid such ‘dichotomania’ is that all statistics, including P values and confidence intervals, naturally vary from study to study, and often do so to a surprising degree. In fact, random variation alone can easily lead to large disparities in P values, far beyond falling just to either side of the 0.05 threshold. For example, even if researchers could conduct two perfect replication studies of some genuine effect, each with 80% power (chance) of achieving P < 0.05, it would not be very surprising for one to obtain P < 0.01 and the other P > 0.30. Whether a P value is small or large, caution is warranted.”

Again some of this completely random pseudo-technical discourse that is all smoke in the eyes. Yet no proposal, apart from the very vague “we should not treat them categorically”.
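For the record, Nature’s claim about the variability of P-values is at least easy to check with a quick simulation. Here is a sketch under a simple normal z-test model, with an assumed true effect tuned to give 80% power (their exact setup is not specified):

```python
import random
from statistics import NormalDist

N = NormalDist()
random.seed(1)

# For a two-sided z-test at alpha = 0.05, 80% power corresponds to a
# true standardized effect of about 1.96 + 0.84 = 2.80 standard errors.
true_shift = 2.80
reps = 100_000

p_values = []
for _ in range(reps):
    z = random.gauss(true_shift, 1)           # one replication's z statistic
    p_values.append(2 * (1 - N.cdf(abs(z))))  # its two-sided P-value

below_001 = sum(p < 0.01 for p in p_values) / reps
above_030 = sum(p > 0.30 for p in p_values) / reps
print(f"P < 0.01 in {below_001:.0%} of replications")  # roughly 59%
print(f"P > 0.30 in {above_030:.0%} of replications")  # roughly 4%
```

So yes, P-values fluctuate from study to study; which is precisely an argument for reporting the power of the design, not for abolishing thresholds.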

“We must learn to embrace uncertainty.”

How nice! But I would rather rephrase this as: “We will support uncertainty (= freedom) in all the preparatory phases of the experiment, and we will enforce higher certainty standards in the communication of results”. We should not allow publishers to interfere with the creation of scientific hypotheses!

“One practical way to do so is to rename confidence intervals as ‘compatibility intervals’ and interpret them in a way that avoids overconfidence.”

Again with this renaming business. I can hardly confuse “confidence” with “certainty”, and actually “compatibility” sounds to me like a much more dichotomous word than “confidence”, which was more human and humble. But maybe it’s just me.

“Specifically, we recommend that authors describe the practical implications of all values inside the interval, especially the observed effect (or point estimate) and the limits. In doing so, they should remember that all the values between the interval’s limits are reasonably compatible with the data, given the statistical assumptions used to compute the interval. Therefore, singling out one particular value (such as the null value) in the interval as ‘shown’ makes no sense.”

We are heading towards the grand finale. This is Nature’s new protocol: to encourage authors to write even more words so they can bullshit* their way through publication in a more postmodern way.

“We’re frankly sick of seeing such nonsensical ‘proofs of the null’ and claims of non-association in presentations, research articles, reviews and instructional materials.”

Hear the voice of the master! All of a sudden the tone of the article goes from friendly to patronizing. Sounds like: You are working for us, you are doing a bad job, and we are sick and tired of it: behave! Zero self-criticism.

“An interval that contains the null value will often also contain non-null values of high practical importance. That said, if you deem all of the values inside the interval to be practically unimportant, you might then be able to say something like ‘our results are most compatible with no important effect’.”

What is “high practical importance” if we cannot discern it with the tools of the scientific method? Again, by naked eye? What does it mean to “deem”? Isn’t the whole scientific process an attempt to get rid of personal, subjective opinions?

Furthermore, notice that here Nature is asking you not to downplay values that are neither statistically significant nor, in your opinion, relevant: you just have to apply some cosmetics to the words here and there. They really don’t want to renounce any asset.

“When talking about compatibility intervals, bear in mind four things. First, just because the interval gives the values most compatible with the data, given the assumptions, it doesn’t mean values outside it are incompatible; they are just less compatible. In fact, values just outside the interval do not differ substantively from those just inside the interval. It is thus wrong to claim that an interval shows all possible values.”

On the same page as above.

“Second, not all values inside are equally compatible with the data, given the assumptions. The point estimate is the most compatible, and values near it are more compatible than those near the limits. This is why we urge authors to discuss the point estimate, even when they have a large P value or a wide interval, as well as discussing the limits of that interval.”

Again, bullshit* your way through and we’ll give you a pass.

“For example, the authors above could have written: ‘Like a previous study, our results suggest a 20% increase in risk of new-onset atrial fibrillation in patients given the anti-inflammatory drugs. Nonetheless, a risk difference ranging from a 3% decrease, a small negative association, to a 48% increase, a substantial positive association, is also reasonably compatible with our data, given our assumptions.’ Interpreting the point estimate, while acknowledging its uncertainty, will keep you from making false declarations of ‘no difference’, and from making overconfident claims.”

Bla bla bla. Again, rephrasing. How is this going to help make a point?

“Third, like the 0.05 threshold from which it came, the default 95% used to compute intervals is itself an arbitrary convention. It is based on the false idea that there is a 95% chance that the computed interval itself contains the true value, coupled with the vague feeling that this is a basis for a confident decision. A different level can be justified, depending on the application. And, as in the anti-inflammatory-drugs example, interval estimates can perpetuate the problems of statistical significance when the dichotomization they impose is treated as a scientific standard.”

See how good Nature is at riding the wave of statistical unrest: yes, 95% is not the probability that the interval contains the *true* value (whatever that is). So what is it according to Nature, and how is Nature going to turn this fundamental, defining property of confidence intervals into actual policy? No clue is given.
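For the record, the standard textbook answer that Nature does not give can be sketched in a few lines: the 95% is a long-run coverage property. Over many repetitions of the same experiment, about 95% of the computed intervals contain the true value; it is not the probability that any one particular interval does. A minimal simulation, with illustrative numbers (known sigma, a z-based interval for a mean):

```python
import random
from statistics import NormalDist

random.seed(7)
z = NormalDist().inv_cdf(0.975)  # ~1.96
true_mean, sigma, n, reps = 10.0, 2.0, 25, 100_000

covered = 0
for _ in range(reps):
    # Sample mean of n draws from N(true_mean, sigma^2):
    sample_mean = random.gauss(true_mean, sigma / n**0.5)
    half_width = z * sigma / n**0.5
    if sample_mean - half_width <= true_mean <= sample_mean + half_width:
        covered += 1

print(f"coverage: {covered / reps:.3f}")  # ~0.950 over many repetitions
```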

“Last, and most important of all, be humble:”


“compatibility assessments hinge on the correctness of the statistical assumptions used to compute the interval. In practice, these assumptions are at best subject to considerable uncertainty. Make these assumptions as clear as possible and test the ones you can, for example by plotting your data and by fitting alternative models, and then reporting all results. Whatever the statistics show, it is fine to suggest reasons for your results, but discuss a range of potential explanations, not just favoured ones. Inferences should be scientific, and that goes far beyond the merely statistical. Factors such as background evidence, study design, data quality and understanding of underlying mechanisms are often more important than statistical measures such as P values or intervals.”

Of course, in principle this is all right. But is it doable in practice if Nature and other journals do not raise their own standards? Abandoning the P-value in favour of more refined statistics, an emphasis on discursive analysis, and so on, may have a terrible impact on that unregulated shithole that is the marketplace of scientific ideas. We don’t even have statisticians checking the credibility of simple statistical tests, so how will we be better placed when even more discursive blabla kicks in?

“The objection we hear most against retiring statistical significance is that it is needed to make yes-or-no decisions. But for the choices often required in regulatory, policy and business environments, decisions based on the costs, benefits and likelihoods of all potential consequences always beat those made based solely on statistical significance.”

Yet another level of postmodernism. The question of the regulatory, policy, and business use of scientific knowledge should be completely external to the mechanisms of the production of scientific knowledge itself. Keep this whole business away, don’t even weigh it in this argument, or science is doomed to become an industrialized product (as is nearly every aspect of our life…). And my god, do policy-makers need some categorical yes/no inputs! If scientific facts too become liquid, these guys will be allowed to say anything they want, whenever they want.

“Moreover, for decisions about whether to pursue a research idea further, there is no simple connection between a P value and the probable results of subsequent studies.”

But this is not an editor’s problem! The discursive flexibility in shaping one’s own research hypotheses, and in trusting one’s own intuition, is definitely the most fun part of science: the long and artisanal way of constructing and designing an experiment. But editors: keep away from it, it’s none of your business! Early preparations should not be published. Only the final, less poetic study results should be published, in the most dichotomous way possible.

A wonderful example of this is given by the physicists in the gravitational-wave community, who recently earned a Nobel Prize. When they receive a signal, before opening it they go through a highly creative and flexible “playtime” period in which they decide how to analyze the signal. Once this is agreed upon, they first write the full paper, with all the details of the analysis in place for a dichotomous message. Only then do they open the signal box. No flexibility whatsoever is allowed after that moment, and it all comes down to the very dichotomous: yes we have seen it / no we have not (and in fact they do not publish in Nature).
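The “freeze before unblinding” discipline can be caricatured in a few lines of code: publish a fingerprint of the analysis pipeline before looking at the data, so that any post-hoc flexibility becomes detectable. This is a toy sketch of the idea; the function names and the string-based “pipeline” are mine, not any collaboration’s actual tooling:

```python
import hashlib

def commit_analysis(pipeline_source: str) -> str:
    """Fingerprint the frozen analysis before 'opening the box'.
    Publishing this hash first makes any later change detectable."""
    return hashlib.sha256(pipeline_source.encode("utf-8")).hexdigest()

def verify_analysis(pipeline_source: str, published_hash: str) -> bool:
    """Check, after unblinding, that the pipeline was not touched."""
    return commit_analysis(pipeline_source) == published_hash

frozen = "threshold = 5 * sigma; report detection if statistic > threshold"
receipt = commit_analysis(frozen)
print(verify_analysis(frozen, receipt))                       # True
print(verify_analysis(frozen + " # tweaked later", receipt))  # False
```

The design choice mirrors the blind-analysis protocol: all the creative freedom lives before the commitment, and none after it.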

The problem with pressure-to-publish is that it does not allow people to clearly separate these two phases, and it forces people to publish even speculative attempts that should remain in the “playtime” area of their activities and are not yet real science. Rushing results disrupts the possibility of more robust results later on, because of the “double-dipping” problem hinted at above. Unless one just hides what one is doing…

“What will retiring statistical significance look like? We hope that methods sections and data tabulation will be more detailed and nuanced. Authors will emphasize their estimates and the uncertainty in them — for example, by explicitly discussing the lower and upper limits of their intervals.”

“We hope”… Armiamoci e partite!

“They will not rely on significance tests. When P values are reported, they will be given with sensible precision (for example, P = 0.021 or P = 0.13)”

For what purpose? To give a semblance of rigor? And why do we require the second decimal digit, when we could choose other systems of representation for numbers? In base ten, the second decimal digit can only be resolved if more statistics are collected. Is Nature going to enforce standards (e.g. on the power of studies) by which the second decimal digit in base ten is actually reached?
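The point about the second decimal digit can be made quantitative in at least one simple setting: if a tail probability p is estimated empirically over N replications, its standard error is sqrt(p(1-p)/N), so resolving one further decimal digit multiplies the required sample size by a hundred. A back-of-the-envelope sketch (the half-unit-in-the-last-place criterion is my own choice of convention):

```python
import math

def samples_for_p_digit(p: float, digits: int) -> int:
    """Samples needed so that the standard error sqrt(p(1-p)/N) of an
    empirically estimated tail probability p resolves the given decimal
    digit (standard error <= half a unit in that decimal place)."""
    target_se = 0.5 * 10.0 ** (-digits)
    return math.ceil(p * (1.0 - p) / target_se ** 2)

for d in (1, 2, 3):
    print(d, samples_for_p_digit(0.13, d))
# Each extra decimal digit costs a factor of 100 in sample size.
```

So quoting “P = 0.13” to two decimals is only meaningful if the study is powered accordingly, which is exactly the standard the editorial does not commit to enforcing.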

“— without adornments such as stars or letters to denote statistical significance and not as binary inequalities (P  < 0.05 or P > 0.05). Decisions to interpret or to publish results will not be based on statistical thresholds. People will spend less time with statistical software, and more time thinking.”

No: they will spend less time trying to make their software spit out some number or other, and more time trying to make their prose spit out some rhetorical figure or other.

“Our call to retire statistical significance and to use confidence intervals as compatibility intervals is not a panacea.”

Definitely not.

“Although it will eliminate many bad practices, it could well introduce new ones. Thus, monitoring the literature for statistical abuses should be an ongoing priority for the scientific community.”

Again: Armiamoci e partite!

“But eradicating categorization will help to halt overconfident claims, unwarranted declarations of ‘no difference’ and absurd statements about ‘replication failure’ when the results from the original and replication studies are highly compatible. The misuse of statistical significance has done much harm to the scientific community and those who rely on scientific advice. P values, intervals and other statistical measures all have their place, but it’s time for statistical significance to go.”

So, to conclude, Nature will not do anything at all: it will not impose higher standards, will not coordinate with other public institutions to make pre-registration of clinical trials mandatory, will not hire statisticians to do what an editor is supposed to do – editorial work. Because of course they don’t want to change even a comma of their business model. Nature’s only real concern is to appeal to the audience who had their false negatives rejected or contested – which is a noble intent, I admit – but Nature has hardly anything to say about the monstrous problem of false positives, which infest its own journals. And the way they intend to attack this is by asking people to rename things and renounce quantitative measures in favour of qualitative discourse, thus buying out the internal narrations of the preparatory phases of scientific work while not setting any quantitative standard to evaluate those results. No shadow of self-criticism whatsoever: the call is on the scientific community – which this journal serves so well.

* Bullshit is an officially scientific word. In physics corridors, and in all those informal situations at conferences and workshops where the friendly editor’s ear is not allowed, Nature has a reputation of being a receptacle for boastful claims and self-referential communities. People writing papers for Nature focus much more on story-telling than on the actual message; they are obsessed with getting nice pictures and making connections to fads and fashionable topics; it is crucial to be able to write proper English and to have an elegant exposition – which creates an obvious bias against researchers from non-English-speaking countries. But this is my perception, and it’s not scientific.

Barber paradox in the age of “excellence”

“A measure of the flexibility of excellence is that it allows the inclusion of reputation as one category among others in a ranking which is in fact definitive of reputation. The metalepsis that allows reputation to be 20 percent of itself is permitted by the intense flexibility of excellence; it allows a category mistake to masquerade as scientific objectivity.”

Bill Readings, The University in Ruins.

A visit to the Rocchetta Mattei

I visited the Rocchetta Mattei, a castle in fake Moorish style built starting in 1850 by a character rightly forgotten by history. A guy who had far more power and far more money than he deserved. The character in himself would be interesting: inventor of one of the first mass pseudosciences, healer of the VIPs of his time. The archive of his correspondence would have made an interesting object of study, but it has been lost, and humanity will suffer immensely for it…

A very willing volunteer introduced us, together with a group of visitors, to the mysteries of the castle. He posed as erudite and told us a lot of fantastic things, in the literal sense of the word.

Starting from good old common sense: never forget grandpa’s lesson, who warned us to speak only of what we know, and otherwise to keep quiet.

Then on to numerology: 1, the number of the union of religions; 2, the pairing white/black, good/evil, high/low, man/god; 3, the trinity; 4, the cardinal points; 5, the pentagon; 6, the hexagon, symbol of balance; 8, the octagon, which is octagonal; not to mention 7, 11, 216, etc. etc. And incredible! Every single detail in the construction of the Rocchetta is made up ONLY of these numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9… Isn’t that incredible?!?!!

Then we moved on to geography: amazing, if you draw a straight line from any point of the Rocchetta to any other, it intercepts another point on the world map! Sometimes it’s Mecca, sometimes the pyramid of Giza, Jerusalem, Rome, Byzantium, etc. And the straight line need not even be straight: for example, there would seem to be a line connecting certain temples dedicated to Saint Michael, passing through Turin (some of them built before the invention of the Mercator projection…). Coincidence? Who knows. But what does Bologna have to do with it, given that the line continues into Tuscany? Still, with little effort it can be made to pass through Vergato; all it takes is some cosmic energy to bend space-time right under the Rocchetta Mattei…

History: Paolo Costa would have been expelled from the University of Bologna for being a hermetic philosopher, not for being a Carbonaro. There would be documents testifying that Leonardo, Matilde di Canossa, etc. counted themselves among the so-called “illuminati”. Etc.

Cooking: risotto made inside a pyramid keeps better.

Physics: quantum mechanics, the small and the large. The electromagnetic fields that would animate Mattei’s little study owing to the seismic activity underneath, equivalents of which can be found only at San Michele in Turin…

Mathematics: suddenly the golden ratio is a repeating decimal.

We could go on with anthropology, zoology, biology, etc. etc. Amazing how many things Mattei was capable of, this genius (so defined by the guide, showing no critical detachment whatsoever, and endorsed by the various “publications” available at the bookshop).

Let us conclude, though, with the cherry on top. Medicine: Mattei the universal healer, whose medical secret was lost through the obtuseness of his heirs. Here a historian of medicine should really poke their nose in, because this phenomenon has happened, and still happens. Mattei as the first manifestation of ills that, in far subtler forms, afflict our society. Mattei as cancer, not as healer.

Anyway, never in my life have I heard so much bullshit all at once. Far more probable is the thesis I hear from various not-stupid friends: Mattei was a rich megalomaniac who built a brothel for the ultra-rich in the middle of the Apennines, putting up anonymous buildings in the surroundings and collecting the sexual services of the peasant women of the whole area.

There remains, however, the mystery of the fascination that his “cure” exerted on the most powerful people of his time, so much so that Mattei is even mentioned in The Brothers Karamazov. This leads to two alternative interpretations of the phenomenon: 1) the manias and drives of the richest and most powerful stratum of society are symptomatic of the world (psychoanalysis, for instance, devoted itself to this thesis); 2) the rich and powerful are as cretinous and ignorant as the common folk, but they have the money to erect monuments like the Rocchetta Mattei, a temporary gathering place of worldwide arrogance, and otherwise have no life of their own (indeed, the restoration of the Rocchetta is financed by another such organism).

According to my guide, the Nazis supposedly looted the Rocchetta. Well, they would have done better to raze it to the ground.

PhD course on large deviations, spectral methods, and thermodynamics

I gave a 4h PhD webinar course covering probabilities, large deviations, spectral methods and the thermodynamics of stochastic systems for the online school on Complex Systems and Spectral Methods, organized by my friend Francesco Caravelli. This is a first experiment with an online school, and I really hope it will be repeated. You can find the complete videos here. Unfortunately I haven’t had time to go through the other speakers’ lectures yet, but I’m looking forward to it.

In the meantime, here are my own lectures:


Lecture notes can be downloaded here (it’s just a draft, please don’t spread it around. And please, if you find mistakes, report them to me!).



Conversations with Gyorgy Scrinis



As part of the activities of Scienceground, yesterday we met Gyorgy Scrinis, an expert on the global and conceptual aspects of nutrition. In the afternoon he gave an academic seminar on the corporate influence of food companies on the public health agenda, and later in the evening he engaged in a public discussion on nutritionism – aka nutritional reductionism – which he identifies as one of the dominant ideologies of our time. In between the two events, we organized an open discussion session to prepare an informed conversation. All of the recordings are available here:

[A day with Gyorgy Scrinis]

As for the public discussion (FoodMania.wav), there is one moment I would like to focus on. It starts at 1:17:53, when a person from the audience (who presents himself as a researcher in statistics) questions some of Scrinis’s conclusions. In particular, while agreeing that the industry and the media system misrepresent scientific results, he defends scientific reductionism, arguing that it is based on solid statistical methods, and asks Scrinis whether he imputes responsibility for the bad state of affairs to the scientists. Scrinis has a very precise answer:

Yes, I’m criticizing the scientists…

Here are my two cents on the scientists’ burden, with a very ideological twist, I admit.

The arena of corporations’ tailoring of products and propaganda is moving more and more into the heart of the sciences. After exploiting family, sports, music, yoga, coolness and whatever other image to convey its products, the industry is now dipping into science, which remains the last “authoritative” discourse in our society. It does so by creating products explicitly targeted to meet the functional needs of our body, on the assumption that science knows how specific nutrients affect metabolism. An example of such a very advanced product was brought up by someone in the audience:


Here is what Scrinis said about it:

It’s not bad nutrients, it’s actually…

My own take is as follows. The problem, to put it mildly, is that it is very questionable that we actually have such detailed knowledge of how nutrients work. To put it strongly, we know nothing about how human metabolism works, apart from islands of respectable knowledge here and there, some at the cellular level, some at the physiological level, some at the epidemiological level and so on. These islands of knowledge do not really communicate very well among themselves. To date, the only grand-unified theory of nutrition is the old-style motto “eat like your grandmother would tell you” (which makes perfect sense from an evolutionary point of view… but evolutionary time-scales are certainly not the time-scales of interest for the industry).

So here’s the scientists’ fault: What keeps these pieces of knowledge tied together is some sort of story-telling, some inside story, some narrative within the science of nutrition that makes the scientists believe (or claim) that they work on the same thing, that their research is solid, and that they are a community of scholars. Maybe they publish in the same journals, maybe they go to the same conferences, etc. But is this narrative strong enough against external pressures?

Because, check this out: The global industry that wastes the resources of planet Earth is now postmodern. It is not interested in the specific product, the product is flexible. Want a smart watch? We’ll make you a smart watch. You prefer coffee in capsules? We have that. A very precise food-item that meets your dietary needs? No problem, we’ll package it with a watch that monitors your “nutritional” needs and reminds you it’s time for vitamins…

Production of material goods will scale, eventually. But that’s not the point. The industry is not chasing the next product. The industry is chasing the next story-telling, ahead of manufacturing the product. It experiments and creates trends in the rich countries, and then mass-exports them to “developing” ones.

Anybody can judge the quality of the products that this system produces.

So that’s what the industry is after, and people have to be wary. All those little beliefs and creeds, and sayings and stories, and back-talk, etc. – all of those things that make up the untold story of a community’s life – all of a sudden get blown up to industrial scale and sold as truth, as if the process of creating that truth did not matter. That’s the definition of pornography. Science used to be prudish; today it is becoming pornographic.

How can such industry “buy out” the internal narrations of a community? What are the mechanisms of how this happens? This is at the core of Scrinis’s research.

So, in the end, I agree with Scrinis: yes, if science lends itself to reductionist and decontextualized extrapolations, then it is the fault of the scientists working on nutrition – not because they work for the industry (some do, implicitly or explicitly, but of course it’s more complex than that…), but because nutritionists have not established themselves as a serious community capable of withstanding pressure from the outside and giving some authenticity to their work.

One final remark. Going back to the milieu of the person questioning Scrinis, I believe this debate also has to include how statistics is used and abused in academia. Unfortunately we did not have time to discuss this issue (which we discussed broadly in the activities proposed at Festivaletteratura), nor does Scrinis have competence on this theme, which from my point of view is the major weakness in his argumentation, at least when it comes to bringing further “inside” arguments to his otherwise perfectly sound analysis.

Big Food – corporate influence on the public health agenda

As part of the activities organized around the visit of Gyorgy Scrinis, together with Centro di Salute Internazionale and Dipartimento di Storia Culture e Civiltà we have been organizing this focus on corporate influence on the public health agenda.



3.30 pm, Aula Capitani – Dipartimento di Storia, Culture e Civiltà


The incidence of diet-related disorders such as obesity, diabetes, hypertension and heart disease is on the rise, as are concerns about the health implications of ultra-processed food products. The large multinationals of the food sector, known as Big Food, have responded to these problems with a series of strategies aimed at shifting attention from the quality of food to individual nutrients, identified both as the problem to be solved and as the solution to those same diet-related disorders, and, ultimately, at depoliticizing the discourse on food by deflecting attention from the structural causes of diet-related pathologies, such as the role of Big Food itself in creating an unhealthy food environment through the production of ultra-processed and packaged food. In parallel, the food multinationals have used various channels of influence to interact with political and governance processes, with the aim of producing and maintaining a regulatory environment favourable to their practices and to the commercialization of their products. Examples of strategies used by Big Food to influence political agendas, including that of public health, range from lobbying to participation in so-called public-private partnerships, from the use of their economic power in global and local contexts to direct or indirect participation in the drafting of laws, rules and policies. Finally, the large food companies are actively engaged in shaping public discourse and setting the terms of the political debate on food and nutrition issues through product advertising, health and wellness campaigns, research funding, and the sponsorship of professional and citizens’ groups.