Book review: "Calling Bullshit"
Calling bullshit: The art of scepticism in a data-driven world
Authors: Carl Bergstrom & Jevin West
With the age of Big Data comes the age of Big Bullshit. Nowadays we frequently encounter nonsensical reporting of numbers or data analysis in the news and on social media. And we have all heard a dude bragging about the “novel application of machine learning” at a rooftop party. Yes, we do live in the era of big data, but more data can actually produce more bullshit, thanks to our “click economy” that favours sensational reporting of data over sober and nuanced analysis. This is the core message of this underrated pop statistics book. The authors, both professors at the University of Washington, see themselves as crusaders against the “new age bullshit” that is wrapped in fancy data and scientific language. I loved it when they singled out TED Talks as the new brand of bullshit, mixing “sound-bite science, management speak and techno-optimism”. Check out the book’s cool website.
My highlights:
-The book starts with a deep philosophical question: what is the nature of bullshit? This is a nod to Harry Frankfurt’s famous book “On Bullshit”.
-Machine learning models are only as good as the data that we feed them. If we give algorithms biased data (a sample selection problem), they give us biased results. “Machines are not free of human biases; they perpetuate them depending on the data they're fed”. Hence the need for algorithm transparency and accountability.
-Unsurprisingly, several chapters are devoted to “sampling error”, “selection bias” and “correlation not being causation”.
-Two separate chapters on bad reporting and visualisation of descriptive data.
-“When a measure becomes a target, it ceases to be a good measure”. For example, if you promote academics on the basis of citations to their papers, they turn citation into a tool for favouritism and alliance building. Hydrogen atoms don't react to the way they're measured, but people do!
-Machine learning models are prone to overfitting (mistaking noise for a meaningful pattern). So sometimes simpler models can do a better job of predicting the future.
-Numbers don’t speak for themselves. No matter the size of our data, we always need a sound story (theory) to make sense of them! Without theory, our models are prone to picking up “spurious correlations” that break down quickly over time. The example they use is Google Flu Trends, which was initially able to predict the spread of flu cases on the basis of forty-five search queries. But the model had no underlying story as to how and why these forty-five variables were related to the spread of the flu (no causal chain). It just knew there was a statistical link between these variables and the flu contagion. The model eventually lost its predictive power as people changed their search behaviour and the digital environment changed.
-A whole chapter is devoted to publication bias and p-hacking, which is a hot topic in economics these days.
-The book offers some rules for spotting bullshit:
Question the source of information (sample selection, biased data)
Be wary of claims that are too bad or too good to be true
Think in orders of magnitude (loved this one)
-And it suggests some tips for better dialogue:
Keep it simple
Take it offline
Find common ground
Think more, share less
Don’t be a smart ass (very relevant for economists!)
I give it 8 out of 10.
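For readers who want to see the overfitting point from the highlights in action, here is a minimal sketch. It is my own toy illustration, not an example from the book: the data are synthetic (a straight line plus random noise), and the names `fit_line`, `fit_memoriser`, `mse` and `make_data` are made up for this post. One model fits a simple straight line; the other "memorises" the training data by always returning the value of the nearest training point, noise included.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares straight line (the 'simple' model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memoriser(xs, ys):
    """Nearest-neighbour lookup: reproduces the training data, noise and all."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(rng, n):
    """Synthetic data (an assumption of this sketch): y = 2x + Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(rng, 30)
test_x, test_y = make_data(rng, 200)

line = fit_line(train_x, train_y)
memo = fit_memoriser(train_x, train_y)

for name, model in [("straight line", line), ("memoriser", memo)]:
    print(name,
          "train error:", round(mse(model, train_x, train_y), 3),
          "test error:", round(mse(model, test_x, test_y), 3))
```

The memoriser looks perfect on the data it has already seen, because it has simply stored the noise. On fresh data the straight line should come out ahead, which is the sense in which a simpler model can predict better.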

