
How To Lie With Numbers - a short guide to politics and other things

by JakeS Mon Oct 15th, 2007 at 10:05:42 AM EST

A cause of increasing concern to me is the way numbers and statistics are handled in the public political debate. On the one hand, opinion shapers of various kinds cling to studies and statistics that they consider favourable to their cause, however questionable such statistics may be, with all the desperate fervour of a vampire hunter clinging to his crucifix and wooden stake. On the other hand, there is an impression among most people that 'statistics can be made to say anything,' an impression that allows partisan hacks to get away with sweeping dismissal of entirely valid studies underpinned by solid statistics.

Especially irritating is, of course, the tendency of those same partisan hacks to switch between these two views of statistics, frequently within the same interview or column, relying on the unfortunately short half-life of public memory to conceal this rhetorical two-step.

In the Ideal World(TM), reporters, politicians, pundits and - most importantly - the general public would have a sufficiently solid schooling in mathematics and statistics to render this sort of abuse of statistics a swift form of political suicide. In the real world, unfortunately, this is not the case, and most people have to rely on deconstructions, such as the ones frequently posted here on ET.

Promoted by Colman


But however regular and excellent those deconstructions are, it is impossible to cover the entire span of the public debate, partially because those who post them are doing their work pro bono and have to attend to a day job as well, while the political hacks and professional liars in various think tanks are paid lavish salaries to purposefully muddy the waters of public discourse, but mostly because people unburdened by principles or honesty have one great advantage: It takes far less time to cobble up a phony 'study' (or to tell an outright lie, for that matter) than it does to provide a convincing and comprehensive refutation.

With that fact in mind, and with all possible regard for the excellent and needed work of those who spend time deconstructing hack job statistics here and elsewhere, I propose a different approach: Arming people who do not necessarily have formal schooling in math, science or economics with sufficiently sensitive BS detectors to spot irregular use of numbers in their daily newspaper (I'll leave it as an exercise to the reader to determine whether it is the honest or the dishonest treatment of numbers that can be said to be 'regular' in their local paper).

To that end, I am going to present what will hopefully become a series of examples of shoddy or outright dishonest use of statistics. The main point will not be to deconstruct them, however, although of course that will be part of the exercise. The main point will be to attempt to extract some more general warning signs that one is dealing with sub-par number handling, and hopefully thereby equip the non-mathematically inclined reader with a set of red flags that will serve more general use than the deconstruction of these specific examples.

Picking on the Swedes

Bar graphs, highlighting and correlations

About a year and a half ago, a friend of mine sent me a breathless e-mail with an attached study (pdf) which he claimed 'disproved' the Danish welfare model. Such strong language naturally set my mental antennae on edge so I decided to take a look at what this study of his actually said. Needless to say, I was underwhelmed.

Those readers who are Swedish-challenged need not worry: understanding the text of the report is not actually necessary in order to follow this deconstruction or the subsequent extraction of indicators for your BS detector. Suffice it to say that the text itself is actually rather moderate. They couch their report in language that is very careful not to state any inflammatory conclusions in so many words, instead relying on the reader to draw the wrong conclusions from their dishonest presentation of data.

So let's take a look at their graphs.

First off, they show a bar graph comparing the GINI score of a number of countries (GINI is a measure of income inequality - a high GINI means an unequal income distribution). Notice that Sweden is highlighted in red:

Not being an economist, I cannot speak as to the suitability of using GINI as a measure in this case. In the following, we shall assume that it is, indeed, an appropriate measure, as the report claims.
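As an aside for the curious, the GINI itself is simple enough to compute. Here is a minimal Python sketch using the mean-absolute-difference definition; the income figures below are invented purely for illustration:

```python
def gini(incomes):
    """GINI = mean absolute difference between all pairs of incomes,
    divided by twice the mean income.
    0 = perfect equality, approaching 1 = one person has everything."""
    n = len(incomes)
    mean = sum(incomes) / n
    diff_sum = sum(abs(x - y) for x in incomes for y in incomes)
    return diff_sum / (2 * n * n * mean)

equal = [30000] * 5              # everyone earns the same
unequal = [0, 0, 0, 0, 150000]   # one person earns everything

print(gini(equal))    # 0.0
print(gini(unequal))  # 0.8
```

Nothing fancy, but it shows concretely what the bar heights in the first graph are measuring.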

Then, in rapid succession, come another three bar graphs, showing the disposable income of the first, the second through fourth, and the fifth quintile, respectively (I've swapped the last two relative to the original report, hence the funny numbering).

(For the non-mathematically inclined, a 'quintile' is a fifth of the population you're looking at - in other words, the first quintile of the income distribution is the fifth of the population with the lowest incomes, while the fifth quintile is the fifth with the highest incomes.)
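The quintile business is easy to see in code. A minimal sketch, with incomes made up for illustration:

```python
# Splitting a sorted income sample into quintiles (fifths).
incomes = sorted([12, 15, 18, 22, 25, 28, 31, 35, 40, 55])  # in thousands

n = len(incomes)
quintiles = [incomes[i * n // 5:(i + 1) * n // 5] for i in range(5)]

print(quintiles[0])   # first quintile, the poorest fifth: [12, 15]
print(quintiles[4])   # fifth quintile, the richest fifth: [40, 55]
```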

Disposable income for the poorest fifth of the population. Sweden is highlighted.

Disposable income for the middle three-fifths of the population. Sweden is highlighted.

Disposable income for the richest fifth of the population. Sweden is highlighted.

Lastly, they have a graph showing growth in the disposable income for the first quintile. Again, Sweden is highlighted in red:

Looking at these graphs, it's quite clear that:

a) Sweden has very low income inequality.

b) The disposable income in Sweden isn't all that great compared to the other countries in the study. Not even for the poorest fifth of its population.

c) The growth in the disposable income for the poorest fifth of the Swedish population is very small.

The casual reader may be forgiven for concluding that these data show income equality to make society poorer across the board - including the poor people it was supposed to help! And conversely, for concluding that income inequality makes society - including the poor - richer across the board (the second conclusion does not, in fact, follow from the first, but I digress). This is, at least, the conclusion that my friend came to, based on his presumably cursory examination of the study. The casual reader would be wrong, however. [NB: This paragraph was revisited on 14/10-07 - Jake]

To illustrate what's wrong with making that conclusion based on the highlighting of Sweden, let's repost the images. But now I've made a slight modification. I've highlighted Norway in bright green as well as leaving the original red highlight of Sweden:

Looks kinda different now, doesn't it?

The reason it does, of course, is that highlighting a single country out of a set in a bar graph is the entirely wrong way to go about this kind of data analysis. What they should have done was make a real measure of statistical correlation, or at the very least made scatterplots instead of bar graphs.
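For those wondering what 'a real measure of statistical correlation' might look like, here is a minimal sketch of the Pearson correlation coefficient - one standard choice. The numbers below are invented purely to show the mechanics, not taken from the report:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 = perfect correlation,
    -1 = perfect anti-correlation, 0 = no linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [23, 25, 25, 28, 31, 33, 36]   # hypothetical GINI scores
y = [18, 20, 17, 21, 19, 22, 20]   # hypothetical disposable incomes

r = pearson_r(x, y)
print(round(r, 2))
```

A scatterplot of x against y, plus a number like this, is the honest minimum. A bar graph with one country painted red is neither.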

That means, of course, that my own little highlighting stunt does not prove that income equality increases wealth, is correlated with increased wealth, or even that it doesn't hamper the increase of wealth. All it proves is that the analysis Svenskt Näringsliv provided is seriously flawed.

There are a couple of other things about the study that are worth consideration (such as the number of countries they have chosen to sample, the utter lack of transparency w.r.t. the method by which the countries in the sample were chosen, etc.), but I think I've presented enough to conclude this essay.

Conclusions

Here I hope to summarize the analysis above, and if possible extract the lessons that are most applicable for everyday use. In this case, I have two lessons that the reader is encouraged to take to heart:

Beware of bar graphs - if someone tells you that X causes Y and presents you with bar graphs, scrutinize them carefully. The proper graph to show correlation is in most cases a scatterplot.* If he's using something else, chances are he's trying to pull a fast one on you.

Especially beware of highlighting - I'm sure highlighting single data points has legitimate uses, but off the top of my head, I cannot think of a single one. A very good indication that Someone Is Up To No Good.

*As an aside, the current incarnation of the Wikipedia page on scatterplots shows another thing to beware of: inappropriately drawn regression curves. I hope to expand a bit on that topic in the future; for now it's sufficient to note that the straight line connecting the two clusters of points in the Old Faithful graph on the wiki page is meaningless. At best.

Quite apart from these lessons, there are two points that I wish to drive home:

First, notice that I managed to deconstruct this study purely by looking at the way it treated its data. I never once had to question the underlying assumptions or the validity of the data used to underpin their analysis. This showcases a very important point: It may be easy to lie with statistics, but if the reader is minimally numerate, it's agonizingly hard to lie convincingly.

Second, lest the reader assume that this is some idle intellectual exercise, or that I'm picking on an easy target, let me assure you that I have seen mainstream, serious news outlets from all over the political spectrum citing studies that were at least as questionable as this one as primary sources. My hope is that the reader should now be able to spot at least some of them as well. Because there's only one way to make them go away: Stop buying into shoddy studies, stop trusting the newsies that do buy into them, and make very sure that the newsies in question know why you don't trust them. If their subscriber base decides to demand integrity in their number-crunching, they might just wake up and smell the coffee.

An aside: I did plot the data from the figures above myself and did a little chi-by-eye analysis. If you're prepared to take my word on the issue, you may rest assured that the data does not in any way, shape or form support the notion that increased income disparity is beneficial. If you're not prepared to take my word for it, I am, of course, prepared to show my work in the comments upon request, but I've left it out here, as it really is beside the point of this diary.

Another aside: There was another figure in the study that I thought I'd include for the general amusement of the readership. I'll leave it as an exercise to the reader to figure out why it made me laugh my butt off:

(Hint: It has something to do with significant figures, sample size and goodness-of-fit.)

[Update 15-10-07]

The Entire Series:

How To Lie With Numbers - a short guide to politics and other things - introduction - bar graphs - highlighting.

How To Lie With Numbers 1½ - more bar graphs - a cautionary note

European Tribune - How To Lie With Numbers 2 - Laffer Nonsense From The WSJ - scatterplots - fitting methods - data grouping

- Jake

...that you'd like to see fisked this way, you can post them in the comments, and I may get around to it (if I have time and if I think I can extract a few new red flags for the BS detection kit).

- Jake

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Sat Oct 13th, 2007 at 09:06:48 PM EST
Excellent post, JakeS, thanks!

Since we're talking about presentation, I hope you won't mind a nitpick -- when you say:

The casual reader may be forgiven for coming to the conclusion that these data show that income equality does not makes society poorer across the board - including the poor people it was supposed to help!

(emphasis mine), I think I understand your turn of phrase but it might be preferable to be clearer by leaving out the negation -- the story those little pictures tell is that increased income equality goes with lower income across the board (er, you know, socialism etc...), while, if you want the poor to get (a little bit) less poor, you need income inequality (to motivate the rich who are the only ones who create wealth, blah blah). In other words, the standard trickle-down narrative.

by afew (afew(a in a circle)eurotrib_dot_com) on Sun Oct 14th, 2007 at 04:32:01 AM EST
Yeah, sorry, missed that one. Was tired when I finally posted. Need to rewrite that paragraph.

- Jake

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Sun Oct 14th, 2007 at 06:17:29 AM EST
[ Parent ]
Ok, first of all, nice diary, thanks.

Next, re: the last graph....  I'm thinking that claiming an R2 significant to four digits with a sample size of what, 15 (?) is just silly.  Is there something else?  IANAS.

So, under what auspices was this study done?  Are they serious?  Because it's hard for me to imagine that any serious researcher or academic would suggest generalizing from a single case.

Finally, I would actually be interested in seeing your scatterplot, just to satisfy my curiosity....

by the stormy present (stormypresent aaaaaaat gmail etc) on Sun Oct 14th, 2007 at 07:01:51 AM EST
The study was made by the company Inregia AB on behalf of Svenskt Näringsliv, that is, the Swedish employers' organisation. So it was made for the media, not for academia.

Sweden's finest (and perhaps only) collaborative, leftist e-newspaper Synapze.se
by A swedish kind of death on Sun Oct 14th, 2007 at 12:32:53 PM EST
[ Parent ]
Next, re: the last graph....  I'm thinking that claiming an R2 significant to four digits with a sample size of what, 15 (?) is just silly.  Is there something else?  IANAS.

That's certainly one thing that should provoke amusement. There are a couple of other silly things, given here in no particular order:

  1. Fitting a logarithm instead of a straight line. Considering the paucity of data and the way it's distributed, that's just silly, especially when the base number of the logarithm is as big relative to the data range as the one they're getting here.

  2. The fact that they even present something with an r-square of .72 as a result, much less with four decimal places on their fitted parameters.

  3. Look at the way the data is clustered. At the leftmost end, there are two or at most three distinct clusters, while at the rightmost end they have two clusters (composed of only three points in total!). If any one of the right-hand-side points is off by even a relatively small margin, it'll throw their fit into the crapper (in slightly more technical terms, r-square fitting is rather sensitive to outliers). And they have absolutely no way of knowing whether those points are off.
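To see just how badly a single right-hand point can yank a least-squares fit around, consider this little sketch: a left-hand cluster plus one lone point far to the right. All numbers are invented; nudging only that last point drastically changes the fitted slope.

```python
def ls_slope(xs, ys):
    """Ordinary least-squares slope of y against x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# A left-hand cluster plus one lone point far to the right:
xs = [1, 1, 2, 2, 3, 10]
ys = [2, 3, 2, 4, 3, 9]

slope_before = ls_slope(xs, ys)

# Now move only the rightmost point down a bit:
ys2 = ys[:-1] + [3]
slope_after = ls_slope(xs, ys2)

print(slope_before, slope_after)  # the slope collapses
```

One data point, one dramatic change in the 'result' - which is exactly why hanging a conclusion on three right-hand points is reckless.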

By way of analogy, what they've done is roughly equivalent to putting together a room of twenty ordinary people, adding in Bill Gates and gasping in surprise at the average net worth of the people in the room.

In the room example, using the median would make far more sense, and in the same fashion there are measures of goodness-of-fit that are less sensitive to outliers than r-squared. But of course those are more complicated to compute (and they are not included in any standard Office suite I know).
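The room example is easy to check in code (net worths invented, of course):

```python
# Twenty ordinary people plus one enormous outlier.
room = [50_000] * 20 + [50_000_000_000]

mean = sum(room) / len(room)
median = sorted(room)[len(room) // 2]

print(mean)    # roughly 2.4 billion - wildly unrepresentative
print(median)  # 50000 - what a typical person in the room is worth
```

The mean is dragged four orders of magnitude away from anything representative by a single point, while the median doesn't budge.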

So, under what auspices was this study done?

It's hard to say anything definitive about this point, since I did not get the original context, if any, of this study. I suspect, however, that it was simply a case of someone seeing something he thought said something he liked to hear.

The references in the study indicate that the data was culled from some propaganda leaflet coming out of some office in Luxembourg with the aim of attracting investments to Luxembourg. I don't know what they did to the data in the first place, but given the choice of countries to include, I suspect that I wouldn't like it.

Are they serious? Because it's hard for me to imagine that any serious researcher or academic would suggest generalizing from a single case.

I am very much afraid that they are serious. While you're right about how a serious academic researcher would handle things, you must remember that these guys aren't serious researchers, much less academics. They are an employers' union, and they have a political agenda.

This kind of hack job isn't designed to convince anybody who actually takes the time to read it and has the minimal numerical literacy to understand what's going on. It's designed to provide political cover for people who already support its conclusions.

The chain of events goes something like this:

Think tank writes study -> political operative pitches study to friendly newsie, exaggerating a bit in the process -> newsie reports study, exaggerating and simplifying in the process -> other newsies interview politician about study -> politician uses study as justification for policy/protection against criticism/blunt instrument to bash opponents.

The maddening thing about this is that if you attempt to criticize the exaggerations and simplifications, the politicians/political operatives/newsies will respond by saying that it's complicated, technical stuff and that they have to simplify matters for the end users to understand. And it's not like a thorough debunking of the original study will ever make it to the pages of your local newspaper - it is, after all, "too technical" for readers to understand.

Even if you do get to challenge the study publicly, the authors will cling to the weasel words they used in the text of the study, bring out the fourth hand of the Deck of Cards and generally try to muddy the waters.

This works, because if you can muddy the waters badly enough, and few enough people can understand the technical issues, then the side that shouts louder has the better chance of 'winning' the argument.

The fact that we can't shout our opponents down (and the fact that even if we could it would be an intellectually dishonest way of doing business) means that we have to educate people - starting with the newsies - instead.

Finally, I would actually be interested in seeing your scatterplot, just to satisfy my curiosity....

OK, here goes. Remember that the claim in the report was that inequality correlated with wealth? Well, if we want to test that, we should plot their chosen measures of wealth - disposable income and growth - against their chosen measure of inequality - the GINI.

To make the graphs below I had to extract the numbers from their figures, which of course introduces a certain amount of uncertainty. Furthermore, the figures didn't come with any error bars, precluding a straightforward statistical analysis (besides, even if they had, I must admit to being too lazy to roll out the big guns for a hack study like this). Fortunately, as you'll see, it wasn't necessary to go beyond chi-by-eye.

(Recall that a correlation would show up as a clustering of the data points around a line from lower left to upper right, while an anti-correlation would show up as a clustering around a line from upper left to lower right.)

GINI vs. disposable income for the poorest quintile.

GINI vs. disposable income for the middle three-fifths.

GINI vs. disposable income for the richest quintile.

GINI vs. growth.

As you can see, the first three figures do not exactly show impressive correlation. The casual reader might be tempted to say that there is even a slight anti-correlation in the first two, but the reader is warned to be extremely careful: if there is any such trend in the data, it is slight, and the sampling methods used in this study are suspect anyway, so any conclusions drawn from this data set should be viewed with the utmost caution.

The growth over GINI plot is what physicists call a "shotgun plot" - for reasons that should be obvious...

- Jake

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Sun Oct 14th, 2007 at 01:18:50 PM EST
[ Parent ]
Remarkable diary, Jake!

I think that developing the statistical bullshit awareness of the citizens is an urgent task. However, it is difficult for us to reach a significant number of them. Therefore I think we should focus on two targets: journalists and teachers.

There is a French association which has been fighting against the warped use of figures and statistics in the media for almost fifteen years. Its name is Pénombre and they publish regular letters debunking biased uses of statistics in the press. And they do it with humour. In my opinion, reading their publications should be compulsory in the schools teaching journalism. Here is their website (for those who read French): Pénombre

"Dieu se rit des hommes qui se plaignent des conséquences alors qu'ils en chérissent les causes" Jacques-Bénigne Bossuet

by Melanchthon on Sun Oct 14th, 2007 at 10:57:34 AM EST
Melanchthon:
I think that developing the statistical bullshit awareness of the citizens is an urgent task. However, it is difficult for us to reach a significant number of them. Therefore I think we should focus on two targets: journalists and teachers.

True, I think - although for some values of 'journalist' DIY may be a better option.

Realistically, you can never expect most of the population to keep up with statistical arguments. Aside from basic rules of thumb (e.g. ignore percentages, think absolute numbers) most people don't have the cognitive skills or the education to understand statistical analysis.

But they do understand, and can repeat, narratives that they've been fed. So creating narratives will have more of an effect.

by ThatBritGuy (thatbritguy (at) googlemail.com) on Mon Oct 15th, 2007 at 05:52:14 AM EST
[ Parent ]
Realistically, you can never expect most of the population to keep up with statistical arguments. Aside from basic rules of thumb (e.g. ignore percentages, think absolute numbers) most people don't have the cognitive skills or the education to understand statistical analysis.

This comes perilously close to saying that in the age of information there will be a small, numerate aristocracy who actually understand what's going on, and vast numbers of rubes who are fed just-so stories and moved around like so many pieces in a game of chess by the elites.

Surely the future is not so dark?

- Jake

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Mon Oct 15th, 2007 at 09:42:25 AM EST
[ Parent ]
The future?
Nah, this is the past, present, and future!
We do not need better information/facts/data or dissemination thereof. We need better propaganda.
by someone (s0me1smail(a)gmail(d)com) on Mon Oct 15th, 2007 at 09:52:24 AM EST
[ Parent ]
The 'future' in the sense that numeracy will become increasingly important as access to information increases.

Furthermore, I am not as pessimistic as some here with respect to the possibility of enhancing numeracy. Consider how monumental a task it is to ensure basic literacy. Yet that is possible. The effort we make today towards ensuring that the broad population possesses basic numeracy is only a fraction of the effort we make towards ensuring that the same people possess basic literacy.

- Jake

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Mon Oct 15th, 2007 at 10:07:29 AM EST
[ Parent ]
I think we might be at about equal levels for basic numeracy and literacy. People might be able 'to read', but they are not able to read critically, even when the sources do not contain numbers. Just as they are able to compare and manipulate simple numbers, but are not able to think critically about what those numbers say.
by someone (s0me1smail(a)gmail(d)com) on Mon Oct 15th, 2007 at 10:12:40 AM EST
[ Parent ]
I'd have to politely differ here. It's pretty clear that when presented with a report, people split cleanly into two groups - those who read the numbers (because it's faster and usually gives a fairly complete picture even without the text) and those who read the text, because the text is all they can read; the numbers are so much word salad to them (if you'll excuse the expression).

Furthermore, if somebody said 'Bill Gates is very rich - therefore all Americans are rich,' which is the rough plain-text equivalent of what these bozos in Svenskt Näringsliv did with their numbers, I am fairly confident that not only most newsies, but the majority of the general population would notice. And yes, their treatment is actually that bad.

Perhaps it is because there is less of a soft middle road with numbers - to some extent, you either get it or you don't. With text, OTOH, there's a middle road, where you know what it says but not what it means.

- Jake

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Mon Oct 15th, 2007 at 03:36:33 PM EST
[ Parent ]
Well, this is an excellent discussion, thoughtful, thank you.

Can't believe that you did not use the classic line with this discussion: "There are lies, damned lies, and statistics..." Seems so apropos.

Fertile (somewhat already tilled) ground for this type of work:  Global Warming deniers; and people like Bjorn Lomborg.

Blogging regularly at Get Energy Smart. NOW!!!

by a siegel (siegeadATgmailIGNORETHISdotPLEASEcom) on Mon Oct 15th, 2007 at 12:35:04 PM EST
I didn't use that line because I dislike it intensely. Remember what I said about the pervasive attitude that all statistics can be cooked up? Me, I prefer, 'there's liars, damned liars and politicians.'

The reason I don't have any concrete plans for digging into Lomborg et al in any major way is that there are plenty of other people on the 'net who do that a lot better than I can. But I might cull a couple of examples from them in the future.

The reason I chose to pick on the Swedish industrialists this time was because I already had all the figures ready on my hard drive - it's so atrociously obviously a hack job that I use it as an example every time I can get away with it...

- Jake

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Mon Oct 15th, 2007 at 03:23:05 PM EST
[ Parent ]

