Your advice on bar graphs is working already.

I can't find the bar graph that set me off yesterday, though. It was a graph of US casualties that showed a dramatic decrease over the last couple of months, following a steady rise for the rest of the year. Commentators were pointing out that last month's casualties were about half of the peak, and were using this as evidence that the current surge is succeeding.

Looking at it, my instant thought was: hmmm, bar graph, they're pulling a fast one. The obvious suspect was that the graph had been chosen to start in January. I wondered whether the weather a year ago had had any effect, so I went to look at the figures. No, I was wrong there, but looking further back, the "halved" casualties are still at the top edge of the figures from more than a year ago.

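Year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2005  107   58   35   52   80   78   54   85   49   96   84   68
2006   62   55   31   76   69   61   43   65   72  106   70  112
2007   83   81   81  104  126  101   78   84   65   24    0    0
(US casualties per month. October 2007 was still in progress, and November and December 2007 had not happened yet.)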


Any idiot can face a crisis - it's day to day living that wears you out.
by ceebs (ceebs (at) eurotrib (dot) com) on Thu Oct 18th, 2007 at 08:22:44 PM EST
I'd be very interested in seeing that graph, if you can dig it out. That being said, from what you've presented so far it doesn't actually sound like the fault was the data format (bar graphs strike me as an entirely appropriate way of displaying that kind of data - it's a comparison, which is what bar graphs are good for, remember?) so much as the data range. Leaving out inconvenient parts of the data set is another truly classic way of scamming with numbers. (As an aside, truncating data sets has always struck me as something of an exercise in black magic even at the best of times, and that's when everyone involved is being honest!)

The way these guys you're talking about picked out a single month for comparison, however, is precisely the kind of dishonest highlighting I was talking about in the first installment. When doing data analysis, you are not permitted to pick and choose single points and compare them to other single points or to peaks or whatever, because every data set has outliers and random fluctuations, and there is zero guarantee that the point you decide to pick is actually representative of anything.

A couple of general things to keep in mind with the casualty data from Vietraq are that it fluctuates somewhat from month to month (that's why I'd prefer to use a scatterplot rather than a bar graph myself) and that it also depends on operational posture, number of troops in the field, etc. It seems like a reasonable proxy for how well the war is going, but one should be careful not to take it too far: after all, the Americans could simply sit in their compounds and get everything they need by air, and casualties would plummet. They would also, however, lose the war that way (to whatever extent it isn't already lost, that is).

I'm tired and going to bed now, but I may return tomorrow (well, later today, technically) with some plots of data on Vietraq casualties over time.

- Jake

If you only spend 20 minutes of the rest of your life on economics, go spend them here.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Thu Oct 18th, 2007 at 09:27:44 PM EST
I don't know about you, but to me this looks like an upwards trend:

FWIW, when gnuplot fits a first-order polynomial to these data, it gives an upwards slope that is more than two asymptotic standard errors greater than zero. While this is not a proper statistical significance analysis, it does show that calling it an upwards trend is not grossly misleading.
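For anyone who wants to repeat that kind of check without gnuplot, here is a minimal sketch of an unweighted straight-line least-squares fit that also returns the standard error of the slope. It is not Jake's script. As far as I know, for an unweighted linear model this textbook standard error corresponds to what gnuplot reports as the asymptotic standard error (gnuplot rescales by the reduced chi-square by default), but treat that correspondence as an assumption. The example input is the complete months from ceebs's table (January 2005 to September 2007), a much shorter range than Jake's whole-war data set, so it will not reproduce his fit.

```python
import math


def linear_fit_with_slope_se(xs, ys):
    """Unweighted least-squares fit of y = a + b*x.

    Returns (a, b, se_b): intercept, slope and the ordinary least-squares
    standard error of the slope, sqrt(s2 / Sxx) with s2 = SSR / (n - 2).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares; two parameters were fitted, hence n - 2.
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return a, b, se_b


# Complete months from ceebs's table, January 2005 to September 2007.
monthly = [107, 58, 35, 52, 80, 78, 54, 85, 49, 96, 84, 68,
           62, 55, 31, 76, 69, 61, 43, 65, 72, 106, 70, 112,
           83, 81, 81, 104, 126, 101, 78, 84, 65]
months = list(range(1, len(monthly) + 1))
a, b, se_b = linear_fit_with_slope_se(months, monthly)
print(f"slope {b:.2f} per month, standard error {se_b:.2f}, ratio {b / se_b:.1f}")
```

A ratio above about 2 is the rough check described above. As Jake says, it is not a proper significance test, which would also have to check assumptions such as independent, evenly scattered residuals.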

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Fri Oct 19th, 2007 at 09:50:26 AM EST
Oh, I forgot to say what's on the axes and where I got the data: the x axis is the month, with the first full month after the invasion labeled 1, and with the first and last months of the war excluded since they are incomplete. The y axis is the number of casualties during that month.

Data source.

Apologies for the double post.

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Fri Oct 19th, 2007 at 09:54:05 AM EST
I'd have some trouble fitting a downwards curve to that data ....

Day by day data might draw an interesting curve.

by Colman (colman at eurotrib.com) on Fri Oct 19th, 2007 at 10:01:14 AM EST
I'd have some trouble fitting a downwards curve to that data ....

Not if you were with the American Enterprise Institute :-P

Day by day data might draw an interesting curve.

Nah, random scatter would obscure any trend if you go to that resolution.

I tried two- and three-month running averages, but they don't make things much prettier, so I decided to just go ahead and post the raw data. This leads me to believe that a resolution of about one month (maybe you could push it down to two weeks, or even one week, but I don't think you could go much further) is about optimal as far as grouping goes.

Of course, one could use running averages (i.e. have a data point for each day that represents the average deaths per day over the 30-day period leading up to that day). But I'm not convinced that doing so would yield enough additional information to justify the bother.
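The 30-day running average described above is a trailing moving average. A minimal sketch, assuming the input is a plain list of daily death counts in date order (no such daily list is given in this thread):

```python
from collections import deque


def trailing_average(values, window):
    """Trailing moving average: each output is the mean of the current value
    and the window - 1 values before it. Positions without a full window yet
    are None rather than a misleading partial average."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()
    running_sum = 0.0
    out = []
    for v in values:
        recent.append(v)
        running_sum += v
        if len(recent) > window:
            # Drop the value that just fell out of the window.
            running_sum -= recent.popleft()
        out.append(running_sum / window if len(recent) == window else None)
    return out
```

For instance, `trailing_average(daily_deaths, 30)`, with `daily_deaths` a hypothetical list of per-day counts, would give one value per day from day 30 onward, each being the mean deaths per day over that day and the 29 before it. Keeping a running sum avoids re-adding 30 values for every day, though at this data size either approach is fine.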

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Fri Oct 19th, 2007 at 10:47:51 AM EST

Not saying it's meaningful though ;-)

The subtle mathematical technique used is called drawing a random line that shows the message you intend, and bluffing competence. I think it's the same technique used in the original graph in the story.

by ceebs (ceebs (at) eurotrib (dot) com) on Fri Oct 19th, 2007 at 10:59:16 AM EST
In the previous plot of casualties from Vietraq, I accidentally only displayed data going up to month 35 (but the fit was from the full data set). Full plot here:

Unfortunately, the trend only becomes clearer if you include all the data...

Apologies all around for lack of proofreading.

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Fri Oct 19th, 2007 at 11:03:53 AM EST
Is that the same data set from iCasualties.org that this graph uses?

Iraqi Deaths By Year (iCasualties.org as of 2007/10/24)

Truth unfolds in time through a communal process.

by marco (cowannar at gmail punkt com) on Wed Oct 24th, 2007 at 09:06:13 AM EST
It's the same source, so if they use their own data, it is. But their graph only goes back two and a half years, and I disagree with the way they've presented the data. Not that it's wrong, but I'm not sure it's particularly meaningful to do month-by-month comparisons: you'd normally do that if you thought there was a reproducible pattern that depended on the time of year, and it doesn't look that way to me. So I made my own to cover the entire war and occupation.
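One conventional way to test for such a reproducible time-of-year pattern (a standard technique, not something Jake or iCasualties did) is a one-way analysis of variance with calendar month as the grouping: compare how much the monthly means differ from each other with how much each month scatters from year to year. A minimal sketch using only the complete years from ceebs's table at the top of the thread; two years is very little data, so treat any result as indicative at most.

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Spread of the calendar-month means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Year-to-year scatter within each calendar month.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def by_calendar_month(years):
    """Regroup rows of 12 monthly values (Jan..Dec) into 12 lists, one per calendar month."""
    if any(len(row) != 12 for row in years):
        raise ValueError("each year needs exactly 12 monthly values")
    return [list(month) for month in zip(*years)]


# Complete years only, from ceebs's table (US casualties per month).
y2005 = [107, 58, 35, 52, 80, 78, 54, 85, 49, 96, 84, 68]
y2006 = [62, 55, 31, 76, 69, 61, 43, 65, 72, 106, 70, 112]
f = anova_f(by_calendar_month([y2005, y2006]))
print(f"F = {f:.2f} with (11, 12) degrees of freedom")
```

An F value near 1 is what you would expect with no seasonal effect. To turn it into a p-value you could use, for example, SciPy's `scipy.stats.f.sf(f, 11, 12)`.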

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Wed Oct 24th, 2007 at 03:49:02 PM EST
It was posted by bruno-ken in yesterday's Salon.

by afew (afew(a in a circle)eurotrib_dot_com) on Fri Oct 19th, 2007 at 03:13:29 AM EST
Thanks for that, I knew I'd seen it somewhere, but it was a complete blank as to where.

by ceebs (ceebs (at) eurotrib (dot) com) on Fri Oct 19th, 2007 at 03:15:07 AM EST
As I surmised, the issue here is not with the use of bar graphs but with dishonest highlighting (of course the dishonest highlighting in question was made possible by the use of bar graphs) and dishonest truncation of data.

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Fri Oct 19th, 2007 at 07:52:32 AM EST
This graph is better:

http://icasualties.org/oif/IraqiDeathsByYear.aspx

(I could not figure out how to insert it as a static image file.)

by marco (cowannar at gmail punkt com) on Fri Oct 19th, 2007 at 07:18:02 AM EST
I don't know about that. The figures for Iraqi casualties are, I think, too unreliable and have been messed around with too much to produce any useful data. We have groups in different areas producing casualty figures that suggest the differing success of their own internal factions.

I think the only reasonably accurate figures are probably those for US casualties, as it's hard to hide bodies turning up on the home front.

by ceebs (ceebs (at) eurotrib (dot) com) on Fri Oct 19th, 2007 at 08:49:50 AM EST
I agree: coalition regulars are the only variable counted in any reasonably fair manner. This plot shows them:



The bars are week-by-week casualties and the blue line is a four-week moving average. It might look a bit off to the right because it is the average of the current week and the three weeks preceding it. Following the blue line, we can see an extended period of higher violence starting around September 2006. We also see that the level of violence varies greatly, and picking four relatively calm weeks in September 2007 as proof of anything is highly dubious.
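The rightward shift comes from the averaging itself: a trailing window of w weeks, drawn at its last week, is centred (w - 1)/2 weeks earlier than where it is plotted, so a four-week average lags by a week and a half. A minimal sketch of that averaging (not the plot's actual code, which we don't have), returning each average together with the midpoint of the weeks it covers so it can be plotted without the lag:

```python
def trailing_average_with_centres(weekly, window=4):
    """Trailing moving average that also reports where each window is centred.

    Each average covers the current week and the window - 1 weeks before it.
    The returned centre is the midpoint of those weeks, i.e. (window - 1) / 2
    weeks before the last one.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for end in range(window - 1, len(weekly)):
        chunk = weekly[end - window + 1:end + 1]
        centre = end - (window - 1) / 2
        pairs.append((centre, sum(chunk) / window))
    return pairs


# Illustration only: one violent week (week 4, counting from 0) among calm ones.
spike = [0, 0, 0, 0, 8, 0, 0, 0, 0]
print(trailing_average_with_centres(spike))
# [(1.5, 0.0), (2.5, 2.0), (3.5, 2.0), (4.5, 2.0), (5.5, 2.0), (6.5, 0.0)]
```

Drawn at the last week of each window, the bump occupies weeks 4 to 7 and is centred 1.5 weeks after the spike; drawn at the window centres it occupies 2.5 to 5.5 and is centred on week 4, where the spike actually is. The trade-off is that a centred average has no value for the most recent week and a half, which is exactly the part of the series everyone wants to talk about.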

A vote for PES is a vote for EPP! A vote for EPP is a vote for PES! Support the coalition, vote EPP-PES in 2009!

by A swedish kind of death on Sat Oct 20th, 2007 at 12:25:47 PM EST
We've been talking a lot about the presentation of data without much discussion of data collection methods. Granted, that's not the topic at hand, but once we get into Iraq deaths, we really need to remember that it's fantasyland.

The collection of data for Iraqi and US deaths is about as fraudulent as one could imagine. What the tame Iraqis have done, along with the PCA, is to delete large pieces of data and massage the rest, thereby cooking the books: deaths by IED, for example, are gone, and in the case of civilian deaths, criminal and sectarian deaths have been redefined by preposterous criteria, like whether the victim was shot in the front or the back of the head.

Garbage in, garbage out.

Capitalism searches out the darkest corners of human potential, and mainlines them.

by geezer in Paris (risico at wanadoo(flypoop)fr) on Wed Oct 24th, 2007 at 07:57:01 AM EST
...and is one of the hardest things for end-users of news media to detect.

It is possible to write entire books on the subject of data collection, and I decided that it was beyond the scope of this guide, especially since many of the techniques for detecting doctored data acquisition require that you get your hands on the primary sources, which is a lot of bother for a newspaper article.

And often hacks will employ both bad data and bad presentation. It's usually easier to nail them on the presentation side of things...

But you're certainly right that any total figure for Iraqi casualties less than half a million or so is pure fiction. The official numbers certainly are. By at least an order of magnitude.

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Wed Oct 24th, 2007 at 04:02:40 PM EST
Some time back I read, or heard, that the US military isn't counting military deaths occurring outside Iraq in the Iraq count. This little practice means that a soldier who is severely wounded in Iraq but later dies in a hospital in, say, Germany isn't included.

I have not been able to verify this.

by ATinNM on Thu Oct 25th, 2007 at 12:57:00 AM EST
That is certainly interesting, if true.

OTOH, soldier deaths are useful (to the extent that deaths can be useful...) primarily as a proxy for how things are going in general. So it does not really matter whether they lie a bit about the real numbers, as long as they've been lying in the same way since the war started.

The absolute values of coalition fatality figures from Vietraq are suspect anyway due to the fairly widespread employment of mercenary militias by the Coalition, as their numbers do not count towards casualties when they get killed.

- Jake

by JakeS (JangoSierra 'at' gmail 'dot' com) on Thu Oct 25th, 2007 at 08:19:03 AM EST
Yep, eight out of the past ten months have been above the 34-month average. With October not over yet, make that eight out of nine.
(13 September to 13 October was Ramadan.)
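That count is easy to check against ceebs's table at the top of the thread. A minimal sketch, assuming the 34 months are January 2005 to October 2007 and using the figures as posted on 18 October (so October is partial). On those figures it reports 8 of the last 10 months above an average of about 73.4, and still 8 of 9 with October dropped from both the window and the average.

```python
def months_above_average(series, recent):
    """Return (count, mean): how many of the last `recent` values exceed the mean of the whole series."""
    mean = sum(series) / len(series)
    count = sum(1 for v in series[-recent:] if v > mean)
    return count, mean


# US casualties per month from ceebs's table (posted 18 October 2007).
y2005 = [107, 58, 35, 52, 80, 78, 54, 85, 49, 96, 84, 68]
y2006 = [62, 55, 31, 76, 69, 61, 43, 65, 72, 106, 70, 112]
y2007 = [83, 81, 81, 104, 126, 101, 78, 84, 65, 24]  # January to October; October incomplete

series = y2005 + y2006 + y2007  # 34 months
for months, recent in ((series, 10), (series[:-1], 9)):  # with and without October
    count, mean = months_above_average(months, recent)
    print(f"{count} of the last {recent} months above the {len(months)}-month average ({mean:.1f})")
# 8 of the last 10 months above the 34-month average (73.4)
# 8 of the last 9 months above the 33-month average (74.9)
```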

-----
sapere aude
by Number 6 on Mon Oct 22nd, 2007 at 06:44:02 AM EST