Welcome to European Tribune. It's gone a bit quiet around here these days, but it's still going.
What you are doing above is taking three linear fits of data sets with one data point each, and averaging their slopes. If you plot that operation, it looks something like the graph on the left:

When you actually plot it like this, it's pretty obvious that this isn't a sound way to analyse data. If anybody is confused about why such a plot is nonsense, I'll be happy to make an instalment of How To Lie With Numbers about fitting to something that isn't your entire data set. But for the moment, I'll assume that the graph speaks for itself.

The way you fit three independent points properly is all together. Then it looks like the graph on the right. And the slope you get from this fit might actually say something meaningful about the data.
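To make the difference concrete, here is a minimal sketch in Python rather than gnuplot. The three (x, y) pairs are hypothetical placeholders, not the numbers discussed in this thread, and the function names are made up for illustration. It assumes an unweighted least-squares fit through the origin, which is one reasonable reading of "fitting all together".

```python
# Hypothetical data: x = civilian deaths, y = indictees.
# These are placeholders, NOT the figures from this discussion.
points = [(1000.0, 3.0), (10000.0, 15.0), (50000.0, 100.0)]


def mean_of_single_point_slopes(pts):
    """Fit each point on its own (slope = y/x), then average the slopes."""
    return sum(y / x for x, y in pts) / len(pts)


def joint_slope_through_origin(pts):
    """Unweighted least-squares slope of f(x) = a*x over all points at once."""
    sxy = sum(x * y for x, y in pts)
    sxx = sum(x * x for x, _ in pts)
    return sxy / sxx


averaged = mean_of_single_point_slopes(points)
joint = joint_slope_through_origin(points)
print("average of one-point slopes:", averaged)
print("joint fit through origin:   ", joint)
```

The two numbers differ because the averaged version gives every point the same say regardless of its size, while the joint fit lets the larger points dominate the slope.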

If you do this for Serbian indictees and non-Serbian indictees, you get slopes of 0.0019 indictees/civilian and 0.0019 indictees/civilian, respectively. Uncertainties are impossible to determine, as long as we don't have even the vaguest guesstimate for the uncertainty of the underlying data.

Now, instead of fitting to a proportional function (f(x) = a*x), you can also fit to a linear function with an offset (f(x) = a*x + b). This may or may not be meaningful, however: on the one hand, there might be a systematic positive offset - if, say, it takes a certain minimum number of people to plan and carry out war crimes. On the other hand, one might argue that if there is no war, then there are no war criminals (by definition), which would be an argument for forcing the fits through the origin.

(As an aside, fitting techniques exist that incorporate both of these considerations, but they are a lot more work than is justified given the crudity of the data.)
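For anyone who wants to see the two models side by side, the following is a rough Python sketch of both fits using the closed-form formulas for unweighted least squares. That choice is an assumption: all that is stated here is that the fits were done in GNUPLOT. The data points and function names are hypothetical.

```python
# Hypothetical data: x = civilian deaths, y = indictees.
# These are placeholders, NOT the figures from this discussion.
points = [(1000.0, 3.0), (10000.0, 15.0), (50000.0, 100.0)]


def fit_proportional(pts):
    """Least-squares slope a for f(x) = a*x (line forced through the origin)."""
    return sum(x * y for x, y in pts) / sum(x * x for x, _ in pts)


def fit_with_offset(pts):
    """Least-squares slope a and offset b for f(x) = a*x + b."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


a_only = fit_proportional(points)
a_off, b_off = fit_with_offset(points)
print("through origin: a =", a_only)
print("with offset:    a =", a_off, " b =", b_off)
```

Freeing the offset generally changes the slope as well, which is why the two sets of slopes should not be compared with each other directly.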

Here's what I get from my fits:

Assuming proportionality (i.e. f(x) = a*x):

Serbian indictees: 0.0019 indictees/civilian
Non-Serbian indictees: 0.0019 indictees/civilian

Serbian convicts: 0.00093 convicts/civilian
Non-Serbian convicts: 0.0009 convicts/civilian

The fits where I permit an offset actually give slopes that would naïvely indicate a bias in favour of Serbian suspects (0.0017 vs. 0.0021 for indictees and 0.00085 vs. 0.0012 for convicts). But I don't think they should be given much weight, because a) we don't have any good model for what an offset might mean vis-à-vis bias, b) the relative uncertainties on the points in the lower left corner of the plot are almost certainly greater than on those in the upper right corner (and the lower-left points play a great part in determining the offset), and c) the number of fitted parameters is already perilously close to the total number of data points.
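Point c) can be illustrated by counting degrees of freedom (data points minus fitted parameters). The sketch below uses hypothetical data and made-up function names, and again assumes unweighted least squares. With three points and two parameters only one degree of freedom is left, and with two points the line passes through both exactly, so the residuals say nothing about how good the fit is.

```python
# Hypothetical data: x = civilian deaths, y = indictees.
# These are placeholders, NOT the figures from this discussion.
three_points = [(1000.0, 3.0), (10000.0, 15.0), (50000.0, 100.0)]
two_points = [three_points[0], three_points[2]]


def fit_with_offset(pts):
    """Least-squares slope a and offset b for f(x) = a*x + b."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def residual_sum_of_squares(pts, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a * x + b)) ** 2 for x, y in pts)


N_PARAMS = 2  # slope and offset

dof_three = len(three_points) - N_PARAMS
dof_two = len(two_points) - N_PARAMS

rss_three = residual_sum_of_squares(three_points, *fit_with_offset(three_points))
rss_two = residual_sum_of_squares(two_points, *fit_with_offset(two_points))

print("3 points: degrees of freedom =", dof_three, " RSS =", rss_three)
print("2 points: degrees of freedom =", dof_two, " RSS =", rss_two)
```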

[All fits carried out in GNUPLOT]

- Jake

Addendum: I actually have graphs for all the fits mentioned in this comment, and if anybody is interested, I'll be happy to post them. But for reasons of space, I decided to post just the two most illustrative examples.

Addendum 2: There is a certain irony in the fact that, after we have jumped through all these hoops to accommodate an assumption that I found specious from the outset (namely that these three wars constitute independent data points), doing so gives a result that is even less favourable to the hypothesis advanced in this diary than the result of simply aggregating the Serb and non-Serb data into two ratios, as I did in the very first comment I made...

Friends come and go. Enemies accumulate.

by JakeS (JangoSierra 'at' gmail 'dot' com) on Tue Mar 17th, 2009 at 09:02:52 AM EST
