Thinking about it, it occurs to me that you shouldn't do it to an ensemble either... In fact, you shouldn't do it at all.
If you can get your hands on lower-order data, you fit to that. Partly because of the foregoing, and partly because higher-order data is always more noisy. Subtracting two large numbers from each other (which you have to do to obtain the higher-order data) gives a very high relative uncertainty on the result.
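To put a rough number on that last point, here is a minimal sketch. The two measurements and their uncertainties are made-up values, and the errors are assumed to be independent so that they add in quadrature.

```python
import math

# Hypothetical measurements: two large numbers, each known to +/- 1.
a, sigma_a = 1002.0, 1.0
b, sigma_b = 1000.0, 1.0

# For independent errors, the uncertainty of a difference is the
# quadrature sum of the individual uncertainties.
diff = a - b
sigma_diff = math.sqrt(sigma_a**2 + sigma_b**2)

rel_a = sigma_a / a            # relative uncertainty of one input
rel_diff = sigma_diff / diff   # relative uncertainty of the difference

print(f"relative uncertainty of a:     {rel_a:.2%}")
print(f"relative uncertainty of a - b: {rel_diff:.2%}")
```

The absolute uncertainty barely grows, but the result it is attached to has shrunk from about 1000 to 2, so the relative uncertainty goes up by several hundred times.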
- Jake If you only spend 20 minutes of the rest of your life on economics, go spend them here.
A linear fit of what? There is only one variable here, assuming a stationary model, and that is the return of the index. If you remove the outliers to do the fit you get the same effect Taleb and Mandelbrot are illustrating.
A man of words and not of deeds is like a garden full of weeds; a man of deeds and not of words is like a garden full of turds - Anonymous
What your figure illustrates is what happens when you take the differences in GDP between successive measurements, remove all outliers from that data set, and then average. If you instead remove the outliers from the GDP data set itself and then run a linear fit, you remove fewer points and get a less noisy fit. What's not to like?
To illustrate: Suppose you have GDP numbers for twenty years, indexed to year 1 (indexing is merely a matter of units of measurement - it does not affect the behaviour of the data).
Year  GDP  Change
01    100
02     99    -1
03    101     2
04    102     1
05    103     1
06    104     1
07    103    -1
08    102    -1
09    103     1
10    102    -1
11    104     2
12    100    -4
13    102     2
14    102     0
15    104     2
16    105     1
17    107     2
18    108     1
19    109     1
20    109     0
I want to fit the second column to the first column, after removing any outliers (there are none in this case). Your sarcastic suggestion is that economists might want to average the third column, after removing the -4 (because it is "obviously" an outlier).
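For concreteness, here is a minimal sketch of the two procedures on the numbers above. The helper `ols_slope` is my own illustrative name, and the fit is ordinary least squares with no weighting, which is an assumption on my part.

```python
# GDP index from the table above, years 1..20.
gdp = [100, 99, 101, 102, 103, 104, 103, 102, 103, 102,
       104, 100, 102, 102, 104, 105, 107, 108, 109, 109]
years = list(range(1, len(gdp) + 1))

def ols_slope(x, y):
    """Slope of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Procedure 1: fit the levels (second column) against the year (first column).
slope = ols_slope(years, gdp)

# Procedure 2: average the year-on-year changes (third column),
# with and without the -4 that looks like an outlier.
diffs = [b - a for a, b in zip(gdp, gdp[1:])]
mean_all = sum(diffs) / len(diffs)
trimmed = [d for d in diffs if d != -4]
mean_trimmed = sum(trimmed) / len(trimmed)

print(f"slope of level fit:          {slope:.3f} per year")
print(f"mean change, all points:     {mean_all:.3f} per year")
print(f"mean change, -4 removed:     {mean_trimmed:.3f} per year")
```

The level fit and the untrimmed average land in the same neighbourhood (roughly 0.39 and 0.47 per year), while dropping the single -4 pushes the averaged growth up to about 0.72.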
To a first approximation you can assume the successive differences are uncorrelated, and then you could try to do a linear fit. Which in this case means an average of the 3rd column. And then you remove the outlier because it has a large Mahalanobis distance.
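A minimal sketch of that rejection step: in one dimension the Mahalanobis distance reduces to the absolute z-score, |d - mean| / std. The cutoff of 2.5 is an arbitrary illustrative choice, not something taken from this thread.

```python
import statistics

# GDP index from the example, years 1..20, and its successive differences.
gdp = [100, 99, 101, 102, 103, 104, 103, 102, 103, 102,
       104, 100, 102, 102, 104, 105, 107, 108, 109, 109]
diffs = [b - a for a, b in zip(gdp, gdp[1:])]

mean = statistics.mean(diffs)
std = statistics.stdev(diffs)  # sample standard deviation

# One-dimensional Mahalanobis distance of each difference from the mean.
distances = [abs(d - mean) / std for d in diffs]

CUTOFF = 2.5  # assumed threshold, chosen only for illustration
flagged = [d for d, dist in zip(diffs, distances) if dist > CUTOFF]

print("largest distance:", round(max(distances), 2))
print("flagged as outliers:", flagged)
```

With these numbers only the -4 sits well away from the rest (a distance of about 2.9, against less than 1 for every other point), so it is the one that gets thrown out.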
What you should try to do is filter the original series (taking successive differences is one such filter) until you get something that is presumably stationary, and then fit some sort of ad-hoc model. ARIMA models are ways to reduce the model to some linear regression or other, and you always have the issue of outlier rejection.
But then again, GDP growth isn't uncorrelated either...