This is the plot of the number of votes received by Clinton and Obama by precinct.
And this is the plot of the same data on a log/log scale.
I think it is immediately apparent that the log/log plot is better. The line is the result of the following regression:
Call:
lm(formula = log(Obama..d) ~ log(Clinton..d), data = NHDemVote, subset = Obama..d * Clinton..d > 0)

Residuals:
      Min        1Q    Median        3Q       Max
-1.898487 -0.252916 -0.003111  0.249842  1.066195

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.69014    0.10753   6.418 5.41e-10 ***
log(Clinton..d)  0.87164    0.01939  44.955  < 2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.3756 on 298 degrees of freedom
Multiple R-Squared: 0.8715, Adjusted R-squared: 0.8711
F-statistic: 2021 on 1 and 298 DF, p-value: < 2.2e-16
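To make the fitted line easier to read, here is a small sketch that turns the two coefficients from the regression output above back into vote counts. The function names are my own and purely illustrative; the only inputs are the intercept (0.69014) and slope (0.87164) reported by `lm`. Because the slope on the log scale is below 1, the fitted Obama count grows more slowly than the Clinton count, so there is a precinct size at which the line predicts a tie.

```python
import math

# Coefficients copied from the regression output above.
INTERCEPT = 0.69014  # estimate for (Intercept)
SLOPE = 0.87164      # estimate for log(Clinton..d)


def predicted_obama(clinton_votes: float) -> float:
    """Fitted Obama votes for a precinct with the given Clinton count.

    Back-transforms log(Obama) = INTERCEPT + SLOPE * log(Clinton).
    """
    return math.exp(INTERCEPT + SLOPE * math.log(clinton_votes))


def break_even_clinton_votes() -> float:
    """Clinton count at which the fitted line predicts equal votes.

    Solves a + b * log(c) = log(c), i.e. log(c) = a / (1 - b).
    """
    return math.exp(INTERCEPT / (1.0 - SLOPE))


if __name__ == "__main__":
    for c in (10, 100, 1000):
        print(c, round(predicted_obama(c), 1))
    print("break-even Clinton votes:", round(break_even_clinton_votes()))
```

With these coefficients the break-even point comes out at a little over 200 Clinton votes: below that the line predicts more votes for Obama, above it more for Clinton. This describes the fitted line only and says nothing about why the pattern arises.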
If so, this would blow the urban vs. rural hypothesis for explaining Clinton's better performance in machine-counted votes.
I believe some of the other models attempted to address it, showing precinct size did not explain things. And think about it: What black candidate has ever done better in rural precincts compared with urban ones (with the possible exception of Alan Keyes in his loss to Obama in 2004)?
And why, knowing the demographics Clinton and Obama play well with, would we bet that bigger, wealthier precincts (thought to have machines) would support Clinton?
Conservatives want live babies so they can raise them to be dead soldiers. - George Carlin
If you use Obama's vote percentage as a predictor of Clinton's vote percentage you get a regression line that's much closer to 1:1. This is because minimizing the variation in Clinton's vote given Obama's is a different problem from minimizing the variation in Obama's vote given Clinton's. The correlation is 93% (it is that high because precinct size correlates with both vote counts), and that should be the geometric mean of the two regression slopes: the slope of Obama on Clinton and the slope of Clinton on Obama.
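The geometric-mean claim is easy to check numerically. The sketch below uses made-up data, not the New Hampshire precinct file: `synthetic_log_votes` is a hypothetical generator I chose only so that it loosely resembles the fit above. It regresses each variable on the other and compares the square root of the product of the two slopes with the correlation. The last helper applies the same identity (R-squared equals the product of the two slopes) to the numbers reported in the regression output.

```python
import math
import random


def _centered_sums(x, y):
    """Return (sxx, syy, sxy): centered sums of squares and cross-products."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def slope_y_on_x(x, y):
    """Least-squares slope when y is regressed on x."""
    sxx, _, sxy = _centered_sums(x, y)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient of x and y."""
    sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def synthetic_log_votes(n=300, seed=1):
    """Made-up log vote counts (an assumption, NOT the real precinct data)."""
    rng = random.Random(seed)
    clinton = [rng.gauss(5.0, 1.2) for _ in range(n)]
    obama = [0.7 + 0.87 * c + rng.gauss(0.0, 0.4) for c in clinton]
    return clinton, obama


def implied_reverse_slope(r_squared, slope):
    """Slope of the reverse regression implied by R^2 = slope * reverse slope."""
    return r_squared / slope


if __name__ == "__main__":
    clinton, obama = synthetic_log_votes()
    b_obama_on_clinton = slope_y_on_x(clinton, obama)
    b_clinton_on_obama = slope_y_on_x(obama, clinton)
    print("geometric mean of slopes:",
          math.sqrt(b_obama_on_clinton * b_clinton_on_obama))
    print("correlation:             ", pearson_r(clinton, obama))
    # Values taken from the regression output: R^2 = 0.8715, slope = 0.87164.
    print("implied reverse slope:   ", implied_reverse_slope(0.8715, 0.87164))
```

Plugging in the reported R-squared of 0.8715 and slope of 0.87164 gives an implied reverse slope very close to 1, which fits the observation that the regression run the other way round sits much nearer the 1:1 line.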
In fact, linear regression is not the proper tool here, as we're not really trying to use one of them as a predictor for the other but rather to find a relationship between the two that treats them on an equal footing. Principal component analysis would be much better.
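For two variables, the first principal component is just the line that minimizes perpendicular distances to the points, and its slope has a closed form in terms of the variances and the covariance. The sketch below is a minimal illustration on made-up data (again a hypothetical generator, not the real precinct file); `major_axis_slope` is my own name for it. It assumes the covariance is nonzero.

```python
import math
import random


def _centered_sums(x, y):
    """Return (sxx, syy, sxy): centered sums of squares and cross-products."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    sxx, _, sxy = _centered_sums(x, y)
    return sxy / sxx


def major_axis_slope(x, y):
    """Slope of the first principal component of the (x, y) cloud.

    The leading eigenvector of [[sxx, sxy], [sxy, syy]] has slope
    (syy - sxx + sqrt((syy - sxx)^2 + 4 * sxy^2)) / (2 * sxy).
    Requires sxy != 0.
    """
    sxx, syy, sxy = _centered_sums(x, y)
    diff = syy - sxx
    return (diff + math.sqrt(diff * diff + 4.0 * sxy * sxy)) / (2.0 * sxy)


def synthetic_log_votes(n=300, seed=1):
    """Made-up log vote counts (an assumption, NOT the real precinct data)."""
    rng = random.Random(seed)
    clinton = [rng.gauss(5.0, 1.2) for _ in range(n)]
    obama = [0.7 + 0.87 * c + rng.gauss(0.0, 0.4) for c in clinton]
    return clinton, obama


if __name__ == "__main__":
    clinton, obama = synthetic_log_votes()
    print("Obama on Clinton (OLS):     ", ols_slope(clinton, obama))
    print("first principal component:  ", major_axis_slope(clinton, obama))
    print("Clinton on Obama, inverted: ", 1.0 / ols_slope(obama, clinton))
```

The principal-component slope falls between the two regression lines, and swapping the roles of the two candidates simply inverts it, which is the "equal footing" property. One caveat: unlike regression, this line depends on the units of the two axes, which is less of a worry here since both are log vote counts.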
In any case, there seems to be a very slight slope here, favouring Clinton in large precincts.
I think I'm going to replace that chart with one in which Machine vs. Hand counting is represented by different colours.
We have met the enemy, and he is us — Pogo
Here's a chart of vote percentages: