Just signed up. It's nice to see this issue being seriously debated!
Anyway, I re-did the analysis using more reliable data. Same results. I then added various combinations of the available socio-demographic data (household income, age, education, and so on).
I have found one factor, the percentage of the population holding bachelor's degrees, that is very significantly linked with Clinton's results. However, it is weaker than the Diebold effect, which still remains at about the same strength.
PS. Thanks to all the persons who cared to post election data. Nice forum BTW.
I have to get an answer to Kucinich tonight, so any help you can offer is greatly appreciated.
Conservatives want live babies so they can raise them to be dead soldiers. - George Carlin
I just wanted to alert you to ANOTHER factor (other than socio-economic data) which is probably the main reason why some townships have Diebold machines and others do not: their location in the state.
Here is a graph showing where the machines are used:
And, here is a blog post I made to discuss the results: http://electionstats.wordpress.com/2008/01/14/vote-counting-methods-drawn-on-a-nh-map/
Basically, it kind of looks like there is no fraud, since machine usage is so biased in terms of location. BUT, if you only look at townships with 500-800 Democratic votes, their usage of machines is close to an even split with hand counting, AND their distribution across the state is more random.
So, I think we should somehow encourage the NH SOS to do recounts only for these medium-sized towns, since the bias still exists there, and there are not as many votes to count, so it would be a lot cheaper.
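To make the subset comparison concrete, here is a minimal Python sketch (the scripts in this thread are in R; Python is used here only so the example is self-contained). It keeps the towns with 500-800 Democratic votes, counts how many use each counting method, and compares the mean Clinton share between the two groups. The town rows, the field layout, and the function names are all invented for illustration; they are not real NH results.

```python
from statistics import mean

# Hypothetical rows, invented for illustration only (NOT real NH results):
# (town, total Democratic votes, counting method, Clinton share of the vote)
towns = [
    ("Town A", 320, "hand", 0.34),
    ("Town B", 540, "hand", 0.36),
    ("Town C", 610, "machine", 0.41),
    ("Town D", 700, "hand", 0.35),
    ("Town E", 780, "machine", 0.40),
    ("Town F", 2500, "machine", 0.42),
]


def medium_towns(rows, low=500, high=800):
    """Keep only towns whose Democratic vote total lies in [low, high]."""
    return [r for r in rows if low <= r[1] <= high]


def share_by_method(rows):
    """Unweighted mean Clinton share for each counting method."""
    groups = {}
    for _, _, method, share in rows:
        groups.setdefault(method, []).append(share)
    return {method: mean(shares) for method, shares in groups.items()}


subset = medium_towns(towns)
# How evenly are the two counting methods represented in the subset?
counts = {m: sum(1 for r in subset if r[2] == m) for m in ("hand", "machine")}
means = share_by_method(subset)
# Difference in mean Clinton share, machine-counted minus hand-counted.
gap = means["machine"] - means["hand"]
print(counts, round(gap, 4))
```

A raw gap like this says nothing about cause on its own; it only shows whether the difference survives in a subset where machine usage is more evenly spread.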
Did that (thanks to brfox), doesn't cut the mustard.
I also hand-merged the data, correcting town names, and using official data on voting machine usage.
The percentage of the population holding bachelor's degrees is now extremely well correlated with Clinton's score (maybe too well, in fact...); the Diebold variable still has an important effect...
You can get a .tar.gz with R scripts and data from the link on the blog entry:
http://call-with-current-continuation.blogspot.com/2008/01/diebold-effect-sticks-around-need.html
Now this is about the limit of my statistical knowledge so I'll let experts talk.
> summary(model4)

Call:
lm(formula = nh$delta ~ nh$totalpopulation * nh$total * nh$machine +
    nh$unemploymentrate + nh$percentholdingbachelorsdegree + nh$lat * nh$long)

Residuals:
     Min       1Q   Median       3Q      Max
-0.30281 -0.07168 -0.00144  0.07717  0.40634

Coefficients:
                                         Estimate Std. Error t value Pr(>|t|)
(Intercept)                             5.929e+01  1.290e+02   0.459   0.6464
nh$totalpopulation                      5.890e-06  7.438e-06   0.792   0.4293
nh$total                               -4.993e-06  7.575e-05  -0.066   0.9475
nh$machine                              8.760e-02  3.521e-02   2.488   0.0136 *
nh$unemploymentrate                    -4.817e-04  2.334e-04  -2.064   0.0403 *
nh$percentholdingbachelorsdegree       -4.559e-03  6.477e-04  -7.038 2.74e-11 ***
nh$lat                                 -1.197e+00  2.982e+00  -0.401   0.6886
nh$long                                 8.176e-01  1.805e+00   0.453   0.6510
nh$totalpopulation:nh$total             7.043e-09  1.727e-08   0.408   0.6838
nh$totalpopulation:nh$machine          -9.572e-06  7.865e-06  -1.217   0.2249
nh$total:nh$machine                     1.604e-05  7.627e-05   0.210   0.8337
nh$lat:nh$long                         -1.649e-02  4.171e-02  -0.395   0.6929
nh$totalpopulation:nh$total:nh$machine -6.929e-09  1.727e-08  -0.401   0.6887
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.1159 on 209 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-Squared: 0.3802, Adjusted R-squared: 0.3446
F-statistic: 10.68 on 12 and 209 DF, p-value: < 2.2e-16

> anova(model4)
Analysis of Variance Table

Response: nh$delta
                                        Df  Sum Sq Mean Sq F value    Pr(>F)
nh$totalpopulation                       1 0.11603 0.11603  8.6443  0.003650 **
nh$total                                 1 0.00695 0.00695  0.5177  0.472632
nh$machine                               1 0.36967 0.36967 27.5398 3.769e-07 ***
nh$unemploymentrate                      1 0.14791 0.14791 11.0191  0.001064 **
nh$percentholdingbachelorsdegree         1 0.62402 0.62402 46.4883 9.718e-11 ***
nh$lat                                   1 0.00133 0.00133  0.0992  0.753048
nh$long                                  1 0.37698 0.37698 28.0843 2.940e-07 ***
nh$totalpopulation:nh$total              1 0.00209 0.00209  0.1559  0.693390
nh$totalpopulation:nh$machine            1 0.07083 0.07083  5.2769  0.022601 *
nh$total:nh$machine                      1 0.00024 0.00024  0.0182  0.892720
nh$lat:nh$long                           1 0.00241 0.00241  0.1795  0.672213
nh$totalpopulation:nh$total:nh$machine   1 0.00216 0.00216  0.1610  0.688670
Residuals                              209 2.80545 0.01342
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Race may not be a huge factor in NH because my anecdotal impression (I've never been there) is that it's pretty darn overwhelmingly white.
That won't necessarily show up on census data, will it? Though it is worth getting the data just in case a 1% shift in the gender ratio from town to town actually explains something. We have met the enemy, and he is us — Pogo
Can you do a correlation matrix of the predictor variables?
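For readers unsure what is being asked for: a correlation matrix holds the pairwise Pearson correlation between every two predictors, which helps spot predictors that overlap heavily (for example machine use and education level). In R this is what `cor()` computes; the sketch below does the same by hand in Python so it stands alone. The column names and all values are invented for illustration and are not taken from the NH data set.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations as a nested dict keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical predictor columns, one value per town (NOT real data).
predictors = {
    "machine": [0, 0, 1, 1, 1, 0],  # 1 = machine-counted, 0 = hand-counted
    "pct_bachelors": [18.0, 22.0, 30.0, 35.0, 41.0, 20.0],
    "unemployment": [4.1, 3.8, 3.0, 2.9, 2.5, 4.0],
}

matrix = correlation_matrix(predictors)
for name, row in matrix.items():
    print(name, [round(v, 2) for v in row.values()])
```

Off-diagonal entries close to +1 or -1 would mean two predictors carry much the same information, which makes their individual coefficients in a regression hard to separate.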
Oh no, please don't call me that. You'll be really disappointed by my credentials. I was trying to word my findings carefully to avoid this, but let me put a full disclaimer here:
That being said...
> Is the Diebold effect still showing about 4.6% on Clinton's tally?
Slightly less. With a coefficient of 3.18 percentage points, Diebold is still the non-political variable having the highest coefficient in the model.