Continuation!  I've been trying to hunt you down for two days, Oh Guru of Teh Regression.  Glad to see you.

I have to get an answer to Kucinich tonight, so any help you can offer is greatly appreciated.

Conservatives want live babies so they can raise them to be dead soldiers. - George Carlin

by Drew J Jones (myfriends@thisispancakes.com) on Sun Jan 13th, 2008 at 05:49:08 PM EST
[ Parent ]
I just signed up too, and I've been corresponding with mr. continuation guy via reddit.

I just wanted to alert you to ANOTHER factor (other than socio-economic data) which is probably the main reason why some townships have Diebold machines and others do not: their location in the state.

Here is a graph showing where the machines are used:

And, here is a blog post I made to discuss the results:
http://electionstats.wordpress.com/2008/01/14/vote-counting-methods-drawn-on-a-nh-map/

Basically, it kind of looks like there is no fraud, since machine usage is so strongly tied to location.  BUT, if you only look at townships with 500-800 Democratic votes, their usage of machines is close to an even split with the hand-counters, AND their distribution across the state is more random.
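
To make the filtering step concrete, here is a minimal Python sketch of counting machine versus hand-count towns inside a vote-size band. The township rows are made up for illustration (they are not the New Hampshire data), and `method_counts` is a hypothetical helper name.

```python
# Hypothetical township records: (name, Democratic votes cast, counting method).
# These rows are invented for illustration; they are NOT the real NH figures.
townships = [
    ("Town A", 320, "hand"),
    ("Town B", 540, "machine"),
    ("Town C", 610, "hand"),
    ("Town D", 700, "machine"),
    ("Town E", 790, "hand"),
    ("Town F", 2400, "machine"),
    ("Town G", 5100, "machine"),
]


def method_counts(records, low=500, high=800):
    """Count towns per counting method among those with low <= votes <= high."""
    counts = {}
    for _name, votes, method in records:
        if low <= votes <= high:
            counts[method] = counts.get(method, 0) + 1
    return counts


print(method_counts(townships))
```

With real data, a roughly even split inside the band is what would make the medium-sized towns a fairer comparison group.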

So, I think we should somehow encourage the NH SOS to only do recounts for these medium-sized towns, since the bias still shows up there, and there are not as many votes to count so it would be a lot cheaper.

by brfox (goto reddit and send message to brfox) on Mon Jan 14th, 2008 at 07:23:06 PM EST
[ Parent ]
That is very interesting. I think I'm going to throw Latitude and Longitude into the regression as explanatory variables.

We have met the enemy, and he is us — Pogo
by Migeru (migeru at eurotrib dot com) on Tue Jan 15th, 2008 at 03:54:20 AM EST
[ Parent ]
> That is very interesting. I think I'm going to throw Latitude and Longitude into the regression as explanatory variables.

Did that (thanks to brfox), doesn't cut the mustard.

I also hand-merged the data, correcting town names, and using official data on voting machine usage.

The percentage of population holding bachelor's degrees is now extremely well correlated with Clinton's score (maybe too well in fact...); the Diebold still has an important effect...

You can get a .tar.gz with R scripts and data from the link on the blog entry:

http://call-with-current-continuation.blogspot.com/2008/01/diebold-effect-sticks-around-need.html

Now this is about the limit of my statistical knowledge so I'll let experts talk.

by continuation (continuation pretzel ouvaton point org) on Tue Jan 15th, 2008 at 06:12:41 PM EST
[ Parent ]
Can you use (Clinton% - Obama%) as the response variable rather than just Clinton's score?

We have met the enemy, and he is us — Pogo
by Migeru (migeru at eurotrib dot com) on Tue Jan 15th, 2008 at 06:23:54 PM EST
[ Parent ]
> Can you use (Clinton% - Obama%) as the response variable rather than just Clinton's score?

OK, here it is: delta = clinton - obama

> summary(model4)

Call:
lm(formula = nh$delta ~ nh$totalpopulation * nh$total * nh$machine + 
    nh$unemploymentrate + nh$percentholdingbachelorsdegree + 
    nh$lat * nh$long)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.30281 -0.07168 -0.00144  0.07717  0.40634 

Coefficients:
                                         Estimate Std. Error t value Pr(>|t|)
(Intercept)                             5.929e+01  1.290e+02   0.459   0.6464
nh$totalpopulation                      5.890e-06  7.438e-06   0.792   0.4293
nh$total                               -4.993e-06  7.575e-05  -0.066   0.9475
nh$machine                              8.760e-02  3.521e-02   2.488   0.0136 *
nh$unemploymentrate                    -4.817e-04  2.334e-04  -2.064   0.0403 *
nh$percentholdingbachelorsdegree       -4.559e-03  6.477e-04  -7.038 2.74e-11 ***
nh$lat                                 -1.197e+00  2.982e+00  -0.401   0.6886
nh$long                                 8.176e-01  1.805e+00   0.453   0.6510
nh$totalpopulation:nh$total             7.043e-09  1.727e-08   0.408   0.6838
nh$totalpopulation:nh$machine          -9.572e-06  7.865e-06  -1.217   0.2249
nh$total:nh$machine                     1.604e-05  7.627e-05   0.210   0.8337
nh$lat:nh$long                         -1.649e-02  4.171e-02  -0.395   0.6929
nh$totalpopulation:nh$total:nh$machine -6.929e-09  1.727e-08  -0.401   0.6887
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 0.1159 on 209 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-Squared: 0.3802,     Adjusted R-squared: 0.3446 
F-statistic: 10.68 on 12 and 209 DF,  p-value: < 2.2e-16

> anova(model4)
Analysis of Variance Table

Response: nh$delta
                                        Df  Sum Sq Mean Sq F value    Pr(>F)
nh$totalpopulation                       1 0.11603 0.11603  8.6443  0.003650 **
nh$total                                 1 0.00695 0.00695  0.5177  0.472632
nh$machine                               1 0.36967 0.36967 27.5398 3.769e-07 ***
nh$unemploymentrate                      1 0.14791 0.14791 11.0191  0.001064 **
nh$percentholdingbachelorsdegree         1 0.62402 0.62402 46.4883 9.718e-11 ***
nh$lat                                   1 0.00133 0.00133  0.0992  0.753048
nh$long                                  1 0.37698 0.37698 28.0843 2.940e-07 ***
nh$totalpopulation:nh$total              1 0.00209 0.00209  0.1559  0.693390
nh$totalpopulation:nh$machine            1 0.07083 0.07083  5.2769  0.022601 *
nh$total:nh$machine                      1 0.00024 0.00024  0.0182  0.892720
nh$lat:nh$long                           1 0.00241 0.00241  0.1795  0.672213
nh$totalpopulation:nh$total:nh$machine   1 0.00216 0.00216  0.1610  0.688670
Residuals                              209 2.80545 0.01342
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
by continuation (continuation pretzel ouvaton point org) on Wed Jan 16th, 2008 at 03:49:47 AM EST
[ Parent ]
Thanks. This means the following: there are three statistically significant effects. In order of decreasing significance:
  • percentage holding bachelor's degrees - with a coefficient in favour of Obama of (46±6)e-4. Assuming the percentage is expressed from 0 to 100 and not from 0 to 1, each 10-point increase in the proportion of people with bachelor's degrees results in a 4.6±0.6 point change in the vote percentage difference in favour of Obama. That scale is consistent with this variable accounting for about 14% of the total variance as per the ANOVA table (its sum of squares divided by the total); on a 0-to-1 scale a coefficient this small could not move the prediction enough to explain that much. This is an extremely significant effect.

  • machines - the presence of machines results in an 8.8±3.5 point swing towards Clinton. This effect is significant at the 98% level (p = 0.014) and accounts for about 8% of the total variance.

  • unemployment rate - with a coefficient in favour of Obama of (5±2)e-4. That is, a 10-point increase in the unemployment rate translates into a 0.5±0.2 point change in the vote percentage difference in favour of Obama. This is significant at the 95% level (p = 0.040) and accounts for about 3% of the total variance.
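
As a rough check of the arithmetic in the list above, here is a minimal Python sketch that turns the coefficients into percentage-point effects. The estimates and standard errors are copied from the summary(model4) output; the helper name `effect_in_points` is made up for illustration. It assumes delta is stored as a fraction between -1 and 1 and that the bachelor's-degree and unemployment variables are on a 0-to-100 scale, neither of which is confirmed by the output itself.

```python
# (estimate, standard error) pairs copied from the summary(model4) output.
coefficients = {
    "percentholdingbachelorsdegree": (-4.559e-03, 6.477e-04),
    "unemploymentrate": (-4.817e-04, 2.334e-04),
    "machine": (8.760e-02, 3.521e-02),
}


def effect_in_points(estimate, std_error, change):
    """Effect on delta, in percentage points, of changing a predictor by `change`.

    ASSUMPTION: delta = clinton - obama is a fraction, so 0.01 is one point.
    Returns (effect, uncertainty); a negative effect favours Obama.
    """
    return estimate * change * 100, std_error * change * 100


# A 10-point change for the two percentage variables, 0 -> 1 for the machine dummy.
changes = {"percentholdingbachelorsdegree": 10, "unemploymentrate": 10, "machine": 1}

for name, (estimate, std_error) in coefficients.items():
    effect, uncertainty = effect_in_points(estimate, std_error, changes[name])
    t_value = estimate / std_error  # matches the "t value" column up to rounding
    print(f"{name}: {effect:+.1f} ± {uncertainty:.1f} points (t = {t_value:.2f})")
```

Under those assumptions this gives roughly -4.6 ± 0.6, -0.5 ± 0.2 and +8.8 ± 3.5 points respectively.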

Note that statistical significance and explanatory power correlate, but it is possible to have statistically significant coefficients not explaining much, and it is possible to have coefficients not significantly different from zero explaining large fractions of the variance.
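
To see how the variance shares fall out of the ANOVA table, here is a small Python sketch using the Sum Sq column copied from the anova(model4) output. It divides each term by the total sum of squares (all terms plus residuals); dividing by the residual sum of squares instead gives larger ratios, about 22%, 13% and 5% for degrees, machines and unemployment. Keep in mind that R's anova() on a single lm fit reports sequential sums of squares, so the shares depend on the order of the terms. The names `sum_sq` and `share_of_total` are made up for illustration.

```python
# Sum Sq column copied from the anova(model4) output (variable names shortened).
sum_sq = {
    "totalpopulation": 0.11603,
    "total": 0.00695,
    "machine": 0.36967,
    "unemploymentrate": 0.14791,
    "percentholdingbachelorsdegree": 0.62402,
    "lat": 0.00133,
    "long": 0.37698,
    "totalpopulation:total": 0.00209,
    "totalpopulation:machine": 0.07083,
    "total:machine": 0.00024,
    "lat:long": 0.00241,
    "totalpopulation:total:machine": 0.00216,
}
residual_sum_sq = 2.80545

# Total variation in delta = everything attributed to terms + what is left over.
total_sum_sq = sum(sum_sq.values()) + residual_sum_sq

# This should reproduce the "Multiple R-Squared" line of the summary output.
r_squared = 1 - residual_sum_sq / total_sum_sq


def share_of_total(term):
    """Fraction of the total sum of squares attributed (sequentially) to `term`."""
    return sum_sq[term] / total_sum_sq


print(f"R-squared: {r_squared:.4f}")
for term in ("percentholdingbachelorsdegree", "machine", "unemploymentrate", "long"):
    print(f"{term}: {share_of_total(term):.1%}")
```

Longitude is a handy illustration of the point about significance versus explanatory power: its coefficient is not significant in the summary table (p = 0.65), yet it takes about 8% of the total variance in the sequential ANOVA.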

We have met the enemy, and he is us — Pogo
by Migeru (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 05:57:10 AM EST
[ Parent ]
Did anyone run gender?
by the stormy present (stormypresent aaaaaaat gmail etc) on Wed Jan 16th, 2008 at 06:27:03 AM EST
[ Parent ]
Apparently that one is missing, as is race. At least they're missing from continuation's regression.

We have met the enemy, and he is us — Pogo
by Migeru (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 06:50:37 AM EST
[ Parent ]
From what I've read elsewhere, it seems there was a significant female swing toward Clinton in NH.  The assumption in leaving it out of these regressions may be that gender wouldn't correlate with geographical distribution, but I'm not sure that would turn out to be true.

Race may not be a huge factor in NH because my anecdotal impression (I've never been there) is that it's pretty darn overwhelmingly white.

by the stormy present (stormypresent aaaaaaat gmail etc) on Wed Jan 16th, 2008 at 07:24:03 AM EST
[ Parent ]
The CNN exit polls showed a 57% female/43% male distribution, as pointed out by Dataguy in his BooTrib diary.

That won't necessarily show up on census data, will it? Though it is worth getting the data just in case a 1% shift in the gender ratio from town to town actually explains something.

We have met the enemy, and he is us — Pogo

by Migeru (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 07:29:28 AM EST
[ Parent ]
It is not unusual for the gender balance in small towns, rural areas and inner cities to skew in one direction or another.  I'm not sure about NH, though.  But it could also be interesting to see if the exit polls break down gender turnout by region, i.e. to see if urban women were more likely to vote than rural or small-town women, or something like that.
by the stormy present (stormypresent aaaaaaat gmail etc) on Wed Jan 16th, 2008 at 07:33:46 AM EST
[ Parent ]
I have one nagging concern with all this.

Can you do a correlation matrix of the predictor variables?
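
For reference, a correlation matrix is just the pairwise Pearson correlation of every predictor with every other, presumably wanted here as a check for collinearity. Here is a minimal pure-Python sketch; the numbers are invented for illustration and are not the New Hampshire data, and `pearson` and `correlation_matrix` are hypothetical helper names. In R the same thing can be had from cor() on the predictor columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict with the correlation of every column against every other."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical predictor columns, one value per town (made-up numbers).
predictors = {
    "totalpopulation": [1200, 3400, 800, 15000, 560, 9800],
    "machine": [0, 1, 0, 1, 0, 1],
    "percentholdingbachelorsdegree": [18.0, 31.5, 22.0, 35.0, 15.5, 28.0],
}

matrix = correlation_matrix(predictors)
for row_name, row in matrix.items():
    print(row_name, " ".join(f"{value:+.2f}" for value in row.values()))
```

Off-diagonal entries close to +1 or -1 would mean two predictors carry nearly the same information, which makes their individual coefficients hard to interpret.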

We have met the enemy, and he is us — Pogo

by Migeru (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 06:49:38 AM EST
[ Parent ]
