Welcome to European Tribune. It's gone a bit quiet around here these days, but it's still going.
> That is very interesting. I think I'm going to throw Latitude and Longitude into the regression as explanatory variables.

Did that (thanks to brfox), doesn't cut the mustard.

I also hand-merged the data, correcting town names, and using official data on voting machine usage.

The percentage of population holding bachelor's degrees is now extremely well correlated with Clinton's score (maybe too well in fact...); the Diebold still has an important effect...

You can get a .tar.gz with R scripts and data from the link on the blog entry:

http://call-with-current-continuation.blogspot.com/2008/01/diebold-effect-sticks-around-need.html

Now this is about the limit of my statistical knowledge so I'll let experts talk.

by continuation (continuation pretzel ouvaton point org) on Tue Jan 15th, 2008 at 06:12:41 PM EST
[ Parent ]
Can you use (Clinton% - Obama%) as the response variable rather than just Clinton's score?

We have met the enemy, and he is us — Pogo
by Carrie (migeru at eurotrib dot com) on Tue Jan 15th, 2008 at 06:23:54 PM EST
[ Parent ]
> Can you use (Clinton% - Obama%) as the response variable rather than just Clinton's score?

OK, here it is: delta = clinton - obama

> summary(model4)

Call:
lm(formula = nh$delta ~ nh$totalpopulation * nh$total * nh$machine + 
    nh$unemploymentrate + nh$percentholdingbachelorsdegree + 
    nh$lat * nh$long)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.30281 -0.07168 -0.00144  0.07717  0.40634 

Coefficients:
                                         Estimate Std. Error t value Pr(>|t|)
(Intercept)                             5.929e+01  1.290e+02   0.459   0.6464
nh$totalpopulation                      5.890e-06  7.438e-06   0.792   0.4293
nh$total                               -4.993e-06  7.575e-05  -0.066   0.9475
nh$machine                              8.760e-02  3.521e-02   2.488   0.0136 *
nh$unemploymentrate                    -4.817e-04  2.334e-04  -2.064   0.0403 *
nh$percentholdingbachelorsdegree       -4.559e-03  6.477e-04  -7.038 2.74e-11 ***
nh$lat                                 -1.197e+00  2.982e+00  -0.401   0.6886
nh$long                                 8.176e-01  1.805e+00   0.453   0.6510
nh$totalpopulation:nh$total             7.043e-09  1.727e-08   0.408   0.6838
nh$totalpopulation:nh$machine          -9.572e-06  7.865e-06  -1.217   0.2249
nh$total:nh$machine                     1.604e-05  7.627e-05   0.210   0.8337
nh$lat:nh$long                         -1.649e-02  4.171e-02  -0.395   0.6929
nh$totalpopulation:nh$total:nh$machine -6.929e-09  1.727e-08  -0.401   0.6887
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.1159 on 209 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-Squared: 0.3802,     Adjusted R-squared: 0.3446 
F-statistic: 10.68 on 12 and 209 DF,  p-value: < 2.2e-16

> anova(model4)
Analysis of Variance Table

Response: nh$delta
                                        Df  Sum Sq Mean Sq F value    Pr(>F)
nh$totalpopulation                       1 0.11603 0.11603  8.6443  0.003650 **
nh$total                                 1 0.00695 0.00695  0.5177  0.472632
nh$machine                               1 0.36967 0.36967 27.5398 3.769e-07 ***
nh$unemploymentrate                      1 0.14791 0.14791 11.0191  0.001064 **
nh$percentholdingbachelorsdegree         1 0.62402 0.62402 46.4883 9.718e-11 ***
nh$lat                                   1 0.00133 0.00133  0.0992  0.753048
nh$long                                  1 0.37698 0.37698 28.0843 2.940e-07 ***
nh$totalpopulation:nh$total              1 0.00209 0.00209  0.1559  0.693390
nh$totalpopulation:nh$machine            1 0.07083 0.07083  5.2769  0.022601 *
nh$total:nh$machine                      1 0.00024 0.00024  0.0182  0.892720
nh$lat:nh$long                           1 0.00241 0.00241  0.1795  0.672213
nh$totalpopulation:nh$total:nh$machine   1 0.00216 0.00216  0.1610  0.688670
Residuals                              209 2.80545 0.01342
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
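
For anyone who wants to try the same kind of fit without the tarball, here is a minimal sketch in R. The data are entirely made up, and the column names `bachelor` and `unemp` are placeholders rather than the names in the real data set; only `delta = clinton - obama` and the use of `lm`, `summary` and `anova` come from the output above. The formula is also cut down to three main effects to keep it short.

```r
# Synthetic stand-in for the town-level data; every value here is made up.
set.seed(1)
n <- 200
nh <- data.frame(
  clinton  = runif(n, 0.25, 0.50),  # Clinton's share of the vote (0 to 1)
  obama    = runif(n, 0.25, 0.50),  # Obama's share of the vote (0 to 1)
  machine  = rbinom(n, 1, 0.5),     # 1 = machine-counted, 0 = hand-counted
  bachelor = runif(n, 5, 60),       # % holding a bachelor's degree (hypothetical name)
  unemp    = runif(n, 1, 10)        # unemployment rate (hypothetical name)
)

# Response variable as in the post: delta = clinton - obama
nh$delta <- nh$clinton - nh$obama

# Passing data = nh keeps the "nh$" prefix out of the coefficient names
fit <- lm(delta ~ machine + bachelor + unemp, data = nh)

summary(fit)  # coefficient table with t tests
anova(fit)    # sequential sums of squares, one row per term plus Residuals
```

One thing to keep in mind when reading the two tables side by side: `anova()` on a single `lm` fit reports sequential sums of squares, so its rows depend on the order in which the terms appear in the formula, whereas the t tests in `summary()` do not.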
by continuation (continuation pretzel ouvaton point org) on Wed Jan 16th, 2008 at 03:49:47 AM EST
[ Parent ]
Thanks. This means the following: there are three statistically significant effects. In order of decreasing significance:
  • percentage holding bachelor's degrees - with a coefficient in favour of Obama of 46±6 e-4. Assuming the percentage is expressed from 0 to 100 and not from 0 to 1, each 10% increase in the proportion of people with bachelor's degrees results in a 4.6%±0.6% change in the vote percentage difference in favour of Obama. This is consistent with the fact that this variable explains 22% of the variance as per the ANOVA table, which would not be the case assuming the variable is expressed from 0 to 1. This is an extremely significant effect.

  • machines - the presence of machines results in an 8.8%±3.5% swing towards Clinton. This effect is significant to 98% and explains 13% of the variance.

  • unemployment rate - with a coefficient in favour of Obama of 5±2 e-4. That is, a 10% increase in the unemployment rate translates into a 0.5%±0.2% change in the vote percentage difference in favour of Obama. This is significant to 95% and explains 5% of the variance.

Note that statistical significance and explanatory power correlate, but it is possible to have statistically significant coefficients that explain little, and it is possible to have coefficients not significantly different from zero that explain large fractions of the variance.
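
As a cross-check on the variance figures, here is a short R sketch that recomputes them from the Sum Sq column of the ANOVA table above (the numbers are copied from that table; the shortened term names are just labels). The 22%, 13% and 5% quoted above match each term's Sum Sq divided by the residual Sum Sq. Measured against the total sum of squares instead, the shares come out at roughly 14%, 8% and 3%, and all the terms together add up to the reported R-squared of 0.38.

```r
# Sum Sq column copied from the anova(model4) table
ss <- c(
  totalpopulation = 0.11603, total = 0.00695, machine = 0.36967,
  unemploymentrate = 0.14791, percentholdingbachelorsdegree = 0.62402,
  lat = 0.00133, long = 0.37698,
  "totalpopulation:total" = 0.00209, "totalpopulation:machine" = 0.07083,
  "total:machine" = 0.00024, "lat:long" = 0.00241,
  "totalpopulation:total:machine" = 0.00216
)
ss_resid <- 2.80545
ss_total <- sum(ss) + ss_resid

# Share of the total sum of squares, in percent
share_of_total <- round(100 * ss / ss_total, 1)
share_of_total

# All model terms together reproduce Multiple R-squared
r_squared <- sum(ss) / ss_total
r_squared

# Ratio to the residual sum of squares, in percent
ratio_to_resid <- round(100 * ss / ss_resid, 1)
ratio_to_resid
```

Since these are sequential sums of squares, the split between terms would change if the predictors were entered in a different order.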

We have met the enemy, and he is us — Pogo
by Carrie (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 05:57:10 AM EST
[ Parent ]
Did anyone run gender?
by the stormy present (stormypresent aaaaaaat gmail etc) on Wed Jan 16th, 2008 at 06:27:03 AM EST
[ Parent ]
Apparently that one is missing, as is race. At least they're missing from continuation's regression.

We have met the enemy, and he is us — Pogo
by Carrie (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 06:50:37 AM EST
[ Parent ]
From what I've read elsewhere, it seems there was a significant female swing toward Clinton in NH.  The assumption in leaving it out of these regressions may be that gender wouldn't correlate with geographical distribution, but I'm not sure that would turn out to be true.

Race may not be a huge factor in NH because my anecdotal impression (I've never been there) is that it's pretty darn overwhelmingly white.

by the stormy present (stormypresent aaaaaaat gmail etc) on Wed Jan 16th, 2008 at 07:24:03 AM EST
[ Parent ]
The CNN exit polls showed a 57% female/43% male distribution, as pointed out by Dataguy in his BooTrib diary.

That won't necessarily show up on census data, will it? Though it is worth getting the data just in case a 1% shift in the gender ratio from town to town actually explains something.

We have met the enemy, and he is us — Pogo

by Carrie (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 07:29:28 AM EST
[ Parent ]
It is not unusual for the gender balance in small towns, rural areas and inner cities to skew in one direction or another.  I'm not sure about NH, though.  But it could also be interesting to see if the exit polls break down gender turnout by region, i.e. to see if urban women were more likely to vote than rural or small-town women, or something like that.
by the stormy present (stormypresent aaaaaaat gmail etc) on Wed Jan 16th, 2008 at 07:33:46 AM EST
[ Parent ]
I have one nagging concern with all this.

Can you do a correlation matrix of the predictor variables?
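
In case it helps, a minimal sketch of what that could look like in R. The data frame below is made up purely so the example runs (votes cast and machine use are tied to town size by construction, and `bachelor` is a placeholder name); with the real data one would pass the actual predictor columns to `cor` instead.

```r
# Made-up predictors standing in for the real town-level columns
set.seed(42)
n <- 200
totalpopulation <- round(exp(rnorm(n, mean = 8, sd = 1)))
predictors <- data.frame(
  totalpopulation = totalpopulation,
  # votes cast, tied to population by construction
  total    = round(totalpopulation * runif(n, 0.2, 0.4)),
  # in this toy data, larger towns are the ones using machines
  machine  = as.integer(totalpopulation > median(totalpopulation)),
  # % holding a bachelor's degree, independent of the rest here
  bachelor = runif(n, 5, 60)
)

# Pairwise Pearson correlations; use = "complete.obs" drops rows with any NA
cm <- cor(predictors, use = "complete.obs")
round(cm, 2)
```

Large off-diagonal entries would be a sign that the individual coefficients for those predictors are hard to separate from one another.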

We have met the enemy, and he is us — Pogo

by Carrie (migeru at eurotrib dot com) on Wed Jan 16th, 2008 at 06:49:38 AM EST
[ Parent ]
