Thursday, November 11, 2010

Lightning, Lotteries, and Probability

People often claim you have more chance of being struck by lightning than of winning the lottery. The argument appears to be that one Canadian in 5 million is struck by lightning every year, while your chances of winning the standard 6/49 lottery are about one in 14 million, and one in 5 million is a higher probability than one in 14 million. However, this reasoning is unsound.

The problem is that these two probabilities are not comparable. The estimate of the probability of being hit by lightning is an empirical one, derived from observation, and applies to an entire year's worth of thunderstorms. The estimate of the probability of winning the lottery is a mathematical one, derived from a formula which applies to a single drawing of the lottery.
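The one-in-14-million figure can be checked directly, since it is simply one over the number of ways of choosing 6 numbers from 49. A minimal sketch in Python (the function name is mine, for illustration only):

```python
from math import comb

def lottery_combinations(pool: int = 49, picks: int = 6) -> int:
    """Number of distinct tickets when `picks` numbers are drawn from `pool`."""
    return comb(pool, picks)

combos = lottery_combinations()
print(combos)      # 13983816
print(1 / combos)  # chance that one ticket wins one drawing
```

Note that nothing in this calculation refers to a period of time; it describes a single drawing only.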

We could derive from the first estimate the probability of being struck by lightning at the time the lottery number is drawn, which would provide a fairer comparison (and one which would favour the lottery), but the more important issue is why we would want to do that. The frequency of an event relative to electrocution by lightning is not a standard of worth. For example, the probability that an individual Canadian will become prime minister in the next year is lower than the probability that he or she will be struck by lightning, but no one would conclude that that difference in probabilities tells us anything about the value of the Canadian political system.
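To see why the fairer comparison would favour the lottery, here is a rough sketch of the conversion. It assumes, purely for illustration, that the annual lightning risk is spread evenly over the days of the year and that the comparison window is the whole day of the drawing. Neither assumption comes from the lightning data, and real lightning risk is concentrated in thunderstorm season.

```python
from math import comb

ANNUAL_LIGHTNING_P = 1 / 5_000_000  # annual figure quoted in the post
DAYS_PER_YEAR = 365                 # assumption: risk spread evenly over the year

def lightning_p_for_window(days: float = 1.0) -> float:
    """Approximate chance of being struck within a window of `days` days."""
    return ANNUAL_LIGHTNING_P * days / DAYS_PER_YEAR

lottery_p = 1 / comb(49, 6)                # one ticket, one drawing
lightning_p = lightning_p_for_window(1.0)  # one day's worth of lightning risk

print(lottery_p > lightning_p)  # True under these assumptions
print(lottery_p / lightning_p)  # how many times likelier the lottery win is
```

A window shorter than a day, such as the few minutes of the drawing itself, would tilt the comparison even further toward the lottery.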

More articles at the Actual Analysis site

Lightning, Lotteries, and Probability © 2001, John FitzGerald

Tuesday, November 9, 2010

Mayor of all Toronto except part of it

In yesterday's post I came up with some hypotheses about the vote in the recent Toronto mayoral election. Since then I've refined them a bit and tested them.

I simplified them by reducing the independent variables to two (section of the city and household income) and by hypothesizing only about the vote for the winner, Rob Ford. Hypothesizing about all three major candidates just complicates analysis, and examination of the effects of the independent variables on their votes could be done post hoc to elucidate the effects on Mr. Ford's vote.

I had originally planned to analyze the results by subdivision, but that increased the power of the statistical test so much that almost any difference would have been statistically significant. So I analyzed the results by ward; that decision gave me a nice little sample of 44.

Income was defined as the quartile in which median household income in the ward fell. The sections of the city were the outer suburbs (those wards for which the city limits were part of their land boundaries), the inner suburbs (other wards outside the old City of Toronto as it was before amalgamation in 1998), east Toronto (roughly the old City of Toronto east of Yonge St.), and west Toronto (roughly the old City of Toronto west of Yonge St.).
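For readers who want to see what the quartile coding amounts to, here is a minimal sketch. The ward numbers and incomes are made up for illustration, the function name is hypothetical, and the cut points use Python's default quantile method, which may differ slightly from the method behind the analysis in this post.

```python
from bisect import bisect_left
from statistics import quantiles

def income_quartiles(median_incomes: dict[int, float]) -> dict[int, int]:
    """Map each ward to the quartile (1 = lowest, 4 = highest) of its median income."""
    cuts = quantiles(median_incomes.values(), n=4)  # three cut points
    return {ward: bisect_left(cuts, income) + 1
            for ward, income in median_incomes.items()}

# Hypothetical wards and median household incomes, not real Toronto figures.
example = {1: 40000, 2: 45000, 3: 50000, 4: 55000,
           5: 60000, 6: 65000, 7: 70000, 8: 75000}
print(income_quartiles(example))
```

Coding income this way turns it into a four-level factor, which is what allows it to sit beside section of the city in the same analysis.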

So my new null hypotheses were that Mr. Ford's vote would be affected by neither of the independent variables. I was hoping, though, that it would be affected by the section of the city but not by income. Specifically, I was hoping his vote would be highest in the outer suburbs.

Mr. Ford's vote was not correlated with the total vote in a ward (r = .25; p > .05), so I didn't correct for differences in the number of votes (if they had been correlated, I would have removed the effect of total votes with regression analysis and analyzed the residual vote).
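The significance of a correlation like this one can be checked from r and the sample size alone. The sketch below converts r = .25 with 44 wards into a t statistic and then into a two-tailed probability by integrating the t distribution numerically. The function names are mine, and the numerical integration is just a convenient way of avoiding extra libraries, not necessarily how any statistics package does it.

```python
from math import gamma, pi, sqrt

def t_from_r(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value, integrating the density with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

t = t_from_r(0.25, 44)
p = two_tailed_p(t, 44 - 2)
print(round(t, 2), p > 0.05)  # 1.67 True
```

With only 44 wards, a correlation of .25 is not large enough to reach the .05 level.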

My hopes were dashed. A two-way analysis found that Mr. Ford did do best in the outer suburbs, but not significantly better than in the inner suburbs. The big difference was between the pre-1998 City of Toronto and the rest of the current city. Mr. Ford won 31% of the vote in the old City of Toronto, and 59% elsewhere.

This analysis also found a weak effect of income, but further analysis suggested this was an artefact of random variation in the number of votes cast. Analysis of the residual vote I described earlier found no differences related to median household income.
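The residual vote is what is left of a candidate's vote in each ward once a straight-line fit on the total vote has been removed. A minimal sketch of that step follows. The ward figures are invented, the function name is hypothetical, and this is an illustration of the general technique rather than the exact procedure run for this post.

```python
from statistics import mean

def residuals(x: list[float], y: list[float]) -> list[float]:
    """Residuals of y after a least-squares straight-line fit on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical total votes and candidate votes for five wards.
total_votes = [12000.0, 15000.0, 18000.0, 21000.0, 24000.0]
candidate_votes = [6100.0, 7400.0, 9300.0, 10200.0, 12500.0]

residual_vote = residuals(total_votes, candidate_votes)
print([round(r, 1) for r in residual_vote])
```

The residuals are uncorrelated with the total vote by construction, so any effect of income that survives in them cannot be put down to differences in turnout.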

Analysis of Mr. Smitherman's and Mr. Pantalone's votes confirmed they were the candidates of the pre-1998 City of Toronto. They did better there (and Mr. Smitherman did better only in east Toronto). Ward income was not related to the votes they received.

In general, then, different sections of the city voted differently but income had little if anything to do with the results. Mr. Smitherman, the chief competitor for Mr. Ford, failed to appeal outside the oldest part of the city. Perhaps another popular explanation of the results is correct – Mr. Ford just ran by far the best campaign.

Monday, November 8, 2010

Mayoral strongholds

Torontonians seem to have concluded about their recent mayoral election that the winner was the candidate of the suburbs. I thought a little more detail might help. Here we will look at the wards in which his support, and the support for the other two major candidates, was the strongest.

I did some exploratory analysis examining the percentages each candidate won of the vote in subdivisions, then confirmed it with sorts of the percentages of votes cast in each ward. I came up with four hypotheses I will be testing further:

1. Mr. Ford's support was strongest on the outskirts of the city. His support was strongest in wards 1, 2, 4, 31, and 49, all of which are pretty far from City Hall. All have the city limits as a boundary. Mr. Ford won 67% or more of the vote in these wards.

2. Joe Pantalone was the candidate of the west end of the old city of Toronto. His strongholds -- wards 14, 17, 18, and 19 -- clustered together in the west end. Mr. Pantalone took 20% or more of the vote in these wards.

3. George Smitherman was the candidate of money. Mr. Smitherman had strong support in both Forest Hill (wards 21 and 22) and Rosedale (ward 28).

4. Mr. Smitherman was also the candidate of the east end of the old City of Toronto. His support was strong in wards 30 and 32, which lie side by side along the eastern harbour and the lake. Mr. Smitherman took 50% or more of the vote in the wards in which he was strongest.

As I said, these are just hypotheses so far. I'll be souping up my data file, and then I'll be testing these hypotheses. More soon.


Tuesday, November 2, 2010

Religion and mayoral choice in ward 26

People have been speculating about the effect of religion in the Toronto mayoral election a week ago. The idea is that members of some religions would be less likely to vote for George Smitherman, who is gay and married to another man.

We saw in the last post that voters in Jewish neighbourhoods in Ward 21 were in fact most likely to vote for Mr. Smitherman instead of the other candidates. In this post we'll look at a Muslim neighbourhood, Thorncliffe Park in Ward 26.

As it turned out, Mr. Smitherman did finish second in the polls in Thorncliffe Park. However, he finished first in the rest of the ward. He received 33% of the vote in Thorncliffe Park, and 44% in the rest of the ward. A powerful chi-square test finds this difference to be significant, while the weaker median test I described in the last post doesn't. However, the powerful test estimated an infinitesimal probability that the difference was random, and the weak test estimated that the probability was less than .09, so I'm considering this difference statistically significant.
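To give a feel for how a chi-square test treats a 33% versus 44% split, here is a minimal sketch for a 2 x 2 table (Smitherman or other candidate, by Thorncliffe Park or rest of ward). The vote counts are invented, since only the percentages are reported here, and the sketch uses the plain Pearson statistic with no continuity correction. With one degree of freedom the probability can be taken from the complementary error function.

```python
from math import erfc, sqrt

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # exact tail probability for 1 degree of freedom
    return chi2, p

# Hypothetical counts giving 33% and 44%, at two different sample sizes.
chi2_small, p_small = chi_square_2x2(33, 67, 44, 56)      # 100 votes per area
chi2_large, p_large = chi_square_2x2(330, 670, 440, 560)  # 1,000 votes per area

print(p_small > 0.05)   # True: not significant with few votes
print(p_large < 0.001)  # True: highly significant with many votes
```

The same percentages produce very different probabilities at different sample sizes, which is why a test run on thousands of individual votes is so much more powerful than one run on a handful of subdivisions.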

However, of the eleven percentage points that went missing for Mr. Smitherman in Thorncliffe Park, Mr. Ford picked up only four. Most of the vote Mr. Smitherman lost went to three candidates with Muslim names, none of whom, however, made an issue of their being Muslim. One was an anti-poverty advocate, another a civil-rights advocate (and not the kind that thinks civil rights mean other people should shut up about their -- the advocate's -- religion), and one has campaigned before as an anti-unemployment candidate. They could simply have been taking a greater part in the public life of Thorncliffe Park than the other candidates.

As I concluded before, if religion affected the mayoral vote, it was probably weakly, and in interaction with other variables.
