Actual Analysis blog: 2010

Friday, December 17, 2010

The future belongs to the Swiss Federal Institute of Technology

The Swiss Federal Institute of Technology in Zurich has a plan to simulate the entire world. According to this report the plan is to "gather data about the planet in unheard of detail, use it to simulate the behaviour of entire economies and then to predict and prevent crises from emerging."

Back when I was being trained in research, this type of project was always held up to us as an example of what not to do. Unfortunately, many people today believe that if you collect massive amounts of data the Truth will emerge from it. Dream on.

There are many problems with this approach. One of the chief ones is that the process is entirely inductive. Relations are identified in data from the past, and then extrapolated to the future. These relationships may hold in the future, or they may not. And if they do hold in the future, they may not hold forever.

I often think that it would help decision-makers if they spent some time handicapping horse races. Believe me, if you see that horses on the rail have been winning all week, that is no guarantee that they are going to win for you today. There is no reason they should.

The basic problem here is the lack of a theoretical approach. Ordinarily you start with a theory, or at least a hypothesis, about how something in the world works and then you collect the data necessary to establish whether your theory or hypothesis stands up to empirical test. If you confirm the theory you then try to modify it to increase its explanatory power. What the Swiss Federal Institute of Technology is trying to do is to get the data to think up the theory for them. But data don't have brains. The theories they come up with are going to lead you down a lot of paths that go nowhere. Any dataset is full of relationships, many — and sometimes all — of which are spurious, products of random variation or of systematic bias. Spurious relationships are not a foundation of successful forecasting.
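The point about spurious relationships can be illustrated with a small simulation. This is a sketch for illustration only, not anything the Swiss project does: it generates variables that are pure random noise, correlates every pair, and counts how many pairs pass a conventional .05 test. The number of variables, the number of observations, and the seed are arbitrary assumptions, and the test uses the Fisher z approximation.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_vars=20, n_obs=50, seed=1):
    """Count pairs of pure-noise variables that look 'significant' at .05.

    All sizes and the seed are arbitrary assumptions for the illustration.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = 0
    pairs = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r = pearson_r(data[i], data[j])
            # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly
            # standard normal when the true correlation is zero.
            z = math.atanh(r) * math.sqrt(n_obs - 3)
            pairs += 1
            if abs(z) > 1.96:
                hits += 1
    return hits, pairs


hits, pairs = count_spurious()
print(f"{hits} of {pairs} unrelated pairs pass the .05 test")
```

None of these relationships is real by construction, yet about one pair in twenty can be expected to pass the test, and the more variables you collect the more such pairs there are to find.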

Another important problem is that typically the relationships you find between the variables of the type that are going to be collected for this project are weak. Most of the variation in them is due to the effects of other variables that you usually don't have measures of. That means that predictions from your model will be at best only grossly approximate. Among other things, the developers of this model of everything want to predict economic bubbles and collapses. However, since these predictions are almost certain to be only grossly approximate, they will offer little guide to policy. If you remember last winter, you remember the extremes governments went to after an H1N1 pandemic was predicted, and how unnecessarily expensive (and ineffective) they were.
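To put a rough number on "weak": in a simple linear model, a correlation of r accounts for r squared of the variance in the outcome, and the typical prediction error shrinks only to the square root of (1 - r squared) times the outcome's standard deviation. The sketch below works that arithmetic through; the correlations used are illustrative assumptions, not figures from the project.

```python
import math


def variance_explained(r):
    """Proportion of the outcome's variance accounted for by a correlation r."""
    return r ** 2


def relative_prediction_error(r):
    """SD of prediction errors as a fraction of the outcome's SD
    (simple linear regression on one predictor)."""
    return math.sqrt(1 - r ** 2)


# Illustrative correlations only (assumed, not taken from any dataset).
for r in (0.1, 0.3, 0.5):
    print(
        f"r = {r:.1f}: {variance_explained(r):.0%} of variance explained, "
        f"prediction error still {relative_prediction_error(r):.0%} of the SD"
    )
```

Even a correlation of .5 leaves the prediction error at well over four-fifths of what it would be if you simply guessed the average every time.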

But the European Union is sinking a billion euros into this venture. I'd wish them good luck, but I'm confident that even with the best luck possible this project is going to fail, and fail miserably.


Friday, December 3, 2010

Gross national happiness

Canada is apparently considering joining the group of countries that assess their gross national happiness. Happiness is one of those concepts that has always interested me because a) so many people think it's extremely important, and b) so few people even attempt to define it. Love is a similar concept.

In fact the definition of gross national happiness is vague. The project seems chiefly to be an attempt to link population characteristics to feelings of well-being. Why you'd want to do that is a mystery to me. Sure, they've found that countries with low rates of infant mortality have happier citizens, but surely we don't justify fighting infant mortality as necessary to keep the public happy.

Similarly, an assessment of the adequacy of a country's economy has been proposed as an indicator of happiness, but if people are happy with an unsound economy are we to take that as a Good Thing? That approach doesn't seem to have worked too well in the United States, where many people were astonishingly proud of their economy until it came crashing down a few years ago.

I'm certain — or at least I'd like to be certain — that our government doesn't intend the assessment of happiness to be a guide to policy. If that's their intent, they should logically end up doing things like legalizing marijuana — that makes lots of people happy. If they don't intend it to be a guide to policy (and there's no good reason they should), there's no good reason to assess national happiness at all.

Gross national happiness © 2010, John FitzGerald
Actual Analysis website

Thursday, November 11, 2010

Lightning, Lotteries, and Probability

People often claim you have more chance of being struck by lightning than of winning the lottery. The argument appears to be that one Canadian in 5 million is struck by lightning every year, while your chances of winning the standard 6/49 lottery are about one in 14 million, and one in 5 million is a higher probability than one in 14 million. However, this reasoning is unsound.

The problem is that these two probabilities are not comparable. The estimate of the probability of being hit by lightning is an empirical one, derived from observation, and applies to an entire year's worth of thunderstorms. The estimate of the probability of winning the lottery is a mathematical one, derived from a formula which applies to a single drawing of the lottery.

We could derive from the first estimate the probability of being struck by lightning at the time the lottery number is drawn, which would provide a fairer comparison (and one which would favour the lottery), but the more important issue is why we would want to do that. The frequency of an event relative to electrocution by lightning is not a standard of worth. For example, the probability that an individual Canadian will become prime minister in the next year is lower than the probability that he or she will be struck by lightning, but no one would conclude that that difference in probabilities tells us anything about the value of the Canadian political system.
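The fairer comparison mentioned above can be sketched in a few lines. The lightning figure is the one quoted earlier in this post; the length of the draw (one minute) and the even spread of lightning risk over the year are assumptions made only to get a rough number, and the second of them is plainly untrue, since thunderstorms are seasonal.

```python
import math

# Figure quoted in the post: one Canadian in 5 million per year.
LIGHTNING_PER_YEAR = 1 / 5_000_000

# Exact odds of matching all six numbers in a 6/49 draw with one ticket.
combinations = math.comb(49, 6)  # 13,983,816 possible selections
lottery_per_draw = 1 / combinations

# ASSUMPTION: the draw occupies a one-minute window, and lightning risk is
# spread evenly across every minute of the year.
MINUTES_PER_YEAR = 365 * 24 * 60
lightning_during_draw = LIGHTNING_PER_YEAR / MINUTES_PER_YEAR

print(f"Win one 6/49 draw:          1 in {combinations:,}")
print(f"Lightning during the draw:  1 in {round(1 / lightning_during_draw):,}")
```

Under these assumptions the lottery win is by far the likelier event during the draw, which is the sense in which the fairer comparison favours the lottery.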


Tuesday, November 9, 2010

Mayor of all Toronto except part of it

In yesterday's post I came up with some hypotheses about the vote in the recent Toronto mayoral election. Since then I've refined them a bit and tested them.

I simplified them by reducing the independent variables to two – section of the city and household income, and by hypothesizing only about the vote for the winner, Rob Ford. Hypothesizing about all three major candidates just complicates analysis, and examination of the effects of the independent variables on their votes could be done post hoc to elucidate the effects on Mr. Ford's vote.

I had originally planned to analyze the results by subdivision, but that increased the power of the statistical test so much that almost any difference would have been statistically significant. So I analyzed the results by ward; that decision gave me a nice little sample of 44.

Income was defined as the quartile in which median household income in the ward fell. The sections of the city were the outer suburbs (those wards for whom the city limits were part of their land boundaries), the inner suburbs (other wards outside the old City of Toronto as it was before amalgamation in 1998), east Toronto (roughly the old City of Toronto east of Yonge St.), and west Toronto (roughly the old City of Toronto west of Yonge St.).
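The income variable can be sketched as follows. This is only an illustration of assigning each ward the quartile its median household income falls in; the ward labels and income figures are hypothetical, and the cut points come from Python's default quantile method, which may differ slightly from whatever method was used for the analysis.

```python
from bisect import bisect_left
from statistics import quantiles


def income_quartile_by_ward(median_income):
    """Map each ward to the quartile (1 = lowest, 4 = highest) of its
    median household income, relative to the other wards."""
    # Three cut points dividing the wards into four groups.
    cuts = quantiles(median_income.values(), n=4)
    return {
        ward: 1 + bisect_left(cuts, income)
        for ward, income in median_income.items()
    }


# HYPOTHETICAL wards and incomes, for illustration only.
example_incomes = {
    "ward_A": 40_000, "ward_B": 50_000, "ward_C": 60_000, "ward_D": 70_000,
    "ward_E": 80_000, "ward_F": 90_000, "ward_G": 100_000, "ward_H": 110_000,
}
example_quartiles = income_quartile_by_ward(example_incomes)
print(example_quartiles)
```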

So my new null hypotheses were that Mr. Ford's vote would be affected by neither of the independent variables. I was hoping, though, that it would be affected by the section of the city but not by income. Specifically, I was hoping his vote would be highest in the outer suburbs.

Mr. Ford's vote was not correlated with the total vote in a ward (r = .25; p > .05), so I didn't correct for differences in the number of votes (if they had been correlated, I would have removed the effect of total votes with regression analysis and analyzed the residual vote).
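A check of this kind can be approximated from the two numbers reported, r = .25 and the 44 wards. The sketch below uses the Fisher z approximation as a stand-in; the post does not say which test produced p > .05, so treat this as one reasonable way to reproduce the conclusion rather than the calculation actually used.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-tailed p-value for a Pearson correlation r from a
    sample of size n, using the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# r and n as reported in the post: r = .25 across 44 wards.
p = correlation_p_value(0.25, 44)
print(f"p is approximately {p:.2f}")
```

With only 44 wards, a correlation of .25 falls short of the .05 criterion, which is consistent with the decision not to correct for the total vote.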

My hopes were dashed. A two-way analysis found that Mr. Ford did do best in the outer suburbs, but not significantly better than in the inner suburbs. The big difference was between the pre-1998 City of Toronto and the rest of the current city. Mr. Ford won 31% of the vote in the old City of Toronto, and 59% elsewhere.

This analysis also found a weak effect of income, but further analysis suggested this was an artefact of random variation in the number of votes cast. Analysis of the residual vote I described earlier found no differences related to median household income.
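The residual vote referred to here is what is left of a candidate's vote after the part predictable from the total vote has been removed by regression. A minimal sketch, using made-up ward totals purely for illustration:

```python
def residual_vote(total_votes, candidate_votes):
    """Residuals from a least-squares regression of candidate_votes on
    total_votes: the part of the vote not predictable from turnout."""
    n = len(total_votes)
    mean_x = sum(total_votes) / n
    mean_y = sum(candidate_votes) / n
    slope = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(total_votes, candidate_votes)
    ) / sum((x - mean_x) ** 2 for x in total_votes)
    intercept = mean_y - slope * mean_x
    return [
        y - (intercept + slope * x)
        for x, y in zip(total_votes, candidate_votes)
    ]


# HYPOTHETICAL ward figures, not the actual election results.
example_totals = [12_000, 15_000, 18_000, 21_000, 24_000]
example_ford = [5_000, 7_500, 8_000, 11_000, 11_500]
example_residuals = residual_vote(example_totals, example_ford)
print([round(e) for e in example_residuals])
```

The residuals are uncorrelated with the total vote by construction, so any effect of income that survives in them cannot be a by-product of differences in the number of votes cast.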

Analysis of Mr. Smitherman's and Mr. Pantalone's votes confirmed they were the candidates of the pre-1998 City of Toronto. They did better there (and Mr. Smitherman did better only in east Toronto). Ward income was not related to the votes they received.

In general, then, different sections of the city voted differently but income had little if anything to do with the results. Mr. Smitherman, the chief competitor for Mr. Ford, failed to appeal outside the oldest part of the city. Perhaps another popular explanation of the results is correct – Mr. Ford just ran by far the best campaign.

Monday, November 8, 2010

Mayoral strongholds

Torontonians seem to have concluded about their recent mayoral election that the winner was the candidate of the suburbs. I thought a little more detail might help. Here we will look at the wards in which his support, and the support for the other two major candidates, was the strongest.

I did some exploratory analysis examining the percentages each candidate won of the vote in subdivisions, then confirmed it with sorts of the percentages of votes cast in each ward. I came up with four hypotheses I will be testing further:

1. Mr. Ford's support was strongest on the outskirts of the city. His support was strongest in wards 1, 2, 4, 31, and 49, all of which are pretty far from City Hall. All have the city limits as a boundary. Mr. Ford won 67% or more of the vote in these wards.

2. Joe Pantalone was the candidate of the west end of the old city of Toronto. His strongholds -- wards 14, 17, 18, and 19 -- clustered together in the west end. Mr. Pantalone took 20% or more of the vote in these wards.

3. George Smitherman was the candidate of money. Mr. Smitherman had strong support in both Forest Hill (wards 21 and 22) and Rosedale (ward 28).

4. Mr. Smitherman was also the candidate of the east end of the old City of Toronto. His support was strong in wards 30 and 32, which lie side by side along the eastern harbour and the lake. Mr. Smitherman took 50% or more of the vote in the wards in which he was strongest.

As I said, these are just hypotheses so far. I'll be souping up my data file, and then I'll be testing these hypotheses. More soon.


Tuesday, November 2, 2010

Religion and mayoral choice in ward 26

People have been speculating about the effect of religion in the Toronto mayoral election a week ago. The idea is that members of some religions would be less likely to vote for George Smitherman, who is gay and married to another man.

We saw in the last post that voters in Jewish neighbourhoods in Ward 21 were in fact most likely to vote for Mr. Smitherman instead of the other candidates. In this post we'll look at a Muslim neighbourhood, Thorncliffe Park in Ward 26.

As it turned out, Mr. Smitherman did finish second in the polls in Thorncliffe Park. However, he finished first in the rest of the ward. He received 33% of the vote in Thorncliffe Park, and 44% in the rest of the ward. A powerful chi-square test finds this difference to be significant, while the weaker median test I described in the last post doesn't. However, the powerful test estimated an infinitesimal probability that the difference was random, and the weak test estimated that the probability was less than .09, so I'm considering this difference statistically significant.
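Why the chi-square test on vote counts is so powerful can be seen from a sketch. The 33% and 44% shares are the ones reported above, but the vote totals below are hypothetical, since the actual counts are not given in this post. The same two percentages are tested once with thousands of votes and once with a hundred in each group.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (chi2, p) with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def compare_shares(share_1, total_1, share_2, total_2):
    """Test whether two vote shares differ, given each area's total votes."""
    a = round(share_1 * total_1)
    c = round(share_2 * total_2)
    return chi_square_2x2(a, total_1 - a, c, total_2 - c)


# Shares from the post; the totals are HYPOTHETICAL.
chi2_big, p_big = compare_shares(0.33, 3_000, 0.44, 12_000)
chi2_small, p_small = compare_shares(0.33, 100, 0.44, 100)
print(f"thousands of votes: chi-square = {chi2_big:.1f}, p = {p_big:.1e}")
print(f"100 votes per area: chi-square = {chi2_small:.2f}, p = {p_small:.2f}")
```

With thousands of votes an eleven-point gap gives a vanishingly small p-value; with a hundred votes in each group the identical gap does not reach .05. The size of the difference is the same in both cases; only the power of the test changes.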

However, of the eleven percentage points that went missing for Mr. Smitherman in Thorncliffe Park, Mr. Ford picked up only four. Most of the vote Mr. Smitherman lost went to three candidates with Muslim names, none of whom, however, made an issue of their being Muslim. One was an anti-poverty advocate, another a civil-rights advocate (and not the kind that thinks civil rights mean other people should shut up about their -- the advocate's -- religion), and one has campaigned before as an anti-unemployment candidate. They could simply have been taking a greater part in the public life of Thorncliffe Park than the other candidates.

As I concluded before, if religion affected the mayoral vote, it was probably weakly, and in interaction with other variables.


Saturday, October 30, 2010

Income and mayoral choice in Ward 21

In the previous post about the municipal elections in Toronto I concluded that Joe Mihevc's support in his successful campaign to be re-elected as councillor for Ward 21 was not affected by his involvement in the St. Clair streetcar right-of-way controversy. Instead it seemed that he was getting his support from the less affluent parts of the ward (many of which were most affected by the right-of-way), while his chief opponent, Shimmy Posen, was getting his from the more affluent areas.

There has been some talk about the results of the mayoral race reflecting increased disparities in income, so I thought I'd look at how income was related to the mayoral vote in Ward 21. My hypothesis was that Rob Ford, the successful candidate for mayor, would have won most of the subdivisions Mr. Posen did, and few of the subdivisions Mr. Mihevc won.

Mr. Ford did win a significantly^* larger proportion of the mayoral vote in subdivisions won by Mr. Posen than in subdivisions won by Mr. Mihevc -- 44% in Mr. Posen's subdivisions and 29% in Mr. Mihevc's. However, in the subdivisions won by Mr. Posen, Mr. Ford was outpolled by another candidate, George Smitherman. Mr. Ford received 1,844 votes in those subdivisions, while Mr. Smitherman received 1,942.

Obviously Mr. Ford appealed to affluent voters more than he did to less affluent voters. However, the results in Ward 21 suggest his victory was not chiefly due to a main effect of income. Income probably had its effect in combination with some other variable.

^* I didn't use the raw vote counts for the statistical test, since almost any difference would have been significant. Instead I used a median test comparing the percentages of the vote Mr. Ford won in each subdivision (p < .01).
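A median test of this kind can be sketched as follows. Each subdivision counts as one observation: it is classified as above or not above the median of all the percentages, and the resulting two-by-two table is tested with chi-square. The percentages below are invented for illustration (ten subdivisions in one group and twenty in the other), and the version shown, without a continuity correction, is an assumption about the details rather than the exact procedure used.

```python
import math
from statistics import median


def median_test(group_1, group_2):
    """Median test for two groups of values.

    Returns ((a, b, c, d), chi2, p), where a and c count the values above
    the grand median in group_1 and group_2, and b and d count the rest.
    Values equal to the grand median are counted as not above it.
    """
    grand_median = median(list(group_1) + list(group_2))
    a = sum(v > grand_median for v in group_1)
    b = len(group_1) - a
    c = sum(v > grand_median for v in group_2)
    d = len(group_2) - c
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Upper tail of chi-square with 1 degree of freedom.
    return (a, b, c, d), chi2, math.erfc(math.sqrt(chi2 / 2))


# HYPOTHETICAL percentages of the vote won in each subdivision.
group_posen = [52, 47, 45, 44, 43, 41, 40, 39, 38, 30]
group_mihevc = [36, 35, 34, 33, 33, 32, 31, 31, 30, 30,
                29, 29, 28, 28, 27, 27, 26, 25, 24, 23]
table, chi2, p = median_test(group_posen, group_mihevc)
print(table, round(chi2, 2), round(p, 4))
```

Because the test sees only thirty observations rather than thousands of individual votes, a difference has to be substantial before it registers as significant.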


Thursday, October 28, 2010

A streetcar named Doesn't Matter

I have just downloaded summaries of last Monday's Toronto municipal elections from the City of Toronto's open data site. A contentious issue in Ward 21, and a controversial issue throughout the city, has been the renovation of the streetcar line along St. Clair Avenue West. The incumbent councillor, Joe Mihevc, was blamed by many for problems with the renovation. Although he was re-elected, I wanted to see if the renovation had affected where he got his support.

Only two candidates, Mr. Mihevc and Shimmy Posen, won any of the polling subdivisions; Mr. Mihevc won 20 and Mr. Posen 10. My idea was that if the streetcar-line renovation had affected his support, Mr. Mihevc would have drawn his support from polling subdivisions away from St. Clair Ave.

On the map of the ward below, subdivisions won by Mr. Mihevc are shown in red and those won by Mr. Posen in blue. Subdivisions outside the ward are in grey. St. Clair Avenue is marked by the black lines extending beyond the borders of the ward.

Clearly, Mr. Mihevc's support was strong along St. Clair Avenue. The variable that chiefly determined support was income, with Mr. Posen's strength almost entirely in the affluent neighbourhoods north of Nordheimer/Cedarvale Ravine, and Mr. Mihevc's chiefly in the south. However, Mr. Mihevc won some well-off subdivisions near St. Clair West as well. Despite all the problems created by the renovation of the streetcar line, problems which were raised by the successful mayoral candidate at an all-candidates' meeting in the heart of Mihevc territory just before the election, St. Clair West remained part of Joe Mihevc's stronghold.