When A Crowd Isn’t a Crowd
Diversity is essential to animating the collective intelligence that emerges in models like the MATLAB contest, but the existence of diversity isn’t enough. It must also be maintained. Get enough people together—be it in a bar or a chat room—and a mysterious dynamic kicks in. People either accentuate their differences and polarize into opposing camps, or they downplay their differences altogether in order to reach a consensus. Both phenomena have the same net effect: the diversity within the crowd is diminished. Humans have evolved over many millennia into highly social creatures. In many circumstances, our ability to reach an amicable agreement meant the difference between life and death: “A mammoth is charging. Shall we run or poke him with our spears?” But when collective intelligence is in play, as it is in such crowdsourcing models as information markets and problem-solving networks, consensus is an undesirable outcome.
In 2004 James Surowiecki published The Wisdom of Crowds: Why the Many Are Smarter than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. The title of Surowiecki’s book is a winking reference to Charles Mackay’s 1841 classic, Extraordinary Popular Delusions & the Madness of Crowds, a stern indictment of the herd mentality that led to such disasters as the Dutch tulip mania. (Needless to say, the term crowdsourcing owes a debt to both authors.) While theories of group intelligence pre-dated Surowiecki’s book by decades, and in fact had recently come back into vogue in fields as disparate as sociology and business management, The Wisdom of Crowds captured the popular imagination in a way no other work on the subject ever had. The book contained an array of persuasive examples in which the crowd proved itself wiser than its smartest member. How did a crowd of fairgoers in rural England guess the weight of a prize-winning steer within one pound? How did a classroom of students guess the number of jelly beans in the jelly bean jar? How did the audience for the game show Who Wants to Be a Millionaire consistently beat the experts? Through the wisdom of crowds. Such anecdotes have acquired an almost magical patina, entering the collective imagination and becoming fodder for cocktail conversation and water cooler discussion. Unfortunately, in the retelling they have been shorn of Surowiecki’s careful analysis.
In fact there’s no magic to the wisdom of crowds, and the expression itself is a bit misleading. In these examples the crowd was neither wise nor even functioning as a crowd, per se. A crowd implies a group of people acting as a unit, as in “the crowd broke through the barrier and descended upon the author in a fit of hysteria.” Okay, authors don’t generally inspire that degree of unchecked adoration, but you get the idea. The definition of “crowd” is “a group of people united by a common characteristic.” By contrast, collective intelligence is diminished by too many common characteristics. It flourishes in direct proportion to the amount of diversity contained within a group of people, and their ability to express their individual viewpoints. In order to be wise, so to speak, the crowd can’t act like a crowd at all.
There are other conditions that must be met for diversity to trump ability. First, it must be a real pickle of a problem. No one needs a diverse group of individuals to help them tie their shoes. Next, the crowd must have some qualifications to solve the problem at hand. A random collection of subway commuters could hardly be expected to outperform a group of nuclear engineers at designing a more efficient reactor; even Page’s brown socks were pulled from a faculty lounge, not the phone book. There must also be some method of aggregating and processing each individual’s contribution, such as the MATLAB contest’s scoring and ranking engine. Finally, participants must be drawn from a large enough pool to guarantee a diverse array of approaches, and their ability to express their individuality (their “local knowledge”) must not be impaired.
Keeping all this in mind, let’s revisit some of those examples that seemed so counter-intuitive at first glance. Take the case of the jellybean jar. Any more or less random collection of students will differ in background and experience, and thus possess different types of “private information” or local knowledge. The aggregation mechanism, in this case, is simply the teacher’s ability to collect all the estimates and calculate an average. But crucially, the students are asked to write down their guesses without conferring with their neighbors, so they are able to think and act independently. (Gulley’s MATLAB contestants don’t confer so much as they steal from one another. Their relative isolation allows them to retain their diversity.)
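To make the averaging step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a record of any actual classroom: the jar size, the number of students and the spread of their guesses are all invented, and the guesses are simulated as independent errors around the true count. It also checks an identity Page calls the Diversity Prediction Theorem: the crowd's squared error equals the average individual squared error minus the diversity (variance) of the guesses.

```python
import random
from statistics import mean

def crowd_vs_individuals(guesses, truth):
    """Compare the averaged guess with the individual guesses.

    Returns (crowd_guess, crowd_error, avg_individual_error, diversity),
    where all errors are squared errors.
    """
    crowd_guess = mean(guesses)
    crowd_error = (crowd_guess - truth) ** 2
    avg_individual_error = mean([(g - truth) ** 2 for g in guesses])
    # Diversity: how far the guesses are spread around the crowd's own average.
    diversity = mean([(g - crowd_guess) ** 2 for g in guesses])
    return crowd_guess, crowd_error, avg_individual_error, diversity

# Hypothetical setup: 850 beans, 30 students, independent guessing errors.
TRUE_COUNT = 850
rng = random.Random(0)  # fixed seed so the run is repeatable
guesses = [TRUE_COUNT + rng.gauss(0, 200) for _ in range(30)]

crowd_guess, crowd_error, avg_error, diversity = crowd_vs_individuals(
    guesses, TRUE_COUNT
)
print(f"crowd guess:              {crowd_guess:.1f}")
print(f"crowd squared error:      {crowd_error:.1f}")
print(f"avg individual sq. error: {avg_error:.1f}")
print(f"diversity of guesses:     {diversity:.1f}")
```

Because diversity can never be negative, the averaged guess is never worse than the typical individual guess, and the more the guesses differ from one another, the bigger the crowd's advantage. If the students confer and converge, the diversity term shrinks and that advantage goes with it.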
Now let’s look again at a game show audience’s ability to accurately predict over 90 percent of the answers. On the game show in question, Who Wants to Be a Millionaire, contestants are asked a series of fifteen questions of increasing difficulty. If they answer all fifteen correctly they win $1 million. The questions are multiple-choice format, with four possible answers. When they get stuck, contestants ask for a “lifeline.” This means either calling a friend—presumably chosen for their encyclopedic knowledge—for help, or polling the audience. The “experts” perform admirably, getting the correct answer 65 percent of the time. But the audience does far better, guessing the correct answer 91 percent of the time.
This seems deeply impressive. It’s far better than all but the very best contestants do, and it would seem to provide ample evidence that the group is smarter than its smartest individual. But it’s actually just a function of the simplest arithmetic, an illustration that if even a tiny number of individuals possess the correct answer, the group itself will predict accurately. This, Page writes, is because “the mistakes cancel one another out, and correct answers, like cream, rise to the surface.” This can be easily illustrated. Say the question posed to the audience, to use a real example from the show, is whether “Sherpas” and “Gurkhas” are native to (A) Nepal; (B) Morocco; (C) Ecuador; or (D) Russia. Suppose only four percent of the audience knows that the correct answer is (A) Nepal, and the remaining 96 percent can be expected to guess randomly among all four answers. Those random guesses split evenly, 24 percent per answer. The result is that Morocco, Ecuador and Russia will each receive 24 percent of the votes, but Nepal will receive 28 percent (its 24 percent share of the guesses plus the 4 percent who know), cluing our contestant to the correct answer.
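The arithmetic can be sketched in a few lines of Python. This is a small illustration of the expected vote shares only; the function name and its parameters are made up for the example, and it assumes, as above, that everyone who doesn't know the answer guesses uniformly at random.

```python
from fractions import Fraction

def expected_vote_shares(informed_share, options, correct):
    """Expected share of the vote for each option.

    Assumes `informed_share` of the audience picks the correct option and
    everyone else guesses uniformly at random among all the options.
    """
    guess_share = (1 - informed_share) / len(options)
    return {
        option: guess_share + (informed_share if option == correct else 0)
        for option in options
    }

shares = expected_vote_shares(
    Fraction(4, 100), ["Nepal", "Morocco", "Ecuador", "Russia"], "Nepal"
)
for option, share in shares.items():
    print(f"{option}: {float(share):.0%}")
```

These are expected shares. In a real studio audience the random guesses won't split perfectly evenly, so with only a few percent in the know the gap between the right answer and the wrong ones can be blurred by chance; the more people who actually know, the more reliably the right answer stands out.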
And of course, there’s a big difference between guessing trivia questions and improving an algorithm by 1000 degrees of magnitude. The latter isn’t merely impressive—it defies belief. But in fact the same circumstances—diversity and the proper conditions in which to express it—are at work in both examples. On its face MATLAB would seem to attract mostly Mensa-quality programmers. In other words it’s a group that has self-selected according to its proficiency at solving such problems. A studio audience, on the other hand, is a randomly selected group.
It is certainly true that many of the most talented MATLAB programmers participate in the contest. But the best coders have generally all learned the same tricks and shortcuts from years of using the MATLAB computer language. It’s the inexperienced coders, the outsiders who have to come up with their own shortcuts, who make possible the giant cognitive leaps that allow the winning solution to improve on the initial solution by so many orders of magnitude. If great minds think alike (and in many circumstances they do), then they really constitute only one mind. Or as Page puts it, “two heads aren’t better than one when it’s really only one head.” A diverse group of solvers results in many different approaches to a problem. How they apply this to real-world problems, ones far more involved than guiding a salesman through a set of cities, is the subject of our next chapter.


Good post, Jeff. But I'm not sure I agree on the difference in importance between the MatLab-case and the AskTheAudience-case. While it would seem that the former can create differences of a much greater magnitude, the latter clearly delivers a usable level of difference/accuracy at much greater speeds. What I miss here is a discussion about how this could be harnessed and utilized. Some of the existing crowdsourcing-plays clearly do this; I'm thinking in particular about cases where human agents have been used to verify images or aid in scanning results. We're often overly impressed by huge improvements, and miss out on the fact that incremental innovation is the real lifesaver for companies.
Hmm, just thought of something. Couldn't Toyota's impressive innovation policy be read as a kind of internal crowdsourcing? See http://www.newyorker.com/talk/financial/2008/05/12/080512ta_talk_surowiecki
Posted by: | May 14, 2008 at 12:15 AM
As a long time MATLAB programmer and fan, let me first say I appreciate your using that example. I think there *is* a big big difference. I could write down a mathematical model of the jellybean or millionaire example in 5 minutes (they are mostly monte carlo type randomized algorithms with some assumptions that will drive reasonable convergence properties), but the Matlab one is a complex social phenomenon. I haven't participated in the contests, but basically the same thing happened to me on a project team, where I spent a couple of days coding a solution to a problem, which seemed perfect, but didn't work because of an obscure bug. When I finally gave up and quit, my friend and project team mate spent just an hour or so on my code and found the bug I'd missed. It was a single misplaced apostrophe :) The code ran beautifully after that.
Modeling the 'many eyeballs/shallow bugs' is hard. There may be things hidden there that get to the 'smarter in practice than in theory' phenomenon. But in general, the explanation usually given for 'unreasonable effectiveness' effects (such as in neural networks) is a combination of fundamental properties (NNs are universal function/pattern recognizers) along with the 'future is like the past' statistical assumption (or similar ergodicity conditions). This works for crowds too.
And btw, you're going to have to work harder to get comments :) My current ratio on my blog is hovering around 2.5 per post for about 80 posts, and I know I put in a LOT of thought and effort to drive up the commenting culture of my readers.
Posted by: Venkat | May 14, 2008 at 03:41 AM
We've been talking a lot recently with Tommi Vilkamo, the head of Nokia Beta Labs, about the situations in which prediction markets actually work. I am sure there will be a large number of executives wanting to try this on something it might not suit. I'd therefore like to point out some of our thoughts.
There are "problems" where the solution information is too unevenly spread within the population for prediction to work. In such situations, an "open market" approach produces wrong conclusions, since it will be biased toward the opinion that is available to the majority, but which does not represent the whole picture. If a piece of critical information is only available to a handful, the prediction will be biased, but narrowing the "crowd" too much, on the other hand, would lose its crowd-characteristic.
For example, imagine a company wanting to predict which of four possible new product concepts will sell most. In such a situation, if consumers are allowed to predict, the result will only reflect mass opinion (e.g. "what would be coolest"). What if a certain product has a major flaw that is only known to the development team? Even if it was a big company and the prediction was done in-house, the key information might be too unevenly spread.
Another thing that I would like to point out about crowd-problem-solving is that when the quality of a solution can be instantly measured - as in the MATLAB-example - having a crowd work on it is extremely powerful. If, instead, the delay in feedback on a given possible improvement is significant compared to the time span of the whole problem, the crowd loses much of its efficiency. In such cases it would be difficult to identify the current best candidate, which would then spread the resources across different solutions. Also, trial-and-error would not function, so that would leave out much of the script kiddies' contributions ;)
Brilliant blog, will drop by frequently!
Posted by: Ilkka Peltola | May 20, 2008 at 04:48 AM