Well, Watson beat the human champions in the first game of the Jeopardy! face-off between man and machine, with a score of $35,734 to $10,400 for Brad Rutter and $4,800 for Ken Jennings. But Watson’s developers were puzzled by his flub in the Final Jeopardy! segment. The category was US Cities, and the answer was: “Its largest airport was named for a World War II hero; its second largest, for a World War II battle.” The two human contestants wrote “What is Chicago?” for its O’Hare and Midway, but Watson’s response was a lame “What is Toronto???”

How could the machine have been so wrong? David Ferrucci, the manager of the Watson project at IBM Research, explained during a viewing of the show on Monday morning that several things probably confused Watson. First, the category names on Jeopardy! are tricky. The answers often do not exactly fit the category. Watson, in his training phase, learned that categories only weakly suggest the kind of answer that is expected, and, therefore, the machine downgrades their significance. The way the language was parsed provided an advantage for the humans and a disadvantage for Watson, as well. “What US city” wasn’t in the question. If it had been, Watson would have given US cities much more weight as it searched for the answer. Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team. It probably picked up those facts from the written material it has digested. Also, the machine didn’t find much evidence to connect either city’s airport to World War II. (Chicago was a very close second on Watson’s list of possible answers.) So this is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine.
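
Ferrucci's point about category weighting can be made concrete with a toy scoring model. The sketch below is purely illustrative: the feature names, scores, and weights are invented for this example and are not Watson's actual features or values. It shows how a candidate that fits the category can still finish a close second when the category feature carries little weight, and how the ranking flips when the category is weighted heavily.

```python
# Toy illustration only: every feature name, score, and weight here is hypothetical.

def score(candidate, weights):
    """Weighted sum of a candidate's evidence features."""
    return sum(weights[name] * value for name, value in candidate["features"].items())

def best_answer(candidates, weights):
    """Return the name of the highest-scoring candidate."""
    return max(candidates, key=lambda c: score(c, weights))["name"]

candidates = [
    # Chicago fits the category perfectly but has slightly weaker text evidence.
    {"name": "Chicago", "features": {"category_match": 1.0, "text_evidence": 0.40}},
    # Toronto fits the category poorly but has slightly stronger text evidence.
    {"name": "Toronto", "features": {"category_match": 0.3, "text_evidence": 0.45}},
]

# A system that has learned to mostly ignore the category.
low_category_weight = {"category_match": 0.05, "text_evidence": 1.0}
# A system told that the category matters a lot (as if "What US city" were in the clue).
high_category_weight = {"category_match": 0.50, "text_evidence": 1.0}

print(best_answer(candidates, low_category_weight))
print(best_answer(candidates, high_category_weight))
```

With these made-up numbers the low category weight picks Toronto and the high category weight picks Chicago, which is the kind of flip Ferrucci describes.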

The mistake actually encouraged Ferrucci. “It’s goodness,” he said. Watson knew it did not know the right answer with any confidence. Its confidence level was about 30%. So it was right about that. Moreover, Watson has learned how the categories work in Jeopardy! It understands some of the subtleties of the game, and it doesn’t make simplistic assumptions. Think about how Watson could be used in medicine, as a diagnostic aid. A patient may describe to a doctor a certain symptom or a high level of pain, which, on the surface, may seem to be an important clue to the cause of the ailment. But Watson may know from looking at a lot of data that that symptom or pain isn’t the key piece of evidence, and could alert the doctor to be aware of other factors.

(By the way, there are many fields where Watson could help out. IBM general counsel Robert Weber describes how Watson might be used in the legal profession in a guest blog posting on The National Law Journal Web site. Anne K. Altman, general manager, Global Public Sector, talks about how Watson could be helpful to government in a posting on Government Technology magazine’s blog.)

Another encouraging sign: Watson bet intelligently, just $947, so it still won the game by a wide margin. “That’s smart,” Ferrucci said. “You’re in the middle of the contest. Hold onto your money. Why take a risk?”
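
The arithmetic behind the small bet can be checked from the figures in this post. The sketch below assumes the reported $35,734 is Watson's total after the missed Final Jeopardy! response; the function name is made up for illustration.

```python
def settle_wager(score_before, wager, correct):
    """Add the wager on a correct response, subtract it on a wrong one."""
    return score_before + wager if correct else score_before - wager

WATSON_FINAL = 35_734   # reported end-of-game score (assumed to be after the miss)
WAGER = 947             # reported Final Jeopardy! bet
RUTTER_FINAL = 10_400   # reported end-of-game score of the closest human

# Watson missed, so its score going into Final Jeopardy! was the final score plus the wager.
watson_before = WATSON_FINAL + WAGER
assert settle_wager(watson_before, WAGER, correct=False) == WATSON_FINAL

# Lead over the closest human after the miss.
lead_after_miss = WATSON_FINAL - RUTTER_FINAL
print(watson_before, lead_after_miss)
```

Under that assumption Watson went into the final clue with $36,681 and still led by $25,334 after the wrong response.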

Watson may not have much of a sense of humor, but Ferrucci sure does. He wore a Toronto Blue Jays jacket to the Jeopardy! viewing.

February 16, 2011
3:02 am

After watching Watson answer the questions on the board as it did, I am awestruck. I am a junior studying computer science at a California State University, and after watching this performance am very compelled to apply for an internship at IBM. In an Artificial Intelligence class I took last semester, we discussed several issues with natural language processing. Seeing IBM overcome these hurdles with such speed and accuracy amazes me. Even when it misses a question, seeing why it missed is just as interesting to me as when it is right.

Deep Blue was an amazing project, but after seeing the performance of IBM this time around with Watson, the bar is set to an all-new high. I would be honored to be a part of the development team in any way, shape, and/or form. This week in mid-February 2011 is history in the making; I’ll be sure to tell my kids about this when IBM unveils their next great achievement.

Posted by: Garrison Reeves IV

February 16, 2011
2:30 am

Haaaa!…(Sorry, growing up in Chicago, that is classic!)…and Sorry IBM (& my Dad used to work 4 you guys building main frames in the 70’s) ;P

Posted by: MaicohSasha

February 16, 2011
2:18 am

February 16, 2011
2:10 am

Who the heck cares about who wins and who loses? I certainly don’t, nor do I care about how fast the button gets pushed. This moment is a milestone in computing/programming, and those who worked long and hard at the task understand that the goal was to parse human speech, or more or less have a computer make sense of it. They won! Whether or not Watson rules the day on Jeopardy is secondary…all it has to do is put up a reasonably good fight.

Posted by: srbond

February 16, 2011
1:23 am

Several points:

1. ‘Watson’ doesn’t choose answers. Hundreds of programmers worked on dozens of algorithmic routines that prioritize searches and grade correlation probabilities. Although those threads come to a conclusion, it’s actually the work of all those human programmers’ code working in concert.

2. The programmer who worked on the ‘category’ routine didn’t properly distinguish concrete labels (U.S. Cities) from abstract puns or euphemisms. The routine had plenty of time to prioritize all U.S. city characteristics, but didn’t.

3. The links in comments don’t provide any concrete information on when Watson starts processing. It might take a human 2-3 seconds to read the visual clue and another 3-5 seconds for it to be read aloud. If the computer gets an instantaneous direct text feed of the ‘answer’, it has a 5-8 second ‘head start’ on the humans. Since we don’t see any visualization of Watson arranging priorities, I suspect he has a systemic time advantage.

4. Fine, the computer has to ‘physically push’ a button, but that’s done with a servo that responds to an electric impulse in microseconds. It takes a human at least a few milliseconds to move their thumb. The same applies to the ‘enable’ signal, which is almost instantaneous for the computer, but takes at least a few milliseconds for humans to perceive. There ought to be a ‘perception lag’ factor to equalize computer and human response times.

5. Whoever wrote the Final Jeopardy routine did a good job. Given the standings, Watson could have bet $1.00, but $947 entailed absolutely no risk.

6. It is an excellent ‘outing’ for the programming team. If I recall correctly, Kasparov beat Deep Blue several times before the last chess match defeat. It’s also a good opportunity to familiarize an audience with computer technology. BUT … they should be talking to *programmers*, not executives.

Posted by: Westmiller

February 16, 2011
12:26 am

@Mike. That’s a good point, but Watson doesn’t consider categories in specific. Instead, it considers categories in general. What I mean is that IBM did not stuff the computer with all the information it would need and then throw countless sample questions at it like the CYC project does. Although to be honest, there was a significant amount of stuffing of facts and examples into it.

Instead, Watson attempts to catch the multifaceted nuances of natural language and develop an ‘understanding’ of the context and use all the information that it has in order to develop a meaningful answer. Experience has shown the Watson team that categories (in general) may not be a good indicator of the answer.

What’s interesting is not the failure to give the category the proper weight as the main clue, but the idea of how it came up with its incorrect answer. Unlike previous attempts at A.I., Watson is not a parrot with a series of tricks in order to make the viewer think that it’s cognizant. Dictating to the system that categories about states mean that the answer is going to be a state would be more a show of brute-force programming than actual A.I. or machine learning.


Posted by: Lou

February 16, 2011
12:16 am

“Why would a baseball team name be more important than the number of airports that a town has, for instance?”

Don’t forget Toronto’s smaller airport was recently named for Billy Bishop. That does relate to the question. I wonder if Watson put a little extra weight on that because the name announcement would have been in the news within the last few years.

Posted by: K Stricker

February 15, 2011
11:47 pm

I’m surprised to hear you say that the category title has little bearing on the correct answers. I wonder, for instance, in the history of Jeopardy (or in Watson’s training matches) what fraction of the correct responses in the “U.S. cities” category (which is a fairly common category) have actually been cities in which the largest and most notable with that name was NOT in the U.S.? And of all those U.S. towns sharing the Toronto name, how many of them have two named airports, which was a significant part of the clue?

Today’s Double Jeopardy round and the first half of Tuesday’s contest were so brilliant that it makes us wonder how it could ignore so many factual indicators to come up with Toronto. Why would a baseball team name be more important than the number of airports that a town has, for instance?

Also, does Watson take advantage of the extra time afforded contestants in Final Jeopardy to do more in-depth analysis and testing of its most-likely responses?

Posted by: Mike

February 15, 2011
11:21 pm

Regarding “daily double” hunting, it is Watson’s first strategy. From Watson Researcher Dr. Jon Lenchner’s post:

“The Watson Research team studied the historical distribution of Daily Doubles and found they appear most-frequently in the three bottom rows, with the fourth being the most common. Daily Doubles also most frequently appear in the first column. Watson also makes use of even more statistics to dynamically predict their location based on what has been exposed so far in a game.”


Posted by: cnay
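
The Daily Double hunting described in Dr. Lenchner's quote can be sketched as picking the unrevealed square with the highest prior weight. This is only a rough illustration: the weights below are invented to mirror the quoted pattern (bottom three rows favored, fourth row highest, first column favored), and the sketch ignores the dynamic updating the quote also mentions.

```python
# Hypothetical prior weights for a 5-row x 6-column board (rows = dollar tiers,
# columns = categories). The numbers are invented for illustration; they are
# not the historical statistics the Watson team used.
ROW_WEIGHT = [1, 2, 4, 6, 5]
COL_WEIGHT = [3, 2, 2, 2, 2, 2]

def next_pick(revealed):
    """Return the unrevealed (row, col) square with the largest prior weight."""
    squares = [
        (r, c)
        for r in range(len(ROW_WEIGHT))
        for c in range(len(COL_WEIGHT))
        if (r, c) not in revealed
    ]
    return max(squares, key=lambda rc: ROW_WEIGHT[rc[0]] * COL_WEIGHT[rc[1]])

first = next_pick(set())
second = next_pick({first})
print(first, second)
```

With these made-up weights the first pick is the fourth row of the first column, followed by the fifth row of the same column.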

February 15, 2011
11:19 pm

February 15, 2011
11:16 pm

@Pamir All contestants must wait until a signal is given to buzz in — after Trebek finishes reading the clue. Watson researcher Dr. David Gondek gives a good explanation here:

“Watson’s buzzing is not instantaneous. For some clues he may not complete the question answering computation in time to make the decision to buzz in. For all clues, even if he does have an answer and confidence ready in time, he still has to respond to the signal and physically depress the button.”


Posted by: cnay
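
The buzz decision Dr. Gondek describes can be sketched as a simple gate: an answer must be ready in time, and the confidence must be high enough to risk buzzing. This is a hypothetical illustration, not IBM's code; the function name, the 0.5 threshold, and the timing values are assumptions chosen for the example.

```python
def should_buzz(confidence, compute_seconds, time_available_seconds, threshold=0.5):
    """Buzz only if an answer is ready in time and confidence clears the threshold.

    The default threshold is a made-up placeholder, not Watson's real setting.
    """
    ready_in_time = compute_seconds <= time_available_seconds
    return ready_in_time and confidence >= threshold

print(should_buzz(0.30, 2.0, 3.0))  # ready in time, but not confident enough
print(should_buzz(0.85, 2.0, 3.0))  # ready in time and confident
print(should_buzz(0.85, 4.0, 3.0))  # confident, but the computation finished too late
```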

February 15, 2011
10:39 pm

It was obvious that Watson won almost all of the questions due to button speed, both humans were clicking on over 90% of the questions. I saw perhaps two questions that the humans beat Watson to the button. Unfair in my mind still. They should have had a human at the post for Watson, clicking the button and reading Watson’s top answer. That would be much more fair.

Posted by: Jack

February 15, 2011
10:23 pm

Looks like Watson bets like a Black Jack player. Good job! I enjoyed the show tonight… sorry I missed last night.

Posted by: Michael Galaty

February 15, 2011
10:15 pm

Can anyone on the team answer: does Watson use Solid State Drives and, if not, how might performance be enhanced if Watson did use SSDs?

Posted by: John Harper

February 15, 2011
10:13 pm

I’m a little surprised people think Watson has any advantage over humans. First and foremost, it is deaf and blind. It must read a text file and come up with a solution in the time it takes Alex Trebek to read the “answer” to the question. It doesn’t come up with a solution “instantaneously” as has been suggested. Rather, an incredible amount of computation is taking place in that time. It then has an equal opportunity to click in to solve the problem. For some reason folks are suggesting crippling Watson’s answering ability in favor of its human competitors. I just don’t get that.

Posted by: John Harper

February 15, 2011
10:10 pm

Here’s how Watson knows what it knows…


Posted by: Steve Hamm

February 15, 2011
10:07 pm

Here’s a post on Watson’s wagering strategies from IBM researcher Gerald Tesauro:


Posted by: Steve Hamm

February 15, 2011
9:58 pm

Here’s some info on how Watson operates on the show from Dave Gondek, one of the top researchers.


Posted by: Steve Hamm

February 15, 2011
9:47 pm

I don’t care that Watson made a mistake in final Jeopardy!. The system is still amazing in what it can do. It’s also still winning. People make mistakes too, and I have certainly seen worse answers in final Jeopardy! from humans. Go Watson!

Posted by: CoreyTess

February 15, 2011
9:44 pm

Watson is smoking them in Jeopardy. But what I would like to know is: is Watson smarter than a 5th grader???

Posted by: Jroc

February 15, 2011
9:42 pm

@Jessica Naomi

Obviously the computer can deduce that with one question remaining in a certain category, that it must be the final question in that category. Certainly it doesn’t take a genius to realize this.

Posted by: reader

February 15, 2011
9:40 pm

Read Steve Baker’s book, Final Jeopardy, just published, to get the inside scoop on how Watson was developed and the game was structured.

Posted by: Steve Hamm

February 15, 2011
9:37 pm

The way Watson has learned from practice rounds is awesome. What was the most difficult part of creating Watson’s success: understanding the question? sifting to the right answer fast enough? learning from experience? Each of these is mind-boggling!

Posted by: John Holz

February 15, 2011
9:33 pm

Watson seems to be able to parse the probability of where the Daily Doubles are very well too. On Monday I believe Watson didn’t start at the lowest dollar amount, but went directly to a higher dollar amount which was a Daily Double. Tuesday night Watson picked both Daily Doubles. It really didn’t seem random. With its computing power, it looks like Watson has an unfair advantage here too.

Posted by: Pamir | Reiki Help Blog

February 15, 2011
9:31 pm

Awesome. Just awesome. People can say what they want and feel how they feel but no one can deny this is truly a technological advancement for the entire world.

Posted by: wesley atkins

February 15, 2011
9:31 pm

I’m trying to get somebody with expertise to answer the button speed question.

Posted by: Steve Hamm

February 15, 2011
9:26 pm

The button issue is troubling. I asked your Twitter account this question and since there hasn’t been a reply, I’ll post here too. There are questions both human contestants obviously know the answers to, heck I do too… It seemed very obvious on both nights that Watson has an edge on button speed. Did you design a control on this? Some kind of equalizer? If not, the whole exercise is pointless.

Posted by: Pamir | Reiki Help Blog

February 15, 2011
9:21 pm

These two pages have exactly one intersection: Chicago

Has the author of this article seen a game of JEOPARDY! played?

‘“What US city” wasn’t in the question.’ Um, the response is the question… they give answers as clues. Perhaps ‘This US city…’ would have been more appropriate? Just saying.

And as for “Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team.”

Per this “reason,” Watson should have answered “Springfield.” 🙂

Posted by: jas

February 15, 2011
9:17 pm

Harris Bornstein. Watson has learned that the “Category” has minimal impact on the answer; therefore, Watson put less weight on the “US Cities” category than on the facts in the answer.

Posted by: mark

February 15, 2011
9:11 pm

I’m unhappy seeing Watson constantly win the button on questions in which both humans are obviously trying to hit it as soon as the talking stops. The contest shouldn’t be about humans not being as good at buttons as a computer!

It’s too late now, but IMHO, the rules should have been changed to have a small window in which, if both a human and Watson hit the button, one was chosen at random.

Posted by: Marc Auslander
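
The rule change proposed in the comment above could look something like the following sketch. It is only an illustration of the suggestion, not anything the show uses: the function name and the 50-millisecond window are assumptions.

```python
import random

def resolve_buzz(buzz_times, window=0.05, rng=random):
    """Pick a winner from a {player: buzz time in seconds} mapping.

    Everyone who buzzes within `window` seconds of the earliest buzz is
    treated as tied, and one of the tied players is chosen at random.
    The 0.05 s default is a hypothetical value.
    """
    earliest = min(buzz_times.values())
    tied = sorted(p for p, t in buzz_times.items() if t - earliest <= window)
    return rng.choice(tied)

# Outside the window: the earliest buzz simply wins.
print(resolve_buzz({"Watson": 0.010, "Ken": 0.200}))

# Inside the window: either player may be chosen.
print(resolve_buzz({"Watson": 0.010, "Ken": 0.040}))
```

Passing a seeded `random.Random` instance as `rng` makes the tie-break repeatable for testing.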

February 15, 2011
9:08 pm

Harris, in Final Jeopardy, players ALWAYS place their wagers after they have seen the question.

Posted by: Waldo

February 15, 2011
9:07 pm

@Harry: Watson’s bet was made before the question was revealed. What made you think it wasn’t? And why should it bet more? It knows it is playing by tournament rules, so its money will go into the next game. Why risk it?

Posted by: jpb

February 15, 2011
8:56 pm

Is there an input delay of the questions into Watson to mimic the time it takes the human contestants to read the questions? The computation speed and accuracy should be the test here, and advance (instantaneous) input seems unfair/nonsensical.

Posted by: Neema Amini

February 15, 2011
8:54 pm

This is bull****. Watson’s wager for final Jeopardy should have been made BEFORE the question was revealed, why/how could Watson only have risked $947 with such a large $ lead in a category that is based on facts (US Cities)? It should have bet much more, something’s fishy. Also how could it have answered Toronto if the clue was U.S. Cities? And thirdly, how can a human be expected to buzz in faster than a machine? I’m sure Ken and Brad knew all these answers it’s just that Watson is too friggin’ fast, he’s not any smarter (but he is pretty smart though).

Posted by: Harris Bornstein

February 15, 2011
8:23 pm

How did the computer know that a question was the last one in a category? How did the computer even pick a category? How did it “decide” what amount to wager in Daily Double? The computer’s “voice” seemed excited when it picked a higher dollar amount in a category. Didn’t the computer win because it “rang” in faster, how could the human reflex be faster than a computer circuit? This reminded me of that other show – 20Q but Hal Sparks was the computer’s voice. This seemed rigged to me.

Posted by: Jessica Naomi

February 15, 2011
8:01 pm

Perhaps Watson found Toronto because the Toronto Island Airport is now called the Billy Bishop airport after a World War I hero. Toronto international is Pearson, a Canadian diplomat, Nobel Peace prize winner and Prime Minister.

Posted by: Mary Penner

February 15, 2011
8:01 pm

Let it play Jeopardy but please don’t let it drive me to the airport.

Posted by: jpb

February 15, 2011
7:42 pm

