Wednesday, January 14, 2009

The Numerati

Just got to Stephen Baker's 2008 book, The Numerati. Several people had pointed me to it. It is an overview of the advance of numerical methods since WWII, and of how improvements in the availability of data, the power of computers and new methods have led to remarkable results. Key too is the growing number of sensors that capture data about people; the Web is the prime example. These methods have also led to some scary and wrong results. For example, one person interviewed had been a major modeler for Enron and could not predict its fall. Today he is head of a modeling group for IBM.

I was educated in this field and was a practitioner during the latter half of this progression, so the stories told of its history were well known to me. Baker does not provide much detail and does not know the math, data and computing issues behind the problems. Sometimes the difficulties involved are understated, so there is little hard substance in the book. Yet as the people interviewed discussed their current work, I was reminded of similar efforts in my own firm. So I enjoyed the book, and I think any practitioner in this area will as well.

I was once told that for every equation you include in a book you will lose 20% of your audience. This book will lose none of its audience that way. Still, it is a good view of the progress and the dangers of having models that predict things, especially when we attempt to predict what people can or will do.

He ends the book on a cautious note, describing the complexity of the world and how there are aspects of that complexity we have not figured out how to model. That is an appropriate place to leave a book like this. We have made much progress, and it has value, but that does not mean we are, or ever can be, done. There is danger in thinking we can do more than we are ready to.

See also the book's blog.

Update: Baker has given a talk on the book that has been shown a number of times on Book-TV/Cspan2, which I saw today (1/26). It is a good overview of the book, covering much of the same landscape. He emphasizes his lack of modeling knowledge up front. He makes fun of a person he interviewed from the NSA when that person said that more data is always better. In fact that is the case; the key design issue is what you include or do not include in a model. You may deliberately exclude data and factors from a model, but not even knowing that external factors exist is dangerous. In that sense the NSA answer is correct.

Consider the well-known 'searching for your keys under the streetlamp' story that he mentions in the book and in the talk. Having more data is like widening the circle of light from the streetlamp, increasing the likelihood that you will find your keys. Keeping it narrow means that you may find something that just looks like your keys but will not start your car.
