Huge groupings of information, known as Big Data, are changing the way we look at countless problems. I'm looking at an unexpected offshoot of the WikiLeaks controversy, in which Julian Assange's website released documents galore from all kinds of classified sources. Programmers working with this suddenly public information have extracted dates and locations from 77,000 incident reports from the war in Afghanistan, creating a map of the violence. The project took one night, and the remarkable thing is that, based solely on the model created here, the researchers could predict ensuing military events with uncanny accuracy.
The method was tested against the events of 2010 and proved accurate even in the relatively quiet northern provinces, where data points were few. What we are seeing is like a foretaste of Isaac Asimov's psychohistory, described in his Foundation novels as a way of analyzing and predicting future events through a combination of history, sociology and statistics. Big Data combines our unprecedented and growing ability to store information with increases in computing power. The result: We're using statistics and quantitative analysis to tackle problems that have always seemed beyond our reach, and it's even happening on our home PCs.
One early player in all this is Google. The company has already indexed something on the order of 4 percent of all the books ever printed between 1800 and 2000, and has released a database containing every word in this library. You would think a word like "television" probably didn't appear until the first sets were being developed, but Google's database (books.google.com/ngrams) can find instances of the word appearing before 1900, with sustained use beginning in the early 1920s. Play around with the site and you'll find it a source of endless fascination. You can plug in multiple words and chart their usage against each other.
Watch the Big Data trend carefully as you look for business openings. One thing that's bound to happen is the conjunction of the smartphone with ever-increasing storage and onboard camera technology. So-called lifelogs are much in the air among futurists. They're the result of next-generation equipment (the kind of thing we'll routinely be carrying in a few years) that records not just where you are but what you've seen and heard. Imagine the uses of technology like this in keeping track of your own habits, flagging the places where you're spending too much, and helping you recall places and names you might have forgotten.
Right now, Big Data is being used to produce curious and somewhat unsettling results. A Stanford professor named Jure Leskovec tracks data on Web behavior, using social networks like Facebook not to keep up with friends and family but as goldmines of statistical information. Leskovec has discovered that the right methods can predict which contacts users will add as friends on the site, a method that's already accurate in about half the cases he's studied. His study of messaging (using Microsoft Instant Messenger) has uncovered how far apart users are in the network (six degrees of separation is just about right), with implications for making the Internet more efficient by learning how to find the shortest path between any two computers.
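For readers who like to tinker, the idea of "degrees of separation" can be tried at home. What follows is only a minimal sketch, not Leskovec's method: it assumes a tiny made-up contact list (the names and the `contacts` dictionary are hypothetical) and uses a standard breadth-first search to count the fewest hops between two people.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable.

    graph maps each person to a list of direct contacts. This is a plain
    breadth-first search, used here as an illustrative stand-in; it is not
    the technique used in the study described above.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (person, hops taken to reach them)
    while queue:
        person, hops = queue.popleft()
        for contact in graph.get(person, ()):
            if contact == goal:
                return hops + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, hops + 1))
    return None


# Hypothetical contact list, invented for illustration only.
contacts = {
    "ana": ["ben", "cat"],
    "ben": ["ana", "dev"],
    "cat": ["ana", "dev"],
    "dev": ["ben", "cat", "eli"],
    "eli": ["dev"],
    "fay": [],
}

print(degrees_of_separation(contacts, "ana", "eli"))  # 3 hops: ana, ben, dev, eli
print(degrees_of_separation(contacts, "ana", "fay"))  # None: fay has no contacts
```

Because breadth-first search explores everyone one hop away before anyone two hops away, the first time it reaches the target it has found a shortest path. On a network of hundreds of millions of users the same idea applies, though real studies rely on sampling and far heavier machinery.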
But if you want to take the trend to where it really gets powerful, consider that other researchers at Stanford have developed the first software simulation of an entire organism. It's only a single-cell bacterium, but modeling it involves 525 genes and the interactions of 28 categories of molecules, taking us down to the fundamental building blocks of cellular life. Computational biology takes Big Data in the direction of computerized experiments that can model and test game-changing solutions to life's worst problems: diseases like Alzheimer's and cancer.
We're only at the beginning of this trend, but when people voluntarily give up their own data (think social networks), they help to generate statistical models that everyone from law enforcement to human resources will consult to predict future behaviors. Next time you send a tweet, remember that you're adding to the data storehouse (Cornell scientists are already studying Twitter usage) and ponder how business will put Big Data to work in the future.
Paul A. Gilster is the author of several books on technology. Reach him at firstname.lastname@example.org.