
'Big Data' is changing the way we look at problems

July 29, 2012 

Huge collections of information – “Big Data” – are changing the way we look at countless problems. Consider an unexpected offshoot of the WikiLeaks controversy, in which Julian Assange’s website released a flood of documents from classified sources. Programmers working with this suddenly public information extracted dates and locations from 77,000 incident reports from the war in Afghanistan and used them to create a map of the violence. The project took one night, and the remarkable thing is that, based solely on the model built from that data, the researchers could predict subsequent military events with uncanny accuracy.

The method was tested against the events of 2010 and proved accurate even in the relatively quiet northern provinces, where data points were few. What we are seeing is a foretaste of Isaac Asimov’s “psychohistory,” described in his “Foundation” novels as a way of analyzing and predicting future events through a combination of history, sociology and statistics. Big Data combines our unprecedented and growing ability to store information with increases in computing power. The result: We’re using statistics and quantitative analysis to tackle problems that have always seemed beyond our reach, and it’s even happening on our home PCs.

One early player in all this is Google. The company has already indexed something on the order of 4 percent of all the books printed between 1800 and 2000, and has released a database containing every word in this library. You would think a word like “television” probably didn’t appear until the first sets were being developed, but Google’s database (books.google.com/ngrams) can find instances of the word appearing before 1900, with sustained use beginning in the early 1920s. Play around with the site and you’ll find it a source of endless fascination. You can plug in multiple words and chart their usage against each other.

Business openings

Watch the Big Data trend carefully as you look for business openings. One thing that’s bound to happen is the conjunction of the smartphone with ever-increasing storage and onboard camera technology. So-called “lifelogs” are much in the air among futurists. They’re the result of next-generation equipment – the kind of thing we’ll routinely be carrying in a few years – that records not just where you are but what you’ve seen and heard. Imagine the uses of technology like this in keeping track of your own habits, flagging the places where you’re spending too much, and helping you recall places and names you might have forgotten.

Right now, Big Data is being used to produce curious and somewhat unsettling results. A Stanford professor named Jure Leskovec tracks data on Web behavior, using social networks like Facebook not to keep up with friends and family but as goldmines of statistical information. Leskovec has found that the right methods can predict which contacts users will add as “friends” on the site, and the predictions are already accurate in about half the cases he’s studied. His study of messaging (using Microsoft Instant Messenger) has shown how far apart users are from one another (six degrees of separation is just about right), with implications for making the Internet more efficient by learning how to find the shortest path between any two computers.
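For readers who want to see what “degrees of separation” means in practice, here is a minimal sketch in Python. It is not Leskovec’s method or data: the contact lists are invented for illustration, and the function name degrees_of_separation is hypothetical. It uses a standard breadth-first search, which explores a network one hop at a time and so finds the fewest hops between two people.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for contact in graph.get(person, ()):
            if contact == goal:
                return hops + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, hops + 1))
    return None


# Hypothetical contact lists, invented for illustration only.
contacts = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
    "fay": [],
}

print(degrees_of_separation(contacts, "ann", "eve"))  # 3
print(degrees_of_separation(contacts, "ann", "fay"))  # None
```

In this toy network Ann reaches Eve in three hops (through Bob or Cat, then Dan), while Fay has no contacts and cannot be reached at all. A study of a real messaging network would presumably apply the same idea across many millions of users.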

Game-changing solutions

But if you want to take the trend to where it really gets powerful, consider that other researchers at Stanford have developed the first software simulation of an entire organism. It’s only a single-cell bacterium, but modeling it involves 525 genes and the interactions of 28 categories of molecules, taking us down to the fundamental building blocks of cellular life. Computational biology takes Big Data in the direction of computerized experiments that can model and test game-changing solutions to life’s worst problems: diseases like Alzheimer’s and cancer.

We’re only at the beginning of this trend, but when people voluntarily give up their own data – think social networks – they help to generate statistical models that everyone from law enforcement to human resources will consult to predict future behaviors. Next time you send a tweet, remember that you’re adding to the data storehouse (Cornell scientists are already studying Twitter usage) and ponder how business will put Big Data to work in the future.

Paul A. Gilster is the author of several books on technology. Reach him at gilster@mindspring.com.
