Researchers look for novel patterns in information

New York Times, November 17, 2013

David Soloff is recruiting an army of “hyperdata” collectors.

The company he co-founded, Premise, created a smartphone application now used by 700 people in 25 developing countries. Using guidance from Soloff and his co-workers, these people, mostly college students and homemakers, photograph food and goods in public markets.

By analyzing the photos of prices and the placement of everyday items like piles of tomatoes and bottles of shampoo and matching that to other data, Premise is building a real-time inflation index to sell to companies and Wall Street traders, who are hungry for insightful data.

“Within five years, I’d like to have 3,000 or 4,000 people doing this,” said Soloff, who is also Premise’s chief executive. “It’s a useful global inflation monitor, a way of looking at food security, or a way a manufacturer can judge what kind of shelf space he is getting.”

Collecting data from all sorts of odd places and analyzing it much faster than was possible even a couple of years ago has become one of the hottest areas of the technology industry. The idea is simple: With all that processing power and a little creativity, researchers should be able to find novel patterns and relationships among different kinds of information.

For the last few years, insiders have been calling this sort of analysis Big Data. Now Big Data is evolving, becoming more “hyper” and including all sorts of sources. Startups like Premise and ClearStory Data, as well as larger companies like General Electric, are getting into the act.

A picture of a pile of tomatoes in Asia may not lead anyone to a great conclusion other than how tasty those tomatoes may or may not look. But connect pictures of food piles around the world to weather forecasts and rainfall totals, and you have meaningful information that people like stockbrokers or buyers for grocery chains could use.

And the faster that happens, the better, so people can make smart – and quick – decisions.

“Hyperdata comes to you on the spot, and you can analyze it and act on it on the spot,” said Bernt Wahl, an industry fellow at the Center for Entrepreneurship and Technology at the University of California, Berkeley. “It will be in regular business soon, with everyone predicting and acting the way Amazon instantaneously changes its prices around.”

Tracking onion prices

Standard statistics might project next summer’s ice cream sales. The aim of people working on newer Big Data systems is to collect seemingly unconnected information like today’s heat and cloud cover, and a hometown team’s victory over the weekend, compare that with past weather and sports outcomes, and figure out how much mint-chip ice cream mothers would buy today.

At least, that is the hope, and there are early signs it could work. Premise claims to have spotted broad national inflation in India months ahead of the government by looking at onion prices in a couple of markets.
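To make the idea of an index built from market price observations concrete, here is a minimal sketch. It is not Premise's method, which the article does not describe. It assumes each observation is an (item, period, price) record and uses the Jevons formula, a standard elementary price index that takes the geometric mean of each item's price ratio between two periods. The function name, the period labels and the sample prices are all hypothetical.

```python
from collections import defaultdict
from math import exp, log
from statistics import mean


def jevons_index(observations, base_period, current_period):
    """Return 100 times the geometric mean of per-item price relatives.

    observations: iterable of (item, period, price) tuples.
    Items not observed in both periods are skipped.
    """
    # Group every observed price by (item, period).
    prices = defaultdict(list)
    for item, period, price in observations:
        prices[(item, period)].append(price)

    log_relatives = []
    for item in sorted({item for item, _ in prices}):
        base = prices.get((item, base_period))
        current = prices.get((item, current_period))
        if not base or not current:
            continue  # cannot compare an item seen in only one period
        # Price relative: average current price over average base price.
        log_relatives.append(log(mean(current) / mean(base)))

    if not log_relatives:
        raise ValueError("no item was observed in both periods")
    # Averaging the logs and exponentiating gives the geometric mean.
    return 100 * exp(mean(log_relatives))


# Hypothetical sample data, invented purely for illustration.
sample = [
    ("onion", "base", 20.0),
    ("onion", "base", 22.0),
    ("onion", "current", 25.2),
    ("tomato", "base", 10.0),
    ("tomato", "current", 10.0),
]

index = jevons_index(sample, "base", "current")
print(f"Price index (base = 100): {index:.2f}")
```

A reading above 100 means prices rose on average between the two periods. A production index would also weight items by their share of spending and check the quality of each observation, both of which this sketch leaves out.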

Premise’s subscribers include Wall Street hedge funds and Procter & Gamble, a company known for using lots of data. None of them would comment for this article. Subscriptions to the service range from $1,500 to more than $15,000 a month, though there is also a version that offers free data to schools and nonprofit groups.

The new Big Data connections are also benefiting from the increasing amount of public information that is available. According to research from the McKinsey Global Institute, 40 national governments now offer data on matters like population and land use. The U.S. government alone has 90,000 sets of open data.

“There is over $3 trillion of potential benefit from open government economic data, from things like price transparency, competition and benchmarking,” said Michael Chui, one of the authors of the McKinsey report. “Sometimes you have to be careful of the quality, but it is valuable.”

Real-time data crunching

That government data can be matched with data from sensors on smartphones, jet engines, even bicycle stations, which are uploading readings from across the physical world into the supercomputers of cloud computing systems.

Until a few years ago, much government and private data could not be collected particularly fast or well. It was expensive to get and hard to load into computers. As sensor prices have dropped, however, and things like Wi-Fi have enabled connectivity, that has changed.

In the world of computer hardware, in-memory computing, an advance that allows data to be crunched without being stored in a different location, has increased computing speeds immensely. That has allowed for some real-time data crunching.

‘Unstructured data’

Traditional data analysis was built on looking at regular information, like payroll stubs, that could be loaded into the regular rows and columns of a spreadsheet. With the explosion of the Web, however, companies like Google, Facebook and Yahoo were faced with unprecedented volumes of “unstructured” data, like how people cruised the Web or comments they made to their friends.

New hardware and software also have been created that sharply cut the time it takes to analyze this information, fetching it as fast as an iPhone fetches a song.

ClearStory Data, a startup in Palo Alto, Calif., has introduced a product that can look at data on the fly from various sources. With ClearStory, data on movie ticket sales, for example, might be mixed with information on weather, even Twitter messages, and presented as a shifting bar chart or a map, depending on what the customer is trying to figure out.

One trick, said Sharmila Shahani-Mulligan, ClearStory’s co-founder and chief executive, was developing a way to quickly and accurately find all of the data sources available. Another was figuring out how to present data on, say, typical weather in a community, in a way that was useful.

“That way,” Shahani-Mulligan said, “a coffee shop can tell if customers will drink Red Bull or hot chocolate.”
