NEW YORK — It was a case for a digital Sherlock Holmes. Last fall, the city’s Department of Environmental Protection wanted, finally, to crack down on restaurants that were illegally dumping cooking oil into sewers in their neighborhoods – congealed yellow grease is responsible, the department says, for more than half of New York’s clogged drains. The question, of course, was how to find the culprits.
The antiquated answer would have been to have the health department send inspectors to restaurants on blocks with backed-up sewers and hope by chance to catch a busboy pouring the contents of a deep fryer into the street.
Enter the city’s Office of Policy and Strategic Planning, a geek squad of civic-minded number-crunchers working from a pair of cluttered cubicles across from City Hall in the Municipal Building. They dug up data from the Business Integrity Commission, an obscure city agency that among other tasks certifies that all local restaurants have a carting service to haul away their grease. With a few quick calculations, comparing restaurants that did not have a carter with geo-spatial data on the sewers, the team was able to hand inspectors a list of statistically likely suspects.
The result: a 95 percent success rate in tracking down the dumpers. With nothing grander than public data, the Case of the Grease-Clogged Sewers was solved.
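For readers curious about the mechanics, the cross-referencing behind that list of suspects can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the office's actual code or data model: the record types, the planar coordinates in meters, the 150-meter radius and the function name `likely_dumpers` are all hypothetical. The idea is to keep only restaurants with no carting service on file, then rank them by how many reported sewer backups fall nearby.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Restaurant:
    # Hypothetical record: name, planar position in meters, carter on file or not.
    name: str
    x: float
    y: float
    has_carter: bool


@dataclass(frozen=True)
class SewerBackup:
    # Hypothetical record: planar position in meters of a reported backup.
    x: float
    y: float


def likely_dumpers(restaurants, backups, radius_m=150.0):
    """Rank restaurants without a carter by the number of backups within radius_m.

    The radius is an assumed illustrative threshold, not a figure from the city.
    """
    suspects = []
    for r in restaurants:
        if r.has_carter:
            continue  # a certified carter hauls this restaurant's grease away
        nearby = sum(
            1 for b in backups if hypot(r.x - b.x, r.y - b.y) <= radius_m
        )
        if nearby > 0:
            suspects.append((r.name, nearby))
    # Most nearby backups first; ties broken alphabetically for stable output.
    return sorted(suspects, key=lambda s: (-s[1], s[0]))
```

A real analysis would work from latitude and longitude, follow the sewer network rather than straight-line distance, and weigh other factors; the sketch only shows the shape of the join between the two data sets.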
Data – or Big Data, as quantitative analysts will call it – is the tool du jour for tech-savvy companies that have realized that lurking in the vast pools of unprocessed information in their networks are solutions to some of today’s most pressing and convoluted problems. A few years ago, Google, for example, took the 50 million most common keywords that Americans typed in search bars and tried to figure out, by comparing them with federal health statistics, where the H1N1 flu virus was likely to strike next.
According to a new book, “Big Data: A Revolution That Will Transform How We Live, Work and Think,” the enormous quantity of information whirling through the ether can affect and enhance our quality of life. As the authors put it, “The change of scale has led to a change of state.”
Now the city has brought this quantitative method to the exceedingly complicated machine that is New York. For the modest sum of $1 million, and at a moment when decreasing budgets have required increased efficiency, the in-house geek squad has over the last three years leveraged the power of computers to double the city’s hit rate in finding stores selling bootleg cigarettes; sped the removal of trees destroyed by Hurricane Sandy; and helped steer overburdened housing inspectors – working with more than 20,000 options – directly to lawbreaking buildings where catastrophic fires were likeliest to occur.
“I think of us as the Get Stuff Done Folks,” Michael Flowers, who oversees the group, said. “All we do is take and process massive amounts of information and use it to do things more effectively.”
Before being hired in 2009 by John Feinblatt, the mayor’s chief policy adviser, Flowers didn’t know much about computer code – let alone Bayesian statistics. From 1999 to 2003, he worked at the Manhattan district attorney’s office, prosecuting homicides and drug crimes. When he left law enforcement, he moved to Washington, where he joined the power law firm Williams & Connolly and later took a job with the Senate Permanent Subcommittee on Investigations. Disenchanted by the smug homogeneity of Washington, Flowers leapt at the chance in 2005 to travel to Iraq with a team from the Justice Department to work on issues concerning mass graves and on Saddam Hussein’s trial.
Working with ‘the kids’
These days, Flowers, a relative amateur in data analytics, is the geek squad’s chief tactician and resident asker of questions.
He allows the half-dozen post-collegiate techies working under him to ferret out the answers and, at age 43, he refers to them endearingly as “the kids.” His office gives the impression of a high-tech startup – but without the cool furniture. Nick O’Brien, 30 and the team’s chief of staff, works standing at a lectern. Ben Dean, the 24-year-old chief analyst, sits on an ergonomic rubber ball.
One of the benefits that come from working with the informational atoms of the city is an almost molecular understanding of New York itself. What the city knows about its 8 million residents is staggering. Contained in public archives is information about their boilers and their sprinkler systems, the state of their local taxes, the number of heart attacks and fires that occur inside their buildings and whether they have ever logged complaints about roaches or construction noise. Additional data is gathered about their businesses, their commuting habits and children’s test scores.
“There’s a deep, deep relationship between New Yorkers and their government,” Flowers said, “and that relationship is captured in the data.”
In all, a terabyte of raw information – enough to fill nearly 143 million printed pages – passes daily through Flowers’ office, and his team’s first job, he said, was to get that information into a comprehensible form: to, in effect, create a lingua franca for the bureaucracy’s Tower of Babel.
One day this month, Flowers, in a military swag vest from Iraq, was in his office kicking around ideas for future projects.
His most ambitious plan was a proposal to move beyond public information into the deeper and possibly more profitable mine of social-media data. Every day, he said, there are 250,000 New York-centric posts on Twitter alone – some concerning trash complaints, others unsanitary restaurant conditions. “If Young & Rubicam can use tweets to sell you stuff,” he asked hypothetically, “why can’t the city use them to make you less sick?”
This makes civil libertarians uncomfortable, particularly at a time when the Police Department’s chief Big Data project – its use of the Compstat system to guide stop-and-frisk – is being questioned. Flowers insists that he has put in place safeguards, like keystroke logs on his employees’ computers, to ensure that information is not abused. Still, groups like the New York Civil Liberties Union say that they are watching public data mining with a guarded, if optimistic, eye.
“I think that the Bloomberg administration’s attention to data has enormous potential for good,” Donna Lieberman, the executive director of the union, said. “Obviously, it means that the city can make and tweak policies based on reality. But the potential for the selective use and release of data is one aspect that raises concern.”
Another, at least for Flowers, is whether his geek squad will survive the end of Bloomberg’s tech-friendly tenure. For now, he said, he is proceeding under the assumption that it will, adding that the best way to ensure its viability is to create an appetite among city agencies for the analytical work his group produces.
“We know that there will always be a Fire Department, a Finance Department, a Department of Buildings,” Flowers said. “So hopefully by building a common data infrastructure that shares information in real time, it won’t matter who sits in City Hall.”