Paul Gilster
Jimmy Wales, the man responsible for the Wikipedia collaborative encyclopedia, is talking about building a better search engine.
Wales' company, Wikia (www.wikia.com), uses many of Wikipedia's user-involved techniques to host publishing sites on a wide range of topics. What he seems to have in mind is a search engine whose output is continually refined by user participation, just as Wikipedia theoretically gets better as users spot and correct errors.
Just how this will play out is unclear, because Wikipedia remains controversial. Unlike the Encyclopaedia Britannica, which draws on the work of experts, Wikipedia is open to contributions from anyone on the Net.
These days collaboration is all the buzz, and finding the next big Internet play invariably involves search technology mixing with the user input that defines Web 2.0 sites. The premise of the vaguely defined Web 2.0 is to create value by combining nonproprietary information from many different sources.
Whatever Wales comes up with, we seem to be taking steps toward what Web creator Tim Berners-Lee calls a "semantic Web."
That's an exotic term for something we'd all like to see: a way to get quick, targeted answers without having to sift through reams of results.
Google already takes a step in that direction with its PageRank technology. Ask Google for information and it finds a huge number of Web pages, then ranks them based on how many other sites link to them, giving more weight to links from sites that are themselves popular.
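For readers curious about the mechanics, here is a minimal Python sketch of the ranking idea described above. It assumes the textbook "random surfer" form of PageRank published by Google's founders, not whatever Google runs today. The four-site link graph is hypothetical, and the damping factor of 0.85 is the value commonly cited in the literature, not a known Google setting. Each site repeatedly passes a share of its own score along its outgoing links, so a link from a high-scoring site counts for more.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Score pages using the textbook random-surfer PageRank model.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start every page with an equal score
    for _ in range(iterations):
        # Every page gets a small baseline share (the surfer's random jump).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to,
                # so a link from a high-scoring page is worth more.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-site Web: c.example is linked to by three other sites.
web = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Run as-is, c.example comes out on top, with a.example close behind because it receives the whole of c.example's vote; the two sites nobody links to share the bottom.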
So Google is already mining user activity in choosing its results, but when you search Google, you can't always assume that the answers at the top of the first results page are the best. Google is only so smart.
In a semantic Web, though, the search tool can extract data more meaningfully, giving you a precise answer rather than a boatload of documents.
Danny Hillis is trying to make this happen.
The founder of Thinking Machines, Hillis has a new company, Metaweb, that is building Freebase, an ambitious attempt to combine the best aspects of search engines with the more structured world of databases. To shake out the concept, the company has imported into Freebase much of Wikipedia, music and restaurant data, and a huge amount of census data and location information. They are refining their software en route to building what Hillis has called "the world's database."
Here's where collaboration comes in.
When it's up and running, anyone will be able to add information to Freebase. Textual or visual data will be only a start. As we're seeing with Web 2.0 sites such as Flickr (www.flickr.com), users will tag their information, describing what it is and how it relates to other information. These "meta" tags are where the database concept comes in. Tagged information can be searched more quickly and accurately, yielding a short but highly targeted list of results. You spend less time refining a search query and let the machines do the work.
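Why tagged information is faster to search can be sketched with an inverted index, a standard structure that maps each tag to the items carrying it. This Python example works under that assumption only; it is not how Flickr or Freebase actually store tags, and the `TagIndex` class and the sample photos are hypothetical. A query touches only the entries for the requested tags instead of scanning every item.

```python
from collections import defaultdict

class TagIndex:
    """Hypothetical inverted index: each tag maps to the set of item IDs carrying it."""

    def __init__(self):
        self._index = defaultdict(set)

    def tag(self, item_id, *tags):
        # Record the item under each tag (case-insensitive).
        for t in tags:
            self._index[t.lower()].add(item_id)

    def search(self, *tags):
        """Return items carrying every requested tag, sorted for stable output."""
        if not tags:
            return []
        # .get avoids creating empty entries for unknown tags.
        sets = [self._index.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*sets))

# Usage with made-up photos.
idx = TagIndex()
idx.tag("photo-1", "sunset", "beach", "Raleigh")
idx.tag("photo-2", "sunset", "mountains")
idx.tag("photo-3", "beach", "Raleigh")
print(idx.search("beach", "raleigh"))
print(idx.search("sunset", "beach"))
```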
The tagging process relies on users, aided by Freebase software, making their own connections between the data they find. A user might look up an entry that has been imported from Wikipedia. Freebase will categorize the item in broad terms: a person, location, company, etc. That categorization allows it to provide a list of structured items that need to be filled out to add value to the entry. Think of these as database fields.
As people use the entry, they fill in these fields -- address, date of birth, geographical coordinates or whatever fits the type of information involved -- tagging the item in ways that enhance its value and make it more accurately searchable.
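The typed-entry idea can be sketched in a few lines of Python. The broad categories come from the description above, but the field names, the `Entry` class and the `find` helper are hypothetical illustrations, not Freebase's real schema or software, and the sample entries are made up. Once an entry knows its type, the software can list which fields are still empty, and filled-in fields let a query return an exact match rather than a pile of documents.

```python
from dataclasses import dataclass, field

# Hypothetical schema: these field names are illustrative, not Freebase's actual ones.
TYPE_FIELDS = {
    "person": ("date_of_birth", "place_of_birth"),
    "location": ("latitude", "longitude"),
    "company": ("headquarters", "founded"),
}

@dataclass
class Entry:
    name: str
    kind: str  # broad category: "person", "location" or "company"
    values: dict = field(default_factory=dict)

    def missing_fields(self):
        """Fields this entry's type expects that nobody has filled in yet."""
        return [f for f in TYPE_FIELDS[self.kind] if f not in self.values]

    def fill(self, field_name, value):
        """Record a user's contribution, rejecting fields the type doesn't define."""
        if field_name not in TYPE_FIELDS[self.kind]:
            raise KeyError(f"{self.kind!r} has no field {field_name!r}")
        self.values[field_name] = value

def find(entries, kind, **criteria):
    """Structured lookup: names of entries of one type whose fields match every criterion."""
    return [e.name for e in entries
            if e.kind == kind
            and all(e.values.get(k) == v for k, v in criteria.items())]

# Usage with made-up entries.
town = Entry("Example Town", "location")
print(town.missing_fields())
town.fill("latitude", 35.0)
print(town.missing_fields())

person = Entry("Jane Doe", "person")
person.fill("place_of_birth", "Example Town")
print(find([town, person], "person", place_of_birth="Example Town"))
```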
Some people see this as a breakthrough. Publisher Tim O'Reilly describes it as "building the synapses for the global brain."
Whether this is accurate or another round of futurist hype might take years to determine. Publicly gathered information sources such as Wikipedia still wrestle with problems of accuracy because of their lack of editorial control.
If Freebase is a global brain, what bad ideas might it think up, and how will it purge erroneous data?
There's no question that opening up Web information and connecting it carries powerful potential.
Think of the "mashups" of mapping data and real estate listings that draw value from separate, publicly available resources (www.housingmaps.com). They are just one example of integrated content from multiple sources.
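At its core, a mashup joins records from independent sources on a shared key. The Python sketch below assumes two hypothetical sources, a list of rental listings and a table of map coordinates keyed by street address; the addresses, prices and coordinates are made up, and this is not how housingmaps.com is built. Listings the map source cannot place are simply dropped, and the merged data can then be filtered by map area.

```python
# Two hypothetical, independently published sources.
listings = [
    {"address": "12 Oak St", "price": 1200},
    {"address": "40 Elm Ave", "price": 950},
    {"address": "7 Pine Rd", "price": 1500},
]
geocodes = {  # normalized address -> (latitude, longitude); coordinates are made up
    "12 oak st": (35.78, -78.64),
    "40 elm ave": (35.91, -79.05),
}

def normalize(address):
    """Lowercase and collapse spaces so both sources agree on the join key."""
    return " ".join(address.lower().split())

def mashup(listings, geocodes):
    """Attach coordinates to each listing; skip listings the map source can't place."""
    merged = []
    for item in listings:
        coords = geocodes.get(normalize(item["address"]))
        if coords is not None:
            merged.append({**item, "lat": coords[0], "lon": coords[1]})
    return merged

def in_box(points, south, west, north, east):
    """Keep only points inside a latitude/longitude bounding box."""
    return [p for p in points
            if south <= p["lat"] <= north and west <= p["lon"] <= east]

pins = mashup(listings, geocodes)
print([p["address"] for p in pins])
print([p["address"] for p in in_box(pins, 35.7, -78.8, 35.85, -78.5)])
```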
It's a trend we are only beginning to exploit. Adding database-style tags to the mix may indeed boost search capabilities to the point where getting a single, straight answer from a computer becomes the norm.