Paul Gilster
I'm looking at an edition of Henry James' "An International Episode." It's a library copy containing handsome illustrations interleaved with the text. But it's not a physical book.
I'm reading it online with the help of the Open Library, a project to digitize hundreds of thousands of books. Competing directly with Google Print, the Open Library (www.openlibrary.org) is less controversial, has more exciting implications and uses superior software.
The benefits of digitizing old books are clear enough -- computer tools make the library experience useful in ways that extend the power of the printed word. The aforementioned Google Print (print.google.com) gives an idea of what I mean. I was reading "Moby Dick" and couldn't recall the significance of a minor character, but a search in the online version pulled up exactly what I needed.
It's wonderful to search the text of a book at will, and the database is a big one: Google just announced that it had added the full text of more than 10,000 books that are no longer under copyright to its collection. These volumes come from the university libraries at Michigan, Harvard and Stanford, and from the New York Public Library.
But Google Print has become quite controversial. Publishers (and some authors) are concerned about infringements of copyright law because Google has chosen an "opt-out" model: if you are a publisher, you have to specifically request that your book be excluded, or you may find it digitized and available online, even if only in fragments. The argument has slowed Google's plans for digitization.
The Open Library works with a different model. For one thing, it's not building its digital library within the purview of a single company, which must be reckoned a good thing. Google Print can deliver books to your screen, but they are searchable only through Google's tools. What we need are databases that interconnect rather than library holdings that depend upon a single corporate sponsor.
Open Library at present focuses on public domain books. For the longer term, the project eyes older books where the copyright is still in force but the copyright owner cannot be found. These orphaned books are hard to find outside libraries and would offer a robust addition to a digital archive. As to copyrighted works, Open Library will publish only those whose publishers explicitly join the program.
Behind the Open Library is the Open Content Alliance, founded in October by Yahoo and including Microsoft, Hewlett-Packard, the Smithsonian Institution, the National Archives of the United Kingdom, and a variety of library and university institutions. A key player is Brewster Kahle's Internet Archive. The visionary Kahle is intent on storing high-definition images of book pages, which become digital volumes that can be printed from any location.
Only a handful of books are on display at the Open Library site, but that will change swiftly. Working at the University of Toronto, the Internet Archive has already scanned about 2,000 volumes. Yahoo's contribution will cover 18,000 books from the University of California system, while Microsoft's MSN Search has committed to supporting 150,000 books over the course of the next year.
Open Library's software is elegant -- page through the Henry James title mentioned earlier and see what I mean about the quality of the scanning. The project offers a glimpse of future networked book devices connected to massive library holdings for research and pleasure reading.
I hope we will begin to see working relationships develop between the Open Content Alliance and Google, Amazon, and other book digitization projects. We need to get lost older volumes back into viable circulation even as we sort out how to digitize new books while protecting the legitimate rights of their authors. These are the first steps in building a digital library worthy of the name.