One of the most “wow!” things that I saw at the Books in Browsers conference at the Internet Archive last week was a demonstration by Sameer Verma of San Francisco State University of a full-fledged OPDS BookServer called Pathagar running from a SheevaPlug “wall wart.” Wall warts are small, transformer-sized computers running a LAMP stack, enabling widely distributed Internet computing. A SheevaPlug Pathagar BookServer can serve up to 500 users, permitting locations with scarce network and electrical capacity to have access to over 20,000 ebooks in a single device.
This is an astounding project – originally designed to work with One Laptop Per Child networks, but made more generic – enabling the placement of digital libraries in the most remote corners of the globe. It also serves to highlight one of the greatest and most complex debates of the moment: whether digital libraries should be distributed and replicated, or aggregated in massive online databases — and what we lose by choosing one architecture over another.
The SheevaPlug Bookserver gets books closest to those who will use them. In areas with minimal networking, or where privacy matters, and the choice of reading materials may have immediate ramifications for liberty and survival, there are compelling reasons to get libraries down to the smallest, socially cohesive level. In many parts of the world that would be a village; in other societies, individualism makes the notion of walking around with all the books in the world in a single handheld device the ultimate distributed library.
The growing capacity of handheld storage demonstrates that distribution need not imply scarcity. At the Internet Archive, Brewster Kahle advocates replicated digital libraries, and has been seen waving a portable 4 TB drive loaded with tens of thousands of books in PDF format as a functioning portable library. Plug the drive into most off-the-shelf PCs or Macs and the machine will automatically index the contents.
Distributed content might be an appealing vision even at the scale of DPLA, where a wide range of digital treasures reside at local public libraries, archives, and museums. These disparate sources can be aggregated in a metadata index for discovery, with access policies supporting the curated replication of subsets in any number of locations. For each community library, its own locally relevant digital library.
Yet generally, the trajectory of digital libraries is toward big-tent visions, where as much content as possible is aggregated in one database. There are engineering motivations for this approach: search and discovery are facilitated through maximum control of indexing; the cost of metadata normalization across heterogeneous collections is mitigated; more sophisticated but not necessarily more complex functionality can be offered; and retrieval algorithms are vastly simplified.
Beyond that, large aggregations permit multiplier effects through additive services that work at scale, including social recommending, semantic applications that apply thesauri and ontologies to repositories, and the ability to form attachments to other network-based collections. While some network affordances could be associated with smaller, distributed library caches, the initiation cost and management of information resource attachment or linkage can be significant, and is arguably best achieved at the highest possible level of aggregation. This is the motivation for the partnership between DPLA and Europeana – global services are hard to knit together at the level of community.
Moreover, western technologists often forget that most of the world accesses the Internet through mobile devices. As Mary Meeker reminds us in her 2011 report, the “megatrend of the 21st Century” will be the empowerment of people via connected mobile devices. Over 70 percent of India’s population has a mobile phone – over 800 million people with wireless subscriptions. In this light, only the most severe political, social, or economic repression argues for preferential local access to digital libraries if they are available on a global scale. When revolts against despotic regimes are plotted via Twitter and Facebook, and witness photographs are posted to Flickr, there’s little reason to curate local collections of materials that live in a stand-apart library – we live in an increasingly connected world.
This is a complex debate, and it may well be that at the next “local maximum” point of our network evolution, we will once again redistribute network resources to a more atomic level. At this time, we’re still learning how to work together globally, and if we are lucky, we may next learn how to bring the world to the village.
