Peter Brantley -- December 18th, 2011

I should be holiday shopping, but instead I have been thinking about something called linked open data. It’s not an entirely insane use of my time, as I have to consider a day-long session at ALA Midwinter in Dallas, “Libraries, Linked Data, and the Semantic Web” in which I am supposed to declaim meaningfully alongside colleagues who know quite a bit more about this than I do.

Fortunately, linked open data, or “LOD”, is a relatively simple concept. It refers to the practice of presenting, or “publishing”, data held in a database or information repository in a normalized and structured way. This is often done in a formal syntax known as Resource Description Framework (RDF), but it need not be. For a basic example, consider the book “Thinking, fast and slow” by Daniel Kahneman. A linked data approach to this title would produce statements representing bibliographic information as relationships: “Title is ‘Thinking, fast and slow’”, and “Author is ‘Daniel Kahneman’”. If this data were represented in RDF, it would look something like this output from Open Library. (Open Library will produce RDF for any title in its catalog; you can play with similar entries to your heart’s content.)

One essential component of linked data approaches is making sure that you publish hooks with which to identify the thing you are talking about, that other people can use as a reference. For books, fortunately, publishers and libraries have had these for decades: they are usually ISBNs, or alternatively, some sort of library catalog identifier. At Open Library, we have unique identifiers both for works and their unique “manifestations”, which are the editions of any given book: for example, you can see our pointer to the first edition of Kahneman’s work using the identifier: OL24896701M.
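To make the idea of statements and identifiers concrete, here is a minimal sketch in Python. It is not Open Library’s actual RDF output: the use of an Open Library URL as the subject, the Dublin Core style predicate labels, and the helper name describe are all assumptions chosen for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# The subject is the "hook": here, a URL built from the edition
# identifier OL24896701M (the URL form is an assumption).
EDITION = "https://openlibrary.org/books/OL24896701M"

# Predicate labels borrow Dublin Core term names for illustration only.
triples = [
    (EDITION, "dcterms:title", "Thinking, fast and slow"),
    (EDITION, "dcterms:creator", "Daniel Kahneman"),
]


def describe(subject, statements):
    """Collect every predicate/object pair published about one identifier."""
    return {pred: obj for subj, pred, obj in statements if subj == subject}


record = describe(EDITION, triples)
print(record["dcterms:title"])    # Thinking, fast and slow
print(record["dcterms:creator"])  # Daniel Kahneman
```

Because every statement hangs off the same identifier, anyone else who publishes statements about that identifier can have them folded into the same record.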

The cool thing about linked data for libraries and publishers is what happens when people associate new information with books, particularly where they can map relationships with other data using connectors such as ISBNs or some other identifier. There’s not much value in simply recording catalog entries for books in RDF and leaving them to sit in glorified, geeky XML syntax more suitable for computers than humans. The key to the power of linked data is the “O” in “LOD”: Open. When linked data statements about books are open, other people can knit together skeins of associations between books that were not possible before.

That’s why the British Library is publishing their catalog in LOD; why the European Digital Library, Europeana, has committed to linked open data; and why OCLC has just published a faceted subject heading scheme based on the Library of Congress’ LCSH in a linked data format.

The blogger Library Loon has an excellent post on how linking associations between books can be so compelling, in an approach known as relationship modeling. It’s a way of creating a conceptual graph, where one thing can point to one or more other things with which it has some kind of connection.

Imagine, for example, that J. Random Cataloger is cataloging Mozart’s Don Giovanni and José Zorrilla’s Don Juan Tenorio. Imagine then that Golden Age Spanish literature is not J. Random’s specialty, such that he does not know that the original is Tirso de Molina’s play El burlador de Sevilla. How is he to associate the Mozart and Zorrilla works … without considerable literary sleuthing?

… The opportunity comes with the open-world assumption: J. Random won’t know the filiation of the many adaptations of El burlador (whyever should he?), but the Loon (mostly) does—and once the Loon publishes that portion of graph in RDA terms, every library catalog everywhere can take advantage of it. Relationship modeling, in other words, only need be done once …

That’s the brilliance of linked data. It’s a very different kind of power from a simple linking of terms across books, such as what the startup Small Demons currently provides. Small Demons permits readers to browse terms across a wide range of books: every book in which the neighborhood “Greenwich Village” is mentioned, for example. It’s a brilliant exposition of the powers of structured data. But only a LOD approach to relationship mapping between books can supercharge browsing and book recommending algorithms so they include information on characteristics shared by books at a higher meta-level. (Small Demons is moving toward this kind of mapping as well.)
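The Loon’s Don Giovanni example can be sketched as a tiny graph. This is only an illustration under stated assumptions: the “based on” relationship name, the plain-text work labels, and the related_works helper are hypothetical, and a real catalog would use published identifiers rather than strings.

```python
# (adaptation, source) pairs meaning "adaptation is based on source".
# These two links are the ones the quoted example describes.
BASED_ON = [
    ("Don Giovanni (Mozart)", "El burlador de Sevilla (Tirso de Molina)"),
    ("Don Juan Tenorio (Zorrilla)", "El burlador de Sevilla (Tirso de Molina)"),
]


def related_works(work, links):
    """Return the other works that share a source with `work`."""
    sources = {src for adaptation, src in links if adaptation == work}
    return sorted(
        adaptation
        for adaptation, src in links
        if src in sources and adaptation != work
    )


print(related_works("Don Giovanni (Mozart)", BASED_ON))
# ['Don Juan Tenorio (Zorrilla)']
```

The cataloger never has to know the connection personally; once someone publishes the two links, a lookup like this finds it.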

Here’s the best part: in an open platform, anyone can document otherwise disguised relationships like the one that knits Mozart’s opera and Zorrilla’s play together. In turn, the openness of the data enables the creation of new services, and new businesses, based on associations between books, and between books and a universe of other data described as linked open data. LOD can leverage the expertise and insights that readers have to build a more sophisticated understanding of our culture. Consider it tagging on steroids.

And therein lies the problem, too, because the opportunity to keep this data open, for all to use, is one that we can’t let slip by. That’s true whether you are a librarian, publisher, or entrepreneur. If a major retailer like Amazon is the only one to permit readers to produce and consume this kind of linked data about books, and other media as well, then it becomes “linked closed data”: available only for Amazon to use, or for Amazon to control through web services that benefit Amazon.

We have to make this a public investment, perhaps one provided through the Digital Public Library of America, so that all of us can benefit from these insights. Not just academic librarians or online retailers, but you and me, seeing and making connections, contributing to our understanding of the world around us. That’s the new power of open.

  1. Laer Carroll

    The openness is a strength, but it has a weakness – the same one Wikipedia and the like suffer from – garbage entries, erroneous entries, and malicious entries.

    Wikipedia does an OK (not great) job of whacking bad stuff. Will LOD manage the same difficult feat?

  2. Nick Radcliffe

    This is good stuff, Peter.

    You might be interested in some related ideas getting kicked around at Fluidinfo. Open book metadata is one focus there, as described in various posts including:

    The thing that’s interesting about Fluidinfo (in this context) is that it’s an open database that anyone can add data to using their own tags. The focus so far has been mainly on works, but it is equally applicable to individual editions. So for example, George Orwell’s work Nineteen Eighty-Four has an object identified by the tag

    book:nineteen eighty four (george orwell)

    If you use a modern browser (Chrome, Safari or Firefox will all work) you can see the data in the system about that book by going to:

    (And if you sign up for an account, you can add your own data to the object too.)

    The nice thing about this is that, without XML and RDF and all that stuff, real people can actually augment and share data about books, and the system is intrinsically open. (People can control who can see their tags, but the objects, in this case the books, are all open, and anyone can add more.) And there’s an API, a query language and more.

    If you want to get a quick feel for the system, this YouTube video

    shows how it works in the related area of music.

    Do get in touch if this is interesting.

