Standardizing bookmarks

Peter Brantley -- November 2nd, 2011

Earlier this year, the National Information Standards Organization (NISO) received funding from the Andrew W. Mellon Foundation for two meetings, coordinated with the Internet Archive, that would encourage standards discussions around bookmarks and annotations. The aim of the meetings was to bring together as many stakeholders as possible – social reading startups, standards groups like the IDPF and EDItEUR, academic and research initiatives, and large ebook retailers – to discuss the challenges in creating standard-format, portable bookmarks and annotations.

As we described in the grant application materials, “The ability to accurately refer to a specific location within a digital text is fundamental for bookmarking and annotations in a digital environment. For both casual readers as well as professional and academic researchers, such pointers must be recognized across reading systems to enable social uses of books, articles and grey literature that range from personal memory aids to citations and critical analysis, as well as deep inter-linking. At present, no standards exist in this space.”

I was unable to attend our first meeting on October 10 at the Frankfurt Book Fair, but I did very much enjoy the gathering in San Francisco on October 26. (Once official meeting notes become available at NISO, I will provide an update.) The meeting had a wide range of attendees representing groups like ReadSocial, Hypothes.is, ReadMill, findings, O’Reilly Media, IDPF, the Open Annotation Collaboration (OAC) out of Los Alamos, and several others. OAC, in particular, serves as a strong conceptual framework for thinking about annotation structures via RDF-based graph modeling.

After initial stage-setting presentations, followed by a round of micro-talks by some of the startups present, we literally rearranged our tables into a circle and discussed our priorities. One of the first and hardest decisions was to focus primarily on text-based materials as sources, recognizing that annotations themselves might contain a wide range of media. Bookmarking for text, video, and audio will require broadly similar but distinct efforts, and waiting to distill an elegant, media-neutral approach is probably a death knell for timely implementation. It seems a better approach to consider distinct source media classes as “flavors” of bookmarking.

We then quickly came to the agreement that a standard for bookmarking, as opposed to one for annotations, was fundamental and antecedent to any succeeding standards framework for annotations. If we cannot agree on what it is we are commenting on – that is, if we cannot assure some reasonable success in location re-discovery – we will not be able to advance past the first square. In real-world scenarios, figuring out how to identify specific locations in texts is agonizingly difficult. Although the IDPF has an exact location bookmark and citation standard proposal, the Canonical Fragment Identifier (CFI), with editors from Adobe, Google, Apple, and the DAISY Consortium, it will not work for the majority of cases, which require fuzzy matching.

The ability to support fuzzy matching is critical for a wide range of uses. For example, readers should be able to bookmark reliably across different editions of a work, as long as the editions are reasonably similar. Similarly, if a work is updated via contributed errata or the incorporation of new material, bookmarking should not be fatally disrupted by the resulting changes in byte-count-offsets or hash calculations.

After much discussion and comparison of the strategies used by the social reading startups in the room, the group settled on a multi-pronged identification strategy. Because fuzzy matching is inherently imprecise, it is almost certain that a combination of (e.g.) byte-count offset, word count, and structural page/paragraph/sentence position would be required to support robust bookmarking interoperability. This kind of lossy identification, similar conceptually to that used in standards like OpenURL, has the best chance of triangulating to the preferred textual context.
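
To make the multi-pronged idea concrete, here is a minimal sketch in Python. It is not anything the group specified. The Bookmark fields, the make_bookmark and resolve helpers, and the 0.6 similarity threshold are all my own assumptions for illustration, and a character offset stands in for a byte-count offset. The bookmark records three independent locators. The resolver first tries the exact offset, then falls back to fuzzy text matching, using the word index to break ties.

```python
import difflib
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bookmark:
    # All three fields are hypothetical locator names, not a proposed standard.
    char_offset: int  # character offset in the edition that was bookmarked
    word_index: int   # number of words preceding the bookmarked position
    context: str      # short run of text starting at the bookmarked position


def make_bookmark(text: str, start: int, length: int = 40) -> Bookmark:
    """Record several independent locators for one position in `text`."""
    return Bookmark(
        char_offset=start,
        word_index=len(text[:start].split()),
        context=text[start:start + length],
    )


def resolve(bookmark: Bookmark, text: str, min_ratio: float = 0.6) -> Optional[int]:
    """Return the best-matching character offset in `text`, or None."""
    n = len(bookmark.context)

    # Cheapest check first: the recorded offset still points at the same text.
    if text[bookmark.char_offset:bookmark.char_offset + n] == bookmark.context:
        return bookmark.char_offset

    # Otherwise score the window at every word start by textual similarity,
    # preferring candidates whose word index is closest to the recorded one.
    best_start, best_key = None, None
    for idx, match in enumerate(re.finditer(r"\S+", text)):
        window = text[match.start():match.start() + n]
        ratio = difflib.SequenceMatcher(None, bookmark.context, window).ratio()
        if ratio < min_ratio:
            continue  # too dissimilar to be the same passage
        key = (ratio, -abs(idx - bookmark.word_index))
        if best_key is None or key > best_key:
            best_start, best_key = match.start(), key
    return best_start
```

A bookmark made with make_bookmark against one edition can then be passed to resolve along with the text of a revised edition. If new front matter has shifted every offset, the exact check fails and the context match takes over. A real implementation would also need the structural locator and a far more efficient search than this linear scan.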

It was also decided that some manner of serialization, such as a JSON encoding, was at least as necessary as, and might conceivably be preferable to, an RDF encoding of bookmarking syntax. Any bookmark standard must be both easily constructed and easily consumed by a wide range of agents, many of which will be web-based and reliant on HTTP data transfers.
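
As a rough illustration of how little machinery a JSON encoding demands, here is a sketch in Python. The field names ("work", "locators", "charOffset", "wordIndex", "structure", "context") and both helper functions are hypothetical choices of mine, not drawn from any NISO, IDPF, or OAC draft. The point is only that a web agent can build and read such a record with nothing beyond a standard JSON library.

```python
import json


def bookmark_to_json(work_id, char_offset, word_index, structure, context):
    """Serialize one bookmark as a JSON string (hypothetical field names)."""
    record = {
        "work": work_id,                 # identifier of the bookmarked work
        "locators": {
            "charOffset": char_offset,   # stands in for a byte-count offset
            "wordIndex": word_index,     # words preceding the position
            "structure": structure,      # e.g. chapter/paragraph/sentence
            "context": context,          # quoted text for fuzzy matching
        },
    }
    return json.dumps(record, sort_keys=True)


def bookmark_from_json(payload):
    """Parse a bookmark record, rejecting one that lacks required keys."""
    record = json.loads(payload)
    missing = {"work", "locators"} - set(record)
    if missing:
        raise ValueError("bookmark is missing keys: %s" % sorted(missing))
    return record
```

Because the record is a plain JSON object, it can be posted to or fetched from a web service as-is. An RDF representation could presumably be derived from the same fields for systems that want graph modeling.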

The next steps are the formation of a NISO working group, and then the hard work of scheduling and shepherding a group of participants towards a proposal that can be publicly vetted, with ultimate approval by NISO membership. Although even a draft standard cannot be expected until summer 2012 at the earliest, we should be able to agree on a basic serialization of bookmarking. And that would be a significant accomplishment, in and of itself.
