Sometime in November, I came to the realization that I had horribly misinterpreted the NISO Z39.88/OpenURL 1.0 spec. I’m on the NISO Advisory Committee for OpenURL (which makes this even more embarrassing) and was reviewing the proposal for the Request Transfer Message Community Profile and its associated metadata formats when it dawned on me that my mental model was completely wrong. For those of you who have primarily dealt with KEV-based OpenURLs (which is 99% of all the OpenURLs in the wild), I would wager that your mental model is probably wrong, too.
A quick primer on OpenURL:
- OpenURL is a standard for transporting ContextObjects (basically a reference to something, in practice, mostly bibliographic citations)
- A ContextObject (CTX, for short from now on) is composed of Entities that help define what it is. Entities can be one of six kinds:
  - Referent – this is the meat of the CTX, what it’s about, what you’re trying to get context about. A CTX must have one referent and only one.
  - ReferringEntity – defines the resource that cited the referent. This is optional and can only appear once.
  - Referrer – the source of where the CTX came from (e.g. the A&I database). This is optional and can only appear once.
  - Requester – this is information about who is making the request (e.g. the user’s IP address). This is optional and can only appear once.
  - ServiceType – this defines what sorts of services are being requested about the referent (e.g. getFullText or document delivery services). There can be zero or many ServiceType entities defined in the CTX.
  - Resolver – these are messages specifically to the resolver about the request. There can be zero or more Resolver entities defined in the CTX.
- All entities are basically the same in what they can hold:
  - Identifiers (such as DOI or IP Address)
  - By-Value Metadata (the metadata is included in the Entity)
  - By-Reference Metadata (the Entity has a pointer to a URL where you can retrieve the metadata, rather than including it in the CTX itself)
  - Private Data (presumably data, possibly confidential, between the entity and the resolver)
- A CTX can also contain administrative data, which defines the version of the ContextObject, a timestamp and an identifier for the CTX (all optional)
- Community Profiles define valid configurations and constraints for a given use case (for instance, scholarly search services are defined differently than document delivery). Context objects don’t actually specify any community profile they conform to. This is a rather loose agreement between the resolver and the context object source: if you provide me with a SAP1, SAP2 or Dublin Core compliant OpenURL, I can return something sensible.
- There are currently two registered serializations for OpenURL: Key/Encoded Values (KEV), where all of the values are output in a single string, formatted as key=value and delimited by ampersands (this is what the majority of all OpenURLs that currently exist look like), and XML (which is much rarer, but also much more powerful)
- There is no standard OpenURL ‘response’ format. Given the nature of OpenURL, it’s highly unlikely that one could be created that would meet all expected needs. A better alternative would be for a particular community profile to define a response format since the scope would be more realistic and focused.
Looking back on this, I’m not sure how “quick” this is, but hopefully it can bootstrap those of you that have only cursory knowledge of OpenURL (or less). Another interesting way to look at OpenURL is Jeff Young’s 6 questions approach, which breaks OpenURL down to “who”, “what”, “where”, “when”, “why” and “how”.
One of the great failings of OpenURL (in my mind, at least) is the complete and utter lack of documentation, examples, dialog or tutorials about its use or potential. In fact, outside of COinS, maybe, there is no notion of “community” to help promote OpenURL or cultivate awareness or adoption. To be fair, I am as guilty as anybody for this failure, since I had proposed making a community site for OpenURL, but a shift in job responsibilities, then a wholesale change in employers, coupled with the hacking of the server it was to live on, left this by the wayside. I’m putting this back on my to do list.
What this lack of direction leads to is that would-be implementors wind up making a lot of assumptions about OpenURL. The official spec published at NISO is a tough read, and reading it is generally discouraged by the “inner core” of the OpenURL universe (the Herbert van de Sompels, the Eric Hellmans, the Karen Coyles, etc.) in favor of the “Implementation Guidelines” documents. However, only the KEV Guidelines are actually posted there. The only other real avenue for trying to come to grips with OpenURL is to dissect the behavior of link resolvers. Again, in almost every instance this means you’re working with KEVs, and the downside of KEVs is that they give you a very naive view of OpenURL.
KEVs, by their very nature, are flat and expose next to nothing about the structure of the model of the context object they represent. Take the following, for example:
Ugly, I know, but bear with me for a moment. From this example, let’s focus on the Referent:
and then let’s make this a little more human readable:
rft.btitle: Dépendances et niveaux de représentation en syntaxe
rft.place: Amsterdam, Philadelphia
Looking at this example, it’s certainly easy to draw some conclusions about the referent, the most obvious being that it’s a book.
Actually (and this is where it gets complicated and I begin to look pedantic), it’s really only telling you “I am sending some by-value metadata in the info:ofi/fmt:kev:mtx:book format”, not that the thing is actually a book (the info:ofi/fmt:kev:mtx:book metadata values can state that through genre, but ignore that for a minute, since genre is optional).
The way this actually should be thought of:
Metadata by Value:
Btitle: Dépendances et niveaux de représentation en syntaxe
Metadata by Value:
Btitle: Minimalist Progam
Metadata By Value:
So, this should still seem fairly straightforward, but the hierarchy certainly isn’t evident in the KEV. It’s a good starting point to begin talking about the complexity of working with OpenURL, though, especially if you’re trying to create a service that consumes OpenURL context objects.
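To make the hidden hierarchy concrete, here is a minimal Ruby sketch that regroups a flat KEV string by entity. The method name `group_kev`, the shape of the resulting hash and the sample query (including the `info:sid/example.org:demo` referrer id) are all made up for illustration; only the key prefixes (`rft`, `rfe`, `rfr`, `req`, `svc`, `res`) and the `_val_fmt` / `_id` suffixes come from the KEV format. By-reference metadata and private data are ignored to keep it short.

```ruby
require "uri"

# KEV key prefixes and the entity each one belongs to.
ENTITY_PREFIXES = {
  "rft" => :referent,
  "rfe" => :referring_entity,
  "rfr" => :referrer,
  "req" => :requester,
  "svc" => :service_type,
  "res" => :resolver
}.freeze

# Regroup a flat KEV query string into one nested hash per entity.
# (Hypothetical helper, not part of any existing library.)
def group_kev(query)
  grouped = Hash.new do |hash, entity|
    hash[entity] = { format: nil, identifiers: [], metadata: {} }
  end

  URI.decode_www_form(query).each do |key, value|
    prefix, separator, rest = key.partition(/[._]/)
    entity = ENTITY_PREFIXES[prefix]
    next unless entity # ctx_*, url_* and unknown keys are skipped here

    if separator == "."
      grouped[entity][:metadata][rest] = value # a by-value metadata field
    elsif rest == "val_fmt"
      grouped[entity][:format] = value         # format of the by-value metadata
    elsif rest == "id"
      grouped[entity][:identifiers] << value
    end
  end

  grouped
end

# An invented, shortened query (not the example discussed above).
query = "ctx_ver=Z39.88-2004" \
        "&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook" \
        "&rft.place=Amsterdam%2C+Philadelphia" \
        "&rft_id=urn%3Aisbn%3A9027231141" \
        "&rfr_id=info%3Asid%2Fexample.org%3Ademo"

group_kev(query).each do |entity, parts|
  puts "#{entity}: #{parts.inspect}"
end
```

The point of the regrouping is that "book" ends up as the format of one metadata package inside the referent, not as a property of the referent itself.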
Back to the referent metadata. The context object didn’t have to send the data in the “metadata by value” stanza. It could have just sent the identifier “urn:isbn:9027231141” (and note that in the above example, it didn’t have an identifier at all). It could also have sent metadata in the Dublin Core format, MARC21, MODS, ONIX or all of the above (the Metadata By Value element is repeatable) if you wanted to make sure your referent could be parsed by the widest range of resolvers. All of these are bibliographic formats, but in Request Transfer Message context objects (which would be used for document delivery, and which got me started down this whole path), you would conceivably have one or more of the aforementioned metadata types, plus a Request Transfer Profile Referent type that describes the interlibrary loan-ish data accompanying the referent, as well as an ISO Holdings Schema metadata element carrying the actual items a library has, their locations and status.
If you have only run across KEVs describing journal articles or books, this may come as a bit of a surprise. Instead of saying the above referent is a book, it becomes important to say that the referent contains a metadata package (as Jonathan Rochkind calls it) that is in this (OpenURL specific) book format. In this regard, OpenURL is similar to METS. It wraps other metadata documents and defines the relationships between them. It is completely agnostic about the data it is transporting and makes no attempt to define it or format it in any way. The Journal, Book, Patent and Dissertation formats were basically contrived to make compatibility with OpenURL 0.1 easier, but they are not directly associated with OpenURL and could have just as easily been replaced with, say, BibTeX or RIS (although the fact that they were created alongside Z39.88 and are maintained by the same community makes the distinction difficult to see).
What this means, then, is that in order to know anything about a given entity, you also need to know about the metadata format that is being sent about it. And since that metadata could literally be in any format, it means there are a lot of variables that need to be addressed just to know what a thing is.
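As a rough illustration of what that implies for a consumer, the sketch below picks the first metadata package whose format a resolver claims to understand. The method `usable_package`, the `{ format:, payload: }` hash shape, the preference order and the `info:example/unknown` identifier are assumptions made for the example; the two `info:ofi/fmt:` identifiers are the ones mentioned in this post.

```ruby
# Formats this imaginary resolver can parse, in order of preference.
UNDERSTOOD_FORMATS = [
  "info:ofi/fmt:xml:xsd:MARC21",
  "info:ofi/fmt:kev:mtx:book"
].freeze

# Each package pairs a format identifier with a payload that stays opaque
# until something that understands the format looks at it.
def usable_package(packages, understood = UNDERSTOOD_FORMATS)
  understood.each do |format|
    match = packages.find { |package| package[:format] == format }
    return match if match
  end
  nil # nothing parseable; a caller would fall back to identifiers, if any
end

packages = [
  { format: "info:example/unknown", payload: "..." },
  { format: "info:ofi/fmt:kev:mtx:book", payload: { "place" => "Amsterdam, Philadelphia" } }
]

chosen = usable_package(packages)
puts chosen ? "using #{chosen[:format]}" : "no usable metadata package"
```

Only after a format has been matched does it make sense to ask questions like "is this a book?", because the answer lives inside the payload.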
For the Umlaut, I wrote an OpenURL library for Ruby as a means to parse and create OpenURLs. Needless to say, it was originally written with that naive, KEV-based, mental model (plus some other just completely errant assumptions about how context objects worked) and, because of this, I decided to completely rewrite it. I am still in the process of this, but am struggling with some core architectural concepts and am throwing this out to the larger world as an appeal for ideas or advice.
Overall the design is pretty simple: there is a ContextObject object that contains a hash of the administrative metadata and then attributes (referent, referrer, requester, etc.) that contain Entity objects.
The Entity object has arrays of identifiers, private data and metadata.
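A bare-bones sketch of that layout might look like the following. It is not the library's actual code: the attribute names, the choice to always create the referent, and the hash used for a metadata package are guesses that simply follow the description above and the cardinality rules from the primer.

```ruby
# One entity: identifiers, private data and metadata packages.
class Entity
  attr_reader :identifiers, :private_data, :metadata

  def initialize
    @identifiers = []
    @private_data = []
    @metadata = []
  end
end

class ContextObject
  attr_reader :admin, :referent, :service_types, :resolvers
  attr_accessor :referring_entity, :referrer, :requester

  def initialize
    @admin = {}               # administrative data such as version, timestamp, id
    @referent = Entity.new    # exactly one referent is required
    @referring_entity = nil   # optional, at most one
    @referrer = nil           # optional, at most one
    @requester = nil          # optional, at most one
    @service_types = []       # zero or more
    @resolvers = []           # zero or more
  end
end

ctx = ContextObject.new
ctx.admin["ctx_ver"] = "Z39.88-2004"
ctx.referent.identifiers << "urn:isbn:9027231141"
ctx.referent.metadata << {
  format: "info:ofi/fmt:kev:mtx:book",
  values: { "place" => "Amsterdam, Philadelphia" }
}
ctx.referrer = Entity.new

puts ctx.referent.metadata.map { |package| package[:format] }.inspect
```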
And then this is where I start to run aground.
The original (and current) plan was to populate the metadata array with native metadata objects that are generated by registering metadata classes in a MetadataFactory class. The problem, you see, is that I don’t want to get into the business of having to create classes to parse and access every kind of metadata format that gets approved for Z39.88. For example, Ed Summers’ ruby-marc has already solved the problem of effectively working with MARC in Ruby, so why do I want to reinvent that wheel? The counter argument is, by delegating these responsibilities to third party libraries, there is no consistency of APIs between “metadata packages”. A method used in format A may very well raise an exception (or, worse, overwrite data) in format B.
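One way the registration idea could look is sketched here. The `register` and `build` methods are hypothetical, and `BookMetadata` is a stand-in class; in a real setup the registered handler for MARC would presumably hand the raw data to ruby-marc instead.

```ruby
# Maps Z39.88 format identifiers to callables that build native objects.
class MetadataFactory
  @registry = {}

  class << self
    # handler is anything that responds to #call(raw)
    def register(format_id, handler)
      @registry[format_id] = handler
    end

    def build(format_id, raw)
      handler = @registry.fetch(format_id) do
        raise ArgumentError, "no handler registered for #{format_id}"
      end
      handler.call(raw)
    end
  end
end

# Stand-in for a format-specific class (or a third-party library's object).
BookMetadata = Struct.new(:values)

MetadataFactory.register(
  "info:ofi/fmt:kev:mtx:book",
  ->(raw) { BookMetadata.new(raw) }
)

book = MetadataFactory.build(
  "info:ofi/fmt:kev:mtx:book",
  { "place" => "Amsterdam, Philadelphia" }
)
puts book.values["place"]
```

This keeps the factory ignorant of any particular format, but it also shows the API problem directly: whatever `build` returns has the interface of its own library and nothing else.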
There is a secondary problem that third party libraries aren’t going to have any idea that they’re in an OpenURL context object or even know what that is. This means there would have to be some class that handles functionality like XML serialization (since ruby-marc doesn’t know that Z39.88 refers to it as info:ofi/fmt:xml:xsd:MARC21), although this can be handled by the specific metadata factory class. This would also be necessary when parsing an incoming OpenURL since, theoretically, every library could have a different syntax for importing XML, KEVs or whatever other serialization is devised in the future.
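A thin wrapper is one possible shape for that glue, sketched below under loud assumptions: `MetadataPackage`, its `envelope` method and `StubRecord` are invented for this example, and the lambda stands in for whatever serialization call the real third-party library offers (no ruby-marc API is used or implied here).

```ruby
# Carries the Z39.88 format identifier alongside a native object that
# knows nothing about OpenURL.
class MetadataPackage
  attr_reader :format_id, :native

  # to_xml: a callable holding the library-specific serialization step
  def initialize(format_id, native, to_xml:)
    @format_id = format_id
    @native = native
    @to_xml = to_xml
  end

  def to_xml
    @to_xml.call(@native)
  end

  # What a context object serializer would need: the format plus the body.
  def envelope
    { format: format_id, body: to_xml }
  end
end

# Stand-in for a record object from some third-party library.
StubRecord = Struct.new(:title)

package = MetadataPackage.new(
  "info:ofi/fmt:xml:xsd:MARC21",
  StubRecord.new("Minimalist Program"),
  # No XML escaping here; a real serializer would need it.
  to_xml: ->(record) { "<record><title>#{record.title}</title></record>" }
)

puts package.envelope.inspect
```

The wrapper gives the context object one consistent surface (format identifier and serialization) while leaving field-level access to the native object, which is a trade-off rather than a solution to the API consistency question.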
So I’m looking for advice on how to proceed. All ideas welcome.