Risk, reward, and the in-between
July 7th, 2008
I have been following a thread on the VuFind-Tech list regarding the project’s endorsement of Jangle to provide the basis of the ILS plugin architecture for that project. It’s not an explicit mandate, just a pragmatic decision that if work is going into creating a plugin for VuFind, it would make more sense (from an open source economics point of view) if that plugin were useful to more projects than just VuFind. More users, more interest, more community, more support.
The skepticism of Jangle is understandable and expected. After all, it’s a very unorthodox approach to library data, seemingly eschewing other library initiatives, and, on the surface, it seems to be wholly funded by a single vendor’s support.
And, certainly, Jangle may fail. Just like any other project. Just like VuFind. Just like Evergreen. Any new innovative project brings risk. More important than the direct reward of any of these initiatives succeeding is the disruption they bring to the status quo. Instead of what they directly bring to the table, what do they change about how we view the world?
Let’s start with Evergreen. Five years ago I sat in a conference room at Emory’s main library while Brad LaJeunesse and Jason Etheridge (this predated PINES hiring Mike Rylander and Bill Erickson) told us that they were ditching Unicorn and building their own system. I, like the others in the room, Selden Deemer, Martin Halbert, smiled and nodded and when they left I (Mr. Library Technology Pollyanna) turned to the others and said that I liked their moxie, but it was never going to work. Koha was the only precedent at the time, and, frankly, it seemed like a toy.
Now where are we? Most of the public libraries in Georgia are using Evergreen, a large contingent from British Columbia is migrating, and a handful of academic libraries are either live or working towards migration. Well, I sure was wrong.
The more significant repercussion of PINES going live with Evergreen was that it cast into doubt our assumptions of how our relationship with our integrated library system needed to work. Rather than waiting for their vendor to provide whatever functionality they need or want, libraries can instead implement it themselves. While it’s unrealistic for every library to migrate to Evergreen or Koha, these projects have brought to light the lack of transparency and cooperation in the ILS marketplace.
Similarly, projects like VuFind, Blacklight and fac-back-opac prove that by pulling some off-the-shelf non-library-specific applications and cleverly using existing web services (like covers from Amazon) we can cheaply and quickly create the kinds of interfaces we have been begging from our vendors for years. It is unlikely that all of these initiatives will succeed, and the casualties will more likely be the result of the technology stack they are built upon rather than any lack of functionality. Still, the fact that they all appeared around the same time and answer roughly the same question shows that we can pool our resources and build some pretty neat things.
To be fair, the real risk taker in this arena was NC State. They spent the money on Endeca and rolled out the interface that wound up changing the way we looked at the OPAC. The reward of NCSU’s entrepreneurialism is that we now have projects like VuFind and its ilk. Very few libraries can afford to be directly rewarded by NC State’s catalog implementation, but with every library that signs on with Encore or Primo, III and Ex Libris owe that sale to a handful of people in Raleigh. You would not be able to download and play with VuFind if NC State libraries had worried too much about failure.
Which then brings me to Jangle. The decision to build the spec on the Atom Publishing Protocol has definitely been the single most criticized aspect of the project (once we removed the confusing, outdated wiki pages about Jangle being a Rails application), but there has been little dialogue as to why it wouldn’t work (actually, none). The purpose of Jangle is to provide an API for roughly 95% of your local development needs with regards to your library services. There will be edge cases, for sure, and Jangle might not cover them. At this point, it’s hard to tell. What is easier to tell, however, is that dwelling on the edge cases does absolutely nothing to address the majority of needs. Also, the edge cases are mainly library-internal-specific problems (like circulation rules). A campus or municipal IT person doesn’t particularly care about these specifics when trying to integrate the library into courseware or some e-government portal. They just want a simple way to get the data.
This doesn’t mean that Jangle is solely relegated to simple tasks, however. It just is capable of scaling down to simple use cases. And that’s where I hope Jangle causes disruption whether or not it is ultimately the technology that succeeds. By leveraging popular non-library-specific web standards it will make the job of the systems librarian or the external developer easier, whether it’s via AtomPub or some other commonly deployed protocol.
Blindly groping towards ActivePlatform
April 9th, 2008
Something I’ve taken it upon myself to do since I joined Talis is make ActiveRDF a viable client to access the Platform. While this is mostly selfishness on my part (I want to keep developing in Ruby and there’s basically no RDF support right now, plus this gives me a chance to learn about the RDF/SPARQL-y aspects of the Platform), I also think that libraries like this can only help democratize the Platform.
So far, it’s been pretty ugly. I haven’t had much time to work on it, granted, but the time I’ve spent on it has made me think that there will be a lot of work to do. Couple this with some of the things that make the Platform difficult to work with in Ruby anyway (read: Digest Authentication) and this might be a more uphill battle than I’ll ever have time for, but I figure it’s either this or go back to Python and I’m not quite ready to give up on Ruby yet.
Currently, performance is abysmal with ActiveRDF against the Platform, so I’ll need to think of shortcuts to improve that (I’m not even considering write access presently). Here’s some code (this is as much for my benefit, so I can remember what I’ve done) to work with Ian Davis’ Quotations Book Example store:
require 'time' # Otherwise ActiveRDF starts freaking out about DateTime
require 'active_rdf'

$activerdf_without_xsdtype = true
# less than ideal, but without it, ActiveRDF sends
# ^^<http://www.w3.org/2001/XMLSchema#string> with string literals even if you don't want
# to send the datatype. I haven't actually tried it with other datatypes to see how this breaks
# down the road.

ConnectionPool.set_data_source(:type => :sparql, :results => :sparql_xml, :engine => :joseki, :url => "http://api.talis.com/stores/iand-dev2/services/sparql")
Namespace.register :foaf, "http://xmlns.com/foaf/0.1/"
Namespace.register :dc, "http://purl.org/dc/elements/1.1/"
Namespace.register :quote, "http://purl.org/vocab/quotation/schema"

QUOTE::Quotations.find_by_dc::creator("Loren, Sophia").each do |quote|
  # print the important stuff from each graph
  # http://purl.org/vocab/quotation/schema#quote has to be manually added as a predicate
  # the "#" seems to cause problems
  quote.add_predicate(:quote, QUOTE::quote)
  puts quote.quote
  puts quote.subject
  puts quote.rights
  puts quote.isPrimaryTopicOf
end
If you actually try to execute this, you’ll see that it takes a long time to run (God help you if you try it on QUOTE::Quotations.find_by_dc::subject("Age and Aging")). A really long time.
If you set some environment vars before you go into irb:
$ export ACTIVE_RDF_LOG_LEVEL=0
$ export ACTIVE_RDF_LOG=./activerdf.log
then you can tail -f activerdf.log and see what exactly is happening.
After ActiveRDF does its initial SPARQL query (SELECT DISTINCT ?s WHERE { ?s <http://purl.org/dc/elements/1.1/creator> “Loren, Sophia” . }), it’s doing two things for every request in the block:
- a SPARQL query for every predicate associated with the URI (http://api.talis.com/stores/iand-dev2/services/sparql?query=SELECT+DISTINCT+%3Fp+WHERE+%7B+%3Chttp%3A%2F%2Fapi.talis.com%2Fstores%2Fiand-dev2%2Fitems%2F1187139384317%3E+%3Fp+%3Fo+.+%7D+)
- a SPARQL query for the value of the attribute (predicate): http://api.talis.com/stores/iand-dev2/services/sparql?query=SELECT+DISTINCT+%3Fo+WHERE+%7B+%3Chttp%3A%2F%2Fapi.talis.com%2Fstores%2Fiand-dev2%2Fitems%2F1187139384317%3E+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2Fcreator%3E+%3Fo+.+%7D
for every predicate in the graph. You can imagine how crazily inefficient this is, since to get every value for a resource, you have to make a different HTTP request for each one.
Obviously this would be a lot easier if it used DESCRIBE rather than SELECT, but without a real RDF library to parse the resulting graph, I’m not sure how ActiveRDF would deal with what the triple store returned.
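Short of DESCRIBE, one possible shortcut is to ask for every predicate and object of a resource in a single SELECT, so each resource costs one HTTP request instead of one per predicate. The sketch below only builds that request URL and shows how the returned rows could be grouped; it is not how ActiveRDF works internally, the helper names (properties_query_url, group_by_predicate) are made up for illustration, and actually sending the request and parsing the SPARQL XML results is left out.

```ruby
require 'uri'

# Endpoint taken from the example above; it may well no longer exist.
SPARQL_ENDPOINT = 'http://api.talis.com/stores/iand-dev2/services/sparql'

# Hypothetical helper: build one SELECT that returns every predicate/object
# pair for a resource, instead of issuing one query per predicate.
def properties_query_url(resource_uri, endpoint = SPARQL_ENDPOINT)
  sparql = "SELECT ?p ?o WHERE { <#{resource_uri}> ?p ?o . }"
  "#{endpoint}?#{URI.encode_www_form(query: sparql)}"
end

# Hypothetical helper: group [predicate, object] rows (however they were
# parsed out of the response) into a predicate => [objects] hash.
def group_by_predicate(rows)
  rows.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(predicate, object), grouped|
    grouped[predicate] << object
  end
end

# No network access happens here; this only prints the URL to request.
puts properties_query_url('http://api.talis.com/stores/iand-dev2/items/1187139384317')
```

With something like this, the block above would need one request per quotation rather than one per quotation per predicate.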
So, anyway, these are some of the hurdles in making ActiveRDF work with the Platform, but I’m not quite ready to throw in the towel, yet.
Mea Culpa
April 3rd, 2008
Jangle-discuss is now publicly viewable. Since I’m always logged into the Googles, I never noticed that it was blocked to non-subscribers.
Sorry about that.
Bootstrapping Jangle
April 2nd, 2008
After several months of trying, Jangle.org is finally starting to take off. I set up a Drupal instance yesterday on our new web host.
When I was still at Georgia Tech, one of the things I was trying to work on was a framework to consistently and easily expose the library’s data from its various silos into external services. In that case, my initial focus was the Sakai implementation that we were rolling into production, but the intention was to make it as generic as possible (i.e. the opposite of a “Blackboard Building Block“) so it could be consumed and reconstituted into as many applications as we wanted.
Coincidentally (and, for me, conveniently), Talis was also thinking about such a framework that would supply a generic SOA layer to libraries (and potentially beyond) and contacted me about possibly collaborating with them on it as an open source project. Obviously that relationship changed a bit when they hired me and they put me and my colleague Elliot Smith (reports of his demise have been greatly exaggerated) in charge of trying to get this project off the ground. Thankfully, Elliot is the other Talis malcontent who prefers Ruby, so our early prototypes all focused on Rails (the Java that originally seeded the project, like all Java, made my eyes glaze over).
We had a hard time getting anywhere at first. Not even taking into consideration the fact that he and I were an ocean apart, we really had no idea what it was that we should be building or why it would be useful to Talis (after all, they are paying the bills) since they already have an SOA product, Keystone. Also, we didn’t want to recreate Apache Synapse or Kuali Rice. In essence, we were trying to find a solution to a problem we hadn’t really defined, yet.
In December and early January, I drove across town for a couple of meetings with Mike Rylander, Bill Erickson and Jason Etheridge from Equinox to try to generate interest in Jangle and, at the same time, solicit ideas from them as to what this project should look like and do. Thankfully, they gave me both.
Jangle still foundered a bit through February. We were waiting for the DLF’s ILS and Discovery Systems API recommendation to come out (since we had targeted that as a goal) and Elliot produced a prototype in JRuby (we had long abandoned Rails for this) that effectively consumed the Java classes used for Keystone and rewrote them for Jangle. The problem we were still facing, though, is that we were, effectively, just creating another niche library interface from scratch and there were too many possible avenues to take to accomplish that. Our freedom was paralyzing us.
I gave a lightning talk on Jangle at Code4lib2008 that was big on rah-rah rhetoric (free your data!) and short on details (since we hadn’t really come up with any yet) that generated some interest and a few more subscriptions to our Google Group. A week later, the DLF met with the vendors to talk about their recommendation. I attended by phone. While in many ways I feel the meeting was a wash, it did help define for me what Jangle needed to do.
At the end of my first meeting with Equinox, Mike Rylander asked me if we had considered supporting the Atom Publishing Protocol in Jangle. At the time, I hadn’t. In fact, I didn’t until I sat on the phone for 8 hours listening to the vendors hem and haw over the DLF’s recommendation. The more I sat there (with my ear getting sore), the more I realized that AtomPub might be a good constraint to get things moving (as well as useful for appealing to non-library developers).
We are just now trying to start building how this spec might work. Basically there are two parts. First, the Jangle “core”, which is an AtomPub interface to external clients. It’s at this level that we need to model how library resources map to Atom (and other common web data structures, like vCard) and where we need to extend Atom to include data like MARC (when necessary). The Jangle core also proxies these requests to the service “connectors” and translates their responses back to the AtomPub client. The connectors are service-specific applications that take the specific schema and values in, say, a particular ILS’s RDBMS and put them in a more generic syntax to send back to the Jangle core. Right now, the proposal is that all communication between the core and connectors would be JSON over HTTP (again, to help forward momentum).
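To make the core/connector split a little more concrete, here is a rough sketch of the translation step: the core takes a JSON document from a connector and turns it into an Atom entry for the AtomPub client. None of this comes from an actual Jangle spec (there isn’t one yet). The JSON field names, the example URL and the atom_entry_from_connector helper are all assumptions made up for illustration.

```ruby
require 'json'

# Hypothetical connector payload; a real connector would send this over HTTP.
CONNECTOR_RESPONSE = <<~JSON
  {
    "id": "http://example.org/connector/items/1",
    "title": "Minimalist Program",
    "updated": "2008-04-02T12:00:00Z",
    "author": "Chomsky, N"
  }
JSON

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape the characters that are unsafe in XML text content.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical core-side translation: connector JSON in, Atom entry out.
def atom_entry_from_connector(json_text)
  record = JSON.parse(json_text)
  <<~XML
    <entry xmlns="http://www.w3.org/2005/Atom">
      <id>#{xml_escape(record['id'])}</id>
      <title>#{xml_escape(record['title'])}</title>
      <updated>#{xml_escape(record['updated'])}</updated>
      <author><name>#{xml_escape(record['author'])}</name></author>
    </entry>
  XML
end

puts atom_entry_from_connector(CONNECTOR_RESPONSE)
```

The point of keeping the connector side this plain is that someone who knows one ILS’s database only has to emit simple JSON, while everything Atom-specific lives in one place.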
So at this point you may be asking why AtomPub rather than implementing the recommendations of the DLF directly? The recommendation assumes the vendors will be compliant, uniform and timely in implementing their API and I cynically feel that is unrealistic. I also think it helps to get a common, consistent interface to help build interoperability (like the kind that the DLF group is advocating), since then you’d only have to write one, say, NCIP adapter and it would work for all services that have a Jangle connector. Also, by leveraging non-library technologies, it opens up our data to groups outside our walls.
So, if you’re interested in freeing your data (rah-rah!), come help us build this spec. We’re trying to conform to the Rogue ‘05 specification that Dan Chudnov came up with for development of this so, while it will still be a painful process, it won’t be painful and long. :) In other words, this ain’t NISO.
Buy me
March 6th, 2008
Real men go to conferences with their mothers
March 3rd, 2008
In most cases, the suggestion that I travel across the country with my mom to attend a library technology conference would be greeted with incredulous shock. It would be about the same reaction I’d have to the expectation that I take my one and a half year old son across the country. However, that was the position I found myself in last Monday, at 5:30AM, driving frantically from Chattanooga to Nashville so as not to miss our flight to Portland for Code4lib 2008.
Why would I subject myself to this? Well, a majority of my family lives in Portland and it seemed like a good opportunity for Che to meet his 97 year old great-grandmother. Since Selena couldn’t go, my mother volunteered to join me and take care of Che while I was in the conference. Surprisingly, despite ridiculously planned twelve hour traveling days, Che was a remarkably good traveling companion. Especially given that the day before we left, he was up all night vomiting from a stomach virus he picked up at daycare. But enough about my stupid traveling decisions for now.
Code4lib 2008 set an awfully high bar for Providence, RI to follow. Jeremy Frumkin managed to find an amazing hotel with both free breakfast and happy hour in the downtown of a remarkable city and somehow managed perfect weather in Portland at the end of February. Is it ok to name Jeremy president for life of Code4lib? Good luck, Brown University, Jeremy just made your job insanely hard.
I would say the overarching theme of this year’s event was “help us with our open source project!”. Or maybe, “fooled you! This is actually all about RDF”. Also, given how this is one of the premier library technology conferences, we might want to give a scholarship to an MBA or two to teach us how to use a projector and Powerpoint. I began to think there was a hex on the podium.
Rather than go over the presentations themselves (since that will be done more eloquently elsewhere, I’m sure), I’ll just make some observations. My perspective changed somewhat after Brewster Kahle’s keynote and Rob Styles’ presentation (during the following break) since my role in the conference went from ‘jerk in the audience making snarky comments in IRC’ to master of ceremonies when Jeremy had to go back to Corvallis to care for his sick daughter (or, rather, for his sick daughter and exhausted wife).
Rob was the first to focus on RDF and I think the constraint of our twenty minute presentation slots was both a blessing and a curse here. Rob did an excellent job of explaining why RDF is a good fit for MARC data (or, rather, the metadata that we are currently putting in MARC), but there wasn’t enough time for that and demonstrating why we would want to go through the effort of actually doing it (like, for linked open data and whatnot). It was a good overview of busting the MARC into its conceptual components and making connections between those components, however.
If any of my former colleagues at Georgia Tech still read my blog (hello!), I definitely urge you to take a look at Oregon State’s Interactive Course Assignment Pages (ICAP). This is exactly what I was trying to do when I wrote the subject guides application there. There is a lot of what appears to be NIH syndrome when it comes to subject/course guide applications; in reality I think these applications have to conform to a lot of local needs and expectations which is why “somebody else’s code” doesn’t always do the trick. ICAP is so similar to what Tech is running (although executed much, much more elegantly) that I don’t think there would be much of a leap for them to migrate. As an aside, the developer (Kim Griggs) mentioned that it took four months from proof-of-concept to production. I find this an amazing attribute of Ruby on Rails. The Umlaut took slightly less than six months. I should add these were both one person development teams.
Next up, David Walker talked about the new Worldcat API. What I found interesting about his talk is that he said his initial interest in the API was to be able to build ad-hoc union catalogs when he needed to. This was my original reason for trying out the Talis Platform when I was at Georgia Tech (although I could never figure out how to manage getting all the libraries I wanted to work with to get their holdings in a Platform store), so it definitely resonated. With the gravitational pull that OCLC already has in the U.S., this idea of ‘union catalog for specific problem set’ could really flourish. Of course, since not every library can afford to put their holdings in Worldcat (and therefore have access to the API) it may not be all that useful to the sorts of libraries that would gain the largest benefit.
While I was impressed by Winona Salesky’s and Michael Park’s presentation on XForms (there were definitely some slick features there!), my takeaway was that this technology seemed too difficult to deploy (when your options are browser extensions or Java servlets, I’ll pass on that). Am I missing something here? Java purists, you are free to mock me and my scripting language ways.
Interestingly, the Zotero presentation removed pretty much all references to RDF and the Semantic Web that appeared in their proposal.
The breakout session I attended on Tuesday was proposed by Jonathan Rochkind (who I co-presented with on the Umlaut) about finding a way to isolate only the open access content in OAIster. While maintaining a separate index of this data might be useful (sort of like what IndexData does, but they also include the non-free material; I am also harvesting some of this data with the intention of putting it into a platform store… but whoa does the clean-up of the data take time…), the group eventually decided that a web service that identified whether OAIster results were OA (via whitelists, blacklists, etc.) might be an easier first step. I’ve created a Google Group to carry this discussion forward here, if anybody is interested in participating.
I can’t remember which lightning talks happened on day one (is there a list published anywhere?), but it doesn’t matter. The day two lightning talks showed why having this style of presentation is so great. The reasons can be summed up by pictures of underpants and player piano midi files being played from musical scores generated from OCR. When the video files are up, definitely check those two out.
Did I mention that there were free drinks from 5:30-7:30 every day?
Karen Coyle opened day two with her keynote on RDA. Corey Harper closed the formal presentations the next day on the same topic. It was pretty neat how they managed to not step on each other’s toes. I think their message resounded pretty loudly: people like Code4libbers need to get involved in RDA and DCAM to ground it in reality. Ok, I’ll see what I can do.
LibLime’s MARC editor was quite nice. Granted, I’ve never used OCLC Connexion, but I would have to think something like this would be a strong competitor.
I’ll skip over Aaron Swartz’s presentation (based on the number of Flickr photos of his presentation, I figure there must be plenty about it already), but I plan on writing a bit about ThingDB and other document-centric databases really soon.
Skipping up to the DLF ILS API, both Emily and Terry’s presentation as well as the breakout session on this were incredibly useful. I feel like the DLF has a pretty pragmatic approach to working on the problems of interoperability and Emily and Terry were really good emissaries to explain their goals to Code4lib. During the breakout I introduced Jangle and later gave a lightning talk on it. Since the cat’s out of the bag, I absolutely will have a post about that this week. While I don’t exactly trust the vendors’ response to the DLF proposal (they meet on Thursday), I think that the potential of this is mutually beneficial to every party. Jangle was quite well received, by the way.
Omeka seems exactly like what Greenstone promised to be, but wasn’t quite exactly (at least, in my mind). I really like the focus and polish of this project.
Day three is where things started slipping for me. Not only was my brain quite full at that point, but Che’s stomach virus apparently had taken a grip on me (and I apologize to anybody I paid that forward to). I did manage to get out to see my family that night, though (who may not be as thankful to have seen me if I spread it to them, too).
This is a fun conference and I’m really proud to be a part of it. Thankfully, Oregon State had everything so well organized that things still hummed along fine despite losing our ringleader. I had a good time MCing, but we lost all of the dignity and professionalism when Jeremy had to leave. “Slackerly” and “clownish” are more apt for me. I know Roy Tennant and, sir, I am no Roy Tennant.
12/17/07 - Resurgens - 1/11/08
January 14th, 2008
Last Thursday, Johns Hopkins University Libraries went live with the Ümlaut (Ü2). This comes slightly less than four weeks after Georgia Tech took theirs down (although they were using the much more duct tape and baling wire version 1), and it’s nice to see a library back in the land of Röck Döts.
Ü2 shares very little except superficially with the original Ümlaut, and I owe Jonathan Rochkind a lot for getting it to this level. It’s an interesting dynamic between us (as anybody who has spent a minute in #code4lib in the last eight months knows) that seems to work pretty well. It would be nice to expand the community beyond just us, though. It’s pretty likely that the Ümlaut will work its way into Talis’ product suite in some form or another, so that would probably draw some people in, but it would be nice to see more SFX (or other link resolvers) customers join the party.
This isn’t to say that JHÜmlaut doesn’t need some work. In fact, there’s something really wrong with it: it’s taking way too long to resolve (Georgia Tech’s was about twice as fast, although probably with a lighter load). If I were to guess I would assume that the SFX API is the culprit; when GT’s was performing similarly, there was a bug in the OCLC Resolver Registry lookup that was causing two SFX requests per Ümlaut request (it wasn’t recognizing that it was duplicating). This isn’t the case with JHU (not only did Jonathan remove the OCLC Registry feature, it wouldn’t be affecting me, sitting at home in Atlanta, anyway).
Performance was one of the reasons GT’s relationship soured with the Ümlaut (an unfortunate bout of downtime after I left was the biggie, I think, though), so I hope we can iron this out before JHU starts getting disillusioned. Thankfully, they didn’t have the huge EBSCO bug that afflicted GT on launch.
For reasons only known in Ipswich, MA, EBSCO appends their OpenURLs with <<SomeIdentifier. Since this is injected into the location header via JavaScript (EBSCO sends their OpenURLs via a JavaScript popup), Internet Explorer and Safari don’t escape the URL which causes Mongrel to explode (these are illegal characters in HTTP, after all). Since the entire state of Georgia gets about half their electronic journal content from EBSCO, this was a really huge problem (which was fixed by dumping Mongrel in favor of LigHTTPD and FastCGI). These are the sorts of scenarios that caused the reference librarians to lose confidence.
JHU has the advantage of GT’s learning curve, so hopefully we can circumvent these sorts of problems. It’s still got to get faster, though.
Still, I’m happy. It’s good and refreshing to see the Ümlaut back in action.
Objectifying OpenURL
January 9th, 2008
Sometime in November, I came to the realization that I had horribly misinterpreted the NISO Z39.88/OpenURL 1.0 spec. I’m on the NISO Advisory Committee for OpenURL (which makes this even more embarrassing) and was reviewing the proposal for the Request Transfer Message Community Profile and its associated metadata formats when it dawned on me that my mental model was completely wrong. For those of you that have primarily dealt with KEV based OpenURLs (which is 99% of all the OpenURLs in the wild), I would wager that your mental model is probably wrong, too.
A quick primer on OpenURL:
- OpenURL is a standard for transporting ContextObjects (basically a reference to something, in practice, mostly bibliographic citations)
- A ContextObject (CTX, for short from now on) is comprised of Entities that help define what it is. Entities can be one of six kinds:
  - Referent - this is the meat of the CTX, what it’s about, what you’re trying to get context about. A CTX must have one referent and only one.
  - ReferringEntity - defines the resource that cited the referent. This is optional and can only appear once.
  - Referrer - the source of where the CTX came from (e.g. the A&I database). This is optional and can only appear once.
  - Requester - this is information about who is making the request (e.g. the user’s IP address). This is optional and can only appear once.
  - ServiceType - this defines what sorts of services are being requested about the referent (e.g. getFullText, document delivery services, etc.). There can be zero or many ServiceType entities defined in the CTX.
  - Resolver - these are messages specifically to the resolver about the request. There can be zero or more Resolver entities defined in the CTX.
- All entities are basically the same in what they can hold:
  - Identifiers (such as DOI or IP Address)
  - By-Value Metadata (the metadata is included in the Entity)
  - By-Reference Metadata (the Entity has a pointer to a URL where you can retrieve the metadata, rather than including it in the CTX itself)
  - Private Data (presumably data, possibly confidential, between the entity and the resolver)
- A CTX can also contain administrative data, which defines the version of the ContextObject, a timestamp and an identifier for the CTX (all optional)
- Community Profiles define valid configurations and constraints for a given use case (for instance, scholarly search services are defined differently than document delivery). Context objects don’t actually specify any community profile they conform to. This is a rather loose agreement between the resolver and the context object source: if you provide me with a SAP1, SAP2 or Dublin Core compliant OpenURL, I can return something sensible.
- There are currently two registered serializations for OpenURL: Key/Encoded Values, where all of the values are output on a single string, formatted as key=value and delimited by ampersands (this is what the majority of all OpenURLs that currently exist look like), and XML (which is much rarer, but also much more powerful)
- There is no standard OpenURL ‘response’ format. Given the nature of OpenURL, it’s highly unlikely that one could be created that would meet all expected needs. A better alternative would be for a particular community profile to define a response format since the scope would be more realistic and focused.
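The cardinality rules in the primer above can be sketched as a small data structure. This is only a rough model for thinking about the shape of a CTX, not anything taken from the spec or an existing library: the class names, the symbols for the entity kinds and the add/valid? methods are all assumptions for illustration.

```ruby
# An Entity can hold any of the four kinds of descriptors from the primer.
class Entity
  attr_reader :identifiers, :metadata_by_value, :metadata_by_reference, :private_data

  def initialize(identifiers: [], metadata_by_value: [], metadata_by_reference: [], private_data: nil)
    @identifiers = identifiers
    @metadata_by_value = metadata_by_value
    @metadata_by_reference = metadata_by_reference
    @private_data = private_data
  end
end

class ContextObject
  # Entity kind => maximum number of occurrences (nil means "zero or more").
  MAX_OCCURRENCES = {
    referent: 1, referring_entity: 1, referrer: 1, requester: 1,
    service_type: nil, resolver: nil
  }.freeze

  def initialize
    @entities = Hash.new { |hash, kind| hash[kind] = [] }
  end

  # Add an entity, refusing unknown kinds and anything over the limit.
  def add(kind, entity)
    raise ArgumentError, "unknown entity kind: #{kind}" unless MAX_OCCURRENCES.key?(kind)

    limit = MAX_OCCURRENCES[kind]
    if limit && @entities[kind].size >= limit
      raise ArgumentError, "#{kind} may appear at most #{limit} time(s)"
    end
    @entities[kind] << entity
    self
  end

  def entities(kind)
    @entities.fetch(kind, [])
  end

  # The only hard requirement: exactly one referent.
  def valid?
    entities(:referent).size == 1
  end
end

ctx = ContextObject.new
ctx.add(:referent, Entity.new(metadata_by_value: [{ format: 'info:ofi/fmt:kev:mtx:book', metadata: { 'genre' => 'book' } }]))
ctx.add(:referrer, Entity.new(identifiers: ['info:sid/ebookco.com:bookreader']))
puts ctx.valid?
```

Note that metadata_by_value is a list, since an entity may carry the same description in more than one format.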
Looking back on this, I’m not sure how “quick” this is, but hopefully it can bootstrap those of you that have only cursory knowledge of OpenURL (or less). Another interesting way to look at OpenURL is Jeff Young’s 6 questions approach, which breaks OpenURL down to “who”, “what”, “where”, “when”, “why” and “how”.
One of the great failings of OpenURL (in my mind, at least) is the complete and utter lack of documentation, examples, dialog or tutorials about its use or potential. In fact, outside of COinS, maybe, there is no notion of “community” to help promote OpenURL or cultivate awareness or adoption. To be fair, I am as guilty as anybody for this failure, since I had proposed making a community site for OpenURL, but a shift in job responsibilities and then a wholesale change in employers, coupled with the hacking of the server it was to live on, left this by the wayside. I’m putting this back on my to do list.
What this lack of direction leads to is that would-be implementors wind up making a lot of assumptions about OpenURL. The official spec published at NISO is a tough read and is generally discouraged by the “inner core” of the OpenURL universe (the Herbert van de Sompels, the Eric Hellmans, the Karen Coyles, etc.) in favor of the “Implementation Guidelines” documents. However, only the KEV Guidelines are actually posted there. The only other real avenue for trying to come to grips with OpenURL is to dissect the behavior of link resolvers. Again, in almost every instance this means you’re working with KEVs and the downside of KEVs is that they give you a very naive view of OpenURL.
KEVs, by their very nature, are flat and expose next to nothing about the structure of the model of the context object they represent. Take the following, for example:
&url_ctx_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Actx&ctx_ver=Z39.88-2004
&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_id=10_8&ctx_tim=2003-04-11T10%3A08%3A30TZD
&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.aulast=Vergnaud
&rft.auinit=J.-R&rft.btitle=D%C3%A9pendances+et+niveaux+de+repr%C3%A9sentation+en+syntaxe
&rft.date=1985&rft.pub=Benjamins&rft.place=Amsterdam%2C+Philadelphia
&rfe_id=urn%3Aisbn%3A0262531283&rfe_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook
&rfe.genre=book&rfe.aulast=Chomsky&rfe.auinit=N&rfe.btitle=Minimalist+Program
&rfe.isbn=0262531283&rfe.date=1995&rfe.pub=The+MIT+Press&rfe.place=Cambridge%2C+Mass
&svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Asch_svc&svc.abstract=yes
&rfr_id=info%3Asid%2Febookco.com%3Abookreader
Ugly, I know, but bear with me for a moment. From this example, let’s focus on the Referent:
&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.aulast=Vergnaud
&rft.auinit=J.-R&rft.btitle=D%C3%A9pendances+et+niveaux+de+repr%C3%A9sentation+en+syntaxe
&rft.date=1985&rft.pub=Benjamins&rft.place=Amsterdam%2C+Philadelphia
and then let’s make this a little more human readable:
rft.genre: book
rft.aulast: Vergnaud
rft.auinit: J.-R
rft.btitle: Dépendances et niveaux de représentation en syntaxe
rft.date: 1985
rft.pub: Benjamins
rft.place: Amsterdam, Philadelphia
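That flattening is easy to reproduce. As a minimal sketch (using only Ruby's standard `uri` library, and nothing from any OpenURL library), the referent keys from the example can be decoded into readable pairs like this:

```ruby
require "uri"

# The referent portion of the example KEV, joined into one query string.
kev = "rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.aulast=Vergnaud" \
      "&rft.auinit=J.-R&rft.btitle=D%C3%A9pendances+et+niveaux+de+repr%C3%A9sentation+en+syntaxe" \
      "&rft.date=1985&rft.pub=Benjamins&rft.place=Amsterdam%2C+Philadelphia"

# decode_www_form returns an array of [key, value] pairs, percent-decoded,
# with "+" turned back into spaces.
pairs = URI.decode_www_form(kev)

pairs.each { |key, value| puts "#{key}: #{value}" }
```

Note that the result is still just a flat list of pairs; nothing in it says which keys describe the package and which describe the thing.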
Looking at this example, it’s certainly easy to draw some conclusions about the referent, the most obvious being that it’s a book.
Actually (and this is where it gets complicated and I begin to look pedantic), it’s really only telling you “I am sending some by-value metadata in the info:ofi/fmt:kev:mtx:book format,” not that the thing is actually a book. (The info:ofi/fmt:kev:mtx:book metadata values do state that, via rft.genre, but ignore that for a minute, since genre is optional.)
The way this should actually be thought of is:
Referent:
  Metadata by Value:
    Format: info:ofi/fmt:kev:mtx:book
    Metadata:
      Genre: book
      Btitle: Dépendances et niveaux de représentation en syntaxe
      …
ReferringEntity:
  Identifier: urn:isbn:0262531283
  Metadata by Value:
    Format: info:ofi/fmt:kev:mtx:book
    Metadata:
      Genre: book
      Isbn: 0262531283
      Btitle: Minimalist Program
      …
Referrer:
  Identifier: info:sid/ebookco.com:bookreader
ServiceType:
  Metadata by Value:
    Format: info:ofi/fmt:kev:mtx:sch_svc
    Metadata:
      Abstract: yes
So, this should still seem fairly straightforward, but the hierarchy certainly isn’t evident in the KEV. It’s a good starting point to begin talking about the complexity of working with OpenURL, though, especially if you’re trying to create a service that consumes OpenURL context objects.
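To make the hidden hierarchy concrete, here is a rough sketch of regrouping a flat KEV by entity. The helper name `nest_kev` and the shape of the hash it returns are my own assumptions for illustration; only the key prefixes (rft, rfe, req, svc, res, rfr) come from the KEV format itself. Repeated keys simply overwrite each other here, which a real parser could not get away with.

```ruby
require "uri"

# Prefixes that KEV keys use for the entities of a context object.
ENTITY_NAMES = {
  "rft" => "Referent",
  "rfe" => "ReferringEntity",
  "req" => "Requester",
  "svc" => "ServiceType",
  "res" => "Resolver",
  "rfr" => "Referrer"
}.freeze

# "rft.genre" is by-value metadata; "rft_id", "rft_val_fmt" etc. describe the entity.
KEY_PATTERN = /\A(rft|rfe|req|svc|res|rfr)([._])(.+)\z/

# Hypothetical helper: turn a flat KEV string into a nested hash.
def nest_kev(kev)
  context = { "Admin" => {} }
  URI.decode_www_form(kev).each do |key, value|
    match = KEY_PATTERN.match(key)
    unless match
      context["Admin"][key] = value # ctx_ver, ctx_enc, url_ctx_fmt, ...
      next
    end
    prefix, separator, rest = match.captures
    entity = (context[ENTITY_NAMES[prefix]] ||=
      { "Identifiers" => [], "Format" => nil, "Metadata" => {}, "Other" => {} })
    if separator == "."
      entity["Metadata"][rest] = value
    elsif rest == "id"
      entity["Identifiers"] << value
    elsif rest == "val_fmt"
      entity["Format"] = value
    else
      entity["Other"][rest] = value # by-reference and private data keys
    end
  end
  context
end

kev = "ctx_ver=Z39.88-2004" \
      "&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.aulast=Vergnaud" \
      "&rfe_id=urn%3Aisbn%3A0262531283&rfe.btitle=Minimalist+Program" \
      "&rfr_id=info%3Asid%2Febookco.com%3Abookreader"
nested = nest_kev(kev)
```

Even this toy version has to know which suffixes are structural and which are metadata, and that knowledge is exactly what the flat KEV never shows you.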
Back to the referent metadata. The context object didn’t have to send the data in the “metadata by value” stanza. It could have just sent the identifier “urn:isbn:9027231141” (note that in the above example, the referent didn’t have an identifier at all). It could also have sent metadata in the Dublin Core format, MARC21, MODS, ONIX or all of the above (the Metadata By Value element is repeatable) if you wanted to make sure your referent could be parsed by the widest range of resolvers. While all of these are bibliographic formats, Request Transfer Message context objects (which would be used for document delivery, which is what got me started down this whole path) would conceivably carry one or more of the aforementioned metadata types plus a Request Transfer Profile Referent type that describes the interlibrary loan-ish data that accompanies the referent, as well as an ISO Holdings Schema metadata element carrying the actual items a library has, their locations and status.
If you have only run across KEVs describing journal articles or books, this may come as a bit of a surprise. Instead of saying the above referent is a book, it becomes important to say that the referent contains a metadata package (as Jonathan Rochkind calls it) that is in this (OpenURL specific) book format. In this regard, OpenURL is similar to METS. It wraps other metadata documents and defines the relationships between them. It is completely agnostic about the data it is transporting and makes no attempt to define it or format it in any way. The Journal, Book, Patent and Dissertation formats were basically contrived to make compatibility with OpenURL 0.1 easier, but they are not directly associated with OpenURL and could have just as easily been replaced with, say, BibTeX or RIS (although the fact that they were created alongside Z39.88 and are maintained by the same community makes the distinction difficult to see).
What this means, then, is that in order to know anything about a given entity, you also need to know about the metadata format that is being sent about it. And since that metadata could literally be in any format, it means there are a lot of variables that need to be addressed just to know what a thing is.
For the Umlaut, I wrote an OpenURL library for Ruby as a means to parse and create OpenURLs. Needless to say, it was originally written with that naive, KEV-based, mental model (plus some other just completely errant assumptions about how context objects worked) and, because of this, I decided to completely rewrite it. I am still in the process of this, but am struggling with some core architectural concepts and am throwing this out to the larger world as an appeal for ideas or advice.
Overall the design is pretty simple: there is a ContextObject object that contains a hash of the administrative metadata and then attributes (referent, referrer, requester, etc.) that contain Entity objects.
The Entity object has arrays of identifiers, private data and metadata.
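In rough outline, and purely as an illustrative sketch (the method and attribute names here are assumptions, not the library's actual API, and each role gets exactly one Entity for simplicity), that design looks something like:

```ruby
# Hypothetical sketch of the object model described above.
class Entity
  attr_reader :identifiers, :private_data, :metadata

  def initialize
    @identifiers  = [] # e.g. "urn:isbn:0262531283"
    @private_data = []
    @metadata     = [] # one element per metadata package
  end
end

class ContextObject
  ENTITIES = %i[referent referring_entity requester service_type resolver referrer].freeze

  attr_reader :admin, *ENTITIES

  def initialize
    # Administrative metadata lives in a plain hash.
    @admin = { "ctx_ver" => "Z39.88-2004" }
    ENTITIES.each { |name| instance_variable_set("@#{name}", Entity.new) }
  end
end

ctx = ContextObject.new
ctx.referrer.identifiers << "info:sid/ebookco.com:bookreader"
ctx.referent.metadata << { format: "info:ofi/fmt:kev:mtx:book", values: { "genre" => "book" } }
```

The open question is what should go into that `metadata` array; a bare hash like the one above is only a placeholder.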
And then this is where I start to run aground.
The original (and current) plan was to populate the metadata array with native metadata objects that are generated by registering metadata classes in a MetadataFactory class. The problem, you see, is that I don’t want to get into the business of having to create classes to parse and access every kind of metadata format that gets approved for Z39.88. For example, Ed Summers’ ruby-marc has already solved the problem of effectively working with MARC in Ruby, so why do I want to reinvent that wheel? The counterargument is that, by delegating these responsibilities to third party libraries, there is no consistency of APIs between “metadata packages”. A method used in format A may very well raise an exception (or, worse, overwrite data) in format B.
There is a secondary problem: third party libraries aren’t going to have any idea that they’re in an OpenURL context object, or even know what that is. This means there would have to be some class that handles functionality like XML serialization (since ruby-marc doesn’t know that Z39.88 refers to it as info:ofi/fmt:xml:xsd:MARC21), although this can be handled by the specific metadata factory class. This would also be necessary when parsing an incoming OpenURL since, theoretically, every library could have a different syntax for importing XML, KEVs or whatever other serialization is devised in the future.
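One way the registration idea could work is sketched below. Only the name MetadataFactory comes from the plan above; the `register` and `build` methods, the fallback behaviour and the `BookMetadata` stand-in are assumptions made up for this example.

```ruby
# Hypothetical registry keyed by Z39.88 format identifier.
class MetadataFactory
  @builders = {}

  class << self
    # Associate a format identifier with a block that builds a native object.
    def register(format, &builder)
      @builders[format] = builder
    end

    # Build a native object if the format is registered; otherwise hand
    # back the raw payload untouched so nothing is lost.
    def build(format, payload)
      builder = @builders[format]
      builder ? builder.call(payload) : payload
    end
  end
end

# A trivial stand-in for a "native" class; a real one might delegate to a
# third party library instead.
BookMetadata = Struct.new(:values) do
  def title
    values["btitle"]
  end
end

MetadataFactory.register("info:ofi/fmt:kev:mtx:book") { |values| BookMetadata.new(values) }

book = MetadataFactory.build("info:ofi/fmt:kev:mtx:book", { "btitle" => "Minimalist Program" })
raw  = MetadataFactory.build("info:ofi/fmt:xml:xsd:MARC21", "<record/>")
```

The sketch also shows the consistency problem in miniature: `book` answers to `title`, while `raw` is just a string, so calling code has to know which it got.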
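A thin wrapper is one possible answer to that. The sketch below is hypothetical throughout: `MetadataPackage`, its methods and the `FakeRecord` stand-in are invented for illustration, and the element names follow my reading of the XML context object format, so they should be checked against the schema before being relied on.

```ruby
# Hypothetical wrapper: pairs any native object with the Z39.88 format
# identifier it travels under and a block that knows how to serialize it.
class MetadataPackage
  attr_reader :format, :native

  def initialize(format, native, &serializer)
    @format     = format
    @native     = native
    @serializer = serializer
  end

  # Emit the by-value element. The native object never needs to know
  # that it is inside a context object.
  def to_xml
    "<ctx:metadata-by-val><ctx:format>#{@format}</ctx:format>" \
      "<ctx:metadata>#{@serializer.call(@native)}</ctx:metadata></ctx:metadata-by-val>"
  end
end

# Stand-in for a third party record object that can produce its own XML.
FakeRecord = Struct.new(:xml)

package = MetadataPackage.new("info:ofi/fmt:xml:xsd:MARC21", FakeRecord.new("<record/>")) do |record|
  record.xml
end

puts package.to_xml
```

The wrapper keeps the OpenURL-specific knowledge in one place, at the cost of one serializer block per format and per third party library.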
So I’m looking for advice on how to proceed. All ideas welcome.
Filing an extension on my fifteen minutes
January 7th, 2008
I was reading Brian’s appeal for more Emerils in the library world (bam!), noticed Steven Bell’s comment (his blog posting was a response to one by Steven in the first place) and it got me thinking.
First off, I don’t necessarily buy into Brian’s argument. Maybe it’s due to the fact that he’s younger than me, but my noisy, unwanted opinions aren’t because I didn’t get a pretty enough pony for my sixteenth birthday or because I saw Jason Kidd’s house on Cribs ™ and want to see my slam dunk highlights on SportsCenter on my 40″ flat screens in every bathroom. It’s because I feel I have something to offer libraries and I genuinely want to help effect change. Really, I know this is what motivates Brian, too, despite his E! Network thesis, because we worked together and I know his ideas.
Brian doesn’t have to worry about his fifteen minutes coming to a close anytime soon. Although at first blush it would appear that the niche he has carved out for himself is potentially flash-in-the-pan-y (Facebook, Second Life, library gaming, other Library 2.0 conceits), the motivation for why he does what he does is anything but. He is really just trying to meet users where they are, on their terms, to help them with their library experience.
Technologies will change and so, too, will Brian, but that’s not the point. He’ll adapt and adjust his methods to best suit what comes down the pike, as it comes down the pike (proactively, rather than reactively) and continue to be a vanguard in engaging users on their own turf. More importantly, though, I think he can continue to be a voice in libraries because he works in a library and if you have some creative initiative it’s very easy to stand out and make yourself heard.
Brian and I used to joke about the library rock star lifestyle: articles, accolades, speaking gigs, etc. A lot of this comes pretty easily, however. If you can articulate some rational ideas and show a little something to back those ideas up, you can quickly make a name for yourself. Information science wants visionary people (regardless of whether or not they follow that leader) and librarians want to hear new ideas for how to solve old problems. Being a rock star is pretty easy; being a revolutionary is considerably harder.
I made the jump from library to vendor because I wanted to see my ideas affect a larger radius than what I could do at a single node. It has been an interesting adjustment and I’m definitely still trying to find my footing. It has been much, much more difficult to stand out because I am suddenly surrounded by a bunch of people that are much smarter than me, much better developers than me, and have more experience applying technology on a large scale. This is not to say that I haven’t worked with brilliant people in libraries (certainly I have, Brian among them), but the ratio has never been quite like this. Add to that the fact that being a noisy, opinionated voice within a vendor has its immediate share of skeptics and cynics (who are the ‘rock stars’ in the vendor community? Stephen Abram? Shoot me.), and I may find myself falling into Steven Bell’s dustbin. Then again, I might be able to eventually influence the sorts of changes that inspired me to make the leap in the first place. I can do without the stardom in that case.
New Year, New Leaf
January 4th, 2008
I realize I’ve been extremely quiet for the last several months (probably since around the time I left Tech for Talis). While there are a slew of reasons for this (holidays, settling into new job, trying to shore up some projects, writing some articles, family, etc.), I’ve let it get out of hand. One of the downsides to not writing here is that it makes writing elsewhere much more difficult. My goal is to be much more prolific here. I don’t want to give myself mandates that I’m going to post daily or weekly, since I don’t want the things I write about to be contrived, but I think output here increases my productivity elsewhere, so I want to promote that.
I also want to read more.
Anyway, since I last posted here, I’ve written two columns for the Journal of Electronic Resources Librarianship (of which the first one doesn’t even come out until May) and one about the Communicat for the new Code4Lib Journal. A colleague of mine from Talis and I have been working on a generic API application for libraries. I’ve completely refactored ROpenURL, dipped my toes back into Ümlaut development (in the meantime, Georgia Tech has done away with their Ümlaut implementation, which is frustrating and somewhat embarrassing), and have been playing around a lot with JRuby, especially with regard to Ümlaut2 (or 3, which I hope to integrate with the Platform).
I’ll be writing about some of my reflections from these experiences soon.