Archive

jangle

For a couple of months this year, the library world was aflame with rage at the proposed OCLC licensing policy regarding bibliographic records.  It was a justifiable complaint, although I basically stayed out of it:  it just didn’t affect me very much.  After much gnashing of teeth, petitions, open letters from consortia, etc., OCLC eventually rescinded their proposal.

Righteous indignation: 1, “the man”: 0.

While this could certainly be counted as a success (I think, although this means we default to the much more ambiguous 1987 guidelines), there is a bit of a mixed message here about where the library community’s priorities lie.  It’s great that you now have the right to share your data, but, really, how do you expect to do it?

It has been a little over a year since the Jangle 1.0 specification was released; 15 months or so since all of the major library vendors (with one exception) agreed to the Digital Library Federation’s “Berkeley Accord”; and we’re at the anniversary of the workshop where the vendors actually agreed on how we would implement a “level 1” DLF API.

So far, not a single vendor at the table has honored their commitment, and I have seen no intention to do so with the exception of Koha (although, interestingly, not by the company represented in the Accord).

I am going to focus here on the DLF ILS-DI API, rather than Jangle, because it is something we all agreed to.  For all intents and purposes, Jangle and the ILS-DI are interchangeable:  I think anybody that has invested any energy in either project would be thrilled if either one actually caught on and was implemented in a major ILMS.  Both specifications share the same scope and purpose.  The resources required to support one would be the same as the other; the only difference between the two is the client-side interfaces.  Jangle technically meets all of the recommendations of the ILS-DI, but not to the bindings that we, the vendors, agreed to (although there is an ‘adapter’ to bridge that gap).  Despite having spent the last two years of my life working on Jangle, I would be thrilled to no end if the ILS-DI saw broad uptake.  I couldn’t care less about the serialization; I only care about the access.

There is only one reason that the vendors are not honoring their commitment:  libraries aren’t demanding that they do.

Why is this?  Why the rally to ensure that our bibliographic data is free for us to share when we lack the technology to actually do the sharing?

When you look at the open source OPAC replacements (I’m only going to refer to the OSS ones here, because they are transparent, as opposed to their commercial counterparts):  VuFind, Blacklight, Scriblio, etc., and take stock of the hoops that have to be jumped through to populate their indexes and check availability, most libraries would throw their hands in the air and walk away.  There are batch dumps of MARC records.  Rsync jobs to get the data to the OPAC server.  Cron jobs to get the MARC into the discovery system.  Screen scrapers and one-off “drivers” to parse holdings and status.  It is a complete mess.

It’s also the case for every Primo, Encore, Worldcat Local, AquaBrowser, etc. that isn’t sold to an internal customer.

If you’ve ever wondered why the third party integration and enrichment services are ultimately somewhat unsatisfying (think BookSite.com or how LibraryThing for Libraries is really only useful when you can actually find something), this is it.  The vendors have made it nearly impossible for a viable ecosystem to exist because there is no good way to access the library’s own data.

And it has got to stop.

For the OCLC withdrawal to mean anything, libraries have got to either put pressure on their vendors to support one of the two open APIs, migrate to a vendor that does support the open APIs, or circumvent the vendors entirely by implementing the specifications themselves (and sharing with their peers).  This cartel of closed access is stifling innovation and, ultimately, hurting the library users.

I’ll hold up my end (and ensure it’s ILS-DI compatible via this) and work towards it being officially supported here, but the 110 or so Alto customers aren’t exactly going to make or break this.

Hold your vendor’s feet to the fire and insist they uphold their commitment.

For the last couple of weeks I’ve returned to working on the Alto Jangle connector, at least part-time.  I had shelved development on it for a while; I had a hard time finding anybody interested in using it and had reached a point where the development database I was working against was making it difficult to know what to expect in a real, live Alto system.  After I got wind of a couple of libraries that might be interested in it, I thought I should at least get it into a usable state.

One of the things that was vexing me prior to my hiatus was how to get Sybase to page through results in a semi-performant way.  I had originally blamed it on Grails; then, when I played around with refactoring the connector in PHP (using Quercus, which is pretty slick by the way, to provide Sybase access via JDBC, the easiest way to do it), I realized that paging is just outside of Sybase’s capabilities.

And when you’re so used to MySQL, PostgreSQL and SQLite, this sort of makes your jaw drop (although, in its defense, it appears that this isn’t all that easy in Oracle, either — however, it’s at least possible in Oracle).

There seem to be two ways to do something like getting rows 375,000 – 375,099 from all of the rows in a table:

  1. Use cursors.
  2. Use SET ROWCOUNT 375100, loop through the results, and throw out the first 375,000.

The first option isn’t really viable.  You need write access to the database and it’s unclear how to make this work in most database abstraction libraries.  I don’t actually know that cursors do anything differently than option 2 besides pushing the looping to the database engine itself.  I was actually using cursors in my first experiments in JRuby using java.sql directly, but since I wasn’t aware of this problem at the time, I didn’t check to see how well it performed.

Option 2 is a mess, but this appears to be how GORM/Hibernate deals with paging in Sybase.  Cursors aren’t available in Quercus’ version of PDO, so it was how I had to deal with paging in my PHP prototypes, as well.  When I realized that PHP was not going to be any faster than Grails, I decided to just stick with Grails (“regular C-PHP” is out — compiling in Sybase support is far too heavy a burden).
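For illustration, here’s roughly what option 2 looks like over JDBC from JRuby (a sketch only; the driver class, connection URL and table/column names are all made up, and a real connector would be going through GORM/Hibernate anyway):

```ruby
# JRuby sketch of option 2: SET ROWCOUNT plus skipping rows client-side.
# The driver class, URL and table/column names are hypothetical, and this
# assumes the jConnect jar is already on the classpath.
require 'java'

offset, limit = 375_000, 100

java.lang.Class.for_name('com.sybase.jdbc3.jdbc.SybDriver')
conn = java.sql.DriverManager.get_connection(
  'jdbc:sybase:Tds:dbhost:5000/alto', 'user', 'secret')

stmt = conn.create_statement
# Tell Sybase to stop after offset + limit rows -- the closest it gets to LIMIT/OFFSET.
stmt.execute("SET ROWCOUNT #{offset + limit}")
rs = stmt.execute_query('SELECT work_id, title FROM works_meta ORDER BY work_id')

rows = []
i = 0
while rs.next
  i += 1
  next if i <= offset   # throw away everything before the page we want
  rows << [rs.get_string('work_id'), rs.get_string('title')]
end
rs.close; stmt.close; conn.close
```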

This paging thing still needed to be addressed.  Offsets of 400,000 and more were taking more than twelve minutes to return.  How much more, I don’t know — I killed the request at the 12 minute mark.  While some of this might be the result of a bad or missing index, any way you cut it, it wasn’t going to be acceptable.

I was kicking around the idea of exporting the “models” of the Jangle entities into a local HSQLDB (or whatever) mirror and then working the paging off of that.  I couldn’t help but think that this was sort of a waste, though — exporting from one RDBMS to another solely for the benefit of paging.  You’d have to keep them in sync somehow and still refer to the original Sybase DB for things like relationships and current item or borrower status.  For somebody that’s generally pretty satisfied with hacking together kludgy solutions to problems, this seemed a little too hack-y… even for my standards.

Instead, I settled on a different solution that could potentially bring a bunch of other features along with it.  Searchable is a Grails plugin for Compass, a project to easily integrate Lucene indexes with your Java domain classes (this would be analogous to Rails’ acts_as_ferret).  When your Grails application starts up, Searchable will begin to index whatever models you declared as, well, searchable.  You can even set options to store all of your attributes, even if they’re not actual database fields, alleviating the need to hit the original database at all, which is nice.  Initial indexing doesn’t take long — our “problem” table that took twelve minutes to respond takes less than five minutes to fully index.  It would probably take considerably less than that if the data was consistent (some of the methods to set the attributes can be pretty slow if the data is wonky — it tries multiple paths to find the actual values of the attribute).
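The actual configuration is Groovy, but since the acts_as_ferret comparison is already on the table, the idea translates to Rails roughly like this (a sketch under that analogy, not the Searchable code itself; the model and fields are invented):

```ruby
# Rails-side analogue of the Searchable setup, using acts_as_ferret.
# The Work model and its fields are hypothetical; :store => :yes keeps the
# values in the Lucene index so a page of results never has to touch Sybase.
class Work < ActiveRecord::Base
  acts_as_ferret :fields => {
    :title         => { :store => :yes },
    :author        => { :store => :yes },
    :last_modified => { :store => :yes }
  }
end

# Paging against the index costs about the same no matter how deep the offset is.
works = Work.find_with_ferret('title:history', :limit => 100, :offset => 375_000)
```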

What this then affords us is consistent access times, regardless of the size of the offset:  the 4,000th page is as fast as the second:  between 2.5 and 3.5 seconds (our development database server is extremely underpowered and I access it via the VPN — my guess is that a real, live situation would be much faster).

The first page is a bit slower.  I can’t use the Lucene index for the first page of results because there’s no way for Searchable to know whether the WORKS_META table has changed since the last request, as those changes wouldn’t be happening through Grails.  Since performance for the first hundred rows out of Sybase isn’t bad, the connector just uses it for the first page, then syncs the Lucene index with the database at the end of the request.  Each additional page then pulls from Lucene.  Since these pages wouldn’t exist until after the Lucene index is created, and the Lucene index is recreated every time the Grails app is started, I added a controller method that checks the count of the Sybase table against the count of the Lucene index to confirm that they’re in sync (it’s worth noting that if the Lucene index has already been created once, it will be available right away after Grails starts — the reindexing still happens, but in a temp location that is moved to the default location once it’s complete, overwriting the old index).

The side benefit to using Searchable is that it will make adding search functionality to the Alto connector that much easier.  Building SQL statements from the CQL queries in the OpenBiblio connector was a complete pain in the butt; CQL to Lucene syntax should be considerably easier.  It also seems like these Lucene indexes could eventually alleviate the need for the bundled Zebra index that comes with Alto, but that’s just me talking, not any sort of strategic goal.

Anyway, thanks to Lucene, Sybase is behaving mostly like a modern RDBMS, which is a refreshing change.

I cannot foresee a day when I might charge for a webinar about Jangle.  I expect that day will never come.

Still, it pains me to see a NISO Webinar on Interoperability:

http://www.niso.org/news/events/2009/interop09

It pains me for a couple of reasons.  First, a hundred bucks for a webinar?  Come on, NISO, get over yourself.

Secondly, I have tremendous respect for, and was happy to participate in, the DLF ILS-DI Berkeley Accord, but it is, at best, a half measure; it is no longer being actively developed and has, for all intents and purposes, lost its sponsorship.

Jangle isn’t perfect and I realize there’s not a NISO standard to be found (well, you can send Z39.2…), but if you’re going to talk about interoperability, there’s not a more pragmatic and simple approach on the table, currently.

Let me start this by saying this is not a criticism or a rant against any of the technologies I am about to mention.  The problems I am having are pretty specific and the fact that I am intentionally trying to use “off the shelf” commodity projects to accomplish my goal complicates things.  I realize when I tweet things like this, it’s not helpful (because there is zero context).

I’ve been in a bit of a rut this week.  Things were going ok on Monday, when I got the Jangle connector for Reserves Direct working and announced, and started generating some conversation around how to model course reserves (and their ilk) in Jangle.  However, this left me without anything specific to work on.  I have a general, hand-wavy project that I am supposed to be working on to provide a simple, general digital asset management framework that can either work on the Platform or with a local repository like Fedora, depending on the institution’s local needs or policies.  More on this in some other post.  The short of it is, I need to gather some general requirements before I can begin something like this in earnest, which led me to revive an old project.

When Jangle first started, about 15 months ago, Elliot and I felt we needed what we called “the Base Connector”.  The idea here was that there were always going to be situations where a developer doesn’t have direct, live access to a system’s data, and they would need a surrogate external database to work with.  The Base Connector was an attempt to provide an out-of-the-box application that could simulate the basics of an ILS and be populated with the sorts of data you would get from command-line ‘report’ type applications.  The sort of thing you can cron and write out to a file on your ILS server.  Updates in catalog records.  Changes in user status.  Transactions.  That sort of thing.

After the amount of interest at Code4lib in Janglifying III’s Millennium, I decided to revisit the concept of the Base Connector.  Millennium’s (and, to an extent, Unicorn’s; there are no doubt others) lack of consistent database access makes it a good candidate for this duplicate database.  I was hoping to take a somewhat different approach to this problem than Elliot and I had originally tried, however.  I was hoping to be able to come up with something:

  1. More generically “Jangle” and less domain specific to ILSes
  2. Easy to install
  3. Customizable with a simple interface
  4. Something, preferably, that could be taken mostly “off the shelf”, where the only real “coding” I had to do was to get the library data in and the connector API out.  I was hoping all “data model” and “management” stuff could be taken care of already.

In my mind, I was picturing using a regular CMS for this, although it needed to be able to support customized fields for resources.

Here is the rough scenario I am picturing.  Let’s say you have an ILS that you don’t have a lot of access to.  For your external ‘repository’, you’ll need it to be able to accommodate a few things (a rough sketch of what I mean follows the list).

  • Resources will need not just a title, an identifier and the MARC record, but also fields for ISBN, ISSN, OCLC number, etc.  They’ll also need some sort of relationship to the Items and Collections they’re associated with.
  • Actors could be simple system user accounts, but they’ll need first names and last names and whatnot.
  • Collections, I assume, can probably be contrived via tags and whatnot.
  • The data loading would probably need to be able to be done remotely via some command-line scripting.
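To make that a little more concrete, here is the rough shape I have in mind for the records themselves, written as plain Ruby structures (purely illustrative; the field names and values are mine, not part of Jangle or the ILS-DI):

```ruby
# Illustrative only: the rough shape of the records in the "bag of holding".
# Field names and values are placeholders, not part of any spec.
resource = {
  :id          => 'resource/1',
  :title       => 'Some title',
  :marc        => '<raw MARC record goes here>',
  :isbn        => ['9780000000000'],
  :issn        => [],
  :oclc_number => '000000000',
  :items       => ['item/1'],                 # relationships, not copies of the data
  :collections => ['fiction', 'main-stacks']  # contrived via tags
}

actor = {
  :id         => 'actor/1',
  :first_name => 'Jane',
  :last_name  => 'Doe',
  :barcode    => '21221000000000'
}

item = {
  :id       => 'item/1',
  :resource => 'resource/1',
  :status   => 'checked out',
  :location => 'main-stacks'
}
```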

I decided to try three different CMSes to accomplish this:  Drupal, Plone and Daisy.  I’ll go through each and where I ran into a snag with each.  I want to reiterate here that I know next to nothing about any of these.  My problems are probably not shortcomings of the projects themselves, but more due to my own ignorance.  If you see possible solutions to my issues (or know of other possible projects that fit my need even better) please let me know.  This is a cry for help, not a review.

Drupal

One of the reasons I targeted Drupal is that it’s easy to get running, can run on cheap shared hosting, has quite a bit of traction in libraries and has CCK.  I actually got the farthest with Drupal in this capacity.  With CCK, I was able to, in the native Drupal interface, build content types for Resources and Items.  For Actors, I had just planned on using regular user accounts (since then I could probably piggyback off of the contributed authentication modules and whatnot).  Collections would be derived from Taxonomies.

Where things went wrong:

My desire is to decouple the ‘data load’ aspect of the process from the ‘bag of holding’ itself.  What I’m saying is that I would prefer that the MARC/borrower/item status/etc. load not be required to be built as a Drupal module, but instead be able to be written in whatever language the developer is comfortable with, with a simple way of getting that data into the bag of holding.

There are only two ways that I can see to use an external program to get data into Drupal:

  1. Use the XMLRPC interface
  2. Simulate the content creation forms with a scripted HTTP client.

I’m not above number two, but I would prefer not to if there’s a better way available.  The problem is that I can find almost zero documentation on the XMLRPC service.  What ‘calls’ are available?  How do I create a Resource content type?  How do I relate that to a user or an Item?  I have no idea where to look.  I don’t actually even know if the fields I created will be searchable (which was the whole point of making them).

Drupal seems promising for this, but I don’t know where to go from here.

Plone

I really thought Plone was going to be a winner.  It’s completely self-contained (framework, webserver and database all rolled into one installer) and based on an OODB.  Being Python based, I feel I can fall back on Python to build the scripts to actually do the dirty work of massaging and loading the data.  The downside to Plone (and I have looked eye-to-eye with this downside before) is that it and Zope are total voodoo.

It didn’t take me long to run into a brick wall with Plone.  I installed version 3.2.1 thanks to the handy OSX installer and got it up and running.

And then I couldn’t figure out what to do next.  I think I want Archetypes.  I followed the (outdated)  instructions to install it.  I see Archetypes stuff in the Zope control panel.  However, I never see anything in Plone.  I Google.  Nothing.  Feeling that it must be there and I’m just missing something I follow this tutorial to start building new content types.  I build a new content type.  It doesn’t show up in the Plone quick installer.  Nothing in the logs.  I Google.  Nothing.

Nothing frustrates me more than software making me feel like a total dumbass.

I am at the point where I think Plone might be up to the task, but I don’t have the interest, time or energy to make it work.  At the end of the day, I’m still not entirely sure that it would meet my basic criteria of the ‘content type’ being editable within the native web framework anyway.  I also have no idea if my plan of loading the data via an external Python (or, even better, Ruby) script is remotely feasible.

Plone got the brunt of my disgruntled tweeting.  This is mainly due to my frustration at seeing how well Plone would fit my vision and being able to get absolutely nowhere towards realizing that goal.

Daisy

What, you’ve never heard of it?  I have a history with Daisy, and I know, without a doubt, it could serve my needs.  The problem with Daisy is that it has a lot of moving parts.  To do what I want, you need both the data repository and the wiki running, as well as MySQL.  On top of that, some external web app would need to actually do the Jangle stuff (and this would most likely be Ruby/Sinatra), interacting with the HTTP API.  This is a lot of running daemons.  A lot of daemons that might not be running at any given time, which would break everything.  Daisy is a lot of things, but it’s not ‘self-contained’.

This is not a criticism.  If I was running a CMS, this would be ok.  When I was developing the Communicat, this was ok.  Those are commitments.  Projects that you think, “ok, I’m going to need to invest some thought and energy into this”.

The bag of holding is a stop-gap.  “I need to use this until I can figure out a real way to accomplish what I need”.  Maybe it’s the ultimate production service.  That’s fine, but it needs to scale down as far as it scales up.  I literally want something that somebody can easily get running anywhere, quickly and start Jangling.

If anybody has any recommendations on how I can easily get up and running with any of the above projects, please let me know.

Alternatively, if anybody knows something else (a simple, remotely accessible, dynamic, searchable data store), definitely enlighten me!  I realize the irony of this plea, given who I work for, but the idea here is for something not cloud based, since I would like for the user to be able to load in their sensitive patron data without having to submit it to some third party service.  There’s also the fact that there’s no front end that I can just ‘plug in’ to manage the data.

If I can’t get anything off the shelf working, I think I’ll be reduced to writing something simple in Merb or Sinatra with CouchDB or Solr or something.  I was really hoping to avoid having to do this, though.
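If it does come to that, the “something simple” would probably look like this: a Sinatra front end with CouchDB doing the storage over plain HTTP.  This is just a sketch of the idea, not something I’ve written; the database name, routes and document fields are all made up.

```ruby
# Sketch of the fallback plan: Sinatra + CouchDB over plain HTTP.
# Database name, routes and document fields are assumptions.
require 'rubygems'
require 'sinatra'
require 'json'
require 'rest_client'

COUCH = 'http://localhost:5984/bag_of_holding'

# Load (or update) a resource -- the sort of thing a cron'd report script would hit.
put '/resources/:id' do
  doc = JSON.parse(request.body.read)
  existing = JSON.parse(RestClient.get("#{COUCH}/#{params[:id]}")) rescue nil
  doc['_rev'] = existing['_rev'] if existing   # CouchDB needs the current revision to update
  RestClient.put("#{COUCH}/#{params[:id]}", doc.to_json, :content_type => :json)
  status 201
end

# Hand a resource back out, e.g. to a Jangle connector.
get '/resources/:id' do
  content_type 'application/json'
  RestClient.get("#{COUCH}/#{params[:id]}")
end
```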

“[S]training-to-be-clever acronyms department”

Project names are very important to me in a “the clothes make the man” sensibility.  I’d prefer to leave an application untitled rather than have a contrived or pedestrian name attached to it (which is why my EAD publisher app never had a title — nothing non-EAD-derogatory ever came to me).

Often, the name is the only part of a project that “works” (see:  FancyPants, CommuniCat etc.), so, in my mind, it’s important that it’s memorable enough that people (coworkers, mostly) pay attention to the initial pitch so you don’t have to explain its functionality every time.  When you are in the brainstorming/gathering-enough-interest-to-get-the-green-light phase, everything’s about marketing.  And, for me, that means a good working title.  I don’t like acronyms, usually, because I don’t think they’re terribly interesting (WAG the Dog’s acronym notwithstanding – I liked that one).

Anyway, “Jangle” was there before I was.  I don’t have to like the name to think the project is worthwhile.  Still, sometimes it pains me to spend all my time on a project whose title I had no influence over.  Silly, yes, but that’s why my pants are so fancy.

My relationship with Ruby nowadays is roughly akin to somebody addicted to pain killers.  I know it’s not good for me (since everything I work on nowadays is RDF, XML or both), but I’m still able to be productive, and the pain of quitting, while it would be better for everybody in the long run, just isn’t something I have time for right now.  Maybe someday I’ll make the jump back to Python (since it’s actually pretty good at dealing with both RDF and XML), but for now I’ll just find workarounds to my problems (unlike others, I am completely incapable of juggling more than one language).

I first ran into my big XML and Ruby problem a couple of weeks ago while working on the TalisLMS connector for Jangle.  I’ve, of course, run into it before, but it has never been a total show stopper like this.  In order to add the Resource entity (Jangle-ese for bibliographic records) to the TalisLMS connector, I am querying the Platform store the OPAC uses.  I’m using the Platform rather than the Zebra index that comes with Alto (the records are indexed in both places) because the modified date isn’t sortable in Zebra, and that would be an issue when serializing everything to Atom.  The records are transformed into a proprietary RDF format (called BibRDF) when loaded into the Platform (this is for the benefit of Prism, our OPAC).  In order to get the MARC records (there’s no route back to the MARC from BibRDF), I have to pull the UniqueIdentifier field (which is the mapped 001) out of the BibRDF, throw the values at a Z39.50 client (Ruby/ZOOM) and query the Zebra index.  In order to get enough metadata to create a valid Atom entry, I needed to be able to parse the BibRDF (which comes out of the Platform as RDF/XML), since that is the default record format.
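The Zebra half of that round trip looks more or less like this with Ruby/ZOOM (a sketch; the host, port, database name and Bib-1 attribute are placeholders for whatever the Zebra config actually uses):

```ruby
# Sketch of the second hop: take a UniqueIdentifier pulled out of the BibRDF
# and fetch the corresponding MARC record from Zebra over Z39.50.
# Host, port, database name and the use attribute are placeholders.
require 'zoom'

unique_id = '000123456'   # the mapped 001, pulled out of the BibRDF

ZOOM::Connection.open('zebra.example.org', 2100) do |conn|
  conn.database_name           = 'alto'
  conn.preferred_record_syntax = 'USMARC'
  # Bib-1 use attribute 12 (local number), i.e. search on the record id
  rset = conn.search(%Q{@attr 1=12 "#{unique_id}"})
  puts "#{rset.length} hit(s) for #{unique_id}"
  record = rset[0] if rset.length > 0   # a ZOOM::Record, ready for a MARC parser
end
```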

And this is where I’d run into problems.  I have the default number of records returned by the Jangle connector set to 100.  That’s a pretty sweet spot, both for the servers to handle the load and for clients to deal with the resulting Atom document.  Well, you’d think it was, anyway, except REXML was taking about 10 seconds to parse the Platform response into Ruby objects.

I realize the Rubyists out there are already dismissing this and scrolling down to the comment box to write “well don’t use REXML, you dumbass”, but let me explain.  I generally don’t use REXML (unless it’s something very small and simple), instead opting for Hpricot for parsing XML.  I’ve tended to avoid LibXML in Ruby: when I first tried it, it segfaulted a lot, but that was in the past… my reason for avoiding it lately is that I have this stubborn ideal about having things work with JRuby, and that’s just not going to be an option with LibXML (before you scroll down and add another comment about the Ruby/ZOOM requirement, it will eventually be replaced with Ruby-SRU… probably).  Hpricot was falling flat on its face with the BibRDF namespace prefixes, though (j.0:UniqueIdentifier).  It seems to have problems with periods in the prefix, so that was a no-go.

So I had REXML and I had horrible performance.  Now what?

Well, JSON is fast in Ruby, so I thought that might be an option.  The Platform has a transform service:  if you pass an argument with the URL of an XSLT stylesheet, it will output the result in the format you want.  Googling found several projects that would turn XML into JSON via XSLT (this one seems the best if you have an XSLT 2.0 processor), but they weren’t quite what I needed.  I wanted to preserve the original RDF/XML since I was just going to be turning around and regurgitating it back to the Jangle server, anyway.  I just needed a quick way to grab the UniqueIdentifier, MainAuthor and LastModified fields and shove the rest of the XML into an object attribute.

I have always chafed at the thought of actually doing anything in XSLT.  In retrospect (after I’ve been using it almost exclusively for a month now), I realize that my opinion was probably the result of the data that I was trying to transform (EAD, the metadata format designed to punish technologists) rather than XSLT itself (the project got sucked into a vortex when I tried working with the EAD directly with Ruby, too).  Still, I had always resisted.  The syntax is weird, variables confused me, I just never got the hang of it.

But, damn, it’s fast.

And, when I turned the XML into JSON (with XML), it was perfect.  Here’s my stylesheet.  Here’s what the output from the Platform looks like.  Here’s the output from the TalisLMS connector.

I wasn’t done, yet, though.  The DLF ILS-DI Adapter for Jangle’s OAI-PMH service was sooooo slow.  Requests were literally taking around 35 seconds each.  This was because I was using FeedTools to parse the Atom documents and Builder::XmlMarkup to generate the OAI-PMH output.  And this was silly.  Atom is a very short hop to OAI-PMH, and there was really no need to manipulate the data itself at all.  However, I did need to add stuff to the final XML output that I wouldn’t know until it was time to render.  So I wrote these two XSLTs.  I have patterns in there which are identified by “##verb##” or “##requestUrl##”, etc.  This way, I can load the XSLT file into my Ruby script, replace the patterns with their real values via regex, and then transform the Atom to OAI-PMH using libxslt-ruby.  Requests are now down to about 5 seconds.  Not bad.
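The pattern-replacement bit is nothing fancier than a gsub before the transform.  Roughly, it looks like this (a sketch with made-up file names and values, using libxslt-ruby):

```ruby
# Sketch of the Atom -> OAI-PMH step: fill in the ##placeholders## in the XSLT
# with a regex, then let libxslt do the heavy lifting.
# File names and the placeholder values are made up for the example.
require 'libxml'     # libxml-ruby
require 'libxslt'    # libxslt-ruby

xslt_source = File.read('atom_to_oai_listrecords.xsl')
xslt_source.gsub!(/##verb##/, 'ListRecords')
xslt_source.gsub!(/##requestUrl##/, 'http://example.org/oai')

stylesheet = LibXSLT::XSLT::Stylesheet.new(LibXML::XML::Document.string(xslt_source))

atom = File.read('atom_page.xml')    # the Atom feed fetched from the Jangle connector
oai_pmh = stylesheet.apply(LibXML::XML::Document.string(atom))

puts oai_pmh.to_s
```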

All in all I’m pretty happy with this.  And I don’t have to quit my addiction just yet.

For those of you that noticed that libxslt-ruby doesn’t quite jibe with my JRuby requirement, well, I guess I’m not very dogmatic at the end of the day (which is right about now).

I have been following a thread on the VuFind-Tech list regarding the project’s endorsement of Jangle to provide the basis of the ILS plugin architecture for that project.  It’s not an explicit mandate, just a pragmatic decision that if work is going in to creating a plugin for VuFind, it would make more sense (from an open source economics point of view) if that plugin was useful to more projects than just VuFind.  More users, more interest, more community, more support.

The skepticism of Jangle is understandable and expected.  After all, it’s a very unorthodox approach to library data, seemingly eschewing other library initiatives and, at the surface, seems to be wholly funded by a single vendor‘s support.

And, certainly, Jangle may fail.  Just like any other project.  Just like VuFind.  Just like Evergreen.  Any new innovative project brings risk.  More important than the direct reward of any of these initiatives succeeding is the disruption they bring to the status quo.  Instead of what they directly bring to the table, what do they change about how we view the world?

Let’s start with Evergreen.  Five years ago I sat in a conference room at Emory’s main library while Brad LaJeunesse and Jason Etheridge (this predated PINES hiring Mike Rylander and Bill Erickson) told us that they were ditching Unicorn and building their own system.  I, like the others in the room (Selden Deemer, Martin Halbert), smiled and nodded, and when they left I (Mr. Library Technology Pollyanna) turned to the others and said that I liked their moxie, but it was never going to work.  Koha was the only precedent at the time, and, frankly, it seemed like a toy.

Now where are we?  Most of the public libraries in Georgia are using Evergreen, a large contingent from British Columbia is migrating, and a handful of academic libraries are either live or working towards migration.  Well, I sure was wrong.

The more significant repercussion of PINES going live with Evergreen was that it cast into doubt our assumptions of how our relationship with our integrated library system needed to work.  Rather than waiting for their vendor to provide whatever functionality they need or want, libraries can instead implement it themselves.  While it’s unrealistic for every library to migrate to Evergreen or Koha, these projects have brought to light the lack of transparency and cooperation in the ILS marketplace.

Similarly, projects like VuFind, Blacklight and fac-back-opac prove that by pulling together some off-the-shelf, non-library-specific applications and cleverly using existing web services (like covers from Amazon), we can cheaply and quickly create the kinds of interfaces we have been begging from our vendors for years.  It is unlikely that all of these initiatives will succeed, and the casualties will more likely be the result of the technology stack they are built upon rather than any lack of functionality, but the fact that they all appeared around the same time and answer roughly the same question shows that we can pool our resources and build some pretty neat things.

To be fair, the real risk taker in this arena was NC State.  They spent the money on Endeca and rolled out the interface that wound up changing the way we looked at the OPAC.  The reward of NCSU’s entrepreneurialism is that we now have projects like VuFind and its ilk.  Very few libraries can afford to be directly rewarded by NC State’s catalog implementation, but with every library that signs on with Encore or Primo, III and Ex Libris owe that sale to a handful of people in Raleigh.  You would not be able to download and play with VuFind if NC State libraries had worried too much about failure.

Which then brings me to Jangle.  The decision to build the spec on the Atom Publishing Protocol has definitely been the single biggest criticism of the project (once we removed the confusing, outdated wiki pages about Jangle being a Rails application), but there has been little dialogue as to why it wouldn’t work (actually, none).  The purpose of Jangle is to provide an API for roughly 95% of your local development needs with regards to your library services.  There will be edge cases, for sure, and Jangle might not cover them.  At this point, it’s hard to tell.  What is easier to tell, however, is that dwelling on the edge cases does absolutely nothing to address the majority of needs.  Also, the edge cases are mainly library-internal-specific problems (like circulation rules).  A campus or municipal IT person doesn’t particularly care about these specifics when trying to integrate the library into courseware or some e-government portal.  They just want a simple way to get the data.

This doesn’t mean that Jangle is solely relegated to simple tasks, however; it is just capable of scaling down to simple use cases.  And that’s where I hope Jangle causes disruption, whether or not it is ultimately the technology that succeeds.  By leveraging popular, non-library-specific web standards, it will make the job of the systems librarian or the external developer easier, whether it’s via AtomPub or some other commonly deployed protocol.

After several months of trying, Jangle.org is finally starting to take off.  I set up a Drupal instance yesterday on our new web host.

When I was still at Georgia Tech, one of the things I was trying to work on was a framework to consistently and easily expose the library’s data from its various silos into external services. In that case, my initial focus was the Sakai implementation that we were rolling into production, but the intention was to make it as generic as possible (i.e. the opposite of a “Blackboard Building Block”) so it could be consumed and reconstituted into as many applications as we wanted.

Coincidentally (and, for me, conveniently), Talis was also thinking about such a framework that would supply a generic SOA layer to libraries (and potentially beyond) and contacted me about possibly collaborating with them on it as an open source project. Obviously that relationship changed a bit when they hired me and they put me and my colleague Elliot Smith (reports of his demise have been greatly exaggerated) in charge of trying to get this project off the ground. Thankfully, Elliot is the other Talis malcontent who prefers Ruby, so our early prototypes all focused on Rails (the Java that originally seeded the project, like all Java, made my eyes glaze over).

We had a hard time getting anywhere at first. Not even taking into consideration the fact that he and I were an ocean apart, we really had no idea what it was that we should be building or why it would be useful to Talis (after all, they are paying the bills) since they already have an SOA product, Keystone. Also, we didn’t want to recreate Apache Synapse or Kuali Rice. In essence, we were trying to find a solution to a problem we hadn’t really defined, yet.

In December and early January, I drove across town for a couple of meetings with Mike Rylander, Bill Erickson and Jason Etheridge from Equinox to try to generate interest in Jangle and, at the same time, solicit ideas from them as to what this project should look like and do. Thankfully, they gave me both.

Jangle still foundered a bit through February. We were waiting for the DLF’s ILS and Discovery Systems API recommendation to come out (since we had targeted that as a goal), and Elliot produced a prototype in JRuby (we had long abandoned Rails for this) that effectively consumed the Java classes used for Keystone and rewrote them for Jangle.  The problem we were still facing, though, was that we were, effectively, just creating another niche library interface from scratch and there were too many possible avenues to take to accomplish that.  Our freedom was paralyzing us.

I gave a lightning talk on Jangle at Code4lib 2008 that was big on rah-rah rhetoric (free your data!) and short on details (since we hadn’t really come up with any yet), which generated some interest and a few more subscriptions to our Google Group.  A week later, the DLF met with the vendors to talk about their recommendation.  I attended by phone.  While in many ways I feel the meeting was a wash, it did help define for me what Jangle needed to do.

At the end of my first meeting with Equinox, Mike Rylander asked me if we had considered supporting the Atom Publishing Protocol in Jangle.  At the time, I hadn’t.  In fact, I didn’t until I sat on the phone for 8 hours listening to the vendors hem and haw over the DLF’s recommendation.  The more I sat there (with my ear getting sore), the more I realized that AtomPub might be a good constraint to get things moving (as well as useful for appealing to non-library developers).

We are just now starting to build out how this spec might work.  Basically, there are two parts.  First, the Jangle “core”, which is an AtomPub interface for external clients.  It’s at this level that we need to model how library resources map to Atom (and other common web data structures, like vCard) and where we need to extend Atom to include data like MARC (when necessary).  The Jangle core also proxies these requests to the service “connectors” and translates their responses back to the AtomPub client.  The connectors are service-specific applications that take the specific schema and values in, say, a particular ILS’s RDBMS and put them into a more generic syntax to send back to the Jangle core.  Right now, the proposal is that all communication between the core and connectors would be JSON over HTTP (again, to help forward momentum).
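To make the connector side a little more concrete, here is the sort of thing I mean.  This is purely a sketch of the idea, not the actual wire format (which we haven’t pinned down yet); the route and the JSON keys are invented.

```ruby
# Sketch of a "connector": a tiny HTTP service that digs entities out of the
# ILS's own database and hands them back to the Jangle core as JSON.
# The route and JSON keys are invented for illustration; the real
# core/connector contract is still being worked out.
require 'rubygems'
require 'sinatra'
require 'json'

get '/resources' do
  offset = (params[:offset] || 0).to_i
  # In a real connector this would be a query against the ILS database;
  # here it is canned data so the shape of the response is visible.
  records = [
    { 'id' => 'resource/1', 'updated' => '2008-05-01T12:00:00Z',
      'title' => 'Example record', 'marc' => '<raw MARC here>' }
  ]
  content_type 'application/json'
  { 'total' => records.size, 'offset' => offset, 'data' => records }.to_json
end
```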

So at this point you may be asking why AtomPub rather than implementing the recommendations of the DLF directly?  The recommendation assumes the vendors will be compliant, uniform and timely in implementing their API, and I cynically feel that is unrealistic.  I also think it helps to get a common, consistent interface to help build interoperability (like the kind that the DLF group is advocating), since then you’d only have to write one, say, NCIP adapter and it would work for all services that have a Jangle connector.  Also, by leveraging non-library technologies, it opens up our data to groups outside our walls.

So, if you’re interested in freeing your data (rah-rah!), come help us build this spec.  We’re trying to conform to the Rogue ’05 specification that Dan Chudnov came up with for the development of this, so while it will still be a painful process, it won’t be painful and long. :)  In other words, this ain’t NISO.