2009-06-16

Using WorldCat services in the library catalog

For the first time we have built features into our library catalog that are 'invisible' to most of our library users.
I was inspired to build these features during my participation in the WorldCat Mashathon in Amsterdam on 13-14 May 2009. They were easy to build in, and I hope they serve an unexpected audience this way.
So what are these features?

If you have ever tried searching WorldCat, you have probably noticed that you can see which libraries hold the title you have just found. At the Mashathon I learned that there is an API that gives you a list of all the libraries holding copies of a title identified by a specific OCLC number. Our library catalog does not hold the OCLC number, but as I described in a previous post, we can address an OCLC Pica service to get the OCLC number belonging to the Dutch PPN, which we do hold.
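The two-step lookup can be sketched as a pair of URL builders. This is only an illustration: the PPN-to-OCLC endpoint is a made-up placeholder (the real one was described in the earlier post), and the WorldCat Search API URL is the form I remember from the 2009-era documentation, so check OCLC's docs before relying on it.

```python
# Sketch of the two-step lookup. The PPN resolver URL is hypothetical;
# the WorldCat Search API "libraries" URL follows the 2009-era pattern.

def ppn_to_oclc_url(ppn):
    # Hypothetical OCLC Pica service that maps a Dutch PPN to an OCLC number.
    return "http://ws.pica.example/ppn2oclc?ppn=%s" % ppn

def worldcat_libraries_url(oclc_number, wskey):
    # WorldCat Search API request for libraries holding this OCLC number.
    return ("http://www.worldcat.org/webservices/catalog/content/libraries/"
            "%s?wskey=%s" % (oclc_number, wskey))
```

The actual fetch and XML parsing of the responses are left out; only the request shapes matter here.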

This allowed me to look up all libraries holding copies of a title in our catalog and show them in our full record display. Well, that is not entirely true: WorldCat will only return the libraries in your 'neighborhood'. This is a pretty large area, defined by the IP address you are using the service from.
OK, it worked, but I had two problems with it. Firstly, why would we bother our users with other libraries when we hold the title ourselves? And secondly, the service always returned libraries in the Wageningen area, since it was addressed from our server, based in Wageningen. Even though this is a pretty large area, showing not only Dutch libraries but also German and British libraries, and sometimes even libraries on the east coast of Canada and the USA, this is not what you want.

Fortunately, the service allows you to provide an IP address in the request, which makes it ignore our server's IP address and use the address provided to decide which libraries to show. So now we pick up the IP address of the user's browser. If it is a 'Wageningen IP address' we do not invoke the WorldCat API at all, so our own users are not bothered by other libraries' holdings. For other IP addresses we pass the address along and present the user with a list of libraries in their 'neighborhood' that hold the title.
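The decision logic amounts to a few lines. A minimal sketch, with two loud assumptions: the local network range below is a placeholder (not Wageningen's real range), and the `ip` query parameter name is how I recall the service accepting a caller-supplied address.

```python
# Decide whether to call the WorldCat API based on the visitor's IP.
from ipaddress import ip_address, ip_network

LOCAL_NET = ip_network("10.0.0.0/8")  # placeholder for the campus IP range

def worldcat_request(client_ip, oclc_number, wskey):
    """Return the API URL to call, or None for local users."""
    if ip_address(client_ip) in LOCAL_NET:
        return None  # we hold the title ourselves; don't bother local users
    # Pass the visitor's IP so WorldCat picks *their* neighborhood,
    # not our server's ('ip' parameter name is an assumption here).
    return ("http://www.worldcat.org/webservices/catalog/content/libraries/"
            "%s?wskey=%s&ip=%s" % (oclc_number, wskey, client_ip))
```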

If you follow a link like this you may not only notice this new feature, but also another one: a feature that is only shown when your IP address can be related to a library in the WorldCat Registry, and only if the person who registered that library also registered a base URL of an OpenURL resolver for it.
In that case, we show a link to your local OpenURL resolver, giving you the (mostly full text) services your own library provides for the title you found in our catalog. Isn't that great!
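The Registry lookup can be sketched as follows. The lookup URL and the `baseURL` element name are as I remember the 2009-era WorldCat Registry service; verify them against OCLC's current documentation.

```python
# Resolve a visitor's IP to their library's OpenURL resolver via the
# WorldCat Registry lookup service (URL and element names as assumed).
import xml.etree.ElementTree as ET

def registry_lookup_url(client_ip):
    return "http://worldcatlibraries.org/registry/lookup?IP=%s" % client_ip

def resolver_base_url(xml_text):
    """Pull the first resolver baseURL out of a Registry response, if any."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag.endswith("baseURL"):  # tolerate namespaced tags
            return (el.text or "").strip()
    return None
```

If `resolver_base_url` returns something, we can render a link to that resolver next to the record.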
One drawback, however: it is pretty hard for us to test :)

So I am glad Sarah Miller from the library at the National Science Foundation contacted us about a problem they had with this last service. Her SFX button did not lead to a full text link for a journal they hold. She even pointed out why: we should add 'sfx.ignore_date_threshold=1' to the OpenURL to show full text links from the journal record. She was right. We ran into this problem ourselves when we implemented SFX years ago. At that time the only way to solve it was to add year=999 to the OpenURL. We did not store this, but added it to the OpenURL when it was parsed and discovered to be coming from our own journal catalog record. That still works, but only with our own SFX server. We have changed this, so you may actually be able to use this new feature.
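The fix itself is just one extra query parameter on the outgoing OpenURL, something like:

```python
# Append the SFX flag Sarah pointed out, so journal-level OpenURLs show
# full text targets regardless of date thresholds.
def add_ignore_date_threshold(openurl):
    sep = "&" if "?" in openurl else "?"
    return openurl + sep + "sfx.ignore_date_threshold=1"
```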

There is one potential problem with this service: OCLC restricts access to the WorldCat Search API to 5000 requests a day. I hope we do not hit that limit very often, but we will whenever the full display of a catalog record is requested more than 5000 times a day. And that does happen, mostly because of crawlers. I try to filter out a few known crawlers and do not hit the WorldCat service when they request a full record display. And now we just hope for the best...
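The crawler filter plus quota guard could look like this sketch. The user-agent substrings are illustrative, not our actual list, and a real deployment would reset the counter at midnight.

```python
# Skip known crawlers and stop calling WorldCat once the daily quota
# is reached. The crawler list here is illustrative only.
KNOWN_CRAWLERS = ("googlebot", "slurp", "msnbot")
DAILY_LIMIT = 5000

class WorldcatGuard:
    def __init__(self):
        self.count = 0  # requests made today; reset daily in real use

    def may_call(self, user_agent):
        """True if this full-record request should hit the WorldCat API."""
        ua = (user_agent or "").lower()
        if any(bot in ua for bot in KNOWN_CRAWLERS):
            return False  # crawlers never trigger an API call
        if self.count >= DAILY_LIMIT:
            return False  # quota exhausted; degrade gracefully
        self.count += 1
        return True
```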

2009-06-05

Moving to SOLR

We are currently running tests with SOLR to index the content of our Library Content Management System. At the moment the system relies on Oracle Text Retrieval. We feel this no longer meets our demands, and we find it too tightly knit to our storage solution, which we want to be database independent. We are planning an architectural description of our complete environment before we start developing, but of course I already have some ideas on which way to go.

To start with, we will leave our complete system as it is and add an extra index layer. This will allow us to gradually move away from Oracle Text Retrieval and evolve the system instead of disrupting it.
The idea is to store the record id and a root table name after a record is successfully entered or updated. So imagine someone changes the title of a publication: the id and the table containing the bibliographic description are added to a list. The same id and table name are also registered when a record pointing to it (for example a book item) is modified. This list takes the form of a list of URLs, since every record in our Library CMS is uniquely identified by a URL.
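In pseudocode-like Python, the change list could work as below. The base URL and table names are invented for illustration; only the idea (one URL per changed record, no duplicates) is from the design above.

```python
# Queue changed records as URLs; an indexer process drains this list later.
BASE = "http://lcms.example/record"  # invented base URL for illustration

def record_url(root_table, record_id):
    # Every record in the Library CMS is uniquely identified by a URL.
    return "%s/%s/%s" % (BASE, root_table, record_id)

def queue_change(queue, root_table, record_id):
    """Register a changed record once, keyed by its URL."""
    url = record_url(root_table, record_id)
    if url not in queue:
        queue.append(url)
    return queue
```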

There will be a process reading this file and retrieving each URL. The URL returns an XML file, which is processed by an XSLT. This XSLT can enrich the XML via web services:
for example our own web services, to add information about the book items to the bibliographic description. We can also add OCLC numbers and LCC headings by consulting web services at both OCLC Pica and OCLC WorldCat.
Many other enrichments of the data may be applied before it is offered to SOLR for indexing. This way we will be able to retrieve records by keys that are not present in the actual record, but are added just before indexing.
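A condensed sketch of that enrichment step: the real pipeline would apply an XSLT that calls web services, but a plain function shows the shape of it. The `<oclcNumber>` field is an invented example, and the `<add><doc>` XML is the classic SOLR update message format posted to the `/update` handler.

```python
# Enrich a record with a field not present in the stored XML, then wrap
# fields in SOLR's <add><doc> update format. Field names are illustrative.
import xml.etree.ElementTree as ET

def enrich(record_xml, oclc_number):
    """Add an <oclcNumber> element (looked up via a web service in real life)."""
    root = ET.fromstring(record_xml)
    ET.SubElement(root, "oclcNumber").text = oclc_number
    return ET.tostring(root, encoding="unicode")

def solr_add_doc(fields):
    """Build the XML message that would be POSTed to SOLR's /update handler."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")
```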

This has great potential. We can, for example, check our SFX server and add a flag so we know beforehand that a publication is available electronically. We can check WorldCat and add better normalized author names, and lots of other stuff we might think of later.

We will remain compatible with our current way of searching and presenting results.
We will store the same XML representation in the SOLR index as is stored in our CMS, so we can still use all the XSLTs that are currently used to present results.

We will also 'hide' the query interface of SOLR behind our current SRU-like interface. Of course we will have to add ways to make use of the new sorting and faceting features we will then be able to work with.
This way we also keep our Z39.50 access working, since it uses the same interface to search the LCMS.
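Translating our SRU-like requests into SOLR parameters might look roughly like this. The sort and facet field names are invented; the point is that sorting and faceting become extra parameters the front end can pass through.

```python
# Map an SRU-like request onto SOLR query parameters.
# Index and facet field names are invented for illustration.
def sru_to_solr(query, sort=None, facets=()):
    params = {"q": query, "wt": "xml"}  # XML response keeps our XSLTs usable
    if sort:
        params["sort"] = sort  # e.g. "year desc" -- new with SOLR
    if facets:
        params["facet"] = "true"
        params["facet.field"] = list(facets)
    return params
```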

Well, these are our first ideas, so I suppose we will do a lot of testing and rethinking. I just wanted to share them with the readers of this blog, so anyone can jump in with even better ideas.
I hope to share some more results with you at the end of this summer. I am really excited about this new journey we are taking.