Finally a component-based library environment

At IGeLU 2008 in Madrid, the meeting of the international Ex Libris user group, Ex Libris explained its future strategy. Its software products are going to be part of a component-based architecture with open interfaces: one central data store for all metadata, a Unified Resource Management system to control this information, and a Unified Resource Discovery and Delivery environment to present information to the end user.

In Wageningen we decided to rebuild our own library system some 7 years ago. We discussed buying an Integrated Library System, but realized that this sort of product was outdated. The traditional ILS does not cover all the tools a modern library needs nowadays. More importantly, these systems were not open: they did not enable you to build additional components, basically because they were not component based to start with.

Because of this, for example, almost every library ends up with separate systems for its electronic resources and its paper resources, forcing users to choose between electronic and paper before they even start their search for information.

We decided to develop what we call a Library Content Management System: one single data store for all metadata, accessible to all system components. We would have liked to buy these components, but they were unavailable. The one that came close was Ex Libris' URL resolver, SFX. It had its own metadata store, the so-called knowledge base, but because of its open interfaces we could extend it to use our own content management system as an additional knowledge base. So we decided to buy this component and develop all the other components ourselves. (We also bought Ex Libris' metasearch product, MetaLib, because it fits our architecture as well, but there are a lot of issues when it comes to metasearching, which I will discuss in a future blog post.)

Now Ex Libris has chosen to follow a similar strategy. They have also chosen better names: I prefer Unified Resource Management System over the Library Content Management System we picked. We had some internal discussion about this, and I feel the term Content Management System has been misused so much over the last decade that picking this name was a mistake. (Although I do feel it covers quite well what the system does.)

I think we would have invested in the Ex Libris URM solution if it had been introduced 6 years ago. Even though it still has to prove itself, since for the moment it is basically drawing-board technology, I feel they have chosen the right architecture. I suppose they can do a better job than we can with our small development team.
For now we will keep a keen eye on how it evolves and see what we can learn. If their interfaces turn out to be really open, we may want to replace components with Ex Libris products in the future, or make our components work within their environment.

But who knows, maybe other vendors will see the light as well and we will have even more choices in the future.


Hooray for OCLC Pica customer response!

In my post about our Google Book Search implementation, I mentioned that we could only do this for records containing an ISBN. Google also accepts other identifiers, like OCLC numbers and Library of Congress numbers.
We don't have that data recorded, but all of our records end up in WorldCat. Time to get in touch with OCLC Pica, the European branch of OCLC, which manages our Dutch Union Catalog and makes sure these records are also uploaded to WorldCat.

I had a short but fruitful email exchange. Their first reaction helped me understand that I can locate a corresponding record in WorldCat using a URL containing the PPN (Pica Production Number), which we do record, since it is the identifier for the Dutch Union Catalog. That's neat: from the catalog record I can now point to WorldCat's 'Find in a Library close to you' page for books that are out on loan, or for non-Wageningen UR users. The OCLC number is present on the resulting page. I could do some page scraping (which is pretty easy, since we only use XML tools and WorldCat returns XHTML pages, bravo!), but it is not elegant, and pretty slow as well. I mentioned this to OCLC Pica and also pointed out that the link from WorldCat to our local catalog should always be based on the PPN. (WorldCat only does this for records without an ISBN, and accidentally does it using the OCLC number instead of the PPN.) OCLC Pica responded quickly that it was indeed better to have a small service that returns the OCLC number for a given PPN, and that they would make it available to me before the end of March. I was astonished. Isn't that great? When I thanked them for this immediate response, I took the liberty of asking whether they could add the Library of Congress number to the response as well. Thanks, Martin.
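Until that service arrives, the page scrape itself would only be a few lines. A minimal sketch, with the caveat that the 'OCLC' label pattern is my assumption here; the real WorldCat markup may look different:

```javascript
// Hypothetical sketch of the page scrape: pull the first OCLC number
// out of the XHTML page that WorldCat returns for a PPN lookup.
// The "OCLC" label followed by digits is an assumed page pattern.
function extractOclcNumber(xhtml) {
  var match = xhtml.match(/OCLC[^0-9]{0,20}(\d+)/i);
  return match ? match[1] : null;
}
```

Since the pages are well-formed XHTML, the same thing could of course be done with our usual XML tools instead of a regular expression; either way it stays a fragile workaround compared to a real lookup service.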


Hooray for Google customer response!

It was not easy to find an appropriate response form on Google's website to complain about my problems with the Google Books API. I found a form that was supposedly for authors and publishers wanting to promote their books on Google Books, and used it. Google responded today:
Thank you for notifying us of this problem regarding our API. I have forwarded these issues on to our specialists, who will look into the matter. Please feel free to reply to this email if you have any further details about the difficulties you are experiencing.

The Google Book Search Team

I noticed earlier (when Google introduced URL resolving in Google Scholar) that they can be quite responsive. Of course I haven't got a solution yet. But have you ever had a response to your problems from Microsoft, even though we pay them for their products?


Google Books API. Do they really want you to use it ......

Last week there was a lot of discussion about the Google Books API, which allows you to check whether Google has a description of a book, can provide you with a cover, and can tell you whether it has scanned the book completely or partly. Example scripts appeared on the Google Books site, Tim Spalding gave examples on the LibraryThing Thingology blog, and Godmar Back responded with some alternative scripts on the code4lib discussion list.
Ex Libris proudly announced that they had implemented the 'About this book' links in their products and that it took only a week to get them in place. Sunday evening at 11:00 pm I decided to see whether it would be difficult to implement this in our OPAC. Just after midnight I had it implemented, and it has been running since.
As Wouter Gerritsma explains in his blog, we can only check Google for a book when we have an ISBN. Now we want to be able to do it for books that don't have an ISBN, using the OCLC number, which we have not registered in our records. However, we do have a PPN (Pica Production Number), and OCLC Pica makes sure our titles end up in WorldCat, so we should be able to get hold of the OCLC number.
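The lookup itself is a JavaScript callback: the OPAC page requests books.google.com/books?bibkeys=...&jscmd=viewapi&callback=handleGBS, and Google answers with a small JSON object per bibkey. A minimal sketch of such a callback; the field names (preview, preview_url, info_url, thumbnail_url) follow Google's documentation at the time, but treat them as assumptions:

```javascript
// Callback for the Google Book Search dynamic-link request.
// For each bibkey, decide what the OPAC record display should show.
// Field names are taken from Google's documented response format.
function handleGBS(results) {
  var decisions = {};
  for (var bibkey in results) {
    var book = results[bibkey];
    decisions[bibkey] = {
      // show a cover image only when Google has a thumbnail
      cover: book.thumbnail_url || null,
      // link straight to the scanned pages when a full or partial
      // view exists, otherwise to the plain 'About this book' page
      aboutLink: (book.preview === 'full' || book.preview === 'partial')
        ? book.preview_url
        : book.info_url
    };
  }
  return decisions;
}
```

In the real OPAC the returned decisions would of course be used to rewrite the record display in the DOM rather than just being returned.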

So far so good.
But Google has some policies that obstruct the use of their API. A product like SFX may suffer severely from this, depending on the way they are going to implement it. It surely affected our implementation severely, and now I am trying to find a way around it.

I don't know if you have ever experienced ending up at Google's 'We're sorry ....' message, telling you that you are probably infected with spyware or a virus. (Some people are really shocked when they see this warning!)
Google sends this message when it detects 'anomalous queries' coming from a single IP address. We occasionally see this error in Wageningen, and I am not sure whether some computer on the university network is hammering Google or whether it is just busy with people searching Google. Due to the network address translation on the firewall, all requests from the network look to Google as if they come from one or just a few computers. Anyway, Google Book Search seems to suffer much harder from this problem than other Google services. Just a few hours after implementation, the API did not respond with a JSON object (containing the information requested for this service) but with an ordinary HTML page, the 'We're sorry' page, messing up the service completely.
I can hardly believe this is caused just by implementing this service, so I have now defined a ProxyPass directive on the web server, so that requests to Google for the API go via our library web server. Google will now see all these requests coming from this server. This way we avoid Google seeing the requests come from the firewall gateway, and we will not suffer from all the other Wageningen UR PCs searching Google. If this does not solve the problem, I can be sure that Google sees normal usage of the API as unwanted traffic. If so, what kind of API are they offering us? For the Google Maps API and the Google Custom Search API they have a so-called access key to use the service; I guess that would be the way to go for this API as well, to prevent unwanted use.
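The directive itself is a one-liner in the Apache configuration. A minimal sketch (the /gbs/ path prefix is my own choice for illustration):

```apache
# Route Google Books API calls through the library web server, so that
# Google sees them coming from this host rather than from the NAT
# gateway shared by every PC on the Wageningen UR network.
ProxyRequests    Off
ProxyPass        /gbs/ http://books.google.com/
ProxyPassReverse /gbs/ http://books.google.com/
```

The OPAC pages then request the API under /gbs/ on our own server instead of talking to books.google.com directly.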


Validating XML records and validating input, how to proceed?

WebQuery posts forms to an XML database. To be honest, the Oracle database consists of tables with just a few columns, the most important being a CLOB (Character Large Object) containing the actual XML record. The only real reason we put it in an Oracle table at this time is that we use SQL Text Retrieval to get the data out again. So we don't use Oracle at all for record validation or constraint checking. We are happy with this, because we want to be independent of proprietary database features. Before a record is posted, the XML is validated against its schema. We have two problems with this. The first is that this validation is done on the server side, which is often not very user-friendly, so we do additional JavaScript validation on the client. The second is that the schema language does not have sufficient validation syntax; what we need is something like Schematron. However, that is not widely used yet.
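To give an idea of what Schematron adds: a rule is an arbitrary XPath assertion, so it can express cross-field constraints that a grammar-based schema cannot. A small sketch, against a hypothetical record layout of my own invention:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- hypothetical record layout, for illustration only -->
    <sch:rule context="record">
      <!-- a co-occurrence constraint: at least one identifier must
           be present, something XML Schema cannot express -->
      <sch:assert test="isbn or issn or ppn">
        A record must contain at least one identifier.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Since Schematron validation is typically implemented as an XSLT transformation, it would fit naturally in our XSLT-based toolchain; the lack of wider adoption is the real obstacle.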

A related issue is the generation of forms for entering and modifying XML records. We have a lot of them, and up until now they have mostly been developed one by one. It would be far more efficient to generate them automagically, based on the schema. XML Schema is not suited for this either, and I guess Schematron is also insufficient.

At the moment we are working on XML files that provide the information to build forms automatically, using general XSL style sheets. We are looking at how to avoid redundant definitions. For example: field enumerations are currently defined in the schema, but they are also defined in a database, so they can be easily edited. We definitely have to come up with a more consistent architecture for this. It is not as if we are the only ones in the world trying to solve this problem, are we?
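To make the idea concrete: from a single field definition, a generic style sheet can render the matching form widget. A sketch in JavaScript rather than XSLT, and with an invented definition shape (name plus a list of enumeration values), purely for illustration:

```javascript
// Hypothetical sketch: render the HTML for one enumerated form field
// from a field definition, as a general style sheet would do from our
// XML form descriptions. The definition shape is invented here.
function renderSelect(field) {
  var options = [];
  for (var i = 0; i < field.values.length; i++) {
    options.push('<option value="' + field.values[i] + '">' +
                 field.values[i] + '</option>');
  }
  return '<select name="' + field.name + '">' +
         options.join('') + '</select>';
}
```

The point of keeping the definition in one place is that the same list of values can drive both this rendering and the server-side validation, instead of being maintained in the schema and the database separately.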


V-sources, of course there is a Dutch library using an ERM solution

Joost de Vletter from Eindhoven University contacted me to say that they have developed V-sources, an ERM system, which they use themselves and which Delft University is going to use as well. Two posts ago I said I did not know of any Dutch libraries using an ERM system.
I was thinking of the efforts of the consortium of Dutch university libraries and the Royal Library to select a system; these efforts have been unsuccessful so far. I forgot that Eindhoven invested in developing a system themselves, which will also become available commercially.

Joost also reacted to my remarks about the lack of integration between the management of paper and electronic subscriptions. He answered that their system, like most other ERM systems, is based on the DLF (Digital Library Federation) ERMI document, which describes a data model that does not consider the administration of paper subscriptions. He is right, and following standards is a very good thing. And since these new library system components are far more open than before, it is probably easier to link the paper subscription in one system to the electronic subscription in the other. However, the 'old' serials management systems of most vendors will not be that open, which will probably make integration more difficult. I sometimes think vendors use the component-based architecture argument just to sell more systems.

Don't get me wrong, I am all for component-based and even service-oriented architectures, but not for just a part of our systems. Our management problems with electronic subscriptions, mostly 'big deals', do not justify a separate ERM system; we can solve them with extra features within the serials management system. However, if there were a national ERM system, shared between university libraries, we would also benefit from a single point of administration. That would be a reason to implement ERM, assuming it had a rich set of web services available. I heard from Thomas Place of Tilburg University that he is thinking along the same lines, and I think most Dutch university libraries would consider this a good way to go.


European Library Automation Group - 32nd ELAG Library Systems Seminar - 14-16 April 2008

We have just opened the website we have created (using WebQuery, of course) for this year's ELAG conference, which we will host in April. I will present a paper on our "Digital Library", as we call our library website, and speak about our decisions and efforts in building a Library Content Management System. There will be a lot of other interesting papers as well. Another exciting thing about ELAG meetings is the workshops: ELAG is not about listening to presentations only. You participate in one workshop during the conference and discuss a subject, and workshop reports are presented on the last day. This year you can also come and talk without having to prepare a paper! You can express your ideas in 5-minute lightning talks. So join us, and be fast: convenient hotel accommodation nearby is limited.


Serials Management, the last conversion

This week we hope to start using our new serials management application, the last application still running on our old system. Cardex, as the application is called, is now based on the new CMS. It is built entirely with XSLT, some JavaScript, and a bit of Perl that does the background printing and emailing of claims for missing issues or stagnating subscriptions. The old version, which has been in use until now, was built in 1982 (to be honest, the first patch of the code dates back to November 1982, so I presume 1982 must have been the year it was born). It was based on Minisis and written in SPL, a Pascal-like proprietary programming language for the HP 3000 series computers. Although the application was old, it had evolved over the years and people were pretty happy with it. We kept the same application properties, but the application, now a web application, has got a completely different look and feel. It has all the old features and some more.

The next step is to add new functionality for electronic subscriptions, or to integrate Cardex with components for this, so-called Electronic Resource Management (ERM) systems. The problem is that I have not seen systems that make the connection with traditional serials management systems, which is quite strange, especially since subscriptions are often paper plus digital content (ok, that is changing). They do integrate with OpenURL resolvers and sometimes with cataloguing components. It feels like these systems force you to separate paper and electronic serials management, just as in most systems the cataloguing of digital content (in metasearch portals and OpenURL systems) is separated from the traditional cataloguing of paper content in an ILS. Vendors have not split their traditional ILSs into components yet.
For now I think we have to build these features ourselves. So far, no Dutch library I know of has implemented an ERM system. Am I wrong?


By the way ......

In one of my last posts in Dutch I mentioned we were running WebQuery version 5.34. Now, two years later, we are running version 5.52. This latest version introduced PAM (pluggable authentication module) authentication, our first step towards implementing federated authentication.

Code reviewing

We are now working on the system with 8 people, six of whom do application development, and we have reached a point where we have to consider implementing some quality control tools. One of them is code reviewing, which we started recently. We have found it too much to do systematically for the moment, so not every bit of code gets reviewed. Instead, we take a piece of code every month and one of us, not the developer, writes a review and presents it to the rest. We spend a morning discussing the review. This leads to agreements on coding standards, which one of us then documents in our wiki.
So far we have had only one review. The discussions at the moment are very much about basic standards we have never explicitly formulated; I suppose later on we will discuss more local practices. I think we have found an excellent way to create acceptable standards. Everybody is excited about it.

And now I will continue ...

I have been very quiet on this blog for two years!
We have been working hard on implementing all library applications using the LCMS, and I haven't felt the urge to report on it. However, blogging has become more popular over the last two years, and so has reading blogs. I will also start blogging in English (please be merciful!), since this might appeal to a much larger community.