2008-03-18

Google books API. Do they really want you to use it ......

Last week there has been a lot of discussion about the Google Books API, allowing one to check whether Google has a book description, can provide you with a cover and tell you whether it has scanned the book completely or partly. Examples for scripts appeared on the Google books site, Tim Spalding gave examples on the LibraryThing Thingology blog and Godmar Back responded with some alternate scripts on the code4lib discussion list.
Ex Libris announced proudly that they had implemented the 'About this book' product into their products and that it only took a week to get the link in place. Sunday evening at 11:00 pm. I decided to see wether it would be difficult to implement this into our OPAC. Just after midnight I had implemented it and it has been running since.
As Wouter Gerritsma explains in his blog, we can only check Google for a book, when we have an ISBN. Now we want to be able to do it for books that have not got a ISBN, using the OCLC number which we have not registered in our records. However, we do have a PPN (Pica Production Number) and OCLC Pica makes sure our titles end up in worldcat, so we should be able to get hold of the OCLC number.

So far so good.
But Google has some policies that obstruct the usage of their API. A product like SFX may suffer severely from this. (Depending on the way they are going to implement this) It surely affected our implementation severely and now I am trying to find a way to get around this.

I don't know if you have ever experienced to end up with the We're sorry .... message of Google, telling you that you probably are infected with spyware or some virus. (Some people are really shocked when they see this warning !!)
Google sends this message when it detects 'anomalous queries' from one single IP adress. We occasionally see this error in Wageningen and I am not sure if some computer on the university network does some extreme Google access or whether it is just busy with people searching Google. All the request from the network look like coming from one or just a few computers to Google, due to the network address translation on the firewall. Anyway, Google books seems to suffer much harder from this problem than other Google services. Just a few hours after implementation, the API did not respond with a JSON object (containing the requested information for this service) but with an ordinary html page, the 'We're sorry page' messing up this service completely.
I can hardly believe this is just caused by implementing this service, so I have now defined a ProxyPass directive on the web server so requests to Google for the API go via our library web server. Google will see all requests coming from this server now. This way we avoid it to see the requests coming from the firewall gateway and we will not suffer from all other Wageningen UR PC's searching Google. If this does not solve the problem, I will be sure that Google will see normal usage of the API as unwanted traffic. If so, what kind of API are they offering us ?? For the Google Map API or the Google Custom Search API they have a so called access key to use this service, I guess that would be the way to go for this API to prevent unwanted use.

2 comments:

Anonymous said...

Just a doubt : the IP address that is requesting the Google API (ie your institutional IP) and the IP address that will display the Google Books page may not have the copyright context. I mean, the book may be "viewable" for the former but not for the latter.

Peter van Boheemen said...

I am sure that is not the problem. Besides the fact that there are no copyright restrictions on these books, this message may also occur on ordinary Google queries. For some congiguration reasons on the side of Google, these error messages occur frequently when searching Google books and hardly ever when searching 'Google Web'