Siva Vaidhyanathan Questions Google Book Search

Friday at the Drexel University Libraries’ Scholarly Communication Symposium, Siva Vaidhyanathan raised some serious questions about the partnership between libraries and Google in a powerful and provocative analysis of Google Book Search.

Siva is unique in that he combines an in-depth knowledge of copyright, a reasoned appreciation for new technology and a clear love and deep respect for libraries and librarians with a strong sense of social justice and the public good. He is a skilled presenter and powerful speaker. Others wear suits; he wears a black leather jacket. He tends to raise difficult questions. Either we should feel very lucky that he has chosen to cast his critical eye on our issues, or we should feel slightly nervous.

He began by dismissing both Kevin Kelly’s overenthusiastic embrace of the universal library and John Updike’s nostalgic defense of the traditional publishing world. He agreed with Kelly that digitization was a worthwhile goal but asked if Google Book Search was the right way to do it, if now was the right time, and if copyright was up to the task. He also disagreed with Kelly by countering that books are linear for a reason. He conceded that people of the book are racked with anxiety about the future, implying that this may have been a motivating factor pushing libraries into too hasty deals with Google.

Vaidhyanathan’s concerns about Google Book Search include some nitty-gritty quality issues: improper ranking and inadequate search results. He pointed to a search for “copyright” in which the first hit is a book from 1912, and a search for “Copyright Law” that does not pull up the most recent and relevant books. This suggests that, despite Kelly’s claims, users would still be better off consulting a librarian. He also searched for some famous literary quotes (“it was the best of times”) that did not turn up their sources, but he did admit that at least one search (“Karl Rove”) turned up two good books at the top of the list and led him to information he had not previously known.

Vaidhyanathan then reeled off 5 questions each for Google and the Google Library Partners:

For Google:
1. What will be the guiding principles for inclusion, exclusion, and rank within the index?
2. What will be scanned first? What order?
3. What safeguards are you taking to ensure user confidentiality and privacy?
4. What will be your metadata standards? Why would one book outrank another?
5. Will you omit certain titles from the index if a government demands it? Or will you merely present snippets to indicate the book’s existence?

For Libraries:
1. Did you insist on assurances that Google would protect user confidentiality and privacy?
2. Did you insist that Google’s index include input from your librarians about quality control, order of inclusion, order and metadata standards?
3. Did Google restrict your use of the digital files in any way? (no obvious restrictions in Michigan contract.)
4. Did you consider the harm to potential markets for publishers who have been selling and leasing digital files? What is the copyright justification for receiving an electronic copy as payment for a transaction?
5. What’s the hurry?

At one point, Vaidhyanathan compared Google Book Search to the Human Genome Project. Here, he claimed, a for-profit company named Celera demonstrated it could do the work better and faster, but governments declined, recognizing that this information should not be privatized. Now Vaidhyanathan became animated, stating that it should be the same for knowledge and asking, “since when is expediency one of the core values of librarianship?” (Ouch. I was taken aback when he said this.) He basically accused some of the world’s top research libraries of rushing into deals with Google in which they did not realize the true value of their holdings, failing to insist on quality control, failing to guarantee user privacy, and damaging their relationships with publishers.

These are serious charges, and some of them have surfaced in a previous ACRLog post. As for the Google Library Partners side of the story, their public statements do point to the public good as a motivation for making their collections more widely accessible, and it’s hard to fault them for that. Vaidhyanathan’s comparison to the Human Genome Project seems unfair: as far as I know, no governments were willing to step up to digitize books on anything like the scale of Google Book Search. According to the University of Michigan, it would have taken them more than a thousand years to digitize their collection at their previous pace. New York Public Library strikes a cautious tone in their statement and hardly seems to be rushing into anything. As for quality control and metadata issues, I don’t know, but the FRBR Blog has reported that a Library of Congress working group on bibliographic control has recently met with Google. The point about libraries’ relationships with publishers is an interesting one; however, there is an argument that GBS will help publishers sell more books from their back catalogs.

But what about privacy? What were the discussions between the Library Partners and Google on privacy? Vaidhyanathan reminds us that Google is not just any company, but a humongous company with ambitious aims “to organize the world’s information.” Do we want all our information needs to be met and filtered through a lens that ultimately has profit as its main aim?

Among other interesting thoughts that Vaidhyanathan did not fully expand upon, he said he thought the issue of the libraries receiving an electronic copy as payment for the transaction could be the silver bullet that would lose a court case for Google. I think he also said that if the project succeeds it will ultimately weaken fair use.

I don’t remember one session at ACRL Baltimore on Google Book Search. If Vaidhyanathan tells us one thing, it’s that as a profession, we need to know more.

UPDATE – A writer at the American Historical Association blog confirms some of Siva’s worries.

11 thoughts on “Siva Vaidhyanathan Questions Google Book Search”

  1. On behalf of a few people (e.g., Mary Minow), possibly including myself, I find that this sentence stings a little:

    “Siva is unique in that he combines an in-depth knowledge of copyright, a reasoned appreciation for new technology and a clear love and deep respect for libraries and librarians with a strong sense of social justice and the public good.”

    Unique? Really? Nobody else combines those qualities?

  2. Walt – Sorry, I didn’t mean to slight anyone. Perhaps I was thinking for someone in a field outside of librarianship he brings a unique background and perspective. I apologize to you, Mary Minow or anyone else I may have offended. Would love to hear if you have any further comments on the post.

  3. I’ve commented at length on the Google Library Project/Google Book Search (and the Open Content Alliance) in Cites & Insights, including–a while back–extensive commentary on Siva V’s views. The arguments he’s making now haven’t changed all that much. I can’t see restating those discussions in a comment on a post.

    Clearly I’m with Siva in dismissing Kevin Kelly’s commentary, for what that’s worth. Just as clearly, I believe that calling what Google’s doing “privatizing” library materials is nonsense, since the books are returned unharmed to the libraries and since we know enough about the contracts–with two of them now public–to know that they’re absolutely non-exclusive.

    Beyond that, I’ll refer you to Cites & Insights… And, in fact, had it not been for the “unique” wording, I wouldn’t have commented on the post.

  4. I’m not entirely in agreement with Siva on this one, either, but I am growing more and more concerned about our willingness to compromise on what were solid principles because technology works well when we lighten up on concerns about leaving a trail behind about what we read. When we partner with corporations, don’t we have any leverage to say “we really care about this, so can you design a way to preserve that value in the technology?”

    I’ve heard librarians say we have no business foisting our values on a public that doesn’t care who knows what they read, or at least cares less than they do for the convenience and affordances enabled by giving up that privacy.

    So do we as a profession just say, like Rosanne Rosanna Dana, “never mind” – we were being hysterical, no big thing. Or do we seek some way to make our values more … valued by others?

    I’m not too pleased by the way OCLC is acting Google-like in launching new things without any prior information, as if having a sharp press release is so much more important than actually sharing information about work in progress. As I understand it, in developing their new OPAC interface for libraries, they swore their partner libraries to secrecy until it rolled out. That left a bad taste in my mouth. It’s a Google tactic I deplore, and I hate to see our organizations taking their lead.

  5. Barbara, I’m entirely in agreement with your first two paragraphs (and I’m on record as being disturbed by the notion that librarians can be more casual about privacy than they have been). And, of course, I can’t comment on the final paragraph because I currently work for OCLC.

  6. The Net Generation(tm) supposedly no longer cares about their privacy, seeing it as a minor concern in a life lived online. But I’m not sure how much of that is truth and how much marketing “conversation” meant to justify ever increasing expansion of data gathering. Libraries aren’t companies, no matter how much some think they should be run like such, and privacy should remain one of the core values of the profession.

  7. Having seen Siva present on this topic a year ago, I must say that his analysis of the legal effects that may ensue if/when Google gets sued over this project was compelling and kinda scary. I won’t try to summarize it here… I couldn’t do it justice. If you get the chance to attend one of his talks I highly recommend jumping at it. His writing is pretty good, too…

  8. re. PingBack above: I thought Realitys was spelled Realities. I don’t understand how scanning errors can be corrected when those who are writing the programs can’t spell…and what about those books that spell incorrectly on purpose…such as found in poetry.

    I find the Google book program is engaged with a total misunderstanding of fair use. And they are doing it for profit as well. While they hide behind it, I believe it is patently illegal…and is one of those cases, where if you have enough money you can do anything you want unless someone stops you…and who has that kind of money? Other than Gates/Microsoft who is now going to court with Google. I question if copyright is any longer a deterrent from illegal copying.

    Being a graduate of Michigan, I am also concerned with what the Michigan library will look like in the future for students. I have to agree with Townsend http://tinyurl.com/37qr3q that NOT enough forethought was made before Michigan got into this, and problems with the scanning, searching, et al will never be fully corrected, leaving critical problems for both retrieval and research reporting.

