Daily Archives: August 18, 2006

A Valuable Disruption?

Two articles on the potential impact of digitizing books – and the challenge Google has posed to concepts of copyright and fair use – are well worth a read.

The Washington Post presents the Google position – isn’t it a great idea to digitize all books so they can be discovered? Gee whiz! And the publisher’s position – not unless we get a piece of the action and our rights are upheld. Part of the trouble is that nobody’s quite sure what exactly the action is or what rights are involved. Certainly, Google can generate some advertising revenue, as it now does by indexing the Web. But what else might it have up its sleeve? The company’s famous unwillingness to get down to details or talk about the future is part of the problem. A fair use claim is one thing, but potential undisclosed uses are another, and they’re clouding the issue. Right now the discovery process is under way for the pending lawsuits, which may force Google to be more forthcoming. (Google – discovery? That’s gonna be interesting.)

In First Monday, Paul Courant, a UMich professor of economics and public policy, discusses the fundamentals of libraries and scholarship in a Google world – and the conflict between sharing information and monetizing it.

Collaboration, across time and space, is the fundamental method of scholarship, and without it we can do nothing of value. Collaboration takes wildly different forms in different disciplines, and how it is done and can be done is affected in different ways by the new information technologies. But a positive (and disruptive) element of the new IT is that almost everywhere it makes collaboration easier, provided we can get at the material. In other words, if we focus on the purposes and mechanisms of scholarship, the new technologies are (or should be) our friends.

So what’s the problem? Current copyright law. He says:

most copyrighted and potentially copyrighted material has no street value, while the corpus of it is of great value indeed. Somewhere between 95 and 97 percent of the copyrighted material in the University of Michigan libraries is out of print. The cost of getting permissions and finding rights holders for the vast quantity of material that is neither current nor very old can be prohibitive (Covey, 2005). And there is no gain in this for anyone.

Some kinds of current research and publication are rendered nearly impossible by copyright. My colleague Margaret Hedstrom (2005) offers the following example: An independent filmmaker is working on a project to produce a documentary on the evolution of video and computer games and gaming culture. She wants television coverage, commentary on games, ads, clips from movies of people playing games, interviews with developers, enthusiasts, and the games themselves. She may want pictures of old games to show the evolution of the genre. To proceed, the filmmaker will need researchers to find the material and its owners, and legal counsel to clear rights and to gain permission to reproduce few seconds or minutes of everything. She will need permissions for the interviews. And more. All of this imposes costs over and above the cost of doing the research itself, plausibly a multiple of the research costs.

The preceding is just one stylized example. The current rights environment makes it extremely difficult to use any commercially–produced material, including advertising, in scholarly work. While scholarly work should generally be considered a fair use, it is expensive and risky to make the case. As a result, otherwise feasible projects that would be of social and academic value will simply not get done, and valuable work will not get published solely because of the risk of lawsuits.

This is a system that must be disrupted, and for reasons that I hope will become clear later in this discussion, I believe that the digitization projects that Google and others are undertaking will help in the disrupting.

Will Google’s digitization of in-copyright books destroy publishing? Not if they only let searchers see snippets. Will it save scholarship? Not if researchers can’t get their hands on the whole book. Will it stir things up? No question about it. And I agree with Courant: it’s about time.

Are Web Searchers Getting Better?

Some new research coming out of Indiana University in Bloomington suggests that search engine users are improving their results, as evidenced by their use of more search terms. This would seem to contradict earlier research indicating that 6 out of 10 search engine users never use more than one word in their searches. The new study was designed to determine whether a true “Googlearchy” exists. The term refers to the popular notion that search engines which rank results by site popularity are inherently unfair: they favor the most popular sites and help them grow even more popular, which may keep far better sites from showing up in search results.

So the researchers compared real-world Web data against two models of searcher behavior: people who used only search engines, and people who browsed without search engines, following links from one page to another instead. So what happened?

[The researchers] expected the real-world data to fall somewhere between the two extremes: targeted searching and haphazard surfing. Instead, it turned out that typical Web use — presumably a combination of searching and surfing — concentrated less on popular Web sites than either model had predicted. In other words, real-world Web searching does not fuel the Googlearchy nor does it keep less-popular sites from being found.

The researchers said the outcome appears to reflect a trend in which

more and more people are searching for more specific information. If someone submits a general query, say, “bird flu,” the results at the top of a search-engine’s results page will indeed list high-traffic websites, for example, the Centers for Disease Control site. And that site’s popularity will be reinforced. But Web searches are becoming increasingly more complex, according to Menczer. A search for “bird flu Turkey 2005” will bring up far fewer results, and lead to more obscure pages.

So I have to wonder: are searchers really getting more sophisticated in the way they search? I still see many of our students using just one word, or typing in long, formally structured sentences (usually something taken right out of an assignment). Of course, other researchers questioned the results of the Indiana study, suggesting there were some issues with the data used and with whether the searchers in the study really represented average Internet searchers.

Those issues aside, as academic librarians we should be eager to promote the gist of the research findings. As one of the researchers put it, “the message here is that as soon as you become a slightly more sophisticated searcher, then you’re breaking the spell of the Web,” meaning that when you take the time to develop a more thoughtful search strategy, you take greater control over the search results rather than just settling for the most popular sites that an engine like Google spits back. And even if we can convince students of the benefits of a more sophisticated search (i.e., more than one word), we still need to contend with earlier studies indicating that only 3% of searchers tie words together with quote marks, and a mere 1% use other advanced search techniques to get better results. Just another reason why a little user education could go a long way towards helping our user communities get better, less biased search results.