Midterm brings its share of bustle to the library, with last-minute research questions to ask and copiers and printers to locate. Library staff are also busy negotiating licenses, finalizing renewals, and troubleshooting access to the resources on which faculty and students rely. I’d like to shed some light on a subtler side of the troubleshooting task that, while not a frequent occurrence, is a growing concern for me as a librarian and researcher. The technologies that enable this bustle of research activity can at times inadvertently trigger what publishers call excessive use or excessive downloading. This is considered a breach of contract according to the licenses for these resources. Remedying this breach usually involves three steps: the library works with university IT security to identify the source, inform the user, and prevent further such use; the library assures the publisher that the breach is cured; and the publisher then unblocks the network IP address or IP range to restore access to content.
Recently, I’ve been contemplating researchers’ expectations when working with scholarly content and technology. What technologies are they using? Are they compatible across content provider platforms? How might they trigger excessive use breaches? What exactly is excessive use or excessive downloading in an online research environment?
What publishers think
Sometimes the publisher’s license language specifies the use of bots, link checkers, crawlers, spiders, automated software, and even indexing as excessive or unauthorized. But more often, breaches associated with this activity are not explicitly defined, nor are they put in the context of excessive use within the license. This leaves the term fairly open to interpretation.
Publishers must consider the perspective of copyright holders, and they typically enforce the same limitations on online use that they would on the use of physical print materials. It sounds reasonable, but because in reality we use print and online resources very differently, such license terms may give up fair use and other scholarly exceptions granted by copyright law. Publishers take an even heavier hand when responding to excessive use breaches. Blocking the user’s IP access, or sometimes an entire campus IP range, presumes malicious intent (which is almost never the case). This response also exaggerates the stakes involved and misunderstands what is necessary to perform digital research. Strict reinterpretation of print use restrictions in the online environment denies advances in research technology, from basic citation management software to APIs used for text and data mining. It also ignores the very structure of the linked-data world we live in.
What most people think
When users learn that their actions violate library license agreements, they react with surprise, apology, and most often confusion. While some may be aware of the technologies that make excessive downloading possible, most don’t believe that using them constitutes unethical or unlawful action. Breach of contract is itself something of a boogeyman phrase, one that more readily brings to mind data breaches like Equifax’s. If people are aware of breaches occurring in academia, attention more often goes to those involving individual student records.
According to one IT security expert I asked, the kinds of scholarly content breaches I’m talking about don’t even register on the scale of data sensitivity or security. Unless credentials were stolen in order to download excessively, it is not a security issue; it’s a copyright issue. Publishers who treat copyright infringement as a security issue might be mitigating risk, but they are not serving or educating their customers.
What librarians think
Librarians, naturally, do approach this from the service and education mindset. Increasingly that means serving not just end-users within the academy, but also the general public who pay for the research through their tax dollars. As researchers assert the right to retain copyright of their own content and share it more widely, more diverse collaboration is possible, increasing the potential for innovative research discoveries. Libraries assert copyright exceptions and expose inequities in traditional publishing structures in order to make openness for innovation possible as well.
By Fred Benenson - User: Mecredis [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
I’ll digress briefly to the story of Aaron Swartz for illustration and comparison. He was an advocate of openness, yet his deliberate action to hack and release scholarly content provides, I suppose, a perfect case for publishers’ insistence on treating copyright as a security issue. In this case, the breach involved 4 million documents. The scope in numbers (less than 3% of the Equifax breach) pales by comparison, especially considering the nature of the data and the consequences (or lack thereof) to those responsible and to those harmed.
Rarely are scholars’ actions as deliberate or the stakes of intellectual property loss as high as in this scholarly breach (or in breaches of individuals’ personal data). In fact, many legitimate uses of scholarly research technologies are being blocked even for those with “rights” to use them. Examples of technology uses I’ve seen publishers block include citation management software like EndNote that indexes and stores full text where available. As early as 2006, librarians reported that browser technologies that link to and open an article’s cited references were triggering such blocks. What about mining text and data to discover disciplinary concepts across time and from journal publications that span multiple publishers? Innovating digital researchers are developing their own programming for this, but can they use it? Are there alternatives, and are they open or proprietary?
My role as an acquisitions librarian means I must balance the needs of publishers supplying the content we license with the needs of users who access that content for their research and study. That balance falls somewhere between stoic realism and OAnarchy for me. But I’m still a teacher at heart, so educating all sides remains my goal. In the traditional, profit-based publishing system, where flat library budgets mean buying power decreases each year, I must follow open access developments carefully, just as I must work to negotiate the best deal within these existing structures. There is always room in this to educate publishers, librarians, and users.
Learning more about the tools researchers use, wish they had, or wish they could use without being blocked from access is my next goal. In my troubleshooting experience so far, tools like EndNote, Papers on Mac, Abstrackr, REDCap, and Wget are just a few. So tell me…
What digital research (or reference citation management) technologies are your researchers using?