My head’s been buzzing since I first read yesterday on the New York Times Bits Blog that coder and activist Aaron Swartz was indicted under federal hacking laws for illegally downloading millions of articles from JSTOR (the full text of the indictment is embedded at the bottom of the post). Since then I’ve read through lots of articles and tweets, news about the case having all but taken over my Twitter stream, including a more in-depth story in today’s Times. And I’m finding that with every article I read I have more questions than answers.

Why’d he do it? Swartz is well known as an information activist and open access advocate, so this question’s not hard to answer. I’d hazard that it’s also not a stretch for many librarians to sympathize with Swartz at least a little bit. After all, we spend our days helping people find information, and we know all too well the frustrations of not being able to access the information we and our patrons need. I’ve read that Swartz wanted to use the data for research, but as JSTOR points out in its official statement, there are procedures in place for scholars who want to use large parts of JSTOR’s database for research.

What, exactly, did he do? This has been difficult to tease out, and the information in the many articles around the internet is highly varied. The indictment accuses Swartz of installing a laptop in a wiring closet at MIT to download large portions of JSTOR’s content. But it’s interesting to see terms like “hacking” and “stealing” used as synonyms for “illegal downloading” and “violating license terms” in many articles describing the case. As noted in an article in Wired:

Swartz used guest accounts to access the network and is not accused of finding a security hole to slip through or using stolen credentials, as hacking is typically defined.

On the other hand, Demand Progress, the progressive political organization founded by Swartz, has compared Swartz’s actions to “allegedly checking too many books out of the library” (a quote that’s been heavily retweeted). Of course, this analogy doesn’t really hold up, since books and databases operate under very different ownership models.

Why JSTOR? I’d guess that this is a question only a librarian would have, but I can’t help wondering: why JSTOR? Why didn’t Swartz pick on one of the giant scholarly journal publishers with well-publicized huge profit margins? Perhaps JSTOR was easiest for him to access? Or maybe, because JSTOR isn’t one of the biggies, he suspected that if he got caught they wouldn’t press charges? It’s been reported that JSTOR secured the return of the downloaded content and did not press charges; the case is being brought by the U.S. Attorney’s Office.

What does this mean for libraries? And for the open access movement? As I was sitting down to finish writing this my CUNY colleague Stephen Francoeur sent out a link to this post on the Forbes blog that terms Swartz’s actions “reckless and counterproductive.” The post gets at something that’s been nagging at me since yesterday: it points out the possibility that the reputation of the open access movement could be damaged by association. And I’m still not sure how exactly to articulate it, but I worry that there may be fallout from this event that could have a negative effect on academic libraries, too.

The Pew Internet and American Life Project has just issued its third annual forecast of “The Future of the Internet.” It’s well worth a read. Among its predictions:

–The mobile phone (or its descendant) will be the primary access point to the Internet by 2020.
–Social networking won’t increase tolerance. It might even polarize people into less tolerant camps.
–The original architecture of the Internet will not be replaced, but will be enhanced by research.
–Attempts to control access to content will continue to be challenged in an ongoing battle between intellectual property owners and users.

I’ve been thinking about this last point quite a bit since the Google settlement. I was very struck by a comment made by Brewster Kahle of the Internet Archive, interviewed in the Mercury News after the deal was announced. He accused Google of breaking the model of the Internet, “trying to build a walled garden of content that you have to pay to see.” My first thought was “our libraries are full of enormously expensive walled gardens.” How did we let that happen?

How many of you realize that the Harvard Business Review articles that are in your databases can’t be used in course reserves or printed out and shared with a class (or even, technically, made assigned reading)? Just look at the fine print: they are licensed “for individual use” of the library’s authorized patrons and are “not intended for use as assigned course material.” You can’t link to them in your syllabus or in course reserves. For that, you have to pay all over again. (Thanks to members of the Digital Copyright list for noticing this weirdness.)

I recently reread Rory Litwin’s 2004 essay on Google and the Monetization of libraries, and found it very thought-provoking. But it’s not just the Googlization of libraries that worries me. Are academic libraries building collections for the future and for all to use, or are we content to simply rent access temporarily for a limited audience? If we won’t stand up for free and equitable access, who will?

To be sure, we’ve partnered with scholars to push for open access, particularly to STM research. But I’m baffled when libraries pay money to subscribe to commercial versions of public databases like PubMed, ERIC, and NCJRS Abstracts, teaching our students to use interfaces that we think are better, but which they can’t access once they graduate. Lifelong learning? Pfui. Free to all? Feh.

When did we decide libraries are no longer a commons but a go-between that rents temporary membership in publishers’ walled gardens? Did we even notice?

Some quotes from the Pew report are worth thinking about.

“Traditional carriers have little incentive to include poor populations, and the next five years will be rife with battles between carriers, municipal, and federal governments, handset makers, and content creators. I don’t know who will win.” danah boyd

“Tribes will be defined by social enclaves on the Internet, rather than by geography or kinship, but the world will be more fragmented and less tolerant, since one’s real-world surroundings will not have the homogeneity of one’s online clan.” Jim Horning

“There will be cross-linking of content provider giants and Internet service provider giants and that they will find ways to milk every last ‘currency unit’ out of the unwitting and defenseless consumer. Governments will be strongly influenced by the business conglomerates and will not do much to protect consumers. (Just think of the outrageous rates charged by cable and phone company TV providers and wireless phone providers today—it will only get worse.)” Steve Goldstein

“Copyright is a dead duck in a digital world.” Dan Lynch

“By 2020, the Internet will have enabled the monitoring and manipulation of people by businesses and governments on a scale never before imaginable. Most people will have happily traded their privacy—consciously or unconsciously—for consumer benefits such as increased convenience and lower prices. As a result, the line between marketing and manipulation will have largely disappeared.” Nicholas Carr

“The Internet is not magical; it will be utterly over-managed by commercial concerns, hobbled with ‘security’ micromanagement, and turned into money-shaped traffic for business, the rest 90% paid-for content download and the rest of the bandwidth used for market feedback.” Tom Jennings

If that’s the Internet in 2020, where will libraries be? Will any of our traditional library values remain intact?

