Leaves of Graph

ACRLog welcomes a guest post from Pete Coco, the Humanities Liaison at Wheaton College in Norton, MA, and Managing Editor at Each Moment a Mountain.

Note: This post makes heavy use of web content from Google Search and Knowledge Graph. Because this content can vary by user and is subject to change at any time, this essay uses screenshots instead of linking to live web pages in certain cases. As of the completion of this post, these images continue to match their live counterparts for a user from Providence, RI not logged in to Google services.

This That, Not That That

Early this July, Google unveiled its Knowledge Graph, a semantic reference tool nestled into the top right corner of its search results pages. Google’s video announcing the product runs no risk of understating Knowledge Graph’s potential, but there is a very real innovation behind this tool, and it is twofold. For one, Knowledge Graph can distinguish between homonyms and connect related topics. For a clear illustration of this function, consider the distinction one might make between a bear and the Bears. Though the search results page for either query includes content related to both grizzlies and quarterbacks, Knowledge Graph knows the difference.
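The mechanics behind that distinction are easier to see with a toy model. The sketch below is purely illustrative (the entity IDs, predicates, and data are invented for this example, not Google's actual schema): a knowledge graph stores facts as subject-predicate-object triples attached to unambiguous entity IDs, so the string a user types resolves to a specific entity rather than to every page containing that word.

```python
# Illustrative toy knowledge graph: facts are (subject, predicate, object)
# triples attached to entity IDs, not to ambiguous surface strings.
TRIPLES = [
    ("ent:grizzly_bear",  "name", "bear"),
    ("ent:grizzly_bear",  "type", "Animal"),
    ("ent:chicago_bears", "name", "Bears"),
    ("ent:chicago_bears", "type", "SportsTeam"),
    ("ent:chicago_bears", "league", "NFL"),
]

def entities_for(query):
    """Resolve a surface string to the IDs of every matching entity."""
    return sorted({s for s, p, o in TRIPLES if p == "name" and o == query})

def describe(entity):
    """Gather every non-name fact attached to a disambiguated entity."""
    return {p: o for s, p, o in TRIPLES if s == entity and p != "name"}
```

Here "bear" resolves to the animal and "Bears" to the team; a real system layers aliases, ranking, and query context on top, but the triple store is the core idea.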

Second, Knowledge Graph purports to contain over 500 million articles. This puts it solidly ahead of Wikipedia, which reports having about 400 million, and light-years ahead of professionally produced reference tools like Encyclopaedia Britannica Online, which comprises an apparently piddling 120,000 articles. Combine that almost incomprehensible scope with integration into Google Search, and without much fanfare the world suddenly has its broadest and most prominently placed reference tool.

For years, Google’s search algorithm has been making countless, under-examined choices on behalf of its users about the types of results they should be served. But at its essence, Knowledge Graph represents a big symbolic shift away from (mostly) matching queries to web content — content that, per extrinsic indicators, the search algorithm serves up and ranks for relevance — and toward the act of openly interpreting the meaning of a search query and making decisions based on that interpretation. Google’s past deviations from the relevance model, when made public, have generally been motivated by legal requirements (such as those surrounding hate speech in Europe or dissent in China) and, more recently, the dictates of profit. Each of these moves has met with controversy.

And yet in the two months since its launch, Knowledge Graph has not been a subject of much commentary at all. This is despite the fact that the shift it represents has big implications that users must account for in their thinking, and can be understood as part of larger shifts the information giant has been making to leverage the reputation earned with Search toward other products.

Librarians and others teaching about internet media have a duty to articulate and problematize these developments. Being in many ways a traditional reference tool, Knowledge Graph presents a unique pedagogic opportunity. Just as it is critical to understand the decisions Google makes on our behalf when we use it to search the web, we must be critically aware of the claim to a newly authoritative, editorial role Google is quietly staking with Knowledge Graph — whether it means to be claiming that role or not.

Perhaps especially if it does not mean to. With interpretation comes great responsibility.

Some Questions

The value of Knowledge Graph lies in its ability to authoritatively parse semantics in a way that provides the user with “knowledge.” Users will come to it assuming it can do this reliably, or they will not use it at all.

Does Knowledge Graph authoritatively parse semantics?

What is Knowledge Graph’s editorial standard for reliability? What constitutes “knowledge” by this tool’s standard? “Authority”?

What are the consequences for users if the answer to these questions is unclear, unsatisfactory, or both?

What is Google’s responsibility in such a scenario?

He Sings the Body Electric

Consider an example: Walt Whitman. As of this writing, the poet’s entry in Knowledge Graph looks like this:

You might notice the most unlikely claim that Whitman recorded an album called This is the Day. Follow the link and you are brought to a straight, vanilla Google search for this supposed album’s title. The first link in that result list brings you to a music video on YouTube:

Parsing this mistake might bring one to a second search: “This is the Day Walt Whitman.” The results list generated by that search yields another YouTube video at the top, resolving the confusion: a second, comparably flamboyant Walt Whitman, a choir director from Chicago, has recorded a song by that title.

Note the perfect storm of semantic confusion. The string “Walt Whitman” can refer to either a canonical poet or a contemporary gospel choir director while, at the same time, “This is the Day” can refer either to a song by The The or that second, lesser-known Walt Whitman.

Further, “This is the Day” is in both cases a song, not an album.

Knowledge Graph, designed to clarify exactly this sort of semantic confusion, here manages to create and potentially entrench three such confusions at once about a prominent public figure.

Could there be a better band than one called The The to play a role in this story?

Well Yeah

This particular mistake was first noted in mid-July. More than a month later, it still stands.

At this new scale for reference information, we have no way of knowing how many mistakes like this one are contained within Knowledge Graph. Of course it’s fair to assume this is an unusual case, and to Google’s credit, it addresses this sort of error in the only feasible way it could: with a feedback mechanism that allows users to suggest corrections. (No doubt bringing this mistake to the attention of ACRLog’s readers means Walt Whitman’s days as a time-traveling new wave act are numbered.)

Is Knowledge Graph’s mechanism for correcting mistakes adequate? Appropriate?

How many mistakes like this do there need to be to make a critical understanding of Knowledge Graph’s gaps and limitations crucial to even casual use?

Interpreting the Gaps

Many Google searches sampled for this piece do not yield a Knowledge Graph result. Consider an instructive example: “Obama birth certificate.” Surely, there would be no intellectually serious challenge to a Knowledge Graph stub reflecting the evidence-based consensus on this matter. Then again, there might be a very loud one.

Similarly not available in Knowledge Graph are stubs on “evolution,” or “homosexuality.” In each case, it should be noted that Google’s top ranked search results are reliably “reality-based.” Each is happy to defer to Wikipedia.

In other instances, the stub for a topic that seems to reach some threshold of complexity and/or controversy defers to “related” stubs rather than making nuanced editorial decisions. Consider the entries for “climate change” and the “Vietnam War,” here presented in their entirety.

In moments such as these, is it unreasonable to assume that Knowledge Graph is shying away from controversy and nuance? More charitably, we might say that this tool is simply unequipped to deal with controversy and nuance. But given the controversial, nuanced nature of “knowledge,” is this second framing really so charitable?

What responsibility does a reference tool have to engage, explicate or resolve political controversy?

What can a user infer when such a tool refuses to engage with controversy?

What of the users who will not think to make such an inference?

To what extent is ethical editorial judgment reconcilable with the interests of a singularly massive, publicly traded corporation with wide-ranging interests cutting across daily life?

One might answer some version of the above questions with the suggestion that Knowledge Graph avoids controversy because it is programmed only to feature information that meets some high standard of machine-readable verification and/or cross-referencing. The limitation is perhaps logistical, baked into the cake of Knowledge Graph’s methodology, and it doesn’t necessarily limit the tool’s usefulness for certain purposes so long as the user is aware of the boundaries of that usefulness. Perhaps in that way this could be framed as a very familiar sort of challenge, not so different from the one we face with other media, whether it’s cable news or pop-science journalism.

This is all true, so far as it goes. Still, consider an example like the stub for HIV:

There are countless reasons to be uncomfortable with a definition of HIV implicitly bounded by Ryan White on one end and Magic Johnson on the other. So many important aspects of the virus are omitted here — the science of it, for one, but even if Knowledge Graph is primarily focused on biography, there are still important female, queer or non-American experiences of HIV that merit inclusion in any presentation of this topic. This is the sort of stub in Knowledge Graph that probably deserves to be controversial.

What portion of useful knowledge cannot — and never will — bend to a machine-readable standard or methodology?

Ironically, it is Wikipedia that, for all the controversy it has generated over the years, provides a rigorous, deeply satisfying answer to the same problem: a transparent governance structure guided in specific instances by ethical principle and human judgment. This has more or less been the traditional mechanism for reference tools, and it works pretty well (at least up to a certain scale). More fundamentally, length constraints on Wikipedia are forgiving, and articles regularly plumb nuance and controversy. Similarly, a semantic engine like Wolfram Alpha successfully negotiates this problem by focusing on the sorts of quantitative information that aren’t likely to generate much political controversy. The demographics of its user base probably help too.

Of course, Google’s problem here is that it searches everything for every purpose. People use it every day to arbitrate contested facts. Many users assume that Google is programmatically neutral on questions of content, intervening only to organize results for their relevance to our questions; by this logic, Google has no responsibility for the content itself. This assumption is itself complicated and, in many ways, was problematic even before the debut of Knowledge Graph. All the same, it is a “brand” that Knowledge Graph will no doubt leverage in a new direction. Many users will intuitively trust this tool and the boundaries of “knowledge” enforced by its limitations and the prerogatives of Google and its corporate actors.

So:

Consider the college freshman faced with all these ambiguities. Let’s assume that she knows not to trust everything she reads on the internet. She has perhaps even learned this lesson too well, forfeiting contextual, critical judgment of individual sources in favor of a general avoidance of internet sources. Understandably, she might be stubbornly loyal to the internet sources that she does trust.

Trading on the reputation and cultural primacy of Google search, Knowledge Graph could quickly become a trusted source for this student and others like her. We must use our classrooms to provide this student with the critical engagement of her professors, librarians and peers on tools like this one and the ways in which we can use them to critically examine the gaps so common in conventional wisdom. Of course Knowledge Graph has a tremendous amount of potential value, much of which can only proceed from a critical understanding of its limitations.

How would this student answer any of the above questions?

Without pedagogical intervention, would she even think to ask them?

Convenience and its Discontents: Teaching Web-Scale Discovery in the Context of Google

ACRLog welcomes a guest post from Pete Coco, formerly of Grand Valley State University, now Humanities Liaison at Wheaton College in Norton, MA.

With the continued improvements being made to web-scale discovery tools like Proquest’s Summon and EBSCO’s Discovery Service, access to library resources is reaching a singularity of sorts: frictionless searching. Providing a unified interface through which patrons can access nearly all of your library’s collection has an obvious appeal on all sides. Users get the googley familiarity and convenience of a singular, wide-ranging search box and, according to a recent case study done at Grand Valley State University, the reduced friction patrons face when using library resources correlates to an increase — potentially dramatic — in the frequency with which they access them. While these tools will continue to be tweaked and refined, it’s difficult to imagine an easier process for getting students to scholarly sources.

That’s the good news, and the story you’re likely getting from your sales rep. And while none of it is untrue, in my role as a teaching librarian I’ve seen more undergraduate students struggle to get what they need from web-scale discovery than I’ve seen benefit from its obvious conveniences. These students often know intuitively how to get to results from Summon’s search box; often they figure out on their own how to get to the item itself if it is available in full-text. In the library’s statistics, these might be counted fairly as successful searches. But when I ask the student whether the article at hand is what they wanted, I get one response far more frequently than all others: “Not… exactly.”

Web-scale discovery is doing about as much for these students as we could reasonably expect, and, in doing so, offers teaching librarians a challenge and an opportunity. Both are at root about our thinking, and they stem from the same fact: these tools offer an unprecedented convenience. For all the familiarity it allows students, our decision to make library tools more similar to commercial web search can reinforce the idea — primarily amongst students, but also, potentially, amongst administrators making personnel and workload decisions — that information literacy instruction isn’t necessary because students know how to get what they want from Google. If the new tool is like Google, then why does it require instruction?

There’s a lot to unpack in that question. First and foremost, what web-scale discovery borrows from Google does not make it Google. Searching Summon for scholarly articles will never be like searching Google — not because Summon cannot approximate Google’s user experience, but because scholarly communications will never be like the things students use Google to find.

Consider the freshman student looking for a pizza parlor that will deliver to his dorm. He comes to his commercial web search with a knowledge base and a self-defined need: pizza literacy, let’s call it. Having eaten and enjoyed pizza countless times in the past, he knows what it is and the range of forms it can take. Over time, he’s developed a preference for sausage, but tonight he wants pepperoni. Perhaps in this instance, he’s working under unique constraints — he saw a coupon somewhere, and is hoping to find it online. Whatever his specific pizza need, could there be any doubt that this student has the literal and conceptual vocabulary to communicate that need effectively to Google, in a way that will yield an informed pizza choice?

Of course not. But consider the same student, his belly now full, turning to the research paper for his freshman composition course. Unlike his soul-deep craving for pepperoni, his need for “2-3 peer-reviewed articles” has been externally defined. If she is like too many of her peers, the professor assigning this requirement hasn’t defined it in any detail or explained her pedagogical purpose for including it. She has given our hero but one bread crumb: go to the library website. Assuming his library’s discovery tool is featured prominently, it can potentially spare him the UI nightmare that would otherwise be the process of selecting a database to search. That’s quite a mercy, but it really only helps him with the first of many steps.

To find the scholarly articles that will meet the paper requirement, the student will need to navigate a host of alien concepts, vocabularies and controversies that will, at least at first, drive his experience with peer-reviewed scholarship. And while some degree of anxiety is probably useful to his learning experience, there can be little doubt that the process would be easier and of more lasting value to the student who has support—human support—as he goes through it.

Put another way: good learning is best facilitated by good pedagogy. The tool is not the pedagogy and it’s hard to imagine how it ever could be. Because of all the concepts and conventions implicit to scholarship, the scholarly resource that is not improved for students by expert intervention is and always will be a chimera. The future of teaching librarianship as a profession will only demand more vigilance on this point.

But for all these caveats, with the right framing discovery can be an excellent pedagogical tool. Because it relieves so many searches of the burden of that first question — which database should I search? — we can use our time with students to construct, together, answers to questions we all find more compelling. What is peer review? Why does it matter? Why would a professor use it as a standard for student research? Each can be elegantly demonstrated with discovery, and best of all, students can demonstrate it for themselves and each other while my guidance focuses on the concepts and conventions underneath all the clicking.

Rather than giving in to the temptation to compare discovery to Google as a means of marketing it to students, we should go out of our way to contrast the two. What is the difference between the commercial internet search and the library tool? What is the purpose each exists to serve? How does the commercial internet search engine decide what to show you? How does discovery? You might be surprised how sophisticated students can be when they’re given a space suited to sophistication. Discovery can help to create that space in your information literacy sessions.

Even in freshman courses, I’ve found that I’m able to dive right in to activities that lead to genuine and rewarding discussion. In one, for example, I have students choose a search term — usually the name of a superhero — and ask them to search it in both Google and in Summon (with the box checked for “scholarly” results only). To the average student in my sessions, the distinction between thedarkknight.warnerbros.com and Batman and Robin in the Nude, or Class and Its Exceptions is instructive on its face. Discovery makes a juxtaposition like this one quick, fluid, and highly demonstrable. My students don’t need to read more than the title and abstract of the latter to have a sense of the distinction at hand.

Discovery is also a great tool for “citation chasing.” Projecting a full citation in front of the classroom, I’ll preface the activity with a brief discussion of the citation itself. What is this text Pete is projecting on the board? Why does it exist? What are its component parts, and what do they tell us about the object it describes? Then I poll the students: how many of you think you could find the full-text of the article this citation describes using the library website? Depending on the class, anywhere from none to a half of the students raise their hands. Without discovery, I wouldn’t be able to say what I say to them next: The truth is you all can. So please: do. Within three minutes, the entire class has the full-text article on their own screens.

Discovery is not the tool for every task. Controlled vocabularies don’t federate well, and the student asking very specific questions of the literature is better off going straight to the disciplinary index. Known item searches proceeding from partial information are a recurrent challenge. We must be careful with the way we describe the scale of discovery to students. In our attempts to market discovery as convenient and easy, we may in fact be selling them on a product that doesn’t exist. In the absence of a clear purpose, convenience is not convenient.

But has convenience ever really been our only goal as educators? The commercial web has no doubt rattled the profession, and we must respond decisively to the vast changes it has brought to the information landscape. But when we start to speak primarily in terms of convenience, the risk is that we turn away from the terms of learning and pedagogy. It’s a choice you can make without even meaning to make it. The librarian who is able to choose between user education and user convenience, certainly, has the easier job. But will it be a job worth doing? Will his users get what they need from him? The hard thing, really, is to find ways to give our users both with the fewest trade-offs. This is the tension at the heart of information literacy instruction. Romantics, we want to have it all. And so we should.

The Limits of Mobility

Some interesting articles about mobile technology caught my eye last week as I was finishing up the leftover turkey. Apple has come under fire for the reported inability of Siri, the voice recognition application on the new iPhone 4S, to find abortion clinics. As reported by CNN, quoting the American Civil Liberties Union:

“Although it isn’t clear that Apple is intentionally trying to promote an anti-choice agenda, it is distressing that Siri can point you to Viagra, but not the Pill, or help you find an escort, but not an abortion clinic,” the group wrote in a blog post Wednesday.

A spokesperson for Apple responded quickly:

“These are not intentional omissions meant to offend anyone. It simply means that as we bring Siri from beta to a final product, we find places where we can do better and we will in the coming weeks.”

This is but one example of problematic access and information issues with our mobile devices, a topic that was explored in more detail last week by Harvard professor Jonathan Zittrain in MIT’s Technology Review in his provocatively titled article The Personal Computer is Dead. Zittrain begins by asserting that:

Rising numbers of mobile, lightweight, cloud-centric devices don’t merely represent a change in form factor. Rather, we’re seeing an unprecedented shift of power from end users and software developers on the one hand, to operating system vendors on the other—and even those who keep their PCs are being swept along. This is a little for the better, and much for the worse.

Zittrain continues with an analysis of the state of mobile software development for Apple and Android devices, and the restrictions this development operates within. In Apple’s case users are limited to the software available in the company’s commercial space: the App Store (unless the device is jailbroken). Android apps are potentially available outside of the Android Marketplace, though I wonder whether many users go to the extra effort to locate and download those apps. In both cases developers are tied to the operating system of the device which dictates the parameters of the software. Perhaps most distressingly, there are hints that a similar environment for software development may soon be prevalent even on the PC: Apple has already introduced its App Store for Mac.

How does this aspect of mobile computing affect us as academic librarians? While on average we still have a sizable number of students without smartphones on our campuses,* there’s no question that smartphone and tablet usage is on the rise overall. What challenges will we face that accompany the increasing reliance on mobile devices? Certainly library database vendors are rushing to develop apps for these devices — how will we promote these apps to our users and integrate their use with the library website and other existing services? And while many libraries are also developing apps, that strategy may not be feasible for smaller libraries that already feel stretched by the efforts to provide digital library services.

Access to information — an aspect of information literacy — may also be affected by these restrictions around mobile devices. We’ve already read about the possibility of a filter bubble that impacts Google search results. With the increasing move to an app-driven environment, could an internet search provider’s app restrict or shape search results even further?
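How that kind of shaping can happen is easy to sketch with a toy ranking function. Everything below is invented for illustration (the URLs, scores, and profile format are hypothetical, not any provider's actual algorithm): personalization adds a per-user interest boost to a base relevance score, so two users issuing the identical query can see different orderings.

```python
# Hypothetical result set for one query: (url, base_relevance) pairs.
RESULTS = {
    "climate": [("skeptic-blog.example", 0.4), ("ipcc-report.example", 0.9)],
}

def rank(query, profile=None):
    """Order results by base relevance plus a per-user interest boost."""
    profile = profile or {}
    scored = [(base + profile.get(url, 0.0), url)
              for url, base in RESULTS[query]]
    return [url for score, url in sorted(scored, reverse=True)]
```

With no profile, the higher-relevance page wins; give a user a large enough boost for the pages they already click, and their ordering inverts, which is the filter bubble in miniature.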

What should academic libraries be considering as we adapt to an information landscape that’s increasingly mediated by mobile technologies? How can we help our students, faculty, and other library patrons with their information needs while ensuring that they’re aware of the strengths and limitations that these technologies have to offer?

* The latest survey results from the Pew Internet Project show that the vast majority of undergrads have a cellphone (between 94-96%), and about 44% of 18-24 year olds own smartphones.

Finding Footnotes and Chasing Citations

This week’s New York Times Book Review includes an essay by Alexandra Horowitz straightforwardly titled Will the E-Book Kill the Footnote?, in which she laments that footnotes become endnotes when books move from paper to screen. Horowitz suggests that while this change means the main text of a book may be more easily read from start to finish, something is lost when the intrusive interruption of a footnote morphs into the more easily ignored endnote. After all, how many people actually read endnotes?

This article reminded me of one published last year in the Chronicle of Higher Ed about link rot and footnote flight (paywall alert), which made some of the same points for academic texts that Horowitz makes for popular books: electronic writing may suffer from both losing footnotes as well as from link rot, in which hyperlinks go dead over time as the site or page linked to is moved or abandoned.
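Link rot, at least, is easy to check for mechanically. The sketch below is a minimal example using only Python's standard library (the rot threshold and error handling are my own assumptions): it issues a HEAD request for a URL and flags anything that fails outright or returns a 4xx/5xx status.

```python
import urllib.request
from urllib.error import HTTPError, URLError

def check_link(url, timeout=10):
    """Return the HTTP status code for url, or an error string on failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:   # the server answered with an error code
        return err.code
    except URLError as err:    # DNS failure, refused connection, timeout, etc.
        return str(err.reason)

def is_rotten(status):
    """Treat any non-2xx/3xx response, or any network failure, as link rot."""
    return not (isinstance(status, int) and status < 400)
```

Run periodically over a bibliography's URLs, a script like this catches pages that have moved or vanished before a reader hits the dead end.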

Both the conversion of footnotes to endnotes and link rot can affect anyone reading a text, scholars and students alike. For scholars, I have to assume that if the information is valuable enough to be used in a research project, the researcher will have the tenacity to track down the necessary sources, whether that means jumping back and forth between endnotes and the main text or searching for the new home of a page at the dead end of a link. While it can sometimes be annoying to have to spend time chasing citations, I think many scholars actually enjoy this kind of work (or maybe I’m just looking at the task through my librarian-glasses?).

Students are busy, so I’d bet that they’re less invested in reading endnotes in electronic texts (and even footnotes in print books), and more likely to see them as an aside or as unnecessary. Of course students are very familiar with jumping from link to link on the web, and now that web browsers support tabbed browsing the process of moving between hyperlinks and the main text can come very close to the experience of reading a print volume with footnotes. And what about Wikipedia, where hyperlinks and endnotes abound? It’s easy to draw parallels between the Notes and References at the bottom of most Wikipedia entries and the same in scholarly texts. Maybe electronic texts can effectively be used to encourage students to chase down those citations and read those extra words in footnotes and endnotes.

Searching the Library Website and Beyond: A Graduate Student Perspective

This month’s post in our series of guest academic librarian bloggers is by Julia Skinner, a first year Information Studies doctoral student at Florida State University. She blogs at Julia’s Library Research.

I just finished my MLS, and one of the issues raised frequently both in and out of the classroom was how to get college students and researchers to use the library website. Academic librarians I’ve talked with have spent hefty amounts of time (and money) designing sites that meet the self-described needs of patrons, but still find that most of the searches guiding students to library resources come from Google. I decided to take a look at my own search habits to get a sense of how, from the graduate student perspective, these tools might be employed, and hopefully generate some discussion about searching on the library website and beyond.

Like many other people, I usually do a quick Google search on my topic early on in the research process. This isn’t necessarily to track down every resource I would be using, but it does give me a general sense of what’s out there on my topic beyond the realm of scholarly materials. Since my own work relies heavily on journal articles, scholarly monographs, primary sources, and other reliable sources, I feel like seeing what people have said outside the ivory tower can be a good way to give myself some perspective about how my topic is thought of and applied elsewhere. Most of the time, like for my research on Iowa libraries during WWI, there’s not much. But sometimes this search helps me find something useful (for example, in my recent work writing chapters for an encyclopedia on immigration, I was able to find information about nonprofits serving the immigrant community and some news stories).

Obviously, the university library is still my go-to source. Journal articles, ebooks, not to mention circulating and special collections, are all where the meat and potatoes of my bibliography can be found. I love that many libraries are putting these collections online and purchasing more digital subscriptions (especially in the winter when I have a serious sinus infection and am locked in my house trying to work!) Sometimes, I find these resources through Google Scholar, but most of the time, it’s through searches within the library’s resources. This is especially true for journal articles, which I’ve found Google hasn’t really nailed yet when it comes to bringing desired results from a simple keyword search (I know, it’s a lot to ask, and hence why I love the library site!)

One tool I use heavily is Google Books. Not everything is on there, and most of the things that are have a limited availability (i.e. a preview where only some pages are available) but I have saved countless hours by doing a keyword search in GBooks to get a sense of what’s out there that mentions or is relevant to my topic, but maybe isn’t something I would have grabbed while browsing the shelves. I can then go track down the physical book for a more thorough read, or if I am able to access all the information I need from the preview I can just use it as a digital resource. Some other useful documents are in full view as well: many public domain items, including some ALA documents, can be found there.

Of course I don’t just use Google Books and assume that’s all there is. I also track down public domain titles on sites like Open Library and Project Gutenberg, and approach them in the same way. It’s a great way to get that one tidbit that really pulls an article together, and I usually find that some of those works don’t overlap with the offerings I find in the databases the library subscribes to. I will sometimes use different search engines, search a variety of fields, run Boolean searches, and so on, all of which helps me extract more little nuggets of information from the vast world of material related to any given topic. Even though I’m an avid Googler, I use library resources just as frequently. I remember speaking with a student a few years ago who could not find anything on her topic through a keyword search, and assumed there was nothing out there on that topic. I was amazed that she hadn’t even considered the university library’s website or physical collections before throwing in the towel! It makes me wonder how many students feel this way, and how we as LIS professionals and instructors can help effectively remove those blinders.

One thing I think will be interesting in the coming years (and which is a great thing to get input about from academic librarians!) is learning more about search habits among undergraduates. I’ll be TAing for our MLIS program this semester, so I’ll be working with students who are my age, getting the degree I just recently obtained, who are tech savvy and knowledgeable about search. What happens when I TA for an undergraduate course? Is sharing my search strategies helpful for papers that only require a handful of sources, and don’t require you to look at a topic from every imaginable angle? I argue that teaching search as something done in as many outlets as possible has the potential to make students better researchers, BUT only if that goes hand in hand with instruction on critically evaluating resources.

Without that, one runs the risk of putting students in information overload or having students work with sources that are irrelevant/untrustworthy. I’m a big fan of helping students recognize that the knowledge they have and the ideas they create are valuable, and it makes me wonder if building on their current search habits in such a way that encourages them to speak about the value of those sources, the flaws in their arguments, etc. will help promote that. I remember having a few (but not many) undergrad courses that encouraged me to draw upon my own knowledge and experience for papers, and to critically analyze works rather than just write papers filled with other people’s arguments followed by “I agree/disagree.” I feel like teaching is moving more in the direction of critical analysis, and I’m excited to see the role that librarians and library websites play!