Leaves of Graph

ACRLog welcomes a guest post from Pete Coco, the Humanities Liaison at Wheaton College in Norton, MA, and Managing Editor at Each Moment a Mountain.

Note: This post makes heavy use of web content from Google Search and Knowledge Graph. Because this content can vary by user and is subject to change at any time, this essay uses screenshots instead of linking to live web pages in certain cases. As of the completion of this post, these images continue to match their live counterparts for a user from Providence, RI, who is not logged in to Google services.

This That, Not That That

Early this July, Google unveiled its Knowledge Graph, a semantic reference tool nestled into the top right corner of its search results pages. Google’s video announcing the product runs no risk of understating Knowledge Graph’s potential, but there is a very real innovation behind this tool, and it is twofold. For one, Knowledge Graph can distinguish between homonyms and connect related topics. For a clear illustration of this function, consider the distinction one might make between bear and bears. Though the search results page for either query includes content related to both grizzlies and quarterbacks, Knowledge Graph knows the difference.
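Readers who want to poke at this disambiguation themselves can do so programmatically: Google exposes a Knowledge Graph Search API. The sketch below is merely illustrative, not a picture of how the search results page itself works; it assumes you have provisioned your own API key, and it asks the Graph for the top candidate entities behind a query string:

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; a real Google API key is required

def kg_entities(query, limit=3):
    """Ask the Knowledge Graph Search API for its top candidate entities."""
    params = urllib.parse.urlencode({"query": query, "key": API_KEY, "limit": limit})
    url = "https://kgsearch.googleapis.com/v1/entities:search?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # Each element is a ranked candidate entity with a name and short description.
    return [(e["result"].get("name"), e["result"].get("description", "no description"))
            for e in data.get("itemListElement", [])]

# The homonym test from above: distinct entity lists for "bear" vs. "bears"
for q in ("bear", "bears"):
    print(q, "->", kg_entities(q))
```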

Second, Knowledge Graph purports to contain over 500 million articles. This puts it solidly ahead of Wikipedia, which reports having about 400 million, and light-years ahead of professionally produced reference tools like Encyclopaedia Britannica Online, which comprises an apparently piddling 120,000 articles. Combine that almost incomprehensible scope with integration into Google Search and, without much fanfare, the world suddenly has its broadest and most prominently placed reference tool.

For years, Google’s search algorithm has been making countless under-examined choices on behalf of its users about the types of results they should be served. But at its essence, Knowledge Graph represents a big symbolic shift: away from (mostly) matching a query to web content — content that, per extrinsic indicators, the search algorithm serves up and ranks for relevance — and toward openly interpreting the meaning of a search query and making decisions based on that interpretation. Google’s past deviations from the relevance model, when made public, have generally been motivated by legal requirements (such as those surrounding hate speech in Europe or dissent in China) and, more recently, the dictates of profit. Each of these moves has met with controversy.

And yet in the two months since its launch, Knowledge Graph has not been the subject of much commentary at all. This is despite the fact that the shift it represents has big implications that users must account for in their thinking, and that it can be understood as part of larger shifts the information giant has been making to leverage the reputation earned with Search toward other products.

Librarians and others teaching about internet media have a duty to articulate and problematize these developments. Being in many ways a traditional reference tool, Knowledge Graph presents a unique pedagogic opportunity. Just as it is critical to understand the decisions Google makes on our behalf when we use it to search the web, we must be critically aware of the claim to a newly authoritative, editorial role Google is quietly staking with Knowledge Graph — whether it means to be claiming that role or not.

Perhaps especially if it does not mean to. With interpretation comes great responsibility.

Some Questions

The value of Knowledge Graph lies in its ability to authoritatively parse semantics in a way that provides the user with “knowledge.” Users will either use it on the assumption that it can do this reliably, or they will not use it at all.

Does Knowledge Graph authoritatively parse semantics?

What is Knowledge Graph’s editorial standard for reliability? What constitutes “knowledge” by this tool’s standard? “Authority”?

What are the consequences for users if the answer to these questions is unclear, unsatisfactory, or both?

What is Google’s responsibility in such a scenario?

He Sings the Body Electric

Consider an example: Walt Whitman. As of this writing, the poet’s entry in Knowledge Graph looks like this:

You might notice a most unlikely claim: that Whitman recorded an album called This is the Day. Follow the link and you are brought to a straight, vanilla Google search for this supposed album’s title. The first link in that result list will bring you to a music video on YouTube:

Parsing this mistake might bring one to a second search: “This is the Day Walt Whitman.” The results list generated by that search yields another YouTube video at the top, resolving the confusion: a second, comparably flamboyant Walt Whitman, a choir director from Chicago, has recorded a song by that title.

Note the perfect storm of semantic confusion. The string “Walt Whitman” can refer to either a canonical poet or a contemporary gospel choir director while, at the same time, “This is the Day” can refer either to a song by The The or that second, lesser-known Walt Whitman.

Further, “This is the Day” is in both cases a song, not an album.

Knowledge Graph, designed to clarify exactly this sort of semantic confusion, here manages to create and potentially entrench three such confusions at once about a prominent public figure.

Could there be a better band than one called The The to play a role in this story?

Well Yeah

This particular mistake was first noted in mid-July. More than a month later, it still stands.

At this new scale for reference information, we have no way of knowing how many mistakes like this one are contained within Knowledge Graph. Of course it’s fair to assume this is an unusual case, and to Google’s credit, it addresses this sort of error in the only feasible way it could: with a feedback mechanism that allows users to suggest corrections. (No doubt bringing this mistake to the attention of ACRLog’s readers means Walt Whitman’s days as a time-traveling new wave act are numbered.)

Is Knowledge Graph’s mechanism for correcting mistakes adequate? Appropriate?

How many mistakes like this would there need to be before a critical understanding of Knowledge Graph’s gaps and limitations becomes crucial to even casual use?

Interpreting the Gaps

Many Google searches sampled for this piece do not yield a Knowledge Graph result. Consider an instructive example: “Obama birth certificate.” Surely, there would be no intellectually serious challenge to a Knowledge Graph stub reflecting the evidence-based consensus on this matter. Then again, there might be a very loud one.

Similarly unavailable in Knowledge Graph are stubs on “evolution” or “homosexuality.” In each case, it should be noted, Google’s top-ranked search results are reliably “reality-based.” Each is happy to defer to Wikipedia.

In other instances, the stubs for topics that seem to reach some threshold of complexity and/or controversy defer to “related” stubs rather than make nuanced editorial decisions. Consider the entries for “climate change” and the “Vietnam War,” here presented in their entirety.

In moments such as these, is it unreasonable to assume that Knowledge Graph is shying away from controversy and nuance? More charitably, we might say that this tool is simply unequipped to deal with controversy and nuance. But given the controversial, nuanced nature of “knowledge,” is this second framing really so charitable?

What responsibility does a reference tool have to engage, explicate or resolve political controversy?

What can a user infer when such a tool refuses to engage with controversy?

What of the users who will not think to make such an inference?

To what extent is ethical editorial judgment reconcilable with the interests of a singularly massive, publicly traded corporation with wide-ranging interests cutting across daily life?

One might answer some version of the above questions with the suggestion that Knowledge Graph avoids controversy because it is programmed only to feature information that meets some high standard of machine-readable verification and/or cross-referencing. The limitation is perhaps logistical, baked into the cake of Knowledge Graph’s methodology, and it doesn’t necessarily limit the tool’s usefulness for certain purposes so long as the user is aware of the boundaries of that usefulness. Perhaps in that way this could be framed as a very familiar sort of challenge, not so different from the one we face with other media, whether it’s cable news or pop-science journalism.

This is all true, so far as it goes. Still, consider an example like the stub for HIV:

There are countless reasons to be uncomfortable with a definition of HIV implicitly bounded by Ryan White on one end and Magic Johnson on the other. So many important aspects of the virus are omitted here — the science of it, for one, but even if Knowledge Graph is primarily focused on biography, there are still important female, queer or non-American experiences of HIV that merit inclusion in any presentation of this topic. This is the sort of stub in Knowledge Graph that probably deserves to be controversial.

What portion of useful knowledge cannot — and never will — bend to a machine-readable standard or methodology?

Ironically, it is Wikipedia that, for all the controversy it has generated over the years, provides a rigorous, deeply satisfactory answer to the same problem: a transparent governance structure guided in specific instances by ethical principle and human judgment. This has more or less been the traditional mechanism for reference tools, and it works pretty well (at least up to a certain scale). More fundamentally, length constraints on Wikipedia are forgiving, and articles regularly plumb nuance and controversy. Similarly, a semantic engine like Wolfram Alpha successfully negotiates this problem by focusing on the sort of quantitative information that isn’t likely to generate much political controversy. The demographics of its user base probably help too.

Of course, Google’s problem here is that it searches everything for every purpose. People use it every day to arbitrate contested facts. Many users assume that Google is programmatically neutral on questions of content, intervening only to organize results by their relevance to our questions; on this view, Google has no responsibility for the content itself. This assumption is complicated and, in many ways, was problematic even before the debut of Knowledge Graph. All the same, it is a “brand” that Knowledge Graph will no doubt leverage in a new direction. Many users will intuitively trust this tool and the boundaries of “knowledge” enforced by its limitations and the prerogatives of Google and its corporate actors.

So:

Consider the college freshman faced with all these ambiguities. Let’s assume that she knows not to trust everything she reads on the internet. She has perhaps even learned this lesson too well, forfeiting contextual, critical judgment of individual sources in favor of a general avoidance of internet sources. Understandably, she might be stubbornly loyal to the internet sources that she does trust.

Trading on the reputation and cultural primacy of Google Search, Knowledge Graph could quickly become a trusted source for this student and others like her. We must use our classrooms to put this student in critical engagement with her professors, librarians, and peers about tools like this one, and about the ways we can use them to examine the gaps so common in conventional wisdom. Of course Knowledge Graph has a tremendous amount of potential value, but much of that value can only proceed from a critical understanding of the tool’s limitations.

How would this student answer any of the above questions?

Without pedagogical intervention, would she even think to ask them?

Searching the Library Website and Beyond: A Graduate Student Perspective

This month’s post in our series of guest academic librarian bloggers is by Julia Skinner, a first year Information Studies doctoral student at Florida State University. She blogs at Julia’s Library Research.

I just finished my MLS, and one of the issues raised frequently both in and out of the classroom was how to get college students and researchers to use the library website. Academic librarians I’ve talked with have spent hefty amounts of time (and money) designing sites that meet the self-described needs of patrons, but still find that most of the searches guiding students to library resources come from Google. I decided to take a look at my own search habits to get a sense of how, from the graduate student perspective, these tools might be employed, and hopefully to generate some discussion about searching on the library website and beyond.

Like many other people, I usually do a quick Google search on my topic early on in the research process. This isn’t necessarily to track down every resource I will be using, but it does give me a general sense of what’s out there on my topic beyond the realm of scholarly materials. Since my own work relies heavily on journal articles, scholarly monographs, primary sources, and other reliable sources, I feel like seeing what people have said outside the ivory tower can be a good way to give myself some perspective on how my topic is thought of and applied elsewhere. Most of the time, as with my research on Iowa libraries during WWI, there’s not much. But sometimes this search helps me find something useful (for example, in my recent work writing chapters for an encyclopedia on immigration, I was able to find information about nonprofits serving the immigrant community and some news stories).

Obviously, the university library is still my go-to source. Journal articles and ebooks, not to mention circulating and special collections, are where the meat and potatoes of my bibliography can be found. I love that many libraries are putting these collections online and purchasing more digital subscriptions (especially in the winter, when I have a serious sinus infection and am locked in my house trying to work!). Sometimes I find these resources through Google Scholar, but most of the time it’s through searches within the library’s resources. This is especially true for journal articles, which I’ve found Google hasn’t really nailed yet when it comes to surfacing desired results from a simple keyword search (I know, it’s a lot to ask, which is why I love the library site!).

One tool I use heavily is Google Books. Not everything is on there, and most of the things that are have limited availability (i.e., a preview where only some pages are viewable), but I have saved countless hours by doing a keyword search in GBooks to get a sense of what’s out there that mentions or is relevant to my topic but maybe isn’t something I would have grabbed while browsing the shelves. I can then go track down the physical book for a more thorough read, or, if I am able to access all the information I need from the preview, I can just use it as a digital resource. Some other useful documents are in full view as well: many public domain items, including some ALA documents, can be found there.
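For the curious, this kind of preview triage can even be scripted. Here is a minimal sketch using the public Google Books API (the volumes search endpoint, which does not require an API key for basic queries); the query string is only an illustration, not a canonical search for the research described above:

```python
import json
import urllib.parse
import urllib.request

def preview_triage(keyword, max_results=10):
    """Search Google Books and report how much of each match is viewable."""
    params = urllib.parse.urlencode({"q": keyword, "maxResults": max_results})
    url = "https://www.googleapis.com/books/v1/volumes?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    for item in data.get("items", []):
        title = item.get("volumeInfo", {}).get("title", "untitled")
        # viewability is typically ALL_PAGES, PARTIAL, or NO_PAGES
        view = item.get("accessInfo", {}).get("viewability", "UNKNOWN")
        print(view, "-", title)

# Illustrative query in the spirit of the WWI-era Iowa libraries topic
preview_triage('"Iowa libraries" WWI')
```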

Of course I don’t just use Google Books and assume that’s all there is. I also track down public domain titles on sites like Open Library and Project Gutenberg, and approach them in the same way. It’s a great way to get that one tidbit that really pulls an article together, and I usually find that some of those works don’t overlap with the offerings in the databases the library subscribes to. I will sometimes use different search engines, search a variety of fields, do Boolean searches, etc., all of which helps me extract more little nuggets of information from the vast world of material related to any given topic. Even though I’m an avid Googler, I use library resources just as frequently. I remember speaking with a student a few years ago who could not find anything on her topic through a keyword search and assumed there was nothing out there on that topic. I was amazed that she hadn’t even considered the university library’s website or physical collections before throwing in the towel! It makes me wonder how many students feel this way, and how we as LIS professionals and instructors can effectively help remove those blinders.
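The public-domain sites lend themselves to the same kind of fielded, Boolean-style searching. As a rough sketch (assuming Open Library’s search.json endpoint and its Solr-style fielded query syntax; the query itself is only an example):

```python
import json
import urllib.parse
import urllib.request

def openlibrary_search(query, limit=5):
    """Run a fielded query against Open Library's search API."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    url = "https://openlibrary.org/search.json?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    for doc in data.get("docs", []):
        authors = ", ".join(doc.get("author_name", ["unknown"]))
        print(doc.get("first_publish_year"), "-", doc.get("title"), "by", authors)

# A fielded Boolean search: a title keyword narrowed by a subject heading
openlibrary_search('title:libraries AND subject:"World War, 1914-1918"')
```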

One thing I think will be interesting in the coming years (and which is a great thing to get input about from academic librarians!) is learning more about search habits among undergraduates. I’ll be TAing for our MLIS program this semester, so I’ll be working with students who are my age, pursuing the degree I just obtained, and who are tech-savvy and knowledgeable about search. What happens when I TA for an undergraduate course? Is sharing my search strategies helpful for papers that only require a handful of sources and don’t require you to look at a topic from every imaginable angle? I argue that teaching search as something done in as many outlets as possible has the potential to make students better researchers, BUT only if it goes hand in hand with instruction on critically evaluating resources.

Without that, one runs the risk of putting students into information overload or having them work with sources that are irrelevant or untrustworthy. I’m a big fan of helping students recognize that the knowledge they have and the ideas they create are valuable, and it makes me wonder whether building on their current search habits in a way that encourages them to speak about the value of their sources, the flaws in their arguments, etc., will help promote that. I remember having a few (but not many) undergrad courses that encouraged me to draw upon my own knowledge and experience for papers, and to critically analyze works rather than just write papers filled with other people’s arguments followed by “I agree/disagree.” I feel like teaching is moving more in the direction of critical analysis, and I’m excited to see the role that librarians and library websites play!