Searching For the Answers

An updated website is one of the most useful tools that academic libraries have to communicate with the students, faculty, and staff we serve at our colleges and universities. Our websites offer access to information sources, provide help with research, and list our policies and basic information about the library: where we’re located, when we’re open, how to get in touch with us. It’s 2013 — libraries (and colleges) have had websites for a long time, so surely our website is the first place to look to learn more about the library, right?

Maybe, but maybe not. While I always check the website when I need more information about a library, often arriving there via the college or university website, I’m not sure that all of our patrons do. More often than not I’d guess that they use a search engine to find the library website. Assuming that Google is the search engine of choice for most of our patrons, what do they see when they search for our libraries?

(Feel free to go ahead and try a Google search with your own college or university library. I’ll wait.)

I tend to search with Google, but I must not search that often for businesses or other specific locations on Google’s web search, because it took me a while to notice that Google had added a box on the right side of the search results page populated with details about a business or location. The box includes a photo, a map (which links to Google Maps for directions), and some basic information about the place: a description from Wikipedia (if one exists), the address, phone number, and hours. There’s also a space for people to rate and review the business or location, as well as links to other review websites. It seems that the information in the box is populated automatically by Google from the original websites.

This is great news, right? This Google feature can deliver the information our patrons need without their having to click through to the library website. On the other hand, what happens when the information is wrong?

At my library we first learned about the Google info box last winter. A student approached the Reference Desk to verify the library’s opening hours. It seems that she’d found the library hours on Google, and was upset to learn that we’d extended our hours the prior semester. While there’s a happy ending to this story — it’s delightful when a student wants to come to the library earlier than she thinks she can! — this experience was frustrating for both of us. Since we hadn’t realized that Google added the info box to its search results, we didn’t know to check whether the information was correct. The student naturally assumed that we were in control of the information in that box, and was angry when it seemed that we hadn’t kept it up to date.

Just a month ago we encountered another issue with the Google info box for our library. I don’t know that I would expect there to be reviews of a college library on business or location review websites, but our library’s info box does have one review website listed under the Reviews heading. Following the link leads to a review that has nothing to do with the library (or the college), and is instead a post criticizing the city’s police department. While a bit jarring, it only takes a minute of reading the review site to realize that the review isn’t actually about the library, just a false hit on the review website.

While there are definitely advantages to having basic information about our library available quickly for our patrons, some aspects of the Google info box are troubling from a user experience perspective. It’s unclear how often Google updates the information in that box automatically; our experience with the incorrect library hours suggests that it’s not updated frequently. Also, it’s challenging to edit some of the information in this box. There’s a link for business owners to claim and edit their profile, which does offer the opportunity to change some details displayed in the box. But we weren’t able to remove the erroneous review website from our listing; our only option was to use the Feedback link to request that the link be removed, and who knows how long that will take?

My biggest takeaway has been the reminder that we should periodically research our libraries as if we were patrons looking for information. Google offers search alerts, which can be helpful to learn when our libraries are being mentioned on other websites, but I don’t know that there’s any way to automatically learn what information has been added or changed in the Google info box. I’d be interested to know if anyone has figured out a quick and easy way to keep track of this sort of thing — please share your experiences in the comments!

The Romanian Patent From Hell

(tl;dr version – tell students to look up this patent if they ever claim, like Thomas Friedman, that “Everything is on Google.”)

A few weeks ago, in my SciFinder key contact role, I received this innocuous request:


This is the lowest-hanging fruit among my reference requests – click the “Full Text” link, another click to Espacenet, download the full text, send it to the graduate student, log the transaction. Read-to-finish time – under two minutes.

But on Espacenet, the “Original document” link was greyed out – the original was not available. Well, that’s sad, but hey – I’m a professional librarian. I found and searched the Romanian patent agency.

Nope.

I also tried the Derwent Patent Index (through Thomson Reuters for us) and Google Patents – I got an abstract from Derwent, but no full text.

So I invoked the nuclear option – an open question on the Chemical Information Sources Discussion List. This invaluable treasure has taught me well, and I once answered a query off list. But there was some trepidation about asking such a learned cadre of science librarians because, frankly, there might be some easy answer I missed which would make me feel dumb. But I decided this was the best-case scenario because I would learn something; so I asked the group mind.

What came back was ninja-level patent advice, but all for naught –

There are few remaining options – a document delivery service like FIZ AutoDoc, or ordering the patent file wrapper of the citing U.S. patent (RO89171 might be included in the original filing materials). But these services are relatively expensive compared to what we generally pay, so I would have to kick it back to the user – which feels like defeat.

Yes, I have anthropomorphized a reference request into my nemesis.

This is really the first time I’m staring down a patent retrieval defeat – and it’s chafing a little. But in terms of my duties, I have a collection to analyze, my first convention coming up (cough, cough), and the metastasizing committee responsibilities inherent to the tenure track. Among other things (like the cold call that just eroded 5 minutes of productivity). I don’t think I’m going to “win” this one and I’ve probably spent too much time on it already.

So if you ever need something ungoogleable for a demonstration, trot out Romanian patent 89171 – at least until someone gets around to scanning it.

“Power Searching” with Google

Google, common “frenemy” of academic librarians everywhere, has put together a short online class called Power Searching. The course is designed to teach you how to find good-quality information more quickly and easily while searching Google. When I first heard about this course, my first thought was “Ah, Google is stealing my job!” After I calmed down a bit, I read over the description for the course and decided to enroll. I wanted to check out our potential competition and I hoped I might be inspired by new ideas and tools to incorporate into my teaching.

The course is divided into six classes, and each class is further broken down into short videos. Each class totals approximately 50 minutes of video content. Following each short video there is an optional activity or quiz to test the skills demonstrated by Daniel Russell, Senior Research Scientist at Google. The course contains pre-class, mid-class, and post-class assessments. After successfully passing both the mid-class and post-class assessments, you receive an official certificate of completion. To supplement the concepts taught in the classes, Google search experts also offer forums and Google Hangouts. When I took the course, it lasted about two weeks, and a new class was released every three days or so. Each class could be completed any time before its due date.

The classes themselves definitely hit on topics that we usually cover in our library workshops, such as choosing good keywords and thinking critically about the source of the information. But for the most part, it was more about clicking this and then clicking that…similar to a typical electronic resource demonstration. I did get bored a few times and skipped some of the activities. Also, I never had the motivation or desire to participate in any of the forums or Hangouts, but that was mainly due to my busy schedule. Despite all of this, I’m not too proud to admit that I also learned a few things, specifically how to use certain search operators and how to do an image search.

So, is Google stealing our jobs? No. (At least not right now.) What academic librarians do that Google cannot is work with researchers on the gray, messy stuff like choosing a research topic, determining what types of info are needed, and figuring out the best way to use information. If more first-year and non-traditional students took the initiative to enroll in Google’s Power Searching class, I think it would help me as a librarian to focus more on those gray areas and less on the logistics of doing a simple search. While from a pedagogical standpoint I didn’t have any “Aha!” moments, I may incorporate some of their search examples into my future library sessions.

I think it would be awesome if Google collaborated with a college or university library and did this same type of class for effectively using Google Scholar for research. (If you’re reading this, Google–I’m available!)

Have any other librarians taken Google’s Power Searching class? I’d love to hear what you think of the course and its content.

Leaves of Graph

ACRLog welcomes a guest post from Pete Coco, the Humanities Liaison at Wheaton College in Norton, MA, and Managing Editor at Each Moment a Mountain.

Note: This post makes heavy use of web content from Google Search and Knowledge Graph. Because this content can vary by user and is subject to change at any time, this essay uses screenshots instead of linking to live web pages in certain cases. As of the completion of this post, these images continue to match their live counterparts for a user from Providence, RI not logged in to Google services.

This That, Not That That

Early this July, Google unveiled its Knowledge Graph, a semantic reference tool nestled into the top right corner of its search results pages. Google’s video announcing the product runs no risk of understating Knowledge Graph’s potential, but there is a very real innovation behind this tool, and it is twofold. For one, Knowledge Graph can distinguish between homonyms and connect related topics. For a clear illustration of this function, consider the distinction one might make between bear and bears. Though the search results page for either query includes content related to both grizzlies and quarterbacks, Knowledge Graph knows the difference.

Second, Knowledge Graph purports to contain over 500 million articles. This puts it solidly ahead of Wikipedia, which reports having about 400 million, and light-years ahead of professionally produced reference tools like Encyclopaedia Britannica Online, which comprises an apparently piddling 120,000 articles. Combine that almost incomprehensible scope with integration into Google Search, and without much fanfare suddenly the world has its broadest and most prominently placed reference tool.

For years, Google’s search algorithm has been making countless, under-examined choices on behalf of its users about the types of results they should be served. But at its essence, Knowledge Graph presents a big symbolic shift away from (mostly) matching a query to web content (content that, per extrinsic indicators, the search algorithm serves up and ranks for relevance) toward the act of openly interpreting the meaning of a search query and making decisions based on that interpretation. Google’s past deviations from the relevance model, when made public, have generally been motivated by legal requirements (such as those surrounding hate speech in Europe or dissent in China) and, more recently, the dictates of profit. Each of these moves has met with controversy.

And yet in the two months since its launch, Knowledge Graph has not been a subject of much commentary at all. This is despite the fact that the shift it represents has big implications that users must account for in their thinking, and can be understood as part of larger shifts the information giant has been making to leverage the reputation earned with Search toward other products.

Librarians and others teaching about internet media have a duty to articulate and problematize these developments. Being in many ways a traditional reference tool, Knowledge Graph presents a unique pedagogic opportunity. Just as it is critical to understand the decisions Google makes on our behalf when we use it to search the web, we must be critically aware of the claim to a newly authoritative, editorial role Google is quietly staking with Knowledge Graph — whether it means to be claiming that role or not.

Perhaps especially if it does not mean to. With interpretation comes great responsibility.

Some Questions

The value of the Knowledge Graph is in its ability to authoritatively parse semantics in a way that provides the user with “knowledge.” Users will use it assuming its ability to do this reliably, or they will not use it at all.

Does Knowledge Graph authoritatively parse semantics?

What is Knowledge Graph’s editorial standard for reliability? What constitutes “knowledge” by this tool’s standard? “Authority”?

What are the consequences for users if the answer to these questions is unclear, unsatisfactory, or both?

What is Google’s responsibility in such a scenario?

He Sings the Body Electric

Consider an example: Walt Whitman. As of this writing, the poet’s entry in Knowledge Graph looks like this:

You might notice the most unlikely claim that Whitman recorded an album called This is the Day. Follow the link and you are brought to a straight, vanilla Google search for this supposed album’s title. The first link in that result list will bring you to a music video on YouTube:

Parsing this mistake might bring one to a second search: “This is the Day Walt Whitman.” The results list generated by that search yields another YouTube video at the top, resolving the confusion: a second, comparably flamboyant Walt Whitman, a choir director from Chicago, has recorded a song by that title.

Note the perfect storm of semantic confusion. The string “Walt Whitman” can refer to either a canonical poet or a contemporary gospel choir director while, at the same time, “This is the Day” can refer either to a song by The The or that second, lesser-known Walt Whitman.

Further, “This is the Day” is in both cases a song, not an album.

Knowledge Graph, designed to clarify exactly this sort of semantic confusion, here manages to create and potentially entrench three such confusions at once about a prominent public figure.

Could there be a better band than one called The The to play a role in this story?

Well Yeah

This particular mistake was first noted in mid-July. More than a month later, it still stands.

At this new scale for reference information, we have no way of knowing how many mistakes like this one are contained within Knowledge Graph. Of course it’s fair to assume this is an unusual case, and to Google’s credit, they address this sort of error in the only feasible way they could, with a feedback mechanism that allows users to suggest corrections. (No doubt bringing this mistake to the attention of ACRLog’s readers means Walt Whitman’s days as a time-traveling new wave act are numbered.)

Is Knowledge Graph’s mechanism for correcting mistakes adequate? Appropriate?

How many mistakes like this do there need to be to make a critical understanding of Knowledge Graph’s gaps and limitations crucial to even casual use?

Interpreting the Gaps

Many Google searches sampled for this piece do not yield a Knowledge Graph result. Consider an instructive example: “Obama birth certificate.” Surely, there would be no intellectually serious challenge to a Knowledge Graph stub reflecting the evidence-based consensus on this matter. Then again, there might be a very loud one.

Similarly not available in Knowledge Graph are stubs on “evolution” or “homosexuality.” In each case, it should be noted that Google’s top-ranked search results are reliably “reality-based.” Each is happy to defer to Wikipedia.

In other instances, stubs for topics that seem to reach some threshold of complexity and/or controversy defer to “related” stubs rather than making nuanced editorial decisions. Consider the entries for “climate change” and the “Vietnam war,” here presented in their entirety.

In moments such as these, is it unreasonable to assume that Knowledge Graph is shying away from controversy and nuance? More charitably, we might say that this tool is simply unequipped to deal with controversy and nuance. But given the controversial, nuanced nature of “knowledge,” is this second framing really so charitable?

What responsibility does a reference tool have to engage, explicate or resolve political controversy?

What can a user infer when such a tool refuses to engage with controversy?

What of the users who will not think to make such an inference?

To what extent is ethical editorial judgment reconcilable with the interests of a singularly massive, publicly traded corporation with wide-ranging interests cutting across daily life?

One might answer some version of the above questions with the suggestion that Knowledge Graph avoids controversy because it is programmed only to feature information that meets some high standard of machine-readable verification and/or cross-referencing. The limitation is perhaps logistical, baked into the cake of Knowledge Graph’s methodology, and it doesn’t necessarily limit the tool’s usefulness for certain purposes so long as the user is aware of the boundaries of that usefulness. Perhaps in that way this could be framed as a very familiar sort of challenge, not so different from the one we face with other media, whether it’s cable news or pop-science journalism.

This is all true, so far as it goes. Still, consider an example like the stub for HIV:

There are countless reasons to be uncomfortable with a definition of HIV implicitly bounded by Ryan White on one end and Magic Johnson on the other. So many important aspects of the virus are omitted here — the science of it, for one, but even if Knowledge Graph is primarily focused on biography, there are still important female, queer or non-American experiences of HIV that merit inclusion in any presentation of this topic. This is the sort of stub in Knowledge Graph that probably deserves to be controversial.

What portion of useful knowledge cannot — and never will — bend to a machine-readable standard or methodology?

Ironically, it is Wikipedia that, for all the controversy it has generated over the years, provides a rigorous, deeply satisfactory answer to the same problem: a transparent governance structure guided in specific instances by ethical principle and human judgment. This has more or less been the traditional mechanism for reference tools, and it works pretty well (at least up to a certain scale). More fundamentally, length constraints on Wikipedia are forgiving, and articles regularly plumb nuance and controversy. Similarly, a semantic engine like Wolfram Alpha successfully negotiates this problem by focusing on the sorts of quantitative information that aren’t likely to generate so much political controversy. The demographics of its user base probably help too.

Of course, Google’s problem here is that it searches everything for every purpose. People use it every day to arbitrate contested facts. Many users assume that Google is programmatically neutral on questions of content, intervening only to organize results for their relevance to our questions; in this view, Google has no responsibility for the content itself. This assumption is itself complicated and, in many ways, was problematic even before the debut of Knowledge Graph. All the same, it is a “brand” that Knowledge Graph will no doubt leverage in a new direction. Many users will intuitively trust this tool and the boundaries of “knowledge” enforced by its limitations and the prerogatives of Google and its corporate actors.

So:

Consider the college freshman faced with all these ambiguities. Let’s assume that she knows not to trust everything she reads on the internet. She has perhaps even learned this lesson too well, forfeiting contextual, critical judgment of individual sources in favor of a general avoidance of internet sources. Understandably, she might be stubbornly loyal to the internet sources that she does trust.

Trading on the reputation and cultural primacy of Google search, Knowledge Graph could quickly become a trusted source for this student and others like her. We must use our classrooms to provide this student with the critical engagement of her professors, librarians and peers on tools like this one and the ways in which we can use them to critically examine the gaps so common in conventional wisdom. Of course Knowledge Graph has a tremendous amount of potential value, much of which can only proceed from a critical understanding of its limitations.

How would this student answer any of the above questions?

Without pedagogical intervention, would she even think to ask them?

Personal Content Capitalism

Lately I’ve been hearing less and less about Google+, the social network launched by the search giant over the summer. I can’t comment on its functionality because I haven’t tried it; while I’m interested, I’ve got a couple of big projects going on and don’t have the bandwidth right now for an additional flavor of social media. However, my partner is on Google+ and recently let me know that he added me to a circle. I have a Google account and use lots of other Google services, but it feels weird that people I know can add me to Google+ circles even though I’m not using the service.

It’s worth thinking about the way social media and internet services are monetizing (or trying to monetize) our personal content. Like many librarians and academics I rely on these services frequently, though I’ve lately begun to question whether the advantages and convenience that they provide are worth it. Last month the professional social networking website LinkedIn retreated from an earlier decision to include photographs from their users’ profile pages in ads for the service. This was just the latest in what seems to be an ever-increasing number of news items about social media companies that push their users’ comfort levels with privacy a bit too far.

A few months ago I quit Facebook because I was concerned that their privacy policies are growing ever more fluid at the same time that everyone seems to be using it to post information about events, photos, etc. Every time I commented on a friend’s wall or uploaded a picture of my kid, I felt like I wasn’t getting nearly as much out of the relationship as Facebook was getting out of me. I have to admit, though, that I do miss the easy access to information from a wide range of folks I know from many stages of my life.

Like Facebook, Google uses our personal content to sell ads. Of course, selling internet ads is Google’s whole business: we are Google’s product, and the longer Google can keep us online, the more money they can make selling ads. I don’t use Gmail because I have another email provider. But I’m a heavy user of other Google services. I keep my personal schedule in Google Calendar because at our library we use it for our internal scheduling. I use Docs to collaborate with colleagues everywhere: in my library (though we are shifting to an internal wiki for much of that), with colleagues across the university system where I work, and with long-distance collaborators. And checking in with Google Reader is a staple of my daily routine.

But lately I’m reconsidering all of the personal content I’ve willingly given to internet services. I’m not sure how to ramp down my use of these tools that I’ve become so dependent on, especially given the number of people I work and communicate with who use the same tools. What’s the appropriate balance of control over our personal content and convenient, useful services? And how should we help guide students in making these same decisions?