Computing Wikipedia’s Authority

Michael Jensen has predicted:

In the Web 3.0 world, we will also start seeing heavily computed reputation-and-authority metrics, based on many of the kinds of elements now used, as well as on elements that can be computed only in an information-rich, user-engaged environment.

By this he means that computer programs and data mining algorithms will be applied to information to help us decide what to trust and what not to trust, much as prestige of publisher or reputation of journal performed this function in the old (wipe away tear) information world.

It’s happening. Two recent projects apply computed authority to Wikipedia. One, from the University of California, Santa Cruz Wiki Lab, attempts to compute and then color-code the trustworthiness of a Wikipedia author’s contributions based on the contributor’s previous editing history. Interesting idea, but it needs some work. As it stands, the software doesn’t really measure trustworthiness, and the danger is that people will trust the software to measure something that it does not. Also, all that orange is confusing.
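To make the idea concrete, here is a toy sketch of scoring a contributor from editing history and turning the score into a color. It is not the Wiki Lab's actual algorithm, which this post does not describe. The survival-rate formula, the thresholds, and the names `reputation` and `shade` are all assumptions made up for illustration.

```python
def reputation(history):
    """Fraction of an author's past edits that later editors kept.

    `history` is a list of booleans: True if the edit survived,
    False if it was reverted. This scoring rule is an assumption,
    not the Wiki Lab's method.
    """
    if not history:
        return 0.0  # no track record yet
    return sum(history) / len(history)


def shade(rep):
    """Map a reputation in [0, 1] to a background color label.

    The thresholds are arbitrary illustrative choices.
    """
    if rep >= 0.8:
        return "white"
    if rep >= 0.5:
        return "light orange"
    return "dark orange"


if __name__ == "__main__":
    past_edits = [True, True, False, True]  # hypothetical contributor
    print(shade(reputation(past_edits)))
```

Note what the number actually captures: how often a contributor's edits were left alone, not whether they were true.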

More interestingly, another project, Wikipedia Scanner, uses data mining to trace the IP addresses of anonymous Wikipedia contributors back to the organizations that own them. As described in Wired, Wikipedia Scanner:

offers users a searchable database that ties millions of anonymous Wikipedia edits to organizations where those edits apparently originated, by cross-referencing the edits with data on who owns the associated block of internet IP addresses. …

The result: A database of 34.4 million edits, performed by 2.6 million organizations or individuals ranging from the CIA to Microsoft to Congressional offices, now linked to the edits they or someone at their organization’s net address has made.

The database uncovers, for example, that the anonymous Wikipedia user that deleted 15 paragraphs critical of electronic voting machines originated from an IP address at the voting machine company Diebold.
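The cross-referencing Wired describes can be sketched in a few lines. This is a minimal illustration, not Wikipedia Scanner's code: the ownership table, the edit list, and the function names are hypothetical, and the address ranges are reserved documentation blocks rather than real allocations.

```python
import ipaddress

# Hypothetical ownership table: network block -> organization.
# These are reserved documentation ranges, not real allocations.
OWNERS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example Voting Machines Inc.",
    ipaddress.ip_network("198.51.100.0/24"): "Example Agency",
}

# Hypothetical anonymous edits: (article, IP address logged by the wiki).
EDITS = [
    ("Electronic voting", "192.0.2.17"),
    ("Some agency", "198.51.100.200"),
    ("Daniel Defoe", "203.0.113.5"),
]


def owner_of(ip_text):
    """Return the organization whose block contains the address, or None."""
    ip = ipaddress.ip_address(ip_text)
    for network, org in OWNERS.items():
        if ip in network:
            return org
    return None


def attribute_edits(edits):
    """Pair each edit with the apparent owner of its IP address."""
    return [(article, ip, owner_of(ip)) for article, ip in edits]


if __name__ == "__main__":
    for article, ip, org in attribute_edits(EDITS):
        print(f"{article}: {ip} -> {org or 'unknown'}")
```

As the Wired passage is careful to say, a match shows only where an edit apparently originated: someone at that network address, not necessarily the organization itself.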

Both of these projects go beyond the “popularity as authority” model that comes from Web 2.0. They reach back to an older notion of authority, one that tries to gauge “who is the author,” and fuse it with the new techniques of data mining and computer programming. (Perhaps librarians who wake up every morning and wonder “why am I not still relevant?” need to get a degree in computer science.)

If you prefer the oh-so-old-fashioned-critical-thinking-by-a-human approach, Paul Duguid has shown nicely that one of the unquestioned assumptions behind the accuracy of Wikipedia (that over time and with more edits, entries get more and more accurate) is not necessarily so. Duguid documents how the Wikipedia entry for Daniel Defoe actually got less accurate over time as it was edited more. He shows that writing a good encyclopedia article can be quite difficult, and that not all the aphorisms of the open source movement (“given enough eyeballs, all bugs are shallow”) transfer to a project like Wikipedia. Duguid also provides a devastating look at the difficulties Project Gutenberg has with a text like Tristram Shandy.

Evaluating authority in the hybrid world calls for hybrid intelligences. We can and should make use of machine algorithms to uncover information that we wouldn’t be able to on our own. As always, though, we need to keep our human critical thinking skills activated and engaged.
