De-Anonymizing Wikipedia


A new website (down at the moment after getting linked by nearly every high-profile blog there is) identifies the source of anonymous Wikipedia edits coming from governmental agencies, political parties, and corporations. Wikipedia Scanner correlates the edits' IP addresses with the blocks of IP addresses known to be owned by those organizations, and presents that information as a publicly accessible database. When it gets back on its feet, that is.
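
The correlation step is simple enough to sketch. What follows is a minimal illustration, not Wikipedia Scanner's actual code: it assumes a hand-built table mapping organizations to CIDR blocks (the organization names and ranges are hypothetical, taken from the address ranges reserved for documentation) and checks each anonymous edit's IP address against that table using Python's standard ipaddress module.

```python
import ipaddress

# Hypothetical ownership table: organization name -> CIDR blocks.
# These are reserved documentation ranges, not real allocations.
ORG_BLOCKS = {
    "Example Corp": ["192.0.2.0/24"],
    "Example Agency": ["198.51.100.0/24", "203.0.113.0/25"],
}


def build_index(org_blocks):
    """Parse each CIDR string once so lookups don't re-parse them."""
    return [
        (ipaddress.ip_network(cidr), org)
        for org, cidrs in org_blocks.items()
        for cidr in cidrs
    ]


def attribute_edit(ip, index):
    """Return the organization whose block contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, org in index:
        if addr in network:
            return org
    return None


def attribute_edits(edits, index):
    """Tag each (article, ip) pair with the owning organization, if any."""
    return [(article, ip, attribute_edit(ip, index)) for article, ip in edits]
```

A real tool would presumably need ownership data from registry records and something faster than a linear scan over every block, but the core of the idea is just this containment test.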

Users have uncovered a number of self-interested changes. Someone at Diebold erased a swath of critical information about disputed elections and the Diebold CEO's fundraising from the Diebold entry. The edit was quickly reverted by another user, however, who rebuked it as vandalism. The FBI removed aerial photographs of Guantanamo Bay. The MPAA massaged the entry on DRM. Edits trace back to the Republican and Democratic parties as well.

No matter what the search engines say, IP addresses are personally identifying information.

More and more judges have been relying on Wikipedia in their opinions, and this should give them pause; you just can't trust the accuracy of something so mutable. Sure, the definition of “firewire” [pdf] probably won't be too controversial, but once you get in the habit of treating Wikipedia as authority, it'll be hard to stop. A friend of mine discovered this when she went to Wikipedia looking for a source for a brief she was writing. The article didn't say what she wanted to rely on it to say, so she edited the page. (Needless to say, she didn't cite to Wikipedia after that.)
