Watching a YouTube video of The Strokes at ACL Fest got me thinking about a short piece I wrote after being blown away by their first SNL performance in 2002. Just as I was typing out the story about how I lost hundreds of old blog posts from not having Good Backups, a scary thought occurred: “Hey I bet it’s in The Wayback Machine.” And of course most of it was and oh my God my writing was (is?) the absolute worst. Bonus points to anybody out there masochistic enough to try and find that stuff.
A couple of weeks ago I touched on the memory of machines and how they have to be coerced into modeling the imperfections of the human mind. Like a typical human, I forgot that The Wayback Machine is an unintentionally perfect example of this: the staggering goal of archiving the web mixed with very real resource constraints necessarily forms gaps in the archive. For an archivist, these gaps are excruciating, but they breathe life into a machine.
There’s a great New Yorker piece from earlier this year about The Internet Archive (which includes the Wayback Machine) that is just loaded with anecdotes like this one:
In 2006, David Cameron gave a speech in which he said that Google was democratizing the world, because “making more information available to more people” was providing “the power for anyone to hold to account those who in the past might have had a monopoly of power.” Seven years later, Britain’s Conservative Party scrubbed from its Web site ten years’ worth of Tory speeches, including that one. Last year, BuzzFeed deleted more than four thousand of its staff writers’ early posts, apparently because, as time passed, they looked stupider and stupider. Social media, public records, junk: in the end, everything goes.
These anecdotes are individual data points that combine over the course of 6,000+ words to paint a picture of a digital archive. But even an article of this length, with this detail, leaves much to the imagination. For example, nowhere in that article was the physical scent of the archive documented. To many this is a trivial point, but there are 36 million web pages out there that would suggest otherwise. So if the smell of the Internet Archive is deemed unimportant, the question becomes: what is important?
It’s a question of resolution, in multiple senses of the word. How granular can you be when describing the Internet Archive, or more broadly, when describing the Internet as a whole? And if you choose to make a high-resolution copy, do you have the will and the resources to see that through to the end? The decision to include a page, a site or an entire top-level domain in the Archive is just as valid as the decision to comment, or not, on the smell of the building. These decisions, in aggregate, determine the fidelity of a copy, and it is these decisions – more than even the actions being recorded – that will form our collective recollection. In summary, always be nice to librarians.
Man. Just watched that performance again and it is still so, so, so great.