The Internet Librarian – an interview with Brewster Kahle

07.28.13

First published in Smith Journal Vol. 5

“It’s a pretty big ambition to try and collect all the books, music, video, web pages and software ever made,” says Brewster Kahle, somewhat self-effacingly. We’ve only been talking for 90 seconds, and already he’s told me about his plans to build the second version of the Library of Alexandria – “Alexandria 2.0”. Out of anybody else’s mouth, the idea might seem grandiose, or absurd, but there’s something reassuring about Kahle. There’s an irrepressible, geeky passion here that makes you really want to believe him.

Kahle is the man behind the Internet Archive, a not-for-profit online library that takes as its motto “Universal access to all knowledge”. And when they say “all knowledge”, they mean all knowledge – everything that humans have produced that contains information about us as a culture and species. At the time of writing, the Archive is home to 1,000,395 movies, 107,400 concerts (almost one-tenth of which have been contributed by fans of The Grateful Dead), 1,391,810 audio recordings and 3,618,464 books and texts, all increasing at a considerable pace. Since 2000, it’s been recording 20 channels of TV – amongst them Russian, Chinese and Iraqi; the BBC, Al Jazeera and even Fox News – 24 hours a day in DVD quality. It’s collating lectures from universities around the world, boasts a significant collection of old and difficult-to-find software and has collaborated with NASA to bring in the agency’s entire multimedia collection.

Perhaps most famously, the Internet Archive has been making a full copy of the World Wide Web every two months since late 1996. This task is exactly as impossible as it sounds. Using specially developed software, similar to Google’s ‘spiders’, the Archive crawls the web and makes copies of every publicly accessible website it can find. At the moment, the internet is home to an estimated 628 million active websites. When the Internet Archive started, there were a few hundred thousand. “Right now it’s a bit of a struggle, because the web has gotten so big,” says Kahle, once more showing a knack for understatement. “Knowing exactly when you’re done is very difficult.”

The result of this Herculean effort is the Wayback Machine (named after the WABAC Machine from The Rocky and Bullwinkle Show), a free, open resource that allows users to search for individual web pages and see how they appeared at different moments in time. The Internet Archive describes it as a ‘three-dimensional index’. As Kahle explains, “It’s about history having a time axis. I think the key is to not be locked into a perpetual present, or to have the Orwellian problem of seeing the past changed underneath you. We want to make it so that if you saw it before you can see it again.”

It’s believed that the original Library of Alexandria was built in Egypt by the conquering pharaoh Ptolemy I Soter somewhere around the turn of the third century BC. A general in Alexander the Great’s army turned historian and scholar, Ptolemy charged the Great Library with collecting all the world’s knowledge, and it did so with zeal, archiving tens of thousands of scrolls concerning history, science, philosophy, technology and medicine. As legend has it, the phrase “The place of the cure of the soul” was carved into the walls. But the Library nonetheless remains a mystery: its significance, what it contained, how vast it was. Our ignorance is so comprehensive that no-one is entirely sure when it actually burnt to the ground, or who did it. Julius Caesar may have been involved. But that was the thing with libraries in the pre-printing age. When you destroyed them – and usually a single flame would do – that was it. All the knowledge contained within vanished from this world. As Kahle tells me, “One of the lessons from the Library of Alexandria – a place probably best known for burning – is don’t keep just one copy.” The Internet Archive is based in San Francisco, but has back-ups in Amsterdam and, fittingly, Alexandria. Thanks to a direct cable to Europe, the Alexandria copy survived the Egyptian Revolution without a second of downtime.

The Great Library has been Kahle’s touchstone for the length of his career. “The idea of having everything in the world online has been promised for decades. It just struck me as a problem that seemed doable,” he says, before adding with a grin, “I haven’t had a new idea since.” Kahle first started exploring the concept back in 1980, a graduate fresh out of MIT, but “there were a bunch of pieces missing. Computers were weak, search was non-existent, networks were unreliable and there was no way to publish online.” So Kahle spent the rest of the decade filling in the gaps. He helped build the Connection Machine, a new breed of supercomputer, before creating the Wide Area Information Server (WAIS), the first online search and publishing mechanism and a precursor to the World Wide Web.

In 1996, after the Web had become standard (and WAIS, Inc. had been sold to AOL), Kahle founded both the Internet Archive and the web analytics company Alexa – also, unsurprisingly, named after the Library of Alexandria. He sold Alexa to Amazon in 1999 for around $250 million and in 2002 turned his attention to the Archive full-time. He remains both its director and primary benefactor. These days, across all its services, the Internet Archive boasts around 2.5 million unique users each day. In terms of sheer information stored, it’s on a par with the world’s largest physical archive, the Library of Congress, but is growing at a much faster rate.

“I’m a geek and we’re bringing in terabytes every day. We’re almost up to 10 petabytes. It’s a lot,” he says, laughing. “The vastness of it makes my head spin. And it’s more interesting than Borges’ Library of Babel, where every book was just a random collection of characters. That’s not at all what this is turning out to be.”

In 2006 the Internet Archive founded the Open Library, an attempt to create “one web page for every book ever published”. It was almost two years after Google had announced its own Books project, and less than 12 months after the company had been hit with its first lawsuit. But the aims of the Open Library were, and continue to be, different. For one, the books are scanned at a quality that leaves the pages looking crisp and real. To Kahle, it’s of utmost importance that the experience resembles that of reading an actual book. The Archive also tries to keep a physical copy of every book scanned, storing them in specially converted, climate-controlled shipping containers. As Kahle sees it, though, this is less like a library and more like the Svalbard Global Seed Vault: a huge, static repository of human knowledge ready to step in should anything untoward happen to our digital networks and/or civilisation.

For another, the Open Library is much more like a real library than Google could ever be. Staunchly non-profit and with an ethos of free access to entire works, the Open Library has set itself up in opposition to the more restrictive, low-quality uses permitted by the Google empire. “To the extent that Google can use its money and its inspiring vision to get things done, that’s great, but claiming ownership of it, or locking it up doesn’t serve anybody. And these companies come and go. With Open Source, with Mozilla, with Wikipedia – we’re kind of making a new sense of how to make infrastructure go. It’s not part of the government or the private sector, but it’s darn durable.” As a potent example of this trend, when GeoCities finally shut down in 2009, it was the Internet Archive that made the final copy of the once-omnipresent network.

To Kahle, this division between Google’s efforts and his own is merely a symbol of a broader struggle between a vision of the internet as the ultimate democratic medium and one controlled by corporate forces. “I remember being at a conference in 1992 and someone saying ‘I’m here as the token dot com’. It was such a novel idea that you could make money by publishing on the internet. It was all non-profit back then.” He goes on, “But now it’s become so commercial that it’s almost unrecognisable. Almost everything is basically trying to shill something to you. It’s a little slimy and gross.”

“Then there’s this influence from Apple’s app world and that’s just scary,” he continues. “It feels like a closed garden rather than the Wild West of general purpose computing. I like being surprised and I don’t want to lose that to this commercialisation crunch. Because commercialisation requires consolidation and monopolisation. I really want to try and keep there from being central points of control.”

Talking to Kahle, you’re consistently given the impression of a man who sees in the internet humanity’s greatest, most interesting achievement and counts himself lucky to be so intimately involved with its preservation. When I ask him what his favourite thing in the Archive is, he responds, “Are you a geek?” “On occasion,” I reply. I am, but I feel like geekdom is a hard thing to claim in the presence of a man who helped build the internet. He powers on, “I think this is so cool.” He sends me an Open Library link to a beautifully designed, brightly coloured version of Euclid’s Elements, the set of theorems and proofs that birthed modern mathematics. (Aptly enough, Euclid composed his masterwork while in residence at the original Library of Alexandria.) Kahle tells me that this particular version was made in the mid-1840s by a “crazy Brit” stationed in the Falklands, a guy with more money than sense. It could have been a footnote lost to the world, but now Kahle has been using this strange, improbable 170-year-old labour of love to teach his own son geometry. It’s hard not to get swept up in the arc of personal and monumental history that this small story contains.

After we’ve finished paging through a proof of the Pythagorean theorem, he continues, “I just like the wacky stuff. It’s this idea that people are very particular and very peculiar. That we’re really not a bunch of Homer Simpsons, sitting around waiting for something to happen. People, when given the opportunity to create weird things, will create really weird things. The internet reaffirms this.”

Towards the end of our interview, I ask him what he sees in the Archive’s future. After taking a second to think, he replies, “I hope that the Internet Archive becomes one of many around the world built on the idea of the Library of Alexandria.” He goes on, “The dream is that wherever you might be – whether you’re in the middle of the bush in Australia, or in Cleveland, Ohio, and surrounded by people that don’t understand you – you could still go and check out anything, ever. And if you have something worth adding, it’s easy to add to the library that is us. That’s the vision that the internet was built on and it’s the vision that the Internet Archive is trying to fulfil.”