We’re halfway through the International Internet Preservation Consortium’s annual web archiving conference. Here are just a few notes from our time so far:
Auto-captioned photo of Jack, Genève, and Matt – thanks CaptionBot!
April 12
-
Andy Jackson kicks the conference off with “Have I accidentally committed international journalism?” — he has contributed to the open source software that was used to review the Panama Papers.
-
Andrea Goethals describes the desire for smaller modules in the web archive tool chain, one of her conclusions from Harvard Library’s Environmental Scan of Web Archiving. This was the first of many calls throughout the day for more nimble tools.
-
Stephen Abrams shares the California Digital Library’s success story with Archive-It. “Archive-It is good at what it does, no need for us to replicate that service.”
-
John Erik Halse encourages folks to contribute code and documentation. Don’t be intimidated; just dive in.
-
There seems to be consensus that Heritrix is a tool that everyone needs but no one is in charge of — that’s tough for contributors. A few calls for the Internet Archive to ride in and save the day.
-
We’re not naming names, but a number of organizations have had their IT departments, or IT contractors, seek to run virus scanners that would edit the contents of an archive after preservation. (Hint: it’s not easy to archive malware, but “just delete it” isn’t the answer.)
-
Some kind member of IIPC reminds us of the amazing Malware Museum hosted by the Internet Archive.
-
David Rosenthal notes that Iceland has been called the “Switzerland of bits”. After being in Reykjavik for only a few days, we sort of agree!
-
Jefferson Bailey of the Internet Archive echoes concerns about looming web entropy: web archiving is growing significantly, but storage for those archives remains concentrated.
-
Nicholas Taylor of the Stanford Digital Library is responsible for the most wonderful acronym of all time, WASAPI (“Web Archiving Systems API”).
-
The Memento Protocol remains the greatest thing since sliced bread. (Here we refer to the web discovery standard, not the Jason Bourne movie.)
-
We chat with Michael Nelson about his projects at ODU, from the Mink browser plugin to the icanhazmemento Twitter bot.
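For anyone new to Memento: the core idea (RFC 7089) is that a client asks a "TimeGate" for a URL as it looked around some date by sending an `Accept-Datetime` header. Here is a minimal Python sketch of building such a request. The TimeGate address is a made-up placeholder, not a real service, and nothing is actually sent over the network.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate endpoint, used only for illustration.
TIMEGATE_PREFIX = "https://timegate.example.org/timegate/"


def accept_datetime(when: datetime) -> str:
    """Format a datetime as an HTTP date in GMT, the form Accept-Datetime uses."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_request(original_url: str, when: datetime) -> Request:
    """Build (but do not send) a request for a capture of original_url near `when`."""
    return Request(
        TIMEGATE_PREFIX + original_url,
        headers={"Accept-Datetime": accept_datetime(when)},
    )


when = datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc)
req = timegate_request("http://example.com/", when)
print(req.full_url)
print(accept_datetime(when))
```

A real TimeGate would typically answer with a redirect to the closest capture, and that capture's response carries a `Memento-Datetime` header saying when it was archived.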
April 13
-
Hjálmar Gíslason points out that 500 hours of video are uploaded to YouTube each minute. It would take 90,000 employees working full time to watch it all. Conclusion: Google needs to hire some people and get on this.
-
Hjálmar also mentions Tim Berners-Lee’s 5-Star Open Data standard. Nice goal to work toward for Free the Law!
-
Vint Cerf on Digital Vellum: the Catholic Church has lasted for an awfully long time, and breweries tend to stick around a long time. How could we design a digital archiving institution that could last that long?
-
(Perma’s suggestion: how about a TLD for URLs that never change? We were going to suggest .cool, because cool URLs don’t change. But that seems to be taken.)
-
Ilya Kreymer shows off the first webpage ever in the first browser ever, running in a simulated NeXT Computer, courtesy of oldweb.today.
-
Dragan Espenschied says Rhizome views the web as “performative media” while showing Jan Robert Leegte’s scrollbars piece through different browsers in oldweb.today. Sometimes the OS is the artwork.
-
Matthew S. Weber and Ian Milligan have been running web archive hackathons to connect researchers to computer programmers. Researchers need this: “It would be dishonest to do a history of the 90s without using web archives.” Cue <marquee> tags here.
-
Brewster Kahle pitches the future of national digital collections, using as a model the fictional (but oh-so-cool) National Library of Atlantis. He shows off clever ways to browse a nation’s TV news, books, music, video games, and so much more.
-
Brewster encourages folks to recognize that there is no “The Web” anymore: collections will differ based on context and provenance of the curator or crawler. (What is archiving “The Web” if each of us has a different set of sites that are blocked, allowed, or custom-generated for us?)
-
Brewster voices the need for broad, high-level visualizations in web archives. He highlights existing work and thinks we can push it further.
-
And oh by the way, he also shows off Wayback Explorer over at Archive Labs – graph major and minor changes in websites over time.
-
Bonus: We’re fortunate enough to grab some whale sushi (or vegan alternatives) with David Rosenthal, Ilya Kreymer, and Dragan Espenschied.
Looking forward to the next couple of days …