Nov 1, 2022

Publication of our whitepaper on Best Practices for Admissibility of Web Archives

In the autumn of 2022, we convened two pivotal workshops to focus on the evolving role of archiving web pages and social media in the context of international justice, particularly concerning Russia’s war against Ukraine. These workshops, one technical and the other legal, aimed to explore how recent advances in web archiving could support the collection, storage, authentication, and utilization of digital evidence in accountability proceedings for victims of the conflict.

The discussions formed the basis of a whitepaper, authored by Scott Martin (Global Justice Advisors) and Basile Simon (Starling Lab), and of a set of best practices that outline the ideal characteristics of a web archive for use in court, drawing on the requirements of the Berkeley Protocol on Digital Open Source Investigations.

Best Practices for Web Archiving

According to this whitepaper, the ideal web archive demonstrates the following properties:

  • It can be produced by anyone, notably by individual actors with tools they can grasp and control (as opposed to using a commercial service or being granted access to a platform). This is correlated, to an extent, with the use of open-source and local software.
  • It is of high fidelity, meaning it was carried out by a tool that preserved most, if not all, of the original material.
  • It includes the content itself, its surrounding metadata, and the metadata of the web scraping software. This includes cryptographic hashes of all website assets and a signature over these hashes that ties them to the author.
    • Furthermore, cryptographic hashes and signatures must be preserved, that is to say, stored securely and made available for the long term, just as the content itself would be.
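As a rough illustration of the third property, the sketch below hashes every captured asset and then signs the resulting list of hashes. It is not the whitepaper's method: the function names are hypothetical, and an HMAC with a shared secret stands in for the signature only because Python's standard library has no public-key signing. A real archive would more plausibly use an asymmetric signature so that anyone can verify it without holding the author's key.

```python
import hashlib
import hmac
import json


def build_manifest(assets: dict[str, bytes]) -> dict[str, str]:
    """Map each asset URL to the SHA-256 hex digest of its captured bytes."""
    return {url: hashlib.sha256(body).hexdigest() for url, body in sorted(assets.items())}


def sign_manifest(manifest: dict[str, str], key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the manifest.

    Assumption: HMAC is only a stand-in for a real digital signature.
    """
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict[str, str], signature: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

Changing a single byte of any asset changes its digest, which in turn invalidates the tag over the manifest. This is why the hashes and the signature need to be preserved alongside the content.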

 

Establishing Clear Methodologies

To maximize the admissibility of a web archive as evidence, archivists and legal professionals must establish clear, detailed methodologies. These methodologies should document the provenance of the digital evidence — detailing where it comes from, how it was procured, who procured it, when it was procured, and the process followed. This includes documenting the chain of custody and demonstrating that the webpage has not been altered during archiving.

Key points include:

  • Detailed Record-Keeping: Identify the person conducting the archiving, their qualifications, and the web collection protocols observed. Describe the hardware and software used, and explain the process for selecting and assessing websites and articles for credibility and resistance to manipulation.
  • Storage Protocols: Describe measures against corruption, hacking, and other risks to ensure the integrity of the archives over time. This should be recorded in a chain of custody that tracks who has handled the document.
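One way to make such a chain of custody tamper-evident is to have each log entry carry the hash of the entry before it. The sketch below is an assumption about how that could look, not a format prescribed by the whitepaper or the Berkeley Protocol; the field names and the all-zero starting value are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # hypothetical placeholder for "no previous entry"


@dataclass(frozen=True)
class CustodyEntry:
    handler: str            # who handled the archive
    action: str             # e.g. "captured", "transferred", "verified"
    timestamp: str          # ISO 8601 string, supplied by the caller
    archive_sha256: str     # digest of the archive file at this step
    prev_entry_sha256: str  # digest of the previous log entry


def entry_digest(entry: CustodyEntry) -> str:
    """Hash a canonical JSON encoding of the entry."""
    payload = json.dumps(asdict(entry), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list[CustodyEntry], handler: str, action: str,
                 timestamp: str, archive_sha256: str) -> list[CustodyEntry]:
    """Return a new log with one entry linked to the current last entry."""
    prev = entry_digest(log[-1]) if log else GENESIS
    return log + [CustodyEntry(handler, action, timestamp, archive_sha256, prev)]


def verify_log(log: list[CustodyEntry]) -> bool:
    """Check that every entry points at the digest of its predecessor."""
    expected = GENESIS
    for entry in log:
        if entry.prev_entry_sha256 != expected:
            return False
        expected = entry_digest(entry)
    return True
```

Editing any entry other than the last breaks the link stored in the entry that follows it. The last entry is only protected if its digest is also recorded somewhere outside the log, for example with a trusted third party.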

Background on Workshops

Technical Workshop: Enhancing the Integrity of Web Archives

On August 25, Starling brought together experts in web archiving to discuss methods to preserve information for accountability purposes in Ukraine. The workshop delved into various collection, authentication, and preservation strategies, emphasizing the technical aspects that ensure the integrity of recorded web pages and other digital materials.

Participants first examined existing web archiving practices and their operation on a technical level, then discussed the potential risks to these archives that could threaten their integrity. A significant focus was on the vulnerabilities of storing web archives using traditional archival models. The discussion highlighted how a shift towards more distributed and decentralized models could offer improved long-term resilience and availability, essential for maintaining the integrity of the archives in unpredictable environments.

We thank the following participants for their contributions:

  • Mark Graham, from the Internet Archive;
  • Ilya Kreymer, from WebRecorder;
  • Michael Nelson, from Old Dominion University;
  • Nicholas Taylor, expert witness on the Internet Archive Wayback Machine;
  • Ed Summers, from the Stanford Libraries;
  • And Cade Diehm, from the New Design Congress.

 

Legal Workshop: Web Archives in the Courtroom

Following the technical discussions, a roundtable of legal experts convened on September 27 to explore the legal dimensions of web archiving practices. This group included lawyers specializing in war crimes and legal professionals experienced with digital evidence. The goal was to identify potential legal vulnerabilities in current archiving practices and determine how such materials could be admitted into evidence in courtrooms, particularly in war crimes and other international criminal proceedings.

The legal experts articulated best practices to ensure that web archive data are preserved, produced, and authenticated in ways that maintain their integrity. This enhances their reliability, utility, and probative value as evidence in a judicial context. The roundtable discussed the characteristics and challenges of various web archiving practices and presented a framework to assess these methods.

We thank the following participants for their contributions:

  • Scott Martin, from Global Justice Advisors;
  • Melissa Bender, from Ropes and Gray LLC;
  • Tim Parker, from Blackstone Barristers;
  • Cari Spivack, from the Internet Archive;
  • Karolina Aklamitowska, from Tallinn University;
  • Clare Stanton, from Harvard Law School;
  • Bastiaan van der Laaken, from the UN IIIM Syria.

Next Steps: Call for Contributions on Witness Servers

Finally, to improve on the process of entering web archives into evidence, Starling are formalizing the concept of “Witness Servers” as an additional layer of self-corroboration for web archives. A Witness Server is a service, hosted and run by an institution, which carries out web crawls on-demand on behalf of individuals conducting web archiving activities.

Participating institutions, e.g. the Stanford Libraries, WebRecorder, or the Harvard Library Innovation Lab, lend the individuals or teams they agree to witness for the trust that might be placed in the institutions themselves. The roundtable findings identified this reliance on the social trust placed in institutions as particularly helpful in strengthening the work of potentially vulnerable investigators and archivists.

Several Witness Servers act in concert on the instruction of a web archivist and simultaneously capture the same web page. This approach accounts for the possibility that a webpage varies slightly depending on locale (among many other potential anomalies). It also corroborates the contents of a web archive by replicating the capture from several different locations and actors.

To learn more, or to participate as an institution or a researcher, read the Call for Contributions.
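The corroboration step described above could be sketched as a comparison of per-asset hashes reported by each witness. The Witness Server concept is still being formalized, so everything here is an assumption made for illustration: the input shape, the function name, and the choice to report the most common digest for each asset.

```python
from collections import Counter


def corroborate(captures: dict[str, dict[str, str]]) -> dict[str, dict]:
    """Compare captures of the same page made by several witnesses.

    captures maps a witness name to {asset URL: SHA-256 hex digest}.
    For each asset, report the most common digest and how many
    witnesses agreed on it.
    """
    urls = set().union(*(manifest.keys() for manifest in captures.values()))
    report = {}
    for url in sorted(urls):
        digests = Counter(m[url] for m in captures.values() if url in m)
        top_digest, votes = digests.most_common(1)[0]
        report[url] = {
            "digest": top_digest,
            "agreeing": votes,
            "witnesses": len(captures),
            "unanimous": votes == len(captures),
        }
    return report
```

An asset that is not unanimous is not necessarily suspect. Locale-dependent content or rotating advertisements can legitimately differ between witnesses, so a disagreement is best read as a prompt for closer review.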
