Preserving the Shoah’s Memory with Web3

How do we ensure these digital records survive for future generations?

Background

USC’s Shoah Foundation Institute (SFI, or the Shoah Foundation) builds and preserves the Visual History Archive, a critical collection of more than 55,000 video testimonials of experiences of genocide, including the Holocaust and other atrocities. These firsthand accounts serve as powerful educational tools, historical documents, and memorials to those who suffered. The Foundation’s work extends beyond archiving to creating educational programs, fostering scholarly research, and promoting dialogue about prejudice and intolerance worldwide. 

The Visual History Archive (VHA) totals four petabytes (PB) of digital data, with multiple copies stored in different geographic locations for redundancy. This avoids a single point of physical failure, a concern heightened by the campus’s location in Southern California, a region known for earthquakes and wildfires. Additionally, the Archive does not rely on a single corporate cloud storage provider. While cloud storage is generally exceptionally reliable, it can still suffer hardware and software failures, and a provider’s corporate owners can change their content policy to limit or refuse access to the data they hold, creating a new risk.

Context

In 2021, the Shoah Foundation decided to go a step further in protecting this collection by implementing Web3 distributed technologies, which, by their nature, avoid single-point-of-failure risks. We partnered with two key organizations to bring this vision to reality.

Filecoin, a decentralized storage network built on blockchain technology (and a funding and operational partner for the Lab), offers an innovative approach to securing and preserving the Visual History Archive’s extensive collection. The distributed storage infrastructure of Filecoin allowed us to ensure the protection of these invaluable testimonies across a global network.

PiKNiK, an enterprise computer infrastructure provider with extensive Web3 experience, joined our initiative as digital preservation specialists, bringing their technical expertise in adapting sensitive archival content for blockchain environments. Their experience with large-scale media collections proved essential as we navigated the complex process of implementing these new protective measures for the Foundation’s testimonials.

PiKNiK CTO Stewart Berman

Framework

The Starling Framework, which was developed at USC and Stanford and is being implemented in projects across both institutions, focuses on three phases of data’s lifecycle: Capture, Store, and Verify. The Framework provides both experts and novices with a guide to establish and evaluate data integrity.

The Challenge: Analog Problems Persist In A Digital Era

The USC Shoah Foundation’s Visual History Archive isn’t merely a collection of data—it’s irreplaceable evidence countering Holocaust denial and preserving genocide survivors’ voices. Yet despite its digital format, this 4-petabyte archive faces preservation threats eerily similar to those that plagued analog collections throughout history.

Centralization remains the core vulnerability. Just as the Library of Alexandria’s destruction resulted from concentrated knowledge in a single location, today’s digital landscape poses parallel risks. Four companies control approximately 67% of cloud infrastructure, creating dangerous points of failure. These repositories, while technologically advanced, remain vulnerable to corporate decisions, outages, and even censorship—as evidenced by content removals in countries from New Zealand to Saudi Arabia.

For the Visual History Archive specifically, the stakes transcend mere data loss. Any corruption, tampering, or inaccessibility directly threatens historical accountability. Traditional digital backups, even when duplicated across multiple systems, still face common modes of failure: bit rot (data corruption due to deterioration of storage media), natural disasters affecting data centers, technical obsolescence, or changing corporate policies.

Southern California’s earthquake and wildfire risks further complicate physical preservation of the Foundation’s digital assets. Meanwhile, the emerging threat of synthetic media and AI-generated content risks drowning out authentic historical records with false and misleading facsimiles.

Enter decentralized storage networks. With ever more powerful chips at our disposal, recent technological innovations seek to maintain, audit, and certify perfect digital preservation of content across many replicas, distributing risk and increasing resilience.

The Prototype

Modern implementations of cryptography and decentralization can be combined in powerful ways, and are critical as the internet evolves toward a new vision of the Web – an internet where data sovereignty and distributed control replace traditional centralized models.

A copy of the Visual History Archive’s collection was backed up to Filecoin. The decentralized storage network includes over 4,000 independent storage providers (also known as miners, who establish and maintain data storage nodes and win cryptocurrency rewards for proving continued storage). Filecoin has sometimes been referred to as the Airbnb of archival storage, because it allows any provider to sign up and offer space on their computer or other device. This may include traditional companies with large nodes, but can even allow individuals to support the ecosystem with the phone in their pocket. Symbolically, this is a powerful way for people to join in the preservation of humanity’s most critical records, but at scale it can also offer efficiencies. Notably, encryption ensures that the storage providers don’t know the details of any files they hold on their hard drives. All they can see is a string of incomprehensible 1s and 0s.


As a user of this system, the Shoah Foundation has more granular choices with Filecoin than simply putting data in a cloud. We chose to work with PiKNiK, one of those providers, who helped ensure the Visual History Archive collection was encrypted and then replicated across an additional seven of PiKNiK’s peer storage providers on the Filecoin network who met our pre-established criteria. This decentralized approach provides more resilience, as copies can be distributed across geographies, providers, or servers meeting specific requirements. Instead of a complete photo, video, or document being stored as a single file on a single server, files can be subdivided into pieces and stored in multiple locations, on multiple storage providers, effectively scattering pieces of each file around the world. This means that even if a storage provider could hypothetically decrypt the data they hold for a user, they would have only fragments of files, further mitigating security concerns. Notably, this method can make storing and retrieving data (commonly referred to as ingress/egress) slow, so it is best suited for “cold storage” of data that is not regularly accessed. That makes it a good fit for large backups like the Shoah Foundation’s collections, which are measured in petabytes.

Technology

The preservation of the USC Shoah Foundation’s Visual History Archive (VHA) on a decentralized storage network followed the three-step methodology of the Starling Framework: Capture, Store, and Verify. Each stage ensured the integrity, security, and longevity of the archive’s four-petabyte collection, mitigating risks associated with centralized storage models.

Capture

The first stage, Capture, focused on ensuring the authenticity and integrity of the Visual History Archive’s digital assets before transitioning them to decentralized storage with PiKNiK. Initially, the collection was stored on a combination of the Shoah Foundation’s tape-based storage and their Microsoft Azure cloud instance. However, high Azure retrieval costs and concerns about long-term integrity necessitated a migration to Web3 infrastructure.

The process began with the extraction and preparation of the Visual History Archive data. Given the vast amount of data, transmitting the collection over the internet was not an option. Instead, the entire four-petabyte archive was copied from tape storage to individual hard disk drives, and the drives were shipped to PiKNiK on two pallets weighing a total of 795 pounds. Once at PiKNiK, the drives were loaded into racks known in the data storage industry as “JBODs”, which stands for Just a Bunch Of Disks. A rigorous file integrity check was then conducted to identify any instances of bit rot or data corruption, and corrupt data was repaired from other existing copies of the material.
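A rough back-of-the-envelope calculation illustrates why shipping drives can beat a network transfer at this scale. The sketch below assumes decimal petabytes and two sustained link speeds chosen purely for illustration; neither speed is a figure from the project.

```python
def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link running at full, sustained speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


ARCHIVE_BYTES = 4 * 10**15  # four petabytes (decimal), the archive size given in the text

# Hypothetical sustained link speeds, used only to show the order of magnitude.
for label, bits_per_second in [("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)]:
    days = transfer_days(ARCHIVE_BYTES, bits_per_second)
    print(f"{label}: about {days:.0f} days")
```

Under these assumptions, a fully saturated 1 Gbit/s link would need roughly a year of uninterrupted transfer, before accounting for protocol overhead, retries, or the time needed to read the data from tape.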

Encryption played a critical role in this phase, with all files secured using GNU Privacy Guard with an AES-256 cipher, ensuring that only the Shoah Foundation could access them with their private encryption key. Large files were further sub-divided into smaller chunks, with each assigned a Content Identifier (CID), a unique code for each chunk of data which enables the retrieval of a specific piece of data no matter where it ends up in decentralized networks.
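The idea of addressing each chunk by its content can be sketched in a few lines. This is a simplified illustration, not the project’s pipeline: it uses a plain SHA-256 hex digest as a stand-in for a CID (real CIDs carry additional encoding information), and the function names and the 4,096-byte chunk size are hypothetical.

```python
import hashlib


def chunk_and_identify(data: bytes, chunk_size: int) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks and pair each with a SHA-256 hex digest.

    The digest is only a stand-in for a real Content Identifier (CID).
    """
    chunks = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        chunks.append((hashlib.sha256(piece).hexdigest(), piece))
    return chunks


def reassemble(chunks: list[tuple[str, bytes]]) -> bytes:
    """Concatenate chunk payloads in order to recover the original bytes."""
    return b"".join(piece for _, piece in chunks)


# Stand-in for an already-encrypted file: 2,560 four-byte counters (10,240 bytes).
encrypted_blob = b"".join(i.to_bytes(4, "big") for i in range(2560))

chunks = chunk_and_identify(encrypted_blob, chunk_size=4096)

# A chunk is looked up by its identifier, not by where it happens to be stored.
store = {identifier: piece for identifier, piece in chunks}
first_id = chunks[0][0]
print(len(chunks), "chunks; first identifier:", first_id[:16], "...")
print(reassemble(chunks) == encrypted_blob)
```

Because the identifier is derived from the bytes themselves, any holder of a chunk can be asked for it by identifier, and the requester can check that what comes back matches.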

Store

The next stage, Store, focused on securing the collection across a decentralized storage network to prevent loss, corruption, or censorship. The encrypted testimonies were distributed to carefully selected Filecoin storage providers, each chosen based on reliability and geographic diversity. By fragmenting and dispersing the data to spread the risk across multiple groups, this decentralized strategy significantly reduced the chance of loss due to natural disasters, cyberattacks, or institutional policy shifts.

Storage providers within the Filecoin network are bound by two proof mechanisms, Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt), which ensure stored data remains intact. Before they begin storing any data, storage providers must stake cryptocurrency collateral with the Filecoin network. Once a provider commits to storing data under an agreement known as a “deal”, the Filecoin network tests the integrity of the stored data every day by challenging the provider to prove that it still holds the data agreed in the deal. Providers that pass the test can earn more cryptocurrency; those that fail may lose some of their collateral as a penalty. In this way, Filecoin network participants have a financial incentive to take good care of the data they promised to preserve.
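The incentive structure can be illustrated with a toy ledger. This is not Filecoin’s actual protocol or economics: the `Provider` class, the reward and penalty amounts, and the failure pattern are all hypothetical values chosen to show how daily checks separate reliable providers from unreliable ones.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    collateral: float   # staked up front, at risk if checks fail
    earned: float = 0.0  # rewards accumulated from passed checks


def settle_daily_check(provider: Provider, passed: bool,
                       reward: float = 1.0, penalty: float = 5.0) -> None:
    """Pay a reward on a passed check; otherwise deduct a penalty from collateral."""
    if passed:
        provider.earned += reward
    else:
        provider.collateral = max(0.0, provider.collateral - penalty)


reliable = Provider("reliable", collateral=100.0)
flaky = Provider("flaky", collateral=100.0)

for day in range(30):
    settle_daily_check(reliable, passed=True)
    # Hypothetical failure pattern: this provider fails every third day.
    settle_daily_check(flaky, passed=(day % 3 != 0))

for p in (reliable, flaky):
    print(f"{p.name}: collateral={p.collateral}, earned={p.earned}")
```

With these made-up numbers, the provider that misses a third of its checks ends the month with half its collateral gone and fewer rewards, which is the behavior the staking model is meant to discourage.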

Since the collection was intended for archival purposes rather than frequent retrieval, the storage model was optimized for cold storage, prioritizing long-term retention over accessibility. Metadata was preserved alongside the encrypted files, maintaining contextual integrity for future retrieval. To sustain decentralized availability, storage contracts required periodic renewals, necessitating ongoing coordination between the Shoah Foundation, PiKNiK, and the network of storage providers.

Verify – Archiving or Publishing

The final stage, Verify, ensured that the archive remained intact, tamper-proof, and accessible over time. Filecoin’s Proof-of-Spacetime validation automatically conducted daily integrity checks, confirming that storage providers continued to maintain the archive. These audits were performed without revealing file contents, preserving privacy and security, by using what is known as a cryptographic hash – a digital fingerprint that identifies a piece of data without revealing its content. The audits took advantage of a useful property of cryptographic hashes – if a piece of data changes in any way, no matter how slightly and whether due to corruption or an intentional edit, its hash changes as well.
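The sensitivity of a cryptographic hash to small changes is easy to demonstrate. The sketch below uses SHA-256 from Python’s standard library on a made-up byte string; it illustrates the hash property only, and the network’s own storage proofs are more elaborate than a single hash comparison.

```python
import hashlib

original = b"testimony segment 0001"  # made-up sample data

altered = bytearray(original)
altered[0] ^= 0x01  # flip a single bit in the first byte

fingerprint_a = hashlib.sha256(original).hexdigest()
fingerprint_b = hashlib.sha256(bytes(altered)).hexdigest()

# Count how many of the 64 hex digits differ between the two fingerprints.
differing = sum(a != b for a, b in zip(fingerprint_a, fingerprint_b))

print("fingerprints differ:", fingerprint_a != fingerprint_b)
print(differing, "of", len(fingerprint_a), "hex digits differ")
```

A one-bit change produces an entirely different fingerprint, and the fingerprint itself reveals nothing readable about the underlying data.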

PiKNiK developed a custom monitoring dashboard that provided the Shoah Foundation with real-time visibility into the status of stored files, retrieval performance, and hash consistency.

Verification was further strengthened through tamper detection methods that relied on hash comparisons. Upon retrieval, a file would be re-hashed and compared against its original fingerprint. Any alterations or corruption would result in a mismatch, immediately flagging data inconsistencies. This rigorous verification approach safeguarded the archive against the growing threat of synthetic media and AI-generated forgeries, ensuring that the testimonies remained authentic and distinguishable from manipulated content.
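A retrieval check of this kind can be sketched as follows, assuming the fingerprint recorded at ingest is a SHA-256 hex digest. The function names and the temporary file are hypothetical; the file is read in blocks so that very large files never need to fit in memory.

```python
import hashlib
import os
import tempfile


def file_fingerprint(path: str, block_size: int = 1 << 20) -> str:
    """Compute a SHA-256 hex digest of a file, reading it one block at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_retrieval(path: str, recorded_fingerprint: str) -> bool:
    """Return True only if the retrieved file still matches its recorded fingerprint."""
    return file_fingerprint(path) == recorded_fingerprint


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "retrieved.bin")
    with open(path, "wb") as handle:
        handle.write(b"encrypted payload" * 1000)

    recorded = file_fingerprint(path)          # fingerprint taken at ingest
    intact = verify_retrieval(path, recorded)  # unchanged file matches

    with open(path, "ab") as handle:
        handle.write(b"\x00")                  # simulate corruption or tampering
    tampered_ok = verify_retrieval(path, recorded)

print("intact file verifies:", intact)
print("altered file verifies:", tampered_ok)
```

Note that a matching hash shows a file is unchanged since its fingerprint was recorded; it says nothing about files whose fingerprints were never recorded.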

Learnings

First mover challenges

Preparation for this project began in 2018, preceding Filecoin’s mainnet launch. In this sense, the Shoah Foundation and its partners were testing the frontier of Web3 preservation before the supporting infrastructure was fully mature. The project successfully demonstrated the feasibility of securing the Visual History Archive using decentralized storage. Although the scope was ultimately limited to about half a dozen nodes, this was still an unprecedented achievement at the time, possibly the largest cultural collection ever committed to the decentralized web.

The team met its three-year preservation goal, structured around carefully designed tokenomics, but long-term sustainability depended on the broader decentralized ecosystem. Once the initial contracts concluded, continued storage faced the realities of energy demands, operational costs, and provider incentives. The project deliberately focused on integrity and redundancy of files; aspects like retrieval optimization and cost-of-service sustainability remained outside its scope.

The project used mezzanine (lower-resolution) files for distribution across Filecoin nodes, balancing resilience with feasibility. This approach allowed many providers to participate and helped distribute fragments of the collection worldwide across diverse configurations. However, the decision highlighted the ongoing trade-offs between file size, accessibility, and storage economics in Web3 environments.

Data Preparation is the Critical Phase

In the Filecoin ecosystem, preparing data for storage (rather than the storage itself) represents the greatest challenge. This highlights the value of specialized service providers like PiKNiK who can coordinate between data preparation and storage providers to ensure successful implementation.

While anyone can store files in the Filecoin network, significant data management is still required: mapping file names to CIDs (Content Identifiers); tracking storage deal durations and lifecycle management of files; encrypting and processing raw data into compatible CIDs; handling deal renewal costs; managing storage provider relationships; and providing visibility to file status and retrieval processes. These complex preparation requirements underscore why organizations undertaking large-scale preservation projects benefit from partnerships with technical specialists experienced in Web3 environments.
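Part of this bookkeeping can be pictured as a manifest that ties each file name to its content identifier, provider, and deal expiry. The sketch below is one possible shape for such a record, not PiKNiK’s tooling: `DealRecord`, its fields, the 90-day renewal window, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DealRecord:
    file_name: str
    content_id: str  # placeholder string, not a real CID
    provider: str
    expires: date


def renewals_due(records: list[DealRecord], today: date,
                 window_days: int = 90) -> list[DealRecord]:
    """Return deals that have expired or will expire within the window, soonest first."""
    cutoff = today + timedelta(days=window_days)
    due = [r for r in records if r.expires <= cutoff]
    return sorted(due, key=lambda r: r.expires)


def index_by_name(records: list[DealRecord]) -> dict[str, list[DealRecord]]:
    """Group records by file name so every replica of a file can be found."""
    index: dict[str, list[DealRecord]] = {}
    for record in records:
        index.setdefault(record.file_name, []).append(record)
    return index


# Made-up sample data for illustration only.
records = [
    DealRecord("interview_0001.enc", "example-cid-a", "provider-1", date(2024, 2, 1)),
    DealRecord("interview_0001.enc", "example-cid-a", "provider-2", date(2025, 1, 1)),
    DealRecord("interview_0002.enc", "example-cid-b", "provider-1", date(2023, 12, 15)),
]

today = date(2024, 1, 1)
for record in renewals_due(records, today):
    print(record.file_name, record.provider, record.expires.isoformat())
```

Even this small structure shows why the work is ongoing: every replica has its own expiry date, and a missed renewal quietly reduces the number of copies.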

Strategic Planning Supersedes Technical Challenges

The most significant obstacles in the implementation of the preservation solution weren’t technical but related to project management, requirements planning, and communication. Organizations should establish clear security frameworks and implementation timelines before beginning similar preservation projects.

Financial Commitment Must Be Well-Defined

Organizations should have stable, well-planned financial frameworks when adopting Web3 preservation technologies. This includes understanding both immediate implementation costs and long-term preservation expenses, especially given the potential volatility of emerging technological ecosystems. Broader ecosystem challenges, including the “crypto winter”, constrained the ability to expand further.

The Need for Cross-Compatibility

One need that emerged clearly was S3-class compatibility, a standard that would have made the decentralized system more easily interoperable with existing archival and enterprise tools. Without it, bridging workflows between traditional storage environments and decentralized networks proved more complex than anticipated.

A Positive Experience and Next Steps

Building on the ground-breaking preservation of the Visual History Archive on Filecoin, USC Shoah Foundation is now expanding its Web3 strategy. The Foundation plans to add 2 PiB from its innovative Dimensions in Testimony collection, which features interactive testimonies allowing viewers to converse with recordings of survivors.

Additionally, work is underway to build a dedicated Filecoin Academic Node with a substantial 20 PB storage capacity, enabling the Foundation to not only store its collections but also actively participate in the decentralized network as a storage provider.
