
Preserving Harvard Public Health on the Distributed Web
In late February 2025, Harvard Public Health magazine announced it was shutting down. Editor-in-chief Michael F. Fitzgerald delivered the news plainly: journalism is expensive, and it sits outside a university’s core mission of teaching and research. The magazine, which had relaunched as a digital-first publication and tripled its readership in its final year, ran out of time to build sustainable revenue streams.
The closure is another casualty in the ongoing challenge of funding quality public health journalism. Harvard Public Health had covered topics ranging from the Flint water crisis to processed foods, mental health, and structural racism – work that won recognition and built an audience of nearly 15,000 newsletter subscribers, 90 percent of whom had no connection to Harvard.
Fitzgerald encouraged readers to download articles they found useful before the site eventually goes dark. But individual downloads are an imperfect solution for preserving a decade’s worth of public health journalism. The question became: how do we ensure this body of work remains accessible to researchers, public health professionals, and the public long after Harvard stops paying the hosting bills?
The Distributed Press Clone API
Distributed Press is an open-source publishing tool developed by Hypha Worker Co-operative and Sutty Coop that automates publishing and hosting content to both the traditional web and decentralized protocols like IPFS and Hypercore. Where traditional web hosting depends on a single provider, distributed networks allow anyone to help co-host content – making it resilient against the kind of institutional decisions that led to Harvard Public Health’s closure.
The recently launched Clone API makes this preservation process remarkably straightforward. Rather than manually downloading and re-uploading thousands of pages, the API can crawl an entire website and publish it directly to IPFS and Hypercore.
The entire preservation took three API calls and about an hour of automated crawling.
Technical Walkthrough
Here are the steps to preserve a website using the Distributed Press Clone API:
Step 1: Obtain API Access
Distributed Press requires an authorization token. Instructions for requesting access are available at distributed.press/2024/10/18/get-a-token/. Once obtained, the token is entered into the Distributed Press API Swagger interface.
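For scripted use outside the Swagger interface, the token travels with every request. As a minimal sketch: the base URL and bearer-token header scheme below are assumptions, so confirm both against the Swagger interface once you have a token.

```python
# Hypothetical sketch of authenticating to the Distributed Press API.
# API_BASE and the bearer-token scheme are assumptions -- verify them
# against the Swagger interface.
API_BASE = "https://api.distributed.press/v1"  # assumed base URL

def auth_headers(token: str) -> dict:
    """Build the HTTP headers sent with every authenticated request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```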
Step 2: Create a Site Configuration
Using the API, create a new site with the target domain (harvardpublichealth.org) in the configuration bundle. This tells Distributed Press which domain to crawl and how to structure the resulting archive.
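A site-creation call might look like the following sketch, using only the standard library. The endpoint path and payload field names are assumptions to be checked against the API schema in the Swagger interface.

```python
import json
from urllib import request

API_BASE = "https://api.distributed.press/v1"  # assumed base URL

def site_payload(domain: str) -> dict:
    """Configuration bundle for a new site.

    Field names here are illustrative, not the documented schema;
    inspect the Swagger definitions for the exact shape.
    """
    return {
        "domain": domain,
        "protocols": {"http": True, "ipfs": True, "hyper": True},
    }

def create_site(token: str, domain: str) -> bytes:
    """POST the configuration to the (assumed) /sites endpoint."""
    req = request.Request(
        f"{API_BASE}/sites",
        data=json.dumps(site_payload(domain)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.read()
```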
Step 3: Initiate the Clone
Call the Clone API endpoint to begin crawling. The system works in the background, following links and downloading assets. For a site the size of Harvard Public Health, this takes approximately one hour.
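Kicking off the crawl is a single authenticated request. In this sketch the `/clone` path under the site resource is an assumption based on the API's name; check the API reference for the real route.

```python
from urllib import request

API_BASE = "https://api.distributed.press/v1"  # assumed base URL

def clone_url(domain: str) -> str:
    """URL of the (assumed) clone endpoint for a given site."""
    return f"{API_BASE}/sites/{domain}/clone"

def start_clone(token: str, domain: str) -> int:
    """Request a background crawl of the site; returns the HTTP status.

    The crawl itself runs server-side, so this call returns quickly
    even though the full crawl takes on the order of an hour.
    """
    req = request.Request(
        clone_url(domain),
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```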
Step 4: Query Site Information
Once complete, query the site information to retrieve the archive’s addresses on both protocols. The API returns a JSON response containing the IPFS CID (Content Identifier), Hypercore key, and various gateway URLs.
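The exact JSON layout depends on the API version, so a helper that pulls out the addresses should be treated as a sketch; the `links` key structure below is an assumption, and you should inspect a real response before relying on it.

```python
def extract_addresses(info: dict) -> dict:
    """Pull content addresses out of a site-info JSON response.

    The 'links' layout assumed here is hypothetical -- adapt the key
    paths to whatever the real response contains.
    """
    links = info.get("links", {})
    return {
        "ipfs_cid": links.get("ipfs", {}).get("cid"),
        "hyper_key": links.get("hyper", {}).get("raw"),
        "gateways": [v.get("link") for v in links.values() if v.get("link")],
    }
```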
Accessing the Archive
The preserved Harvard Public Health site now has permanent, content-derived addresses on both networks. On IPFS, the archive’s CID is:
bafybeihg5mdtwrfa4ywm4orsloojvysuaurr36p57tlwshydv2rgbxws5a
Accessible via gateway:
https://bafybeihg5mdtwrfa4ywm4orsloojvysuaurr36p57tlwshydv2rgbxws5a.ipfs.ipfs.hypha.coop/
The IPFS CID is a cryptographic hash of the content itself. As long as anyone on the IPFS network continues to pin this CID, the content remains accessible – regardless of what happens to Harvard’s servers. The same principle applies to the Hypercore key. These addresses are permanent: they will always point to this exact version of the site.
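Because a subdomain-style gateway URL is derived mechanically from the CID, the same archive can be reached through any gateway that supports the pattern. A small helper makes this concrete (the default host matches the Hypha gateway above; other hosts are interchangeable):

```python
# CID of the preserved Harvard Public Health archive (from the text above).
CID = "bafybeihg5mdtwrfa4ywm4orsloojvysuaurr36p57tlwshydv2rgbxws5a"

def subdomain_gateway_url(cid: str, gateway: str = "ipfs.hypha.coop") -> str:
    """Build a subdomain-style IPFS gateway URL for a CID."""
    return f"https://{cid}.ipfs.{gateway}/"
```

For example, `subdomain_gateway_url(CID, "dweb.link")` points the same content-addressed archive at a different public gateway; the CID, not the host, is what identifies the content.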
Why This Matters
Web content is fragile. Studies have found that the average lifespan of a webpage is measured in years, not decades. Institutional priorities shift, budgets get cut, and valuable archives disappear without warning. The Harvard Public Health closure is a textbook case: the institution decided journalism wasn’t core to its mission, and a decade of public health reporting became an orphan.
Traditional archiving solutions like the Internet Archive’s Wayback Machine provide important preservation, but they operate as centralized services with their own resource constraints and priorities. Distributed protocols offer a complementary approach where preservation becomes a collective responsibility. Anyone who values this content can pin it to their own IPFS node, helping ensure it remains available.
For researchers and public health professionals, this archive preserves not just individual articles but the navigational structure of the original site – the way topics were organized, the relationships between pieces, and the editorial voice that connected them. That contextual integrity is often lost when content is scattered across individual downloads or incomplete archive snapshots.
Replicating This Process
Organizations with valuable web archives, whether facing shutdown or simply wanting redundant preservation, can use this same approach. The process requires:
- An account with Distributed Press (contact the team for access)
- A domain configuration for the target site
- Sufficient time for the crawl to complete
The resulting archives can be pinned by anyone with an IPFS node or Hypercore peer, distributing the preservation responsibility across multiple parties and jurisdictions.
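With a local Kubo (go-ipfs) node running, pinning is a one-line CLI call; this sketch simply shells out to it.

```python
import subprocess

def pin_command(cid: str) -> list:
    """Kubo CLI invocation to pin a CID on the local node."""
    return ["ipfs", "pin", "add", cid]

def pin(cid: str) -> None:
    """Pin the CID; requires a running local IPFS (Kubo) daemon."""
    subprocess.run(pin_command(cid), check=True)
```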
For those interested in the technical implementation, the Distributed Press documentation is available at docs.distributed.press, and the Clone API is documented in the API reference.
Next Steps
This preservation effort was initiated independently by Hypha, but demonstrates a workflow that could be systematically applied to at-risk publications. As journalism continues to face financial pressures, having fast, reliable tools for distributed archiving becomes increasingly important.
We encourage institutions facing similar closures to consider distributed preservation as part of their wind-down planning. We also encourage researchers and archivists to pin this CID and contribute to the network of peers keeping Harvard Public Health’s journalism accessible.
If you have questions about applying these techniques to your own preservation needs, reach out to the Distributed Press team via their documentation site, or contact Starling Lab at info@starlinglab.org.
This project was documented by Hypha Worker Co-operative. The preservation was completed using the Distributed Press Clone API on March 4, 2025.