Combating Racism as a Public Health Crisis
Black Voice News & Starling Lab analyze more than 35 California declarations of racism as a public health crisis (RPHC) and use verifiable data to track government action and long-term accountability.
Team · Reading time: 5 min
Background
In the modern era, data is a vital resource that organizations and communities need to gauge and effect change. Marginalized communities can face unique challenges accessing data, compounding long-standing inequities.
In 2015, Black Voice News launched Mapping Black California (MBC), a project to better understand, report on, and visualize data important to Black Californians. The initiative uses geospatial technology to enhance community news and storytelling and equips the Black community with data-driven tools to address regional and local systemic inequities.
The MBC team began searching for data-supported visualizations when reporting on public health. Their coverage sought to explore the profound impacts of racism on social determinants of health, including housing, education, and employment. But all too often, they found that the available data pertaining to Black communities was too broad or omitted key information.
In the wake of George Floyd’s murder in 2020, a nationwide movement pushed elected officials and other government leaders to reckon with racism in their communities. Public outcry specifically called for the acknowledgement of structural racism in law enforcement, healthcare, and community conditions. Official responses varied, ranging from simple acknowledgement of the problem to more extensive commitments to provide funding and take measurable action. Many of these took the form of declarations that racism is a “public health crisis.” This presented an opportunity for MBC to collect its own data about these structural commitments, mindful of a long history of failed reforms.
BVN Executive Editor Stephanie Williams explained:
“The uprising of 2020 is reflective of the historical uprisings across decades and centuries, just as the disproportionate impacts of COVID-19 on Blacks was reflective of the same experiences the community suffered during the great flu epidemic of 1918. Though separated by 100 years, racism drove the response in both cases.
This project is meaningful and important because the recorded commitments for structural and institutional change made in the declarations of racism as a public health crisis, and captured in the dashboard using Web3 authentication technology and decentralized protocols, will provide leverage for activists and others to hold officials accountable in the years to come even as the events of 2020 fade from memory. If not, 100 years from now we may find ourselves experiencing the same traumas, telling the same stories and experiencing the same results.”
Public entities across the state of California – county supervisors and commissions, city councils, and various other government associations – soon issued nearly 35 separate declarations related to racism as a public health crisis, sometimes abbreviated RPHC.
This fellowship aimed to quantify and qualify what political promises were made, to measure what action was in fact taken, and to document both in ways that could support long-term accountability.
Scope
For this project, Starling Lab, Black Voice News, and Esri teamed up to develop a new-and-improved data dashboard and a series of articles. Each would incorporate web archives (i.e., preserved web pages), verified as authentic through cryptographic methods, to create an evidence-based resource that could be used as a tool to measure and effect social change. The archives included data available on government and other public websites, which together help assess and quantify the efforts of various jurisdictions.
The outcome of this project includes the decentralized preservation of authenticated web archives, encompassing roughly 350 web pages (about 10 per formal government declaration). This body of evidence for the reporting was published using a specially developed WordPress plugin that invites readers to verify pages as they existed on official channels.
Dr. Paulette Brown-Hinds, publisher of Black Voice News and founder of Mapping Black California, explained the focus:
“From my perspective, the project with Starling Lab allowed us to experiment with a new technology to address an old challenge: how to keep important records accessible to the public, transparent, and available indefinitely.”
Towards these goals, Black Voice News created the Combating Racism as a Public Health Crisis data dashboard. This accountability tool and content aggregator gives users a detailed examination of nearly three dozen official resolutions, along with demographic data, progress reports on promises, and a comprehensive collection of other data sources about these resolutions.
Researchers analyzed each jurisdiction’s resolution and mapped the commitments to a comprehensive rubric. This rubric was initially defined by a template provided by the American Public Health Association’s analysis of RPHC resolutions across the nation. It was then augmented to reflect specific high-need areas in the California landscape: COVID-19 tracking metrics and the inclusion of race-specific language (Black, African American, Hispanic, Latino, etc.).
This data dashboard establishes categories and sets of commitments within each category, allowing tracking and comparison between jurisdictions. The categories are Data & Accountability; Community Engagement; Policies & Programs; Funding; Organizational Capacity/Training; Economic Opportunities; Education; Housing, Basic Amenities & Environment; and Public Safety/Policing.
These nine general categories are further delineated into 39 strategic actions, quantifying how many of these actions each jurisdiction’s public pledges promised to address.

In addition to categorizing and adding quantitative measurements, this data dashboard aimed to:
- Establish a baseline of commitments for each jurisdiction.
- Set a quantitative standard for markers of progression towards the goals.
- Utilize web tools to track, measure, clarify, and report on the progress of commitments made by the entities that issued declarations.
- Use this data to publish data-informed stories in Black Voice News and external newsrooms.
- Inform local advocates, leaders, community-based organizations (CBOs), and the general community on the progression of commitments to address the real-life, day-to-day impacts of structural racism in their localities.
In each category, the government organizations were rated on a scale of 0–9: zero means none of the strategic actions in that category were addressed, and nine means all of them were covered. Most organizations covered between three and six strategic actions in their pledges.
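As an illustration of how such scores can be computed, here is a minimal sketch, not the project’s actual code: the rubric becomes a mapping from category to strategic actions, and each jurisdiction is scored by counting its pledged actions per category. The category names follow the dashboard; the actions and pledge data are invented for the example.

```python
# Minimal scoring sketch: count pledged strategic actions per rubric category.
# Category names follow the dashboard; actions and pledges are illustrative.
RUBRIC = {
    "Data & Accountability": ["disaggregate health data", "publish progress reports"],
    "Community Engagement": ["create a community advisory board"],
    # ... the remaining seven categories, for 39 strategic actions in total
}

def score_jurisdiction(pledged_actions: set[str]) -> dict[str, int]:
    """Return how many of each category's strategic actions were pledged."""
    return {
        category: sum(action in pledged_actions for action in actions)
        for category, actions in RUBRIC.items()
    }

scores = score_jurisdiction({"disaggregate health data",
                             "create a community advisory board"})
print(scores)  # {'Data & Accountability': 1, 'Community Engagement': 1}
```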
This project required extensive research to identify pertinent government and social media web pages, both of which are particularly susceptible to link rot. The original pages live on a variety of platforms, many maintained by counties or cities that lack comprehensive archiving or data preservation policies. Yet this vulnerable body of information is exactly the evidence underrepresented groups need to point to for accountability. That made it an ideal situation in which to prototype authenticated archives, as Alex Reed, project manager of Mapping Black California, explains:
“The incorporation of encrypted and distributed archiving into our content aggregation platform is what will ensure its longevity for decades to come. The documents, images, and web pages that we were able to archive will never be destroyed and will serve as a reference point for researchers, analysts, journalists, and the public at large for as long as the internet exists.”
Using Mapping Black California’s dashboard as the basis for her investigative series, reporter Breanna Reeves examined four local governments – among the many – that passed resolutions declaring racism a public health crisis.
In the first article of the Combating Racism as a Public Health Crisis series, Holding Leaders Accountable, Reeves provides an overview of the events that led many jurisdictions across California to sign resolutions and make declarations. Her reporting focused on local governments with some of the most comprehensive declarations, meaning they addressed nearly all of the strategic actions. It also examined local governments that made declarations but have been slow to turn them into action.
The second article of the series, Santa Cruz County’s Inclusive Resolution, highlights Santa Cruz County as having a very inclusive resolution that utilized community engagement to bring many of their pledges to life.
The third, Oakland Addresses Systemic Racism with Data-driven Approach, focuses on the city’s 2022 resolution. Oakland also has a comprehensive resolution, one that acknowledged white supremacy as the root cause of racism. While the resolution may not meet every criterion outlined in Mapping Black California’s dashboard, the reporting suggests Oakland was ahead of the curve in addressing racism as a public health crisis, having created its Department of Race & Equity in 2016.
The fourth and final article of the initial series, Riverside and San Bernardino Counties Take Action Against Racism as a Public Health Crisis, analyzes jurisdictions with resolutions that are less comprehensive – or that make no mention of taking action at all. Specifically, the coverage explores how San Bernardino County was the first jurisdiction in the state to pass a resolution in 2020, but has been slow to act in the three years since.
This series showcases the power of data and verifiable evidence, and demonstrates a new methodology – and standard – for evidence-based data reporting. As Reeves shared about the experience:
“Collaborating with Starling Lab and Mapping Black California to publish the Combating Racism as a Public Health Crisis is perhaps one of the most important projects I've done thus far. As a reporter, my job is to hold those in power accountable. I believe this project did exactly that by analyzing promises made by dozens of local governments and inquiring on behalf of the communities these promises were made to.”
Framework
Mapping Black California had previously published an interactive map of official RPHC declarations by local and regional governments. This map, however, was out of date, lacked ways to present verification information about the data, and needed quantifiable measurements and comparisons. We saw this as an opportunity to implement the Starling Framework: Capture, Store, Verify.
The Challenge
Working with Starling Lab, the Black Voice News (BVN) investigative team set out to update the site and to archive and analyze data available on government websites, social media, public databases, and recordings of meetings and events.
The original version of the map links to public records scattered across the web, each at risk of disappearing whenever a site’s owner changes what is hosted at that URL. When Pew Research examined web pages from 2013, it found that 38% were no longer available a decade later; across all pages that existed at some point between 2013 and 2023, a quarter had disappeared. Beyond pages going offline, redirect links can also lead users to dead ends: Google announced that it will soon shutter its URL shortening service, breaking all of its historical links.
Throughout the reporting done for the original interactive map, an important theme emerged: Members of the public wanted to hold their leaders accountable for pledges and promises made after these resolutions were passed. Oftentimes, records which are supposed to be publicly available are in reality difficult and time-intensive to access.
It’s easy to assume that information about these official commitments would be kept available through the government itself. On paper, California has several good public records laws. In practice, entities all over the state have faced scorn over their inability or even refusal to provide a variety of public records to the public.
Even outside of government, all too often digital records are kept by singular, powerful organizations or services that can remove or modify the information at will – or charge users exorbitant fees. Vows by officials made on social media posts can be deleted or entire accounts made private.
Since this map and its data were meant to be a public-facing reference, an interface was needed that is accessible to a general audience. The tool should present an understandable and quantifiable summary of efforts towards combating racism as a public health crisis. The original map didn’t show quantifiable progress or data comparing one region to another, so it was difficult for the public to gain meaningful insight through cross-comparisons. Displaying a view that enables comparison of different jurisdictions’ efforts – with evidence to support the quantified data – is an important step towards understanding progress and holding governing bodies accountable.
Finally, in addition to official documents, the team wanted to capture content from sites like YouTube, social media, and other places where evidence might be published online. This supporting material is essential for the work of journalists and investigators who are looking for corroborating information and context around a story or investigation. It must also be preserved and displayed in a way that holds up against standards of evidence they may be subject to in the future, a necessary step when public records are maintained by fragile, poorly-resourced public jurisdictions.
The Prototype
Using the Starling Framework, we can preserve cryptographically secured versions of these documents and records to build a better data resource. By capturing, preserving, and displaying complete versions of these webpages we are able to demonstrate rich context as part of a new integrity workflow, and to showcase the value of using cryptographic signatures and registered hashes to secure provenance of captured content. This project resulted in a verifiable body of evidence, with integrity, for holding leaders accountable through reporting on facts.
For this project, Starling Lab teamed up with Black Voice News to collect about 350 websites, including social media posts, along with metadata files for each archive (a total of over 1,100 records) that supported the data shown in the interactive dashboard. Because this was such a large body of evidence, we needed a streamlined way of capturing and preserving the websites with evidence of pledges, promises, and statements. Using Browsertrix, web archiving technology developed by the Webrecorder team, in combination with Starling Integrity tools, we prototyped a new workflow for journalists and investigators to capture and preserve public data from the web.
To present the captured information, BVN and Starling Lab worked with the Esri special projects team to build an interactive data dashboard that brings a massive body of data together in one place. This dashboard aggregated and analyzed information, the authenticity of which was preserved by Starling Lab, about the progress in different categories of combating racism as a public health crisis. Having this information in one comprehensive tool enables others to understand, compare, and hold jurisdictions accountable. To ensure data redundancy and availability, the evidence files are distributed on IPFS and archived on Filecoin, which allows them to be verifiably retrieved from multiple sources in the future.

In addition to preserving and distributing replayable web archives as evidence and sources, a complete package of the data was hashed and signed at time of capture. These cryptographic hashes serve as fingerprints registered on several distributed ledgers, creating an immutable index of what exactly existed, who witnessed it, and when the data archiving occurred.
This prototype is an experiment in snapshotting web content from hundreds of sources into an authentic public record, with special consideration for preserving original source information from capture to presentation. Through this project, the team hopes to inspire new standards in journalism and other investigative research.
Technology
Capture
Archiving Content on the Web
The approach to web capture taken by Black Voice News and Starling Lab was unique. Much of the data was collected from local government websites, which may self-host video and databases that are hard to search and access without special investigative techniques. This web content is also subject to takedown and link rot as budgets change, staff turn over, and systems go unmaintained. To address the lack of access and likely disappearance of this web content, we created authenticated web archives.
Web archives created with the Webrecorder suite of tools enabled the team to capture the full context of everything that existed on the web page in a zip archive called a WACZ file. The information collected includes all content on a webpage, such as articles, comments, likes, and multimedia files such as audio and video. A WACZ file is a copy of the code and media that makes up that webpage as it appeared, including an index of what content was captured. When users later display (or “replay”) the page using certain tools, it remains fully interactive like it was at the time of capture.
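Because a WACZ file is a standard ZIP package, its structure can be inspected with ordinary tools. Here is a minimal sketch using Python’s standard library; the file name is a placeholder:

```python
import json
import zipfile

# Open a WACZ (a ZIP package) and list its contents.
with zipfile.ZipFile("example.wacz") as wacz:
    print(wacz.namelist())  # e.g. datapackage.json, pages/, archive/, indexes/

    # datapackage.json is the WACZ manifest: it lists each resource in the
    # package along with its hash, which is what makes the bundle verifiable.
    manifest = json.loads(wacz.read("datapackage.json"))
    for resource in manifest["resources"]:
        print(resource["path"], resource.get("hash"))
```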
First, most of the web archives were created with an automated tool called Browsertrix (on a custom instance operated by Starling Lab). Some were created manually with a Chrome extension called ArchiveWeb.page when websites were too complex to crawl with a bot. These tools visit a site on the web and scrape all the content loaded during the browsing session, including site code and assets.
The list of all the URLs to crawl, along with the other metadata fields, was researched and populated by the Black Voice News investigative team. In addition to specifying names for each web archive, we included several additional metadata fields. Each URL was assigned a jurisdiction and geographic identifier so it could later be correlated with the correct region in the interactive map created by Esri.

A custom Python script took the spreadsheet listing all the URLs and metadata fields and automated the Browsertrix web crawls via its API. When these tools capture an archive, they preserve the data in such a way that, when the archive is replayed, each part of the website appears as it was witnessed by the Browsertrix crawler, along with information the end user can inspect about the authenticity of the web archive.
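The script itself is not reproduced here, but a hedged sketch of that kind of automation might look like the following. The API base URL, endpoint path, and payload fields are assumptions for illustration, not Browsertrix’s documented interface:

```python
import csv
import requests

API = "https://browsertrix.example.org/api"  # placeholder instance
TOKEN = "..."  # operator credential, elided

# Read the crawl spreadsheet and queue one Browsertrix crawl per row.
with open("crawl_list.csv", newline="") as f:
    for row in csv.DictReader(f):  # columns: name, url, jurisdiction, region_id
        payload = {
            "name": row["name"],
            "config": {"seeds": [{"url": row["url"]}]},
            # Jurisdiction and geographic identifiers ride along so each
            # archive can later be matched to its region on the Esri map.
            "metadata": {"jurisdiction": row["jurisdiction"],
                         "region": row["region_id"]},
        }
        resp = requests.post(f"{API}/crawlconfigs", json=payload,
                             headers={"Authorization": f"Bearer {TOKEN}"})
        resp.raise_for_status()
        print("queued:", row["url"])
```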
Signing and Verification of Web Archives
When a WACZ record of a website was created, an immutable digital “fingerprint” called a hash was also generated. If any single byte of the data is changed – be it a pixel of an image or the timestamp of when it was collected – the hash used to verify copies of that page changes as well. It is important for a newsroom like BVN to establish this fingerprint in case the provenance and integrity of a web archive is called into question; if that happens, there is a reliable record of the hash from the moment it was recorded.
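The fingerprint property is easy to demonstrate. In this minimal sketch, SHA-256 stands in for whatever digest the WACZ tooling actually uses, and the file name is a placeholder; flipping a single bit yields an entirely different hash:

```python
import hashlib

data = open("example.wacz", "rb").read()

# Hash the original bytes, then the same bytes with the last bit flipped.
original = hashlib.sha256(data).hexdigest()
tampered = hashlib.sha256(data[:-1] + bytes([data[-1] ^ 1])).hexdigest()

print(original)
print(tampered)            # bears no resemblance to the original digest
assert original != tampered
```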
The hash representing the crawled data in the WACZ file was also cryptographically signed with one of Starling Lab’s Let’s Encrypt certificates, as the Lab operates the Browsertrix server. This signature attests to the web content as it was observed and when, and ties it to a known authority via its domain name. This cryptographic provenance is packaged into the web archive file and is described in the WACZ specification and the WACZ Signing and Verification specification.
In the case of manual crawls using the ArchiveWeb.page extension, the WACZ files were signed by a keypair generated locally by the Chrome extension. The tool signs the crawled content with a private key belonging to a verifiable identity, identifiable by the public key associated with the ArchiveWeb.page user.
For each crawled URL, the hashed, signed, and timestamped provenance bundle was packaged alongside the crawled web content into a ZIP file with a .wacz extension in the file name, and verifiable with viewers developed by Webrecorder.
Timestamping and Registering Web Archives
Once an authenticated web archive is produced, it is uploaded to the Starling Integrity pipeline.
The Starling Integrity pipeline streamlines the complex workflow of ingesting, hashing, signing, timestamping, and registering content for newsrooms like BVN, establishing records of provenance along the way. Without the pipeline, all 350 records would have to be processed manually with multiple tools. The following diagram illustrates the registration process.

As part of the Starling Integrity workflow, the hash of each WACZ file was added to OpenTimestamps servers for a proof of existence at the time of registration.
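For readers who want to see what this step looks like in practice, here is a hedged sketch using the standalone OpenTimestamps client (the `ots` command from the opentimestamps-client package) rather than the Starling pipeline itself:

```python
import subprocess

# Submit the file's hash to OpenTimestamps calendar servers; this writes a
# detached proof file alongside the original (example.wacz.ots).
subprocess.run(["ots", "stamp", "example.wacz"], check=True)

# Calendar servers anchor pending proofs in Bitcoin over the following hours.
# Once anchored, the proof can be completed and independently verified.
subprocess.run(["ots", "upgrade", "example.wacz.ots"], check=True)
subprocess.run(["ots", "verify", "example.wacz.ots"], check=True)
```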
File hashes and metadata, including the tool and process information specified in the crawl spreadsheet, were then registered on distributed ledgers to establish a public record of each web archive’s creation. This provenance information was stored and distributed with a peer-to-peer data sharing system called IPFS, while the WACZ files themselves are stored separately, so users who gain access to the web archives may verify their integrity against the blockchain registrations.
The provenance data were registered on three different blockchains: Numbers, Avalanche, and LikeCoin. These registrations form a public and immutable index that anyone using the Combating Racism data dashboard can inspect and verify against, whether the authenticity of the web content is called into question or another newsroom wants a reliable source of information when writing a story about the data in this project.
Store
Preserving Web Archives with Distributed Storage
While the Starling Integrity pipeline locks the authenticity information of web archives onto immutable public ledgers, the next step was to ensure the WACZ files themselves are properly preserved. Storage redundancy is crucial for content at risk of disappearing from the web, so Starling chose to preserve its archives on decentralized storage networks in addition to centralized systems.

The WACZ files are identified using content identifiers (CIDs), which represent the exact data of each file, and are pinned to the peer-to-peer data sharing system IPFS using Web3.Storage. This allows the files to be downloaded from different servers, but pinning alone is not long-term preservation, as IPFS servers make no promise to host the files for any particular amount of time.
Decentralized preservation of the files therefore relies on the Filecoin network. In addition to pinning to IPFS, Web3.Storage stores uploaded data on Filecoin nodes operated by several storage providers, who make collateral-backed promises to preserve the data over a specific time period.
Regardless of where one retrieves the web archives, whether that be IPFS, Filecoin, a centralized host or other decentralized networks, the authenticity of the files can be verified against blockchain registration records and/or WACZ-bundled signatures such that each crawl can be attributed to its archiver.
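To make that retrieval-and-verification loop concrete, here is a small sketch: fetch a copy by CID from a public IPFS gateway and compare its digest against an independently obtained registration record. The CID, the registered digest, and the choice of SHA-256 are placeholders for whatever the actual ledger entries record:

```python
import hashlib
import requests

CID = "bafy..."            # the archive's content identifier (placeholder)
REGISTERED_SHA256 = "..."  # digest recorded in the ledger entry (placeholder)

# Fetch the WACZ from a public gateway, then check it against the registration.
blob = requests.get(f"https://ipfs.io/ipfs/{CID}", timeout=60).content
if hashlib.sha256(blob).hexdigest() == REGISTERED_SHA256:
    print("Archive matches its registered fingerprint.")
else:
    print("Mismatch: do not trust this copy.")
```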
Verify
Screen captures are simply pixelated recreations of what an observer saw on a website, and they contain no audit trail that provides a record of what the observer captured. Embeds from social media posts can also be taken down or changed at any time, as the actual content is still hosted by the website or company that originally published it. This puts embedded versions of web content at risk of disappearing, erasing the sources of what a journalist is trying to reference in their reporting.

Self-contained WACZ files containing provenance information can be preserved and distributed without the constraints and risks of embedding third-party-hosted web content. This case study showcases the embedding of self-hosted (and redundantly preserved) WACZ files alongside additional authenticity information, which enables readers to reliably inspect the rich context captured in these web archives, as they were seen by the observers who digitally signed their crawls.
In the following sections, we will discuss the general design of the displays created to show web archives and related metadata. Next, we will discuss the implementation details for each of the two places these designs were implemented: on a data dashboard, and within news articles on a separate website.
Display Design for Web Archives and Metadata
For this project, web archives needed to be presented on two different sites: the Mapping Black California interactive data dashboard and the Black Voice News website, which are hosted on separate WordPress instances. The data dashboard is a microsite built by the Esri team, and the news site is managed by Newspack.
To assemble the body of evidence presented in the Authenticated WACZ Display on the Black Voice News website and in the displays on the Combating Racism as a Public Health Crisis data dashboard, the Black Voice News team tirelessly researched and collected content from across the web, which was then archived and integrated into a custom web component we called the wacz-lightbox.
The Starling Lab team, alongside developer Giacomo Boscani-Gilroy and Esri’s Joe Allen, prototyped a new kind of visual display for both web archives and their accompanying authenticity information. Two versions of the component were implemented. One version of this display was integrated into the data dashboard’s WordPress source code as a web component. The second version, used on the Newspack-managed news website, incorporated the web component as a WordPress plugin.
This component not only displayed the explorable WACZ file but also incorporated the metadata files related to each web archive. Each web archive has two associated metadata files. The first contains file metadata, such as the crawled URL, the identity of the crawler, and the date and time the site was crawled.
The second contains information about the publicly registered authenticity records of the web archive, such as the hashes of the WACZ file and the blockchain registration information.
Displaying Web Archives in a Data Dashboard
Using the design for the display, Joe Allen from Esri developed a customization in the Combating Racism as a Public Health Crisis data dashboard WordPress instance that enabled us to add the web component created by Giacomo Boscani-Gilroy.
This display enabled us to upload a WACZ file and its two JSON metadata files as media, create “Jurisdictions” posts that correlate to areas of the California map, and filter the data and web archives shown in the interactive dashboard. This was accomplished by modifying the PHP files of the WordPress instance to embed the web component used for displaying explorable web archives.

Once web archives are added, the reader can use the data dashboard to filter down to specific jurisdictions to reveal the relevant archives. Clicking on a web archive card will reveal an explorable interface for archived content and the WACZ file’s metadata, giving the reader all the necessary information to verify the preserved records.

Displaying Web Archives in a Managed WordPress News Site
The data dashboard built by Esri for this project was referenced in a five-part series of articles titled Combating Racism as a Public Health Crisis, published in November 2023. The team implemented a display for the web archives on the Black Voice News WordPress website, which is managed by Newspack. Newspack’s WordPress management platform does not allow direct editing of the site’s source code, as was done for the data dashboard. To work within this constraint, Starling Lab developed a WordPress plugin called starling-replay-web-page, which Newspack could add to its managed sites.
The plugin enabled the use of a shortcode containing a media ID to embed an explorable web archive onto a news article.
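For illustration only, with hypothetical syntax (the plugin’s documented usage is not reproduced here): an editor might place something like `[starling-replay-web-page id="1234"]` in a post, where the ID references the uploaded WACZ media item, and the plugin renders the explorable archive in place of the shortcode.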
Once published on the Black Voice News website, the web archive and its associated metadata are rendered for readers to explore and verify.


Verifying Metadata of a Web Archive
A reader can navigate to the “Archive” tab to view verification information related to the web archive itself.
Here is what the fields represent:
- Archive Name – The human-readable name given to the web archive based on the jurisdiction and type of content
- Original URL – The original link these webpages were archived from
- Archived On – The date that the website archive was crawled
- Observed By – The signing identity of the observer, which may be a Starling Lab SSL certificate or a public key associated with the ArchiveWeb.page user
- Package Hash – The hash of the web archive
An important part of any investigative process is the ability to cross-reference information. To create a useful data dashboard and report on racism as a public health crisis, auditable records of evidence are required. By adding information such as a human-readable name, the original URL, and information about who observed the page and when, one can more easily cross-reference similar information and track down supporting evidence.

Verifying Authenticity Information of a Web Archive
A reader can navigate to the “Registration” tab to view blockchain registration and preservation information about the web archive.
The blockchain registrations contain information specifically about the provenance of the web archives. They allow readers to navigate to records on several public blockchains to verify when the web archives were established and whether a WACZ file they have (which may be downloaded from other sources) is the authentic version crawled by Starling Lab and BVN.
Here is what the fields represent:
- Blockchain Registration – Hashes of the web archives & metadata about the archive are registered on different blockchains to establish an immutable record of what was captured and when
- ISCN on LikeCoin – Registrations on LikeCoin can be explored on ISCN, searching by the Transaction ID
- Numbers Protocol on Numbers – Registrations on Numbers can be explored with Numbers Explorer, searching by the Transaction ID
- Numbers Protocol on Avalanche – Registrations on Avalanche can be explored on Snowtrace, searching by the Transaction ID
- Storage and Archiving – Copies of these web archives were stored in a resilient, peer-to-peer system (IPFS), and archived in a long term crypto-incentivized distributed storage system (Filecoin)
- IPFS CID – The hash-based content identifier of the web archive; if even a tiny detail (from a pixel to a character in a document) in the WACZ file changes, this identifier will change
- Filecoin Piece CID – A unique identifier that can be used to locate the web archive stored on Filecoin
- Download Archive – This enables readers to download the WACZ file, which they can use to produce a hash and verify against the blockchain records
If a reader chooses to download the WACZ file, they can use ReplayWeb.page to explore it without using the BVN website. Should the BVN website become unavailable in the future, the reader can still explore the web archive should they need it for their own investigative or reporting work. They can also use the blockchain registrations to establish provenance of the archives.
Learnings
Data Preparation and Publishing Workflow
To capture and archive web content, we used the Starling Integrity Backend along with a custom instance of the Browsertrix crawler. Using a Python script, we ingested a spreadsheet of URLs and relevant metadata, and the crawler produced WACZ web archives, which were then processed by the pipeline. This process also generated two JSON files containing the metadata shown in the Archive and Registration tabs of the web component. These files needed to be manually associated with each WACZ file when publishing to WordPress.
The Starling Integrity Backend was designed to archive files sent to it, but it lacks sophisticated features for aiding publishing workflows. For example, retrieving the WACZ file that resulted from crawling a particular URL, or querying for every WACZ file with a particular attribute, would be very helpful to analysts and publishers. However, these are not simple tasks to perform on archival data, and finding the right files often required assistance from the engineering team.
In this case study, the team learned some of the feature requirements for designing a backend and processing workflow that can handle the structuring, processing, and querying of data. For future projects, a new database is being developed to hold content and archival metadata. A user will then be able to query metadata, both auto-generated in the archival process (e.g., a blockchain registration transaction) and human-entered fields (e.g., a collection name), all while maintaining metadata integrity. Unlike records in traditional databases, these data are cryptographically signed and timestamped, and additions and changes are tracked. This database, alongside a CLI tool to support the processing of data and the updating of metadata associated with a content identifier, will enable the Lab to create more streamlined and cohesive media collections and archives. The data can be exported and formatted for integration with standard publishing workflows.
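As a conceptual sketch of that tamper-evidence property (not the Lab’s implementation, which also signs and timestamps entries), each addition or update can be appended as a record whose hash covers the previous record, so any later edit to history is detectable:

```python
import hashlib
import json
import time

log = []  # append-only metadata log

def append_record(cid: str, fields: dict) -> None:
    """Append a metadata record whose hash chains to the previous record."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {"cid": cid, "fields": fields, "time": time.time(), "prev": prev_hash}
    entry_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "entry_hash": entry_hash})

append_record("bafy...example", {"collection": "RPHC",
                                 "jurisdiction": "Santa Cruz County"})
append_record("bafy...example", {"collection": "RPHC", "note": "field updated"})

# Recomputing the chain from the first entry detects any edit to past records.
```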
Displaying Web Archives for Verification
On the BVN news site, the display is designed to show archives captured in a desktop web browser, so it is not very mobile friendly. The smaller the screen, the more difficult the embed is to view: it displays a full-size webpage that doesn’t scale down to smaller browsers, and the info tooltips that describe the metadata and provenance information are also difficult to see.
Creating a display that allows experts and readers to inspect and verify content, without overwhelming them with too much information or a confusing interface, continues to be a challenge both for the Lab, and other organizations that are creating authenticated, verifiable content.
Embedded ads and subscription popups on news sites add to the number of elements on a page, making it difficult to see and assess a web archive. Displaying all of this information made the page feel crowded, overwhelming readers. An improved design that both distinguishes the embed on the page and offers more mobile-responsive features would be a welcome addition to future versions.
The version of the web archive embedded in the Combating Racism data dashboard, however, didn’t have to compete for space with article text and had a much cleaner display.
WordPress Plugin Development
Initially, the team developed a web component and planned on integrating it directly into the PHP source code for the WordPress websites. However, these sites are managed by Newspack, which requested that we instead bundle the web component as a WordPress plugin. As Starling Lab had previously created WordPress plugins, it was relatively easy to assist Giacomo in implementing the custom web component for displaying WACZ files as a new plugin.
Working with service provider-managed platforms, however, is something that is common to many newsrooms. When creating new features, it is important to understand from the beginning what the restrictions and requirements may be from third parties who are involved in the development, hosting, and technical maintenance of news sites.
Archive
Original, archived interactive map display: https://mappingblackca.com/project/rcphc/
New updated dashboard: https://combatingracism.com/
Links to News Articles
- Landing Page: Combating Racism as a Public Health Crisis
- Combating Racism as a Public Health Crisis, Part 1: Holding Leaders Accountable
- Combating Racism as a Public Health Crisis, Part 2: Santa Cruz County’s Inclusive Resolution
- Combating Racism as a Public Health Crisis, Part 3: Oakland Addresses Systemic Racism with Data-driven Approach
- Combating Racism as a Public Health Crisis, Part 4: Riverside and San Bernardino Counties Take Action Against Racism as a Public Health Crisis
Creating the First Cryptographic Archive for a War Crimes Investigation
The Advanced Tech Breaking Open A War Crimes Investigation
A first-of-its-kind cryptographic archive published by Rolling Stone helped reopen a 30-year-old cold case. Released at the dawn of consumer-available generative AI, it illuminates new paths to overcome a range of modern challenges including denialism, deepfakes, and link rot.
Adam Rose · Reading time: 10 min
Context
How can you trust that a news photograph is real?
We looked into the future of investigative journalism — by looking into the past.
In 1992, as civilians were killed in the streets, Ron Haviv captured the defining image of the Balkan War. But for some, seeing was not believing.
It had been three decades without justice or accountability for the soldier in this frame. He appears callous, flicking a cigarette as his boot – suspended in time – hangs over innocent civilians who were later found dead.
(This case study crops out the more graphic portions of the images.)
The photo has long been subject to denialism. It went “viral” – or what would have been the equivalent in 1992 – after it first ran in Time magazine. The image was soon used in a TV news report to confront the general who oversaw the soldier’s paramilitary unit. He denied that it showed what it clearly did.
More recently, when Russia first invaded Crimea in 2014, the image reemerged online paired with false claims that it depicted a Ukrainian soldier committing war crimes against civilians. This deception through false captions represents a more common twist on the deepfake – the “cheapfake.”
“People have said it's fake. People have said I set it up. People have said these people aren't dying. Captions have been changed. So there must be a way to authenticate what photographers are seeing.” – Ron Haviv
In recent years, society has been introduced to groundbreaking new technologies. Some, like generative AI, made it even harder to trust our own eyes.
But other technologies can help us restore that trust.
In The DJ and the War Crimes, Rolling Stone published a first-of-its-kind cryptographic archive. It authenticated Haviv’s photos, along with hundreds of other records from the conflict. Classic investigative and documentary journalism was paired with an immersive microsite.
We invited audiences to become the investigator.
Scope
Starling Lab’s collaboration with Rolling Stone exemplifies our work. Along with the world-class investigation, we teamed up to publish a unique front-end journalism archive that lets readers explore evidence in a way never available to audiences before. Meanwhile, the equally innovative back-end ensured that this evidence would be resilient against efforts to undermine its credibility or availability.
This particular story began with the iconic photo taken by Haviv in Bijeljina, Bosnia. Ultimately, the journalistic piece is a classic whodunit. The reporting team, led by Sophia Jones, set out to answer the lingering question: “Who is the soldier in this iconic photo depicting an alleged war crime?”
The result included all the classic elements of investigative journalism. Over 11 months, reporters collected a staggering amount of evidence from firsthand witnesses, archives, and on-the-ground reporting.
The scale of the investigation was matched by the team involved. Jones led fellow reporters Nidžara Ahmetašević and Milivoje Pantović, as well as other local freelancers and photographers who opted to remain anonymous for their safety. At least two dozen individuals played a role, including editorial staff at Rolling Stone, contributors and engineers from Starling Lab, and designers from Gladeye.
Haviv always hoped that his photojournalism would lead to accountability. Not only did the investigation focus on his famous image, it included several others from his rolls of film that day which had never been published before.
The group wanted to ensure that ephemeral media – whether aging film slides or social media posts – could be preserved and made available to others, along with all the evidence in the investigation. Our accounting identified nearly 2,000 documents (PDFs), 40 key images, and 183 web archives. The last of these are more than screenshots: they are a more robust way of capturing information on the internet – information very much at risk of deletion or link rot.
Framework
We think about three important stages of a piece of digital media's life cycle – Capture, Store, Verify – and address each through the Starling Framework.
- The Capture phase is based around a cryptographic root of trust. How do we embed verifiable metadata into documents?
- Next, how do we Store this material? Digitization is not the same as preservation.
- Finally, audiences must be able to Verify what they see.
We asked: How might we use advanced cryptographic systems to verify and secure assets and their metadata (like the date and time)? In the end, we applied over a dozen technologies to capture, store, and verify these digital assets: C2PA content credentials, a custom capture app paired with a Canon 1Dx Mark III, the ProofMode authenticated capture app, Webrecorder website archiving, PGP encryption, Authsign, "ZK" proofs, and preservation on blockchains (OTS/Bitcoin, Avalanche, ISCN/Likecoin, IPFS, Filecoin, Storj).
Technology
Capture – Film Slides
The physical slide seen below was developed from the actual film in the camera used by Haviv in Bijeljina. That’s his hand holding the paper frame. In order to bring these images into the 21st Century, we had to digitize several of them. Haviv is depicted inserting one into physical bellows, which hold it in place for a modern digital camera to capture in high fidelity.
The camera – a Canon 1Dx Mark III – was equipped with special firmware provided by the manufacturer so that we could tether it to a mobile phone.
The phone – an HTC Exodus 1 – came from the manufacturer with a cryptographically secure chip. While popularized as a “wallet” by cryptocurrency fans, the chip had far more utility for our use case: authenticating the data from digitized photos.
A custom app on the phone, developed by Numbers, allowed us to take the digital equivalent of a fingerprint, add our own digital signature, and then record all of it in a virtual fingerprint registry.
In technical terms, we used a secure enclave, hashed and signed each file, and registered the results on a blockchain. This allowed us to establish a cryptographic “root of trust.”
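For readers who want to picture that flow, here is a minimal sketch (our own illustration, not the Numbers app’s actual code), assuming Ed25519 signatures and using a plain dictionary as a stand-in for the on-chain registry:

```python
# A minimal sketch of the hash -> sign -> register flow described above.
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519

def fingerprint(path: str) -> str:
    """Compute the SHA-256 'digital fingerprint' of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# In production the private key lives in the phone's secure enclave;
# here we generate one in software purely for illustration.
private_key = ed25519.Ed25519PrivateKey.generate()

registry = {}  # stand-in for the blockchain fingerprint registry

def register(path: str) -> None:
    digest = fingerprint(path)
    signature = private_key.sign(digest.encode())
    registry[digest] = signature.hex()  # immutable entry in the real system

register("digitized_slide.tif")  # hypothetical filename
```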
Along the way we wanted to add additional authenticity markers. Fortunately, we had access to the one person who could make an attestation about which version was his own original film slide – not the ones that had been manipulated or lied about on the internet in recent decades.
Haviv’s personal testimony on video is more relatable to audiences than the 1s and 0s under the hood. Even our most modern tech-forward approach can’t exclude humans from the process. To the contrary, we sought ways to include them, their corroborations, and acknowledgements of their role.
https://www.youtube.com/watch?v=4UWieqM_s_Y
Capture – Payroll PDFs on UN Servers
Another crucial piece of evidence was a collection of payroll records from the paramilitary unit at the center of the alleged atrocity. These first surfaced in a war crimes tribunal, and copies were eventually stored on United Nations servers. Unfortunately, custodians told our reporters that these were “confidential” and declined to release them. Fortunately, those same reporters kept searching and found them sitting in the clear (i.e., unencrypted) on the UN’s public servers. This led to a pair of challenges that technology was able to overcome.
First, the team had to assume that after publication the record keepers might respond by taking the documents offline. Beyond mere link rot, the history of denialism around this unit raised legitimate concerns that someone might dismiss our version of the records as “fake.” With AI making it ever easier to generate assets, our team was concerned about the general public being conditioned to doubt that anything is real – a phenomenon psychologists label the “liar’s dividend.”
To ensure proper preservation, we used Webrecorder, a free and open-source tool that works as an extension on any Chrome-based browser. No mere screenshot or “print” feature, it carefully captures all data and metadata during a browsing session. This means you can relive the browsing experience as it was when journalists visited the site – even offline, years later. As part of the replay experience, links on a saved webpage can be clicked, and the linked pages can be read (assuming the initial investigator made sure to capture each page). Webrecorder backs this with cryptographically secure proof that the files came from specific servers, which we could use to demonstrate the materials indeed came from the UN’s tribunal archives.
Verify - Redacting Names From Payroll Records
Second, as a matter of good journalistic practice, it was important to redact names from the payroll records if someone wasn’t a subject of the investigation. If a person was a cook or truck driver and had no involvement with the atrocities, it wouldn’t be appropriate to put them through the same level of public scrutiny.
In order to redact the records, we turned to Trisha Datta, a PhD student at Stanford advised by Professor Dan Boneh, one of Starling’s principal investigators. She implemented a zero-knowledge proof, which uses cryptography to guarantee that the only changes made to the published PDFs were the additions of black boxes over specific pixels. This means that someone with a sophisticated understanding (like an expert witness in court) can literally check our math, confirming nothing else was manipulated from the moment data left UN servers to the time the redacted records are viewed on your screen.
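The zero-knowledge proof itself is too involved for a short example, but the property it guarantees can be illustrated with a plain, non-zero-knowledge check (a sketch of our own; note this naive version requires access to the original, which is precisely what the ZK construction avoids):

```python
# Outside the declared redaction boxes, every pixel must be identical.
from PIL import Image  # pip install Pillow

def only_redactions_changed(original_path, redacted_path, boxes):
    """boxes: list of (left, top, right, bottom) redaction rectangles."""
    orig = Image.open(original_path).convert("RGB")
    red = Image.open(redacted_path).convert("RGB")
    if orig.size != red.size:
        return False
    width, height = orig.size
    for x in range(width):
        for y in range(height):
            if any(l <= x < r and t <= y < b for (l, t, r, b) in boxes):
                continue  # inside a declared redaction box; may differ
            if orig.getpixel((x, y)) != red.getpixel((x, y)):
                return False  # an undeclared change was made
    return True
```

The ZK proof delivers the same guarantee to a verifier who never sees the unredacted original.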
Capture - Social Media (OSINT)
Webrecorder was also a powerful tool to capture social media, like this Instagram post. The one person we didn’t blur out happens to be one of the individuals named on the payroll records. Jones did a masterful job tracking down and identifying the connections that different people from the unit still had in the modern day.
Of course, social media can be ephemeral for many reasons. We’ve seen how mercurial CEOs can cause content to disappear. Users may set their accounts to private or delete individual posts. When the story went live, there was a real risk that accounts might go dark and make it challenging for prosecutors to reconstruct who was connected to whom.
The subject in the photo with wolves is one of several individuals who appeared to be leading a life of impunity. He was spotted in photos on various sites associating with the Night Wolves, a notorious Russian motorcycle gang that supported Russia’s initial invasion of Crimea.
Verify - User Experience For The Archive
The story’s archive lets audiences explore the evidence for themselves. All the documents are spread out so that anyone can build their own associations between them – social media posts, news clippings, payroll records, photographs scanned from 30 years ago, and more recent images captured in the current day. Any of these assets might be vulnerable, but because they are all stored and secured using decentralized technologies (more on that later) it empowers us to ensure they survive and are not prone to future denialism.
Each one of those assets – whether seen in the narrative’s main story, featurettes, or full archive – has an “eye” icon in the lower-left corner that when clicked opens up an authentication certificate. It’s a similar idea to the Content Credentials being popularized by major industry leaders through the Coalition for Content Provenance and Authenticity (C2PA). This coalition includes Adobe, Google, Meta, Canon, Sony, BBC, and many more hardware manufacturers, software developers, and media companies.
C2PA is a technical standard that produces a “manifest” of authenticity metadata for a digital asset. We were also able to incorporate it into each item for this project.
In the authentication certificate you can see how we establish different chains of trust, inspect the asset, and examine additional details. Readers again see the Starling Framework – Capture, Store, Verify – as applied to an individual asset.

In the capture phase, we indicate where the original version was registered on a variety of blockchains – OpenTimestamps, Numbers Avalanche, and ISCN. Each is linked so that you can confirm the underlying transaction with an on-chain explorer. (For the non-technical audience, this is simply where we put the fingerprints of each file into a registry – and importantly, the registry is cryptographically secure so that no one can edit it later.)
In the storage phase, copies were preserved using decentralized storage systems like IPFS, Filecoin, and Storj. These are designed to offer a resilient, tamper-evident, and censor-resistant alternative to common corporate cloud-based systems like Google Drive, Microsoft Azure, Amazon Web Services, or Dropbox.
Under the thumbnail of each asset are a pair of buttons for the “full manifest” (which includes code for tech-savvy verifiers) and “inspect.” The latter allows audiences to examine changes to the evidence through a verification tool on the Content Credentials site. It provides a visual and user-friendly way of exploring the C2PA metadata – and lets anyone audit the edit history of an asset.
Any photo published in the news is likely to be edited somehow. We’re all used to cameras on our phones processing an image, often thought of as a filter. And in a journalistic project, it’s normal to do cropping and color correcting. We can think of these as permissible edits – but in journalism they should be transparent.
Adobe Photoshop was used to make all the photo adjustments in this story. Anyone who uses that software can go into preferences to turn on Content Credentials before starting an edit, and that generates metadata about the changes certified by Adobe itself.
The Verify tool allows you to explore those verified edits, either side-by-side or overlaid with a slider. Looking closer at the first edit to this image using the side-by-side view, there’s a noticeable black border around the top version – an artifact from the digitization process. The paper frame of film slides, which you’ll recall Haviv held to insert them in the bellows, doesn’t allow light to penetrate. The existence of the border is a clear indicator to the newsroom that the complete slide was digitized – but it’s also something a page designer wants to send to the virtual cutting-room floor.
The second edit to the photo, seen with a slider, confirms that color correction was relatively minor. Notice the sky’s lighter exposure to the left of the slider compared to the right.
Verified with your own eyes, you can be assured the changes were helpful – not a deceptive wholesale change like adding a bayonet.
Store - Where Does It All Go?
As seen earlier in the authentication certificates, all these assets were stored using decentralized networks like IPFS, Filecoin, and Storj. These offer promising alternatives to entrusting assets with one of the few megacorporations that dominate the commercial storage market.
Traditional digital storage uses “location addressing,” meaning a file can be located only by knowing the specific path through a directory and subdirectories (think of folders nested inside of other folders on your computer). It’s the same concept as a website address structured as www.website.com/directory/subdirectory/subsubdirectory/file.pdf. If “subdirectory” is renamed “subdirectory2,” everything breaks. In contrast, these innovative decentralized systems use “content addressing.” This approach starts by taking a digital fingerprint from the file, resulting in a code that can be about 64 characters long. When you later look for the file by searching for its unique code, the system checks for a matching copy anywhere on the internet. Ultimately you don’t care where it finds that copy, only that it’s a 100% perfect match as proven by cryptography.
The implications are profound. If a file is destroyed at a data center due to a natural disaster or an authoritarian government raid, you can confirm a full and authentic recovery when anyone else – even an unknown stranger – hands you a copy matching the original file’s fingerprint.
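To make the fingerprint idea concrete, here is a minimal sketch of content addressing (our assumption for illustration; real IPFS CIDs encode the hash with additional metadata, but the 64-character hex digest conveys the concept):

```python
# The file's identity is its fingerprint, not its location.
import hashlib

def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()  # 64 hex characters

record = b"payroll page 1 ..."  # hypothetical file contents
cid_like = content_address(record)

# Any copy from any stranger can be checked against the known fingerprint:
def is_authentic(copy: bytes, expected: str) -> bool:
    return content_address(copy) == expected

assert is_authentic(record, cid_like)
```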
Instead of requiring you to place your trust in a centralized (and potentially corruptible) authority, this is considered a “trustless” system.
This fundamental design shift is a potential long-term solution to link rot and related problems that have plagued newsrooms, archives, and even Supreme Court opinions.
Capture - Live From Novi Sad
The authentication certificate for this final photo shows a colorful scene. It also demonstrates additional ways to authenticate journalistic work in the field.
The person at the center of this modern day image is the same person in the middle of Haviv’s iconic image of a war crime in progress. In between, he went on to a notable career as a DJ, playing in European festivals over the years with crowd sizes comparable to Coachella.
This particular image came from an event where he was spinning in Novi Sad, Serbia. It depicts his life of impunity, as there had been no accountability for the war crimes decades before. This was one of several images and videos captured by a freelancer we hired to cover the event. However, we didn’t have a go-to person in Serbia. With the show scheduled days after we caught word of it, how could we quickly develop trust in a total stranger? This became more sensitive when the most promising local freelancer asked to remain anonymous, not wanting his name to be on the radar of his neighborhood war criminal.
Technology once again provided a solution. We decided to use ProofMode, developed by the Guardian Project: a free and open-source app that can be installed on any iOS or Android phone. It uses software signatures to authenticate each image as it’s taken, including a full C2PA manifest. This solution worked especially well for the freelancer, as it cost nothing to set up and allowed him to operate discreetly with a phone – looking like any other partygoer. When he returned to his car in the parking lot, he was able to upload images with manifests, allowing us to confirm we were seeing the same moments as they were witnessed by the lens of his phone.
Learnings
Data Can Disappear
Not long before his set in Serbia, the subject of this story had started a new Instagram account and was leaving more of a digital trail than ever. Hours after the story published, he set it to private.
Fortunately, we have most of that account saved thanks to Webrecorder and our content-addressed, decentralized archives. All of that evidence remains available to investigators even though it’s no longer available to the public on Instagram.
Those subjects and their relationships – on social media and in real life – are the heart of any investigation. Here we see the primary subject on the top left, still enjoying his cigarettes. On the far right is a younger version of the man seen posing with wolves. Another person in this network carried the casket of the unit’s general at his funeral. They are nodes in a social network – and, poignantly, now captured on nodes in our decentralized archive.
Where There Is Tech, There Is Hope
Days after this story was released, local prosecutors reopened this 30-year-old cold case. Hopefully we will eventually see justice for the victims.
Over the next few months, the story earned a number of journalism industry accolades. Awards juries recognized the fine investigative work, and they also recognized the need to authenticate and archive. Starling’s hope is that these approaches illuminate new paths for all newsrooms to pursue and inspire new implementations that will benefit society and restore trust in journalism.
Archive
Links to News Articles
- Rolling Stone story
- 9-min mini-doc
- Direct link to the archive
- Featurettes: The Photograph • The Document • The Network
- Gladeye case study
Credits
A large team within Rolling Stone was led by Executive Editor Sean Woods, Creative Director Joseph Hutchinson, and Digital Director Lisa Tozzi.
A world-renowned design firm, Gladeye, handled the custom site development, overseen by Tarver Graham and implemented by Nathan Walker.
Starling’s work was spearheaded by Jonathan Dotan, Adam Rose, Benedict Lau, Yurko Jaremko, and Josh Lee.
Narrative Watch

Does bodycam footage promote accountability for police use of force? We build an authenticated survey questioning videos and claims, and archive the original material for the long term.
Background
Over the past decade, there’s been a significant push for police departments to adopt body cameras in the hope of creating accountability and providing a tool for reform. This collaborative project confronts the difficulty of establishing facts from and interrogating bodycam footage, and works to cryptographically authenticate and archive primary and supporting material in order to protect vulnerable public records. This case study follows the publication of news articles detailing our findings, and is accompanied by a survey of criminal justice experts investigating the relative absence of the reforms many advocates expected.
Context
Since 2020, the Starling Lab for Data Integrity has awarded fellowships to journalists who collaborate on case studies exploring the integration of technologies used to capture, store, and publish authenticated media. This project, called Narrative Watch, is one such fellowship between the Lab and two reporting groups: The Grio and Big Local News.
Launched in 2016, The Grio is an American television network and website with news, opinion, entertainment, and video content geared toward Black Americans. Big Local News, a Stanford University-based team, led the research into problems around access to public records and data for journalists working on policing, public health, and more. The group works to develop tools to help journalists access, analyze, publish, and archive data.
Narrative Watch culminated in December 2023 with the publication by The Grio of two articles about police body cam footage, leveraging technical authentication tools developed by Starling. Both articles addressed the difficulty in using and obtaining body cam footage from police departments, despite regulations put in place aimed at facilitating police accountability.
While this project was a collaboration between the three teams, the following individuals were most notably involved:
- Big Local News is led by Cheryl Phillips (a 2023 Starling Journalism Fellow), with Senior Data Scientist Eric Sagara acting as technical lead on the project.
- The Grio’s SVP and Chief Content Officer Geraldine Moriba oversaw the project, with additional reporting and editing support provided by Natasha Alford and Josiah Bates.
- Starling’s work was overseen by Journalism Fellowship Director Ann Grimes and Project Manager Lindsay Walker.
- Related work around this subject was done by the California Reporting Project. Additional research and reporting support was provided by Dana Amihere, Dilcia Mercedes, Lisa Seyton, Irene Casado Sanchez, Lisa Pickoff White, and Ananya Tiwari.
The Narrative Watch project made public records requests to police departments for Use of Force cases in which individuals were seriously injured or killed. The team gathered reports, photos, audio, and video, focusing on cases involving claims like "I feared for my life" or "I was attacked." The final project involved three data sets that could be cross-referenced with police reports. A survey of criminal justice experts was also conducted to see if consensus could be reached on disputed details from body cam footage.
One key set of records involved the 2023 beating of Tyre Nichols in Memphis; these records were archived by the Starling Lab for Data Integrity using advanced authentication technologies. These tools preserve vulnerable public records, ensuring their accuracy and availability, especially in combating misinformation. As civil rights attorney Benjamin Crump explained, many states delay video release, leaving families without access to critical evidence. By using cryptographic methods and decentralized systems, public records can now be safeguarded against manipulation, even as the rise of AI makes such concerns more pressing.
Framework
Our guiding principle at Starling is to establish provenance as the backbone of authenticity, and to cryptographically secure the integrity of digital content. To do so, the Lab applied our three-step framework in this implementation: Capture, Store, Verify.
- Capture: Most of the ‘capturing’ involved researchers at Big Local News sending out public records requests to police departments. Though we couldn’t add authenticity at the point of capture (say, from a police camera, or when footage was entered into evidence), we used a leading web archiving tool, Webrecorder, to capture a small selection of public-facing documents related to the case study, such as the edited versions of the video used as part of the survey.
- Store: Police departments sent us hours of video, police reports, images, and more, but only edited versions of the police videos were included in the survey. We decided to cryptographically preserve public-facing files and to store them on IPFS and Filecoin to create a public record that can be recovered and verified in the future.
- Verify: As much of the content from this study is sensitive in nature, we didn’t release or archive most versions of this content. The archives of websites and four video files used as part of the survey can be downloaded and inspected. Most notably, the integrity and provenance of the files can be verified by checking their cryptographic hashes and digital signatures, respectively.
The Challenge & Prototype
A central focus of this project was the archiving and preservation of crucial public records. Starling played a vital role in the preservation of all materials received through public records requests, ensuring their long-term accessibility. These materials, obtained from police departments nationwide, included body camera footage, audio recordings, photos, and police reports related to Use of Force incidents where individuals were seriously injured or killed. Once received, all materials were authenticated and preserved to maintain their integrity and ensure they could be used for cross-referencing with police reports.
A significant component of the project’s work was the preservation of materials related to the high-profile Tyre Nichols case. Records sourced from the City of Memphis website and Vimeo account were meticulously archived and authenticated by Starling. Through the use of emerging cryptographic methodologies and decentralized systems, these public records are now safeguarded against potential disappearance and remain accessible for public review. This ensures that the authenticity and source of these historically significant materials can always be verified, providing a critical layer of transparency and accountability.
Following preservation of the material, the next challenge was to highlight discrepancies between written police reports and body camera footage. At present, police incident footage plays a limited role in law enforcement investigations or disciplinary actions. To test hypotheses as to why, the team selected cases based on claims by officers about their reasons for the use of force, such as statements like “I feared for my life” or “I was attacked.” These cases were from Memphis, TN, Richmond, CA, and Rochester, NY, and involved incidents that were heavily documented with audio and video recordings.
This project sought to investigate how authenticated text and video could be utilized to verify the factual accuracy of reports and clarify critical statements such as, "Did the subject receive a command?", "Was the subject visibly armed with a firearm?", "Did the officers use force?", or "Did the subject engage in a physical struggle?" To address this, the team conducted a survey involving various experts to determine if a consensus could be reached on these points, where the facts were often unclear. Notably, the experts found it difficult to reach agreement on many of these factual elements.
Read our technical dispatch on this preservation by clicking here.
Technology
Capture
Starling Certificates
This project focused on archiving verified content. The team used Starling Signing Certificates to attest to the authenticity of captured data, then registered that content on decentralized networks, in order to create an audit trail (or “chain of custody”) that identifies the history (known as provenance) of a piece of digital content. This serves to reduce information uncertainty and bolster trust in digital records. In the process of creating audit trails, Starling Lab surfaces metadata, which is data about digital media.
Hashing and Signing
When Starling archived these public records, an immutable digital ‘fingerprint’ called a hash was created using a mathematical formula, establishing a snapshot of the records captured. If any single byte of the data is changed – be it a pixel of an image or the timestamp of when it was collected – one can use the hash to verify that a copy is different, which indicates the copy may have been manipulated. A hash mismatch is an indicator of altered data; it functions like a tamper-evident seal. A hash protects the original version of a public record, and it is preserved as part of the record we store so it can be used down the line if the sources of this research – the videos and archives of webpages – are called into question.
Because a hash is nearly impossible to fake, we can sign it with a cryptographic signature to establish exactly what version of a record was created and when. Using an asymmetric cryptographic key makes it possible to sign these records with a known identity, much as a witness signs a document stating that they have seen something and can attest to its authenticity. Hashes are also used to create the CIDs (Content Identifiers) used in IPFS, and as general identifiers to name the files containing the original videos, websites, and related metadata.
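A minimal sketch of what verification looks like from the receiving end, assuming Ed25519 signatures (an illustration, not Starling’s production code):

```python
# Check both the tamper-evident seal (the hash) and the witness identity
# (the signature over that hash).
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

def verify_record(data: bytes, expected_hash: str, signature: bytes,
                  public_key: ed25519.Ed25519PublicKey) -> bool:
    # 1. Hash mismatch -> the copy was altered somewhere along the line.
    if hashlib.sha256(data).hexdigest() != expected_hash:
        return False
    # 2. Signature check -> the hash was attested by the holder of the key.
    try:
        public_key.verify(signature, expected_hash.encode())
        return True
    except InvalidSignature:
        return False
```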

Webrecorder Tools
Web archives created with the Webrecorder suite of tools enabled the team to capture the full context of everything that existed on the web page in a zipped archive called a WACZ file. The information collected includes all content on a webpage, such as articles, comments, likes, and other multimedia. A WACZ file is a copy of the code and media that makes up that webpage, including an index of what content was captured. When users later display (or “replay”) the page using certain tools, it remains fully interactive like it was at the time of capture.
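Because a WACZ file is a ZIP archive under the hood, its packing list can be inspected with standard tools. A hedged sketch (member names follow the WACZ specification, in which datapackage.json serves as the manifest of captured resources):

```python
# List a WACZ archive's contents and print each resource's path and hash.
import json
import zipfile

def inspect_wacz(path: str) -> None:
    with zipfile.ZipFile(path) as wacz:
        print(wacz.namelist())  # every captured resource and index file
        package = json.loads(wacz.read("datapackage.json"))
        for resource in package.get("resources", []):
            print(resource.get("path"), resource.get("hash"))

inspect_wacz("capture.wacz")  # hypothetical filename
```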
Store
Distributed Storage
In addition to blockchain registration, the WACZ files are given content identifiers, or CIDs, and pinned in the peer-to-peer data sharing system called IPFS using web3.storage. This service also packaged and created archives of the WACZ files on the Filecoin network.
Filecoin storage involves long-term agreements with Filecoin providers who store this data. By experimenting with immutable ledgers to register digital content, Starling enables experts to audit, or verify, the provenance and integrity of that content. Users can inspect these deals and the identity of the nodes archiving the data.
Verify
Blockchain Registration
These records – or rather, the hashes of their content – were also registered on various blockchains to establish exactly what content existed and when. Manifests, or metadata records for each of the files, were created that include the hash of the digital media alongside the transaction IDs of the registrations on public blockchains, so users can validate what content existed and when we established the record of its existence.
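To make the structure concrete, here is a hedged sketch of such a manifest record; the field names and transaction IDs are hypothetical stand-ins, not the exact schema used in the registrations:

```python
# Build a simple manifest pairing a file's hash with its registrations.
import hashlib
import json

def build_manifest(data: bytes, registrations: dict) -> str:
    manifest = {
        "sha256": hashlib.sha256(data).hexdigest(),  # tamper-evident seal
        "registrations": registrations,              # chain -> transaction ID
    }
    return json.dumps(manifest, indent=2)

print(build_manifest(b"...video bytes...", {
    "numbers": "0xABC...",    # hypothetical transaction IDs
    "avalanche": "0xDEF...",
    "likecoin": "iscn://...",
}))
```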
For example, this registration on the Numbers blockchain of one of the videos posted of Memphis PD body cams contains a pointer to the metadata file on IPFS labeled “assetTreeCid”. In the metadata file you will find both a CID for the original content and a Content ID for the entire archive. Content was also stored and registered on Avalanche, a fast, decentralized, open-source blockchain that offers smart contract functionality, and on LikeCoin, a go-to chain for decentralized publishing. After the manifests were created and registered, they were stored in a distributed, peer-to-peer data sharing system called IPFS (InterPlanetary File System), from which users can download and view copies of some of the archives (only the public-facing archives – the websites – are available to inspect).

Validate that the videos and web pages you have copies of are the same ones registered on the blockchains by looking at the metadata record called "assetTreeCid". You can view this record by using an IPFS gateway with one of the assetTreeCids from a blockchain registration record, such as https://ipfs-pin.numbersprotocol.io/ipfs/<assetTreeCid>.
Example: https://ipfs-pin.numbersprotocol.io/ipfs/bafkreiauefa46ksrrtvws7g6wszvck7jaf5oorwb3wf7dv7dykyljo3kq4
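A short sketch of scripting that same lookup (assumes the third-party requests package and that the record is served as JSON; the gateway URL and CID are the ones shown above):

```python
# Fetch an assetTreeCid record through the public gateway and inspect it.
import requests

GATEWAY = "https://ipfs-pin.numbersprotocol.io/ipfs/"
asset_tree_cid = "bafkreiauefa46ksrrtvws7g6wszvck7jaf5oorwb3wf7dv7dykyljo3kq4"

resp = requests.get(GATEWAY + asset_tree_cid, timeout=30)
resp.raise_for_status()
metadata = resp.json()  # includes CIDs for the original content and archive
print(metadata)
```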
Learnings
From Cheryl Phillips, Director of the Big Local News program, Stanford University Department of Communication:
“We learned a lot during this project about the potential of cryptographic tools to assist in preserving immutable records of evidence – especially public records at risk of disappearing or being unavailable through other means. Especially now, in this era of generative AI, these tools are key to combating mis- and disinformation. By using emerging cryptographic methodologies and decentralized systems, these public records can now be protected from disappearance and made available for the public to review, should the source or authenticity of the digital media be called into question. This is very important. A lot went into pulling together the data used in this report: Time-consuming, costly FOIA requests to municipalities that were slow to respond.”
Analyzing and Selecting from a Large Set of Data
“Once we collected police reports and body cam footage, it also took a lot of time to identify a selection of cases we wanted to consider as candidates for our analysis. We had to connect incident reports to body cam footage (we wanted to have both the officer’s report and their footage). We had to break down the narratives into individual sentences – essentially statements of fact. We then had to compare those statements of fact to other reports filed by officers who also responded, keeping an eye on where those reports diverged from each other. We had to review where those gaps occurred in the body cam footage. Then we had to isolate the interesting statements we wanted to fact-check, drilling down to a small number of cases. Once we received records, key technical hurdles included connecting video to incident reports, figuring out a method to filter key points, working with the tech team to understand and streamline the authentication workflow, and figuring out how best to present the video and statements.
Body Cam Footage Interpretation
“Finally, more challenges arose when viewing police body cam footage – which is hard to interpret, as our survey of experts showed: once we narrowed down the footage to three municipalities, our panel of 10 experts could not agree on what they saw and heard in that footage.
As reported by The Grio: “They had the most difficulty determining whether or not subjects were armed or even holding anything in their hands. In most cases, they couldn’t agree on whether subjects complied with police commands or whether the subjects tried to back away from officers. They often could not agree on what types of force were used by police, or even whether the officers tried to de-escalate the situation beforehand. There were some areas where the experts agreed: whether or not the police issued orders, whether there was a foot pursuit, and whether the subjects approached or attacked police.” Interesting findings.
As our data analysis showed, there are many shortcomings of body cam footage as a driver for accountability and reform. The inherent subjectivity of the footage, and the importance of perception when trying to derive meaning from body cam videos, point to much-needed work in the technical area of computer vision to help hold law enforcement agencies accountable.”
Archive & Resources
As mentioned above, records pertinent to the Tyre Nichols case used in the project, sourced from the City of Memphis website and Vimeo account, were archived and preserved by the Starling Lab for Data Integrity. By using emerging cryptographic methodologies and decentralized systems, these public records, which have historical importance, can now be protected from disappearance and made available for the public to review, should the source or authenticity of the digital media be called into question.
Users can validate these records against the version of the videos they have to see if that version has been altered, faked, or modified.
(NOTE: If you are viewing body cam footage it is highly recommended you take the time to review the Dart Center’s trauma training.)
- IPFS File Copies
- Original Body Cam Videos – Video 1 | Video 2 | Video 3 | Video 4
- Website Archives – Vimeo Webpages | Memphis PD Website
- Filecoin Archive Information:
- Records of Storage – Video 1 | Video 2 | Video 3 | Video 4 | Vimeo Webpages | Memphis PD Website
- Blockchain Registrations:
- Numbers – Video 1 | Video 2 | Video 3 | Video 4 | Vimeo Webpages | Memphis PD Website
- Avalanche – Video 1 | Video 2 | Video 3 | Video 4 | Vimeo Webpages | Memphis PD Website
- ISCN on Likecoin– Video 1 | Video 2 | Video 3 | Video 4 | Vimeo Webpages | Memphis PD Website
How to Download and Inspect Records
- Click on the links in ‘IPFS File Copies’ to download the files from the distributed storage network
- Locate where you downloaded the file. It should be in your Downloads folder.
- For videos, change the name by adding .mp4 to the end of the filename; for the web archives, add .wacz to the end of the filename
- You can open the .mp4 file on your computer using any video player.
- You can open the web archive files by visiting the website https://replayweb.page/ and dragging and dropping the .wacz files into the web interface.
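Once downloaded, a file’s integrity can also be checked locally by recomputing its hash and comparing it to the value in the blockchain registration record. A minimal sketch (the filename and expected hash are placeholders, not actual values from the records):

```python
# Recompute a downloaded file's SHA-256 hash and compare to the registry.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "<hash from the registration record>"  # placeholder
if sha256_of("video1.mp4") == expected:  # hypothetical filename
    print("Copy matches the registered original.")
else:
    print("Copy has been altered, faked, or modified.")
```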
Authenticating Election Coverage in Hong Kong
New Standards for Photojournalism Election Coverage
A deep dive into the challenges and technical solutions, including C2PA manifests and cryptographic signatures, that enabled the South China Morning Post to publish authenticated, trustworthy photos.
Background
Starling Lab, anchored at Stanford University and the University of Southern California, is an applied research lab innovating with open-source tools, best practices, and case studies to securely capture, store, and verify digital content.
Given the landscape of rampant misinformation and disinformation – and Hong Kong’s drop in press freedom rankings – SCMP wanted to explore a method of documenting and publishing events in a way that would be secured against attempts to manipulate public perception and opinion.
As Starling founding director Jonathan Dotan put it: “Would it be possible for the SCMP to deploy a team and use a system that would allow for there to be authenticated photographs? Where every time a photo was taken, there would be a way of preserving the time, the date, and the pixels so we had a record of the authentic, original photograph that was taken?”
SCMP Managing Editor Brian Rhoads remarked: “While we cannot predict controversy in these elections, having the blockchain technology available to authenticate/verify would help if and when controversy arises or simply be an exercise giving us clear provenance over the images.” He added: “In the long run, it would be a useful exercise to know we had proper, reliable verification tools available for other coverage as well.”
For this project Starling Lab and an interdisciplinary team of SCMP journalists, editors, and technologists worked together to capture, store, and verify the information and photographs collected during the Hong Kong Legislative Council and Hong Kong Chief Executive elections.
Context
Within less than six months, Hong Kong saw two major political elections: the Legislative Council polls to elect lawmakers on December 19, 2021, and the Chief Executive Election to select the city’s leader on May 8, 2022. The polls were the first to be held after Beijing’s overhaul of Hong Kong’s electoral system, which led to more pro-Beijing representation in the legislature – and just one candidate running for chief executive.
Compared to previous years, election events were more muted and distrust was high, making the need for transparency and the ability to verify the accuracy of SCMP’s journalism more important than ever. Put simply, proving that we took news photos where and when we said we took them would help to combat misinformation.
Collaboration
Chief Technology Officer Benedict Lau and engineer Yurko Jaremko led Starling Lab’s technology team, overseeing the development and implementation of photo capture and authentication solutions. Starling Lab’s editorial team, led by Managing Director of Journalism Ann Grimes and Executive Editor Sophia Jones, worked with SCMP staff to implement and integrate new technologies into the newspaper’s workflow over the course of the two elections.
Working with Starling Lab to add cryptographic reliability and authenticity to the journalistic process was a natural next step, as SCMP had already begun experimenting with the potential of blockchain technology for print and online media. SCMP’s spin-off company, Artifact Labs, creates and sells non-fungible tokens of the publication’s front pages.
We embarked on a collaboration with SCMP to apply authentication technology to the capture, storage, and verification of a sequence of events during these two Hong Kong elections, where parties on either side of an issue have a vested interest in shaping public opinion.
Chow Chung-yan, SCMP’s Executive Editor, echoed that view. Speaking with the Starling Lab he said, “misinformation and disinformation has become increasingly a real problem for the newsroom.” The matter of photos being photoshopped now has given way to the use of artificial intelligence to “basically create an AI character that can actually do an interview with reporters” and be used by political parties to manipulate public opinion.
“Here in Hong Kong, I will say that we have experienced the deliberate use of misinformation,” he said, adding: “When people talk about misinformation or disinformation in Hong Kong some might think that this is the government, particularly when you have an undemocratic government, and you will think that the government is the most obvious culprit. In many cases, that’s the case but that’s a simple view, because the fact is, anyone who has the resources and is interested in manipulating the narrative is basically going to seize on this weapon and then use it to their advantage.”
Framework
The guiding principle at The Starling Lab is establishing “provenance” as the backbone of authenticity and integrity of digital content. To do so, the Lab follows a three-step framework – Capture, Store, Verify.
The Challenge
Before working with Starling Lab, the existing photo capture framework for SCMP consisted of:
- Capture: SCMP photographers in the field capture images with their Canon cameras
- Store: The photographers then download the pictures using SD card readers to their mobile phones, and caption the images before sending them to SCMP’s shared storage using File Transfer Protocol (FTP)
- Verify: Data is stored on Alibaba cloud, archived, and also published in SCMP articles on the world wide web. Content can be changed without a record of changes and edits.
Capture: When images are captured with SCMP’s usual method, limited information is packaged with the photos, making it difficult for anyone in the future to use the images to establish them as irrefutable records of the truth.
Storage: Without a reliable, secure way to store photo metadata, such as the device used, location, time and date, etc., there is little for courts, archivists, or journalists to point to during the validation process to guarantee that photos are a reliable part of a narrative or historical event.
Verification: For the images and other reporting data that is done for SCMP, there is no way to cryptographically validate the identity of who captured a photo and no method for checking or protecting the files to guarantee against modification, interception, or “man in the middle” attacks between source and storage.
The Prototype
In December 2021, journalists using Starling Lab technology were able to take photos and create a reliable and verifiable photographic record.
- Capture: Starling prototyped workflows with mobile apps and camera firmware to authenticate digital photos, and the photos’ metadata and cryptographic signatures, at the time the photos were taken using Canon EOS R3 and R5 cameras in conjunction with the HTC Exodus 1S smartphone.
- Store: Starling used advanced cryptography and decentralized networks that can securely distribute and store content over time, using IPFS and Filecoin to preserve copies, with OpenTimestamps and Likecoin used to preserve immutable records of photos taken.
- Verify: Starling experimented with C2PA and CAI tools to register photos, and created a custom front end element to embed into the SCMP stories, enabling experts to audit, or verify, the provenance and authenticity of the photos.

Technology
For this experiment with the Starling Framework, SCMP journalists were deployed to cover the two Hong Kong elections with a set of tools to securely capture, store, and verify the photographic evidence for the stories.
The capture methodology was different for each of the two elections. For the first election, the field team used cameras tethered to phones that helped capture metadata, using a WiFi connection to transmit photos to the device. During the second election, SD cards were used to capture and store the picture data, along with information from the cameras that could be used later to verify the photos’ authenticity.
Because of the differences in technology for these two elections, the process of ingesting and storing this information was done differently for each election.
Capture
The first step in this process is capturing the images in a way that is both trustworthy, and ensures that the authenticity of the photos can be validated later on. The capture phase for this evolved between the first and second election, due to learnings about the challenges encountered with the first election.
The most important aspect of the capture step is ensuring you are capturing a provably accurate image, accompanied by metadata that supports its authenticity.
Canon EOS R3 and R5 cameras were used to capture both the photos and an array of metadata about the device and the conditions in which each picture was taken.
HTC Exodus 1S is a mobile phone with an Android operating system and the Zion vault secure enclave for cryptographic key management. The Zion vault is an isolated subsystem on the phone that stores and protects a cryptographic private key that you can use to sign information, such as captured photos or transactions on a blockchain. This is done without exposing your private key to the operating system, which could make it vulnerable to attack.
Each election used a different workflow and set of capture tools. After covering the first election, we switched from WiFi to SD cards due to the limitations of sending photos over a WiFi network.
Election 1: Hong Kong Legislative Council
Canon Capture API (CCAPI) is a set of rules for communicating (also known as an API) that allows the Canon camera to connect to the HTC Exodus 1S and transmit photos and data over WiFi.
Starling Capture is an application developed by Numbers Protocol and installed on the HTC Exodus 1S. It enables the phone to collect both the data for the photographs and data about the device that received and transmitted the information – such as GPS location, time, and more – when a photo is sent to the phone using CCAPI.
Most importantly, the application is authenticated to the Starling API via credentials provided by Starling Lab, and it utilizes the cryptographic features of the Exodus 1S to sign every photo coming through CCAPI.
Zion Vault is a secure enclave – an isolated, highly secure subsystem – on the HTC Exodus 1S that stores the private keys used to sign photos before they are sent to the Starling Integrity Pipeline for storage. This vault was set up before the phones were given to journalists, and the public key used with Starling Capture (tied to a private key in the secure enclave) was recorded ahead of time, so that the identity of whoever sent photos and data can be verified later down the line.
Election 2: Hong Kong Chief Executive Election
ProofMode, an application that can work with media captured or uploaded to a phone to add phone-provided metadata to that photo, was used during the documentation of the second election to gather, bundle, and send photos and data.
SafetyNet Attestation API is a protocol and service that ProofMode uses to check information about the Android device, such as the OS version and the type of device in use.
Signal is an end-to-end encrypted messaging app that was set up on each phone before they were given to journalists and used to send ProofMode zip files. When archives are sent over the end-to-end encrypted app, they are signed by a private key tied to the phone number set up by Starling Lab, and the corresponding public key can be used to verify these messages.
Signal Chat Bot was created using Signal’s API, and was a part of the Starling Integrity Pipeline during the second election only. It was used to monitor and move the packages of media & information from ProofMode on a journalist’s phone to the Starling Integrity Backend where photos, metadata, and signatures can be verified and stored.
In the second election, hashes of photos and metadata were registered to the Bitcoin blockchain using the OpenTimestamps network. The timely inclusion of these hashes onto the Bitcoin blockchain provides strong proof of the existence of these assets at a particular block height, or location within the blockchain, which can be correlated with a point in time.
In the second election, photos were bundled with metadata as encrypted archives, and their hashes were registered onto the LikeCoin blockchain using the ISCN specification for digital content registration. The registrations can be viewed on Likecoin.
Store
Starling Integrity Pipeline
The pipeline is a process and set of tools used to transport photos and data to the storage server for photographs from both elections. The Integrity pipeline includes several pieces:
The Starling Integrity Preprocessor and API are set up and maintained by Starling Lab to process the photos and metadata that have been signed and encrypted. These take the photos and metadata, validate the signatures, and pass the data to both Starling servers and web3.storage.
The Starling Integrity Backend, maintained by Starling Lab, is where the images are initially stored (before they are also processed into distributed storage) and where the C2PA manifest is created and signed by Starling Lab, making it possible to track incremental edits and changes and tie them back to the original photo with a verified source.
Once original photos are bundled with a C2PA Manifest, these are placed in a shared FTP directory or Dropbox folder from which SCMP editors were able to access, make trackable edits to photos with Photoshop (which uses the C2PA standards to record edits), and upload new versions of those images with updated manifests.
web3.storage
This is a storage tool used by the Starling Integrity Pipeline. It takes each image from the pipeline, packages it appropriately for storage on distributed systems, then adds the files to both the IPFS and Filecoin distributed networks and ensures their persistence.
Filecoin, a distributed cold storage solution used as part of web3.storage, is token-incentivized storage in a distributed network of providers who are required to regularly prove the integrity and availability of data. When you store with web3.storage, the platform makes deals with storage providers on the Filecoin network, paying them FIL (the Filecoin cryptocurrency) over the lifetime of a storage deal as the Filecoin blockchain runs computational proofs to ensure that data is being stored.
IPFS is a peer-to-peer distributed storage network that allows anyone who wants to maintain a node to add and provide data to a network that is independent of the client-server model and large corporate storage providers. When content is published on IPFS, a node creates a unique content identifier (CID), which functions as a unique digital fingerprint for, and pointer to, the content. The IPFS CID for data (such as the bundles of photos and metadata created with ProofMode) points to a tamper-evident, censorship-resistant copy of that data. An IPFS node is maintained by web3.storage, and CIDs are pinned and maintained on the IPFS network.
As a part of this project, Starling Lab worked with Numbers Protocol to mint NFTs on the Flow blockchain. Photos and their captions can be minted and sold as a kind of ‘digital news clipping’ with value and verifiable ownership. This was an early experiment feeding into the broader Artifact Labs initiative at SCMP.
Verify
The CAI Toolkit is a set of tools created by the Content Authenticity Initiative, an organization and community made up of companies, NGOs, and academic institutions. The toolkit makes it possible to create C2PA manifests with signatures of contributors, edits, and other changes to media files, and to embed information from these manifests in websites and applications. It also provides the specifications used by Photoshop Content Credentials to track changes made to photos.
The CAI C2PA Command Line Tool is used to generate and read cryptographically signed manifests compliant with the C2PA open technical standard. At the time of the elections, Starling Lab used a pre-release version of the tool from Adobe in the Starling Integrity Pipeline; the tool is also used as part of Photoshop Content Credentials to record editing steps.
Adobe Photoshop includes a feature called Content Credentials that, using the C2PA command line tool and Rust SDK, reads and adds to a file’s manifest. When you open a C2PA file, it allows you to track the edits made and repackage the edited photo with an updated manifest, which is then shared back with Starling Lab and carries a record that other tools can read.
The Verify website can be used to preview metadata created with a C2PA manifest. You can visually inspect changes to an image and see data from the manifest about who produced or created versions of the image, a signature timestamp, how changes were made, what edits were made with Photoshop, and what assets were used.

To preview the images, the manifests, and the signatures of the individuals who made changes and edits, the CAI JavaScript Software Development Kit (SDK) was integrated into the SCMP website, allowing users to see the information included in the C2PA manifest.
With this SDK, Starling Lab helped SCMP develop an “info icon” that not only previews publishing and editing data like Verify, but also additional data captured in the manifest, such as the location, who produced the photo, which app captured the image, the date and time, decentralized storage information, and data about the NFT produced from that image and caption.
Learnings
Due to the back-to-back elections, events moved quickly and many ideas for improvement were generated in short order. In both elections, Starling Lab worked closely with the photo and engineering teams at SCMP to effectively deploy these technologies. Here is what we learned:
Metadata Collection
Because the GPS sensors on the phones did not always have an up-to-date location (especially indoors, where GPS signals are weak), some photos ended up missing location information in the collected metadata. Although the lack of a precise location is problematic, the GPS timestamp included with the metadata is a valuable record indicating when location data was last acquired.
Signature Challenges
The HTC Exodus 1S’s hardware-backed signer does not make it possible to sign photos as a background process (i.e., in the background, as photos are being taken), which meant photojournalists had to manually type in a PIN to unlock the signer after each photo was taken. Because journalists take multiple continuous shots at a time, it’s important that any signing solution can hash and sign many full-size photos without manual PIN input.
To solve this, Starling Capture created an implementation that generates a software session key that can quickly sign camera photos and their metadata as photos are taken. This session key is itself signed by the Exodus 1S hardware keys and persists on the device for the duration of a capture session. An ideal solution would be a hardware signer that supports fast, touch-free signing natively.
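A minimal sketch of that session-key scheme, assuming Ed25519 for both keys (an illustration of the design, not Starling Capture’s actual implementation):

```python
# The hardware key (unlocked once by PIN) signs a short-lived software key,
# which then signs each photo hash without further user interaction.
import hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ed25519

hardware_key = ed25519.Ed25519PrivateKey.generate()  # stand-in for the enclave
session_key = ed25519.Ed25519PrivateKey.generate()   # software session key

# One PIN entry: the hardware key attests to the session key's public half.
session_pub = session_key.public_key().public_bytes(
    serialization.Encoding.Raw, serialization.PublicFormat.Raw)
attestation = hardware_key.sign(session_pub)

def sign_photo(photo: bytes) -> bytes:
    # Fast, touch-free signing of each photo as it is taken.
    return session_key.sign(hashlib.sha256(photo).digest())

sig = sign_photo(b"...raw photo bytes...")
```

A verifier can check the photo signature against the session key, and the session key’s attestation against the hardware key, completing the trust chain.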
Several issues also came up with the cryptographic signature implementations in the two experiments. These include proper recording of all cryptographic material, such as:
- Public keys per device
- Any intermediary keys and their attestations (e.g. software session keys and their hardware signatures in the trust chain)
- Recording of algorithms used to generate signatures
- Reproducible messages used to generate the hashes for signing
Missing pieces led to some of the metadata lacking proper cryptographic attestations; these issues were addressed as they arose throughout the project. The lesson is to implement logic at the beginning of the pipeline that proactively validates the full trust chain of signatures, and to adopt standards such as RFC 8785: JSON Canonicalization Scheme (JCS) to ensure signed content produces reproducible hashes.
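A small sketch of why canonicalization matters: identical metadata serialized with different key orders produces different hashes unless a canonical form is used. (Python’s json module only approximates RFC 8785 for simple ASCII objects; a full JCS implementation also normalizes numbers and Unicode.)

```python
# Same metadata, different key order: only canonical serialization is stable.
import hashlib
import json

a = {"time": "2022-05-08T10:00:00Z", "device": "Canon EOS R3"}
b = {"device": "Canon EOS R3", "time": "2022-05-08T10:00:00Z"}

def naive(obj):      # key order leaks into the hash
    return hashlib.sha256(json.dumps(obj).encode()).hexdigest()

def canonical(obj):  # sorted keys, no whitespace -> reproducible hash
    s = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(s.encode()).hexdigest()

assert naive(a) != naive(b)          # same data, different hashes
assert canonical(a) == canonical(b)  # canonical form is reproducible
```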
Logistical Challenges
In the course of the project, we encountered several issues for the first time:
A general need for the capture process is equipment that captures location data at the moment photos are taken. Though some cameras can do this, not all journalists have those models, so cell phones were used to enable all journalists to capture data with their photos. The exact time and location of a photograph can be disputed whenever there is a delay between capture and timestamp (such as problems with WiFi connectivity, or the lapse while an SD card is transferred), and having journalists take extra steps to transfer photos to the phone while on-site was a challenge during fast-moving news coverage.
In the first election, a local WiFi connection between the Canon camera and the HTC phone was required so journalists could connect their cameras over CCAPI to the phones collecting and signing the data and photos with Starling Capture. The WiFi connection was unstable: not only was it difficult to transmit photos, but some photo records were lost when too many were captured simultaneously. This is especially problematic in crowded areas with a lot of WiFi noise, such as the indoor spaces where elections are held. This led to the use of SD cards for the second election; cards could be manually moved from the camera to the phone to transfer photos and metadata, eliminating the reliance on an always-on WiFi connection via CCAPI.
Application Improvements
In this experiment, Starling Lab worked with the CAI JavaScript SDK to develop custom previews, or “info icons,” that display custom C2PA manifest data such as the Filecoin Piece Content Identifier (CID) and the IPFS CID, which identify where the photos are stored. At the time, the SDK was a pre-release version under active development by Adobe (it has since been publicly released), and the SCMP engineering team worked through several technical hurdles to integrate the frontend onto the production news site.
Data Storage Considerations
There were also logistical road bumps as the team tried to work quickly and publish articles to the SCMP as the stories unfolded. The original intention was to publish images with proofs of storage embedded at the time of publication. However, because of factors like the time it takes to make a storage deal on the Filecoin network (this can take over a day), images without complete C2PA manifests were published first, then replaced a few days later with versions that included proofs of storage (CIDs).
In addition, although a pipeline was set up to automatically capture the original photos from journalists with Starling Capture or ProofMode + Signal using an API, the process of sharing edited photos between SCMP and the Starling Lab team was rather manual: editors had to upload new versions of edited photos by hand, and the Starling Lab team had to check a shared FTP or Dropbox directory for updated versions so that it could prepare the photos and data for Filecoin archival and additional signatures.
Though the data transfer process was inconvenient, it was not a major obstacle. It was clear from the beginning that various tools lack support for attesting to content changes, and that the Starling Lab team would have to manually retrieve assets to inject attestations. A future solution would be for more tools to natively produce signed attestations as they are used to edit photos, and for versions of photos to be synced over shared cloud storage.
Authenticity in the Field
This project demonstrated that journalists could successfully deploy sophisticated cryptographic and blockchain-based authenticity tools on a significant live news story under real-world time and operational pressures. The authenticated photographs from this experiment serve as a verifiable and important historical record, preserving the time, date, and pixels of SCMP’s photojournalism across two significant elections.
Archive
Publications / News Articles
Photo published is of the 40 lawmakers who make up the Election Committee; no location data
Published CAI content credentials here
Photos published include photo of Candidate Regina Ip of the New People’s Party calling for votes in Aberdeen and moderate candidate Jason Poon out in Kornhill seeking support.
Published CAI content credentials here
Published CAI content credentials here
Published photo shows Ng Chau-pei (left) and Edward Leung who defeated Jason Poon in the Hong Kong Island East constituency
Published CAI content credentials here
Published CAI content credentials here
Published photo shows Tik Chi-yuen (second from left) of the middle-of-the-road party Third Side who won the seat for the social welfare functional constituency.
Published CAI content credentials here
Published photo shows two women walking out of a polling center at Shek Wu Tong in Kam Tin.
Published CAI content credentials here
Mapping of Photojournalists at the Two Elections
Spreadsheet of photographs & GPS data
Map: https://www.google.com/maps/d/edit?mid=17-NQuxlFtazY-zjBnvc8xAt66CPcegY&usp=sharing
Participating Photographers & Published Photos
- May Tse - no Starling Framework photographs published in SCMP
- Felix Wong - Article 3160469 (1 image, LegCo)
- Nora Tam - Article 316033 (2 images, LegCo), Article 3176982 (3 images, Executive)
- K.Y. Cheng (Cheng Kok Yin) - Article 3160321 (2 images, LegCo)
- Dickson Lee - Article 3160475 (1 image, LegCo), Article 3160466 (1 image, LegCo)
Reuters and Canon Deploy Verifiable Photo Newswire

End-to-End Content Authenticity, from the Canon Camera to Reuters Desk
Learn how a collaboration between Reuters News Agency, Canon technologies, and the Starling Lab for Data Integrity created a novel end-to-end authenticity workflow to protect photojournalism.
Basile Simon | Reading Time: 5min
Prototypes
Background
Reuters News Agency, Canon technologies, and the Starling Lab for Data Integrity teamed up to demonstrate a novel end-to-end authenticity workflow. By incorporating a content management system created by Fotoware, which is widely used by reporters and news agencies, the project achieved end-to-end preservation of all provenance metadata.
Editor’s Note (2025): Since the conclusion of this project, the landscape of image authentication has rapidly evolved. While this proof-of-concept required bespoke, modified prototype hardware, we are now seeing the integration of cryptographic hardware and software signatures directly into standard, consumer-grade cameras. Devices such as the latest Sony and Leica models, as well as smartphones like the Google Pixel 10, now provide built-in authenticity layers.
This commercial availability signals a significant step toward the widespread, accessible adoption of the provenance frameworks pioneered in this prototype.
Context
For this project, Starling Lab teamed up with Thomson Reuters and Canon to prototype the future of photo authentication, addressing the growing threats of visual mis- and dis-information. As a leading global news agency, Reuters works relentlessly to capture the first draft of history. To protect the hard-earned trust newsrooms have built—especially as generative AI makes digital deception increasingly accessible—we need robust technology that guarantees the provenance and auditability of photojournalism.
Canon, a world-leading imaging provider and Content Authenticity Initiative (CAI) member, built key cryptographic features directly into their camera firmware for this collaboration. This allowed the Canon camera to serve as the project's root-of-trust, setting a new precedent for authenticity standards in the news community.
This project builds directly upon Starling Lab’s earlier 78 Days initiative. While that project successfully preserved capture information, it relied on tethering a camera to a cell phone via Wi-Fi to establish a root-of-trust. This method was logistically challenging and presented usability issues for journalists in the field. Furthermore, it lacked an automated way to track metadata end-to-end as media inevitably moved through newsroom content management systems (CMS) and photo editing software.
For this iteration, the team set out to solve these gaps by integrating emerging technologies from Canon, Hedera, Photoshop, and ProvenDB to build a seamless, end-to-end preservation pipeline.
The project's images were captured by photojournalist Violeta Santos Moura in eastern Ukraine during March and April 2023, documenting the devastation of Russian attacks along the front lines. Armed with a prototype Canon EOS R3 camera featuring built-in cryptographic integrity, Moura's photos were processed through an integrated workflow that tracked every subsequent edit via Fotoware CMS and Starling’s Integrity Backend.
Because documenting conflict involves severe safety risks, extreme care was taken regarding data publication. While extensive contextual metadata was securely preserved and sealed for future evidentiary or historical use, only two photos and their provenance records were made public. These results were published on a Reuters microsite, providing a powerful demonstration of how end-to-end authenticity workflows can successfully protect the integrity of digital media.
Framework
The Starling Lab Capture, Store, Verify framework was applied to the existing Reuters journalistic workflow and interlaced with distributed web technologies and traditional cryptographic practices, creating both a record and an interface for understanding the provenance of images captured in the field.
The Challenge
The traditional journalistic process relies heavily on public trust in a publisher's reputation. However, in the digital age, a photograph rarely reaches the public exactly as it was captured.
A core challenge in authenticating news media is accounting for the reality of "permissible edits." In photojournalism, editing a raw file is not inherently deceptive; it is a necessary step in the editorial process. Photo editors routinely crop images to fit specific publication layouts, adjust exposure or color balance to ensure visual clarity, and append critical contextual metadata such as captions, location data, and copyright credits. While these routine, ethical adjustments do not alter the factual truth of the scene, they inherently change the digital fingerprint of the file.
At a global news agency like Reuters, this process happens at an industrial scale. Thousands of images flow daily from the front lines into centralized Content Management Systems (CMS), such as Fotoware, where they are accessed by editors worldwide. These enterprise systems are built for speed and efficiency, often automatically compressing files, reformatting data, or overwriting previous metadata with every single save. Because standard asset management systems lack the ability to immutably version these changes, the cryptographic chain of custody is easily broken the moment a photo enters the newsroom pipeline.
Therefore, the central challenge of this project was not just technical, but operational. A successful provenance system cannot disrupt the fast-paced business of producing the daily news. If journalists are burdened with clunky field equipment, or if editors must drastically alter their workflows to manually log changes, the solution will not scale. The goal was to establish a secure root-of-trust at capture and seamlessly track every permissible modification through existing CMS platforms, creating a friction-free, verifiable audit trail from the lens to the reader.
The Prototype
To solve this, the Starling Lab engineering team designed a seamless, end-to-end pipeline to register and preserve every step of a photograph’s journey. Crucially, this system was engineered to operate in the background, ensuring it would not disrupt the fast-paced newsroom environment. Moving away from the logistical constraints of previous experiments, the new prototype was built around three continuous phases:

First, the establishment of a root of trust. The process begins by embedding cryptographic capabilities directly into the journalist's equipment. Rather than relying on clunky external tethers, Wi-Fi pairings, or companion smartphone apps, the prototype relies on a camera capable of cryptographically signing the image and its foundational metadata at the exact moment the shutter is pressed. This ensures the original file's integrity is sealed at its inception.
Second, tracking the editorial pipeline. Once captured, the system creates an immutable audit trail that works invisibly alongside standard newsroom tools. As the image is transmitted to the publisher's asset management system and undergoes permissible edits by photo editors—such as cropping, color correction, or captioning—an automated background process tracks the file. Every single modification is recorded in a private, verifiable database that is anchored to a public distributed ledger, creating a mathematically provable edit log.
And finally, the delivery of a verifiable asset. The final step focuses on empowering the end consumer. The prototype packages the initial hardware signature, the original metadata, and the complete, cryptographically secure edit history into a standardized, open-source manifest. This manifest is embedded directly into the final published image, allowing anyone—from researchers to everyday readers—to inspect the file and independently verify its authentic journey from the frontlines to their screen.
Technology
The Starling workflow we developed enabled Violeta Santos Moura, the photographer for this project, to take photos in Ukraine using a special concept camera from Canon, CMS software from Fotoware, and the Starling Integrity pipeline, with the data authenticated immediately upon upload to the Fotoware CMS. Integrations between the Fotoware CMS and the Starling Integrity backend, along with the use of Photoshop for any edits, allowed the editing team to track all of their changes in a C2PA manifest.
Starling took this authenticated capture from the photographer, placed it into secure and verifiable sharing and storage, and published verifiable versions of the data with ProvenDB and Hedera hashgraph. This work, done alongside the Reuters team, preserves data about the lifecycle of an image and its integrity, which is tracked, verified, and registered on the blockchain.

Capture
The journey of establishing an unbroken chain of provenance began directly on the frontlines with a first-of-its-kind hardware integration. The prototype process starts with a photographer in the field who was given one of six prototype cameras. In our previous 78 Days project, establishing a root of trust required tethering a camera to a cell phone via Wi-Fi. This created severe logistical hurdles: congested networks interfered with the connection, and the camera could not simultaneously connect to the internet to upload files. Furthermore, the system only supported individual photographic shots, lacking the capability to hash and sign burst mode or large video files.
To solve this, Canon, Reuters, and Starling Lab spent three years developing a modified firmware for the Canon EOS R3 prototype. Advised by Stanford cryptography professor Dan Boneh, Richard Shepherd and Akiyoshi Ishii from Canon created a camera that can digitally assign a time, date, and location, then digitally hash and sign that set of information – without external tethers, using a firmware software signature.

A unique private key, associated only with that specific device, is written directly into the firmware at the factory. At the exact moment of capture, the camera computes a combination hash derived from both the raw image pixels and the EXIF metadata, and uses its device key to cryptographically sign that hash before the image is written to the SD card. The signature is then appended directly to the end of the JPEG data, specifically after the 0xFFD9 End-of-Image marker, using a custom scheme.
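The Python sketch below illustrates the general shape of such a sign-and-append step. The hash construction, signature algorithm, and framing here are assumptions for illustration only; Canon’s actual in-camera scheme (verified as “Sig66” later in the pipeline) is not public.

```python
import hashlib
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec, utils

def sign_and_append(jpeg: bytes, exif: bytes,
                    device_key: ec.EllipticCurvePrivateKey) -> bytes:
    # Combination hash over the image data and the EXIF metadata.
    digest = hashlib.sha256(jpeg + exif).digest()
    # Sign the precomputed digest with the factory-provisioned device key.
    signature = device_key.sign(
        digest, ec.ECDSA(utils.Prehashed(hashes.SHA256()))
    )
    # JPEG files end with the End-of-Image marker 0xFFD9; bytes appended
    # after it are ignored by ordinary viewers, so the file stays readable.
    assert jpeg.endswith(b"\xff\xd9")
    return jpeg + signature
```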

Bypassing the need for any companion apps or external devices, the camera instantly transmits these signed images straight from the device to a Fotoware FTP server via Wi-Fi.
Store
Once the image leaves the camera, the system must immediately secure the original asset before it can be altered by standard newsroom ingestion processes. When a new file hits the Fotoware FTP server, a custom webhook notifies the Starling Integrity server. Starling instantly downloads a copy of the file and verifies the Sig66 signature against the specific camera's known public keys, ensuring the file hasn't been tampered with in transit.
To guarantee the image's long-term survival and immutability, the original file is preserved on decentralized storage networks, specifically utilizing the IPFS, Filecoin, and Storj protocols. Simultaneously, Starling constructs a secure metadata package and registers the asset immutably across several public blockchains. This registration process leverages OpenTimestamps, which is anchored to Bitcoin, as well as the Avalanche, Numbers, and LikeCoin (ISCN) networks, creating a permanent, decentralized public record of the original capture.
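As a small illustration of the storage step, the sketch below pins an asset to a local IPFS node with the ipfshttpclient library and records the returned content identifier (CID). The filename is hypothetical, and the project’s actual pipeline also replicated to Filecoin and Storj and registered hashes on the ledgers listed above.

```python
import ipfshttpclient  # assumes a local IPFS daemon on the default API port

# Pin the signed original and keep its CID; this identifier is what gets
# bundled into the metadata package and registered on public ledgers.
with ipfshttpclient.connect() as client:
    result = client.add("signed_original.jpg")  # hypothetical filename
    print("IPFS CID:", result["Hash"])
```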
Verify
The most complex phase of the project involved tracking the permissible edits made by Reuters photo editors. Integrating authentication into software like FotoStation adds the critical processing step that enables the creation of provenance records alongside the media itself, because centralized systems like FotoWeb are not built to preserve cryptographic integrity.
Once photos are uploaded via FTP to a Content Management System (CMS), journalists can view and download them in the FotoWeb asset management system. Editors can manipulate the photos: add metadata and change the content with cropping, recoloring, and other enhancements in preparation for publication. FotoWeb currently has no versioning features, so there are no logs or indicators when images are changed. Each change overwrites the previous metadata and image without a log to refer to, and there is no way to see the original image taken by the photographer. Furthermore, when an image is first ingested, FotoWeb automatically parses the photo's EXIF metadata, converts it into XMP, and writes it back into the photo. The new timestamp included in the XMP fundamentally changes the identifying hash of the image.
To create a verifiable edit log, Starling augmented the system to track specific actions: the content management system is fitted with triggers that are processed by the Starling Integrity backend. Any time FotoWeb is used, whether an editor is changing metadata, adding a caption, or altering the image, the XMP data changes and the timestamp updates. In response, Starling Lab adds another entry to a C2PA manifest. Simultaneously, the system runs a Hedera webhook to register the change on-chain. This makes it possible to record changes in a local database with hashes that are anchored to a public ledger.
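A minimal sketch of such a tamper-evident edit log is shown below: each entry includes the hash of the previous entry, so anchoring the newest entry's hash on a public ledger (Hedera, in this project) commits to the entire history. The field names and structure are illustrative assumptions, not the Starling Integrity backend's actual schema.

```python
import hashlib
import json
import time

log = []  # append-only, hash-chained edit log

def record_edit(action: str, asset_hash: str, detail: dict) -> str:
    entry = {
        "action": action,          # e.g. "metadata_change"
        "asset": asset_hash,       # hash of the affected image version
        "detail": detail,
        "timestamp": time.time(),
        "prev": log[-1]["entry_hash"] if log else "genesis",
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry["entry_hash"]  # the value to anchor on a public ledger

record_edit("metadata_change", "sha256:ab12...", {"field": "caption"})
```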

The system meticulously recorded three main types of actions on Hedera and ProvenDB:
- Creation (the initial capture and ingestion).
- Metadata changes (e.g., when someone goes into FotoWeb and edits a caption; mapped as FotoWeb > ProvenDB > Hedera).
- Photoshop edits (executed within FotoStation, mapped as FotoWeb > FotoStation > Photoshop > FotoStation > FotoWeb + ProvenDB > Hedera).
Not only does this mean that a version history is created, it also means that we have an immutable, linked cryptographic record of all changes, in several systems, that can be used if the veracity of a journalistic image is ever called into question. Ultimately, the initial capture data and the complete editorial history are combined to produce the final C2PA image for publishing, empowering the public to independently verify the photograph's authentic journey from the frontlines to their screen.

To protect potentially sensitive metadata, the original assets registered in the Reuters V2 proof-of-concept are encrypted using an AES key before being stored on IPFS and Storj. Because this specific prototype does not contain sensitive data, a single AES key is shared openly, allowing users to inspect the archives using three technical methods:
AES key: 8ff68fd1321c51570bee444eabcc44fd90610567e4bc87749bae46ad2537af67
- Starling Archive Explorer (URL parameters): For seamless access without technical expertise, users can view the decrypted assets directly in their browser using the Starling Archive Explorer. This is achieved via direct URLs that encode both the IPFS source file location and the AES encryption key, automatically decrypting the content upon loading.
The links for the two assets are as follows:
- Lit Protocol via Web3 wallet authentication: To demonstrate decentralized, permissioned access, the system utilizes the Lit Protocol, a distributed key management network. Users can mint a "Starling Access Token" to a MetaMask wallet via the Starling faucet. By navigating to the token on OpenSea and clicking "View website," users are redirected to the Starling Archive Explorer. Upon signing a MetaMask transaction to prove token ownership, the Lit network securely provisions the decryption key to reveal the archive.
An animation of how this works is at the bottom of https://github.com/starlinglab/archive-explorer
- Command-line decryption (OpenSSL): For manual inspection, users can download the encrypted file directly from an IPFS gateway using command-line tools like wget. Once downloaded, the file can be decrypted locally into a standard ZIP file using OpenSSL's AES-256-CBC cipher by running:
openssl aes-256-cbc -d -in bafybeidgi3yqkppxydeh5znm4jg3pycjnteyt25nogzj7ru7yek2lzttcy -out archive.zip -iv 0 -K 8ff68fd1321c51570bee444eabcc44fd90610567e4bc87749bae46ad2537af67
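For readers who prefer Python, the sketch below performs an equivalent decryption with the cryptography library. It assumes, as the OpenSSL command implies, a zero IV (OpenSSL pads the short "-iv 0" argument with zero bytes) and PKCS#7 padding.

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = bytes.fromhex(
    "8ff68fd1321c51570bee444eabcc44fd90610567e4bc87749bae46ad2537af67"
)
IV = bytes(16)  # sixteen zero bytes, matching openssl's padded "-iv 0"

# The encrypted file, downloaded from an IPFS gateway under its CID name.
with open("bafybeidgi3yqkppxydeh5znm4jg3pycjnteyt25nogzj7ru7yek2lzttcy", "rb") as f:
    ciphertext = f.read()

decryptor = Cipher(algorithms.AES(KEY), modes.CBC(IV)).decryptor()
padded = decryptor.update(ciphertext) + decryptor.finalize()
plaintext = padded[: -padded[-1]]  # strip PKCS#7 padding

with open("archive.zip", "wb") as f:
    f.write(plaintext)
```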
Learnings
Tooling and Infrastructure Constraints
During the development of the prototype, the team encountered multiple software conflicts. For example, they discovered that Photoshop was unable to execute both C2PA injection and change log exports at the same time. Furthermore, the C2PA verification site struggled to decode large integers correctly, which required a collaborative bug fix from Adobe.
Integrating with enterprise content management systems also presented friction. The team had to figure out how to deal with FotoWeb fundamentally altering files upon every single upload, a process that inherently breaks C2PA manifests. To properly manage this metadata, a dedicated Starling XMP area was created. Finally, the project faced infrastructure disruptions when Hedera's public tooling (DragonGlass) stopped working unexpectedly, forcing the engineering team to rapidly switch to alternative tools.
Overcoming Previous Tethering Limitations
The hardware integrations in this iteration were specifically designed to solve the significant hurdles encountered during the previous "78 Days" project. In that earlier workflow, establishing a root of trust required tethering a camera directly to a cell phone via Wi-Fi. This presented considerable usability challenges, starting with the difficulty of simply establishing the connection between the camera and the phone. In the field, noise and congestion over various network bands could easily interfere with this Wi-Fi tethering. Moreover, while the camera was tethered to the cell phone, the photographer could not upload photos because the camera lacked concurrent internet connectivity.
Camera Capabilities and Future Work
While the new prototype successfully eliminated the need for tethering, the bespoke camera hardware still faces functional limitations. For this specific proof-of-concept, the system only supported individual photographic shots; advanced capture modes, such as burst mode and video capture, were not supported. Enabling these capabilities in future iterations will require additional engineering work to support the hashing and signing of the significantly larger files associated with video media.
When A Screenshot Isn’t Enough
Starling Lab and the Associated Press teamed up to investigate the extent and implications of government monitoring, building an authenticated archive with evidence that had been posted online.
Starling Lab | Reading Time: 5min
Prototypes
Background
Data can paint an intimate portrait of any person in modern society. In the hands of authorities, the way data is collected and used can present important privacy concerns.
During the early days of the COVID-19 pandemic, governments around the world got a firehose of individuals’ private health details – including photographs that captured people’s facial measurements and home addresses – to power surveillance tools that government officials said would help stop the spread of coronavirus.
For more than a year, Associated Press journalists interviewed sources and pored over thousands of documents to trace how some of those technologies marketed to “flatten the curve” were put to other uses. But most importantly, they wanted to understand who was impacted.
Working with staffers from Hyderabad, India, to Beijing to Jerusalem and Perth, Australia, the AP team found that authorities used these technologies and data to halt travel for activists and ordinary people, harass marginalized communities and link people’s health information to other surveillance and law enforcement tools. In some cases, data was shared with spy agencies.
India, which has been a global leader in tech development, provided a particularly interesting example. As the pandemic took hold in 2020, local police were tasked with enforcing mask mandates. The AP team soon saw via social media that officers had turned to facial recognition software to zero in on people not wearing masks. But how were those facial scans being used?
When an AP reporter met with high-level police officials in Hyderabad, they first denied using facial recognition. But lower-level officers later divulged that they did, and that they could decide whose face to scan based in part on whom they deemed "suspicious." That stoked fears among privacy advocates, some Muslims, and members of Hyderabad's lower-caste communities, who urged the journalists to press further.
Context
Garance Burke is a global investigative journalist from the Associated Press. As part of her Starling Lab journalism research fellowship, she wanted to incorporate new open-source methodologies into reporting on the misuse of surveillance tools deployed by authorities globally during the pandemic. Examples from her research would be displayed in an article entitled Police seize on COVID-19 tech to expand global surveillance, part of the award-winning Tracked series.
Working with Avani Yadav (a colleague at the University of California, Berkeley Human Rights Investigations Lab), Burke used open-source investigation methods to identify and authenticate social media posts and video/audio/photo material from police agencies and individual officers about their use of facial recognition and other AI-powered technologies in India. The team built spreadsheets of Twitter, WhatsApp, Reddit, Telegram, and Facebook posts and began archiving that material using Hunchly.
Then, Burke turned to Starling Lab for assistance with the secure capture, authentication, and storage of this material.
Meanwhile, AP colleague Krutika Pathi continued to investigate police use of facial recognition cameras during the pandemic in predominantly Muslim neighborhoods. She and video journalist Rishabh Jain got rare access to police headquarters, allowing what they deemed a fair portrayal of the agency’s tech arsenal inside Hyderabad’s Command and Control Center. There, officers showed them how they run CCTV footage through facial recognition software that scanned images against a database of offenders.
Indian privacy advocates said these kinds of stepped-up actions under the pandemic could enable what they called “360 degree surveillance,” under which things like housing, welfare, health and other kinds of data are all linked together to create a profile.
Government officials would sometimes reveal more details about how they were using surveillance technology by posting about their methods on social media. A 2020 tweet from the police chief of Telangana state included photos of unsuspecting locals with colored rectangles overlaid on their maskless faces, apparently automated by their new tools. The following year, police shared photos of themselves using handheld tablets to scan people on the street using facial recognition.
These sorts of posts added more dimension to the story, but could be deleted later on. Entire accounts could be suspended or set to private. In order to preserve such ephemeral records – and to prevent future denialism – posts and other webpages in this reporting would need to be carefully archived and published in a verifiable way.
Framework
Starling Lab uses several technologies to implement its three-part framework of Capture, Store, Verify. In this project, Starling Lab employed web archiving tools to document how surveillance technology was used not only to control the spread of COVID-19, but also as a means of social surveillance. The framework was applied to capture high quality and authenticated archives of social media posts and other web-based evidence.
The Challenge
Research and data collection for this project was done in the summer and fall of 2022. The goal was comprehensive capturing and archiving of social media pages in a way that preserved the full context of the website.
AP uses several methods to save web content in their traditional research and documentation workflow. A simple option is to take screenshots or similar screen captures of websites. They might also refer to old snapshots taken of webpages by the Internet Archive’s Wayback Machine. Within their stories, they might also embed content (like a Twitter post) directly from a website using a simple API format called oEmbed.

Unfortunately, externally-hosted methods mean reliance on a single, centralized entity to keep these resources available on the web. These sites could go offline, and artifacts and evidence can be altered in a way that is not evident to end users. The Wayback Machine has faced attacks from hackers. Google announced that its URL shortening service – which content links may depend on – will be shut down and all historical links will break.
Screen capture, on the other hand, may have its authenticity called into question as an image can easily be tampered with – especially in the age of generative AI.
Different editors and reporters use different methods for managing content and resources gathered for an article. These might include Google Drive, Microsoft OneDrive, or Dropbox. They might also create a shared document with a list of links and resources. As a whole, there is no standard process for keeping notes and managing the gathered content. This isn't unusual for a large, spread-out organization, where many different processes work for each individual unit.
During the investigation phase for this story, reporters and investigators used web archiving tools such as Hunchly to keep records on local machines, cloud drives, and on AP secure servers.
Once stories are developed, one of two Content Management Systems (CMSs) is used for publishing content. The primary CMS is an in-house, custom-built tool designed for authoring and publishing content, supporting text, photo, and video media types. Formats for full web archives (e.g., WARC or WACZ) cannot be directly embedded, so screen captures and external resource links are common ways to include such evidence in stories.
The Prototype
This was Starling Lab’s first project involving the capture and display of web pages. In this case, social media posts were the primary focus, especially given their vulnerability to disappearance. In order to preserve this evidence, Starling Lab worked with Webrecorder to implement their suite of tools that can capture authenticated web archives. We then stored them using decentralized systems, and embedded web archives directly in an AP story for readers to explore the rich context and authenticity information packaged into the archive.
Burke served as the point person translating and coordinating technical needs across the Starling engineering and AP teams. AP’s then-data editor Justin Myers ran point on implementing the technical requirements on the AP CMS.

Through this collaboration with Starling Lab, the AP team was able to capture dozens of verifiable social media posts using Webrecorder. The team put together a collection of web archive files that were cryptographically hashed, signed, and preserved in redundant storage systems, with a public record of exactly what was captured – and when – using blockchain registrations. To support content verification by general audiences and other investigators, an authenticated web archive is embedded in the article on AP’s website. It includes authenticity information, and the archive file itself can be downloaded for independent auditing.
Technology
The Webrecorder suite of tools makes it possible to capture a snapshot of a website at a certain point in time, including all the individual elements such as embedded photos, links, scripts, and other types of media on the page. For example, when capturing a social media post, comments by other users, including usernames and profile pictures, are also preserved in the web archive. This produces an interactive copy of a website that can be embedded in articles, even if the original is removed from its location on the web.
Starling Lab engineers and the AP team developed a workflow that would enable the capture of dozens of social media posts using Webrecorder. The tools are paired with the Starling Framework for Integrity—Capture, Store, Verify—to ensure records of their authenticity are not only contained within the web archives but also registered on immutable ledgers, and that the archive contents themselves are redundantly preserved on distributed storage networks.
Capture
Identifying Web Content for Archiving
During the investigation, Burke first identified records that were at risk of being wiped from the internet. These include PDF files, social media posts, photos, videos, leaked documents, and more.
An important part of this process is identifying effective and efficient data-scraping tools and techniques, and setting up a central repository that can be worked on collaboratively. To accomplish this, AP reporters emailed lists of links they wanted captured, and the list was added to a self-hosted instance of Browsertrix operated by Starling Lab. The sites were crawled, and the resulting web archives were signed with a Starling signing certificate to ensure this "observation" of the website is attributable to Starling, which conducted the web crawl. The resulting web archives were shared with AP via cloud storage.
Producing Web Archives with Webrecorder Tools
Using the open source software developed by Webrecorder, namely Browsertrix, the Starling team was able to capture authenticated and high fidelity websites identified by AP. These include web applications containing content that is highly interactive, such as social media posts where comments are dynamically loaded as a user scrolls through the comment feed.
Browsertrix downloads everything that exists on a web page into a zipped archive called a WACZ file. A WACZ file is a copy of the code and media that make up the webpage, and the package includes an index of what content was captured along with provenance information about the downloaded content, such as cryptographic hashes, signatures produced with Authsign, and information about the Webrecorder tool used to produce the file. When users later display (or "replay") the page using ReplayWeb.page, it remains as fully interactive as the original content, and the loading content is progressively verified against its hashes and signature to ensure elements of the page have not been tampered with.
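Because a WACZ file is a ZIP with a well-known layout, its integrity information can be inspected with a few lines of code. The sketch below reads the package index and re-hashes each captured resource; the field names follow the published WACZ specification, though real archives should be verified with Webrecorder's own tooling.

```python
import hashlib
import json
import zipfile

# A WACZ is a ZIP whose datapackage.json lists every resource and its hash.
with zipfile.ZipFile("capture.wacz") as wacz:
    package = json.loads(wacz.read("datapackage.json"))
    for resource in package["resources"]:
        data = wacz.read(resource["path"])
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        status = "OK" if digest == resource.get("hash") else "MISMATCH"
        print(resource["path"], status)
```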
While Browsertrix was used to crawl a long list of websites and handled most dynamically loaded content, some types of websites evade automated crawling. For example, one of the PDF files was behind a CAPTCHA, which required a human to solve. In this instance, the Starling team employed another Webrecorder tool, ArchiveWeb.page, to capture the site manually.
In the case of Twitter posts, Webrecorder developed a specialized tool, oembed.link, to help render the embeddable version of a post. Anticipating that articles would prefer to embed the cleaner embeddable view, the team crawled both the original Twitter link and the version rendered through oEmbed into the same WACZ file. For example:
- On Twitter: https://twitter.com/TelanganaDGP/status/1258675268924739584
- Through oEmbed: https://oembed.link/https://twitter.com/TelanganaDGP/status/1258675268924739584
This way, both versions of the snapshotted site can be inspected, to ensure that the content is not altered in the oembed.link domain.
Ensuring Web Archives are Authentic and Tamper-evident
The Webrecorder team developed the Authsign specification and tools so WACZ files can be authenticated at the time they are produced. The technology relies on using Let’s Encrypt certificates associated with a domain name to sign content fingerprints in the web archive. This allows the operator, in this case Starling, who is observing the web to produce web archives to attest to the authenticity of the content using their domain name.
In addition to the built-in authenticity mechanism of WACZ files, Starling also used its integrity pipeline to register the WACZ files themselves on immutable ledgers. Nine of the files from the Tracked series were registered using the Numbers Protocol on the Avalanche blockchain (a fast, decentralized, open-source blockchain that offers smart contract functionality) and using the ISCN standard on the LikeCoin blockchain (a chain specialized for decentralized publishing).
Store
Unlike web archives hosted on platforms like the Internet Archive, WACZ files are self-contained, carrying both content and authenticity information, and can therefore be stored anywhere. For example, AP stores each WACZ file in an S3 bucket, while Starling stores them on internal systems as well as on the Filecoin distributed storage network. This mitigates the censorship risks associated with centralized storage providers.
Verify
To display a web archive as embedded content in an article, Webrecorder provides scripts that developers can use to "animate" a referenced WACZ archive. In this article from the AP Tracked series, about two-thirds of the way down, is an example of an archived Tweet. The version on this page is the oEmbed-filtered embeddable version of the Tweet.
During this collaboration, the Webrecorder team worked with Starling to develop a UI element that allows readers to view provenance information of the web archive they are interacting with. They can see the URL crawled, the producer of the Authsign signature and their signing certificate, among other authenticity data, as well as a button to download the original archive that they can import into ReplayWeb.page to independently explore.

When the downloaded WACZ file is imported into ReplayWeb.page, two URLs are available to be explored. They represent the oEmbed version (as displayed on the AP site) and the original Tweet crawled from Twitter directly.
Unlike a simple screen capture, each individual element is captured independently and can be inspected as its original resource.
Readers can scroll through the comments and, for example, play back videos captured from the comment section.
This native support for verification workflows is a clear benefit of adopting authenticated WACZ files for evidence collection and preservation. Additionally, the self-contained nature of WACZ files, paired with the Starling Framework for Data Integrity, allows flexible registration and preservation strategies that enhance the provenance and availability of the WACZ files themselves.
Learnings
Quality Assurance of Crawled Pages
Throughout the investigation, many links were submitted to Browsertrix for web archiving. Although Browsertrix is one of the best web archiving tools, automated crawling of websites is inherently tricky. For example, the content of interest may be behind a popup dialog, may require page scrolling in order to load, or may simply fail to load at the time of the crawl. To reliably determine whether a web archive captures the content of interest, human inspection is necessary.
At the time of the investigation, Browsertrix did not have an integrated review system, so each time we discovered a web archive that failed to capture the content of interest, we had to resubmit the crawl; if an automated crawl was not possible, we opted for a manual crawl, scrolling through the page in the browser to load the content of interest.
To streamline the quality inspection process, Starling discussed with Webrecorder a review system within Browsertrix that lets the user first play back the web archive within the platform, then approve it or resubmit it for crawling. Browsertrix has since added a review and tagging system to support this workflow.
Scheduling Recurring Crawls
Another feature AP requested was the ability to schedule crawls of a website on a regular basis, both because some sites have content that changes over time and as a way to monitor whether site content or social media posts are taken down.
At the time of this investigation, Browsertrix’s scheduling feature was still in development, so we did not schedule recurring crawls, although links were often crawled at multiple points in time through manual submission. It is now possible to automate recurring crawls using Browsertrix, which would be useful for future investigations. Watching for changes, or content disappearance, over time, remains an item on AP’s wishlist.
Embedding a Web Archive in an Article
To embed an archived Tweet, we need to load a JavaScript plugin on the AP website. However, the Content Security Policy (CSP) on AP's website blocks resources from other domain names, which ruled out loading a copy of the ReplayWeb.page component from a content distribution network (CDN).
To resolve this, the team decided to host the ReplayWeb.page component directly on apnews.com, an exception to the standard content management workflow that involved unanticipated operational overhead. This has a direct impact on the maintainability of the site going forward, because the component must be kept up to date and each update requires additional verification. The correct rendering of this article now depends on the ReplayWeb.page component, making it more brittle than regular text-and-image articles.
The WACZ files themselves are stored in, and loaded from, Amazon S3 buckets. Here we encountered another problem related to web security. While CSP prevents the AP site from including resources from other domains, Cross-Origin Resource Sharing (CORS) protects resources from being embedded into other domains: browsers will block resources loaded from a different domain than the website's unless the remote resource explicitly allows such behavior. This is a security measure to prevent an attacker from making unauthorized requests on behalf of the user. To resolve this, we simply had to configure S3 to allow all origins when serving the WACZ resource, as sketched below.
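As an illustration, the fix amounts to a CORS rule like the one below, shown here with boto3 and a hypothetical bucket name; the exact configuration AP applied is not reproduced here.

```python
import boto3

s3 = boto3.client("s3")
# Allow any origin to GET the WACZ files, so the ReplayWeb.page component
# served from apnews.com can fetch them cross-origin.
s3.put_bucket_cors(
    Bucket="example-wacz-archives",  # hypothetical bucket name
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedMethods": ["GET"],
                "AllowedOrigins": ["*"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3600,
            }
        ]
    },
)
```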
Attesting to the Creation of a Web Archive
To create a web archive, one needs to crawl the website and record the content into WACZ format. It is crucial for the organization conducting the crawl to add a claim that this archive is based on their observation of the website. Starling Lab explored with AP several arrangements for making these attestations, which are secured by digital signatures.
In this implementation, the Browsertrix crawling service is operated by Starling in collaboration with AP, so it is debatable who should sign these attestations of creation; in other words, who should be responsible for maintaining the signing certificate for these web archives, which is associated with an organization's domain name.
Some of the options explored include:
- Pointing an AP subdomain to a Starling-operated signing server, so Starling can generate signing certificates (with AP’s domain) to attest to crawls conducted by Starling on AP’s behalf. This was deemed unacceptable from an information security perspective by AP’s technology team.
- Having AP operate a signing service that will sign whatever is presented to it by authorized Starling crawl servers. This was also too much of a security risk for AP’s technology team, as they have little control over what web archives Starling sends to them, and their signing service would blindly attest to them.
- Having Starling sign the archives with a Starling domain name. This way, Starling is not signing on behalf of AP, but rather signing on its own behalf. AP will simply publish an archive attested to by Starling.
Despite AP's signature being more publicly recognizable, the team ultimately decided to have Starling sign the web archives with its own signing certificate, because there is no straightforward way for AP to ensure proper use of its signing certificate unless the news organization operates the web archiving platform itself. AP may of course vouch for Starling as a reliable collaborator, but the digital signature should come from the creator of the web archives.
Archive
News Articles
- AP News, December 20, 2022 Police seize on COVID-19 tech to expand global surveillance
- AP News, Tracked Series landing page
- Pulitzer Center Update, Garance Burke, June 2023 - Tracked: How AP Investigated the Global Impacts of AI
Awards and Recognition
- National Headliner Awards - First place for “Public service in newspapers in top 20 media market”
- News Leaders Association - finalist for First Amendment Award
- Clarion Awards (Association for Women in Communications) - winner for Newspaper Investigative Series
- Deadline Club Awards (NYC Society of Professional Journalists) - finalist for “Science, Technology, Medical or Environmental Reporting”
The 78-Day Archive: Rebuilding Trust in a Digital Age
78 Days Case Study
Creating a Photographic Archive of Trust
Starling Lab | Reading Time: 5min
Prototypes
Background
For 78 days, teams at the Starling Lab and Reuters worked together to document the presidential transition from Donald Trump to Joe Biden with an array of new image authentication technologies and decentralized web protocols.
Read our complete case study here.
Mapping the Web3 Ecosystem for Educational Curricula
Explore the Starling Lab's Web3 Visual Maps project & curriculum. Learn how the provenance framework (Capture, Store, Verify) uses cryptography and decentralized networks to fight mis/disinformation in journalism and media literacy education.
Team
Reading Time: 5min
Background
Instructional modules were developed by Starling Journalism Fellowship Director Ann Grimes and 2022 Starling Fellow Aaron Huey. Working with Stanford Electrical Engineering Professor Tsachy Weissman’s interdisciplinary SHTEM summer internship program for high school and community college students, Huey and Grimes designed and experimented with a curriculum that addressed the problem of mis/disinformation and showed how new “web3” technologies could be applied to reduce information uncertainty and increase users’ trust in journalism and the media writ large.
Grimes is a veteran journalist who previously held senior editorial positions at The Washington Post and the Wall Street Journal and served as Director of Stanford’s Graduate Program in Journalism. Huey is a National Geographic photographer, former Media Experiments Fellow at Stanford’s Hasso Plattner School of Design (the d.school), and Founder and Chief Creative of Amplifier, a nonprofit design lab that builds art and media experiments around technology, cultural and social justice movements. Amplifier’s experiments are built on open-source art and the human-centered design process, with the goal of realizing new possibilities when analog and digital technology merge. Huey’s combination of art and storytelling has resulted in Amplifier’s global art phenomenon “We The People” with collaborator Shepard Fairey, the Sherpa Photo Fund, and a recent series of pre-colonial history and cultural heritage lessons in virtual reality, and now metaverse spaces, that will become part of K-12 curricula across the U.S. His Bears Ears National Monument VR experience won the 2019 Webby for best VR Interactive Design.
Framework
The Challenge
As a Starling Lab 2022 Journalism Fellow, the goal of my project was to help answer the question “How might we design for authenticity?” through the creation of a 12-week class this past summer. More specifically, we were building a curriculum to answer the question: “How might we visualize that design so that we can share this process for broader understanding and adoption?”
The basis for our work was the Starling Framework for authenticity and its three stages of capturing, storing, and verifying information. The guiding principle at The Starling Lab is establishing “provenance” as the backbone of authenticity and integrity of digital content. To do so, the Lab follows a three-step framework:
- Capture: Starling prototypes mobile apps and camera firmware to authenticate digital content and metadata at the point of capture.
- Store: Starling researches how advanced cryptography and decentralized networks can securely distribute and store content over time.
- Verify: Starling experiments with immutable ledgers to register digital content, enabling experts to audit, or verify, the provenance and authenticity of that content.

Journalism has always been the ground we stand on, but in recent years the term “post-truth world” has been circulating more and more. It has felt harder and harder to verify information amid the exponential growth of image creation on mobile devices and social media “news,” and harder still as artificial intelligence image creators became publicly accessible this summer. Journalism, the very definition of verified information, and more specifically journalists, need new tools in this fight to root us back in reality.
We are currently on the cusp of a “new internet,” commonly referred to as “Web 3.0,” and with this moment comes an opportunity to redesign systems to protect not only our personal data but also the integrity of our news media; truth itself.
The Prototype
To unpack this framework and realize its potential, we tasked an online class that included Stanford undergraduates and high school students from around the world with distilling the information and answering these questions to create a toolkit. The intended audience for the toolkit ranged from middle school students to their peers in high school to their not-so-tech-savvy elders. But (!) there was also always the potential that their learnings and final project could become the first draft of a training tool for media professionals, tech teams at media companies, educators, and really anyone who teaches or touches media in this rapidly evolving world of mis- and disinformation.
Coming from my background as the founder of Amplifier.org, and from a decade of building viral campaigns for grassroots movements around the world, I know that visual storytelling is most effective when it is both simple and visually striking, especially when attempting to explain complex tech tools and tricks.

And as a journalist myself, having covered issues around the world for National Geographic, the New Yorker, the New York Times, and many more, I know that our information is both more valuable and more in danger than at any point in my life.
Deepfakes (made with artificial intelligence) and so-called cheap fakes (Photoshop or similar simple manipulations) are seen more and more in the spread of purposeful disinformation. But equally dangerous are the false stories and out-of-context images spread as social media “news” by populations deep in the trenches of political and culture wars around the world. Add to that the venture capitalists who see big profits ahead in AI-generated content and editing, and the public appetite for new text-to-image and text-to-video capabilities, and we have the perfect formula for a world in which we will soon no longer know what is real and what is not without further proof of an image’s source and alteration history.

Technology
Working with Ann Grimes from the Starling Lab, I was tasked with helping lead our class of students. Over the course of eight weeks, they analyzed the various types of mis- and disinformation in both traditional and social media. They also analyzed the open-source players working to combat the problem, learned the foundations of the Starling Framework, and began building tools themselves to help others better understand both the threat and the response.
Not unlike what happened at the dawn of the dot-com era, the story of “Web 3” has eluded most of the public and, to be honest, most professional journalists as well! So to keep things from getting too abstract, we developed a context of upstream and downstream solutions: downstream failures like social media platforms debunking bad content in a game of “whack-a-mole,” and downstream regulation, which was also not working. We then shifted our vision upstream and asked: what if we used new “web3” cryptographic and decentralized technologies to tackle the problem and authenticate digital content from the beginning, from the “capture” stage?
But before any of that, to navigate this project we first had to translate potentially confusing terminology and basic concepts, as well as create a simple map of the journey from “Web 1” to “Web 2” to “Web 3.” Clear and simple visual storytelling on this topic has been needed for some time, especially because what’s out there now is written far above most people’s heads. All of it needed to be in layman’s terms that could be understood by anyone.
Beyond finding and refining strong visual tools we needed students to place them in a narrative story arc, identifying and then answering the questions that arose.
We first introduced key concepts and terms: metadata, provenance, cryptography, decentralized networks, blockchain - and more. We considered hope vs. hype. We talked about “opportunity costs.” What would journalism lose if – as at the dawn of Web 2 – media professionals were too slow to understand and run with the opportunities the Internet opened up? (Only to get run over by the platforms).

Once the base level of understanding was secured with a set of visuals, we turned to explaining and mapping the Starling framework and new journalistic best practices for capturing, storing, and verifying digital content. We began with a “Lit Review” of readings and studies and then built a library of existing infographics of previous attempts to explain the difficult terminology of Web 3 and the history of the internet. Students also created their own attempts at distilling and simplifying this information and added to that database.

We also invited speakers, leaders in their fields, all of whom took different approaches but shared similar solutions in their quest to show provenance. Lectures and testimonials from working professionals like Adobe’s head of Advocacy and Education Santiago Lyon, Starling Lab Founding Director Jonathan Dotan, and The New York Times R+D Lab’s Deputy Director Scott Lowenstein made the story clearer and its relevance more obvious and real.
Learnings
The narrative arc of the class turned our weekly lessons into the roadmap for the students’ final video. That 10-minute video, the final output of this journey, is the first draft of an educational curriculum that can be evolved into a tool for widespread distribution at the collegiate and high school levels, and to media professionals around the world. The students, with a wide range of skills and knowledge, worked as a team and in the end delivered a usable lesson by distilling all that had been learned over the 8-week course.
http://starlinglab.org/wp-content/uploads/2026/01/Designing-for-Authenticity-10-minute-version-.mp4
Challenges, Feedback, and Lessons Learned
Part of the learning journey of this project was acknowledging that some things simply cannot be simplified as much as we would like; the steps behind complex interactions and transactions are just that: complex. But the narrative we use to carry those complex steps, and the way we talk through these visual lessons, determines how engaging and clear those lessons are. Still, common feedback from reviewers of the students' project was that, while the video is a strong output from a group of high school students (and could be shared with other high school students as such), it is not yet something we would want to put out publicly or share as a representation of Starling. The visuals let down the overall presentation, looking too cheap and pasted-in, and reviewers did not report an "aha" moment. Several said the video could be more relatable if it included more real-world scenarios, so that people understand how this affects their daily lives.
It is clear that an unfulfilled need remains among our professional colleagues on both the media side (who need to understand the tech better) and the tech side (who need to better understand media and its needs as an important use case for blockchain beyond crypto).
What did the students learn?
Some parts, we know, translated quite clearly. Concepts such as metadata, provenance, and distributed networks are ones the students will never forget. Other, more technical concepts that might be familiar to "power users" – public and private signature keys, for example – need more work and resulted in incorrect visuals. This is actually extremely helpful, because we now know where to refocus, provide more context, and double down on creating even simpler explanations for these sections.
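To make that concept concrete for future lessons, here is a minimal sketch of what a public/private signature key pair actually does, written in Python with the cryptography package; the library choice and the sample message are ours for illustration, not part of the student curriculum:

```python
# Minimal demonstration of public/private signature keys:
# the private key signs, the public key verifies.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()  # kept secret by the signer
public_key = private_key.public_key()       # shared with anyone who wants to verify

message = b"A claim worth verifying"
signature = private_key.sign(message)       # only the private key can produce this

try:
    public_key.verify(signature, message)   # anyone with the public key can check it
    print("Valid: the message is unaltered and comes from the key holder.")
except InvalidSignature:
    print("Invalid: the message or the signature was tampered with.")
```

The asymmetry is the whole lesson: signing requires the secret key, while verification requires only the public one.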
We implemented the Stanford d.school methodology of "flaring," then "focusing." We went wide first, exploring the large open source ecosystem: how did each player tackle the "authenticity issue"? We then drilled down to a shared concept, provenance, and identified specific technologies – not all of them, mind you, but key tools being used to "design for authenticity" in the media field.
It took us some time, but collectively we isolated the question: "How Might We Design for Authenticity?" That served as an organizing principle and helped us focus. It also helped us identify and explain which new technologies we could explore in answering that question. We discussed real use cases – The New York Times provenance project, for example – and the students learned by doing. The key way we measured their learning and success was by their "deliverable," the 10-minute video discussed above.
What’s next?
The ultimate goal of this project is a professionalized visual output that can serve as a curriculum for technologists, academics, teachers, and students – from future fellows to graduate students, undergrads, high schoolers, and middle schoolers in media and tech literacy courses – and, of course, for the journalists on the ground who are gathering the data so in need of protection, and the media companies they work for, which create and distribute that news.
It is clear that we need to break this complex ecosystem down even further and show, without a doubt, how an upstream technical solution is an effective way to dig ourselves out of the disinformation hole. We can create a series of "modules" that unpack the more complex key concepts. In the final output, we need to build in more specific real-world examples based on Starling Lab case studies, and we may need to include clips of interviews with Starling Fellows explaining how to use the tools.
Using the student video and visuals as the foundation, and after integrating feedback from Web3 experts and media professionals, we plan to develop a curriculum for professional development and training, introducing practitioners to the tools and technologies offered by cryptography and decentralized protocols, all of which can be used to better authenticate digital content.
This project also provided an opportunity to expand the teaching experiment, which the Starling Lab team did, building on these learnings and offering a fully accredited course for Stanford undergraduate and graduate students during the winter 2024 quarter. The class was well received and will be expanded and offered again during the winter 2025 quarter. This provides an opportunity to build out a bigger, bolder, Stanford-level curriculum directing students toward research that integrates the evolving questions on the media and journalism front and matches them with the ever-evolving tech (because, as we all know, it will continue to change). Through all the next iterations, the same question will remain at the heart of the project: "How Do We Design for Authenticity?"
What to Get Right First
By learning from the shortcomings of its predecessor, Web3 can avoid repeating the same mistakes and create a more just digital future.
Rebecca MacKinnon, Starling Fellow 2021
Reading Time: 5min
History, Journalism
Background
In this piece, our 2021 fellow, Rebecca MacKinnon, lays out a crucial guide for the Web3 community, urging it to learn from the human rights failures of Web2.
At Starling Lab, we believe it's essential to proactively build a more equitable internet. Rebecca contends that the concept of "technological neutrality" is a fallacy that allowed Web2 to amplify societal biases, and calls on Web3 to instead embed human rights considerations into the very fabric of its technology and governance. To achieve this, she advocates for a proactive approach to risk mitigation, a critical examination of business models, and the establishment of robust grievance mechanisms from the outset.
Highlighting the importance of collaboration, this article stresses the need for the Web3 ecosystem to work with civil society and other stakeholders to identify blind spots and ensure accountability, ultimately making a compelling case for a more responsible approach to innovation.
The Single Paved Line Threatening the Achuar People
How the Achuar community of Copataza is defending their ancestral territory against the illegal logging, cultural shifts, and economic forces that accompany a new highway.
Team
Reading Time: 5min
Prototypes
Background
In the heart of the Ecuadorian Amazon, the Achuar people – a community of approximately 6,000 – have long stood as guardians of their ancestral territory. For the Achuar, the jungle is not merely a resource; it is the foundation that molds their culture, spirituality, and daily survival. However, this deep connection is under constant siege. Since the Spanish conquest, the vast resources beneath the Amazon floor have attracted juntas and corporations alike, subjecting Indigenous peoples to a centuries-old fear of expulsion.
Today, this chronic scramble for resources has manifested as deforestation, violent standoffs, and encroachments by multi-billion dollar mining operations. In partnership with Protocol Labs and the USC Shoah Foundation, photojournalist Pablo Albarenga traveled to the community of Copataza in March 2020 to document this collision of worlds—capturing the Achuar’s traditional way of life juxtaposed against the scars of industrial expansion.
The Achuar face a dual threat. The first is physical: the illegal entries, mining projects, and construction of roads that divide their land and invite illegal logging. The second, however, is digital and systemic.
In an era increasingly defined by deepfakes, AI-generated imagery, and the manipulation of digital context, the burden of proof for marginalized communities has become heavier than ever. When Indigenous communities like the Achuar attempt to hold state and corporate actors accountable in the court of law – or the court of public opinion – their visual evidence is often met with skepticism or dismissal.
How can a community prove, beyond a shadow of a doubt, that a specific event happened at a specific time and place? How do we ensure that the digital testimony of the Achuar preserves the same forensic integrity as physical evidence? The challenge was not just to take photos, but to create a chain of custody so robust that the truth of the Achuar’s struggle could never be denied.
Framework
The Challenge
In the remote Amazon, the distance between an incident and a courtroom is measured not just in miles, but in data integrity. For the Achuar, documenting environmental crimes like illegal logging or unauthorized road construction is only half the battle. The greater challenge is proving that these images have not been manipulated. In a legal landscape increasingly wary of digital fabrication and "deepfakes," standard metadata is no longer sufficient. To admit visual evidence into a court of law to defend land rights, the Achuar needed a tool that could guarantee the provenance of every pixel.
The Prototype
To bridge this gap, Starling Lab equipped Pablo Albarenga with the experimental "Capture" framework. This prototype moved verification from the editing room to the moment of capture.
Pablo utilized a modified workflow involving a Canon camera tethered to an HTC EXODUS 1 smartphone. This device was selected for its Trusted Execution Environment (TEE), allowing the team to leverage the Zion Vault: a hardware-backed key management platform that generates and stores keys in an isolated area of the processor, independent of the Android OS.
As he documented the mining operations and a new road in Copataza, the framework cryptographically hashed each image immediately upon capture. It did more than save a photo: it sealed the image with a hardware-generated signature unique to the device, binding it to authenticated sensor data (GPS, time, and orientation).
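As a rough illustration of that capture-time sealing, here is a minimal Python sketch. A software key stands in for the hardware-backed key that, in the actual prototype, never leaves the phone's Trusted Execution Environment, and the manifest format and sample values are ours for illustration only:

```python
import hashlib
import json
from datetime import datetime, timezone
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Stand-in for the device key held in the phone's TEE (Zion Vault) in the real prototype.
device_key = Ed25519PrivateKey.generate()

def seal_capture(image_bytes: bytes, gps: tuple, orientation: float) -> dict:
    """Hash the image at the moment of capture and bind it to sensor data."""
    manifest = {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "gps": gps,                      # (latitude, longitude)
        "orientation_deg": orientation,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    # Sign the canonical manifest so any later edit to the image or its
    # metadata breaks the signature and is therefore detectable.
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = device_key.sign(payload).hex()
    return manifest

# Hypothetical capture: seal a freshly taken frame with its sensor readings.
record = seal_capture(b"<raw image bytes>", gps=(-1.97, -77.63), orientation=12.5)
```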
This process created a tamper-proof chain of custody anchored in the decentralized web. Content was cryptographically hashed and stored using the InterPlanetary File System (IPFS), ensuring data is content-addressed rather than location-based. This allows independent reviewers to verify the integrity of the files against their original hashes.
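And to sketch how an independent reviewer might check a retrieved file against its content address, the snippet below asks a local IPFS daemon to recompute the file's CID without storing it; the daemon address, file name, and recorded CID are assumptions for illustration:

```python
import requests

IPFS_API = "http://127.0.0.1:5001/api/v0"  # assumes a local IPFS daemon is running

def recompute_cid(path: str) -> str:
    """Have the daemon hash the file exactly as `ipfs add` would, without adding it."""
    with open(path, "rb") as f:
        resp = requests.post(f"{IPFS_API}/add",
                             params={"only-hash": "true"},
                             files={"file": f})
    resp.raise_for_status()
    return resp.json()["Hash"]

# The file checks out if its recomputed CID matches the one recorded at capture.
recorded_cid = "Qm..."  # placeholder: the CID stored alongside the signed manifest
print(recompute_cid("copataza_road.jpg") == recorded_cid)
```

Because the address is derived from the content itself, any change to the file yields a different CID, which is what makes the archive location-independent and tamper-evident.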
Learnings
While the deployment successfully proved the concept of authenticated capture, the field test in the humid, politically charged environment of the Amazon revealed critical tensions between technology and human reality.
The Friction of Verification
Forensic rigor often came at the cost of journalistic agility. The prototype workflow introduced significant friction. Pablo reported delays caused by the app requesting confirmation for every photograph taken. To operate discreetly and avoid drawing attention from authorities, Pablo resorted to dimming the screen and using the lower volume key to trigger the shutter, but the software's mandatory confirmation steps made capturing fleeting or stealthy moments in batches nearly impossible. Furthermore, the hardware struggled with the environment: the phone used for hashing frequently overheated after just minutes of video recording, and the app suffered from stability issues that disrupted the workflow.
The Paradox of Privacy
The most profound learning, however, was ethical. The very feature that made the evidence legally powerful – immutable GPS and metadata logging – posed a severe security risk. In a conflict zone where activists are often targeted, carrying a device that broadcasts an unalterable record of one's exact location can endanger both the journalist and their subjects. Pablo noted that the inability to selectively toggle metadata collection meant that protecting the "truth" of the image potentially compromised the safety of the people in it. This feedback highlighted a crucial need for future iterations: the ability to balance forensic transparency with the human need for obscurity and safety.
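One way a future iteration might strike that balance is a capture-time toggle that records and signs only the metadata fields the journalist opts into. The sketch below is purely hypothetical – a design idea prompted by Pablo's feedback, not a feature of the existing prototype:

```python
import hashlib
import json

def build_manifest(image_bytes: bytes, metadata: dict, include: set) -> dict:
    """Hash the image, but record only the metadata fields the journalist opts into.

    Hypothetical sketch: omitting a field such as "gps" trades some forensic
    completeness for the safety of the journalist and their subjects.
    """
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "metadata": {k: v for k, v in metadata.items() if k in include},
        # Omissions are declared openly rather than hidden, preserving honesty
        # about what the record does and does not attest to.
        "omitted_fields": sorted(set(metadata) - include),
    }

# In a conflict zone: keep time and orientation, withhold the exact location.
manifest = build_manifest(
    b"<raw image bytes>",
    {"gps": (-1.97, -77.63), "time": "2020-03-12T06:10:00Z", "orientation": 12.5},
    include={"time", "orientation"},
)
```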
Pablo Albarenga’s Dispatch from the Field
It was barely dawn. The first rays of sunshine were already warming the thick foliage of the forest, causing the moisture accumulated during the night's rains to rise above the treetops and create a heavenly landscape. There, where the Andes Mountains meet the Ecuadorian Amazon, a unique biome is born, one of the most biodiverse areas of the planet. The Shuar and Achuar peoples also live there, protagonists of the most recent demonstrations that shook Ecuador for several days, when thousands of Indigenous protesters took the city of Quito in October 2019, after the promulgation of a new decree imposing economic adjustments agreed with the International Monetary Fund. The protests ended with several deaths and a popular victory.
At that time, the media focused attention on the rise in fuel prices, but this was just one item on the long list of demands brought by the native peoples. For them, the main demand revolved around the defense of ancestral territories, threatened by oil companies, logging companies, and projects that did not respect indigenous peoples' right to prior consultation on decisions involving their territories.
While the city was occupied, a new road promoted by the local government of Pastaza province was opening up the jungle inland, toward the Achuar community of Copataza, establishing the first road link between it and the city of Puyo and putting the whole community to the test. Though the road was still under construction and barely paved, logs could already be seen piled up along it, waiting to be transported and processed for export. Illegal loggers are the first to arrive wherever a new road opens, since it offers a profitable way to reach wood that commands high prices on the international market.
Just a few weeks ago, the road finally reached its destination, drawing a winding line that unites, but also divides. "I told my wife that we will only be united until the road arrives (...) and that is coming true," says Aurelio, one of Copataza's elders, as he reflects on the changes the community – founded in his parents' youth by former nomads – has faced since the road arrived.
The soundscape of Copataza is no longer the same, the noise of machines working on the road intermingling with the songs of birds, insects, and chainsaws. The old landing strip, once the only fast way to the city of Puyo, is now an abandoned field crossed by the new road. Two cargo trucks are filled with balsa wood extracted from the islands surrounding the community, part of Achuar territory. Outsiders who do not belong to the community move about freely, loading and hauling away the wood.
Since the beginning of the project, the Achuar have weighed the consequences the road could bring, as well as its advantages. During wayusa hours – the early-morning ritual of drinking wayusa, a traditional Achuar brew high in caffeine, used for purification and for discussing important matters – they decided to give a definitive yes to the new road, on the condition that their community be the last point it reached.
The young people are enthusiastic about the benefits the road promises, since it allows them to sell their products more easily and to reach the city at low cost, above all in emergencies. In the past, the only access to the city was an expensive flight or a hard walk: "Before, one would walk for five days through the jungle to get to the city," says Julian Illanes, one of the former leaders of the Achuar Nationality of Ecuador (NAE).
It is unquestionable that the road has advantages for the community in terms of access to the city. The problem is that it also allows the arrival of outsiders interested in the natural resources and, once money becomes a necessity for the community, the fastest way to get it is to sell these resources.
"Now many of us are seeing the economic need. Everyone says 'out of necessity I do this'. Before there was need, but not so much. Before just having the clothes and the machete was enough," says Aurelio.
The NAE sees clearly how timber extraction threatens the community, not only its resources but also its culture. "The first impact we face is the logging companies. We have many ideas and alternatives, but the logging company has been quicker to offer people money," says Tiyua Uyunkar, President of the NAE. "We are following the necessary parameters for a total blockade of these companies, but the Ministry of the Environment has done absolutely nothing. It has said that this commercialization of balsa wood does not carry many restrictions, so we have had no support."
Formerly nomads, hunters, and fishermen, today the Achuar are settled in a fixed territory. The traditional houses, with their wooden bases and thick palm-leaf roofs, are gradually being replaced by new houses with sheet-metal roofs, less insulating during the extreme heat of the Amazon summer and noisier in the rainy season. Agriculture is still practiced in the gardens, but hunting and fishing are gradually disappearing, replaced by products brought from the city. "Before there were enough fish to go around – not five or three fish, but two or three baskets, full. Today it is scarce, just like hunting. Before it was all jungle (...) now I see that on the Pastaza River there is so much motor-canoe traffic making a racket every 10 minutes. It scares the fish away," says Aurelio.
Immersed in a rapidly changing context, culture is the intangible territory that absorbs the collateral damage of economic contact in indigenous communities. This is where traditions are modernized and new ones are imported, but people go on with their lives. The greatest threat, however, is to the bond with the forest, with the territory. From here on, only time will tell whether it remains the common good indispensable for the sustenance of life, or becomes simply a finite resource, quickly turned into money for the short term.