Algorithms, Archives, and the Layers We Usually Forget
Why algorithm preservation needs to come off the back burner
Every so often an idea that I have pushed to the edges of my attention makes its way back into the center. Algorithm preservation is one of those ideas. It has circulated for years through the discussions I have been part of, from standards conversations to digital preservation debates to records management concerns. I encountered it repeatedly during the SC 11 AI case study efforts, and it has surfaced in conversations with colleagues across Europe, Australia, and New Zealand more times than I can count.
A few observations and discussions at iPRES 2025 brought it quietly back into focus. Nothing dramatic. Just timely reminders that this problem continues to grow. Each conversation pointed toward the same reality. Algorithmic processes are now creating records, classifying them, routing them, and justifying them. Yet the question of how to preserve the full meaning of those processes rarely receives the systematic attention it deserves.
This article reflects a renewed consideration of why algorithm preservation keeps returning to the foreground.
What we talk about when we talk about preserving algorithms
Many people begin with code. In that narrow view, preservation means saving Python scripts or compiled modules so that the software can run in the future. Saving the code, though, is only one small part of a much larger system.
Algorithms are not only code. They are behaviors, assumptions, decisions, workflows, and organizational practices. They are social and technical constructs that operate within policy contexts and governance frameworks. A future auditor or researcher does not need the code as much as they need the logic and behavior the code produced when it was deployed.
Across preservation practice, scientific reproducibility, AI governance, and archival theory, the same conclusion appears. You cannot preserve an algorithm unless you preserve the processes that made it work.
Paradata and processual data enter the conversation
Different fields use different vocabulary, but the underlying concepts align closely.
Paradata captures the human decisions and interpretive choices that shape algorithmic systems. This is where Patricia C. Franks’ recent work is especially useful. Franks argues that paradata is essential for documenting how algorithmic processes operate and how people shape them. She distinguishes between system paradata and operational paradata. System paradata includes model design, version histories, vendor documentation, and configuration rules. Operational paradata includes deployment notes, workflow steps, approvals, risk assessments, and impact evaluations.
Examples of paradata include
design notes and modeling decisions
parameter justifications
calibration choices
governance and risk documentation
procurement files
system cards or model cards
Franks’ framework provides a clear way to articulate what needs to be retained so that algorithmic reasoning remains intelligible. Her work supports the idea that algorithm preservation is fundamentally a records issue because paradata covers the why behind system behavior. This understanding continues to influence how I think about the completeness and interpretability of algorithmic systems.
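To make the distinction concrete, the two kinds of paradata could be captured as simple structured records. This is only an illustrative sketch: the field names and file references below are hypothetical, not drawn from Franks' framework or any metadata standard.

```python
# Hypothetical paradata records following Franks' system/operational split.
# All field names and document references are illustrative placeholders.

system_paradata = {
    "kind": "system",                       # design-time documentation
    "model_version": "2.3",
    "vendor_docs": ["supplier_spec_v4.pdf"],
    "configuration": {"decision_threshold": 0.6},
}

operational_paradata = {
    "kind": "operational",                  # deployment-time documentation
    "deployment_note": "Rolled out after pilot review",
    "approvals": ["records-manager", "risk-officer"],
    "risk_assessment": "impact_evaluation_2024.pdf",
}
```

The point of the split is that a future reader needs both: what the system was built to do, and how people actually put it to work.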
Processual data captures the machine side of the story. This is the evidence of what the algorithm actually did during execution.
Examples include
logs
workflow graphs
intermediate outputs
checkpoints
inference traces
branch selections
performance and error reports
Processual data documents the actual behavior of the system rather than the idealized behavior described in documentation.
Paradata and processual data together preserve the making and the doing of algorithmic systems. They fill the space between the record produced by the system and the code that enabled it.
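In practice, processual data capture can be as modest as emitting append-only structured log records for each algorithmic action, with each record linked back to the system version that produced it. The sketch below assumes hypothetical field names and a JSON Lines log file; it is one possible shape, not a prescribed format.

```python
import json
import time
import uuid

def record_processual_event(log_path, system_id, event_type, payload):
    """Append one structured execution event to a JSON Lines log.

    Each record ties a runtime observation (what the system did) back to
    the system version that did it, so future readers can reconstruct
    behavior. All field names here are illustrative, not a standard.
    """
    event = {
        "event_id": str(uuid.uuid4()),       # unique, citable identifier
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "system_id": system_id,              # links the event to its paradata
        "event_type": event_type,            # e.g. "inference", "branch", "checkpoint"
        "payload": payload,                  # inputs, outputs, chosen branch, scores
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    return event

# Example: log one inference decision
record_processual_event(
    "execution.jsonl",
    system_id="eligibility-model v2.3",
    event_type="inference",
    payload={"input_ref": "case-1041", "decision": "refer_to_human", "score": 0.62},
)
```

The `system_id` field is what joins the two layers: it lets an auditor move from a logged decision to the paradata describing the version and configuration that made it.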
Why this matters for records and information governance
When algorithmic systems are examined through the lens of archival standards, the preservation challenge becomes clear.
ISO 15489
Authenticity, reliability, and integrity all depend on the ability to reconstruct how a record was produced. Algorithmic outputs require algorithmic context.
ISO 23081
Metadata for processes, transformations, and system behavior is central. Paradata and processual data map directly onto the standard’s categories for contextual, process, and transformation metadata.
OAIS
Algorithmic behavior becomes part of the Representation Information that is required to interpret digital objects in the future. If the meaning of a digital object depends on algorithmic processing, that processing becomes part of the Archival Information Package.
Algorithm preservation is archival theory extended into computational environments.
Machine learning multiplies the preservation challenge
Traditional algorithmic systems are challenging enough. Machine learning systems add entirely new layers to the problem.
A meaningful preservation bundle for an ML system should include
model architecture definitions
training code
trained weights
hyperparameters
preprocessing and postprocessing pipelines
evaluation metrics
training datasets or descriptive surrogates
risk assessments, governance records, and testing documentation
A frozen model file cannot explain how or why a decision was made at a particular moment. Historical accountability requires the evolution of the model, the decisions behind its tuning, and the context of its training. Franks’ distinction between system paradata and operational paradata is particularly helpful here because ML systems generate complex documentation and decision trails that must be preserved. This continues to shape my consideration of what long term interpretability requires.
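One way to operationalize the bundle above is a simple completeness check against a manifest. The component names below mirror the list, but they are illustrative labels, not a formal schema.

```python
# Hypothetical completeness check for an ML preservation bundle.
# Component names mirror the list above; nothing here is a formal schema.

REQUIRED_COMPONENTS = {
    "model_architecture",           # architecture definitions
    "training_code",
    "trained_weights",
    "hyperparameters",
    "processing_pipelines",         # preprocessing and postprocessing
    "evaluation_metrics",
    "training_data_or_surrogate",   # datasets or descriptive surrogates
    "governance_records",           # risk assessments, testing documentation
}

def missing_components(manifest: dict) -> set:
    """Return the components a bundle manifest fails to account for."""
    present = {name for name, path in manifest.items() if path}
    return REQUIRED_COMPONENTS - present

# Example: a bundle that froze the model file but kept little else
manifest = {"trained_weights": "model.bin", "hyperparameters": "hp.yaml"}
gaps = missing_components(manifest)
# gaps now names everything a future auditor could not reconstruct
```

A check like this makes the frozen-model problem visible at deposit time rather than decades later.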
Preservation strategies in practice
Various strategies circulate in digital preservation, each addressing particular risks.
Migration can maintain usability but may alter behavior in significant ways.
Emulation preserves environments but not necessarily the intent behind them.
Containerization stabilizes dependencies but often obscures reasoning.
Encapsulation packages components but risks creating sealed boxes that future users cannot interpret.
Reproducibility bundles capture workflows but may leave out governance information.
All of these strategies become more effective when paradata and processual data are preserved as primary components. These layers provide meaning, traceability, and accountability. Without them, even a technically perfect preservation plan cannot answer future questions about behavior or decision making.
A practical model for an Algorithm Preservation Package
Drawing on standards work, Dutch institutional practice, reproducibility research, and the broader digital preservation field, a comprehensive model for algorithm preservation looks like this. Over time I have found myself returning to these elements whenever I consider what future users will need in order to understand algorithmic reasoning and behavior.
1. Algorithm artifacts
Code, binaries, containers, dependency manifests, and version histories.
2. Paradata
Design histories, parameter decisions, modeling assumptions, Franks’ system and operational paradata, governance documentation, procurement files, and risk assessments.
3. Processual data
Execution logs, workflow traces, checkpoints, intermediate states, inference records, and behavioral evidence.
4. Data dependencies
Training datasets, dataset profiles, reference data, and validation samples.
5. Contextual records
Policy files, testing results, deployment approvals, change control records, and related documentation.
The value of this package lies in the relationships among these elements. Each component clarifies and reinforces the others.
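As a minimal sketch of how the five elements might sit together on disk, the layout below scaffolds an empty package skeleton. The directory and file names are hypothetical placeholders for illustration, not a packaging standard.

```python
from pathlib import Path

# Illustrative layout for an Algorithm Preservation Package.
# The five top-level directories follow the model above; the example
# contents named in comments are hypothetical placeholders.

APP_LAYERS = [
    "1_algorithm_artifacts",   # code, containers, version histories
    "2_paradata",              # design histories, risk assessments, model cards
    "3_processual_data",       # execution logs, checkpoints, inference traces
    "4_data_dependencies",     # dataset profiles, validation samples
    "5_contextual_records",    # approvals, change control, policy files
]

def scaffold_package(root: str) -> Path:
    """Create the empty directory skeleton for a new package."""
    base = Path(root)
    for layer in APP_LAYERS:
        (base / layer).mkdir(parents=True, exist_ok=True)
    return base

pkg = scaffold_package("app_example")
```

The skeleton alone preserves nothing, of course; its value is that it forces each of the five layers to be accounted for, even if only with a statement of why a layer is empty.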
Dutch work as a conceptual foundation
The Netherlands has been building the ideas and infrastructure that support algorithm preservation for more than a decade. Dutch institutions approach digital preservation as a problem of behavior, context, and functionality, not simply formats and files. This makes the Dutch landscape particularly valuable for thinking about how algorithmic systems should be preserved.
DANS and workflow centered preservation
DANS takes the position that data cannot be meaningfully preserved without attention to software dependencies and workflow logic. Their preservation program emphasizes
workflow documentation
software and version dependencies
transformation histories
structured preservation watch across formats and tools
reproducibility as a long term preservation objective
DANS encourages researchers to deposit code notebooks, containers, workflow scripts, and documentation that explains how data were created and processed. This embeds algorithm preservation into research data management and treats computational pipelines as part of the intellectual record.
The Dutch Digital Heritage Network
The Dutch Digital Heritage Network supports a national ecosystem for sustainable access that prioritizes
interoperability across domains
preservation of complex and dynamic digital objects
documentation of functional behavior
shared national infrastructure for long term access
Their work on linked data, complex object packaging, and context rich metadata aligns closely with the needs of algorithm preservation. They provide the infrastructure necessary to link digital objects with the processes that produced or transformed them.
Digital art and media preservation
Institutions such as LIMA and Sound and Vision have extensive experience preserving digital and computational art. They regularly address
code based artworks
generative and algorithmic installations
evolving media environments
custom software and hardware dependencies
These communities treat algorithmic behavior as a cultural artifact that must be retained, whether through documentation, emulation, or reinterpretation. Their work parallels the challenges of preserving algorithmic behavior in administrative or scientific systems.
Dutch administrative and legal context
The Netherlands has conducted rigorous conversations about automated decision making in the public sector. Dutch administrative law stresses transparency, auditability, and proportionality. As algorithms influence benefits decisions, fraud detection, and resource allocation, Dutch policymakers and researchers have examined
version histories
training data sources
parameter settings
audit trails
explanations that accompany algorithmic decisions
This policy environment indirectly supports algorithm preservation because oversight requires reconstruction of logic and behavior. The same evidence needed for accountability is also necessary for long term preservation.
European reproducibility and provenance work
Dutch scholars have played significant roles in European projects on reproducibility and computational provenance. Their contributions support
preservation of computational environments
documentation of decision steps in workflows
interoperability between repositories and workflow engines
These initiatives strengthen the broader foundation needed for algorithm preservation.
Why Dutch work matters
The significance of the Dutch approach is not that there is a single algorithm preservation program. It is that Dutch institutions already treat the preservation of behavior and context as standard practice. Their infrastructure, policies, and collaborative frameworks provide practical models for preserving algorithmic logic in support of accountability and long term access.
Why algorithm preservation keeps returning to my attention
Algorithmic systems increasingly shape the records that governments, researchers, organizations, and communities depend upon. They produce decisions, interpretations, analyses, and classifications. If their behavior is not preserved, the meaning of the records they generate becomes difficult or impossible to interpret.
Algorithm preservation keeps returning to my attention because it underpins accountability, scientific integrity, administrative rights, and digital cultural memory. Franks' recent work on paradata gives us a clearer vocabulary and a more precise conceptual structure for understanding the layers that need to be preserved. Her framework reinforces the importance of documenting algorithmic processes in ways that maintain trust and transparency.
The more society depends on algorithmic processes, the more essential it becomes to carry their logic and history forward. Algorithm preservation is not a peripheral concern. It is central to how records and evidence will be understood in the future.
It is time to move algorithm preservation into the mainstream of archival and records management practice.