Inside the Black Box of PDF/A

Peter Wyatt’s Masterclass in Digital Preservation Clarity

Nov 03, 2025

Dispatch from Wellington

When Peter Wyatt takes the stage, you know you’re in for a masterclass in both precision and perspective. At iPRES 2025 in Wellington, the CTO of the PDF Association and a central figure in the ISO standardization of the Portable Document Format delivered a dense, fascinating tutorial that pulled back the curtain on how PDF/A really works.

I had missed Peter’s presentation at PDF Days in Berlin earlier this year, so I was pleased to see his name on the Wellington program. This time, I was determined not to miss it. What followed wasn’t just a talk about file formats; it was a lucid dissection of how preservation practice, software design, and international standards intersect.

The Architect Speaks

Wyatt began by demystifying his own role. He is an Australian engineer, the technical editor of ISO 32000 (PDF 2.0), and the point of contact for nearly every international conversation about the format’s evolution. “I’m the person who writes the spec,” he said. Yet his presentation was not about status; it was about translation: helping digital preservationists, archivists, and librarians understand the living ecosystem that sustains a format more than 30 years old that is still central to the world’s documentary memory.

He stated his mission plainly: align preservation practice with standards reality. PDF/A is not a mystical seal of archival purity. It is a series of pragmatic constraints layered over a complex and evolving file format. Wyatt’s message was clear: you cannot preserve what you do not understand.

From Digital Paper to Preservation Platform

Wyatt’s first point was deceptively simple. PDF is not just a file; it is a container: a random-access, object-based structure that encapsulates text, images, vector graphics, metadata, and even executable logic such as JavaScript. In its early days, PDF was “digital paper,” the final form for print-ready documents. The PDF of 2025, he explained, is more like a self-describing digital ecosystem. It can hold 3D models, multimedia, embedded XML or JSON metadata, accessibility layers, and multiple content representations inside one file.

This is why the “A” in PDF/A—for “Archival”—matters. It tames the chaos, constraining features that could break reproducibility or make content dependent on external software. It is about locking in visual appearance, not meaning, not policy, and not authenticity. “PDF/A defines a static page visual appearance, and that’s pretty much it,” Wyatt reminded the room. “Everything else—records policy, metadata design, long-term access—comes from you.”

Retiring the Old Standard

Wyatt’s most provocative point was about PDF/A-1, the 2005 edition that refuses to disappear. Some archives and government policies, he noted, still mandate it, a practice he called “an indicator of old software, not good policy.” PDF/A-1 forbids transparency, drop shadows, and JPEG2000, all of which have been standard in digital content for two decades. Converting modern documents to it, he said, “dumbs down your content and distorts its authenticity.”
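
For readers who want to see what an A-1 conversion would have to strip or flatten, a quick raw-byte scan can flag a few of the features Wyatt mentioned. This is a minimal sketch under loud assumptions: the function name `pdfa1_blockers` and its patterns are illustrative rather than taken from ISO 19005-1, and because it reads raw bytes it cannot see inside compressed object streams. A hit is a prompt to run a real validator; a clean result proves nothing.

```python
import re

# Hypothetical byte-level markers for a few features PDF/A-1 disallows.
# Raw-byte matching cannot see objects packed into compressed object
# streams, so absence of a hit is NOT evidence of PDF/A-1 compatibility.
PDFA1_BLOCKERS = {
    "JPEG2000 image (/JPXDecode)": re.compile(rb"/JPXDecode\b"),
    # PDF/A-1 permits only "/SMask /None"; any other soft mask is transparency.
    "soft mask (/SMask)": re.compile(rb"/SMask\b(?!\s*/None)"),
    "transparency group": re.compile(rb"/S\s*/Transparency\b"),
}


def pdfa1_blockers(data: bytes) -> list[str]:
    """Return the names of PDF/A-1 blocker markers found in raw PDF bytes."""
    return [name for name, pattern in PDFA1_BLOCKERS.items() if pattern.search(data)]


sample = b"5 0 obj << /Subtype /Image /Filter /JPXDecode /SMask 6 0 R >> endobj"
print(pdfa1_blockers(sample))
```

Drop shadows usually surface as soft masks or transparency groups, which is why they fall under the same checks as transparency in general.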

His plea was straightforward: stop treating compliance as a badge of virtue. Use PDF/A-4 (ISO 19005-4:2020), built on PDF 2.0, which integrates accessibility, embedded files, and richer metadata models. If your system still insists on A-1, he said, “you’re not reflecting the content as it’s written today.”

“Look for the Verbs”

One of the most useful takeaways was Wyatt’s guide to reading a standard like an engineer. “Look for the verbs,” he said. Requirements phrased as “shall be present” describe things in the file; those phrased as “shall ignore” describe what the software must do. This simple heuristic explains why a valid file might still render incorrectly: in that case, the fault lies with the viewer, not the file.
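
The verb test can be sketched as a toy classifier. Everything here is a hypothetical illustration: the phrase lists and the `classify_requirement` function are assumptions, not anything from ISO 19005, and real standards phrase requirements in far more ways than a substring match can catch.

```python
# Illustrative phrase lists only; real standards use many other formulations.
FILE_PHRASES = ("shall be present", "shall not be present", "shall contain", "shall not contain")
PROCESSOR_PHRASES = ("shall ignore", "shall render", "shall process", "shall display")


def classify_requirement(sentence: str) -> str:
    """Heuristically label a requirement as being about the file or the processor."""
    text = " ".join(sentence.lower().split())  # normalize case and whitespace
    if any(phrase in text for phrase in PROCESSOR_PHRASES):
        return "processor"
    if any(phrase in text for phrase in FILE_PHRASES):
        return "file"
    return "unclassified"


for req in (
    "The Metadata key shall be present in the document catalog.",
    "A conforming reader shall ignore this key.",
):
    print(classify_requirement(req), "|", req)
```

Processor phrases are checked first so that a sentence mentioning both kinds of obligation is routed to the software side, where rendering surprises usually originate.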

This distinction—between file-level compliance and processor behavior—threads through much of the confusion about PDF/A validation. Wyatt was clear: viewers are not validators, and validation is not interpretation. “There is no one true PDF/A,” he said. “Two different files can both validate, display differently, and still both be correct.”
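
The gap between claiming PDF/A and being PDF/A is easy to show. A PDF/A file announces its intended conformance in its XMP metadata through the PDF/A identification schema (properties such as `pdfaid:part` and `pdfaid:conformance`; PDF/A-4 adds `pdfaid:rev`). The sketch below, built around a hypothetical `claimed_pdfa` helper, only reads that declaration. It assumes the XMP packet has already been extracted from the file, and it checks nothing about whether the file actually conforms; that is the job of a validator such as veraPDF.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID = "http://www.aiim.org/pdfa/ns/id/"


def claimed_pdfa(xmp: str) -> dict[str, str]:
    """Return the pdfaid properties an XMP packet declares. A claim, not a check."""
    root = ET.fromstring(xmp)
    claim: dict[str, str] = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Attribute form: <rdf:Description pdfaid:part="4" ... />
        for key, value in desc.attrib.items():
            if key.startswith(f"{{{PDFAID}}}"):
                claim[key.split("}", 1)[1]] = value
        # Element form: <pdfaid:part>4</pdfaid:part>
        for child in desc:
            if child.tag.startswith(f"{{{PDFAID}}}"):
                claim[child.tag.split("}", 1)[1]] = (child.text or "").strip()
    return claim


packet = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
                   pdfaid:part="1" pdfaid:conformance="B"/>
 </rdf:RDF>
</x:xmpmeta>"""
print(claimed_pdfa(packet))
```

A policy that flags files still claiming part 1 could start from a claim like this, but only a validator run can say whether the claim holds.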

The Metadata Maze

If the tutorial had a heartbeat, it was metadata. Wyatt walked through the long evolution from PDF’s primitive “Info Dictionary” (key-value strings like Author or Title) to the XMP standard (ISO 16684-1) that underpins structured metadata today.
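
At its simplest, the move from the Info dictionary to XMP is a mapping from flat strings to typed RDF properties. The sketch below assumes the commonly used correspondences (Title to `dc:title`, Author to `dc:creator`, Subject to `dc:description`, Keywords to `pdf:Keywords`, Creator to `xmp:CreatorTool`, Producer to `pdf:Producer`). The helper name `info_to_xmp` is hypothetical, date fields are deliberately left out because they need format conversion, and the output is a bare RDF fragment rather than a complete XMP packet.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Info key -> (XMP property, RDF container or None for a simple value).
# Dates (CreationDate, ModDate) are omitted: they need format conversion.
INFO_TO_XMP = {
    "Title": ("dc:title", "Alt"),
    "Author": ("dc:creator", "Seq"),
    "Subject": ("dc:description", "Alt"),
    "Keywords": ("pdf:Keywords", None),
    "Creator": ("xmp:CreatorTool", None),
    "Producer": ("pdf:Producer", None),
}


def qname(prefixed: str) -> str:
    """Turn 'dc:title' into ElementTree's '{uri}title' form."""
    prefix, local = prefixed.split(":")
    return f"{{{NS[prefix]}}}{local}"


def info_to_xmp(info: dict[str, str]) -> str:
    """Map Info dictionary strings onto an RDF fragment of XMP properties."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)
    rdf = ET.Element(qname("rdf:RDF"))
    desc = ET.SubElement(rdf, qname("rdf:Description"), {qname("rdf:about"): ""})
    for key, value in info.items():
        if key not in INFO_TO_XMP:
            continue  # custom keys would need their own (extension) schema
        prop, container = INFO_TO_XMP[key]
        element = ET.SubElement(desc, qname(prop))
        if container is None:
            element.text = value
            continue
        # Alt = language alternatives, Seq = ordered list (e.g. authors).
        holder = ET.SubElement(element, qname(f"rdf:{container}"))
        item = ET.SubElement(holder, qname("rdf:li"))
        if container == "Alt":
            item.set(XML_LANG, "x-default")
        item.text = value
    return ET.tostring(rdf, encoding="unicode")


print(info_to_xmp({"Title": "Annual Report", "Author": "A. Archivist"}))
```

Note how the title becomes a language alternative and the author an ordered list: structure that the Info dictionary’s plain strings could never express.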

PDF/A-1, he noted, forced users to embed schemas for every custom field, inflating file sizes and complexity. By PDF/A-4, ISO had relaxed that rule: “you should include a schema,” not “you shall.” It is a subtle shift with major implications. Institutions can now manage metadata schemas externally, refer to them via RELAX NG, or bundle them in PDF portfolios that act like ZIP archives with descriptive layers.

For Wyatt, this is where archivists should focus: metadata richness, not minimal compliance. “AI doesn’t care about what you see on the screen,” he said. “It looks for structure—headings, lists, tables, relationships.” That is where the next generation of preservation intelligence lies.

Validation, Versions, and Other Fictions

Wyatt also shared some illuminating data from his own testing. In a sample of one million PDFs, 20 percent contained mismatched version indicators—files claiming to be “PDF 1.4” but containing 1.5 features. Yet nearly all opened fine. Why? Because most software ignores the declared version and parses whatever it finds.

His point: version numbers lie. Preservation workflows that rely on them risk false assumptions about what is inside a file. The only real truth, he suggested, is in the features—and in the independent validators that can detect them.
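
A crude version of that feature check is easy to sketch. The assumptions, stated plainly: the helper `version_mismatch` is hypothetical; it looks only at three PDF 1.5 features whose dictionaries are visible in raw bytes (object streams, cross-reference streams, and JPEG2000 images); and it ignores the document catalog’s `/Version` entry, which since PDF 1.4 can legitimately declare a later version than the header. A hit therefore means “look closer,” not “this file is broken.”

```python
import re

HEADER = re.compile(rb"%PDF-(\d)\.(\d)")
# Illustrative subset of PDF 1.5 features detectable in uncompressed bytes.
PDF15_MARKERS = {
    "object stream": re.compile(rb"/Type\s*/ObjStm\b"),
    "cross-reference stream": re.compile(rb"/Type\s*/XRef\b"),
    "JPEG2000 image": re.compile(rb"/JPXDecode\b"),
}


def version_mismatch(data: bytes) -> list[str]:
    """List PDF 1.5 features found in a file whose header claims an earlier version."""
    # Assumption: the header sits within the first 1024 bytes, as many readers tolerate.
    match = HEADER.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF- header in the first 1024 bytes")
    declared = (int(match.group(1)), int(match.group(2)))
    if declared >= (1, 5):
        return []  # header already admits these features
    # Caveat: a catalog /Version entry may raise the version; not checked here.
    return [name for name, pattern in PDF15_MARKERS.items() if pattern.search(data)]


sample = b"%PDF-1.4\n1 0 obj << /Type /ObjStm /N 3 /First 20 >> stream\nendstream endobj"
print(version_mismatch(sample))
```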

Preservation as a Living Standard

The closing section of Wyatt’s presentation turned from technical detail to policy pragmatism. Every part of the ISO 19005 series, he noted, will remain valid forever. None will be withdrawn because preservation requires preserving the standards themselves. But the ecosystem keeps evolving.

He hinted at the forthcoming PDF/A-4 amendment (2025–2026) that clarifies naming and conformance levels, and ongoing alignment between PDF/A, PDF/UA, and PDF/X to ensure that accessible and printable files can coexist. It is an elegant vision: one file, many standards, long-term trust.

The Takeaway

Peter Wyatt’s iPRES presentation was not just a deep dive into a file format. It was a call to action for archivists to update their thinking. He made a persuasive case that digital preservation is as much about literacy as longevity: understanding the structure, evolution, and semantics of the technologies we depend on.

“There is no single ‘true’ PDF/A,” he concluded. “What matters is that your software and policy environment produce files that validate, render, and remain intelligible.”

It is hard to imagine a clearer mission statement for twenty-first-century digital preservation, and I am very glad I finally caught it in person.

Andrew Potter
