Introduction
Archives and records repositories around the world hold vast collections of paper documents, photographs, and other media that are not searchable or readily accessible in digital form. From historical manuscripts to typescript records, much of this material can only be consulted by physically browsing or reading through it, which is labor-intensive. Artificial intelligence (AI) now offers ways to bridge this gap. Optical Character Recognition (OCR) for printed text and Handwritten Text Recognition (HTR) for handwriting can automatically transcribe scanned archival documents into text, making them keyword-searchable and easier to access. Crucially, these tools are most effective in a human-in-the-loop approach, with archivists and information professionals guiding, training, and correcting the AI to ensure accuracy and authenticity. This article explores how AI is being used to enhance archival access, with real-world examples from the National Archives of the Netherlands and other initiatives, and emphasizes the importance of human oversight, standards, and best practices in these innovations.
AI Transcription in Action: The Dutch National Archives Example
One pioneering example comes from the Nationaal Archief (National Archives) of the Netherlands. Facing the challenge of a massive paper collection (stretching over 140 kilometers of shelves), the archive launched an ambitious digitization program to scan and transcribe millions of pages. It plans to scan about 10% of its holdings, over 100 million pages, in the next 15 years and to use AI-based handwriting recognition to make the resulting images text-searchable. The first phase of the project focused on 3 million pages of historical records, including 17th- and 18th-century Dutch East India Company documents and 19th-century notarial deeds, which were automatically transcribed using HTR technology. By converting handwritten pages into machine-readable text, the archive aimed to radically improve accessibility for researchers and the public.
The National Archives used AI-powered handwriting recognition to transcribe over 3 million pages from 17th–19th century records, jump-starting digital access to their collections. Human archivists provided training data and validation, ensuring the AI’s output remained accurate and trustworthy.
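To make the human-in-the-loop idea concrete, here is a minimal sketch of how transcribed lines might be split into "accept" and "send to an archivist" piles before being indexed for keyword search. It is not the Nationaal Archief's actual pipeline. The `TranscribedLine` structure, the per-line confidence score, the 0.85 threshold, and the sample lines are all assumptions made up for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TranscribedLine:
    page_id: str        # hypothetical page identifier
    text: str           # text produced by the recognition model
    confidence: float   # assumed 0.0-1.0 score reported by the model


REVIEW_THRESHOLD = 0.85  # hypothetical cut-off; a real project would tune this


def triage(lines, threshold=REVIEW_THRESHOLD):
    """Split lines into those accepted as-is and those queued for human review."""
    accepted, needs_review = [], []
    for line in lines:
        (accepted if line.confidence >= threshold else needs_review).append(line)
    return accepted, needs_review


def build_index(lines):
    """Map each lowercased word to the set of pages it appears on."""
    index = defaultdict(set)
    for line in lines:
        for word in re.findall(r"\w+", line.text.lower()):
            index[word].add(line.page_id)
    return index


def search(index, keyword):
    """Return the sorted page ids that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Invented sample lines, for illustration only.
sample = [
    TranscribedLine("voc-0001", "Schip vertrokken naar Batavia", 0.96),
    TranscribedLine("voc-0002", "Lading peper en nootmuskaat", 0.91),
    TranscribedLine("not-0417", "Akte van v?rkoop", 0.52),
]

accepted, needs_review = triage(sample)
index = build_index(accepted)

print(search(index, "Batavia"))
print([line.page_id for line in needs_review])
```

Only the accepted lines are indexed here, so a low-confidence line stays out of search results until someone has corrected it. A real system might instead index everything and flag uncertain matches; which is better depends on how much unverified text an archive is willing to expose.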