Andrew Potter

America Built a Catalog. It Forgot to Build the Archive

The federal government's flagship open-data platform points users to information but offers few guarantees that the information will remain available, authentic, or explainable tomorrow.

Jun 02, 2026

The Congressional Research Service published a new report this week, and it deserves more attention than it will probably get outside the government information community. CRS R48954, Data.gov: Implementation and Perspectives on Its Functions (May 21, 2026), is the most comprehensive policy-level examination of Data.gov since Congress enacted the OPEN Government Data Act in 2019. It is thorough, carefully hedged, and alarming if you read it with a records management lens rather than a technology policy lens.

This piece uses the report as a jumping-off point for a deeper conversation about what Data.gov actually is, what it was designed to do, what archival and records theory tells us it cannot do, and what the events of the past eighteen months have revealed about the gap between what it was designed to do and what it can actually do.


What the CRS report actually says

The report’s central finding is deceptively simple: Data.gov is a directory, not a repository. In the language of the statute that governs it, it is “a single public interface online as a point of entry.” In archival terms, it is a finding aid pointing to data assets hosted elsewhere, primarily on individual agency websites. The General Services Administration maintains it day to day. The Office of Management and Budget exercises effective control over its implementation and issues guidance to agencies.

The report traces Data.gov’s development through three overlapping periods: administrative creation under the Obama open government initiative (2009-2019), legislative deliberation and eventual statutory enactment via the OPEN Government Data Act (2015-2019), and implementation of that statutory framework (2019-present). Each period left sediment that still shapes the site’s operation.

Several specific findings stand out.

The definitional gap between the statute and the implementation guidance is significant. The OPEN Government Data Act defines “data asset” broadly: “a collection of data elements or data sets that may be grouped together.” OMB’s implementing guidance, Memorandum M-25-05 (“Phase 2 Implementation of the Foundations for Evidence-Based Policymaking Act of 2018: Open Government Data Access and Management Guidance”), issued in January 2025 after a six-year delay, reads the term more narrowly: data that is both structured (organized into columns and rows, or similar) and logically grouped. The narrowing matters because it may exclude FOIA-releasable unstructured information (documents, images, audio, video) from agencies’ comprehensive data inventories and, therefore, from Data.gov entirely.
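To make the practical effect of that narrowing concrete, here is a minimal Python sketch. It assumes a hypothetical agency inventory whose entries carry a media type, loosely modeled on the `mediaType` field in the Project Open Data (DCAT-US) metadata schema. The set of media types counted as “structured” is an illustrative assumption, not a list taken from M-25-05, and the holdings are invented.

```python
from dataclasses import dataclass

# Illustrative assumption: media types treated as "structured" (rows and
# columns, or similar). This list is not taken from M-25-05.
STRUCTURED_MEDIA_TYPES = {
    "text/csv",
    "application/json",
    "application/xml",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


@dataclass
class InventoryEntry:
    title: str
    media_type: str  # loosely modeled on the DCAT-US "mediaType" field


def is_structured(entry: InventoryEntry) -> bool:
    """Narrow reading: only structured formats qualify as data assets."""
    return entry.media_type in STRUCTURED_MEDIA_TYPES


# Hypothetical agency holdings, all assumed releasable under FOIA.
holdings = [
    InventoryEntry("Annual grant awards", "text/csv"),
    InventoryEntry("Inspection report scans", "application/pdf"),
    InventoryEntry("Field survey photographs", "image/jpeg"),
    InventoryEntry("Facility locations", "application/json"),
]

# Broad statutory reading: any grouped collection of data counts.
statutory_inventory = [e.title for e in holdings]
# Narrow reading: structured and logically grouped.
narrow_inventory = [e.title for e in holdings if is_structured(e)]

excluded = [t for t in statutory_inventory if t not in narrow_inventory]
print(excluded)  # ['Inspection report scans', 'Field survey photographs']
```

Under the statutory definition every entry belongs in the inventory; under the narrow reading, the scanned reports and photographs drop out, even though nothing about their releasability has changed.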

The report treats the six-year lag before M-25-05 with care, but it unmistakably presents it as a problem. Congress enacted the act in January 2019; the guidance did not arrive until January 2025, in the final days of the Biden administration. During those six years, agencies operated under the older OMB Memorandum M-13-13 while the statutory framework nominally governed them. GAO flagged this repeatedly, and Senator Grassley sent OMB a pointed letter in 2023.

The biennial compliance report that OMB is legally required to submit to Congress has apparently never been produced. The report notes this with characteristic CRS understatement: “OMB has apparently not complied with a corresponding requirement for the OMB director to issue a biennial report on agency performance and compliance.” GAO confirmed as of March 2025 that no update had been received. This is not a minor administrative lapse. It is the statute's primary accountability mechanism, and it has never functioned.

The registry-versus-repository question gets posed but left formally open, as CRS convention requires. The report makes clear that the current system is neither a well-functioning registry (because data asset availability cannot be reliably verified through Data.gov) nor a repository (because Data.gov does not host or preserve underlying data). It is more of a catalog of intentions: a list of what agencies have said about their data, harvested at a point in time, with no ongoing verification that any of it remains true.
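What would “ongoing verification” minimally involve? A registry that could vouch for its entries would need to re-check each one periodically: does the link still resolve, and are the bytes behind it the same ones that were described at harvest time? The Python sketch that follows is hypothetical throughout. The entry fields are loosely modeled on the DCAT-US `downloadURL` field, the SHA-256 fixity value is an added assumption rather than something the report says Data.gov records, and the fetcher is a stand-in for a real HTTP client, injected so the example runs offline.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical fetcher type: returns the bytes at a URL, or None if the URL
# no longer resolves. A real one might wrap urllib.request.urlopen.
Fetcher = Callable[[str], Optional[bytes]]


@dataclass
class HarvestedEntry:
    title: str
    download_url: str       # loosely modeled on DCAT-US "downloadURL"
    sha256_at_harvest: str  # assumed fixity value captured at harvest time


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(entry: HarvestedEntry, fetch: Fetcher) -> str:
    """Return 'missing', 'changed', or 'available' for one catalog entry."""
    data = fetch(entry.download_url)
    if data is None:
        return "missing"    # the pointer no longer leads anywhere
    if sha256_hex(data) != entry.sha256_at_harvest:
        return "changed"    # something is there, but not what was described
    return "available"      # same bytes as at harvest


# Simulated agency web space (hypothetical URLs on the reserved .example TLD).
original = b"year,awards\n2024,120\n"
web = {
    "https://agency.example/awards.csv": original,
    "https://agency.example/sites.csv": b"edited after harvest",
}

entries = [
    HarvestedEntry("Grant awards", "https://agency.example/awards.csv",
                   sha256_hex(original)),
    HarvestedEntry("Site list", "https://agency.example/sites.csv",
                   sha256_hex(b"site,state\n1,OH\n")),
    HarvestedEntry("Retired survey", "https://agency.example/survey.csv",
                   sha256_hex(b"withdrawn")),
]

results = {e.title: verify(e, web.get) for e in entries}
print(results)
# {'Grant awards': 'available', 'Site list': 'changed', 'Retired survey': 'missing'}
```

The three-way outcome is the point. A link checker alone can only separate “missing” from “present”; detecting “changed” requires a fixity value captured at harvest, which is a preservation control rather than a cataloging one.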

The archival theory the report does not identify

© 2026 Andrew Potter