Archivists: No flowers for PDF/A-3

The National Digital Stewardship Alliance (NDSA), a consortium of archival institutions committed to the long-term preservation of digital information, has published an important new paper analyzing PDF/A-3 from the archivist’s perspective. This post provides some brief analysis and comment.

Screen-shot of the cover of the NDSA report.

What is PDF/A-3?

PDF/A-2 updates PDF/A-1, basing it on ISO 32000 instead of Adobe’s PDF 1.4 and making other important changes, but it poses no fundamental challenge to archivists’ values.

But PDF/A-3 is different. In response to commercial demand for an archival document format that could also serve as a container for associated (and possibly non-archival) content, the ISO committee made a single, very simple change, but one that has roiled the archivist community: it allowed arbitrary files to be embedded in documents that are otherwise archival-grade.
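In PDF/A-3, each embedded file is described by a file specification dictionary whose /AFRelationship key states how the file relates to the PDF: Source, Data, Alternative, Supplement or Unspecified. The sketch below is a simplified model, not a PDF library: it assumes the dictionaries have already been extracted into plain Python dicts keyed by PDF key names (`UF` for the file name, `AFRelationship`), and the function name `group_by_relationship` is hypothetical.

```python
from collections import defaultdict

# Values permitted for the /AFRelationship key of an embedded file's
# file specification dictionary in PDF/A-3 (ISO 19005-3).
AF_RELATIONSHIPS = ("Source", "Data", "Alternative", "Supplement", "Unspecified")


def group_by_relationship(filespecs):
    """Group embedded file names by their declared relationship to the PDF.

    `filespecs` is a hypothetical, simplified model: a list of dicts whose
    keys mimic a file specification dictionary ("UF" holds the file name,
    "AFRelationship" the relationship). Real objects would come from a
    PDF parser.
    """
    groups = defaultdict(list)
    for spec in filespecs:
        rel = spec.get("AFRelationship")
        if rel not in AF_RELATIONSHIPS:
            # This sketch treats a missing or unlisted value as an error.
            raise ValueError(f"{spec.get('UF', '?')}: invalid AFRelationship {rel!r}")
        groups[rel].append(spec["UF"])
    return dict(groups)


if __name__ == "__main__":
    attachments = [
        {"UF": "message.eml", "AFRelationship": "Source"},
        {"UF": "invoice.xml", "AFRelationship": "Data"},
    ]
    print(group_by_relationship(attachments))
    # {'Source': ['message.eml'], 'Data': ['invoice.xml']}
```

Grouping by relationship is one way a generic viewer could surface embedded files by role rather than as an undifferentiated list.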

The NDSA on PDF/A-3

The report characterizes the general problem of PDF/A-3 as the possibility that PDF/A-3 files may be used as a general-purpose bundling format irrespective of the relative significance of any given item of content, including the PDF/A-3 document itself.

While acknowledging the value of PDF/A-3 for commercial purposes, the NDSA report calls for specific protocols between depositors and archival repositories. PDF/A-3 should only be considered, they say, when workflows and protocols that guarantee an understanding of the relationship between the PDF document and any embedded files are fully established and documented.

It should be emphasized that the NDSA is not opposed to processing embedded files; it understands the value proposition and wants capable PDF/A-3 processors. They say, for example:

The requirement for file specification dictionaries with required relationship values in those dictionaries will make associated files embedded in a PDF/A-3 more obvious and discoverable through generic PDF/A-3 viewers.

Nonetheless, the report concludes with a recommendation that software developers provide tooling that lets users identify cases where PDF/A-3 metadata is unnecessary (either there are no embedded files at all or all embedded files are PDF/A). This is, essentially, a way to “just say no” to PDF/A-3.
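The check the report asks for is simple to state. Here is a minimal sketch, assuming a hypothetical `EmbeddedFile` record that carries the PDF/A part claimed by each embedded file (None for non-PDF/A content). One refinement beyond the report’s wording: PDF/A-2 only permits embedded files that are themselves PDF/A-1 or PDF/A-2, so an embedded PDF/A-3 file still keeps the host document in PDF/A-3 territory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmbeddedFile:
    """Hypothetical summary of one embedded file."""
    name: str
    # PDF/A part claimed by the embedded file (1, 2 or 3),
    # or None if the file is not PDF/A at all.
    pdfa_part: Optional[int] = None


def pdfa3_unnecessary(embedded: List[EmbeddedFile]) -> bool:
    """Return True if the host document would not need to be PDF/A-3.

    That is the case when there are no embedded files (all() of an empty
    list is True) or when every embedded file claims PDF/A-1 or PDF/A-2,
    the only kinds of embedded file PDF/A-2 allows.
    """
    return all(f.pdfa_part in (1, 2) for f in embedded)


print(pdfa3_unnecessary([]))                               # True
print(pdfa3_unnecessary([EmbeddedFile("report.pdf", 2)]))  # True
print(pdfa3_unnecessary([EmbeddedFile("data.xml")]))       # False
```

Note that this trusts each embedded file’s claimed conformance; a production tool would validate those files rather than take the claim at face value.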


The NDSA report points out that: “There is currently no robust vendor-independent mechanism for assessing that a PDF/A file does in fact comply fully with the standard and the conformance level it claims in its internal metadata.”

The authors go on to note that a canonical PDF validator is highly desirable (and not only for PDF/A-3) because it would mitigate concerns over PDF’s complexity with respect to its use as a container format, reduce the volume of files bearing invalid metadata and result in higher file quality and reliability overall.
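Part of the problem is that the claimed conformance level lives in XMP metadata that any producer can write. A minimal sketch using only Python’s standard library shows how little reading that claim involves. It assumes the XMP packet has already been extracted from the PDF’s metadata stream, and it says nothing about whether the file actually conforms, which is precisely the gap a canonical validator would close. The namespace `http://www.aiim.org/pdfa/ns/id/` with its `part` and `conformance` properties is the one used for PDF/A identification; the function name `claimed_conformance` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID = "http://www.aiim.org/pdfa/ns/id/"


def claimed_conformance(xmp: str):
    """Return (part, conformance) as claimed in an XMP packet, or None.

    This only reads the claim; it does not check that the PDF conforms.
    XMP may write properties as attributes or as child elements, so both
    forms are tried.
    """
    root = ET.fromstring(xmp)
    for desc in root.iter(f"{{{RDF}}}Description"):
        part = desc.get(f"{{{PDFAID}}}part") or desc.findtext(f"{{{PDFAID}}}part")
        level = desc.get(f"{{{PDFAID}}}conformance") or desc.findtext(f"{{{PDFAID}}}conformance")
        if part:
            return int(part.strip()), (level or "").strip() or None
    return None


SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>3</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(claimed_conformance(SAMPLE))  # (3, 'B')
```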


Quite rightly, archival institutions are afraid that users may use a PDF/A-3 file as, effectively, a cover-note for a garbage bag full of who-knows-what. The report considers that PDF/A-3 may be “appropriate for use in controlled workflows….” That’s good, because there’s no reason in principle why PDF/A-3 implementations (and their embedded content) can’t be designed in good faith with archival considerations in mind. Archivists should allow for such cases.

There’s general recognition that a bundling format is needed, and the report spends some time on the characteristics such a format should have, including those described in ISO/IEC 21320-1. These features may be interesting considerations for a future version of PDF.

In general, however, this report is intended to ensure that archivists understand that PDF/A-3 is emphatically not a newer, faster, stronger PDF/A, and that’s entirely correct. Archivists should not conclude, though, that PDF/A-3 is mistaken or ‘wrong’; it simply has its place.

PDF/A-3 Policy: A Recommendation

For memory institution purposes, I’d propose requiring that every embedded file that is not PDF/A always be accompanied by a companion embedded file that is a PDF/A-conforming rendition of it.

For example, let’s assume we have a PDF with 3D content in it. It seems to me that archivists could accept a PDF/A-3 container in such cases if a PDF/A-conforming rendition of the non-PDF/A embedded content (for a 3D model, necessarily a static poster image) was also provided.

Reality will never be fully under control, but some of it must be archived anyway. Such a policy would offer the best of both worlds (and until we have PRC/A, U3D/A or PDF/E-2, this is the best one can get).
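The proposed rule can be checked mechanically when a deposit arrives. This is a sketch under stated assumptions: PDF/A-3 has no standard field linking a poster image to the 3D model it renders, so the `rendition_of` field below is a hypothetical convention that an archive and its depositors would have to agree on in their deposit protocol. `Attachment` and `missing_renditions` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    """Hypothetical description of one embedded file in a deposit."""
    name: str
    is_pdfa: bool
    # Hypothetical link: name of the attachment this file is a PDF/A rendition of.
    rendition_of: Optional[str] = None


def missing_renditions(attachments: List[Attachment]) -> List[str]:
    """Names of non-PDF/A attachments that lack a PDF/A companion rendition."""
    covered = {a.rendition_of for a in attachments if a.is_pdfa and a.rendition_of}
    return [a.name for a in attachments if not a.is_pdfa and a.name not in covered]


deposit = [
    Attachment("turbine.u3d", is_pdfa=False),
    Attachment("turbine-poster.pdf", is_pdfa=True, rendition_of="turbine.u3d"),
    Attachment("sensor-log.csv", is_pdfa=False),
]
print(missing_renditions(deposit))  # ['sensor-log.csv']
```

Under this policy, an archive would accept the container only when the returned list is empty.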

Let’s look at email

Email software comes in many flavors, and there’s no canonical way to display a message; the same email may look very different on different systems.

Archiving email is nonetheless important, and may have to be done despite such obstacles. Hanging on to the email bitstream alone is doomed to fail at some point in the future. Keeping only PDF/A renderings (effectively, digital printouts) may fail to preserve every aspect that seems relevant 50 years from now, but it’s a lot better than nothing. Preserving a robust PDF/A-3 digital printout with the original email bitstream embedded may be the best outcome one could reasonably and cost-effectively achieve in the foreseeable future.
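A rough sketch of that approach, using Python’s standard `email` package: it extracts the headers and plain-text body that a PDF/A rendering would present, and keeps the original bytes untouched for embedding with the relationship Source. Producing the actual PDF/A-3 file is assumed to be handled by separate tooling; the dictionary layout and the `prepare_email_package` name are hypothetical.

```python
from email import policy
from email.parser import BytesParser


def prepare_email_package(raw: bytes, filename: str = "message.eml") -> dict:
    """Split an email into printout text plus an untouched original for embedding.

    The returned layout is hypothetical; a PDF/A-3 writer (not shown) would
    render "printout_text" as the archival page and embed "associated_file".
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    headers = [f"{h}: {msg[h]}" for h in ("From", "To", "Date", "Subject") if msg[h]]
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else "(no plain-text body)"
    return {
        "printout_text": "\n".join(headers) + "\n\n" + text,
        "associated_file": {
            "UF": filename,
            "MIME": "message/rfc822",    # media type of the original message
            "AFRelationship": "Source",  # the PDF was derived from this file
            "data": raw,                 # original bitstream, byte for byte
        },
    }
```

The design choice is the one argued for above: the rendering is what future readers can rely on, while the embedded bitstream preserves whatever the rendering misses.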

PDF/A-3 may be extremely useful in this regard, or in other cases where archival perfection is simply unrealistic. As such, prohibiting PDF/A-3 seems short-sighted. Instead, a policy of carefully designed workflows that account for, and even leverage, PDF/A-3’s capabilities seems not only worthwhile, but might even lead to highly cost-effective solutions for many commonplace archiving problems.


  1. March 3, 2014 at 11:29

    The bottom line, from one archival (digital preservation) perspective, is that, however compelling the need for a “possibly non-archival” PDF flavour, PDF/A-3 simply shouldn’t be termed ‘A’ if it’s not wholly archival. Period.

    This is a slippery slope: once you get started on the “commercial demand” for “possibly non-archival” PDF flavours, where’s the end game from the digital preservation side? It hugely increases, IMHO, the likelihood that “commercial demand” will overwhelm the smaller and less well-funded archival community.

    Why not rename PDF/A-3? For example, PDF/C-1 (‘C’ as in “commercial”).

  2. October 8, 2014 at 18:25

    Thanks for the comment, Fred. Sorry I’ve taken so long to respond.

    I don’t agree. Digital preservationists are already bombarded with non-archival content. PDF/A-3 doesn’t change that – of course – but it provides another means of managing the fire-hose. As my email example was intended to highlight, PDF/A-3 can serve archival needs without guaranteeing archival characteristics for the embedded content.

    The “archival community” doesn’t exist apart from commercial interests. Archivists only have something to archive (and end users only care about archivists’ work) as a function of activity (commercial and otherwise) that’s not oriented towards archival concerns.

    It’s my hope that archivists focus on policies for managing PDF/A-3 files rather than problematizing the format itself. Anything can (and will) be abused. It’s more interesting, IMO, to focus on how PDF/A-3 may be put to work to advance archivists’ interests.

  3. January 24, 2015 at 16:25

    I think another issue is that naive users of the format may wrongly assume that PDF/A-3 makes embedded proprietary formats accessible in the future. For example, a person might save designs in an embedded proprietary format, then 10 years later learn that he cannot access them because the company that created the format no longer exists, or no longer supports it.

    Don’t dismiss comments just because they seem “anti-business”. The killer feature of PDF/A is the open specification, which allows documents to be readable indefinitely. It would be a shame if users AND businesses were misled into thinking that proprietary formats bundled in PDF/A will be easily readable in the future.
