February 25, 2014

On 31 January 2014 the US Government’s National Archives and Records Administration (NARA) issued updated Transfer Guidance, replacing the previous 2003 document. The new Bulletin is substantially overhauled, with greater emphasis on standards in general, and standardized file-formats – most notably PDF – in particular.

While each agency does things its own way (hey, this is America!), NARA sets general standards for record-keeping in the US federal government. NARA’s Transfer Guidance is not required government-wide, but it does set normative requirements for federal agencies transferring their most important records for safe-keeping. NARA’s specifications go beyond document formats to encompass creation processes, workflows, transfer media, and so on.

Probably the biggest change in the new Bulletin is the overhaul of its PDF-specific requirements. First, instead of a long list of quality criteria, NARA’s new guidance simply directs agencies to meet “applicable standards” in creating their records. Second, in the case of records in PDF format, NARA lists specific ISO standards for PDF technology.

The 2003 Transfer Guidance

Since PDF/A (the archival subset standard for PDF) was not published until 2005, NARA had little choice for PDF records in 2003 but to select Adobe Systems’ PDF version 1.4. Not satisfied with the baseline technology, NARA added specific requirements – such as for embedded fonts – that were destined (not surprisingly) to become part of PDF/A in any event.

Indeed, most of the PDF-specific advice offered in 2003 pertained to matters of quality in records-creation and workflow. More advanced PDF features – forms, embedded files, and more – were to be considered on a “case by case” basis. OCR, on the other hand, was required, although certain types of OCR output were prohibited.

While useful, in practical terms the 2003 Transfer Guidance did not solve a lot of problems for NARA. The agency was (and is) still forced to deal with vast quantities of poor (or simply bizarre) files created by thousands of pieces of software of various capabilities and quality.

The 2014 Guidance

The new 2014 Transfer Guidance makes a number of changes specific to records in the Portable Document Format.

The first big change is that the guidance is now split into the Bulletin itself and Appendix A, the set of file-formats NARA prefers or accepts. This makes it easier for NARA to update listings or add formats in the future without the same level of interagency review. For example, PDF/A-2 is (today) conspicuously lacking from the accepted formats for Digital Posters; I’d expect that to be changed quite quickly. When PRC’s ISO standardization is complete, the PRC entry in the 3D section is likely to reference the ISO version of the standard rather than Adobe’s.

NARA now recognizes many specific types of content ranging from documents to data files to video. Whereas in 2003 NARA did not specify format acceptability by class of records, in 2014 various PDF standards are identified as “appropriate” for certain uses and inappropriate for others. As of February 2014, the content categories in which NARA will accept PDF files as either “Preferred” or “Acceptable” are as follows:

  • Computer Aided Design (CAD)
    • PDF/E
    • PRC (Adobe’s specification). Note that ISO-standardized PRC is in the final publication stages.
  • Scanned text
    • PDF/A-1 and PDF/A-2
  • Digital posters
    • PDF/A-1
  • Presentation formats
    • PDF/A-1 and PDF/A-2
  • Textual data
    • PDF/A-1 and PDF/A-2
    • PDF 1.7 (ISO 32000-1)
    • PDF 1.0 – 1.6
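For anyone building a transfer workflow, the list above amounts to a small lookup table. Here is a rough Python sketch of it. The names `ACCEPTED_PDF_FORMATS` and `is_accepted` are my own inventions for illustration, and the table does not distinguish “Preferred” from “Acceptable”; Appendix A remains the authority.

```python
# Hypothetical lookup table transcribed from the list above (as of February 2014).
# It does not record whether a format is "Preferred" or merely "Acceptable".
ACCEPTED_PDF_FORMATS = {
    "Computer Aided Design (CAD)": {"PDF/E", "PRC"},
    "Scanned text": {"PDF/A-1", "PDF/A-2"},
    "Digital posters": {"PDF/A-1"},
    "Presentation formats": {"PDF/A-1", "PDF/A-2"},
    "Textual data": {"PDF/A-1", "PDF/A-2", "PDF 1.7", "PDF 1.0-1.6"},
}


def is_accepted(category: str, pdf_format: str) -> bool:
    """Return True if the list above names pdf_format under the given category."""
    return pdf_format in ACCEPTED_PDF_FORMATS.get(category, set())


if __name__ == "__main__":
    print(is_accepted("Digital posters", "PDF/A-1"))
    print(is_accepted("Digital posters", "PDF/A-2"))
```

Because the formats live in Appendix A rather than the Bulletin itself, a table like this would need to be kept in step with NARA’s updates.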

As expected, PDF/A-3 was not included for the simple reason that the format allows arbitrary attachments. While such attachments are appropriate in many commercial settings, NARA elected not to open that particular barn-door at this time.

The previous workflow requirements regarding OCR have changed. OCR is no longer required – that decision is left to the agency creating the record. While image-resolution specifications remain, scanned page conversion is permitted so long as the process meets “…standards appropriate for the accurate preservation of the original image.” NARA’s current guidance for scanning needs a bit of updating.

It’s also worth noting that NARA has removed the 2003 prohibition on using PDF to archive email. Those with old Lotus Notes email, or other systems that are highly resistant to archiving for various technical reasons, may want to take note.

Of special note to software developers, the new Transfer Guidance includes a firm requirement that files be valid according to their file format specifications (my emphasis):

“In all cases, agencies must ensure permanent electronic records are valid according to the file format specifications....”

On the one hand, this might seem unremarkable. Of course, all our software should make valid files, right?

On the other, consider that as of today, no canonical validation software actually exists for PDF, or (indeed) for the other file types NARA will accept.
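To make the gap concrete, here is about the most a few lines of code can check: that a file carries a `%PDF-x.y` header and an `%%EOF` marker. This is a sketch only. The function name `sniff_pdf` is hypothetical, and the 1024-byte search windows are a leniency I’ve assumed for illustration, not something NARA specifies.

```python
import re

# Matches a header comment such as %PDF-1.4 and captures the version digits.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def sniff_pdf(data: bytes):
    """Return (major, minor) from the header if basic markers are present, else None.

    This is a sniff test, not validation: it never looks at objects,
    cross-reference tables, fonts, or anything PDF/A actually constrains.
    """
    header = HEADER_RE.search(data[:1024])      # assumed window at start of file
    has_eof = b"%%EOF" in data[-1024:]          # assumed window at end of file
    if header is None or not has_eof:
        return None
    return int(header.group(1)), int(header.group(2))


if __name__ == "__main__":
    print(sniff_pdf(b"%PDF-1.4\n...objects...\n%%EOF\n"))
```

A file can pass this check and still be badly broken. The header version can also be overridden elsewhere in the file, so even the version number it reports is only a hint.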

Perhaps CSV files are an exception….