It’s easy to understand that stable, reliable record-keeping is essential to business, government and society in general. When paper and its microfilm cousin were the only media for records, the world was a simpler place.
Over the last 50 years government agencies have replaced filing cabinets with servers and printed documents with PDF files. Things got a little more complicated.
The US National Archives and Records Administration (NARA) has been involved with the development of PDF/A, the archival subset of the PDF format, for over a decade. The agency writes the rules used by the US Federal Government agencies for committing records to permanent archive.
In February 2014 I interviewed Kevin L. De Vorsey, NARA Supervisory Electronic Records Format Specialist, about the first update to NARA’s Transfer Guidance since 2004.
DJ: NARA has released new transfer guidance, the first since 2004. Can you describe the change in focus?
KD: I wouldn’t describe it as a change in focus so much as an evolution reflecting changes taking place in the way agencies create and use electronic records as well as in NARA’s ability to manage those that are scheduled for permanent retention. NARA has always stressed system and platform independence for electronic records and the new guidance is built upon that core tenet.
Electronic files are more frequently being viewed as ‘the record’. In the past an agency might have scheduled the typed, paper transcript of a meeting as the permanent record while today they are likely to view a video recording as the record of that event. In acknowledgement of this shift our transfer guidance now includes categories describing appropriate digital audio and video file formats and other categories such as computer aided design (CAD) that weren’t previously addressed.
DJ: I see that the Open XML formats (DOCX, XLSX, PPTX) are included in the new transfer guidance. These aren’t specifically archival formats, so why are they included?
KD: One of the great challenges that we face is the span in the age of the electronic records that are transferred to us. Keep in mind that records are only transferred to NARA when they are no longer in active use. A number of agencies are still transferring electronic records that originated on mainframe computers and were encoded according to the Extended Binary Coded Decimal Interchange Code (EBCDIC) character set, while others are using the very latest formats like the Open XML formats popularized by Microsoft.
Unlike previous Word formats, the OOXML family are recognized as ECMA and ISO standards and they are XML-based. As you point out, this alone doesn’t put them in the “archival” category, but the availability of detailed technical specifications is important and will help the digital preservation community make decisions about when and what actions we need to take in the future to ensure that the content stored in these files remains accessible.
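To illustrate the EBCDIC case mentioned in this answer, here is a minimal Python sketch of converting mainframe-encoded bytes to UTF-8. It assumes the records use the US/Canada EBCDIC code page (Python's `cp037` codec); real transfers may use a different code page, and the function name `ebcdic_to_utf8` is hypothetical, not part of any NARA tooling.

```python
def ebcdic_to_utf8(raw: bytes, codec: str = "cp037") -> bytes:
    """Decode EBCDIC bytes and re-encode them as UTF-8.

    The default codec, cp037, is an assumption for this sketch; other
    EBCDIC code pages (for example cp500) map some bytes differently.
    """
    text = raw.decode(codec)      # EBCDIC bytes -> Python str
    return text.encode("utf-8")   # Python str -> UTF-8 bytes


if __name__ == "__main__":
    # The five bytes below spell "HELLO" in EBCDIC code page 037.
    print(ebcdic_to_utf8(b"\xc8\xc5\xd3\xd3\xd6"))
```

Choosing the wrong code page usually does not raise an error; it silently produces different characters, which is one reason the accompanying documentation matters as much as the data.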
DJ: Do you foresee any competition to PDF in terms of final-form electronic documents?
KD: That is an interesting question but not an easy one to answer. PDF is certainly a very common format for a variety of purposes and it is difficult to think of anything similar in terms of its ability to accommodate a wide variety of data types. PDF provides an excellent package for traditional things like forms and textual documents, but we are also seeing it used for CAD and geospatial records which are output from systems. This is an important distinction to make: some records are created and managed as files that sit on a hard drive, while others are maintained in an application and must be exported for transfer. My own folders at work demonstrate the former and are populated with word processing files, spreadsheets, presentations, and of course PDF files. Many CAD systems, geospatial systems, and cloud-based productivity tools, by contrast, maintain records in something that is analogous to a database or as XML. NARA cannot preserve every system in use, so agencies must be able to extract their records in an acceptable format along with associated metadata, and this package is what is transferred to NARA.
DJ: How important are standardized (i.e., openly published) formats from a records-management point of view?
KD: Federal agencies use a wide variety of systems to meet their often unique business needs. If a system is used to create, manage and store electronic records that have been appraised as permanently valuable, then it is necessary to export the records and their associated metadata out so that they can be transferred to NARA. We are taking the records away from the environment where they were created and used and will not have access to the people that worked with them. We need to capture enough information about the records, their creators, and the context under which they were created so that researchers understand what they are looking at. File formats and metadata standards that are open and thoroughly documented help us fulfill our mandate of maintaining and providing access to them in the future. Trying to retrieve data from a proprietary format could prove costly, especially in the future when the hardware and software required are no longer available.
Open technologies including format and metadata standards are very important in records management, especially when it comes to permanent records that must remain accessible for the foreseeable future. In 20 or 30 years, it could be difficult to track down authoritative information for many formats, while I’d guess that it will be relatively easy to find information about those that went through a vetted standards process. Having an authoritative specification would certainly help troubleshoot any problems that could arise in accessing the information in a particular format when it falls out of use by providing the set of instructions that the application developers followed when they built the software used to encode the files. The alternative would be to reverse engineer files which would be a costly proposition.
DJ: Are proprietary formats inherently problematic, from an archival point of view?
KD: If you were to pick a single type of electronic record such as structured data held in a database, you could quickly generate a list of 20 to 30 proprietary database applications and in short order find an agency that has used any one of them with permanent records. It is impossible for NARA to maintain licenses and expertise for Oracle, Informix, Microsoft SQL Server, Sybase, MySQL, Siebel, and the other systems that are in use. As a result we ask agencies to transfer the data from these systems in a platform-independent format along with all of the code tables, user manuals, reports, and other metadata and documentation necessary to interpret the records outside of the original system. Unfortunately, exporting records out of a system is not always straightforward and not all are capable of outputting records in a non-proprietary format.
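The kind of platform-independent export described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses SQLite from the standard library as a stand-in for an agency database, CSV as the neutral output format, and hypothetical table names (`cases`, `status_codes`); it is not NARA's prescribed procedure.

```python
import csv
import io
import sqlite3


def export_table_csv(conn: sqlite3.Connection, table: str) -> str:
    """Return one table as CSV text, with a header row of column names."""
    if not table.isidentifier():
        # Table names cannot be bound as SQL parameters, so validate them.
        raise ValueError(f"unsafe table name: {table!r}")
    # ORDER BY 1 sorts on the first column so the output is deterministic.
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY 1')
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with a data table and a code table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE status_codes (code TEXT PRIMARY KEY, meaning TEXT);
        CREATE TABLE cases (id INTEGER PRIMARY KEY, status TEXT);
        INSERT INTO status_codes VALUES ('O', 'Open'), ('C', 'Closed');
        INSERT INTO cases VALUES (1, 'O'), (2, 'C');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Export the records and the code table needed to interpret them.
    for name in ("cases", "status_codes"):
        print(f"--- {name}.csv ---")
        print(export_table_csv(conn, name), end="")
```

The code table is exported alongside the data because a value such as `O` in the `status` column is meaningless outside the original system without it.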
DJ: Will NARA encourage agencies to adopt standards-based formats?
KD: Agencies appreciate having requirements clearly defined when they are setting up systems, and standards are a great way of ensuring that those requirements are vetted, widely adopted, and actively supported. NARA is an active participant in a number of standards-making bodies involved in records and document management as well as other areas involving electronic records.
DJ: How would government agencies save if they adopted standards-based formats?
KD: Standards don’t solve all of the problems associated with the proper storage and management of information in agencies but they certainly help. Maintaining permanent records in an acceptable format, or ensuring that a system is capable of outputting records in an acceptable format, avoids the expense of migrating records prior to transfer. Also, agencies may avoid costly data migrations when moving records between proprietary systems. This isn’t related to formats, but we have seen instances where a cloud service used so that office automation files could be shared for collaboration only supported ‘one-at-a-time’ file download. Depending on the number of files involved, this could prove very expensive in terms of staff time.
DJ: What problems would standards-based formats solve?
KD: Standards provide a common, known reference point. Since permanent records are often transferred long after they are no longer in active use, standards can provide a documented structure that would otherwise be lacking. You would be surprised how difficult it can be to track down specifications for file formats that were in common use just a few years ago. And while having standards to refer to is helpful, it is also important to understand how electronic records relate to them. It is common for application developers to interpret standards differently so it is important to understand how files conform to the standards they are based on.
DJ: What are the barriers to adoption of standards-based technologies?
KD: When agencies develop requirements for systems and technologies, immediate business needs often take precedence over long-term records management and transfer requirements. As with the example of the cloud service, the immediate need to store and collaborate on documents took precedence over the down-the-road need to transfer records out. In this sense the biggest barrier to adoption is having a clear understanding of the applicable standards to look for in a system used to manage electronic records. In the case of international or national standards, the cost of purchasing a copy can be a barrier if it discourages adoption. In some cases there may be multiple, competing standards that dilute true standardization.
DJ: Thank you for your time, Mr. De Vorsey!