Warning! Human Judgment Required for PDF/UA Conformance

This article is intended to highlight how PDF/UA differs from previous PDF standards from an implementation (i.e., software development) standpoint.

The distinction is of serious interest to developers, as end-user expectations of files claiming PDF/UA conformance are based on criteria that may be costly to fulfill. Files claiming PDF/UA conformance may be subject to intensive manual verification, so if even casual verification efforts reveal non-conformance in the most serious categories, it’s likely that unhappiness will ensue.

Such situations are always best avoided, so developers should (dare I say, “shall”) understand how and to whom the burdens of PDF/UA conformance apply before they decide to identify output files as conforming to PDF/UA.

PDF/UA is Not Like Other PDF Standards

Failure to appreciate the demands PDF/UA places on document authors can easily result in confusion and, potentially, dissatisfaction with the PDF authoring software.

Let’s look at the most important difference between PDF/UA and all other PDF standards:

Verification   PDF/X   PDF/A   PDF/VT   PDF/E   PDF/UA
Machine        Yes     Yes     Yes      Yes     Yes
Human          -       -       -        -       Yes

Entirely unlike the other subset standards for PDF, PDF/UA conformance requirements go far beyond file format syntax, object specifications and other aspects that may be ascertained programmatically. PDF/UA includes many specific requirements that are (largely) impossible to verify with computers.

The PDF/UA Competence Center of the PDF Association has operated the Matterhorn Protocol project for the past year with the active participation of several major implementers in the marketplace for tagged PDF. The Matterhorn Protocol document is nearing completion and should be finalized and published in the first half of 2013.

As of the 0.92 draft, the Matterhorn Protocol establishes 45 discrete PDF/UA validity checkpoints requiring human judgment. These range from establishing correct reading order to the correct identification of table header cells.

Only 84 checks may be performed entirely by machine.
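
One way to picture this split is a validation report that tracks the two kinds of checks separately. The sketch below is illustrative only: the class names, check IDs and descriptions are invented for the example and are not taken from the Matterhorn Protocol. The point it tries to make is that a clean machine pass still leaves the conformance question open.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class How(Enum):
    MACHINE = "machine"  # software alone can decide the outcome
    HUMAN = "human"      # a person has to look and judge


@dataclass
class Check:
    check_id: str                  # hypothetical ID, not a real checkpoint number
    description: str
    how: How
    passed: Optional[bool] = None  # None means "not yet evaluated"


@dataclass
class Report:
    checks: List[Check] = field(default_factory=list)

    def machine_ok(self) -> bool:
        """True when every machine-checkable item has passed."""
        return all(c.passed is True for c in self.checks if c.how is How.MACHINE)

    def pending_human(self) -> List[Check]:
        """Human-judgment items nobody has evaluated yet."""
        return [c for c in self.checks if c.how is How.HUMAN and c.passed is None]

    def conforms(self) -> bool:
        """Only true when every check, machine and human, has passed."""
        return all(c.passed is True for c in self.checks)


report = Report([
    Check("M-1", "All content is tagged or marked as artifact", How.MACHINE, True),
    Check("H-1", "Reading order is logical", How.HUMAN),
])
print(report.machine_ok(), report.conforms(), len(report.pending_human()))
```

In this example the machine check passes, yet `conforms()` stays false until someone has reviewed the reading order.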

What really matters in PDF/UA: Semantics, not Syntax

Let’s be specific about what really matters in PDF/UA so far as interested end-users are concerned.

The fundamental requirement of PDF/UA is stated perhaps most neatly in section 7.1, paragraph 2:

“Content shall be marked in the structure tree with semantically appropriate tags in a logical reading order.”

The same sentiment is more or less repeated in section 7.2, paragraph 1:

“Content shall be tagged in logical reading order. The most semantically appropriate tag shall be used for each logical element in the document content.”

Who is responsible for ensuring these requirements are met?

We’ll start with the end user creating the document

If an accessible document is required, it is the PDF document’s human author who is responsible for providing their software with correct inputs; that is, using available tools to mark paragraphs, headings, tables, figures, and so on.

While it may be the author’s responsibility, they won’t be able to meet it adequately unless the software works closely with them to ensure content is fully and properly categorized, structured and ordered in accordance with the many aspects of PDF/UA conformance that require human judgment.

Layout software implementations (and crucially, the accompanying user interfaces) that claim PDF/UA conformance must provide the author with techniques, warnings, tools, workflows and other prompts that help ensure their content conforms. Without such prompts the PDF/UA conformance claim is hollow.

Implications for “Engine” Software

Software libraries execute instructions provided by the application implementation, so the responsibilities at this level are all about the machine-validated aspects of PDF/UA.

However, there’s an interesting question to ask: when and (more importantly) on what basis should software set the PDF/UA flag on output files?

Library developers who choose to support PDF/UA would do well to emphasize to their customers that PDF/UA conformance requires far more than merely ensuring all content is tagged and the PDF/UA metadata is present, not least for liability reasons.
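
To make the question concrete, here is a minimal sketch of how a library might gate the flag. It assumes the flag is written as the `pdfuaid:part` property in the file's XMP metadata, using the PDF/UA identification namespace as I understand it (verify both against ISO 14289-1 before relying on them). The `Attestation` class and the `pdfua_xmp_fragment` function are hypothetical names invented for this example, not part of any real library.

```python
from dataclasses import dataclass
from typing import Optional

# PDF/UA identification schema namespace (check against the standard).
PDFUA_ID_NS = "http://www.aiim.org/pdfua/ns/id/"


@dataclass(frozen=True)
class Attestation:
    """Hypothetical record that a person reviewed the human-judgment items."""
    reviewer: str
    human_checks_passed: bool


def pdfua_xmp_fragment(
    machine_checks_passed: bool,
    attestation: Optional[Attestation] = None,
) -> Optional[str]:
    """Return the XMP fragment claiming PDF/UA-1, or None if the claim
    is not backed by both machine validation and a human attestation."""
    if not machine_checks_passed:
        return None
    if attestation is None or not attestation.human_checks_passed:
        return None
    return (
        f'<rdf:Description rdf:about="" xmlns:pdfuaid="{PDFUA_ID_NS}">'
        "<pdfuaid:part>1</pdfuaid:part>"
        "</rdf:Description>"
    )


# Machine validation alone is not enough to earn the flag.
print(pdfua_xmp_fragment(True))
# With an explicit attestation from the calling application, it is.
print(pdfua_xmp_fragment(True, Attestation("jane.doe", True)))
```

The design choice here is that the library never sets the flag by default; the calling application has to assert, on the author's behalf, that the human-judgment checkpoints were actually reviewed.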

Accessibility is increasingly regulated. Don’t get forced (or sued)

Ensuring the customer provides the right information to enable accurate tagging of PDF files for PDF/UA conformance is a lot of work. Unfortunately, conformance is much more than annoying MCIDs, structure-types and attributes.

Developers need to remember that users regard PDF/UA conformance as a concrete indication of reliability in accessing content. In addition to providing technically specific rules for accessible electronic documents, PDF/UA gives end users and governments a specific means of checking PDF files for conformance (or the lack thereof).

It would be best to get out in front.


