February 7, 2013

Based in Munich, Germany, PDFlib GmbH was one of the first providers of PDF development components. Introduced in 1997, PDFlib was soon in use by software developers around the world for a variety of applications. The company was instrumental in establishing the PDF/A Competence Center, which later evolved into the PDF Association.

Customer requests for accessible electronic documents have increased substantially in the past few years. Tagged PDF is not new to PDFlib; the company first shipped support for accessibility features in PDF in 2004. PDFlib began implementing PDF/UA long before the standard was published in August 2012, which explains how the company is able to ship PDF/UA-conforming software only a few months after the standard’s publication.

I interviewed the founder and CEO of PDFlib, Thomas Merz, to discuss PDFlib’s objectives in supporting PDF/UA.

Duff Johnson: What was your principal motivation in implementing PDF/UA?

Thomas Merz: PDFlib GmbH has a proven track record regarding solid implementation of PDF-related standards. Because PDFlib has supported Tagged PDF since 2004, it was a natural move to extend it with PDF/UA features. Large PDFlib customers are increasingly faced with requirements from their user communities regarding Tagged PDF support.

DJ: Please describe how your products use PDF/UA.

TM: The PDFlib product family offers a PDF creation and processing toolkit for software developers. It can be used to create PDF/UA and supports all of the Tagged PDF, Unicode and other requirements. PDF/UA documents can also be imported for document splitting and assembly applications.

DJ: Do PDFlib implementors tend to focus on creation or processing?

TM: The focus of PDFlib applications is definitely on document creation. There are various scenarios around post-processing existing PDFs, but the majority of customers use PDFlib to create documents from scratch.

DJ: What sort of validation does PDFlib perform, if any?

TM: This question doesn’t really apply to a product for automated processing since we enforce PDF/UA requirements upon document creation. However, PDFlib validates the logical structure of imported pages to make sure they won’t taint the generated output.

DJ: How would you characterize your initial release supporting PDF/UA?

TM: PDFlib offers excellent support for PDF/UA. It implements all required rules and facilitates Tagged PDF creation with a variety of convenience features. Of course, PDFlib users still have to determine the logical structure of the documents themselves. As a world first, PDFlib supports page assembly for Tagged PDF pages with sophisticated handling of imported partial structure trees.

DJ: OK, so users are responsible for verifying the correct logical reading order of tags and for verifying that the semantically correct tag is chosen. Given that, how does the software work?

TM: PDFlib users have some input material (e.g. text and images from a database) which they arrange to form a document. PDFlib accepts this input and creates a PDF from it. The developer implementing the software must always consider the best way to present the material on a PDF page; PDFlib is only a tool which helps them create this output.

Similarly, it’s up to the user to provide the logical reading order of content and the semantically appropriate choice of tags for use in creating the structure tree. This is an integral part of the process of designing an application around PDFlib. In a sense, PDFlib provides mortar and bricks, but you still need to have an idea of what your house should look like. Many applications naturally create a good structure for this house; in some cases, however, database or other requirements force them to build the roof before they have the basement in place.

DJ: Correct reading order and correct tags are major requirements in PDF/UA. How does PDFlib help developers ensure these requirements are met?

TM: For such situations PDFlib offers support for adding content into arbitrary locations in the structure tree. Some examples:

  • Create textual items in reverse order
  • Create contents of two columns in parallel, although the columns will be read sequentially one after the other
  • Go back to a previous page and add some structure element or Artifact
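To make the idea concrete, here is a minimal Python sketch of a structure tree in which content is produced in one order but read in another. It is not PDFlib's API: the `StructElem` class, its `add` method and the `reading_order` helper are hypothetical names invented for this illustration, and the tree only loosely mimics Tagged PDF structure elements.

```python
from dataclasses import dataclass, field


@dataclass
class StructElem:
    """Hypothetical structure element: a tag, optional text, ordered children."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)

    def add(self, child, index=None):
        """Attach child at a logical position; index=None appends at the end."""
        if index is None:
            self.children.append(child)
        else:
            self.children.insert(index, child)
        return child


def reading_order(elem):
    """Depth-first walk of the tree, i.e. the logical reading order."""
    out = [elem.text] if elem.text else []
    for child in elem.children:
        out.extend(reading_order(child))
    return out


doc = StructElem("Document")
left = doc.add(StructElem("Sect"))
right = doc.add(StructElem("Sect"))
tail = doc.add(StructElem("Sect"))

# Two columns filled in parallel, row by row; they are still read one after the other.
for i in (1, 2):
    left.add(StructElem("P", f"left {i}"))
    right.add(StructElem("P", f"right {i}"))

# Items produced in reverse order, each inserted at the front of its parent.
for label in ("third", "second", "first"):
    tail.add(StructElem("P", label), index=0)

# Go back and add a heading at the very start of the document.
doc.add(StructElem("H1", "Title"), index=0)

order = reading_order(doc)
print(order)
```

The point of the sketch is that creation order and reading order are decoupled: the position in the tree, not the moment of creation, determines what a screen reader encounters first.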

DJ: Describe how you view the relationship between PDF/UA and WCAG 2.0.

TM: In parallel to the corresponding W3C publication we offer documentation regarding “PDFlib Techniques for WCAG 2.0” to inform our users about programming methods for achieving WCAG conformance. Since PDF/UA is much more specific than WCAG, we recommend it as a guideline for users; however, manual or selective application of WCAG techniques is also possible with PDFlib with or without the PDF/UA flag.

We used the publication of PDF/UA as an opportunity to collect and assemble a variety of guidelines in addition to WCAG 2.0. In particular, PDFlib implements all of the tag nesting and usage rules in PDF 1.7 according to ISO 32000-1.
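As an illustration of what a nesting rule looks like in practice, the following Python sketch checks a tiny subset of the parent/child constraints for list and table tags. It is an assumption-laden example rather than PDFlib's implementation: the `ALLOWED_PARENTS` table covers only five tags, the `find_violations` function is a hypothetical name, and the full rule set in ISO 32000-1 is considerably larger.

```python
# Illustrative subset only: which parent tags a given structure tag may sit under.
ALLOWED_PARENTS = {
    "LI": {"L"},
    "LBody": {"LI"},
    "TR": {"Table", "THead", "TBody", "TFoot"},
    "TH": {"TR"},
    "TD": {"TR"},
}


def find_violations(node, parent=None):
    """Return a list of 'child under parent' strings for misplaced tags.

    A node is a (tag, children) tuple; tags without an entry in
    ALLOWED_PARENTS are not checked by this sketch.
    """
    tag, children = node
    problems = []
    allowed = ALLOWED_PARENTS.get(tag)
    if allowed is not None and parent not in allowed:
        problems.append(f"{tag} under {parent}")
    for child in children:
        problems.extend(find_violations(child, tag))
    return problems


good = ("Document", [
    ("Table", [("TR", [("TH", []), ("TD", [])])]),
    ("L", [("LI", [("LBody", [])])]),
])
bad = ("Document", [
    ("TD", []),                 # table cell outside a row
    ("L", [("LBody", [])]),     # list body without its list item
])

good_result = find_violations(good)
bad_result = find_violations(bad)
print(good_result, bad_result)
```

A library that enforces such rules at creation time can reject the second tree before any PDF is written, which is the approach described in the answer on validation earlier in the interview.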

DJ: Granted, it’s not published yet, but can you say if you are planning to implement the PDF Association’s Matterhorn Protocol?

TM: Since PDFlib creates fully conforming PDF/UA output, we don’t expect to see many validation issues. Should the Matterhorn Protocol uncover inconsistencies or non-conforming output we will certainly address these issues.

DJ: Apart from accessibility, what do you see as the most likely value end users can get from PDF/UA support in creation or processing software?

TM: Other advantages of Tagged PDF (e.g. Reflow, export to other formats) can better be leveraged with PDF/UA, mostly because of the cleaner document structure tree.

DJ: When will PDFlib 9 ship?

TM: Before the end of March 2013. We published Beta 2 last week and are frantically working on feature fine-tuning and bug-fixing, mostly in the Tagged PDF area. Some examples of the last few weeks:

  • The never-ending story of nesting rules
  • Proper tag structure for form fields, links and other annotations
  • Trying to document all of this for users who have never heard about accessibility issues
  • Identifying Acrobat bugs related to Tagged PDF – important for us to find workarounds (Acrobat XI fixes some important bugs, but not all)

DJ: Thank you very much for your time.