January 24, 2014

[Screenshot of presentation title slide: Dream or Yawn, Waking up to the Possibilities of an open source PDF validator]

Most electronic documents are PDF files. The problem – not all PDF files are created equal. Not all software does the right thing when editing a PDF file. Bad PDF is a reality, and so is substandard PDF software.

To discuss this problem, I offered a presentation on PDF validation at both the 2013 EU and North American PDF Association Technical Conferences.

In a survey of ECM industry professionals in March 2013, the PDF Association found that a third of respondents either personally encountered bad PDF files or believed them to be commonplace. A quarter of respondents thought more than 1% of PDF files were broken in some way.

1% of contracts. 1% of application forms, statements, annual reports and clinical records.

Not good.

Maybe that’s why between 30% and 40% of end-users felt that broken PDF files and/or software (they can’t really distinguish) caused “significant” or worse business problems.

Given that PDF is supposed to be able to replace paper as a stable and reliable electronic document format, these opinions merit concern.

The truth is that PDF isn’t perceived to be as reliable or as universally and fully supported as one might hope for a truly generic electronic document format. A big part of the reason is easy to understand. In 2014, 20 years after the first PDF was released, there’s still no way for developers to be sure they’ve done it right.

This state of affairs might be OK for other, less universal, less significant, less relied-on file formats – but it’s not OK for the electronic equivalent of hardcopy. No one wants to be chained to one vendor in order to read contracts, review old construction blueprints, or mark up a brochure.

Today, PDF is the world’s chosen electronic document format for final-form content. Approximately 80% of non-HTML documents on the public web are PDF files.

Even today, however, the vast majority of enterprise content management (ECM) and document management system (DMS) implementations don’t deal with the format itself; they treat PDF just like a TIFF or GIF file. That is, they’ll handle a 650-page PDF file in the same way they handle a screenshot.

The result? Inefficiency, lost opportunities and, not infrequently, broken files, missing data and impediments to business.

Here’s a video of a presentation I gave in 2013 on this subject at the PDF Association’s EU Technical Conference. The corresponding PDF is also available.