Adobe Returns Export of Tagged PDF to HTML to Acrobat

Back in December I noticed that a feature in Adobe Acrobat I’d always thought very valuable was now missing: the ability to export tagged PDF to HTML or Word using the document’s structure (tags).

What are “tags” in PDF files?

Tags are the feature of PDF that provides reading order and semantic structure – headings, tables, figures, etc. – to text and other graphics encoded on the PDF page.
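To make that a little more concrete, here is a minimal sketch of why tags matter for export. It assumes a tagged PDF's structure tree has already been read into memory. The `StructElem` class, the `TAG_TO_HTML` table and the `to_html` function are hypothetical names invented for this illustration; they are not Acrobat's API and not how Adobe implements export. The tag names (`H1`, `P`, `Table` and so on) are standard PDF structure types.

```python
from dataclasses import dataclass, field
from html import escape

# Illustrative subset of standard PDF structure types and the HTML
# elements they might map to. The mapping is an assumption made for
# this sketch, not Adobe's actual export table.
TAG_TO_HTML = {
    "Document": "div",
    "H1": "h1",
    "H2": "h2",
    "P": "p",
    "L": "ul",
    "LI": "li",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
}


@dataclass
class StructElem:
    """Hypothetical in-memory stand-in for one node of a PDF tag tree."""
    tag: str
    text: str = ""
    kids: list = field(default_factory=list)


def to_html(elem: StructElem) -> str:
    """Walk the tag tree in order and emit matching HTML elements."""
    name = TAG_TO_HTML.get(elem.tag, "div")  # unknown tags fall back to <div>
    inner = escape(elem.text) + "".join(to_html(kid) for kid in elem.kids)
    return f"<{name}>{inner}</{name}>"


doc = StructElem("Document", kids=[
    StructElem("H1", "Quarterly Report"),
    StructElem("P", "Sales rose & costs fell."),
])
html_out = to_html(doc)
print(html_out)
```

Because the tree already records that one piece of text is a heading and another a paragraph, and in which order they are read, the export never has to guess structure from the page's appearance.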

I’d certainly noticed the much-improved export-to-Word feature in Acrobat XI. That new capability, however, is driven by super-smart software that analyzes the page and performs all kinds of clever tricks to produce a document that looks very similar to the original when opened in MS Word.

The new export to HTML and Word functionality is not driven or influenced by the PDF’s tags (if any). I’m not privy to why Adobe chose this route, but I can guess it had to do with the (sad) fact that, even when a file is technically “tagged”, the actual quality of tagging in the vast majority of PDF files is very poor.

That’s not, however, a good reason for removing the functionality altogether! What’s more, it used to be otherwise. In Acrobat 7, 8 and 9, export to HTML and Word used PDF tags; in Acrobat X and XI that feature was not retained.

[Screenshot of Pulkit Jain’s blog post]

Export Tagged PDF to HTML is Back

I mentioned the subject to Adobe, who responded – I have to say – very quickly for a supertanker-sized software company!

This Adobe blog post, by Pulkit Jain, provides instructions on how to download and install Export to HTML options that use tags, à la Acrobat 9.

With respect to Acrobat XI, this feature more than likely won’t be made available in a maintenance release, but you can freely download the necessary files for yourself from Pulkit’s blog post (or get them from an installation of Acrobat 9, for that matter).


