If you’ve ever needed to pull the text out of an Adobe PDF document, you know how frustrating it can be. The virtually universal file format doesn’t exactly make it easy to re-use text from one document and paste it into another. Formatting, line-spacing, and images all conspire to make PDF documents a challenge. But Able2Extract ($100, 7-day free trial) makes child’s play of this tedious task.
Able2Extract simply reads the PDF document, then spits out text, graphics, tables, and other content into a Microsoft Office-format document of your choice–Word, Excel, or PowerPoint. Line spacing and formatting are preserved to the best of the program’s ability: tables won’t cause paragraphs to break in odd places. Nor will inline graphics, which are preserved and moved into Office documents in precisely the same location they appear in the PDF document.
The $30 premium you pay for the Professional version of the product adds a key feature: optical character recognition, or OCR. A2E Pro can read PDF documents that were scanned as image files, and does a remarkably good job not only of converting the images back into text, but also of reproducing the pagination, layout, and even the typeface used in the original document.
Extracting text from a PDF that had been generated using the Adobe Acrobat utility (or “Print to PDF”) was a snap: a 20-page document, with inline images, a table that text flowed around, and other details, was reproduced flawlessly in a Microsoft Word .doc file. Using the Pro version to OCR a poorly reproduced document that had originally been typed on a typewriter, then scanned crooked, was more of a challenge.
It took the program 3 minutes and 15 seconds to convert a disastrously scanned 62-page test PDF. Minor typos cropped up where handwritten notes appeared in the margins on some pages–but the program tried hard to reproduce those, as well. The original had been sent via fax, and the fax footer, faded in the poor-quality scan, didn’t reproduce identically–but it was also entirely extraneous. When you load the PDF file into the program, you can drag-select portions of the page that you want to convert; if I had planned the conversion better, I would have avoided selecting these unnecessary footers and the notes in the margins.
A representative of the publisher tells me that drawings or sketches in PDF documents can be scanned and output to a file format that AutoCAD can load. Although I didn’t test this feature, I can see how it could be incredibly useful to an architect, archivist, or historian, especially if the blueprints scanned into a PDF weren’t originally designed on a computer. All in all, I was impressed with Able2Extract.