PDF Converter With OCR

I have a genealogy of my Link family that was written on a typewriter and published at a copy center. To keep the page count low, they squished things together as tightly as they could. An amazing amount of research went into this document. Unfortunately, their layout and photo-management skills were a bit lacking.

I have started scanning the document into PDF files, which, combined with Evernote, makes it somewhat searchable. The punched holes for the comb binding and the poor layout make it a very ugly publication. It deserves more. So when I stumbled onto PDF Converter with OCR [Mac – $19.99], I had to check it out. Fortunately, they offer a free trial. I played with it for a few minutes and was convinced. This app is amazing!

[Screenshot: PDF Converter]

Here you see PDF Converter’s work area. I’ve opened the first scanned PDF file and I’m looking at the first page. Over on the right you see the output format choices available. The boxes in green are areas designated as text. Other options are images (displayed in a red box) and tables (displayed in a purple box). Your first step is to go through the pages and make adjustments as necessary. I had to adjust box sizes on most of the images, and I deleted text boxes containing footers and page numbers. Here are the results from the first page . . .

[Image: conversion results from the first page]

Yes, there are some problems, most of them related to the typewriter that created the document. One page included a poem that had been printed using some kind of script font, and the OCR’d version of that page was almost impossible to read. By contrast, a simple Pages document that I printed and then scanned converted without any errors.

Attempts to OCR tables and columns often end up as a jumbled disaster, but PDF Converter did an amazing job of maintaining the columnar text format you see in the example below. The dark lines demonstrate where PDF Converter is defining rows and columns. Yes, this example will require some cleanup but it’s mild compared to the usual results I’ve gotten from other apps.

[Screenshot: PDF Converter]

There are a few quirks. Each text box you see in the app’s layout screen is saved as a separate text box in Pages. And when I tried to apply styles to the text, I got a list of styles in something that looks like Chinese lettering. My solution is to copy and paste the text out of the various boxes into a new document, one with appropriate styles for my project.

It will still take some time and effort to turn the entire publication into editable text. But thanks to this delightful app, an amazing piece of family history will get the layout and design attention it deserves.
