When it comes to OCR (Optical Character Recognition), the name that stands out is the Tesseract engine [ Wikipedia ]. It was created by HP and is now developed and maintained by Google. Tesseract is a very powerful OCR engine used by many other OCR programs. Google has an interest in archiving and indexing all the books in the world, so a lot of resources have been poured into making the engine as accurate as possible. Google Books is a testament to Google’s commitment to this technology. The engine can now be found in Android apps for scanning receipts and on some cameras for direct translation of signboards.
The initial versions of Tesseract could only recognize English-language text. Tesseract v2 added six additional Western languages (French, Italian, German, Spanish, Brazilian Portuguese, Dutch). Version 3 extended language support significantly to include ideographic (Chinese and Japanese) and right-to-left (e.g. Arabic, Hebrew) languages, as well as many more scripts. New languages included Arabic, Bulgarian, Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish, German (Fraktur script), Greek, Finnish, Hebrew, Hindi, Hungarian, Indonesian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak (standard and Fraktur script), Slovenian, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian and Vietnamese. V3.04, released in July 2015, added 39 more language/script combinations, bringing the total count of supported languages to over 100.
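To make the language support concrete, here is a minimal sketch of how several languages might be selected when calling the Tesseract command-line tool. It assumes the `tesseract` binary and the matching language data files are installed. The helper name `build_tesseract_command`, the `LANG_CODES` table and the file names are hypothetical and only for illustration. The sketch only builds the command; it does not run OCR.

```python
import shlex

# Assumed Tesseract language codes (names of the traineddata files).
# Which ones are actually available depends on the local installation.
LANG_CODES = {
    "English": "eng",
    "French": "fra",
    "German": "deu",
    "Arabic": "ara",
    "Hebrew": "heb",
    "Chinese (Simplified)": "chi_sim",
    "Japanese": "jpn",
}


def build_tesseract_command(image_path, output_base, languages):
    """Return the argument list for a tesseract CLI call (nothing is executed)."""
    # Unknown language names raise KeyError instead of being silently ignored.
    codes = [LANG_CODES[name] for name in languages]
    # Tesseract accepts several languages joined with "+" after the -l option.
    return ["tesseract", image_path, output_base, "-l", "+".join(codes)]


cmd = build_tesseract_command("scan.png", "scan", ["English", "German"])
print(shlex.join(cmd))  # prints: tesseract scan.png scan -l eng+deu
```

If Tesseract is installed, the list could be passed to `subprocess.run(cmd, check=True)`, which would write the recognized text to `scan.txt`.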
↓ 01 – ABBYY FineReader Online [ FlexiCapture Engine | Online ]
Convert PDF and JPG files to Microsoft® Word and Excel. Try it out now and recognize up to 10 pages free of charge! FineReader Online lets you convert scans of documents and photos containing text in any of the supported formats to Microsoft Word and Excel files, PowerPoint® presentations, text files and searchable PDF documents. Important! FineReader Online only supports printed documents. Please do not try to recognize hand-written text. Recognize documents even if they contain a mixture of Chinese, Japanese and Korean. You can even recognize old text and documents containing Fraktur fonts.
↓ 02 – CuneiForm [ Cuneiform engine | Windows ]
Cuneiform is a program for converting the text in documents into editable form by means of OCR. The text the program renders can be edited in office programs and text editors and saved in standard formats; it can also be searched in full. Cognitive Technologies made Cuneiform a free program and granted the open source community access to its source code. The open source project, open to anyone willing to participate, was called OpenOCR. To coordinate work on the project, the website OpenOCR.org was launched, which includes a forum in Russian.
↓ 03 – SimpleOCR [ Proprietary engine | Windows ]
SimpleOCR is a popular freeware OCR program with hundreds of thousands of users worldwide. SimpleOCR is also a royalty-free OCR SDK that developers can use in their custom applications. If you have a scanner and want to avoid retyping your documents, SimpleOCR is a fast, free way to do it. The SimpleOCR freeware is 100% free and not limited in any way. Anyone can use SimpleOCR for free: home users, educational institutions, even corporate users.
↓ 04 – OCRFeeder [ Tesseract/Ocrad engine | Linux Ubuntu ]
OCRFeeder is a document layout analysis and optical character recognition system. Given a set of images, it automatically outlines their contents, distinguishes between graphics and text, and performs OCR on the text. It can generate multiple output formats, the main one being ODT. It features a complete GTK graphical user interface that allows users to correct any unrecognized characters, define or correct bounding boxes, set paragraph styles, clean the input images, import PDFs, save and load the project, export everything to multiple formats, etc.
↓ 05 – YAGF OCR [ Tesseract engine | Windows ]
YAGF is a graphical front-end for the Cuneiform and Tesseract OCR tools. With YAGF you can open already scanned image files or obtain new images via XSane (scanning results are automatically passed to YAGF). Once you have a scanned image you can prepare it for recognition, select particular image areas for recognition, set the recognition language and so on. Recognized text is displayed in an editor window where it can be corrected, saved to disk or copied to the clipboard. YAGF also provides some facilities for multi-page recognition (see the online help for more details).
↓ 06 – MeOCR [ PUMA OCR Engine | Windows ]
MeOCR 1.0 converts your scanned documents to editable text documents using OCR and exports them to Microsoft Word with one click. Use it to save time and money by not having to retype your documents. MeOCR is a fast, reliable and accurate image-to-text OCR conversion application. Features:
- High accuracy: Saves time by reducing the number of corrections and editing needed.
- Retains Formatting: Most OCR applications do not retain formatting. MeOCR produces formatted output, saving time otherwise spent on reformatting.
- Supports Multiple Languages: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, French, German, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Serbian, Slovenian, Spanish, Swedish, Turkish, Ukrainian.
↓ 07 – Free OCR to Word [ Unknown OCR Engine | Windows ]
Waste no more time on tedious retyping! Free OCR to Word is an efficient text recognition solution that performs OCR quickly. It converts an image or scanned document to an editable Word document. Within a few clicks, you will have a fully editable copy of your paper document in your favorite word processor. Free OCR to Word can identify text within image files and turn it into an electronic document. It can perform OCR on all key and many rare image formats, including JPG/JPEG, TIF/TIFF, BMP, GIF, PNG, EMF, WMF, JPE, ICO, JFIF, PCX, PSD, PCD, TGA and many more.
↓ 08 – Google Docs [ Tesseract engine | Online ]
You can convert your image files to editable text by uploading them to Google Docs. To enable OCR in Google Docs, visit http://drive.google.com, go to ‘settings’ and check “Convert text from PDF and image files to Google documents” when uploading. All images with text will automatically be converted into editable text.
↓ 09 – FreeOCR [ Tesseract engine | Windows ]
FreeOCR is free Optical Character Recognition software for Windows. It supports scanning from most TWAIN scanners and can also open most scanned PDFs and multi-page TIFF images, as well as popular image file formats. FreeOCR outputs plain text and can export directly to Microsoft Word format. FreeOCR uses the latest Tesseract (v3.01) OCR engine. It includes a Windows installer and is very simple to use. It supports opening multi-page TIFF documents, Adobe PDF and fax documents, as well as most image types, including compressed TIFFs, which the Tesseract engine on its own cannot read. It can now scan using TWAIN and WIA scanning drivers.
↓ 10 – Boxoft Free OCR [ Unknown Engine | Windows ]
Boxoft Free OCR is completely free software that helps you extract text from all kinds of images. The freeware can analyze multi-column text and supports multiple languages: English, French, German, Italian, Dutch, Spanish, Portuguese, Basque and so on. You can even scan your paper documents and then OCR the content of the scanned files into editable text immediately.
↓ 11 – PDF OCR X [ Unknown Engine | Windows | Mac ]
PDF OCR X is a simple drag-and-drop utility for Mac OS X and Windows that converts your Adobe PDFs and images into text documents or searchable PDF files. It uses advanced OCR (optical character recognition) technology to extract the text of the PDF even if that text is contained in an image. This is particularly useful for dealing with PDFs that were created via a Scan-to-PDF function in a scanner or photocopier.