What file formats are used in OCR (Optical Character Recognition)?
Optical Character Recognition (OCR) is a software process that passes over an image of a text page and attempts to re-create its content as a corresponding text file made up of ASCII characters. Delivery systems can then index the resulting text file to provide full-text searching.

For textual materials, the standard digital output file is a TIFF image; TIFF is a non-proprietary file format. JPEG2000, also a standard file format, is a derivative generated from the original TIFF. In either case, each file represents one page of the original. JPEG2000 includes a compression component that makes the image file smaller, and thus more Web-friendly, than a TIFF; it is used particularly for oversized materials such as maps and newspapers. PDF is also a derivative output: the original TIFF images are compressed for Web access and/or bundled into a single portable document that contains multiple page images.
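To illustrate how a delivery system might use OCR output for full-text searching, here is a minimal Python sketch. It assumes the OCR step has already produced one plain-text string per page; the sample pages and the function names `build_index` and `search` are hypothetical and do not come from any particular OCR or delivery product.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page numbers it appears on."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical OCR output: one text string per scanned page image.
pages = {
    1: "Annual report of the city library",
    2: "The library acquired new maps and newspapers",
    3: "Maps of the city, 1890",
}

index = build_index(pages)
print(search(index, "library"))    # pages mentioning "library"
print(search(index, "city maps"))  # pages mentioning both words
```

Because each image file represents one page of the original, keying the index by page number lets a search result point back to the matching page image. A real delivery system would likely add refinements such as stemming and tolerance for OCR misreadings, which this sketch omits.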