What file formats are used in OCR (Optical Character Recognition)?

Optical Character Recognition (OCR) is a software process that reads a scanned image of a text page and attempts to re-create the page's contents as a corresponding plain text file, traditionally made up of ASCII characters. Delivery systems can then index the resulting text file to support full-text searching.

For textual materials, the standard digital output file is a TIFF image. TIFF is a non-proprietary file format. JPEG 2000, also a standard file format, is a derivative generated from the original TIFF. In either case, each file represents one page of the original. JPEG 2000 compresses the image, which makes the file smaller and therefore more Web-friendly than a TIFF. It is used particularly for oversized materials, including maps and newspapers. PDF is another derivative output: the original TIFF images are compressed for Web access, bundled into a single portable document that contains multiple page images, or both.
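To illustrate the indexing step mentioned above, here is a minimal Python sketch of how per-page OCR text could be indexed for full-text searching. It is only an illustration, not how any particular delivery system works: the function names `build_index` and `search` are hypothetical, the sample page text is made up, and it assumes the OCR step has already produced one text string per page image.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the sorted page numbers that contain it.

    `pages` is a dict of {page_number: ocr_text}, one entry per page image.
    """
    index = defaultdict(set)
    for page_number, text in pages.items():
        # Split the OCR text into simple alphanumeric tokens.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical OCR output for a three-page document (made-up text).
pages = {
    1: "Map of the county, surveyed in 1890",
    2: "The county newspaper reported the survey",
    3: "Index of maps and newspapers",
}
index = build_index(pages)
print(search(index, "county"))      # [1, 2]
print(search(index, "county map"))  # [1]
```

Because each image file represents one page, a search hit can point straight to the matching page image. Note that this sketch matches exact words only, so "map" does not find the page that contains only "maps".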
