How Imaging Leads to Optical Character Recognition (OCR)
According to the National Library of Canada, digitization usually refers to the process of converting a paper- or film-based document into electronic form bit by bit. This conversion is accomplished through a process called imaging, in which a document is scanned and an electronic representation of the original is produced. Using a scanner, the imaging process records changes in the intensity of light reflected from the document as a matrix of dots. The light/color value of each dot is stored as binary digits: one bit per dot for a black-and-white scan and up to 32 bits per dot for a color scan [1]. Optical character recognition (OCR) takes this one step further by converting the scanned data, originally a bitmap, into machine-readable, editable text.
Problems with OCR
Optical character recognition currently has applications in areas such as document indexing and sorting, forms processing, and digital document conversion. The current systems would have unl