Optical Character Recognition (OCR) is the process of converting images of text into machine-readable text. Companies use OCR technologies to digitize text and data from documents such as PDFs, scanned images, and physical records. However, traditional OCR technologies have limitations: they struggle to extract text from structured layouts such as forms and tables. As a result, they fall short of what companies need, which is to accurately identify and extract data from any file type.

To address the shortcomings of traditional OCR technologies, Amazon introduced a machine learning service, Amazon Textract. It extracts text from documents with a wide variety of layouts, including forms and tables. It can also detect both typed and handwritten text in records and reports, and it can be integrated into applications through the Textract API.
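To make the API integration concrete, here is a minimal Python sketch, assuming the boto3 SDK is installed and AWS credentials are already configured. The helper names (extract_lines, detect_text), the default region, and the sample response are illustrative assumptions and do not come from the text above. The sample dictionary is hand-written to mimic the shape of a Textract response and is not real service output.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[str]:
    """Collect the text of every LINE block in a Textract-style response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def detect_text(image_path: str, region_name: str = "us-east-1") -> List[str]:
    """Send a local image to Textract and return the detected lines.

    Assumes boto3 is installed and AWS credentials are configured.
    boto3 is imported lazily so extract_lines works without it.
    """
    import boto3  # third-party AWS SDK for Python

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as image_file:
        response = client.detect_document_text(
            Document={"Bytes": image_file.read()}
        )
    return extract_lines(response)


# Hand-written sample shaped like a Textract response (not real output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 42"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "WORD", "Text": "42"},
        {"BlockType": "LINE", "Text": "Total: 10.00"},
    ]
}

lines = extract_lines(sample_response)
```

With valid credentials, calling detect_text("scan.png") on a hypothetical local file would return the lines detected in that image. For forms and tables, the same client also offers analyze_document, which accepts a FeatureTypes list such as ["TABLES", "FORMS"].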
