Optical Character Recognition (OCR) is the process of converting images of text into machine-readable text. Companies use OCR technologies to digitize text and data from documents such as PDFs, scanned images, and physical records. However, traditional OCR technologies have limitations: they struggle to extract text from structured layouts such as forms and tables, so they fall short when companies need to accurately identify and extract data from many kinds of documents.

Recognizing the shortcomings of traditional OCR technologies, Amazon introduced a machine learning service, Amazon Textract. It extracts text and structured data from a wide range of document types and layouts. It can detect both typed and handwritten text in records and reports, and it can be integrated into applications through the Textract API.

Primary features of Amazon Textract

Amazon Textract goes beyond plain text extraction. Let’s look at some of its primary features below:

Extract layout elements

One of the features of Amazon Textract is its ability to extract layout elements from documents. These elements include paragraphs, lists, page numbers, footers, headers, figures, tables, titles, and section headers. The layout feature can be used on its own or combined with other analysis features in a single call to the Amazon Textract AnalyzeDocument API.
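As a minimal sketch of how layout elements might be read in an application, the example below requests the LAYOUT feature and groups the returned layout blocks by type. It assumes boto3 is installed and AWS credentials are configured; the function names analyze_layout and group_layout_blocks are illustrative, not part of the Textract API.

```python
from collections import defaultdict


def analyze_layout(image_bytes: bytes) -> dict:
    """Call AnalyzeDocument with the LAYOUT feature (needs AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without the SDK

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["LAYOUT"],
    )


def group_layout_blocks(response: dict) -> dict:
    """Group the text of layout blocks by type, e.g. LAYOUT_TITLE or LAYOUT_TEXT."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    grouped = defaultdict(list)
    for block in blocks.values():
        if not block["BlockType"].startswith("LAYOUT_"):
            continue
        # Layout blocks reference their children through CHILD relationships.
        child_ids = [
            cid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for cid in rel["Ids"]
        ]
        # Only children that carry text (typically LINE blocks) contribute.
        text = " ".join(
            blocks[cid]["Text"]
            for cid in child_ids
            if cid in blocks and blocks[cid].get("Text")
        )
        grouped[block["BlockType"]].append(text)
    return dict(grouped)
```

A call such as group_layout_blocks(analyze_layout(data)) would then return a dictionary keyed by layout type, which is convenient for pulling out only titles or only headers.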

Create custom queries

In addition to pretrained queries, Amazon Textract provides Custom Queries. This feature lets users improve extraction accuracy for their own business documents by customizing the pretrained queries. Using the AWS Management Console, users can provide as few as ten sample documents to train Amazon Textract to identify text accurately in specific document formats.

Workflow of custom query creation
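The following sketch shows one way a query request could be assembled and its answers read back. It assumes the QUERIES feature of AnalyzeDocument and, optionally, an adapter produced by Custom Queries training. The helper names and the idea of passing questions as an alias-to-text dictionary are assumptions made for illustration.

```python
def build_query_request(image_bytes, questions, adapter_id=None, adapter_version=None):
    """Build keyword arguments for analyze_document with the QUERIES feature.

    `questions` maps a short alias to the natural-language question text.
    """
    request = {
        "Document": {"Bytes": image_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias} for alias, text in questions.items()
            ]
        },
    }
    if adapter_id and adapter_version:
        # Attach a trained adapter only when both identifiers are supplied.
        request["AdaptersConfig"] = {
            "Adapters": [{"AdapterId": adapter_id, "Version": adapter_version}]
        }
    return request


def extract_answers(response):
    """Map each query alias to the text of its first answer, or None."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias") or block["Query"]["Text"]
        # QUERY blocks point at QUERY_RESULT blocks through ANSWER relationships.
        answer_ids = [
            rid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "ANSWER"
            for rid in rel["Ids"]
            if rid in blocks
        ]
        answers[alias] = blocks[answer_ids[0]].get("Text") if answer_ids else None
    return answers
```

With a boto3 Textract client, the request would be sent as client.analyze_document(**build_query_request(data, {"TOTAL": "What is the total?"})), and the response passed to extract_answers.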

Extract data from forms

Traditional OCR technologies could not extract data from forms while retaining the relationship between each key and its value. Because key-value pairs came out as plain text, importing the data directly into database tables was difficult.

To overcome this limitation, Amazon Textract offers a form extraction feature that automatically detects key-value pairs in documents. Unlike traditional OCR, Amazon Textract retains this context, making it simple to import extracted data into databases.
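To make the key-value idea concrete, here is a hedged sketch that turns the KEY_VALUE_SET blocks of an AnalyzeDocument response (requested with the FORMS feature) into a plain dictionary. The function names are illustrative, and the sketch assumes each key appears once per document.

```python
def _child_text(block, blocks):
    """Join the words (and checkbox states) referenced by a block's CHILD links."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for cid in rel["Ids"]:
            child = blocks.get(cid, {})
            if child.get("BlockType") == "WORD":
                parts.append(child["Text"])
            elif child.get("BlockType") == "SELECTION_ELEMENT":
                # Checkboxes report SELECTED or NOT_SELECTED instead of text.
                parts.append(child.get("SelectionStatus", ""))
    return " ".join(parts)


def extract_key_values(response):
    """Return {key text: value text} for every form field in the response."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    pairs = {}
    for block in blocks.values():
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        # A KEY block links to its VALUE block through a VALUE relationship.
        value_ids = [
            vid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for vid in rel["Ids"]
        ]
        value = " ".join(
            _child_text(blocks[vid], blocks) for vid in value_ids if vid in blocks
        )
        pairs[_child_text(block, blocks)] = value
    return pairs
```

The resulting dictionary maps directly onto a database row or a JSON record, which is the practical benefit of retaining the key-value relationship.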

How Amazon Textract works

Amazon Textract uses deep-learning technology developed by Amazon’s computer vision scientists to perform the following actions:

  • Detecting text: Amazon Textract outputs lines, words, relationships between lines and words, and the location of lines and words in multiple Block objects.

  • Detecting and analyzing:

    • Relationships in the text: Amazon Textract identifies relationships between text, form fields, tables, query responses, and signatures in documents and forms.

    • Invoices and receipts: Amazon Textract can identify and extract required data, such as vendor contact information, from invoices and receipts of any layout.

    • Government identity documents: The AnalyzeID API of Amazon Textract can extract relevant data from identity documents such as passports, ID cards, and others.

    • Lending documents: With the Analyze Lending API, Amazon Textract can extract, categorize, and validate information from mortgage-related documents.

Capabilities of Amazon Textract
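The text detection capability described above can be sketched as follows. The example assumes boto3 and configured credentials for the actual call; detect_text and list_lines are illustrative names. It reads each LINE block together with its location and the number of WORD blocks it links to.

```python
def detect_text(image_bytes: bytes) -> dict:
    """Call the synchronous DetectDocumentText operation (needs AWS credentials)."""
    import boto3  # imported lazily so list_lines can be used without the SDK

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_bytes})


def list_lines(response: dict) -> list:
    """Return text, confidence, and position for every LINE block, in response order."""
    lines = []
    for block in response.get("Blocks", []):
        if block["BlockType"] != "LINE":
            continue
        # BoundingBox values are ratios of the page width and height (0 to 1).
        box = block["Geometry"]["BoundingBox"]
        word_ids = [
            cid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for cid in rel["Ids"]
        ]
        lines.append(
            {
                "text": block["Text"],
                "confidence": block["Confidence"],
                "left": box["Left"],
                "top": box["Top"],
                "word_count": len(word_ids),
            }
        )
    return lines
```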

For most of these actions, Amazon Textract provides both synchronous and asynchronous operations. Synchronous operations process single-page documents and return results in near real time. Asynchronous operations handle longer, multipage documents: they start a job and return a job identifier, and the results are retrieved once the job completes.
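A minimal sketch of the asynchronous flow is shown below, assuming the document is stored in Amazon S3 and that `client` is a boto3 Textract client. The function name, the polling interval, and the retry limit are illustrative choices, not values prescribed by the service.

```python
import time


def wait_for_text_detection(client, bucket, key, poll_seconds=5.0, max_polls=60):
    """Start an asynchronous text detection job and collect all result blocks."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    for _ in range(max_polls):
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(poll_seconds)
            continue
        if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            raise RuntimeError(f"Textract job {job_id} ended with status {status}")

        blocks = list(result.get("Blocks", []))
        # Large results are paginated; follow NextToken until it disappears.
        while "NextToken" in result:
            result = client.get_document_text_detection(
                JobId=job_id, NextToken=result["NextToken"]
            )
            blocks.extend(result.get("Blocks", []))
        return blocks

    raise TimeoutError(f"Textract job {job_id} did not finish in time")
```

Polling keeps the sketch short. For production workloads, the start operation also accepts a notification channel so that completion can be signaled through Amazon SNS instead of repeated status checks.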

When an analysis completes, Amazon Textract returns the result as an array of Block objects or ExpenseDocument objects. Both contain details about the items found in the document, such as the location of each item on the page and how it relates to other items.

Processing documents using Amazon Textract

Best practices for Amazon Textract

Here are a few practices that can help you get accurate output from Amazon Textract:

  • Language support: Ensure that the text of the document is in one of the languages supported by Amazon Textract. Currently, these are English, Spanish, German, Italian, French, and Portuguese.

  • Quality of images: Provide high-resolution images so that Amazon Textract can read the text reliably. For ideal results, images should be at least 150 DPI.

  • File formats: Ensure that the file uploaded to Amazon Textract is in one of the formats supported by the service. Amazon Textract supports PDF, TIFF, JPEG, and PNG formats. If the file is already in a format supported by the service, do not convert it before uploading.

  • Table extraction: To extract data from tables accurately, ensure that tables are visually distinct from other page elements and that the text within them is upright. Merged cells or merged tables may result in inconsistent extraction.

  • Confidence scores: Confidence scores are numbers ranging from 0 to 100 that indicate how likely a given prediction is to be correct. Applications can use them to make decisions by setting a minimum confidence threshold. For instance, suppose a company uses Amazon Textract to extract financial data from invoices for expense tracking. For less critical tasks, like archiving invoices for reference, the company can set a lower confidence threshold, say 50%. In this case, extracted values with confidence scores above 50% can be accepted without further review, while lower-scoring values are flagged for a manual check.
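The confidence threshold idea can be sketched as a small filter over Textract blocks. The function name split_by_confidence and the default threshold of 50 are illustrative; the sketch accepts values at or above the threshold, which is a design choice rather than a service rule.

```python
def split_by_confidence(blocks, threshold=50.0, block_type="LINE"):
    """Split block texts into accepted values and values that need manual review."""
    accepted, needs_review = [], []
    for block in blocks:
        if block.get("BlockType") != block_type:
            continue
        # Blocks without a score are treated as zero confidence and sent to review.
        if block.get("Confidence", 0.0) >= threshold:
            accepted.append(block["Text"])
        else:
            needs_review.append(block["Text"])
    return accepted, needs_review
```

For a critical workflow such as payment processing, the same function could be called with a much higher threshold so that more values are routed to a human reviewer.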

Use case examples

Different features of Amazon Textract can be utilized in different situations. Let’s understand a few scenarios where Amazon Textract can help streamline time-consuming processes.

Expense and tax tracking

Consider a retail company that operates multiple stores. The company receives hundreds of receipts that must be processed for expense tracking. Manually processing these receipts is time-consuming and error-prone.

Using Amazon Textract can save the company a great deal of time. In this scenario, Amazon Textract extracts text and structured data from the receipts, including store names, purchase dates, item descriptions, prices, and tax amounts. The total expense and tax figures can then be calculated from the extracted information.

Amazon Textract for extracting and structuring data from receipts
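As a rough illustration of this scenario, the sketch below adds up the TOTAL and TAX summary fields found in the ExpenseDocument objects of an AnalyzeExpense response. The helper names are hypothetical, and the amount parser assumes a period as the decimal separator. A receipt that lists several tax lines would have all of them summed.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn a string such as '$1,234.50' into a Decimal, or None if no number is found."""
    cleaned = re.sub(r"[^0-9.\-]", "", text or "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def summarize_receipts(response):
    """Sum the TOTAL and TAX summary fields across all expense documents."""
    totals = {"TOTAL": Decimal("0"), "TAX": Decimal("0")}
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            amount = parse_amount(field.get("ValueDetection", {}).get("Text"))
            # Ignore other field types (vendor name, date) and unparseable values.
            if field_type in totals and amount is not None:
                totals[field_type] += amount
    return totals
```

With boto3, the response would come from client.analyze_expense(Document={"Bytes": data}). Using Decimal rather than float avoids rounding surprises when adding currency values.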

Processing inventory reports

Consider another use case where an e-commerce platform manages inventory from various suppliers. Inventory reports provided by different suppliers are in various file types, such as spreadsheets, PDFs, and scanned documents. Manually analyzing these reports to maintain a database is a time-consuming process.

Amazon Textract’s table extraction feature would be an ideal solution in this scenario. The service would extract tables containing product information, and the extracted inventory data could then be validated against existing records in the database to identify discrepancies.

Automated extraction of data using Amazon Textract
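The inventory scenario could be sketched as follows: rebuild each table from the TABLE and CELL blocks of an AnalyzeDocument response (requested with the TABLES feature), then compare the rows against known records. Both function names are hypothetical, and find_discrepancies assumes a two-column table (SKU, quantity) with a header row, with quantities compared as strings.

```python
def _related_ids(block, rel_type):
    """Collect the Ids of all relationships of one type on a block."""
    return [
        rid
        for rel in block.get("Relationships", [])
        if rel["Type"] == rel_type
        for rid in rel["Ids"]
    ]


def table_rows(response):
    """Return every table in the response as a list of rows of cell text."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    tables = []
    for block in blocks.values():
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in _related_ids(block, "CHILD"):
            cell = blocks.get(cell_id)
            if cell is None or cell["BlockType"] != "CELL":
                continue
            words = [
                blocks[wid]["Text"]
                for wid in _related_ids(cell, "CHILD")
                if blocks.get(wid, {}).get("BlockType") == "WORD"
            ]
            # RowIndex and ColumnIndex are 1-based positions within the table.
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        n_rows = max((r for r, _ in cells), default=0)
        n_cols = max((c for _, c in cells), default=0)
        tables.append(
            [
                [cells.get((r, c), "") for c in range(1, n_cols + 1)]
                for r in range(1, n_rows + 1)
            ]
        )
    return tables


def find_discrepancies(rows, known_quantities):
    """Return {sku: (known, extracted)} for rows that differ from known records."""
    mismatches = {}
    for sku, quantity in rows[1:]:  # skip the header row
        if known_quantities.get(sku) != quantity:
            mismatches[sku] = (known_quantities.get(sku), quantity)
    return mismatches
```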
