Amazon Polly

Learn about Amazon Polly, its primary features, and how it works. Also discover how it can be used to solve real-world problems.

Amazon Polly is a text-to-speech (TTS) service that uses advanced deep learning technologies to convert text into lifelike speech, making it possible to build voice-driven applications.

It gives developers control over voices, languages, and speaking styles, so they can integrate text-to-speech capabilities into their applications, products, or services. In this lesson, we will look at the core concepts of Amazon Polly, learn how it works, and explore its features.
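
As a minimal sketch of that kind of integration, the Python function below calls Amazon Polly's `synthesize_speech` operation through the AWS SDK for Python (boto3) and saves the returned MP3 audio to a file. The function name, the `Joanna` voice, and the `neural` engine default are illustrative choices, not requirements. Running it against AWS assumes boto3 is installed and credentials are configured; the optional `client` argument accepts any object with the same method, such as a stub in tests.

```python
from typing import Any, Optional


def synthesize_to_file(
    text: str,
    path: str,
    voice_id: str = "Joanna",
    engine: str = "neural",
    client: Optional[Any] = None,
) -> int:
    """Convert `text` to an MP3 file at `path` and return the number of bytes written.

    `client` is any object with a boto3-style `synthesize_speech` method. When it is
    omitted, a real Amazon Polly client is created (requires boto3 and AWS credentials).
    """
    if client is None:
        import boto3  # imported lazily so the function can be exercised without boto3

        client = boto3.client("polly")

    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,
    )
    # AudioStream is a streaming body; read() returns the encoded audio bytes.
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, `synthesize_to_file("Welcome to our service.", "welcome.mp3")` would write the spoken greeting to `welcome.mp3`.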

Primary features of Amazon Polly

Amazon Polly has multiple features that improve the quality of the speech it produces. Its primary features are as follows:

  • Neural and Standard speech: Amazon Polly offers Neural and Standard text-to-speech voices. The Standard engine produces natural-sounding speech, while the Neural engine improves on it to produce more human-like speech.

  • Speech Synthesis Markup Language (SSML) support: Amazon Polly supports Speech Synthesis Markup Language (SSML), which lets developers control the pronunciation, intonation, pauses between sentences, and other aspects of the generated speech.

  • Integration: Amazon Polly can be easily integrated with other AWS services, such as AWS Lambda and Amazon S3, as well as with custom applications.

  • Speech marks: Speech marks are metadata that describe the synthesized speech, such as where each word or sentence begins, with each mark's time given in milliseconds from the start of the audio.
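
The SSML and speech-mark features can be sketched together in Python. `build_ssml` is a hypothetical helper that wraps sentences in a `<prosody>` rate and inserts `<break>` pauses between them; such a document is sent to `synthesize_speech` with `TextType="ssml"`. Requesting `OutputFormat="json"` together with `SpeechMarkTypes` returns speech marks instead of audio, one JSON object per line, which the hypothetical helpers `parse_speech_marks` and `word_timings` turn into word start times. The voice, rate, and pause values are assumptions chosen for illustration.

```python
import json
from typing import Dict, List
from xml.sax.saxutils import escape


def build_ssml(sentences: List[str], pause_ms: int = 400, rate: str = "medium") -> str:
    """Wrap plain sentences in SSML: one prosody rate for the whole passage and an
    explicit <break> pause between consecutive sentences."""
    # escape() turns &, < and > into XML entities so the text cannot break the markup.
    body = f'<break time="{pause_ms}ms"/>'.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


def parse_speech_marks(raw: bytes) -> List[Dict]:
    """Parse speech marks, which arrive as one JSON object per line."""
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]


def word_timings(marks: List[Dict]) -> List[tuple]:
    """Return (word, start time in ms) pairs for the word-type marks only."""
    return [(m["value"], m["time"]) for m in marks if m["type"] == "word"]


# With a real client (requires boto3 and AWS credentials):
#   polly = boto3.client("polly")
#   ssml = build_ssml(["Mary had a little lamb.", "Its fleece was white as snow."])
#   raw = polly.synthesize_speech(Text=ssml, TextType="ssml", OutputFormat="json",
#                                 SpeechMarkTypes=["sentence", "word"],
#                                 VoiceId="Joanna", Engine="neural")["AudioStream"].read()
#   print(word_timings(parse_speech_marks(raw)))
```

Word timings like these are what an application would use to highlight text in sync with the audio.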

How Amazon Polly works

Amazon Polly uses deep learning technologies to analyze the provided text. Its features work together to produce natural-sounding speech from that text. The Amazon Polly long-form engine provides fluid, engaging delivery for longer content, such as news articles or podcasts.

To improve the quality of the produced speech, Amazon Polly's neural text-to-speech (NTTS) voices use machine learning techniques to adjust intonation and rhythm, making speech more lifelike. Amazon Polly offers a collection of male and female voices in 24 languages. Users can select any of these voices to tailor the audio output to their preferences and target audience.
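
Choosing a voice programmatically can be sketched as follows, assuming boto3's `describe_voices` operation, which accepts an `Engine` filter and pages its results with `NextToken`. `list_voices` and `pick_voice` are hypothetical helpers; `pick_voice` only inspects the returned voice descriptions (`Id`, `LanguageCode`, `Gender`, `SupportedEngines`), so its logic can be checked without calling AWS.

```python
from typing import Any, Dict, List, Optional


def list_voices(client: Any, engine: str) -> List[Dict]:
    """Collect every voice that supports `engine`, following NextToken pagination."""
    voices: List[Dict] = []
    token: Optional[str] = None
    while True:
        kwargs = {"Engine": engine}
        if token:
            kwargs["NextToken"] = token
        page = client.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        token = page.get("NextToken")
        if not token:
            return voices


def pick_voice(
    voices: List[Dict], language_code: str, engine: str, gender: Optional[str] = None
) -> Optional[str]:
    """Return the Id of the first voice matching language, engine and (optionally) gender."""
    for v in voices:
        if v["LanguageCode"] != language_code:
            continue
        if engine not in v.get("SupportedEngines", []):
            continue
        if gender and v["Gender"] != gender:
            continue
        return v["Id"]
    return None  # no voice matches the requested combination
```

With a real client, `pick_voice(list_voices(boto3.client("polly"), "neural"), "en-US", "neural", gender="Female")` would return the Id of a female US English neural voice, if one is available.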

Working of Amazon Polly

Use case: Enhance customer interaction

Businesses can use Amazon Polly to increase customer engagement by creating interactive, human-like responses with its NTTS voices. The generated audio can be stored and replayed without any additional cost, which makes Amazon Polly a good fit for businesses such as call centers.

Consider a call center that receives a high volume of customer inquiries daily. To streamline operations and improve customer engagement, the company can automate its voice responses by integrating Amazon Polly's NTTS voices into its interactive voice response (IVR) platform. When customers call in, the IVR system uses Amazon Polly to deliver interactive, human-like responses to their inquiries.
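
Because stored audio can be replayed at no extra cost, an IVR system can synthesize each distinct prompt once and reuse it. The `PromptCache` class below is a hypothetical sketch of that idea: it keys the audio by a SHA-256 hash of voice, engine, and text, and calls `synthesize_speech` only on a cache miss. The in-memory dictionary stands in for durable storage such as an S3 bucket, and MP3 output is an assumption; a telephony platform may require another format, such as PCM.

```python
import hashlib
from typing import Any, Dict


class PromptCache:
    """Synthesize each distinct IVR prompt once and replay the stored audio afterwards.

    `client` is any object with a boto3-style `synthesize_speech` method. The in-memory
    dict is a placeholder for durable storage (for example, objects in an S3 bucket).
    """

    def __init__(self, client: Any, voice_id: str = "Joanna", engine: str = "neural"):
        self.client = client
        self.voice_id = voice_id
        self.engine = engine
        self._store: Dict[str, bytes] = {}
        self.synth_calls = 0  # how many times Polly was actually called

    def _key(self, text: str) -> str:
        # The key covers everything that changes the audio: voice, engine and text.
        raw = f"{self.voice_id}|{self.engine}|{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_audio(self, text: str) -> bytes:
        key = self._key(text)
        if key not in self._store:
            # Cache miss: synthesize once and keep the bytes for later calls.
            response = self.client.synthesize_speech(
                Text=text, OutputFormat="mp3", VoiceId=self.voice_id, Engine=self.engine
            )
            self._store[key] = response["AudioStream"].read()
            self.synth_calls += 1
        return self._store[key]
```

Prompts such as "Press 1 for billing." are requested on every call but synthesized only once, so Polly usage grows with the number of distinct prompts rather than the number of callers.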

Use of Amazon Polly in call centers

Use case: Speech generation

Consider another scenario where an audiobook company wants to automate converting books into audiobooks. Reading each book aloud manually is time-consuming, and many online TTS systems produce unnatural-sounding speech.

The audiobook company uses Amazon Polly's text-to-speech capabilities to automate narration. It provides the written text to Amazon Polly and uses SSML markup to adjust the speech rate, pitch, and tone to suit the text.
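
A single `synthesize_speech` request can carry only a limited amount of text (the AWS documentation lists 3,000 billed characters per request), so a book has to be narrated in pieces. The sketch below, using hypothetical helper names, splits text at sentence boundaries into chunks under an assumed character budget and wraps each chunk in an SSML `<prosody>` rate. Each chunk would then be sent with `TextType="ssml"` and the resulting audio files concatenated. For long texts, the asynchronous `start_speech_synthesis_task` operation, which writes its output to an S3 bucket, is an alternative.

```python
import re
from typing import List
from xml.sax.saxutils import escape


def chunk_sentences(text: str, max_chars: int = 2500) -> List[str]:
    """Group sentences into chunks of at most `max_chars` characters.

    `max_chars` is an assumed budget kept below the documented per-request limit.
    A sentence longer than the budget is hard-split as a fallback, which may cut
    a word in two.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Slicing leaves short sentences intact and hard-splits overly long ones.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def to_ssml_chunks(text: str, rate: str = "95%", max_chars: int = 2500) -> List[str]:
    """Wrap each chunk in SSML so the narration speed is set for every request."""
    return [
        f'<speak><prosody rate="{rate}">{escape(chunk)}</prosody></speak>'
        for chunk in chunk_sentences(text, max_chars)
    ]
```

Splitting at sentence boundaries keeps natural pauses at the seams between audio files; the slightly slower `95%` rate is an arbitrary narration choice.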

Automating business workflows using Amazon Polly