Audio/Video-to-Text Generation with Gemini

Learn about the Files API and how to use it to send audio and video in prompts.

The Files API

The Gemini Files API lets us store and access media files (text, images, audio, and video) for use with the model's generation capabilities. It is particularly useful when the prompt data exceeds the 20 MB size limit of a standard prompt, or when we want to provide multimedia content for multimodal prompting. The Files API lets us store up to 20 GB of files per project, with each file capped at 2 GB. Files are kept for 48 hours and can be accessed only with the API key that was used to upload them. The service is free in all regions where the Gemini API is available.
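The limits above can be turned into a small pre-upload check. The sketch below is only an illustration of that arithmetic and does not call the Gemini API. The names `choose_route` and `expires_at` are hypothetical helpers, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the text does not say whether the limits are binary or decimal.

```python
from datetime import datetime, timedelta, timezone

# Assumption: binary units. The lesson does not say which convention applies.
MB = 1024 ** 2
GB = 1024 ** 3

# Limits as stated in the paragraph above.
INLINE_PROMPT_LIMIT = 20 * MB      # largest prompt that can be sent inline
MAX_FILE_SIZE = 2 * GB             # per-file cap in the Files API
PROJECT_STORAGE_QUOTA = 20 * GB    # total Files API storage per project
RETENTION = timedelta(hours=48)    # how long an uploaded file is kept


def choose_route(file_size: int, project_usage: int = 0) -> str:
    """Hypothetical helper: return "inline" or "files_api" for a file.

    file_size and project_usage are in bytes. Raises ValueError when the
    file cannot be stored under the limits described in the text.
    """
    if file_size > MAX_FILE_SIZE:
        raise ValueError("file exceeds the 2 GB per-file cap")
    if file_size <= INLINE_PROMPT_LIMIT:
        # Small enough to send inline; the Files API is still an option.
        return "inline"
    if project_usage + file_size > PROJECT_STORAGE_QUOTA:
        raise ValueError("upload would exceed the 20 GB project quota")
    return "files_api"


def expires_at(uploaded_at: datetime) -> datetime:
    """Hypothetical helper: when a file uploaded at uploaded_at is removed."""
    return uploaded_at + RETENTION


if __name__ == "__main__":
    print(choose_route(5 * MB))     # a short audio clip
    print(choose_route(300 * MB))   # a longer video
    print(expires_at(datetime.now(timezone.utc)))
```

A check like this could run before any upload so that oversized files fail early with a clear message instead of a rejected request.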
