Introduction to Vector Databases and Embeddings

Learn about the limitations of large language models (LLMs) and how vector databases help overcome them.

Foundation models (FMs) are models with billions of parameters, pretrained on vast amounts of data. A traditional machine learning (ML) model is built for a specific task, such as sentiment analysis of text or image classification. A foundation model, on the other hand, performs well across a wide range of tasks, including natural language processing (NLP), question answering, and image classification.

Large language models (LLMs) are a category of foundation models. Examples of foundation models other than LLMs include DALL·E, which generates images from natural language, and CLIP (Contrastive Language-Image Pre-training), which jointly understands images and text. However, these models have some drawbacks. For example, a chatbot application built on an LLM can mislead users with incorrect information. Also, these models are trained on data collected up to a certain point in time, so their knowledge can become dated, and the LLM cannot produce reliable responses about events beyond that date.

Let’s look at these problems in detail below.

Drawbacks of large language models

Two of the most common issues encountered with LLMs are:

  • Limited knowledge

  • Hallucination

Limited knowledge

The training data horizon (also referred to as the knowledge cutoff date) of an LLM is the last point in time from which its training data was collected. This cutoff date determines the model's awareness of current information and events. The model has not seen any data produced after this date, which can lead to outdated or inaccurate responses.

The knowledge cutoff date for GPT-3.5 in ChatGPT is January 2022. Let's ask ChatGPT about the winning team of the Cricket World Cup 2023 and see how it responds.
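
You can reproduce the same experiment programmatically. Below is a minimal sketch using the OpenAI Python SDK, assuming the `openai` package is installed and an `OPENAI_API_KEY` environment variable is set; the exact wording of the model's reply will vary, but it cannot know the actual result of an event after its cutoff.

```python
# Minimal sketch: ask GPT-3.5 about an event that occurred after its
# knowledge cutoff. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Which team won the Cricket World Cup 2023?"}
    ],
)

# With a January 2022 cutoff, the model typically replies that the
# tournament has not happened yet or that it has no information about it.
print(response.choices[0].message.content)
```
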
