BigTable/HBase Architecture

Let's explore what BigTable and HBase areand look into the data model and architecture of HBase.

BigTableF. Chang et al., “Bigtable: A Distributed Storage System for Structured Data,” in Proceedings of 7th {USENIX} Symposium on Operating Systems Design and Implementation (OSDI), 2006. is a distributed storage system initially developed in Google and was the inspiration for HBase, a distributed datastore that is part of the Apache Hadoop project.

Note that the architecture of these two systems is very similar to each other, so we will focus only on HBase, which is an open-source system.

HBase data model

HBase provides a sparse, multi-dimensional sorted map as a data model, as shown in the following illustration:

The map is indexed by a row key, a column key (e.g. C1, C2, C4 etc.) and a timestamp, while each value in the map is an uninterpreted array of bytes.

The columns are further grouped in column families (e.g. CF1, CF2 etc.). All members of a column family are physically stored together on the filesystem and the user can specify tuning configurations for each column family, such as compression type or in-memory caching.

Column families need to be declared upfront during schema definition, but columns can also be created dynamically. Furthermore, the system supports a small number of column families, but an unlimited number of columns.

The keys are also uninterpreted bytes and rows of the table are physically stored in the lexicographical order of the keys. Each table is partitioned horizontally using range partitioning based on the row key into segments, called regions (e.g. [A, D], [E, K] etc.).

Note that the ranges correspond to the regions.

Goal of HBase

The main goal of HBase data model and the architecture is to allow the user to control the physical layout of data, so that related data are stored near each other.

HBase Architecture

The following illustration shows the high-level architecture of HBase, which is also based on manager-worker architecture. The leader is called HManager and the followers are called region servers:

Press + to interact
HBase high-level architecture
HBase high-level architecture

The HManager

The HManager is responsible for:

  • Assigning regions to region servers
  • Detecting the addition and expiration of region servers
  • Balancing region server load
  • Handling schema changes

Region servers

Each region server:

  • Manages a set of regions
  • Handles read and write requests to the loaded regions.
  • Splits regions that have grown too large.

Note that like other primary-backup distributed systems, clients do not communicate with the primary for data flow operations but only for control flow operations to prevent it from becoming the system’s performance bottleneck.

Usage of Zookeeper in HBase

Hbase uses Zookeeper to:

  • Perform a leader election to decide the manager node
  • Maintain group membership of region servers
  • Store the bootstrap location of HBase data
  • Store schema information
  • Access control lists

Data storage

Each region server stores the data for the associated regions in HDFS, which provides the necessary redundancy.

A region server can be collocated at the same machine as an HDFS datanode to enable data locality and minimize network traffic.

META table

There is a special HBase table, called the META table, which contains the mapping between regions and region servers in the cluster.

The location of this META table is stored in Zookeeper. As a result, the first time a client needs to read/write to HBase, it first communicates with Zookeeper to retrieve the region server that hosts the META table, then contacts this region server to find the region server containing the desired table and finally sends the read/write operation to that server.

The client caches locally the location of the META table and the data already read from this table for future use.

Creation of ephemeral node

HManagers initially compete to create an ephemeral node in Zookeeper. The first becomes the active manager, while the second one listens for notifications from Zookeeper of the active manager failure.

Similarly, region servers create ephemeral nodes in Zookeeper at a directory monitored by the HManager. In this way, the HManager is aware of region servers that join/leave the cluster, so that it can assign regions accordingly.

Get hands-on with 1400+ tech skills courses.