Introduction

Let's study distributed data processing systems.

This chapter will examine distributed systems used to process large amounts of data that would be impossible or very inefficient to process using only a single machine.

Categories of distributed data processing systems

Distributed data processing systems can be classified into the following two main categories:

Batch processing systems

Batch processing systems collect individual data items into groups, called batches, and process them one batch at a time. These batches are often quite large (e.g., all the items received during a day), so the main goal of these systems is usually high throughput, sometimes at the cost of higher latency.
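The following minimal Python sketch makes the batch model concrete. It assumes a hypothetical list of `(day, value)` records and uses a per-day sum as a stand-in for the real computation. The names `events`, `group_into_batches`, `process_batch`, and `run_batch_job` are illustrative only and do not belong to any particular framework. A real system would read its input from distributed storage and spread each batch across many machines.

```python
from collections import defaultdict
from datetime import date

# Hypothetical input records: (day, value). In a real system these would be
# read from durable storage, such as a distributed file system.
events = [
    (date(2024, 1, 1), 5),
    (date(2024, 1, 1), 7),
    (date(2024, 1, 2), 3),
    (date(2024, 1, 2), 4),
    (date(2024, 1, 2), 1),
]


def group_into_batches(items):
    """Group the items into one batch per day."""
    batches = defaultdict(list)
    for day, value in items:
        batches[day].append(value)
    return dict(batches)


def process_batch(values):
    """Process a whole batch in one pass (here, simply sum its values)."""
    return sum(values)


def run_batch_job(items):
    """Produce one result per batch, processing the batches in day order."""
    # No result exists for a day until its entire batch has been collected,
    # which is where the extra latency of batch processing comes from.
    results = {}
    for day, values in sorted(group_into_batches(items).items()):
        results[day] = process_batch(values)
    return results
```

Here `run_batch_job(events)` returns `{date(2024, 1, 1): 12, date(2024, 1, 2): 8}`. The result for a day becomes available only after that day's whole batch has been gathered. This is the latency traded for processing many items together in a single pass.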

Stream processing systems

Stream processing systems receive and process data continuously, as a stream of individual data items. As a result, the main goal of these systems is to provide very low latency, sometimes at the cost of lower throughput.
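By contrast, a stream processor handles each item as soon as it arrives. The following minimal Python sketch illustrates this. It assumes that a Python iterable stands in for an unbounded incoming stream and that a running total stands in for the real computation. `process_stream` is a hypothetical name, not an API of any streaming framework.

```python
from typing import Iterable, Iterator


def process_stream(values: Iterable[int]) -> Iterator[int]:
    """Handle each item as it arrives and emit an updated result immediately."""
    total = 0  # state kept across items, e.g. a running aggregate
    for value in values:
        total += value
        # A result is emitted after every single item, not once per batch.
        yield total
```

For example, `list(process_stream([5, 7, 3, 4, 1]))` produces `[5, 12, 15, 19, 20]`. An updated result appears after every item, which keeps latency low. The work and output done for each individual item add overhead, and that overhead is one reason throughput can be lower than in a batch system.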
