Apache Flink
Let's introduce Apache Flink and look into its architecture.
Apache Flink is an open-source stream-processing framework developed by the Apache Software Foundation. It processes incoming data in a true streaming fashion, operating on each record as soon as it arrives instead of first grouping records into batches.
Note: This is the main differentiator between Flink and Spark. Flink processes incoming data as it arrives, which provides sub-second latency that can go down to single-digit milliseconds. Spark also provides a streaming engine, called Spark Streaming. However, it runs a form of micro-batch processing, where the input data stream is split into batches that are then processed to generate the final results, with the associated latency trade-off.
Basic constructs in Flink
The basic constructs in Flink are streams and transformations.
Stream
A stream is an unbounded flow of data records.
Note: Flink can also execute batch processing applications, where data is treated as a bounded stream. As a result, the mechanisms described in this section are used in this case also with slight variations.
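To illustrate this note, recent Flink versions (1.12 and later) let the same streaming program run over a bounded input in batch execution mode. The following Java snippet is only an assumed sketch: the class name and sample data are invented for illustration, and only the setRuntimeMode call relates to the bounded-stream behavior described above.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedStreamSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Treat the bounded input as a batch job while keeping the streaming constructs.
        // (Illustrative sketch; the sample data is made up.)
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements(3, 1, 4, 1, 5)
           .map(x -> x * 2)
           .print();

        env.execute("bounded stream sketch");
    }
}
```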
Transformation
A transformation is an operation that takes one or more streams as input and produces one or more output streams as a result.
Flink provides many different APIs that are used to define these transformations. For example:
- The ProcessFunction API is a low-level interface that allows the user to define imperatively what each transformation should do by providing the code that will process each record.
- The DataStream API is a high-level interface that allows the user to define declaratively the various transformations by reusing basic operations, such as map, flatMap, filter, union, etc. (a short sketch of both APIs follows this list).
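For a concrete feel of these two styles, here is a minimal Java sketch; the class name, sample log lines, and chosen operations are assumptions made purely for illustration and do not come from the original text.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class TransformationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input stream of raw log lines.
        DataStream<String> lines =
            env.fromElements("error: disk full", "info: all good", "error: timeout");

        // DataStream API: declarative composition of built-in operations.
        DataStream<String> errors = lines
            .filter(line -> line.startsWith("error"))
            .map(String::toUpperCase);

        // ProcessFunction API: imperative, per-record logic.
        DataStream<String> alerts = errors.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                out.collect("ALERT -> " + value);
            }
        });

        alerts.print();
        env.execute("transformation sketch");
    }
}
```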
Additional Flink constructs
Since streams can be unbounded, the application has to produce some intermediate, periodic results. For this purpose, Flink provides some additional constructs, such as windows, timers, and local storage for stateful operators.
Flink provides a set of high-level operators that specify the windows over which data from the stream will be aggregated. These windows can be time-driven (e.g., time intervals) or data-driven (e.g., number of elements). The timer API allows applications to register callbacks to be executed at specific points in time in the future.
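As an assumed illustration of the windowing API (the key field, sample data, and window size below are invented for this sketch and are not from the original text), a keyed, time-driven window in the Java DataStream API can look like this; timers follow a similar spirit through the TimerService that process functions expose.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical stream of (page, visits) pairs; a real job would read from
        // an unbounded source such as Kafka, so that windows keep firing.
        env.fromElements(Tuple2.of("/home", 1), Tuple2.of("/docs", 1), Tuple2.of("/home", 1))
           .keyBy(t -> t.f0)
           // Time-driven window: aggregate per key over 5-second intervals.
           .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
           .sum(1)
           // A data-driven alternative would be .countWindow(100).sum(1)
           .print();

        env.execute("windowed aggregation sketch");
    }
}
```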
Flow of data in a Flink data processing application
A data processing application in Flink is represented as a directed acyclic graph (DAG), where nodes represent computing tasks and edges represent data subscriptions between tasks.
Note: Flink also supports cyclic dataflow graphs, which can be used for use-cases such as iterative algorithms.
Flink is responsible for translating the logical graph that corresponds to the application code into the physical graph that is actually executed. This translation includes logical dataflow optimizations, such as the fusion of multiple operations into a single task (e.g., a combination of two consecutive filters). It also includes partitioning each task into multiple instances that can be executed in parallel on different compute nodes. This process is shown in the illustration below.
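To make the chaining and parallelism aspects concrete in code, the sketch below (class name, parallelism values, and operations are all illustrative assumptions) sets a default parallelism for the job and overrides it for one operator; the two consecutive filters are eligible to be fused (chained) into a single task.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PhysicalGraphSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Each task of the physical graph runs as 4 parallel instances by default.
        env.setParallelism(4);

        env.fromElements("alpha", "beta", "", "gamma")
           // Two consecutive filters: Flink chains (fuses) them into a single task by default.
           .filter(s -> !s.isEmpty())
           .filter(s -> s.length() > 3)
           // Parallelism can also be overridden per operator.
           .map(String::toUpperCase).setParallelism(2)
           .print();

        env.execute("physical graph sketch");
    }
}
```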
The architecture of Flink
The high-level architecture of Flink consists of three main components: the client, the Job Manager, and the Task Managers, as shown in the following illustration:
The Flink client receives the program code, transforms it into a dataflow graph, and submits it to the Job Manager, which coordinates the distributed execution of the dataflow.
The Job Manager schedules tasks on Task Managers, tracks their progress, and coordinates checkpoints and recovery from possible task failures.
Each Task Manager executes one or more tasks that run user-specified operators, which can produce other streams. The Task Managers report the status of these tasks to the Job Manager, along with heartbeats that are used for failure detection. When processing unbounded streams, these tasks are expected to be long-lived; if they fail, the Job Manager is responsible for restarting them.
Avoid making the Job Manager a single point of failure
To avoid making the Job Manager a single point of failure, multiple instances can run in parallel. One of them is elected leader via ZooKeeper and is responsible for coordinating the execution of applications, while the rest stand by, ready to take over if the leader fails. To make this failover possible, the leader Job Manager stores critical metadata about every application in ZooKeeper, so that it is accessible to a newly elected leader.