Course Overview
One of the biggest challenges to success with big data has always been how to transport it. Even in scenarios that might not be considered "big data," an organization's efforts to integrate its services and data can be held back by an inadequate messaging and integration architecture.
This hands-on training workshop gets you up and running with Apache Kafka so you can immediately take advantage of the low latency, massive parallelism and exciting use cases Kafka makes possible. You'll get live instruction and coaching on how to be effective when using Kafka in your work or project.
What You'll Learn
- Explore Apache Kafka Architecture
- Learn to configure a distributed messaging broker
- Learn the Apache Kafka architecture and data model
- Learn about decoupled services and distributed systems
- Learn to build robust systems using distributed messaging brokers
- Learn best practices for configuring Kafka clusters in production
- Write custom Kafka producers and consumers
- Build an application that ingests data from a streaming API
Outline
Part 1: Big Data and Distributed Systems Primer
- Distributed Systems
- High Availability
- Latency and Scalability
- Message Brokers and Queues
- Decoupling Services
- Lambda Architecture
- Data Partitioning
Part 2: Introduction to Apache Kafka
- History
- What is Kafka
- Why Kafka
- Features
- Kafka in Production
- High-Level Architecture
Part 3: Core Concepts
- Kafka Guarantees/Message Ordering
- Delivery Semantics
- Dumb Broker vs. Message-Oriented Middleware (MOM)
- Kafka Semantics
Part 4: Kafka Cluster
- Installing a Cluster
- Brokers
- Consumers
- Producers
Part 5: Apache ZooKeeper
- Cluster Management
- Roles
- Basic Operations
Part 6: Kafka Producers
- Role of Producer
- Records
- Message Durability
- Batching and Compression
- Create Console Producer
- Publishing Data to Topics
Part 7: Kafka Consumers
- Role of Consumer
- Offsets
- Consumers and Logs
- Create Console Consumer
- Performance Tuning
- Consumer Groups
- Consumer Parallelism
- Consumer Rebalancing
Part 8: The Kafka Data Model
- Kafka Data Model
- Topics
- Partitions
- Distribution
- Reliability
- Leaders/Followers
- Replication Factor
- Persistence
Part 9: The Kafka API
- Producer API
- Consumer API
- Java, Scala, Python APIs
- Creating/Modifying Topics
- Partitioning Topics
- Reading Data from Kafka
- Writing Data to Kafka
Part 10: Kafka in Production
- Big Data Pipelines
- Microservices
- Case Study: Netflix
- Apache Spark
- Storm and Hadoop
Part 11: Kafka Streams
- Stream Processing
- High-Level Overview
- Demo Application
Prerequisites
Participants in this workshop should have a working knowledge of at least one programming language (preferably Python, Java, or Scala) and be able to work from the command line in a Linux VM or container.
Who Should Attend
- System architects
- Developers
- Data engineers
- DBAs
- Anyone who wants to learn to use the Kafka messaging system for consuming data in their systems.