
DENG-251: Building an Open Data Lakehouse Using Apache Iceberg

DENG-251: Building an Open Data Lakehouse Using Apache Iceberg provides hands-on training for professionals interested in building scalable, efficient data lakehouses with the open-source Apache Iceberg table format. The course covers the key concepts of data lakehouse architecture and how Iceberg provides an open, transactional storage layer for managing large datasets across distributed environments. Through practical exercises, you will learn how to use Apache Iceberg for high-performance querying, data versioning, schema evolution, and partitioning, enabling more efficient data management for analytical workloads in the modern data stack.

Register Your Interest

450K+ Career Transformations

250+ Workshops Every Month

100+ Countries and Counting

Schedule

  • April 28th - 01st | 09:00 - 17:00 (CST) | Live Virtual Classroom | USD 1,600 | Fast Filling! Hurry Up.
  • April 21st - 24th | 09:00 - 17:00 (CST) | Live Virtual Classroom | USD 1,600
  • May 12th - 21st | 09:00 - 13:00 (CST) | Live Virtual Classroom | USD 1,600
  • June 02nd - 05th | 09:00 - 17:00 (CST) | Live Virtual Classroom | USD 1,600

Course Prerequisites

  • Familiarity with big data concepts and technologies (Hadoop, Spark, etc.)
  • Basic knowledge of data lakes and data warehousing
  • Experience with SQL and working with large datasets
  • Basic understanding of cloud platforms (AWS, Azure, GCP) is recommended
  • Familiarity with distributed computing principles

No prior experience with Apache Iceberg is required, but familiarity with modern data engineering tools is helpful.

Learning Objectives

By the end of this course, participants will be able to:

  • Set up and configure an open data lakehouse using Apache Iceberg
  • Design and implement efficient data models and partitioning strategies
  • Ingest and manage large-scale data using Iceberg tables with high performance
  • Perform fast queries and leverage time travel and versioning for historical data analysis
  • Implement data governance and security best practices in a data lakehouse
  • Scale and operationalize the Iceberg data lakehouse for big data analytics
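To preview the time-travel and versioning objective, here is a minimal, library-free sketch of the underlying idea: Iceberg records every table state as an immutable snapshot, and readers can query either the latest snapshot or any historical one by id. The `Snapshot` and `SnapshotTable` names below are purely illustrative and are not part of the Apache Iceberg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable record of the table's full state at one commit."""
    snapshot_id: int
    rows: tuple


class SnapshotTable:
    """Toy model of Iceberg-style snapshots: each commit appends a new
    immutable snapshot, and older snapshots remain queryable (time travel)."""

    def __init__(self):
        self._snapshots = []

    def commit(self, rows):
        """Commit a new table state and return its snapshot id."""
        snap = Snapshot(snapshot_id=len(self._snapshots) + 1,
                        rows=tuple(rows))
        self._snapshots.append(snap)
        return snap.snapshot_id

    def scan(self, snapshot_id=None):
        """Read the latest snapshot, or a historical one by id."""
        snap = (self._snapshots[-1] if snapshot_id is None
                else self._snapshots[snapshot_id - 1])
        return list(snap.rows)


table = SnapshotTable()
v1 = table.commit([("order-1", 100)])
v2 = table.commit([("order-1", 100), ("order-2", 250)])
print(table.scan())                 # latest state: both orders
print(table.scan(snapshot_id=v1))   # "time travel" back to the first commit
```

In the real system, a snapshot points to immutable data files in object storage rather than holding rows in memory, which is what makes historical queries cheap.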

Target Audience

This course is ideal for data engineers, architects, and professionals who are involved in building, managing, and scaling modern data lakehouses. The target audience includes:

  • Data Engineers
  • Data Architects
  • Data Scientists
  • Cloud Engineers
  • BI Engineers
  • Big Data Developers
  • Data Analysts

Course Modules

  • Introduction to Data Lakehouse Architecture

    • Understanding the core concepts of Data Lakehouses and how they differ from traditional data lakes and warehouses
    • Key components and architecture of a modern data lakehouse
    • Use cases for building a data lakehouse with Apache Iceberg
  • Getting Started with Apache Iceberg

    • Overview of Apache Iceberg and its role in the data lakehouse ecosystem
    • Installation and setup of Apache Iceberg on cloud-based platforms (AWS, Azure, GCP)
    • Iceberg’s architecture, metadata handling, and file formats
  • Data Modeling and Partitioning in Apache Iceberg

    • Data modeling best practices for building efficient data lakehouses
    • Using partitioning and clustering techniques in Apache Iceberg to optimize query performance
    • Understanding schema evolution and how Iceberg manages schema changes over time
  • Data Ingestion and Management with Apache Iceberg

    • Best practices for ingesting large-scale data into Iceberg tables
    • Incremental data loading strategies and handling streaming data
    • Managing data consistency and transactional operations in Apache Iceberg
  • Querying Data in the Open Data Lakehouse

    • Performing fast, scalable queries using Apache Iceberg tables with Apache Spark, Presto, or Trino
    • Leveraging Iceberg’s time travel and versioning capabilities to query historical data
    • Optimizing query performance and cost in the data lakehouse
  • Data Governance and Security in the Data Lakehouse

    • Implementing data governance policies using Apache Iceberg
    • Securing data in the lakehouse and managing user access control
    • Ensuring compliance and data auditing for analytical workloads
  • Advanced Features of Apache Iceberg

    • Managing large-scale datasets efficiently with Iceberg's ACID properties
    • Using snapshot isolation and time travel for advanced analytical capabilities
    • Integration with other cloud-native tools and frameworks for advanced analytics
  • Operationalizing and Scaling the Data Lakehouse

    • Scaling your Apache Iceberg setup for big data workloads
    • Monitoring and maintaining the Iceberg-based data lakehouse environment
    • Best practices for disaster recovery, backups, and cluster management
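The partitioning ideas covered in the modules above can be sketched without any Iceberg dependency. Iceberg's "hidden partitioning" derives partition values from column data via transforms (for example, a `day` transform on a timestamp), so users filter on the source column and the engine prunes partitions automatically. The functions below are an illustrative approximation, not the actual Iceberg implementation.

```python
from collections import defaultdict
from datetime import datetime


def day_transform(ts: datetime) -> str:
    """Iceberg-style 'day' transform: the partition value is derived from
    the timestamp column rather than stored as a separate user column."""
    return ts.strftime("%Y-%m-%d")


def write_partitioned(rows):
    """Group rows into partitions keyed by the derived day value
    (hidden partitioning: writers never handle partition paths)."""
    partitions = defaultdict(list)
    for event_id, ts in rows:
        partitions[day_transform(ts)].append((event_id, ts))
    return dict(partitions)


def scan_with_pruning(partitions, day: str):
    """A filter on the timestamp column maps to a partition filter, so
    only matching partitions are read (partition pruning)."""
    return partitions.get(day, [])


rows = [
    ("e1", datetime(2024, 5, 1, 9, 30)),
    ("e2", datetime(2024, 5, 1, 17, 0)),
    ("e3", datetime(2024, 5, 2, 8, 15)),
]
parts = write_partitioned(rows)
print(sorted(parts))                         # ['2024-05-01', '2024-05-02']
print(scan_with_pruning(parts, "2024-05-01"))  # only May 1st events
```

Because the transform, not the user, owns the partition layout, the table can later change its partition spec (say, from daily to hourly) without rewriting queries, which is the schema- and partition-evolution behavior the course explores in depth.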

What Our Learners Are Saying