
DENG-251: Building an Open Data Lakehouse Using Apache Iceberg

DENG-251: Building an Open Data Lakehouse Using Apache Iceberg provides hands-on training for professionals interested in building scalable, efficient data lakehouses with the open-source Apache Iceberg table format. The course covers the key concepts of data lakehouse architecture and how Iceberg provides an open, transactional storage layer for managing large datasets across distributed environments. Through practical exercises, you will learn how to use Apache Iceberg for high-performance querying, data versioning, schema evolution, and partitioning, enabling more efficient data management for analytical workloads in the modern data stack.

Register Your Interest

450K+ Career Transformations

250+ Workshops Every Month

100+ Countries and Counting

Schedule

  • April 28th - 01st | 09:00 - 17:00 (CST) | Live Virtual Classroom | USD 1,600 | Fast Filling! Hurry Up.
  • April 21st - 24th | 09:00 - 17:00 (CST) | Live Virtual Classroom | USD 1,600
  • May 12th - 21st | 09:00 - 13:00 (CST) | Live Virtual Classroom | USD 1,600
  • June 02nd - 05th | 09:00 - 17:00 (CST) | Live Virtual Classroom | USD 1,600

Course Prerequisites

  • Familiarity with big data concepts and technologies (Hadoop, Spark, etc.)
  • Basic knowledge of data lakes and data warehousing
  • Experience with SQL and working with large datasets
  • Basic understanding of cloud platforms (AWS, Azure, GCP) is recommended
  • Familiarity with distributed computing principles

No prior experience with Apache Iceberg is required, but familiarity with modern data engineering tools is helpful.

Learning Objectives

By the end of this course, participants will be able to:

  • Set up and configure an open data lakehouse using Apache Iceberg
  • Design and implement efficient data models and partitioning strategies
  • Ingest and manage large-scale data using Iceberg tables with high performance
  • Perform fast queries and leverage time travel and versioning for historical data analysis
  • Implement data governance and security best practices in a data lakehouse
  • Scale and operationalize the Iceberg data lakehouse for big data analytics
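To preview the time-travel and versioning objective, here is a minimal, library-free sketch of the underlying idea: Iceberg records every table state as an immutable snapshot, and readers can query either the latest snapshot or any historical one by id. The `Snapshot` and `SnapshotTable` names below are purely illustrative and are not part of the Apache Iceberg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable record of the table's full state at one commit."""
    snapshot_id: int
    rows: tuple


class SnapshotTable:
    """Toy model of Iceberg-style snapshots: each commit appends a new
    immutable snapshot, and older snapshots remain queryable (time travel)."""

    def __init__(self):
        self._snapshots = []

    def commit(self, rows):
        """Commit a new table state and return its snapshot id."""
        snap = Snapshot(snapshot_id=len(self._snapshots) + 1,
                        rows=tuple(rows))
        self._snapshots.append(snap)
        return snap.snapshot_id

    def scan(self, snapshot_id=None):
        """Read the latest snapshot, or a historical one by id."""
        snap = (self._snapshots[-1] if snapshot_id is None
                else self._snapshots[snapshot_id - 1])
        return list(snap.rows)


table = SnapshotTable()
v1 = table.commit([("order-1", 100)])
v2 = table.commit([("order-1", 100), ("order-2", 250)])
print(table.scan())                 # latest state: both orders
print(table.scan(snapshot_id=v1))   # "time travel" back to the first commit
```

In the real system, a snapshot points to immutable data files in object storage rather than holding rows in memory, which is what makes historical queries cheap.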

Target Audience

This course is ideal for data engineers, architects, and professionals who are involved in building, managing, and scaling modern data lakehouses. The target audience includes:

  • Data Engineers
  • Data Architects
  • Data Scientists
  • Cloud Engineers
  • BI Engineers
  • Big Data Developers
  • Data Analysts

Course Modules

  • Introduction to Data Lakehouse Architecture

    • Understanding the core concepts of Data Lakehouses and how they differ from traditional data lakes and warehouses
    • Key components and architecture of a modern data lakehouse
    • Use cases for building a data lakehouse with Apache Iceberg
  • Getting Started with Apache Iceberg

    • Overview of Apache Iceberg and its role in the data lakehouse ecosystem
    • Installation and setup of Apache Iceberg on cloud-based platforms (AWS, Azure, GCP)
    • Iceberg’s architecture, metadata handling, and file formats
  • Data Modeling and Partitioning in Apache Iceberg

    • Data modeling best practices for building efficient data lakehouses
    • Using partitioning and clustering techniques in Apache Iceberg to optimize query performance
    • Understanding schema evolution and how Iceberg manages schema changes over time
  • Data Ingestion and Management with Apache Iceberg

    • Best practices for ingesting large-scale data into Iceberg tables
    • Incremental data loading strategies and handling streaming data
    • Managing data consistency and transactional operations in Apache Iceberg
  • Querying Data in the Open Data Lakehouse

    • Performing fast, scalable queries using Apache Iceberg tables with Apache Spark, Presto, or Trino
    • Leveraging Iceberg’s time travel and versioning capabilities to query historical data
    • Optimizing query performance and cost in the data lakehouse
  • Data Governance and Security in the Data Lakehouse

    • Implementing data governance policies using Apache Iceberg
    • Securing data in the lakehouse and managing user access control
    • Ensuring compliance and data auditing for analytical workloads
  • Advanced Features of Apache Iceberg

    • Managing large-scale datasets efficiently with Iceberg's ACID properties
    • Using snapshot isolation and time travel for advanced analytical capabilities
    • Integration with other cloud-native tools and frameworks for advanced analytics
  • Operationalizing and Scaling the Data Lakehouse

    • Scaling your Apache Iceberg setup for big data workloads
    • Monitoring and maintaining the Iceberg-based data lakehouse environment
    • Best practices for disaster recovery, backups, and cluster management
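The partitioning ideas covered in the modules above can be sketched without any Iceberg dependency. Iceberg's "hidden partitioning" derives partition values from column data via transforms (for example, a `day` transform on a timestamp), so users filter on the source column and the engine prunes partitions automatically. The functions below are an illustrative approximation, not the actual Iceberg implementation.

```python
from collections import defaultdict
from datetime import datetime


def day_transform(ts: datetime) -> str:
    """Iceberg-style 'day' transform: the partition value is derived from
    the timestamp column rather than stored as a separate user column."""
    return ts.strftime("%Y-%m-%d")


def write_partitioned(rows):
    """Group rows into partitions keyed by the derived day value
    (hidden partitioning: writers never handle partition paths)."""
    partitions = defaultdict(list)
    for event_id, ts in rows:
        partitions[day_transform(ts)].append((event_id, ts))
    return dict(partitions)


def scan_with_pruning(partitions, day: str):
    """A filter on the timestamp column maps to a partition filter, so
    only matching partitions are read (partition pruning)."""
    return partitions.get(day, [])


rows = [
    ("e1", datetime(2024, 5, 1, 9, 30)),
    ("e2", datetime(2024, 5, 1, 17, 0)),
    ("e3", datetime(2024, 5, 2, 8, 15)),
]
parts = write_partitioned(rows)
print(sorted(parts))                         # ['2024-05-01', '2024-05-02']
print(scan_with_pruning(parts, "2024-05-01"))  # only May 1st events
```

Because the transform, not the user, owns the partition layout, the table can later change its partition spec (say, from daily to hourly) without rewriting queries, which is the schema- and partition-evolution behavior the course explores in depth.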

What Our Learners Are Saying