Author: Gaurav Ashok Thalpati
Publisher: O’Reilly Media, Inc.
Year: 2024
Pages: 351
Language: English
Format: epub
Size: 10.1 MB
This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures.
Lakehouse architecture is a modern architectural pattern that has evolved in the last few years. It has become a popular choice for data architects who are designing data platforms. In Chapter 1, I’ll introduce you to fundamental concepts related to data architecture, the data platform and its core components, and how data architecture helps build a data platform. Once you have understood these, I’ll explain why there is a need for new architectural patterns like the lakehouse, then cover lakehouse fundamentals, characteristics, and the benefits of implementing a data platform using lakehouse architecture. I’ll conclude the chapter with key takeaways to summarize everything we discuss and help you remember the key points while reading the subsequent chapters in this book.
Practical Lakehouse Architecture shows you how to:
Understand key lakehouse concepts and features like transaction support, time travel, and schema evolution
Understand the differences between traditional and lakehouse data architectures
Differentiate between various file formats and table formats
Design lakehouse architecture layers for storage, compute, metadata management, and data consumption
Implement data governance and data security within the platform
Evaluate technologies and decide on the best technology stack to implement the lakehouse for your use case
Make critical design decisions and address practical challenges to build a future-ready data platform
Start your lakehouse implementation journey and migrate data from existing systems to the lakehouse
Chapter 1 introduces you to lakehouse architecture and the key concepts, features, and benefits of implementing a data platform using lakehouse architecture. This chapter will also help you understand the fundamental concepts for building data platforms.
Chapter 2 discusses traditional architectures like data warehouses and data lakes and covers how lakehouse architecture stands out compared to these patterns. If you are new to data warehouses or data lakes, this chapter will be a good primer for understanding these architectures.
Chapter 3 explores the storage layer—the heart of the lakehouse. This chapter explains open table formats like Apache Iceberg, Apache Hudi, and Delta Lake. It also describes the key considerations for evaluating different file and table formats in order to select the right one for your use case.
Chapter 4 focuses on data catalogs and will help you understand the overall metadata management process within a lakehouse. It provides an overview of data catalog services across AWS, Azure, and GCP platforms, along with some popular third-party products.
Chapter 5 explores the different compute engine options for data engineering and consumption activities. It describes factors that will impact your decision-making process when selecting the right compute engine.
Chapter 6 discusses the governance and security aspects of data and AI assets within a lakehouse. It also lists the activities you should perform, based on your role, to maintain the governance and security of data within the lakehouse.
Chapter 7 gives the big-picture view of designing your lakehouse by combining storage, compute, and data catalogs. This chapter is critical for data architects who have to make choices during the design process. At the end of this chapter, you will find a questionnaire you can refer to during talks with different stakeholders.
While all the previous chapters discuss an ideal lakehouse implementation, Chapter 8 provides a reality check by highlighting the challenges you can face while implementing a lakehouse. This chapter gives you ideal versus real-world scenarios and explains how to tackle these to build a lakehouse in the real world.
The final chapter, Chapter 9, explores the future of lakehouses. It introduces some of the new file and table formats, innovative products, and new approaches to implementing a lakehouse platform.
Who Should Read This Book?
This book is for all data practitioners who handle large volumes of data and are responsible for designing and implementing modern data platforms. This book is a comprehensive guide for data architects and can help them understand key considerations, establish design principles, and make critical decisions when implementing a data platform. For data engineers, this book will help them understand key concepts like open table formats, schema evolution, and time travel, which they can leverage when implementing data pipelines. Other data personas, like data analysts and data scientists, will learn about crucial topics like lakehouse data management, data discovery, access control, and sensitive data handling. Data practitioners new to lakehouse architecture can read this book to learn the core concepts. Experienced data architects and senior data engineers can use this guide to make key design decisions during the design phase. And data leaders can refer to this book when planning their lakehouse initiatives.
Download Practical Lakehouse Architecture: Designing and Implementing Modern Data Platforms at Scale (Final Release)