Title: Apache Hudi: The Definitive Guide: Building Robust, Open, and High-Performing Data Lakehouses (Early Release)
Authors: Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, Rebecca Bilbro
Publisher: O’Reilly Media, Inc.
Publication date: 2024-09-24
Language: English
Format: pdf, epub, mobi
Size: 10.1 MB
Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi. With this practical guide, data engineers, data architects, and software architects will discover how to seamlessly build an interoperable lakehouse from disparate data sources and deliver faster insights using their query engine of choice. Authors Shiyan Xu, Prashant Wason, Sudha Saktheeswaran, and Rebecca Bilbro provide practical examples and insights to help you unlock the full potential of data lakehouses for different levels of analytics, from batch to interactive to streaming. You'll also learn how to evaluate storage choices and leverage built-in automated table optimizations to build, maintain, and operate production data applications.

The data platform layer can become a limiting factor for innovation, straining to provide data fresh enough for analytics and slowing down machine learning and AI use cases. Herein lies a key advantage of using Hudi to power analytics for the next generation of data-intensive applications. Hudi is designed to provide native support for near real-time analytics as well as time travel, and this is most evident in the different ways in which data can be read from Hudi.