Название: Data Pipelines with Apache Airflow (MEAP)
Автор: Bas P. Harenslak and Julian Rutger de Ruiter
Издательство: Manning Publications
Год: 2020
Формат: PDF, EPUB
Страниц: 230
Размер: 20 Mb
Язык: English
A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodge-podge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.
About the Technology
Data pipelines are used to extract, transform and load data to and from multiple sources, routing it wherever it’s needed -- whether that’s visualisation tools, business intelligence dashboards, or machine learning models. But pipelines can be challenging to manage, especially when your data has to flow through a collection of application components, servers, and cloud services. That’s where Apache Airflow comes in! Airflow streamlines the whole process, giving you one tool for programmatically developing and monitoring batch data pipelines, and integrating all the pieces you use in your data stack. Airflow lets you schedule, restart, and backfill pipelines, and its easy-to-use UI and workflows with Python scripting has users praising its incredible flexibility.
About the book
Data Pipelines with Apache Airflow is your essential guide to working with the powerful Apache Airflow pipeline manager. Expert data engineers Bas Harenslak and Julian de Ruiter take you through best practices for creating pipelines for multiple tasks, including data lakes, cloud deployments, and data science. Part desktop reference, part hands-on tutorial, this book teaches you the ins-and-outs of the Directed Acyclic Graphs (DAGs) that power Airflow, and how to write your own DAGs to meet the needs of your projects. You’ll learn how to automate moving and transforming data, managing pipelines by backfilling historical tasks, developing custom components for your specific systems, and setting up Airflow in production environments. With complete coverage of both foundational and lesser-known features, when you’re done you’ll be set to start using Airflow for seamless data pipeline development and management.