Author: Jean-Georges Perrin
Publisher: Manning Publications
Year: 2020
Format: true PDF/EPUB
Pages: 605
Size: 26.9 MB
Language: English
Spark in Action, Second Edition is an entirely new book that teaches you everything you need to create end-to-end analytics pipelines in Spark. Rewritten from the ground up with lots of helpful graphics, it covers the roles of DAGs and dataframes, the advantages of “lazy evaluation”, and ingestion from files, databases, and streams.
By working through carefully designed Java-based examples, you’ll delve into Spark SQL, interface with Python, and cache and checkpoint your data. Along the way, you’ll learn to interact with common enterprise data technologies like HDFS and file formats like Parquet, ORC, and Avro.
You’ll also discover interesting Spark use cases, like interactive reporting, machine learning pipelines, and even monitoring players in online games. You’ll even get a quick look at machine learning techniques you can apply without a PhD in mathematics! All examples are available on GitHub for you to explore and adapt as you learn. Demand for Spark-savvy developers is so high that they’re among the highest paid in the industry today!
What's inside
Lots of examples based on the Spark Java APIs, using real-life datasets and scenarios
Examples based on Spark v3.0
Ingestion through files, databases, and streaming
Building a custom ingestion process
Querying distributed datasets with Spark SQL
Deploying Spark applications
Caching and checkpointing your data
Interfacing with data scientists using Python
Applied machine learning
Spark use cases including Lumeris, CERN, and IBM