Author: Jonathan Rioux
Publisher: Manning Publications
Year: 2020
Format: true PDF/EPUB/MOBI + code
Pages: 221
Size: 11 MB
Language: English
PySpark in Action is a carefully engineered tutorial that helps you use PySpark to deliver your data-driven applications at any scale. This clear, hands-on guide shows you how to scale your data processing across multiple machines, with data from any source, ranging from Hadoop-based clusters to Excel worksheets. You'll learn how to break big analysis tasks into manageable chunks and how to choose and use the PySpark data abstraction best suited to your needs. By the time you're done, you'll be able to write and run fast PySpark programs that are scalable, efficient to operate, and easy to debug.