Title: Statistics for Data Science and Analytics
Authors: Peter C. Bruce, Peter Gedeck, Janet Dobbins
Publisher: Wiley
Year: 2025
Pages: 366
Language: English
Format: pdf (true), epub
Size: 30.7 MB
An introductory statistics textbook with a focus on data science topics such as prediction, correlation, and data exploration.

Statistics for Data Science and Analytics is a comprehensive guide to statistical analysis using Python. The authors provide an introduction to statistical science and big data, as well as an overview of Python data structures and operations. A range of statistical techniques is presented with Python implementations, including hypothesis testing, probability, exploratory data analysis, categorical variables, surveys and sampling, A/B testing, and correlation. The text introduces binary classification, a foundational element of machine learning; validation of statistical models by applying them to holdout data; and probability and inference via the easy-to-understand methods of resampling and the bootstrap, instead of a myriad of “kitchen sink” formulas. Regression is taught as a tool for both explanation and prediction.

Python is a general-purpose programming language that can be used in many different areas. It is especially popular in the machine learning and data science communities. A wide range of libraries provides efficient solutions for almost every need, from simple one-off scripts to web servers and highly complex scientific applications. As we will see throughout this book, it also has great support for statistics.