Author: Oswald Campesato
Publisher: Mercury Learning and Information
Year: 2023
Pages: 387
Language: English
Format: PDF (true)
Size: 10.2 MB
This book contains a fast-paced introduction to data-related tasks in preparation for training models on datasets. It presents a step-by-step, Python-based code sample that uses the kNN algorithm to manage a model on a dataset. Next, you will see how to apply other classification algorithms to the same dataset, such as decision trees, random forests, SVMs (support vector machines), and Naive Bayes, simply by modifying three lines of code. Chapter One begins with an introduction to datasets and issues that can arise, followed by Chapter Two on outliers and anomaly detection. Chapter Three explores ways of handling missing and invalid data, and Chapter Four demonstrates how to train models with classification algorithms. Chapter Five introduces visualization toolkits, such as Sweetviz, Skimpy, Matplotlib, and Seaborn, along with some simple Python-based code samples that render charts and graphs. An appendix covers the basics of using awk.
What do I need to know for this book?
The minimum programming requirement is a basic knowledge of Python 3.x because all the code samples are in Python. In some cases, you need a rudimentary understanding of the awk utility, which you can learn through free online tutorials. In addition, you need a basic understanding of Pandas data frames and the Pandas methods for extracting information from data frames.
Features:
- Covers extensive topics related to cleaning datasets and working with models
- Includes Python-based code samples and a separate chapter on Matplotlib and Seaborn
- Features companion files with source code, datasets, and figures from the book
Table of Contents:
1: Working with Data. 2: Outlier and Anomaly Detection. 3: Cleaning Data Sets. 4: Working with Models. 5: Matplotlib and Seaborn. Appendix: Working with awk. Index.
Download Managing Datasets and Models