Bash for Data Scientists

Posted by: literator on 12-12-2022, 11:25


Title: Bash for Data Scientists
Author: Oswald Campesato
Publisher: Mercury Learning and Information
Year: 2023
Pages: 293
Language: English
Format: pdf (true)
Size: 10.2 MB

The goal of this book is to introduce readers to an assortment of powerful command line utilities that can be combined to create simple yet powerful shell scripts for processing datasets. The code samples and scripts use the bash shell, and typically involve small datasets so that you can focus on understanding the features of grep, sed, and awk. Aimed at readers relatively new to working in a bash environment, the book is also comprehensive enough to serve as a reference and to teach a few new tricks to those who already have some experience creating shell scripts.

This short book contains a variety of code fragments and shell scripts for data scientists, data analysts, and other people who want shell-based solutions to “clean” various types of datasets. In addition, the concepts and code samples in this book are useful for people who want to simplify routine tasks. This book takes introductory concepts and commands in Bash, and then demonstrates their use in simple yet powerful shell scripts.

This book is intended for general users, data scientists, data analysts, and other people who perform a variety of tasks from the command line, and who also have a modest knowledge of shell programming. You will acquire an understanding of how to use various bash commands, often as part of short shell scripts in later chapters. The chapters also contain simple use cases that illustrate how to perform various tasks involving datasets, such as switching the order of a two-column dataset (Chapter 1), removing control characters in a text file (Chapter 2), finding specific lines and merging them (Chapter 3), reformatting a date field in a dataset (Chapter 5), and removing nested quotes (Chapter 6). This book saves you the time required to search for relevant code samples and adapt them to your specific needs, which is a potentially time-consuming process.
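Three of the use cases named above can be sketched with standard utilities. This is a minimal illustration and is not taken from the book: the function names, the sample data, the comma delimiter, and the MM/DD/YYYY date layout are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers and sample data; none of this comes from the book.

# 1. Switch the order of a two-column, comma-separated dataset.
swap_columns() {
  awk -F',' 'BEGIN { OFS = "," } { print $2, $1 }'
}

# 2. Remove control characters, keeping tabs and newlines.
strip_control_chars() {
  tr -d '\000-\010\013-\037\177'
}

# 3. Reformat a date field from MM/DD/YYYY to YYYY-MM-DD (assumed layout).
reformat_dates() {
  sed -E 's#([0-9]{2})/([0-9]{2})/([0-9]{4})#\3-\1-\2#g'
}

printf 'alice,30\nbob,25\n' | swap_columns
printf 'clean\001 text\r\n' | strip_control_chars
printf 'order1,12/31/2022\n' | reformat_dates
```

Each helper reads standard input and writes standard output, so they can be chained in a pipeline or applied to a file with input redirection.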

Chapter 7 shows you how to clean data, which includes handling missing data, incorrect data, and duplicate data. The first part of this chapter contains several code samples that use Pandas to read CSV files and then calculate statistical values such as the mean, median, mode, and standard deviation. The second part of this chapter uses Pandas to handle missing values in CSV files, starting with CSV files that contain a single column, followed by two-column CSV files. After you have completed this chapter, you will be ready to learn how to “split” CSV files into subregions that are then processed via classification algorithms, such as kNN, decision trees, and random forests.
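The chapter does this work with Pandas. As a shell-only analogue of the single-column case, the following awk sketch computes a column mean and uses it to fill missing values. It is not the book's code: the function names are hypothetical, and it assumes a one-column CSV with a header row in which a missing value appears as an empty line.

```shell
#!/usr/bin/env bash
# Not from the book, which uses Pandas for this step.
# Assumptions: a single-column CSV with a header row, where a missing
# value is an empty line. Function names are hypothetical.

# Print the mean of the non-empty values below the header.
column_mean() {
  awk 'NR > 1 && $0 != "" { sum += $0; n++ }
       END { if (n > 0) print sum / n }' "$1"
}

# Replace each empty value below the header with the supplied fill value.
fill_missing() {
  awk -v fill="$2" 'NR > 1 && $0 == "" { print fill; next } { print }' "$1"
}

data=$(mktemp)
printf 'weight\n10\n\n20\n' > "$data"
mean=$(column_mean "$data")
fill_missing "$data" "$mean"
rm -f "$data"
```

Filling with the mean is only one possible policy; passing a different second argument to `fill_missing` (a median or a fixed constant, for example) changes the strategy without touching the script.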

This chapter contains a mixture of Python-based code samples and an awk-based shell script. All the code samples are straightforward, and if you can follow the Pandas and awk-based code samples in Chapter 1, then you will most likely be able to understand the code samples in this chapter. This chapter requires basic knowledge of Python and Pandas, such as creating Pandas data frames, as well as reading and writing CSV files. Knowledge of the awk programming language is required for three shell scripts that invoke the awk command if you decide to read those code samples.

Companion files with code are available for downloading from the publisher.

Features:
- Provides the reader with powerful command line utilities that can be combined to create simple yet powerful shell scripts for processing datasets
- Contains a variety of code fragments and shell scripts for data scientists, data analysts, and those who want shell-based solutions to “clean” various types of datasets
- Companion files with code available for downloading with Amazon proof of purchase by writing to the publisher.

Table of Contents:
1: Introduction to UNIX.
2: Files and Directories.
3: Useful Commands.
4: Conditional Logic and Loops.
5: Processing Datasets with grep and sed.
6: Processing Datasets with awk.
7: Processing Datasets (Pandas).
8: NoSQL, SQLite, and Python.
Index.
