Author: Martin Hugh Monkman
Publisher: CRC Press
Series: Data Science Series
Year: 2024
Pages: 236
Language: English
Format: PDF (true), EPUB
Size: 10.1 MB
The Data Preparation Journey: Finding Your Way With R introduces the principles of data preparation within a systematic approach that follows a typical data science or statistical workflow. With that context, readers will work through practical solutions to problems in data using the statistical and data science programming language R. These solutions include examples of complex real-world data, adding greater context and exposing the reader to greater technical challenges. This book focuses on the Import to Tidy to Transform steps. It demonstrates how “Visualise” is an important part of Exploratory Data Analysis, a strategy for identifying potential problems with the data prior to cleaning.
This book is designed for readers with a working knowledge of data manipulation functions in R or other programming languages. It is suitable for academics for whom analyzing data is crucial, businesses that make decisions based on insights gleaned from customer interaction data, and public servants who use data to inform policy and program decisions. The principles and practices described in The Data Preparation Journey apply regardless of the context.
Key Features:
Includes an R package containing the code and data sets used in the book
Comprehensive examples of data preparation from a variety of disciplines
Defines the key principles of data preparation, from access to publication
It is assumed that the reader of this book will have a working knowledge of the fundamental data manipulation functions in R (whether base or tidyverse or packages beyond those) or another programming language that supports that work. If you can filter for specific values in the variables and select the columns you want, know the difference between a character string and a numeric value ("1" or 1), and can create a new variable as the result of a manipulation of others, then we’re on our way.
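As a rough self-check of the skills listed above, here is a minimal sketch in base R. The `survey` data frame and its column names are hypothetical, invented for illustration; they are not taken from the book or its accompanying package.

```r
# Hypothetical example data (an assumption for illustration only).
survey <- data.frame(
  id     = c("1", "2", "3", "4"),            # stored as character strings
  region = c("North", "South", "North", "West"),
  visits = c(3, 5, 2, 8),
  cost   = c(30, 45, 28, 96),
  stringsAsFactors = FALSE
)

# Filter for a specific value and select the columns we want.
north <- survey[survey$region == "North", c("id", "visits", "cost")]

# "1" is a character string and 1 is a numeric value;
# convert the id column so it can be treated as a number.
north$id <- as.integer(north$id)

# Create a new variable as the result of a manipulation of others.
north$cost_per_visit <- north$cost / north$visits

print(north)
```

The same steps could be written with tidyverse functions (for example `filter()`, `select()`, and `mutate()` from dplyr); base R is used here only to keep the sketch free of package dependencies.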
This book leans heavily on R Markdown, particularly when describing documentation, and on the packages of the tidyverse. Familiarity with both will be very helpful.
The first three chapters of this book provide some foundations, elements of the data preparation process that will help guide our thinking and our work, including data documentation (or recordkeeping).
Chapters 4 through 10 cover importing data from a variety of sources that are commonly encountered, including plain-text, Excel, statistical software formats, PDF files, internet sources, and databases.
Chapters 11 and 12 tackle finding problems in our data, and then dealing with those problems. Finally, Chapter 13 presents a short summary and poses the question, “Where to from here?”