Authors: M. Niranjanamurthy, Kavita Sheoran, Geetika Dhand
Publisher: Wiley-Scrivener
Year: 2023
Pages: 357
Language: English
Format: PDF (true)
Size: 130.3 MB
Written and edited by some of the world’s top experts in the field, this volume presents state-of-the-art research and the latest technological breakthroughs in data wrangling: its theoretical concepts, practical applications, and tools for solving everyday problems. Data wrangling is the process of cleaning and unifying messy, complex data sets for easy access and analysis. It typically involves manually converting and mapping data from one raw form into another format so that the data is more convenient to consume and organize. Data wrangling is increasingly ubiquitous at today’s top firms. Data cleaning focuses on removing inaccurate data from a data set, whereas data wrangling focuses on transforming the data’s format, typically by converting “raw” data into a format more suitable for use. Data wrangling is a necessary component of any business. Data wrangling solutions, including applications such as Datameer, Infogix, Paxata, Talend, Tamr, TMMData, and Trifacta, are specifically designed and architected to handle diverse, complex data at any scale. This book synthesizes the processes of data wrangling into a comprehensive overview, with a strong focus on recent and rapidly evolving agile analytic processes in data-driven enterprises, which businesses and other organizations can use to solve their everyday problems. Whether for the veteran engineer, scientist, or other industry professional, this book is a must-have for any library.
Knowing Python has become one of the most basic and crucial skills for entering the fields of data science, machine learning, and general software development. At the same time, it is often compared with other languages such as R, MATLAB, and SAS. Of late, Python has become an obvious choice because of its widely used libraries, such as Pandas and Scikit-learn. Python is also used for building data applications, since it is widely accepted for software engineering.
Pandas: The Pandas name is derived from panel data, a term for multidimensional structured datasets, and is also a play on the phrase “Python data analysis.” Libraries like Pandas provide high-level data structures and functions that make working with structured data efficient and expressive. They have enabled a powerful and efficient data analysis environment in Python.
The most commonly used object in Pandas is the DataFrame. A DataFrame is a tabular, column-oriented data structure with both row and column labels. The Series is a 1-D labeled array object. Pandas blends ideas from spreadsheets and relational databases (such as SQL) with the high-performance array-computing ideas of NumPy. It also provides indexing functionality that makes it easy to reshape, slice and dice, aggregate, and select subsets of data. Since data manipulation, preparation, and cleaning are such important skills in data analysis, knowing Pandas is essential.
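The operations described above can be illustrated with a minimal sketch. The data set, its column names, and its values are hypothetical and chosen only for illustration; the example assumes the `pandas` package is installed.

```python
import pandas as pd

# Hypothetical sales records; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "quarter": ["Q1", "Q1", "Q2", "Q2"],
        "revenue": [120.0, 95.0, 130.0, 110.0],
    }
)

# A single column of a DataFrame is a Series: a 1-D labeled array.
revenue = df["revenue"]

# Select a subset of rows with a boolean mask.
north = df[df["region"] == "north"]

# Aggregate: total revenue per region.
totals = df.groupby("region")["revenue"].sum()

# Reshape: one row per region, one column per quarter.
wide = df.pivot(index="region", columns="quarter", values="revenue")

print(totals)
print(wide)
```

Here `df` carries both row labels (the default integer index) and column labels, while `totals` and `wide` are indexed by region, so individual values can be looked up by label, for example `wide.loc["south", "Q2"]`.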
R is an extremely flexible statistical programming language and environment that is open source and freely available for almost all operating systems. R has recently experienced an “explosive growth in use and in user contributed software.” It has a large user base and offers up-to-date statistical methods for analysis. The flexibility of R is unmatched by any other statistical programming language: its object-oriented design lets users implement customized procedures by writing functions that automate the most commonly performed tasks.
Download: Data Wrangling: Concepts, Applications and Tools