Authors: Satish Mahadevan Srinivasan, Phillip A. Laplante
Publisher: CRC Press
Year: 2023
Pages: 279
Language: English
Format: PDF (true)
Size: 26.4 MB
What Every Engineer Should Know About Data-Driven Analytics provides a comprehensive introduction to the theoretical concepts and approaches of Machine Learning that are used in predictive data analytics. By pairing theory with practical applications, the text is accessible to engineers in every discipline. It offers a detailed and focused treatment of the important Machine Learning (ML) approaches and concepts that can be used to build models that support decision making in different domains.
A brief introduction to the basics of R and Python programming is provided here, which will help readers navigate the other chapters of this book. Now let's look at some basics of Python programming. Readers are encouraged to run the provided scripts in a Python terminal or a Jupyter Notebook. In Chapter 1, the objective is merely to provide exposure to programming in R and Python so that readers become familiar with the syntax used in R and Python scripts. In later chapters, R and Python scripts are used interchangeably to illustrate the concepts discussed.
Several essential packages in R and Python that are helpful for data wrangling (structuring and cleaning data) and for analytics are listed below. In later chapters, several of these packages are used to demonstrate their potential for data wrangling and analytics.
Popular packages in Python for Data Wrangling and Analytics include:
Pandas - A popular open-source package that provides high-performance, easy-to-use data structures and data analysis tools. Pandas is well suited to data wrangling. It is designed for quick and easy data manipulation, reading, aggregation, and visualization.
NumPy - A general-purpose array-processing package. It provides high-performance multidimensional array objects and tools for working with those arrays. NumPy is an efficient container for generic multidimensional data.
SciPy - This package builds on the NumPy array object and is part of a stack that includes Matplotlib, Pandas, and SymPy, among other tools. The SciPy library contains modules for efficient mathematical routines such as linear algebra, interpolation, optimization, integration, and statistics.
Matplotlib - This library supports data visualization. Matplotlib is the plotting library for Python that provides an object-oriented API for embedding plots into applications.
Scikit-learn - This is a robust machine learning library. It features Machine Learning algorithms including SVMs, random forests, k-means clustering, dimensionality reduction, etc. The Scikit-learn package focuses only on modeling data and not on data manipulation.
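As a rough illustration of how two of the listed packages divide the work, the sketch below uses NumPy for array arithmetic and Pandas for labeled, tabular aggregation. It is not taken from the book: the sensor readings, machine labels, and variable names are all invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical temperature readings, invented for illustration only.
readings = np.array([20.5, 21.0, 19.8, 22.1, 20.9, 21.4])
machines = ["A", "A", "B", "B", "C", "C"]

# NumPy: vectorized arithmetic applied to the whole array at once.
centered = readings - readings.mean()

# Pandas: labeled tabular data with grouping and aggregation.
df = pd.DataFrame({"machine": machines, "temp_c": readings})
mean_by_machine = df.groupby("machine")["temp_c"].mean()

print(centered.round(2))
print(mean_by_machine)
```

Here the array handles the numeric work while the DataFrame attaches labels to the same values, which is the usual division of labor between the two packages.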
Structuring and cleaning structured data involves several steps, including checking whether the data is normally distributed, detecting and dealing with any data that has a bimodal distribution, and resolving issues related to outliers, missing values, skewed data, and duplicate records. This section gives a brief overview of how to detect and deal with these data irregularities. First, we'll discuss how to deal with a bimodal distribution and how to check the variables for normality. A data set has a bimodal distribution if its values are concentrated in two clusters. We use a simple example in R to visualize a variable with a bimodal distribution and show how to transform this variable so that it has a normal distribution.
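Some of the irregularities named above can be sketched in a few lines of Pandas. The example below is a minimal illustration in Python rather than the book's R example, and it covers only duplicate records, missing values, outliers, and skew. The data, the median fill, and the 1.5 * IQR outlier rule are assumptions chosen for this sketch, not the authors' method.

```python
import numpy as np
import pandas as pd

# Hypothetical records, invented for illustration only.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5, 6, 7],
    "value": [10.0, 12.0, 12.0, np.nan, 11.0, 13.0, 9.0, 250.0],
})

# 1. Duplicate records: drop rows that repeat an earlier row exactly.
deduped = df.drop_duplicates()

# 2. Missing values: fill with the column median (one simple choice among many).
median = deduped["value"].median()
filled = deduped.assign(value=deduped["value"].fillna(median))

# 3. Outliers: flag values outside 1.5 * IQR of the quartiles (assumed rule).
q1, q3 = filled["value"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (filled["value"] < q1 - 1.5 * iqr) | (filled["value"] > q3 + 1.5 * iqr)
cleaned = filled[~is_outlier]

# 4. Skew: compare sample skewness before and after removing outliers.
skew_before = filled["value"].skew()
skew_after = cleaned["value"].skew()

print(len(df), len(deduped), len(cleaned))
print(round(skew_before, 2), round(skew_after, 2))
```

Whether an extreme value should be removed, capped, or kept depends on the domain, so the filter in step 3 is only one of several reasonable treatments.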
Utilizes practical examples from different disciplines and sectors within engineering and other related technical areas to demonstrate how to go from data, to insight, and to decision making.
Introduces various approaches to building models that exploit different algorithms.
Discusses predictive models that can be built through machine learning and used to mine patterns from large datasets.
Explores the augmentation of technical and mathematical materials with explanatory worked examples.
Includes a glossary, self-assessments, and worked-out practice exercises.
Written to be accessible to non-experts in the subject, this comprehensive introductory text is suitable for students, professionals, and researchers in engineering and Data Science.
Table of Contents:
1. Data Collection and Cleaning. 2. Mathematical Background for Predictive Analytics. 3. Introduction to Statistics, Probability, and Information Theory for Analytics. 4. Introduction to Machine Learning. 5. Unsupervised Learning. 6. Supervised Learning. 7. Natural Language Processing for Analyzing Unstructured Data. 8. Predictive Analytics Using Deep Neural Networks. 9. Convolutional Neural Networks (CNN) for Predictive Analytics. 10. Recurrent Neural Networks (RNNs) for Predictive Analytics. 11. Recommender Systems for Predictive Analytics. 12. Architecting Big Data Analytical Pipeline.