Author: Catherine Nelson
Publisher: O’Reilly Media, Inc.
Year: 2024
Pages: 258
Language: English
Format: PDF (true), EPUB (true)
Size: 11.7 MB
Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success, and it is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering and clearly explains how to apply the best practices from software engineering to data science.
All the code examples in this book are written in Python, and many of the chapters describe Python-specific tools. In recent years, Python has become the most popular programming language for data science. Python has an extremely solid set of open source libraries for data science, with good backing and a healthy community of maintainers. Large trend-setting companies have chosen Python for their main ML frameworks, including TensorFlow (Google) and PyTorch (Meta). Because of this, Python appears to be especially popular among data scientists working on production machine learning code, where good coding skills are particularly important. In my experience, the Python community has been friendly and welcoming, with many excellent events that have helped me improve my skills. It’s my preferred programming language, so it was an easy choice for this book.
If you want to write better data science code, this guide covers the essential topics that are often missing from introductory data science or coding classes, including how to:
• Understand data structures and object-oriented programming
• Clearly and skillfully document your code
• Package and share your code
• Integrate data science code with a larger code base
• Write APIs
• Create secure code
• Apply best practices to common tasks such as testing, error handling, and logging
• Work more effectively with software engineers
• Write more efficient, maintainable, and robust code in Python
• Put your data science projects into production
• And more
Who Is This Book For?
This book is aimed at data scientists, but people in closely related roles, such as data analysts, machine learning (ML) engineers, and data engineers, will also find it useful. I’ll explain well-established software engineering principles that will be useful to anyone who writes code, but the examples I’ll use to illustrate these principles will be most familiar to data scientists.
I’ve aimed to make this book accessible to data scientists who are relatively new to the field. Maybe you’ve just finished a degree in data science or you’re starting your first job in industry. This book will cover the practical software engineering skills that are not always included in introductory data science courses. Or maybe you didn’t take a formal data science course. Maybe you’re self-taught or you’re moving into data science from math or another science. No matter which route you’re taking into data science, this book is for you.
More experienced data scientists will also learn a great deal, and you’ll find this book especially useful if you’re in a job where you’ll often interact with software developers. You’ll learn the skills that will help you work effectively on a larger codebase and how to write Python code that will work efficiently in production.
I’m assuming that you already know the fundamentals of data science, including data exploration, data visualization, data wrangling, basic ML, and the math skills that go along with these. I’m also assuming that you already know the basics of how to code in Python: how to write functions and control flow statements, and the basics of how to use modules including NumPy, Matplotlib, pandas, and scikit-learn.
Download: Software Engineering for Data Scientists: From Notebooks to Scalable Systems (Final)