Author: Andrew Treadway
Publisher: Manning Publications
Year: 2023
Pages: 213
Language: English
Format: PDF (true)
Size: 12.2 MB
These easy-to-learn, easy-to-apply software engineering techniques will radically improve collaboration, scaling, and deployment in your Data Science projects.
In Software Engineering for Data Scientists you’ll learn to improve performance and efficiency by:
Using source control
Handling exceptions and errors in your code
Improving the design of your tools and applications
Scaling code to handle large data efficiently
Testing model and data processing code before deployment
Scheduling a model to run automatically
Packaging Python code into reusable libraries
Generating automated reports for monitoring a model in production
Software Engineering for Data Scientists presents important software engineering principles that will radically improve the performance and efficiency of Data Science projects. Author and Meta data scientist Andrew Treadway has spent over a decade guiding models and pipelines to production. This practical handbook is full of his sage advice that will change the way you structure your code, monitor model performance, and work effectively with software engineering teams.
Jupyter Notebook is a popular tool for data scientists because it combines coding with the ability to visualize the results of that code, such as plots or tables, all in one seamless environment. You can commit Jupyter Notebook files just like most other files, but merge conflicts are harder to handle when two users modify the same notebook. This is because Jupyter Notebook files are more complex than simple Python or R files, which are not much different from plain text files. Jupyter Notebook files are composed of HTML, markdown, source code, and potentially images, all embedded inside JSON. As a result, a line-based Git diff of two notebooks mixes the code changes with changes to outputs and metadata, which makes the real differences hard to spot. However, there are a few alternatives that make it easier to identify the differences between two Jupyter Notebook files. One alternative is a Python package, which we’ll dive into next.
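To make the point about JSON concrete, here is a minimal sketch using only the Python standard library. It is not the package the excerpt refers to. The notebook dictionaries are a trimmed-down approximation of the notebook JSON layout, and the helpers `make_notebook`, `code_sources`, and `changed_lines` are hypothetical names introduced for illustration. The sketch compares two versions of a one-cell notebook twice: once as raw JSON text, roughly what a line-based diff sees, and once on the code cells alone.

```python
import difflib
import json


def make_notebook(code, output_text, execution_count):
    """Build a minimal notebook-like dict (simplified; real files carry more metadata)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [output_text]}
                ],
                "source": [code],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def code_sources(notebook):
    """Return the source lines of every code cell, ignoring outputs and counters."""
    lines = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            lines.extend("".join(cell["source"]).splitlines())
    return lines


def changed_lines(old_lines, new_lines):
    """Return the added/removed lines of a unified diff, without the file headers."""
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Two versions of the same notebook: one edited line of code, then re-run.
old = make_notebook("print(sum([1, 2]))", "3\n", 1)
new = make_notebook("print(sum([1, 2, 3]))", "6\n", 7)

# Diff of the serialized JSON: code, output, and execution counter all show up.
raw_diff = changed_lines(
    json.dumps(old, indent=1, sort_keys=True).splitlines(),
    json.dumps(new, indent=1, sort_keys=True).splitlines(),
)

# Diff of the code cells only: just the edit the author actually made.
source_diff = changed_lines(code_sources(old), code_sources(new))

print(len(raw_diff), "changed lines in the raw JSON")
print(len(source_diff), "changed lines in the code alone")
for line in source_diff:
    print(line)
```

Even in this tiny case, the raw JSON diff reports the stored output and the execution counter alongside the one-line code edit. Real notebooks with plots embedded as encoded images amplify the effect, which is why notebook-aware diff tools are useful.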
about the technology
Many basic software engineering skills apply directly to Data Science! As a data scientist, learning the right software engineering techniques can save you a world of time and frustration. Source control simplifies sharing, tracking, and backing up code. Testing helps reduce future errors in your models or pipelines. Exception handling automatically responds to unexpected events as they crop up. Using established engineering conventions makes it easy to collaborate with software developers. This book teaches you to handle these situations and more in your Data Science projects.
Download Software Engineering for Data Scientists (MEAP v2)