Author: Anthony Sarkis
Publisher: O’Reilly Media, Inc.
Date: 2023-01-26
Pages: 204
Language: English
Format: epub
Size: 10.1 MB
Your training data has as much to do with the success of your data project as the algorithms themselves; most failures in deep learning systems relate to training data. But while training data is the foundation for successful machine learning, there are few comprehensive resources to help you ace the process. This hands-on guide explains how to work with and scale training data.
What is Training Data? Training Data is the control of a Supervised System. Training Data controls the system by defining the ground truth goals for the creation of Machine Learning models. This involves technical representations, human decisions, processes, tooling, system design, and a variety of new concepts specific to Training Data. In a sense, a Training Data mindset is a paradigm from which a growing body of theory, research, and standards is emerging. A Machine Learning (ML) model is the artifact created as the end result of an ML training process.
Training Data is not an algorithm, nor is it tied to a specific Machine Learning approach. Rather, it’s the definition of what we want to achieve. A fundamental challenge is identifying the desired human meaning and mapping it into a machine-readable form. The effectiveness of training data depends primarily on how well it captures the human-defined meaning and how faithfully it represents real model usage. In practice, choices around Training Data have a huge impact on how effectively a model can be trained.
Let’s jump to code for a moment to think about this. Imagine I can create a new dataset object in Python:
my_dataset = Dataset("Example")
This is an empty set. There are no raw data elements.
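The excerpt does not define `Dataset`, so the following is only a minimal sketch of what such an object might look like. It is a hypothetical stand-in, not the book's API. The names `Dataset`, `schema`, `add_label`, `add_element`, and `encoded` are assumptions made for illustration. The sketch shows the two ideas from the passage above: a dataset starts empty, and human-defined label names are mapped to machine-readable ids.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical minimal stand-in for the Dataset object in the excerpt."""
    name: str
    # Hypothetical label schema: human-readable label name -> integer id.
    schema: dict[str, int] = field(default_factory=dict)
    # Each element pairs a raw data reference (e.g. a file path) with its label names.
    elements: list[tuple[str, list[str]]] = field(default_factory=list)

    def add_label(self, name: str) -> int:
        """Register a label name in the schema (if new) and return its id."""
        if name not in self.schema:
            self.schema[name] = len(self.schema)
        return self.schema[name]

    def add_element(self, raw_ref: str, labels: list[str]) -> None:
        """Add a raw data element; labels must already exist in the schema."""
        unknown = [label for label in labels if label not in self.schema]
        if unknown:
            raise ValueError(f"labels not in schema: {unknown}")
        self.elements.append((raw_ref, list(labels)))

    def encoded(self) -> list[tuple[str, list[int]]]:
        """Machine-readable view: each label name replaced by its schema id."""
        return [(ref, [self.schema[label] for label in labels])
                for ref, labels in self.elements]

    def __len__(self) -> int:
        return len(self.elements)


my_dataset = Dataset("Example")
print(len(my_dataset))  # an empty set: no raw data elements yet
```

In this sketch, calling `my_dataset.add_label("cat")` and then `my_dataset.add_element("img_001.jpg", ["cat"])` would make `encoded()` return `[("img_001.jpg", [0])]`. Rejecting labels that are not in the schema is one assumed design choice. It keeps the machine-readable ids tied to an explicit, human-defined set of meanings.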
You'll gain a solid understanding of the concepts, tools, and processes needed to:
Design, deploy, and ship training data for production-grade deep learning applications
Integrate with a growing ecosystem of tools
Recognize and correct new training data-based failure modes
Improve existing system performance and avoid development risks
Confidently use automation and acceleration approaches to more effectively create training data
Avoid data loss by structuring metadata around created datasets
Clearly explain training data concepts to subject matter experts and other stakeholders
Successfully maintain, operate, and improve your system
Download Training Data for Machine Learning (Seventh release)