Title: Validity, Reliability, and Significance: Empirical Methods for NLP and Data Science
Authors: Stefan Riezler, Michael Hagmann
Publisher: Morgan & Claypool
Year: 2022
Pages: 165
Language: English
Format: PDF (true)
Size: 10.2 MB
Empirical methods are means of answering the methodological questions of empirical sciences with statistical techniques. The methodological questions addressed in this book are the problems of validity, reliability, and significance. In Machine Learning (ML), these correspond, respectively, to the questions of whether a model predicts what it purports to predict, whether a model's performance is consistent across replications, and whether a performance difference between two models is due to chance. The goal of this book is to answer these questions with concrete statistical tests that can be applied to assess the validity, reliability, and significance of data annotation and ML prediction in the fields of natural language processing (NLP) and Data Science. The book can be used as a general introduction to empirical methods for machine learning, with a special focus on applications in NLP and Data Science. It is self-contained, with an appendix on the mathematical background of Generalized Additive Models (GAMs) and Linear Mixed Effects Models (LMEMs), and an accompanying webpage that includes R code to replicate the experiments presented in the book.