Author: Daniel Sorensen
Publisher: Springer
Year: 2023
Pages: 696
Language: English
Format: PDF (true)
Size: 16.7 MB
This book provides an introduction to computer-based methods for the analysis of genomic data. Breakthroughs in molecular and computational biology have contributed to the emergence of vast data sets, where millions of genetic markers for each individual are coupled with medical records, generating an unparalleled resource for linking human genetic variation to human biology and disease. Similar developments have taken place in animal and plant breeding, where genetic marker information is combined with production traits. An important task for the statistical geneticist is to adapt, construct and implement models that can extract information from these large-scale data. An initial step is to understand the methodology that underlies the probability models and to learn the modern computer-intensive methods required for fitting these models. The objective of this book, suitable for readers who wish to develop analytic skills to perform genomic research, is to provide guidance to take this first step.
This book is addressed to numerate biologists who typically lack the formal mathematical background of the professional statistician. For this reason, considerably more detail in explanations and derivations is offered. It is written in a concise style and examples are used profusely. A large proportion of the examples involve programming with the open-source package R. Most of today’s students are competent in R and there are many tutorials online for the uninitiated. The R code needed to solve the exercises is provided in all cases and is written, with few exceptions, with the objective of being transparent rather than efficient. The reader has the opportunity to run the code and to modify input parameters in an experimental fashion. This hands-on computing contributes to a better understanding of the underlying theory. The Markdown interface allows students to implement the code on their own computers.
Part I presents methods of inference based on likelihood and Bayesian methods, including computational techniques for fitting likelihood and Bayesian models. Part II discusses prediction for continuous and binary data using both frequentist and Bayesian approaches. Some of the models used for prediction are also used for gene discovery. The challenge is to find promising genes without incurring a large proportion of false positive results. Therefore, Part II includes a detour on the False Discovery Rate from both frequentist and Bayesian perspectives. The last chapter of Part II provides an overview of a selected number of non-parametric methods. Part III consists of exercises and their solutions.
The majority of the datasets used in the book are simulated and are intended to illustrate important features of real-life data. The size of the simulated data is kept within the limits necessary to obtain solutions in reasonable CPU time, using straightforward R code, although the reader may modify the size by changing input parameters. Advanced computational techniques required for the analysis of very large datasets are not addressed. This subject requires a specialised treatment beyond the scope of this book.
Daniel Sorensen holds PhD and DSc degrees from the University of Edinburgh and is an elected Fellow of the American Statistical Association. He was professor of Statistical Genetics at Aarhus University where, at present, he is professor emeritus.
Download Statistical Learning in Genetics: An Introduction Using R