Author: Thomas W. MacFarland
Publisher: Springer
Year: 2024
Pages: 536
Language: English
Format: PDF (true)
Size: 21.5 MB
Introduction to Data Science in Biostatistics: Using R, the Tidyverse Ecosystem, and APIs defines and explores the term "Data Science" and discusses the many professional skills and competencies associated with the field. Because Data Science is a leading indicator of interest in STEM fields, the text also examines the ongoing growth in demand in these areas. Its goal is to give readers who are entering the professional world a foundational knowledge of required skills, job trends, and salary expectations.
The text provides a historical overview of computing and the field's progression to R as it exists today, including the multitude of packages and functions associated with both Base R and the tidyverse ecosystem. Readers will learn how to use R to work with real data, as well as how to communicate results to external stakeholders. A distinguishing feature of this text is its emphasis on the emerging use of APIs to obtain data.
From the development of S in the mid-1970s to its reimagining as R approximately 20 years later, the language has remained a leading choice in biostatistics. By the mid-2000s, R's ease of use and functionality expanded greatly with the first implementation of the tidyverse ecosystem.
Throughout its evolution, R has remained open-source software that is freely available to all. Among its many uses, R supports data acquisition from remote hosts using Application Programming Interface (API) clients; data management and organization using tidyverse tools such as the dplyr and tidyr packages; and high-quality graphics and maps using the ubiquitous tidyverse ggplot2 package and complementary ggplot2-compatible packages. A host of other R-based tools for statistical analysis also work well with APIs and the pervasive tidyverse ecosystem. This text argues that R should always be among the first choices in any list of software that supports biostatistics.
This text was developed to help beginning students and early-stage researchers make sense of how software can be used in biostatistics, taking a broad view of biostatistics and of the many disciplines associated with it. To meet this challenge, R was selected as the most appropriate programming language, drawing on Base R (i.e., the many functions available when R is first downloaded) and supporting packages (i.e., the thousands of auxiliary R software collections that provide functionality far beyond Base R, especially packages associated with the tidyverse ecosystem).
This text focuses on the use of R, and specifically of APIs and R's evolving tidyverse ecosystem, for work in biostatistics. By following its gradual introduction to R, APIs, and the tidyverse ecosystem, beginning students and early-stage researchers should steadily build their skill in using R syntax for inquiries into biostatistics.
Base R still has many useful and appropriate applications for data scientists who use R, and it would be difficult to avoid Base R entirely; Base R is demonstrated throughout this text. Yet the tidyverse ecosystem has gained such wide acceptance that knowledge and mastery of its many tools is essential for contemporary data scientists. This text introduces the tidyverse ecosystem, and careful study of it should help readers reach that goal.