Genomics in the Azure Cloud: Scaling Your Bioinformatics Workloads Using Enterprise-Grade Solutions

Posted by: literator on 14-11-2022, 19:42

Title: Genomics in the Azure Cloud: Scaling Your Bioinformatics Workloads Using Enterprise-Grade Solutions, First Edition
Author: Colby T. Ford
Publisher: O’Reilly Media, Inc.
Year: 2023
Pages: 378
Language: English
Format: epub (true), mobi
Size: 30.8 MB

This practical guide bridges the gap between general cloud computing architecture in Microsoft Azure and scientific computing for bioinformatics and genomics. You'll get a solid understanding of the architecture patterns and services that are offered in Azure and how they might be used in your bioinformatics practice. You'll get code examples that you can reuse for your specific needs. And you'll get plenty of concrete examples to illustrate how a given service is used in a bioinformatics context.

Who Should Read This Book:
This book is written for people with experience in bioinformatics and genomics but not necessarily cloud computing. This book can also be valuable for those who are cloud engineers looking to get a better understanding of how cloud architecture should work for bioinformatics.
In many university programs, bioinformatics students are exposed to technical concepts for analyzing genetic data. They may then also be taught how to use on-premises high-performance computing (HPC) environments (clusters) to run larger workloads. These skills will be valuable when transitioning to the cloud. So don’t worry: the skills you learned in grad school aren’t for naught.

Many online resources for cloud-enabled bioinformatics focus only on human genomics and, more specifically, how to use the Broad Institute’s set of tools (such as GATK, Picard, Cromwell, etc.) in the cloud. I’d like to generalize this book a bit more since, in my experience, people use bits and pieces of software from all over the place, combining them to make a patchwork of tools to fit their specific needs. Also, as an infectious disease guy, I have used lots of different tools in the past that are never mentioned by the human-focused online resources. So all that to say, this book will not be focused on any specific species or suite of tools. I want this book to be useful to scientists no matter if they work on humans or viruses or elephants or plants or whatever…

You'll also get valuable advice on how to:

• Use enterprise platform services to easily scale your bioinformatics workloads
• Organize, query, and analyze genomic data at scale
• Build a genomics data lake and accompanying data warehouse
• Use Azure Machine Learning to scale your model training, track model performance, and deploy winning models
• Orchestrate and automate processing pipelines using Azure Data Factory and Databricks
• Cloudify your organization's existing bioinformatics pipelines by moving your workflows to Azure high-performance compute services

After reading this book, you should have a solid understanding of the services that are offered in Azure and how they might be used in your bioinformatics practice. You’ll be familiar with some standard architecture patterns, including how to store and organize data and how to analyze it at scale. You’ll also understand how to cloudify your organization’s existing bioinformatics pipelines by converting the workflows to work in compute services in Azure.

Some technical things that may be helpful to know:

• Proficiency with Python (or R) is a good idea since this book covers bioinformatics programming and machine learning, which usually require one of these languages.
• Familiarity with database concepts, including some knowledge of SQL, may be useful when we talk about how to store and manage tons of -omics data.
• Knowledge of primary, secondary, and tertiary analysis pipelines will help you brainstorm what steps in your workflow will benefit most from being in the cloud.

Contents:
Preface
1. Essentials of Cloud Architecture
2. Organizing Genomics Data with Data Lakes
3. Querying Variant Data in SQL
4. Orchestrating Data Movement and Transformation
5. Azure Databricks (and Apache Spark)
6. Azure Machine Learning
7. High-Performance Computing and Other Compute Services
8. Deployment, Security, Compliance, and Potpourri
Conclusion
Index
