Site Reliability Engineering. How Google Runs Production Systems

Posted by: SCART56 on 11-02-2018, 22:59, Comments: 0

Category: Books » Web Development


Title: Site Reliability Engineering. How Google Runs Production Systems
Authors: Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy
Publisher: O’Reilly Media
Year: 2016
Pages: 550
ISBN: 149192912X, 9781491929124
Format: PDF
Size: 10 MB
Language: English

Building and operating distributed systems is fundamental to large-scale production infrastructure, but doing so in a scalable, reliable, and efficient way requires good design as well as trial and error. In this collection of essays and articles, key members of the Site Reliability Engineering team at Google explain how the company has successfully navigated these deep waters over the past decade.

You’ll learn how Google continuously monitors and deploys some of the largest software systems in the world, how its Site Reliability Engineering team learns and improves after outages, and how the team balances risk-taking against reliability with error budgets.

Introduction
The Production Environment at Google, from the Viewpoint of an SRE
Embracing Risk
Service Level Objectives
Eliminating Toil
Monitoring Distributed Systems
The Evolution of Automation at Google
Release Engineering
Simplicity
Practical Alerting from Time-Series Data
Being On-Call
Effective Troubleshooting
Emergency Response
Managing Incidents
Postmortem Culture: Learning from Failure
Tracking Outages
Testing for Reliability
Software Engineering in SRE
Load Balancing at the Frontend
Load Balancing in the Datacenter
Handling Overload
Addressing Cascading Failures
Managing Critical State: Distributed Consensus for Reliability
Distributed Periodic Scheduling with Cron
Data Processing Pipelines
Data Integrity: What You Read Is What You Wrote
Reliable Product Launches at Scale
Accelerating SREs to On-Call and Beyond
Dealing with Interrupts
Embedding an SRE to Recover from Operational Overload
Communication and Collaboration in SRE
The Evolving SRE Engagement Model
Lessons Learned from Other Industries
Conclusion


