Author: Katharine Jarmul
Publisher: O’Reilly Media, Inc.
Year: 2023
Pages: 500
Language: English
Format: pdf, epub (true)
Size: 12.6 MB
Between major privacy regulations like the GDPR and CCPA and expensive and notorious data breaches, there has never been so much pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide will give you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, this book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems.
What Is Data Privacy?
In a simple sense, data privacy protects data and people by enabling and guaranteeing more privacy for data via access, use, processing, and storage controls. Usually this data is people-related, but the concept applies to all types of processing. This definition, however, doesn’t fully cover the world of data privacy.
Federated learning (FL) and distributed data science offer new ways to think about data analysis by keeping data at the edge: on phones, laptops, and edge services, or even on on-premises or separate cloud architecture when working with partners. The data is not collected or copied to your own cloud or storage before you run analysis or machine learning. In this chapter, you’ll learn how this works in practice and how to determine when this approach is appropriate for a given use case. You’ll also evaluate how to offer privacy via other tools, along with which types of data or engineering problems federated approaches can solve and which are a poor fit.
In data science, you are almost always using distributed data. Every time you start up a Kubernetes or Hadoop cluster or use a multi-cloud setup for data analysis, your data is de facto distributed. Because this is becoming the norm, distributed data analysis is increasingly built into the tools and systems you use as a data professional. What I am referring to in this chapter, however, is taking distributed data and moving it further away from your core processing. What if, instead of distributing data across your own data centers, clouds, or clusters, you kept data where it originated and ran your analysis across hundreds, thousands, or even millions of smaller, distributed datasets?
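To make the idea of analysis across many small datasets concrete, here is a minimal sketch of a federated mean, assuming each device can run a small function locally and report only an aggregate (a sum and a count) rather than its raw records. The names `LocalSummary`, `local_summary`, and `federated_mean` are hypothetical and not taken from the book or any library. Note that raw sums can still leak information about individuals; real deployments typically add protections such as secure aggregation or differential privacy on top.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSummary:
    # Hypothetical per-device report: only aggregates, never raw values.
    total: float
    count: int


def local_summary(values: List[float]) -> LocalSummary:
    # Runs on the device where the data originated.
    return LocalSummary(total=sum(values), count=len(values))


def federated_mean(summaries: List[LocalSummary]) -> float:
    # Runs on the coordinating server; it only ever sees the summaries.
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    if count == 0:
        raise ValueError("no data reported by any client")
    return total / count


# Three simulated devices, each holding its own small dataset.
device_data = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
summaries = [local_summary(d) for d in device_data]
print(federated_mean(summaries))
```

The global mean equals the mean over all pooled values, yet the coordinator never receives an individual record, only one sum and one count per device.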
Cryptographic protocols are used to encrypt, transmit, compute on, and decrypt information. A protocol is an agreed-upon plan for exchanging information, usually between multiple computers or parties, in order to communicate or compute together. When you browse the internet, you are using several encryption and networking protocols at once, including TLS, DNS, and HTTPS!
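As an illustration of parties computing together without revealing their inputs, the sketch below uses additive secret sharing, one simple building block of encrypted computation. It is an illustrative choice, not a protocol taken from the book. Each party splits its private value into random shares that sum to the value modulo a large number; each party then adds up the shares it received, and only the combined total is reconstructed. The modulus and the function names `share` and `reconstruct` are assumptions made for this example.

```python
import secrets
from typing import List

# Assumed modulus: the Mersenne prime 2**61 - 1, large enough for the inputs below.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> List[int]:
    # n-1 uniformly random shares; the last one makes the sum equal `value`.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    return sum(shares) % MODULUS


# Three parties, each with a private salary they do not want to disclose.
salaries = [52_000, 61_000, 58_000]
n = len(salaries)
all_shares = [share(s, n) for s in salaries]

# Party i receives the i-th share of every input and adds them up locally.
# Any single share is a uniformly random number and reveals nothing on its own.
partial_sums = [sum(all_shares[j][i] for j in range(n)) % MODULUS for i in range(n)]

# Combining the partial sums reveals only the total, not individual salaries.
total = reconstruct(partial_sums)
print(total)
```

This sketch assumes honest parties and secure channels between them; protocols used in practice add authentication and defenses against participants who deviate from the plan.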
Practical Data Privacy answers important questions such as:
What do privacy regulations like GDPR and CCPA mean for my data workflows and data science use cases?
What does "anonymized data" really mean? How do I actually anonymize data?
How do federated learning and analysis work?
Homomorphic encryption sounds great, but is it ready for use?
How do I compare and choose the best privacy-preserving technologies and methods? Are there open-source libraries that can help?
How do I ensure that my data science projects are secure by default and private by design?
How do I work with governance and infosec teams to implement internal policies appropriately?