Author: Katharine Jarmul
Publisher: O’Reilly Media, Inc.
Publication date: 2023-03-02
Pages: 384
Language: English
Format: EPUB
Size: 10.6 MB
Between major privacy regulations like the GDPR and CCPA and costly, high-profile data breaches, there has never been so much pressure on data scientists to ensure data privacy. Unfortunately, integrating privacy into your data science workflow is still complicated. This essential guide gives you solid advice and best practices on breakthrough privacy-enhancing technologies such as encrypted learning and differential privacy, as well as a look at emerging technologies and techniques in the field.
Federated Learning (FL) and distributed Data Science provide new ways to think about how you do data analysis by keeping data at the edge: on phones, laptops, and edge services, or even on on-premises or separate cloud architecture when working with partners. The data is not collected or copied to your own cloud or storage before you do analysis or Machine Learning. In this chapter, you’ll learn how this works in practice and determine when this approach is appropriate for a given use case. You’ll also evaluate how to offer privacy via other tools, along with what types of data or engineering problems federated approaches can solve and which are a poor fit.
In Data Science, you are almost always using distributed data. Every time you start up a Kubernetes or Hadoop cluster or use a multi-cloud setup for data analysis, your data is de facto distributed. Because this is becoming “the norm”, distributed data analysis is increasingly built into the tools and systems you use as a data professional. But what I am referring to in this chapter is taking distributed data and moving it further away from your core processing. What if, instead of distributing data across your own data centers, clouds, or clusters, you actually kept data where it originated and ran your analysis across hundreds, thousands, or even millions of smaller, distributed datasets?
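A minimal sketch of the idea of running an analysis across many small datasets that stay where they originated. It is an illustration only, not code from the book: the names `LocalSummary`, `summarize_locally`, and `federated_mean` are hypothetical, the "devices" are simulated as in-process lists, and the analysis is a simple mean. Each device reports only a sum and a count, never its raw records.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LocalSummary:
    """Aggregate that leaves the device; raw values never do."""
    total: float
    count: int


def summarize_locally(values: Sequence[float]) -> LocalSummary:
    # Runs on the device that holds the data (hypothetical helper).
    return LocalSummary(total=float(sum(values)), count=len(values))


def federated_mean(summaries: List[LocalSummary]) -> float:
    # Runs on the coordinator, which only ever sees per-device summaries.
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    if count == 0:
        raise ValueError("no data points reported")
    return total / count


# Simulated edge datasets (made-up numbers, for illustration only).
device_data = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
summaries = [summarize_locally(d) for d in device_data]
global_mean = federated_mean(summaries)
```

Note that a summary from a device holding a single value still reveals that value, so keeping data at the edge does not provide privacy by itself; this is one reason federated approaches are usually evaluated alongside other privacy tools.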
Cryptographic protocols are used to encrypt, transmit, compute, and decrypt information. A protocol is an agreed-upon procedure for exchanging information, usually between multiple computers or parties, in order to communicate or compute together. When you browse the internet, you are using several encryption and networking protocols at once, including TLS, DNS, and HTTPS!
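To make "compute together" concrete, here is a toy sketch of one such protocol, additive secret sharing, in which several parties learn the sum of their private values without revealing any individual value. This technique is chosen here for illustration and is not named in the text above; `MODULUS`, `make_shares`, and `secure_sum` are hypothetical names, the parties are simulated in a single process rather than over a network, and the code is not suitable for production use.

```python
import secrets
from typing import List

# Assumed modulus for this toy example; all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def make_shares(secret: int, n_parties: int) -> List[int]:
    """Split `secret` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: List[int]) -> int:
    """Simulate each party sharing its value, then combining partial sums."""
    n = len(private_values)
    # Row i holds the shares party i sends out, one share per party.
    sent = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received (column j) and publishes the result.
    partial_sums = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The published partial sums reveal only the overall total.
    return sum(partial_sums) % MODULUS


private_values = [12, 30, 7]  # made-up inputs, one per party
total = secure_sum(private_values)
```

Each individual share is a uniformly random number, so a party that sees only some of another party's shares learns nothing about that party's value; only the combination of all partial sums yields the total.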
Practical Data Privacy answers important questions such as:
What do privacy regulations like GDPR and CCPA mean for my project?
What does "anonymized data" really mean?
Should I anonymize the data? If so, how?
Which privacy techniques fit my project and how do I incorporate them?
What are the differences and similarities between privacy-preserving technologies and methods?
How do I utilize an open-source library for a privacy-enhancing technique?
How do I ensure that my projects are secure by default and private by design?
How do I create a plan for internal policies or a specific data project that incorporates privacy and security from the start?
Contents:
Preface
1. Data Governance and Simple Privacy Approaches
2. Anonymization
3. Building Privacy into Data Pipelines
4. Privacy Attacks
5. Privacy-Aware Machine Learning and Data Science
6. Federated Learning and Data Science
7. Encrypted Computation
8. Navigating the Legal Side of Privacy
9. Privacy and Practicality Considerations
10. Frequently Asked Questions (and Their Answers!)
11. Go Forth and Engineer Privacy!
Download Practical Data Privacy (6th Early Release)