Apple: Drilling down into ‘differential privacy’

A tour of the pros and cons, from original sources.

Craig Federighi, WWDC Keynote: We believe you should have great features and great privacy. Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.

Apple, iOS Preview Guide: Starting with iOS 10, Apple is using Differential Privacy technology to help discover the usage patterns of a large number of users without compromising individual privacy. To obscure an individual’s identity, Differential Privacy adds mathematical noise to a small sample of the individual’s usage pattern. As more people share the same pattern, general patterns begin to emerge, which can inform and enhance the user experience. In iOS 10, this technology will help improve QuickType and emoji suggestions, Spotlight deep link suggestions and Lookup Hints in Notes.
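The mechanism Apple sketches (noise added to each individual's report, with the true pattern emerging only in aggregate) is well illustrated by the classic randomized-response technique. A minimal Python sketch, for illustration only; Apple has not published its algorithms in this detail:

```python
import random

def randomized_response(truth: bool, p_flip: float = 0.25) -> bool:
    """Report the user's true bit, but flip it with probability p_flip.
    Any single report is deniable; only the aggregate is informative."""
    return (not truth) if random.random() < p_flip else truth

def estimate_true_rate(reports, p_flip: float = 0.25) -> float:
    """Invert the known bias: E[observed] = rate*(1-p_flip) + (1-rate)*p_flip."""
    observed = sum(reports) / len(reports)
    return (observed - p_flip) / (1 - 2 * p_flip)

# Simulate 100,000 users, 30% of whom actually use some feature.
random.seed(0)
users = [random.random() < 0.30 for _ in range(100_000)]
reports = [randomized_response(u) for u in users]
print(estimate_true_rate(reports))  # close to 0.30, despite per-user noise
```

No individual report can be taken at face value, yet with enough users the estimate converges on the true rate, which is the "general patterns begin to emerge" behavior the preview guide describes.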

Cynthia Dwork, Microsoft Research: Differential Privacy (abstract). Contrary to intuition, a variant of [Dalenius’ 1977] result threatens the privacy even of someone not in the database. This state of affairs suggests a new measure, differential privacy, which, intuitively, captures the increased risk to one’s privacy incurred by participating in a database. The techniques developed in a sequence of papers, culminating in those described in [12], can achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy.
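The measure Dwork introduces has a precise statement. In the standard definition (conventional notation from the literature, not a quote from the abstract), a randomized mechanism M is ε-differentially private if, for every pair of databases D and D′ differing in a single individual's record, and every set S of possible outputs:

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
```

A small ε means an observer can barely tell whether any one person's data was included at all, which is exactly the "increased risk incurred by participating" that the abstract is quantifying.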

Theoretical Antagonist, Medium: Differential Privacy considered not practical. Differential Privacy is an elegant & beautiful mathematical theory. Much like the Statistical Learning Theory (by Vapnik & Chervonenkis) and Probably Approximately Correct learning framework (by Leslie Valiant), it provides fundamental insights about privacy and data mining… However in the last six years it has failed to gain any traction… Assuming one ignores all other practical issues such as availability of external statistics, un-bounded noise distributions, lack of well maintained code. One still runs into a problem of budgeting.
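The "problem of budgeting" follows from sequential composition: answering k queries at ε each is (kε)-differentially private overall, so an analyst holds a finite total budget that every query draws down. A toy accountant makes the constraint concrete (hypothetical API, not from any of the quoted sources):

```python
class PrivacyBudget:
    """Toy accountant for sequential composition: k queries at epsilon
    each consume k * epsilon of the total budget."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted; no further queries")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(4):
    budget.spend(0.25)  # four queries at eps = 0.25 exhaust the budget
# A fifth spend(0.25) would raise: the analyst must stop asking
# questions, however useful the next one would be.
```

Once the budget is gone, honest differential privacy requires refusing further queries on the same data, which is the practical sticking point the Medium post is complaining about.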

Matthew Green, Cryptography Engineering: What is Differential Privacy? To give an absolutely crazy example of how big the tradeoffs can be, consider this paper by Frederikson et al. from 2014. The authors began with a public database linking Warfarin dosage outcomes to specific genetic markers. They then used ML techniques to develop a dosing model based on their database — but applied DP at various privacy budgets while training the model. Then they evaluated both the information leakage and the model’s success at treating simulated “patients”. The results showed that the model’s accuracy depends a lot on the privacy budget on which it was trained. If the budget is set too high, the database leaks a great deal of sensitive patient information — but the resulting model makes dosing decisions that are about as safe as standard clinical practice. On the other hand, when the budget was reduced to a level that achieved meaningful privacy, the “noise-ridden” model had a tendency to kill its “patients”. 
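The "privacy budget" Green refers to is the ε parameter, and the accuracy tradeoff he describes falls directly out of the standard Laplace mechanism, where the noise scale is sensitivity/ε: halving the budget doubles the noise. A sketch of that tradeoff using the textbook mechanism (not the Frederikson et al. code):

```python
import random

def laplace_noise(scale: float) -> float:
    # Laplace(0, scale), sampled as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value plus noise calibrated to sensitivity / epsilon."""
    return true_value + laplace_noise(sensitivity / epsilon)

# A counting query (sensitivity 1) under a generous vs. a strict budget.
random.seed(0)
true_count = 10_000
for eps in (2.0, 0.05):
    answers = [laplace_mechanism(true_count, 1.0, eps) for _ in range(3)]
    print(eps, [round(a) for a in answers])
# Generous budget (eps = 2.0): answers land within a few counts of 10,000.
# Strict budget (eps = 0.05): typical error is on the order of 1/0.05 = 20.
```

Scale that noise up from a single count to every parameter of a dosing model and the "noise-ridden model kills its patients" outcome stops being surprising.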

Ben Thompson, Stratechery: The broader challenge for Apple is this: in a fair fight the company would have a hard time matching Google or Facebook’s big data capabilities, which increasingly means a worse user experience, but this isn’t a fair fight: Apple is tying its own arm behind its back. The focus on privacy is admirable, to be sure, but there is absolutely a conflict with Apple’s focus on the user experience, and my question is whether or not Apple is being explicit in their decision-making about balancing the chances of datasets being stolen (or abused) + de-anonymized + compromising information being found + that information being abused, versus taking reasonable privacy steps (i.e. anonymizing data) that are not perfect but make it much easier to enhance the user experience for its hundreds of millions of users.

More as it comes in. 


  1. Tom Sidla said:

    Only Apple offers the choice of staying private. I can use Duck Duck Go or Google Search, Apple Maps or Google Maps, Facebook if I want, content blockers, etc. I have options.

    Who cares if Apple services aren’t the best because of privacy constraints. I can adjust accordingly.

    June 15, 2016
