How to understand the $\epsilon$ value in differential privacy
After reading so many papers about applications of differential privacy to machine learning, a natural question arose in my mind:
\[\textit{"So my algorithm is $\epsilon$-DP, but what does it mean?"}\]In fact, the answer to this question is not that hard if you know some basic probability theory: Bayes’ rule, to be more specific.
We imagine ourselves as an attacker trying to figure out whether someone (the target) is in the database $S$. Let’s be the strongest attacker we can think of: we know the entire database except for the target, so from our point of view $S$ is one of two neighboring databases, $S_{\text{yes}}$ (the target is included) or $S_{\text{no}}$ (the target is not). Our prior guess satisfies
\[\mathbb{P}[S = S_{\text{yes}}] = 1 - \mathbb{P}[S = S_{\text{no}}].\]Now the $\epsilon$-DP algorithm $A$ runs on $S$ and we observe the event $A(S) \in O$ for some set of outputs $O$; we want to compare the prior $\mathbb{P}[S = S_{\text{yes}}]$ with the posterior $\mathbb{P}[S = S_{\text{yes}} | A(S) \in O]$. By Bayes’ rule, written in odds form, we know
\[\frac{\mathbb{P}[S = S_{\text{yes}}|A(S) \in O]}{\mathbb{P}[S = S_{\text{no}}|A(S) \in O]} = \frac{\mathbb{P}[S = S_{\text{yes}}]}{\mathbb{P}[S = S_{\text{no}}]} \cdot \frac{\mathbb{P}[A(S_{\text{yes}}) \in O]}{\mathbb{P}[A(S_{\text{no}}) \in O]}.\]Now by the DP constraint, we know
\[e^{-\epsilon}\leq \frac{\mathbb{P}[A(S_{\text{yes}}) \in O]}{\mathbb{P}[A(S_{\text{no}}) \in O]} \leq e^\epsilon.\]
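To make this constraint concrete, here is a minimal Python sketch, assuming the standard Laplace mechanism on a counting query with sensitivity $1$ (the counts $43$ and $42$ are made up for the example, not from any real data): the density of the noisy count under $S_{\text{yes}}$ and under $S_{\text{no}}$ differs by a factor of at most $e^\epsilon$ at every output, which is exactly the ratio bound above.

```python
import math

def laplace_pdf(x, mu, scale):
    """Density of the Laplace(mu, scale) distribution at x."""
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

eps = 1.1
# Neighboring databases for a counting query with sensitivity 1:
# the true counts differ by exactly one (the target person).
count_yes, count_no = 43, 42    # hypothetical counts under S_yes / S_no
scale = 1.0 / eps               # Laplace mechanism: scale = sensitivity / eps

# At every possible output x, the two densities differ by at most e^eps.
for x in [40.0, 42.5, 43.0, 50.0]:
    ratio = laplace_pdf(x, count_yes, scale) / laplace_pdf(x, count_no, scale)
    assert math.exp(-eps) - 1e-9 <= ratio <= math.exp(eps) + 1e-9
    print(f"output {x:5.1f}: density ratio {ratio:.3f} (e^eps = {math.exp(eps):.3f})")
```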
Plugging this into the formula above, we know
\[e^{-\epsilon} \cdot \frac{\mathbb{P}[S = S_{\text{yes}}]}{\mathbb{P}[S = S_{\text{no}}]} \leq \frac{\mathbb{P}[S = S_{\text{yes}}|A(S) \in O]}{\mathbb{P}[S = S_{\text{no}}|A(S) \in O]} \leq e^\epsilon \cdot \frac{\mathbb{P}[S = S_{\text{yes}}]}{\mathbb{P}[S = S_{\text{no}}]}.\]Solving for $\mathbb{P}[S = S_{\text{yes}} | A(S) \in O]$, using that the two posterior probabilities sum to one, we know
\[\frac{e^{-\epsilon}\, \mathbb{P}[S = S_{\text{yes}}]}{e^{-\epsilon}\, \mathbb{P}[S = S_{\text{yes}}] + \mathbb{P}[S = S_{\text{no}}]} \leq \mathbb{P}[S = S_{\text{yes}} | A(S) \in O] \leq \frac{e^{\epsilon}\, \mathbb{P}[S = S_{\text{yes}}]}{e^{\epsilon}\, \mathbb{P}[S = S_{\text{yes}}] + \mathbb{P}[S = S_{\text{no}}]}.\]
For example, if the prior is $50\%$ and $\epsilon = 1.1$ (so $e^{\epsilon} \approx 3$), then the posterior is between $25\%$ and $75\%$.
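As a quick sanity check, here is a small Python helper (the function name posterior_bounds is our own, not standard) that evaluates the bound above for any prior and $\epsilon$; with a prior of $0.5$ and $\epsilon = 1.1$ it reproduces the $25\%$ to $75\%$ range.

```python
import math

def posterior_bounds(prior, eps):
    """Range of the attacker's posterior P[S = S_yes | A(S) in O]
    when A is eps-DP, given the prior P[S = S_yes]."""
    lo = math.exp(-eps) * prior / (math.exp(-eps) * prior + (1 - prior))
    hi = math.exp(eps) * prior / (math.exp(eps) * prior + (1 - prior))
    return lo, hi

lo, hi = posterior_bounds(prior=0.5, eps=1.1)
print(f"posterior lies in [{lo:.1%}, {hi:.1%}]")  # [25.0%, 75.0%]
```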
Reading the book by Dwork will give you a deeper understanding of what differential privacy guarantees; we omit that discussion here.