Thursday, May 3, 2012

Does Hashing Make Data “Anonymous”?

Felten, Ed. "Does Hashing Make Data “Anonymous”?" Tech@FTC Blog, April 23, 2012.

From the blog: "One of the most misunderstood topics in privacy is what it means to provide “anonymous” access to data. One often hears references to “hashing” as a way of rendering data anonymous. As it turns out, hashing is vastly overrated as an “anonymization” technique. In this post, I’ll talk about what hashing is, and why it often fails to provide effective anonymity.

What is hashing anyway? What we’re talking about is technically called a “cryptographic hash function” (or, to super hardcore theory nerds, a randomly chosen member of a pseudorandom function family–but I digress). I’ll just call it a “hash” for short. A hash is a mathematical function: you give it an input value and the function thinks for a while and then emits an output value; and the same input always yields the same output. What makes a hash special is that it is as unpredictable as a mathematical function can be–it is designed so that there is no rhyme or reason to its behavior, except for the iron rule that the same input always yields the same output. (In this post I’ll use a hash called SHA-1.)" Read more
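To make the "same input always yields the same output" rule concrete, here is a minimal Python sketch using SHA-1 from the standard library's `hashlib`. It is not taken from Felten's post: the helper names `sha1_hex` and `recover_by_enumeration`, the sample email addresses, and the four-digit PIN scenario are all assumptions chosen for illustration. The second half hints at why hashing alone may not anonymize data: when the set of possible inputs is small, anyone can hash every candidate and look for a match.

```python
import hashlib
from typing import Iterable, Optional


def sha1_hex(value: str) -> str:
    """Return the SHA-1 digest of a string as 40 hex characters."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def recover_by_enumeration(target_digest: str, candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: hash each candidate and return the one that matches."""
    for candidate in candidates:
        if sha1_hex(candidate) == target_digest:
            return candidate
    return None


# The "iron rule": the same input always yields the same output.
assert sha1_hex("alice@example.com") == sha1_hex("alice@example.com")

# A slightly different input yields an unrelated-looking output.
assert sha1_hex("alice@example.com") != sha1_hex("alice@example.org")

# Assumed scenario: the "anonymized" value is the hash of a four-digit PIN.
# Only 10,000 inputs are possible, so trying all of them is trivial.
hashed_pin = sha1_hex("4821")
all_pins = (f"{n:04d}" for n in range(10_000))
print(recover_by_enumeration(hashed_pin, all_pins))  # prints 4821
```

The enumeration works only because the candidate space is tiny and known in advance; it says nothing about reversing a hash of a long, unpredictable input.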

See also
Felten, Ed. "Are Pseudonyms “Anonymous”?" Tech@FTC Blog, April 30, 2012.