
This document has a bullshit example: identifying users by ZIP code and gender? Huh?

"Imagine that a covered entity is considering sharing the information in the table to the left in Figure 3. This table is devoid of explicit identifiers, such as personal names and Social Security Numbers. The information in this table is distinguishing, such that each row is unique on the combination of demographics (i.e., Age, ZIP Code, and Gender). Beyond this data, there exists a voter registration data source, which contains personal names, as well as demographics (i.e., Birthdate, ZIP Code, and Gender), which are also distinguishing. Linkage between the records in the tables is possible through the demographics. Notice, however, that the first record in the covered entity’s table is not linked because the patient is not yet old enough to vote."
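The linkage described in the quote is essentially a join on shared demographics. Below is a minimal sketch of that join, assuming hypothetical in-memory tables and a hypothetical reference date used to turn the voter roll's birthdates into ages. Every name, ZIP code and record here is made up for illustration. A patient is treated as re-identified only when exactly one voter matches their (age, ZIP, gender) tuple, and the underage patient never links because they aren't on the voter roll.

```python
from datetime import date

def age_on(birthdate: date, ref: date) -> int:
    """Whole years between birthdate and ref (subtract 1 if the birthday hasn't come yet)."""
    return ref.year - birthdate.year - ((ref.month, ref.day) < (birthdate.month, birthdate.day))

def link(patients, voters, ref):
    """Return (patient, voter_name) pairs where exactly one voter shares the patient's (age, ZIP, gender)."""
    # Index the voter roll by the quasi-identifier tuple, deriving age from birthdate.
    index = {}
    for name, birthdate, zip_code, gender in voters:
        key = (age_on(birthdate, ref), zip_code, gender)
        index.setdefault(key, []).append(name)
    links = []
    for patient in patients:
        age, zip_code, gender = patient[:3]
        matches = index.get((age, zip_code, gender), [])
        if len(matches) == 1:  # a unique match re-identifies the patient
            links.append((patient, matches[0]))
    return links

# Hypothetical reference date for computing ages from the voter roll.
REF = date(2012, 1, 1)

# Hypothetical de-identified patient table: (age, ZIP, gender, diagnosis)
patients = [
    (15, "10001", "F", "asthma"),       # too young to vote, so absent from the voter roll
    (42, "10001", "M", "diabetes"),
    (67, "10002", "F", "hypertension"),
]

# Hypothetical voter roll: (name, birthdate, ZIP, gender)
voters = [
    ("Alice Smith", date(1944, 6, 1), "10002", "F"),
    ("Bob Jones", date(1969, 3, 15), "10001", "M"),
    ("Carol White", date(1980, 7, 4), "10001", "F"),
]

for patient, name in link(patients, voters, REF):
    print(name, "->", patient[3])
```

With these made-up rows, Bob and Alice each match exactly one patient record, while the 15-year-old stays unlinked, mirroring the first record in the quoted example.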



It's not bullshit. Here's the paper from 2000: http://dataprivacylab.org/projects/identifiability/paper1.pd... From the abstract: 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}.
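The "87% unique" figure is a statement about how many records have a quasi-identifier tuple that no one else shares. The sketch below measures that fraction on a tiny hypothetical population. It does not reproduce the paper's census-based methodology. It also compares the full (ZIP, gender, date of birth) key with a version coarsened to birth year, which is assumed here to carry roughly the same information as an age column.

```python
from collections import Counter
from datetime import date

def unique_fraction(records, key):
    """Fraction of records whose quasi-identifier tuple occurs exactly once."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)

# Hypothetical population: (ZIP, gender, date of birth)
people = [
    ("10001", "F", date(1980, 7, 4)),
    ("10001", "F", date(1980, 2, 9)),
    ("10001", "M", date(1969, 3, 15)),
    ("10002", "F", date(1944, 6, 1)),
]

def by_dob(p):
    """Full quasi-identifier: (ZIP, gender, date of birth)."""
    return p

def by_birth_year(p):
    """Coarsened quasi-identifier: (ZIP, gender, birth year), a stand-in for age."""
    zip_code, gender, dob = p
    return (zip_code, gender, dob.year)

print(unique_fraction(people, by_dob))         # every tuple is distinct
print(unique_fraction(people, by_birth_year))  # the two 1980 women in 10001 now collide
```

Coarsening the key lowers uniqueness. This is why an age-only table is harder, though not impossible, to link than one with full birthdates.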


The example's table doesn't include date of birth, only age. Seems like a stretch to me.



