There is surprisingly little discussion about the actual spec here. It looks really good to me!
- Advertisements change every 15 minutes and are not trackable unless keys are shared.
- The only central bit is a repository of "infected" daily keys.
- No knowledge about contacts is shared with a central authority.
Nothing is shared unless you are infected and decide to share your keys, which are only valid for one day. I don't see how you could have a real argument against this unless you are a privacy extremist. It also seems more privacy-friendly than the Singapore or German apps.
In a widely distributed and important spec like this, it may be useful to look for what is conspicuously absent or unstated, rather than simply reading the precise positive language.
To my mind this phrase under 'Privacy Considerations' in the Cryptography Specification stands out:
"A server operator implementing this protocol does not learn who users have been in proximity with or users’ location unless it also has the unlikely capability to scan advertisements from users who recently reported Diagnosis Keys."
That phrase explicitly mentions that server operators cannot learn about user proximities.
What I reckon may be unstated there is that it could be possible for adversaries with sidechannel / network monitoring capability to learn those kinds of details about users (i.e. internet, cell data, and other data network operators).
If such a side door did exist, it would seem in the public interest to be aware of the scope of the availability of that data, especially given the potential (physical, social) vulnerability and risk of those users.
I'd also like to be proven wrong about the possibility of such sidechannel attacks by anyone who understands the spec in more detail.
The approach outlined by Apple and Google is very similar to, and likely based on, the TCN protocol developed by a coalition of open source projects. If you'd like to discuss possible vulnerabilities and propose further improvements, there's an active community already doing that who would be happy to have one more contributor. :-)
I’m part of the CoEpi project, one of the member projects of the TCN Coalition. I see that some of my teammates are searching through the OpenTrace code to see if anything there is worth taking, such as their device-specific bluetooth range calibrations. I don’t think there’s been any two-way communication between these teams.
The projects I've seen inside the TCN Coalition seem aware of OpenTrace and the code/data they put out over the last few days; I'm not sure if direct contacts exist yet.
> I don't see how you could have a real argument against this unless you are a privacy extremist.
The authors of DP-3T (which seems quite similar to this spec) have a huge list of privacy caveats in their whitepaper [1], in section "5.4 Summary of centralised/decentralised design trade-offs".
I haven't seen any analysis on how the Apple/Google spec prevents those problems.
The Apple/Google design drops this DP-3T requirement:
2) Enable epidemiologists to analyse the spread of SARS-CoV-2
So anything in that table with epidemiologists is gone.
The remaining caveats are pretty boring:
> To do so, the attacker uses strategically placed Bluetooth receivers and recording devices to receive EphIDs. The app's Bluetooth broadcasts of non-infected people and infected people outside the infectious window remain unlinkable.
...
> On the other end, a proactive tech-savvy person can abuse any proximity tracing mechanism to narrow down the group of individuals they have been in contact with to infected individuals. To do so, 1) they keep a detailed log of who they saw when, and 2) they register many accounts in the proximity tracing system, and use each account for proximity tracing during a short time window. When one of these accounts is notified, the attacker can link the account identifier back to the time window in which the contact with an infected individual occurred.
So, yeah, these vulnerabilities still exist and have been pointed out in this thread... but I find it hard to care about them at all.
> The app’s Bluetooth broadcasts of non-infected people and infected people outside the infectious window remain unlinkable.
The group of non-infected people is getting smaller and smaller. The infectious window is presumably weeks long (times the number of diseases this system will track). These risks don't seem that easy to downplay, even before we get into the "security concerns" section.
One issue I see is that when I query the central repository of infected IDs, I expose to the central server the IDs I've been in contact with (unless I always download all of them, but that doesn't seem feasible).
It seems like this could be solved by providing a k-anonymous query interface like the one exposed by Have I Been Pwned. I wrote to the contact email address of PEPP-PT, a European initiative to develop a system that seems pretty much the same as this, suggesting it, but I got no answer (not that I was really expecting one).
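The HIBP-style idea can be sketched quickly: the client hashes the ID it wants to check but sends only a short hash prefix, the server returns every infected hash sharing that prefix, and the client finishes the match locally. The function names and the 5-character prefix length below are illustrative assumptions, not HIBP's actual API.

```python
import hashlib

def hash_id(rolling_id: bytes) -> str:
    """Hex SHA-256 of a rolling identifier."""
    return hashlib.sha256(rolling_id).hexdigest()

def client_query_prefix(rolling_id: bytes, prefix_len: int = 5) -> str:
    """Client reveals only a short hash prefix, hiding the exact ID in a bucket."""
    return hash_id(rolling_id)[:prefix_len]

def server_lookup(prefix: str, infected_hashes: set) -> list:
    """Server returns every infected hash that shares the prefix."""
    return [h for h in infected_hashes if h.startswith(prefix)]

def client_check(rolling_id: bytes, candidates: list) -> bool:
    """Client completes the match locally; the server never sees the full ID."""
    return hash_id(rolling_id) in candidates
```

The server only learns which bucket of hashes you were interested in, not the exact contact, and the bucket size (hence the anonymity set) is tunable via the prefix length.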
Ah, you mentioned the HIBP example, although for this search space you may be able to get by with just downloading all of them. If you stick to, say, state-by-state sharding, you get around 30 MB of hashes in the worst case (NYC).
If you further reduce that by only providing newly confirmed hashes since a timestamp, the client can track when it last downloaded the data and pull only the delta; you end up with a few MB a day, which compares quite well to, say, a video call.
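The arithmetic behind those figures is straightforward. The constants below are assumptions chosen to roughly reproduce the 30 MB worst-case figure (32-byte SHA-256 hashes, about a million keys in the shard, a hypothetical 50,000 new cases per day), not numbers from the spec.

```python
HASH_BYTES = 32             # SHA-256 of a daily diagnosis key (assumption)
TOTAL_INFECTED = 1_000_000  # hypothetical worst-case shard size
NEW_PER_DAY = 50_000        # hypothetical daily delta of new confirmed keys

# Full shard download vs. delta-only download, in bytes
full_download = HASH_BYTES * TOTAL_INFECTED  # 32 MB, close to the 30 MB figure
daily_delta = HASH_BYTES * NEW_PER_DAY       # 1.6 MB per day
```

Even pessimistic deltas stay in the low single-digit megabytes, so "download everything new since my last sync" is plausible on mobile data.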
Geographic sharding seems to break down once people travel, though. A single visit to a hub airport might have put you in contact with people from all around the world (I assume that the objective of this initiative is to try to get us at least part of the way back to normal). Even if you don't travel, other people do, so you will be in contact with people who are registered as infected in a different region.
Also I don't think NYC is at all the worst case in the world, there are a lot of megacities that dwarf it in size...
You could still have geo sharding if the device also saved its location locally, shared the diagnosis for every zone it's been in, and downloaded the data for all of those zones. Of course that would mean more data to process for travelers, but it should still be far less than the data for the entire globe.
I think it has a flaw: if you find out you are infected mid-day, then revealing your key for the day lets others impersonate you for the rest of the day, while not revealing it means those you had contact with in the first part of the day won't be notified.
So my suggestion for a minimal fix would be to also reveal all advertised rolling IDs for the current day in addition to the keys for the past days.
A better fix would be to generate IDs in a hierarchical fashion from the daily keys with power-of-two-length time slots, so that you only need to share O(d + log(n)) values, where d is the number of days and n is the number of subdivisions in a day.
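The hierarchical scheme can be sketched as a binary tree over the day's time slots: the daily key is the root, each node derives two children, and revealing a partial day means publishing only the subtree roots that cover the slots you want to disclose. This is a sketch under my own assumptions (plain SHA-256 child derivation; a real design would use a proper KDF), not anything from the spec.

```python
import hashlib

def child(key: bytes, bit: str) -> bytes:
    # Derive one child node of the key tree (assumption: SHA-256, not a real KDF)
    return hashlib.sha256(key + bit.encode()).digest()

def slot_id(day_key: bytes, slot: int, depth: int) -> bytes:
    """Walk from the daily key (root) down to the leaf for one time slot."""
    k = day_key
    for i in reversed(range(depth)):
        k = child(k, "1" if (slot >> i) & 1 else "0")
    return k

def cover(start: int, end: int, depth: int) -> list:
    """Minimal set of subtree prefixes covering slots [start, end)."""
    nodes = []
    def rec(prefix, lo, hi):
        if lo >= end or hi <= start:
            return                      # subtree entirely outside the range
        if start <= lo and hi <= end:
            nodes.append(prefix)        # subtree entirely inside: publish its root
            return
        mid = (lo + hi) // 2
        rec(prefix + "0", lo, mid)
        rec(prefix + "1", mid, hi)
    rec("", 0, 1 << depth)
    return nodes
```

With 16 slots per day, disclosing the first 12 slots needs only the keys for prefixes "0" (slots 0-7) and "10" (slots 8-11): two values instead of twelve, and in general O(log n).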
Another potential fix is to use public-key cryptography and only reveal the daily public keys; however, this requires IDs twice as large, and matching requires trying to decrypt/signature-check all received IDs instead of being able to generate them and look them up.
Your suggestions don't seem different from what the spec already describes. Tests are not immediate, and the incubation period of the disease means you have to share multiple diagnosis keys (days) from infected persons anyway. You don't have to share time slots within a day; they can be derived from the daily key. The impersonation risk is unlikely: whichever health authority applies can simply invalidate all newly identified keys from generating new contacts, preventing replay attacks derived from known infected keys with simple, coarse timestamps.
A simple solution using properties of the virus would be to just delay the release of the last ID. It takes a while before the viral load inside someone becomes high enough to be infectious, so there is no significant harm in the last ID being delayed by up to 24 hours in the worst case.
Which part of the spec do you think people who care about privacy will object to? I agree with you that this is a poor choice of wording but I think your interpretation is uncharitable.
I think this is a very innovative solution that enables contact reporting without knowing location or personal details at all, and it's exclusively opt-in.
I see some people arguing "yes, but it could be subverted", but if you just want to monitor people and know who is talking to whom, this isn't a good place to begin; there are much better ways to do that already available.
Think of the daily key as the seed to a random number generator. If two people pass the same seed into the same random number generator, they can generate the same list of 500 random numbers. This provides a compact way for someone to say: "I just learned that I was infected. These are the 500 identifiers I broadcast on that day. If you recognize one of them, then you might also be infected."
I understand that aspect of it; I'm just confused as to how only having the daily key is enough to generate the identifiers. Wouldn't they also need the TimeIntervalNumber, according to the function?
If each phone generated 500 numbers per day, then TimeIntervalNumber is a number in the range of 1...500. So generate 500 codes using all of the numbers in that range. If any of those 500 codes match one of the codes that you actually saw in the wild, then you were near that person.
Thanks! I actually brainstormed with a friend later that day on how it'd work and we finally came to a similar conclusion.
According to the spec, the phone only generates a new identifier when the MAC address changes or on a new day. But since identifiers are generated against a 10-minute time window, you'd try to derive their IDs with all 144 possible time windows for that day. And if you find one of those IDs in your list of contacts, then you know you were in contact with someone infected.
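The derive-and-intersect matching described above can be sketched in a few lines. The HMAC construction, the "CT-RPI" label, and the 16-byte truncation below are illustrative assumptions, not the spec's exact constants; the point is that the daily key plus the interval number deterministically regenerates all 144 rolling IDs.

```python
import hmac
import hashlib

def rolling_id(daily_key: bytes, interval: int) -> bytes:
    # Derive one 10-minute rolling identifier from the daily key.
    # Label and truncation length are assumptions, not the spec's constants.
    mac = hmac.new(daily_key, b"CT-RPI" + interval.to_bytes(4, "big"), hashlib.sha256)
    return mac.digest()[:16]

def matches(daily_key: bytes, observed: set) -> list:
    """Re-derive all 144 ten-minute IDs for one day and intersect
    them with the identifiers this phone actually saw nearby."""
    return [i for i in range(144) if rolling_id(daily_key, i) in observed]
```

A client that downloads a published diagnosis key runs `matches` against its locally stored contacts; a non-empty result means it was near that person, and the matching interval numbers even indicate roughly when.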