There is surprisingly little discussion about the actual spec here. It looks really good to me!
- Advertisements change every 15 minutes and are not trackable unless keys are shared.
- The only central bit is a repository of "infected" daily keys.
- No knowledge about contacts is shared with a central authority.
Nothing is shared unless you are infected and decide to share your keys, which are only valid for one day. I don't see how you could have a real argument against this unless you are a privacy extremist. It also seems more privacy-friendly than the Singapore or German apps.
In a widely distributed and important spec like this, it may be useful to look for what is conspicuously absent or unstated, rather than simply reading the precise positive language.
To my mind this phrase under 'Privacy Considerations' in the Cryptography Specification stands out:
"A server operator implementing this protocol does not learn who users have been in proximity with or users’ location unless it also has the unlikely capability to scan advertisements from users who recently reported Diagnosis Keys."
That phrase explicitly mentions that server operators cannot learn about user proximities.
What I reckon may be unstated there is that it could be possible for adversaries with sidechannel / network monitoring capability to learn those kinds of details about users (i.e. internet, cell data, and other data network operators).
If such a side door did exist, it would seem in the public interest to be aware of the scope of the availability of that data, especially given the potential (physical, social) vulnerability and risk of those users.
I'd also like to be proven wrong about the possibility of such sidechannel attacks by anyone who understands the spec in more detail.
The approach outlined by Apple and Google is very similar to, and likely based on, the TCN protocol developed by a coalition of open source projects. If you'd like to discuss possible vulnerabilities and propose further improvements, there's an active community already doing that who would be happy to have one more contributor. :-)
I’m part of the CoEpi project, one of the member projects of the TCN Coalition. I see that some of my teammates are searching through the OpenTrace code to see if anything there is worth taking, such as their device-specific bluetooth range calibrations. I don’t think there’s been any two-way communication between these teams.
The projects I've seen inside the TCN Coalition seem aware of OpenTrace and the code/data they put out over the last few days; I'm not sure if direct contacts exist yet.
> I don't see how you could have a real argument against this unless you are a privacy extremist.
The authors of DP-3T (which seems quite similar to this spec) have a huge list of privacy caveats in their whitepaper [1], in section "5.4 Summary of centralised/decentralised design trade-offs".
I haven't seen any analysis on how the Apple/Google spec prevents those problems.
The Apple/Google design drops this DP-3T requirement:
2) Enable epidemiologists to analyse the spread of SARS-CoV-2
So anything in that table with epidemiologists is gone.
The remaining caveats are pretty boring:
> To do so, the attacker uses strategically placed Bluetooth receivers and recording devices to receive EphIDs. The app's Bluetooth broadcasts of non-infected people and infected people outside the infectious window remain unlinkable.
...
> On the other end, a proactive tech-savvy person can abuse any proximity tracing mechanism to narrow down the group of individuals they have been in contact with to infected individuals. To do so, 1) they keep a detailed log of who they saw when, and 2) they register many accounts in the proximity tracing system, and use each account for proximity tracing during a short time window. When one of these accounts is notified, the attacker can link the account identifier back to the time window in which the contact with an infected individual occurred.
So, yeah, these vulnerabilities still exist and have been pointed out in this thread... but I find it hard to care about them at all.
> The app’s Bluetooth broadcasts of non-infected people and infected people outside the infectious window remain unlinkable.
The group of non-infected people is getting smaller and smaller. The infectious window is presumably weeks long (times the number of diseases this system will track). These risks don't seem that easy to downplay, even before we get into the "security concerns" section.
One issue I see is that when I query the central repository of infected IDs, I expose to the central server the IDs I've been in contact with (unless I always download all of them, but that doesn't seem feasible).
It seems like this could be solved by providing a k-anonymous query interface like the one exposed by Have I Been Pwned. I wrote to the contact email address of PEPP-PT, a European initiative to develop a system that seems pretty much the same as this, suggesting it, but I got no answer (not that I was really expecting one).
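The HIBP-style idea can be sketched quickly: the client hashes the ID it wants to check but sends only a short hash prefix, the server returns every infected hash sharing that prefix, and the client finishes the match locally. The function names and the 5-character prefix length below are illustrative assumptions, not HIBP's actual API.

```python
import hashlib

def hash_id(rolling_id: bytes) -> str:
    """Hex SHA-256 of a rolling identifier."""
    return hashlib.sha256(rolling_id).hexdigest()

def client_query_prefix(rolling_id: bytes, prefix_len: int = 5) -> str:
    """Client reveals only a short hash prefix, hiding the exact ID in a bucket."""
    return hash_id(rolling_id)[:prefix_len]

def server_lookup(prefix: str, infected_hashes: set) -> list:
    """Server returns every infected hash that shares the prefix."""
    return [h for h in infected_hashes if h.startswith(prefix)]

def client_check(rolling_id: bytes, candidates: list) -> bool:
    """Client completes the match locally; the server never sees the full ID."""
    return hash_id(rolling_id) in candidates
```

The server only learns which bucket of hashes you were interested in, not the exact contact, and the bucket size (hence the anonymity set) is tunable via the prefix length.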
Ah, you mentioned the HIBP example, although for this search space you may be able to get by with just downloading all of them. If you stick to, say, state-by-state sharding, you get around 30 MB of hashes in the worst case (NYC).
If you further reduce that by only providing newly confirmed hashes since a timestamp, the client can track when it last downloaded the data and pull only the delta; you end up with a few MB a day, which compares quite well to, say, a video call.
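The arithmetic behind those figures is straightforward. The constants below are assumptions chosen to roughly reproduce the 30 MB worst-case figure (32-byte SHA-256 hashes, about a million keys in the shard, a hypothetical 50,000 new cases per day), not numbers from the spec.

```python
HASH_BYTES = 32             # SHA-256 of a daily diagnosis key (assumption)
TOTAL_INFECTED = 1_000_000  # hypothetical worst-case shard size
NEW_PER_DAY = 50_000        # hypothetical daily delta of new confirmed keys

# Full shard download vs. delta-only download, in bytes
full_download = HASH_BYTES * TOTAL_INFECTED  # 32 MB, close to the 30 MB figure
daily_delta = HASH_BYTES * NEW_PER_DAY       # 1.6 MB per day
```

Even pessimistic deltas stay in the low single-digit megabytes, so "download everything new since my last sync" is plausible on mobile data.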
Geographic sharding seems to break down once people travel, though. A single visit to a hub airport might have put you in contact with people from all around the world (I assume that the objective of this initiative is to try to get us at least part of the way back to normal). Even if you don't travel, other people do, so you will be in contact with people who are registered as infected in a different region.
Also I don't think NYC is at all the worst case in the world, there are a lot of megacities that dwarf it in size...
You could still have geo sharding if the device also saved its location locally, shared the diagnosis for every zone it's been in, and downloaded the data for all of those zones. Of course that would mean more data to process for travelers, but it should still be far less than the data for the entire globe.
I think it has a flaw: if you find out you are infected mid-day, then revealing your key for the day lets others impersonate you for the rest of the day, while not revealing it means those you had contact with in the first part of the day won't be notified.
So my suggestion for a minimal fix would be to also reveal all advertised rolling IDs for the current day in addition to the keys for the past days.
A better fix would be to generate IDs in a hierarchical fashion from the daily keys with power-of-two-length time slots, so that you only need to share O(d + log(n)) values, where d is the number of days and n is the number of subdivisions in a day.
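The hierarchical scheme can be sketched as a binary tree over the day's time slots: the daily key is the root, each node derives two children, and revealing a partial day means publishing only the subtree roots that cover the slots you want to disclose. This is a sketch under my own assumptions (plain SHA-256 child derivation; a real design would use a proper KDF), not anything from the spec.

```python
import hashlib

def child(key: bytes, bit: str) -> bytes:
    # Derive one child node of the key tree (assumption: SHA-256, not a real KDF)
    return hashlib.sha256(key + bit.encode()).digest()

def slot_id(day_key: bytes, slot: int, depth: int) -> bytes:
    """Walk from the daily key (root) down to the leaf for one time slot."""
    k = day_key
    for i in reversed(range(depth)):
        k = child(k, "1" if (slot >> i) & 1 else "0")
    return k

def cover(start: int, end: int, depth: int) -> list:
    """Minimal set of subtree prefixes covering slots [start, end)."""
    nodes = []
    def rec(prefix, lo, hi):
        if lo >= end or hi <= start:
            return                      # subtree entirely outside the range
        if start <= lo and hi <= end:
            nodes.append(prefix)        # subtree entirely inside: publish its root
            return
        mid = (lo + hi) // 2
        rec(prefix + "0", lo, mid)
        rec(prefix + "1", mid, hi)
    rec("", 0, 1 << depth)
    return nodes
```

With 16 slots per day, disclosing the first 12 slots needs only the keys for prefixes "0" (slots 0-7) and "10" (slots 8-11): two values instead of twelve, and in general O(log n).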
Another potential fix is to use public-key cryptography and only reveal the daily public keys; however, this requires IDs twice as large, and matching requires trying to decrypt/signature-check all received IDs instead of being able to generate them and look them up.
Your suggestions don't seem different from what the spec already describes. Tests are not immediate, and the incubation period of the disease means you have to share multiple diagnosis keys (days) from infected persons anyway. You don't have to share time slots within a day; they can be derived from the daily key. The impersonation risk is unlikely: whichever health authority applies can simply invalidate all newly identified keys from generating new contacts, preventing replay attacks derived from known infected keys with simple, coarse timestamps.
A simple solution using properties of the virus would be to just delay the release of the last ID. It takes a while before the viral load inside someone becomes high enough to be infectious, so there is no significant harm in the last ID being delayed by up to 24 hours in the worst case.
Which part of the spec do you think people who care about privacy will object to? I agree with you that this is a poor choice of wording but I think your interpretation is uncharitable.
I think this is a very innovative solution that enables contact reporting without knowing location or personal details at all, and it's exclusively opt-in.
I see some people arguing "yes, but it could be subverted", but if you just want to monitor people and know who is talking to whom, this isn't a good place to begin; there are much better ways to do that already available.
Think of the daily key as the seed to a random number generator. If two people pass the same seed into the same random number generator, they can generate the same list of 500 random numbers. This provides a compact way for someone to say: "I just learned that I was infected. These are the 500 identifiers I broadcast on that day. If you recognize one of them, then you might also be infected."
I understand that aspect of it; I'm just confused as to how only having the daily key is enough to generate the identifiers. Wouldn't they also need the TimeIntervalNumber, according to the function?
If each phone generated 500 numbers per day, then TimeIntervalNumber is a number in the range of 1...500. So generate 500 codes using all of the numbers in that range. If any of those 500 codes match one of the codes that you actually saw in the wild, then you were near that person.
Thanks! I actually brainstormed with a friend later that day on how it'd work and we finally came to a similar conclusion.
According to the spec, the phone only generates a new identifier when the MAC address changes or on a new day. But since identifiers are generated against a 10-minute time window, you'd try to derive their IDs with all 144 possible time windows for that day. And if you find one of those IDs in your list of contacts, then you know you were in contact with someone infected.
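The derive-and-intersect matching described above can be sketched in a few lines. The HMAC construction, the "CT-RPI" label, and the 16-byte truncation below are illustrative assumptions, not the spec's exact constants; the point is that the daily key plus the interval number deterministically regenerates all 144 rolling IDs.

```python
import hmac
import hashlib

def rolling_id(daily_key: bytes, interval: int) -> bytes:
    # Derive one 10-minute rolling identifier from the daily key.
    # Label and truncation length are assumptions, not the spec's constants.
    mac = hmac.new(daily_key, b"CT-RPI" + interval.to_bytes(4, "big"), hashlib.sha256)
    return mac.digest()[:16]

def matches(daily_key: bytes, observed: set) -> list:
    """Re-derive all 144 ten-minute IDs for one day and intersect
    them with the identifiers this phone actually saw nearby."""
    return [i for i in range(144) if rolling_id(daily_key, i) in observed]
```

A client that downloads a published diagnosis key runs `matches` against its locally stored contacts; a non-empty result means it was near that person, and the matching interval numbers even indicate roughly when.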