Telco data and Covid-19: A primer
Governments around the world are rushing to leverage the metadata held by mobile service providers in order to track the movements of a population.
This sort of population and movement tracking is neither new nor novel - indeed, PI have been pushing back against measures of this type for two decades. We have seen telecommunications data utilised in building "smart cities", in tracking protesters, and in arrests... including of innocent people, dissidents, and journalists.
- Because Governments get these data directly from providers, you cannot choose whether to be part of this form of tracking, and do not know whether you have been tracked
- Telco data tracking is by definition not opt-in, and cannot be opted out of, as it consists of records of the connections your mobile phone has made to masts
- Telco data is highly sensitive, and must be aggregated with other data to de-identify it - at which point, can it be accurate enough to actually track the spread of a virus based on 1-1 proximity?
In a scramble to track, and thereby stem, the flow of new cases of Covid-19, Governments around the world are rushing to monitor the locations of their populace. One way to do this is to leverage the metadata held by mobile service providers (telecommunications companies, or "Telcos", such as Hutchison 3 (also known as Three), Telefonica (also known as O2), Vodafone, and Orange) in order to track the movements of a population, as seen in Italy, Germany and Austria, and with the European Commission.
Metadata?
Metadata are data describing other data - literally "data about data".
We describe metadata and their risks in our report for the ICRC as follows:
Let’s imagine you work at the post office. Every day, you receive packages. For each package, you note the return address, the date it arrived, and the person to whom it is addressed. Moreover, you might have a vague idea of what the package contains based on the company who sent it. Given the wrapping and the time of the year, you might also infer that it’s a gift. All in all, you were able to get a lot of information about this package – without ever opening it.
Now, let’s imagine that you’re a telephone operator. It’s late at night, and you’re asked to connect a hospital’s emergency room to the president’s personal phone line. Between the recent headlines you’ve read in the press and the brevity of the phone conversation, you’re able to guess the kind of news that’s just been shared.
Neither scenario involved any eavesdropping or device tampering. All the information you obtained – the metadata – was consequent to the communication itself. These metadata were made accessible to you, a third party, without the say of the individuals involved. This was an inevitable result of the communication simply taking place: the content of people’s interactions was revealed by correlating observed metadata, legally obtained through their use of an intermediary platform. At no point was any right to privacy explicitly forfeited.
Wait, my phone operator is tracking me?
As covered in some depth in our report on metadata in humanitarian contexts, the mobile telephone network is, by design, also a tracking network. We also explore these data in more depth in our long-read piece, here.
In order to try and maintain a signal whilst moving, as well as to connect you to the "best" tower, mobile phones send constant "pings" to towers in their vicinity, meaning their position can be easily triangulated.
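As a rough illustration of how a handful of tower "pings" narrows down a position, the sketch below estimates a phone's location as a signal-weighted average of tower coordinates. This is a deliberately simplified stand-in: real networks can draw on timing, sector and signal measurements, and the `TowerReading` type, its `signal_weight` field and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TowerReading:
    """One tower that heard the phone (all values are made up for illustration)."""
    x_km: float
    y_km: float
    signal_weight: float  # hypothetical: larger means a stronger signal


def estimate_position(readings):
    """Weighted centroid: towers with a stronger signal pull the estimate closer."""
    total = sum(r.signal_weight for r in readings)
    if total <= 0:
        raise ValueError("need at least one reading with positive weight")
    x = sum(r.x_km * r.signal_weight for r in readings) / total
    y = sum(r.y_km * r.signal_weight for r in readings) / total
    return x, y


readings = [
    TowerReading(0.0, 0.0, 1.0),
    TowerReading(4.0, 0.0, 1.0),
    TowerReading(0.0, 4.0, 2.0),  # strongest signal, so the estimate leans this way
]
print(estimate_position(readings))  # (1.0, 2.0)
```

Even this crude approach places the phone within the area between the three masts; repeated over a day, it yields a movement profile.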
So what's the problem?
A crucial difference between using Telco data and app-based tracking is that by definition Telco data is not opt-in, and cannot be opted out of. Indeed, in several countries around the world, Telcos are legally compelled to store these records. Unlike Bluetooth proximity tracking, which relies on users of an app being physically close to each other, Telco data are generated as a result of the connections your mobile phone makes to masts.
This sort of tracking, by definition, can never be truly anonymised. It is some of the most intrusive tracking available; as a by-product of the network itself, there is no way to avoid it other than by leaving your phone behind, putting it in a Faraday cage, or (if possible) removing the battery - although in Taiwan this could lead to the police knocking on your door.
This all sounds a little abstract...
Want to see the true potential impact of ignoring social distancing? Through a partnership with @xmodesocial, we analyzed secondary locations of anonymized mobile devices that were active at a single Ft. Lauderdale beach during spring break. This is where they went across the US:
@TectonixGEO on Twitter, 25th March 2020
We've seen Saudi Arabia tracking its citizens' movements around the US, and the Mexican Government has reportedly spent $5 million on this tracking technology.
In 2012, a German politician, Malte Spitz, teamed up with Die Zeit to turn the data held by his domestic mobile service provider into an interactive map.
“This profile reveals when Spitz walked down the street, when he took a train, when he was in an airplane. It shows where he was in the cities he visited. It shows when he worked and when he slept, when he could be reached by phone and when was unavailable. It shows when he preferred to talk on his phone and when he preferred to send a text message. It shows which beer gardens he liked to visit in his free time. All in all, it reveals an entire life.”
In France, Orange are currently selling access to aggregated subscriber data to track people's movements after mass events such as festivals.
But don't just take our word for it. Watch the following promotional video from a company providing direct access to these data:
They can anonymise it though, right?
There are inevitable trade-offs in any data set. Given Telco data is the tracking information of all devices using their infrastructure, by definition it cannot be anonymised. As the Dutch Data Protection Authority has ruled, anonymising such location data is not possible, because the anonymisation is never irreversible. Instead, the aim will be to de-identify it; that is, to remove the ability to identify an individual phone in a data set.
Tower connection information alone is rough, and in order to de-identify it, the data must be aggregated (and have statistical noise added) until a given phone can no longer be identified - at which point, can it be accurate enough to actually track and stem the spread of a virus, when it's based on 1-1 proximity?
As discussed in our ICRC report, even at population scale there are inherent time-shifted risks to this sort of tracking measure.
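To make the trade-off concrete, here is a minimal sketch of one common de-identification pattern: count distinct devices per tower and hour, drop small groups, and add random noise to what remains. The record layout `(device_id, tower_id, hour)`, the `min_count` threshold and the `noise_scale` value are assumptions chosen for illustration, not a description of what any Telco actually does.

```python
import random


def aggregate_tower_counts(records, min_count=10, noise_scale=2.0, rng=None):
    """Turn (device_id, tower_id, hour) records into noisy per-tower counts."""
    rng = rng or random.Random()
    devices_per_cell = {}
    for device_id, tower_id, hour in records:
        devices_per_cell.setdefault((tower_id, hour), set()).add(device_id)

    released = {}
    for cell, devices in devices_per_cell.items():
        if len(devices) < min_count:
            continue  # suppress small groups, which are the easiest to re-identify
        # The difference of two exponential draws is Laplace-distributed noise.
        noise = rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
        released[cell] = max(0, round(len(devices) + noise))
    return released


# Hypothetical records: 12 devices near tower "A" at 09:00, 2 near tower "B".
records = [(f"dev{i}", "A", 9) for i in range(12)]
records += [("x1", "B", 9), ("x2", "B", 9)]

released = aggregate_tower_counts(records, rng=random.Random(0))
print(sorted(released))  # [('A', 9)]
```

The output contains no device identifiers at all, only an approximate head count per mast, which is why such a data set cannot say who was near whom.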
Aggregation?
Even when aggregating telco data, because of the highly personalised nature of the records, it is practically impossible to anonymise the data set and keep it useful.
In the 2014 paper "On the anonymizability of mobile traffic datasets", the authors conclude that:
[...] mobile traffic fingerprints tend to have a non-negligible number of elements that are much more difficult to anonymize than the average sample. These elements, which determine a characteristic dispersion and long-tail behavior in the distribution of fingerprint sample distances, are mainly due to a significant diversity along the temporal dimension. In other words, mobile users may have similar spatial fingerprints, but their temporal patterns typically contain a non-negligible number of dissimilar points.
It is the presence of these hard-to-anonymize elements in the fingerprint that makes spatiotemporal aggregation scarcely effective in attaining anonymity. Indeed, in order to anonymize a user, one needs to aggregate over space and time, until all his long-tail samples are hidden within the fingerprints of other subscribers. As a result, even significant reductions of granularity (and consequent information losses) may not be sufficient to ensure non-uniqueness in mobile traffic datasets.
As a concluding remark, we recall that such uniqueness does not implies[sic] direct identifiability of mobile users, which is much harder to achieve and requires, in any case, cross-correlation with non-anonymized datasets. Instead, uniqueness is a first step towards re-identification. Understanding its nature can help developing mobile traffic datasets that are even more privacy-preserving, and thus more easily accessible.
Essentially - we may all use our phones in similar ways, but our individual usages are unique enough that to anonymise a given phone requires the data set to be so coarse as to be fairly useless.
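A toy example can show why. The sketch below is not the paper's method; it simply treats each user's set of (tower, hour) sightings as a fingerprint, coarsens it, and counts how many users still have a fingerprint nobody else shares. The integer tower IDs, the `traces` data and the binning parameters are all hypothetical.

```python
from collections import Counter


def unique_fraction(traces, hours_per_bin=1, towers_per_region=1):
    """Fraction of users whose coarsened fingerprint is shared with no one else."""
    fingerprints = {
        user: frozenset(
            (tower // towers_per_region, hour // hours_per_bin)
            for tower, hour in points
        )
        for user, points in traces.items()
    }
    counts = Counter(fingerprints.values())
    unique = sum(1 for fp in fingerprints.values() if counts[fp] == 1)
    return unique / len(fingerprints)


# Hypothetical traces: user -> list of (tower_id, hour_of_day) sightings.
traces = {
    "a": [(0, 8), (1, 18)],
    "b": [(0, 9), (1, 19)],
    "c": [(0, 8), (1, 23)],
    "d": [(2, 3), (3, 14)],  # an unusual routine: the "long tail"
}

print(unique_fraction(traces))                                         # 1.0
print(unique_fraction(traces, hours_per_bin=6))                        # 0.25
print(unique_fraction(traces, hours_per_bin=24, towers_per_region=4))  # 0.0
```

Users "a", "b" and "c" blend together once time is binned into six-hour blocks, but "d" only disappears into the crowd when every sighting collapses into a single region and a single day, by which point the data says almost nothing about movement.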
So why are they asking for it?
Many Governments already have access to the raw data - whether through legal instruments, or otherwise. Perhaps this is an attempt to "parallel construct" that access, or perhaps it's to put a "socially acceptable" coating on it.
If the data set is properly aggregated and de-identified, deanonymising an individual from it becomes harder, as it is describing population behaviours rather than those of individuals. However, we remain incredibly wary of the reuse of this data for predictive policing, smart cities, and many other uses.