Digital National ID systems: Ways, shapes and forms
PI has put together an analysis of some of the most used Foundational ID systems worldwide.
Governments around the world are increasingly making registration in national digital ID systems mandatory for populations, justifying the need for them on grounds ranging from facilitating access to services to national security and fighting corruption. This is an attempt to create a "foundational identity" for an individual, or "a single source of truth" about who someone is, according to a government agency. These identity systems are run by governments, sometimes by private companies, or by a combination of both.
ID systems are the gatekeepers to an increasingly wide range of goods and services, which can limit the exercise of a range of human rights. The ID system designer or manager not only has control over what people have access to, but may also make the use of the system mandatory.
While governments and other proponents of ID systems highlight their potential benefits, little attention and public debate have focused on the potential harms that come from the implementation of such systems, notably:
- Exclusion - People can be excluded from accessing some public services as the result of not having an ID, because of discriminatory application, technical or logistical barriers, or enrolment and verification not being possible. Sometimes the capture of biometric identifiers such as fingerprints is made mandatory to enrol in these systems, despite not everyone having "readable" fingerprints. We have seen governments going forward with such measures despite expecting up to 15% failure rates in fingerprint authentication, because a large share of the population depends on manual labour, which abrades people's hands and fingertips. Similarly, exclusion is likely to happen if someone ends up with an ID that they are not able to make use of (e.g. if the gender marker assigned on the ID is different from their self-identified gender).
- Exploitation - People's data can be exploited through their use and processing within identity schemes. This is particularly the case when a 'unique identifier' is introduced. A 'unique identifier' is a unique number or code, for example an ID number, through which government and the private sector are able to connect together various data sets. The prevalence of this unique identifier across multiple government or private sector databases carries the risk of providing a “360 degree view” of an individual. On top of this, it raises concerns about further data processing for purposes beyond the initial legitimate purpose.
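To make the linking risk concrete, here is a minimal sketch of how records held in unrelated databases can be merged once they share the same ID number. The three databases, their field names, the ID format and the `build_profile` helper are all invented for illustration and do not describe any real system.

```python
# Hypothetical records held by three unrelated databases, each keyed by the
# same national ID number. All names and values are invented for illustration.
health_db = {"ID-1001": {"condition": "diabetes"}}
bank_db = {
    "ID-1001": {"account_balance": 250},
    "ID-2002": {"account_balance": 90},
}
telecom_db = {"ID-1001": {"phone": "555-0100"}}


def build_profile(id_number, **databases):
    """Merge every record that shares id_number into a single profile."""
    profile = {"id_number": id_number}
    for source_name, records in databases.items():
        record = records.get(id_number)
        if record is not None:
            # Prefix each field with its source so the origin stays visible.
            for field, value in record.items():
                profile[f"{source_name}.{field}"] = value
    return profile


profile = build_profile(
    "ID-1001", health=health_db, bank=bank_db, telecom=telecom_db
)
print(profile)
```

No matching on names, addresses or dates of birth is needed: the shared identifier alone is enough to join the data sets, which is why its spread across databases matters.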
- Surveillance - ID systems can be used as tools of surveillance within a broader surveillance infrastructure, often leading to disproportionate and unnecessary interference with our privacy and enabling violations of other human rights. Since 9/11, governments have sought to justify the imposition of ID systems, and the indiscriminate tracing and tracking of individuals within countries and across borders, in the name of national security.
These harms can occur at multiple points along the identity system - from the issuing of a document or credential and the information required to do so, through to how the data is stored and processed, down to the time when someone is requested or required to show their credentials for verification. Understanding this fact should lead governments to be cautious about new technological approaches for identification.
When, despite these concerns, governments decide to go forward with implementing foundational ID systems, they are still left with the big question of how to roll out such systems, both technically and in practice. National ID schemes can take various forms: some have smartcards, some don't; some rely on biometrics, some don't; some are linked to citizenship, some aren't. Some link to birth registration, some don't. Some link to voter rolls, some don't. Some give everyone a unique number, some don't. Some have advanced security features, some are left for the deploying end to laminate.
Most used Foundational ID systems worldwide: an analysis
With this in mind, PI has put together this analysis of some of the most used Foundational ID systems worldwide. This analysis is based on publicly available information, collated into this one article to make it easier for our readers to compare the characteristics of the commonly used identity systems. This is a developing piece and more systems will be added over time. A summary of our findings can be found in the table below. At the time of writing, the findings in this piece are based on our analysis of MOSIP, Aadhaar and e-Estonia:
What our analysis includes
For each individual analysis we explore and present:
An overview of the ID system
This includes information about where and when it was developed and by whom. We also include a list of advocates, representatives and funders for each particular system, so decision-makers can have an idea of who the particular solution is being pushed by.
In this overview we also provide information about whether the system is open source or proprietary. Open source systems are characterised by an underlying code that is made publicly available. This means anyone can inspect, modify and distribute the code, so it is developed in a decentralised way, relying on peer review and community production. It is important that government-funded software is made open source, and an open source approach to the development of an ID system has several advantages:
- increased transparency, as anyone can see how implementation is done in practice,
- the project becomes open to a large community of developers, testers and other contributors who can constantly provide feedback, contribute actively with new features, or fix bugs,
- open source solutions are often cheaper,
- they are more flexible, and
- they offer more longevity than proprietary alternatives because they are developed by communities rather than a single author or company.
Some might argue that making a codebase open source is a source of risk, as sufficiently motivated malicious attackers can take their time to look for vulnerabilities or ways to exploit the code. In reality, keeping the code away from the eyes of the public will not stop such attackers; it only means missing out on public scrutiny and community development.
From the foundational ID systems currently analysed two out of the three rely on open source software:
- MOSIP is fully open-source and all its code is publicly available.
- e-Estonia services rely on X-Road - an open source data exchange layer. On top of this, earlier this year the Estonian government decided to make all government software publicly available in this repository.
- Aadhaar relies on proprietary technology that belongs neither to the Indian government nor to the UIDAI (Unique Identification Authority of India), which makes the system a "black box" in the sense that one cannot know how operations are de facto being handled.
Insights into the infrastructure makeup
Authentication attributes and personal data need to be appropriately safeguarded. Encryption is the process through which data is encoded so that it remains inaccessible to unauthorised users. On top of storing and dealing with people's personal information such as name and date of birth, several foundational ID systems across the world increasingly rely on biometric data, which data protection frameworks largely recognise as sensitive, and therefore as a special category of personal data.
The use of such sensitive and uniquely identifying personal data should mandate that the tightest security safeguards are in place. Encryption is crucial to keeping data safe from unwanted third parties and to provide users with a reliable authentication process, providing veracity when determining if an entity - user, server, or client app - is who it claims to be.
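As a small illustration of what "determining if an entity is who it claims to be" can look like in code, the sketch below shows a challenge-response check built on a keyed hash (HMAC) from Python's standard library. Strictly speaking this is message authentication rather than encryption, and it assumes both sides already share a secret key. The function names are ours, and this is not a description of how any of the systems analysed here authenticate users.

```python
import hashlib
import hmac
import secrets


def issue_challenge():
    """Verifier side: create a fresh random value for this attempt only."""
    return secrets.token_bytes(32)


def sign_challenge(shared_key, challenge):
    """Claimant side: prove knowledge of the key without sending the key."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify_response(shared_key, challenge, response):
    """Verifier side: recompute the expected answer and compare safely."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, response)


# Hypothetical keys, fixed here only to keep the example reproducible.
genuine_key = b"a" * 32
wrong_key = b"b" * 32

challenge = issue_challenge()
print(verify_response(genuine_key, challenge, sign_challenge(genuine_key, challenge)))
print(verify_response(genuine_key, challenge, sign_challenge(wrong_key, challenge)))
```

Because every challenge is random and used once, a recorded response cannot simply be replayed later. A real deployment would also need secure key storage and distribution, which this sketch leaves out.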
In any modern software the use of strong encryption should be assumed, and as such we will generally not discuss implementations unless we cannot find any information, or what is being used is wildly and obviously dangerous (such as using encryption protocols which are obsolete or no use of encryption at all).
Is user data stored in a centralised or decentralised way?
Electronic databases are the preferred option to store, reference, validate and authenticate identity data. These databases can exist as a central repository or as distributed systems depending on the country and the implemented solution.
Physical vs Logical (de)centralisation
(De)centralisation can take two forms. The first is physical: distributing the database across several geographically dispersed computers, either through a process known as "sharding", where different parts of the database are held in different places, or with identically replicated databases spread over several computers. The second is logical: holding logically separate data in separate (preferably geographically dispersed) databases which use some form of API (Application Programming Interface) to interact with each other.
Whilst logical decentralisation is a design decision, physical decentralisation should be expected in any at-scale system to reduce (and attempt to eliminate) single points of failure, as well as potentially speeding up access by "moving" the database being interacted with physically closer to the user.
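The idea behind sharding can be sketched in a few lines. The example below routes each record to one of several stores based on a hash of its ID number, so that no single store holds the whole data set. The shard names and the `ShardedStore` class are hypothetical, and in-memory dictionaries stand in for what would be separate machines in a real deployment. Replication, the other physical approach mentioned above, is not shown.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(id_number, shards=SHARDS):
    """Pick a shard deterministically from a hash of the ID number."""
    digest = hashlib.sha256(id_number.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


class ShardedStore:
    """Toy store: each dictionary stands in for a separate machine."""

    def __init__(self, shards=SHARDS):
        self._names = list(shards)
        self._shards = {name: {} for name in self._names}

    def put(self, id_number, record):
        self._shards[shard_for(id_number, self._names)][id_number] = record

    def get(self, id_number):
        return self._shards[shard_for(id_number, self._names)].get(id_number)

    def shard_sizes(self):
        return {name: len(data) for name, data in self._shards.items()}


store = ShardedStore()
for n in range(1000):
    store.put(f"ID-{n}", {"serial": n})

print(store.get("ID-42"))
print(store.shard_sizes())
```

One design consequence of this simple modulo scheme is that changing the number of shards moves most records to a different shard, which is why production systems tend to use more elaborate routing.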
Insights into how de-duplication is handled
Identity de-duplication is a procedure used to attempt to 1:1 match a natural person to a unique "identity" within a system -- the ultimate goal being no cases where one person has two "identities", or where one "identity" is shared by more than one person.
Some systems may attempt de-duplication based purely on demographic data, while others will attempt to compare against biometric identifiers already stored by the database.
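A minimal sketch of the demographic approach is shown below, assuming each enrolment carries only a name and a date of birth. The normalisation rules, the `demographic_key` and `find_duplicates` helpers and the sample records are our own illustration and are not taken from any of the systems analysed.

```python
import unicodedata


def demographic_key(name, date_of_birth):
    """Reduce a name to a comparable form and pair it with the birth date."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop accents, ignore letter case and collapse repeated whitespace.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    normalised = " ".join(stripped.casefold().split())
    return (normalised, date_of_birth)


def find_duplicates(enrolments):
    """Return (first_id, later_id) pairs that share a demographic key."""
    seen = {}
    duplicates = []
    for enrolment_id, name, date_of_birth in enrolments:
        key = demographic_key(name, date_of_birth)
        if key in seen:
            duplicates.append((seen[key], enrolment_id))
        else:
            seen[key] = enrolment_id
    return duplicates


# Invented sample enrolments.
enrolments = [
    ("E1", "José  Pérez", "1990-01-01"),
    ("E2", "jose perez", "1990-01-01"),
    ("E3", "Jose Perez", "1991-01-01"),
]
print(find_duplicates(enrolments))
```

The sketch also shows the limits of the approach: two different people who share a name and date of birth would be flagged as one, while one person enrolling twice under a different name would pass unnoticed.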
False Positives
In the particular case of Aadhaar for example, the UIDAI published a report with a proof of concept on biometric de-duplication. In this report, the UIDAI states that the occasional false positive is expected due to reasons such as:
- faulty biometric collection equipment.
- human error in assigning biometrics to the correct "unique" profile.
- one person enrols twice with a different name.
- two people may happen to have the same biometrics.
This proof of concept is based on a sample of 40,000 people where the UIDAI (Unique Identification Authority of India) is aiming for a false positive identification rate of 0.0025%, meaning 2.5 false positives for every 100,000 comparisons. In this context a false positive means a biometric system incorrectly matches an individual to someone else’s biometric identifiers. This may not seem like a lot but when scaling up the sample size to India's population of 1,380,000,000 it adds up to each person having over 17,000 false positives to resolve when using their biometrics for identification purposes.
In practice this means that solely based on biometrics an individual cannot prove to be one of 1,380,000,000 citizens. Rather they can only prove that they are one of the 17,000 people with that one biometric identifier, within the whole population. This shows that in reality we are dealing with a lot more than occasional false positives when doing biometric deduplication and that it is impossible to claim that all the IDs in UIDAI's CIDR (Central Identities Data Repository) are biometrically unique. You can find a breakdown of the simple maths used to get to these numbers on our research into Aadhaar.
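The scaling effect described above can be explored with a few lines of arithmetic. The sketch below assumes the simplest possible model, in which every comparison has the same independent chance of a false match, and uses a hypothetical helper `expected_false_matches`. It is not the breakdown from our Aadhaar research, which may rest on different assumptions.

```python
from fractions import Fraction


def expected_false_matches(rate_percent, comparisons):
    """Expected number of false matches at a given per-comparison rate.

    rate_percent is a string such as "0.0025" (meaning 0.0025%) so that the
    calculation stays exact instead of picking up floating point error.
    """
    rate = Fraction(rate_percent) / 100
    return rate * comparisons


# The rate quoted in the text: 0.0025%, or 2.5 per 100,000 comparisons.
per_hundred_thousand = expected_false_matches("0.0025", 100_000)
print(float(per_hundred_thousand))  # 2.5

# The same rate applied to one search against a population-sized gallery.
population = 1_380_000_000
print(float(expected_false_matches("0.0025", population)))
```

Under this naive linear model a single search against the whole population yields false matches in the tens of thousands, the same order of magnitude as the figure quoted above. The exact number depends on the assumptions made about the comparison process.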
Principles of engagement for each system
The technological choices in the design and deployment of digital ID systems can express a variety of political motives, which impact how societies operate at the deepest level. The design, deployment and governance of large socio-technical systems can establish a different public order and power entrenchment among social and political groups.
With this in mind, besides presenting information on the technical infrastructure makeup of each ID system, we will also delve into the engagement principles or governing principles published by developers, when existent. Such guidelines on external conditions and safeguards present what developers believe should be in place so that the risk of abuse and exploitation in the use of their tool is minimised.
Countries where the particular ID system is deployed or where there's evidence it will be deployed in the future
Aadhaar:
- India (2009)
MOSIP:
- Morocco (2018)
- Philippines (2019)
- Guinea (2021)
- Ethiopia (2020)
- Sri Lanka (to sign soon)
e-Estonia (X-Road):
- Estonia (2001)
- Finland (2017)
- Azerbaijan (2018)
- Faroe Islands (2016)
- El Salvador (2017)
Reported examples of abuse in countries where the researched ID system is in use
In our research, PI found reported examples of abuse in all the systems we analysed, despite the fact that some of them claim to be 'safe', 'inclusive' and 'privacy friendly'.
For instance, Aadhaar in India has had countless reports of abuse over the years including massive data leaks, exclusion from access to benefits and even issues around de-duplication. This list by itself contains almost 40 isolated cases of breaches between February 2017 and May 2018. The data leaked are not limited to Aadhaar numbers and demographic information. In some of the cases sensitive data such as data on pregnancy, people's religion and caste and even bank details were leaked alongside Aadhaar numbers.
The Indian government has also made enrolment in Aadhaar a mandatory requirement to access a myriad of social protection schemes. This measure creates thick barriers in accessibility for a lot of people and has led to cuts in accessing food rations which has been linked to several deaths by starvation across the country.
On top of this, the Indian government and the Unique Identification Authority of India (UIDAI) have ignored privacy concerns as well as sample test results from the pilot project, which showed that there could be up to 17,000 false positives each time an Indian citizen engages in an identification process. This systematic failure has led to cases where one person somehow ends up getting two different "unique" Aadhaar numbers.
MOSIP is being deployed in Morocco and there have been concerns regarding exclusion through language. Morocco’s General Directorate of National Security announced a new generation of identity cards in 2020, but according to a draft law the card would only be in Arabic - one of the two official languages of the country - and French - a foreign non-constitutional language, leaving Tamazight - the second official language - out. This goes directly against regulations aimed at gradually including Tamazight in Morocco’s public life and encouraging the usage of Tamazight, alongside Arabic, in administrative documents, including national identity cards.
Regarding countries where e-Estonia or X-Road based ID systems have been implemented, there haven't been any similar reports of abuse to this day.
Methodology
The information presented about each of the systems was collected from publicly available sources. As stated above, some of the analysed systems are open source, meaning they offer greater transparency, including public access to the source code being used. This allows anyone to confirm any technological statements made about the system, which we must stress is an invaluable feature. At the opposite end, there are "black box" systems, for which our research was based on documentation made publicly available by the ID system designer or owner. In these cases it becomes impossible for the public to verify that the system behaves as described in the documentation, and hence the public's knowledge is trust-based and limited to whatever information is deemed fit for public consumption.