Why we should regulate information about persons, not “personal information”

 
[Photo: People walking outside U of T's Arts and Science Building.]

Schwartz Reisman Research Leads David Lie and Lisa Austin, along with Faculty Affiliates Nicolas Papernot and Aleksandar (Sasho) Nikolov, comment on aspects of Canada’s newly-proposed privacy law reform. This piece is the fifth in a series of posts on the features, implications, and controversies surrounding privacy law reforms in Canada and around the world in an increasingly digital and data-rich context.


Data protection statutes—like PIPEDA and Canada’s proposed new Bill C-11, but also the Privacy Act—are modelled on the Fair Information Practice Principles (FIPPs). These laws all share the same basic legal architecture in that they regulate the collection, use, and disclosure of “personal information.” In this blog post we argue that we should shift from regulating “personal information” and instead regulate “information about persons.”

As we have outlined in past blog posts, there are many practical and definitional difficulties associated with the category of “personal information” and related ideas such as “de-identify.”

Concepts such as “personal information” and “de-identification,” and our standard ways of defining them, do not easily map onto the kinds of advanced data analytics in use today (such as complex computational tools that can forecast, predict, and draw insights from massive amounts of data). 

This should come as no surprise. The FIPPs were developed in the 20th century and predate the internet, social media, big data, advanced profiling, AI, and the internet of things. FIPPs also predate important computer science research that can provide us with better methods to minimize privacy risks. 

We cannot protect privacy and other important values in the 21st century with 20th century paradigms. 

While the question of whether a person is “identifiable” from some mass of information is central to many privacy concerns, it does not capture the many ways in which contemporary data processing creates privacy vulnerabilities. For example, an individual might not be identifiable in any single database, yet the accumulation of information about that individual across multiple databases can increase what is known about them. Waiting for that risk to materialize into a “reasonably foreseeable” risk of identifiability (which is the general legal standard) is to address responsible data analysis too late in the data pipeline.
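To make the accumulation point concrete, here is a toy sketch in Python. The datasets, field names, and records are all invented for illustration; the point is only that two datasets which are each non-identifying on their own can be joined on shared attributes to learn more about a specific person.

```python
# Hypothetical illustration: two datasets that are each "non-identifying" on their
# own can, when combined on shared quasi-identifiers, narrow the set of candidates
# for a specific individual. All records and fields here are invented.

hospital_visits = [
    {"postal_prefix": "M5S", "birth_year": 1985, "diagnosis": "asthma"},
    {"postal_prefix": "M5S", "birth_year": 1990, "diagnosis": "diabetes"},
    {"postal_prefix": "M4C", "birth_year": 1985, "diagnosis": "asthma"},
]

fitness_app_users = [
    {"postal_prefix": "M5S", "birth_year": 1990, "daily_steps": 11200},
    {"postal_prefix": "M4C", "birth_year": 1985, "daily_steps": 3400},
]

# Join the two datasets on the shared quasi-identifiers.
linked = [
    {**visit, **user}
    for visit in hospital_visits
    for user in fitness_app_users
    if visit["postal_prefix"] == user["postal_prefix"]
    and visit["birth_year"] == user["birth_year"]
]

# Each linked record now combines health and behavioural attributes for a unique
# (postal_prefix, birth_year) combination in this toy example -- more is known
# about that person even though neither dataset contains a name.
for record in linked:
    print(record)
```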

This is why we disagree with risk-based approaches that focus only on identifiability. 

Waiting for privacy vulnerabilities to materialize into a “reasonably foreseeable” risk of identifiability is to address responsible data analysis too late in the data pipeline. 

Consider the analogy of speeding on a highway: I might not be wronging any specific individual when I speed, but my behaviour is risky and we regulate it through imposing speed limits in order to reduce this risk. Similarly, my data processing might not identify any specific individual, but my methods might still be risky and we need to regulate to reduce this broad risk—rather than focus on identifiability only (later in this post, we discuss defining “privacy risks” to include both the risk of identifiability and the risk of inferring information about a specific individual even when that individual is not identifiable).

The way to do this is to ensure that all organizations that process information about persons use reasonable practices to minimize these risks. As we have already outlined, these practices are not just about manipulating the data; they also involve attention to the algorithms and the computing environment more generally.

Although the idea of “identifiability” should no longer be the basis upon which we determine the scope of regulation, it might still be relevant for some specific obligations, such as determining when the appropriate authorization for data processing should be individual consent. We therefore propose some additional categories of information.

Here is how an alternative legal architecture could work. 

Alternative framework

The basic features of our alternative framework are as follows:

  1. The law regulates “information about persons.”

  2. This includes “functionally non-identifying information” (defined below).

  3. Organizations processing information about persons are subject to a re-formulated and updated set of FIPPs (outlined below). Some of these obligations will vary, depending on whether the information is identifiable, so the idea of “personal information” remains relevant but shifts from its current role in delineating the scope of regulation to a role in determining the nature of some specific obligations (like consent).

Important definitions 

Information about persons means information that relates to individuals and groups of individuals.

Personal information means information about persons that is neither anonymous information nor functionally non-identifying information.

Functionally non-identifying information means information about persons that is managed through the use of privacy-protective techniques and safeguards in order to ensure that identifying an individual is not reasonably foreseeable in, or made significantly more likely by, the context of its processing and the availability of other information.

Privacy-protective techniques and safeguards mean techniques and safeguards that can include any of the following, including in combination:

  1. modifying personal information

  2. creating information from personal information

  3. privacy-protective analysis

  4. controlled access through a trusted computing environment

  5. as prescribed

Privacy risks mean the processing of information about persons in a manner that increases the risk of identifying an individual or inferring information about a specific individual.

Processing means any operation or set of operations which is performed on information about persons, whether or not by automated means, such as collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, statistical analysis, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction.

We use the term “non-identifying” rather than “de-identify” because of the shortcomings of the latter, already documented by us (see our posts on exceptions to consent for de-identified data and the inadequacy of de-identification techniques currently in use). The term “functionally” is added in order to highlight that this is not simply about the properties of the data itself but also about the context in which it is processed, one which includes a consideration of other available information that is not part of the data in question. This definition also better expresses the idea that it is possible to minimally modify personal information but still be “functionally non-identifying” if other techniques and safeguards establish that such identifiability is “not reasonably foreseeable”.  The associated definition of “privacy-protective techniques and safeguards” highlights that there are many different strategies that are important, not just the manipulation of data, and that the important question is whether together these result in “functionally non-identifying information.”
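To illustrate one of these strategies, “privacy-protective analysis,” here is a minimal sketch in the style of differential privacy: an aggregate is released with calibrated noise so that the output does not depend too heavily on any single individual’s record. This is one hypothetical instantiation, not a prescription of any particular technique, and the function and parameter names are invented for the example.

```python
import random

def noisy_count(records, predicate, epsilon=0.5):
    """Return a count perturbed with Laplace noise -- a minimal sketch of
    "privacy-protective analysis" in the style of differential privacy.
    The epsilon parameter trades off accuracy against privacy protection."""
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two exponential draws with rate epsilon is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Hypothetical usage: an analyst learns an approximate aggregate without the
# released figure depending too strongly on any one individual's record.
records = [{"age": 34}, {"age": 41}, {"age": 29}, {"age": 57}]
print(noisy_count(records, lambda r: r["age"] >= 40))
```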

By defining “functionally non-identifying information,” this framework allows for there to be different obligations (such as consent) depending on whether information about persons is “personal information” or “functionally non-identifying information”. However, we separately define “privacy risks” to include both the risk of identifiability and the risk of inferring information about a specific individual even when that individual is not identifiable. This captures a broader set of privacy vulnerabilities than a focus on identifiability alone. This allows our framework to require that all processing of information about persons involve taking reasonable steps to minimize this broader set of privacy risks (see the “Privacy Risk Limitation Principle”  below).

We separately define “privacy risks” to include both the risk of identifiability and the risk of inferring information about a specific individual even when that individual is not identifiable. This captures a broader set of privacy vulnerabilities than a focus on identifiability alone.

Some other approaches, including in the EU and in Quebec’s Bill 64, exclude “anonymous” information from the scope of regulation, defining it in terms of whether the information “irreversibly” no longer identifies an individual.

While we think that different obligations might attach to information about persons depending upon the risk of re-identification, we also think that it is better to presumptively regulate all information about persons and then outline different obligations within the legislation where it can be regulated with greater specificity. For example, information about persons that can be released openly on a public release model with a low risk of re-identification might not attract the same obligations as information about persons that requires more safeguards in order to be “functionally non-identifying.” This might be different again from the obligations that attach to “personal information.” However, it might also be the case that in some contexts all such information should be subject to collective governance (e.g. the First Nations principles of OCAP® in relation to First Nations’ data and information).

It is better to presumptively regulate all information about persons and then outline different obligations within the legislation where it can be regulated with greater specificity.

We use the term “processing” instead of “collect, use, or disclose” because it better captures the complexities of contemporary data practices that do not always easily fit within these categories. This definition of processing is essentially copied from the GDPR definition but we add “statistical analysis” to clarify that this is included.

Rethinking the “Collection Limitation Principle”

The traditional “Collection Limitation Principle” in the FIPPs framework states that “[t]here should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.” There are two different ideas merged in this principle—an idea of data minimization (collecting only the minimum amount of data necessary for the purposes in question)  and an idea of authorization for data collection (including consent). 

A 21st century version of this principle requires three shifts. 

First, the idea of data minimization requires rethinking. Instead of a focus on personal information, the focus should be on the risks associated with the processing of information about persons; instead of a focus on collection, the focus should be on all processing. All processing of information about persons should involve the use of privacy-protective techniques and safeguards to reasonably minimize privacy risks during the processing, and as a result of the output of the processing. 

Second, relatedly, the name of the principle should be the “Privacy Risk Limitation Principle” to reflect this new focus.

Third, the idea of authorization (including consent) should be separated from the idea of privacy risk minimization. It should be an entirely different principle. This would help show that the “who decides?” question is different from the “have reasonable steps been taken to minimize privacy risks?” question. We take no position here on when consent is appropriate, but we note that our alternative legal architecture makes it possible to differentiate between how “personal information” and “functionally non-identifying information” are treated for these purposes. It also makes it easier to, at some point, create new kinds of authority for collective governance over both personal information and functionally non-identifying information, something that is of increasing interest in debates about data trusts, data collectives, and other such mechanisms.

There is technology that can help support different kinds of authorization. For example, there is technology designed to facilitate access control, which can help to ensure that data is only processed by persons with the appropriate authority and can provide the means to revoke access (such as when an individual withdraws consent). 
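As a hedged illustration of the kind of access-control support we have in mind, the sketch below gates access to records on both the caller’s role and a consent registry, so that withdrawing consent immediately revokes access. All class, function, and identifier names here are hypothetical.

```python
# A minimal sketch of consent-aware access control (hypothetical names throughout):
# data is only released to callers holding an appropriate role, and access can be
# revoked when an individual withdraws consent.

class ConsentRegistry:
    def __init__(self):
        self._consented = set()

    def grant(self, person_id):
        self._consented.add(person_id)

    def withdraw(self, person_id):
        self._consented.discard(person_id)  # revocation takes effect immediately

    def has_consent(self, person_id):
        return person_id in self._consented

def fetch_record(store, registry, caller_role, person_id):
    """Release a record only if the caller is authorized and consent is in place."""
    if caller_role not in {"analyst", "auditor"}:
        raise PermissionError("caller lacks the appropriate authority")
    if not registry.has_consent(person_id):
        raise PermissionError("consent has been withdrawn or never granted")
    return store[person_id]

# Hypothetical usage
store = {"p-001": {"postal_prefix": "M5S", "birth_year": 1990}}
registry = ConsentRegistry()
registry.grant("p-001")
print(fetch_record(store, registry, "analyst", "p-001"))
registry.withdraw("p-001")  # the individual withdraws consent
# fetch_record(store, registry, "analyst", "p-001")  # would now raise PermissionError
```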

New kinds of authority for collective governance over both personal information and functionally non-identifying information are of increasing interest in debates about data trusts, data collectives, and other such mechanisms.

Rethinking other principles

The “Collection Limitation Principle” is not the only principle that requires rethinking for the 21st century. 

To give just one additional example, the “Data Quality Principle” focuses on whether personal information is relevant and accurate for its purposes. But we can create machine learning models based on inaccurate and biased data that is not necessarily personal information. Such models can have a social impact, for example when machine learning is used to decide insurance premiums. Although other laws might be able to address some of the uses to which these models are put, we should address this earlier in the data pipeline by imposing data quality obligations on the developers of such models.
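To illustrate what a data quality obligation on model developers could look like in practice, here is a minimal, hypothetical sketch that flags missing values and severe label imbalance before any model is trained. The thresholds, field names, and records are invented for the example.

```python
# A minimal sketch of the kind of data quality check a model developer might be
# obliged to run before training (thresholds and field names are hypothetical).

def check_training_data(rows, label_field, required_fields, max_missing=0.05):
    """Flag basic quality problems: missing values and severe label imbalance."""
    problems = []
    n = len(rows)
    for field in required_fields:
        missing = sum(1 for r in rows if r.get(field) is None) / n
        if missing > max_missing:
            problems.append(f"{field}: {missing:.0%} missing exceeds {max_missing:.0%}")
    labels = [r[label_field] for r in rows if r.get(label_field) is not None]
    for value in set(labels):
        share = labels.count(value) / len(labels)
        if share < 0.1:
            problems.append(f"label '{value}' makes up only {share:.0%} of the data")
    return problems

# Hypothetical usage on a toy insurance-pricing dataset.
rows = [
    {"region": "ON", "claims": 2, "premium_tier": "high"},
    {"region": "ON", "claims": None, "premium_tier": "high"},
    {"region": "BC", "claims": 0, "premium_tier": "low"},
]
print(check_training_data(rows, "premium_tier", ["region", "claims"]))
```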

A shift from “personal information” to “information about persons” provides us with a better architecture to rethink some of these basic issues and develop a data governance framework that is truly suited to the 21st century.

Editor’s note: Bill C-11 failed to pass when Canada’s federal parliament was dissolved in August 2021 to hold a federal election. In June 2022, many elements of C-11 were retabled in Bill C-27. Read our coverage of C-27.

