Anonymous for Now: Demystifying Data De-Identification


Egin Kongoli is a 3L JD Candidate at Osgoode Hall Law School. This article was written as a requirement for Prof. Pina D’Agostino’s IP Innovation Program.


Canada is getting serious about consumer privacy, or so our lawmakers claim.

Parliament has recognized the public’s need for a data framework that ensures proper transparency and accountability.[i] Ottawa’s response is Bill C-27 and the proposed Consumer Privacy Protection Act (CPPA), meant to govern the future collection, use, and disclosure of personal information for commercial purposes. However, while the bill modernizes elements of the privacy framework, it carves out exceptions for de-identified and anonymized data that undermine the very trust the legislation is meant to foster. Resting tenuously on technological assumptions, these exceptions create a wild-west scenario ripe for harmful data practices.

Under the CPPA, organizations are not required to obtain user consent to de-identify personal information, a process that modifies data so that “an individual cannot be directly identified.”[ii] The legislation creates an offence for re-identification and so seems aware of the risk.[iii] A broader exception is nonetheless made for anonymization, by which an organization “irreversibly and permanently modif[ies] personal information… to ensure that no individual can be identified from the information, whether directly or indirectly, by any means.”[iv] The CPPA excludes anonymized data from its purview entirely because, by that definition, there is no reasonable prospect of re-identification.

This logic rests on several problematic assumptions. First, the line separating de-identified from anonymized data is vague and rarely obvious until re-identification occurs. De-identified data is, by its nature, not meant to be re-identified, and so appears anonymous under the government’s definition. Second, the law assumes organizations have the technological capability to ensure irreversible and permanent anonymization. While direct identifiers may be removed, combinations of seemingly innocuous data points, such as a postal code, birth date, and sex, can be cross-referenced against other datasets to recreate a person’s identity. One recent study estimated that 99.98% of Americans could be correctly re-identified in any dataset using just fifteen demographic attributes. One might imagine many disturbing consequences, from identity fraud to the cancer patient whose allegedly anonymous data is used to change their insurance coverage and rates.
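To make the mechanics concrete, the sketch below illustrates a classic linkage attack in Python. Every record, name, and attribute is hypothetical, and real attacks are more sophisticated; the point is only that a record with no name attached can be re-identified by joining it against a public dataset on shared quasi-identifiers.

```python
# Toy linkage attack: re-identifying "de-identified" records by joining
# them with a public auxiliary dataset on quasi-identifiers.
# All records and names below are invented for illustration.

# A "de-identified" health dataset: direct identifiers removed, but
# quasi-identifiers (postal prefix, birth year, sex) retained.
deidentified_records = [
    {"postal": "M5V", "birth_year": 1984, "sex": "F", "diagnosis": "melanoma"},
    {"postal": "K1A", "birth_year": 1991, "sex": "M", "diagnosis": "diabetes"},
]

# A public auxiliary dataset (e.g., a voter roll) pairing the same
# quasi-identifiers with names.
public_records = [
    {"name": "A. Tremblay", "postal": "M5V", "birth_year": 1984, "sex": "F"},
    {"name": "B. Singh",    "postal": "K1A", "birth_year": 1991, "sex": "M"},
    {"name": "C. Lee",      "postal": "M5V", "birth_year": 1979, "sex": "F"},
]

QUASI_IDENTIFIERS = ("postal", "birth_year", "sex")

def link(deidentified, public):
    """Join the two datasets on quasi-identifiers; a unique match
    re-identifies the record."""
    # Index public records by their quasi-identifier tuple.
    index = {}
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])

    for record in deidentified:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        matches = index.get(key, [])
        if len(matches) == 1:  # a unique match recovers the identity
            yield matches[0], record["diagnosis"]

for name, diagnosis in link(deidentified_records, public_records):
    print(f"{name} -> {diagnosis}")
# Output:
# A. Tremblay -> melanoma
# B. Singh -> diabetes
```

No identifier was ever “re-attached” to the health records; the identities fell out of an ordinary join. This is why the line between de-identified and anonymized data is so hard to draw before the fact.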

How can the disclosure and use of data be monitored if the law excludes anonymized data from regulation? Privacy enforcement may depend on individuals coming forward with complaints about the misuse of their data.[v] The system thus asks users not only to be aware of their data’s anonymization (which they never consented to) and its subsequent disclosure (kept secret from them) but to catch the bad actors re-identifying information that regulators turned a blind eye to. Our framework’s release-and-forget de-identification model thus opens the door to misuse of personal information that will remain altogether hidden from the regulator’s and the public’s view. Where is the transparency or accountability?

While the anonymization exception answers the growing demands of businesses seeking to use personal data, the current state of de-identification practice cannot guarantee the irreversible anonymization the CPPA presumes. The European Union’s GDPR, by contrast, keeps “pseudonymous” data, information stripped of direct identifiers but still capable of re-identification, within the scope of the law. That our lawmakers decided against regulating allegedly anonymous data raises the question of whether their priorities lie with the needs of the public or of commerce.


[i] Bill C-27, An Act to enact the Consumer Privacy Protection Act, the Personal Information and Data Protection Tribunal Act and the Artificial Intelligence and Data Act and to make consequential and related amendments to other Acts, 1st Sess, 44th Parl, 2022, preamble, para 8.

[ii] Ibid at s 2(1).

[iii] Ibid at s 128.

[iv] Ibid at s 2(1).

[v] Ibid at s 107.