Risks in AI Over the Collection and Transmission of Data

While various Artificial Intelligence (AI) tools, ranging from widely known consumer products such as the home assistant Siri and personal medical devices to business applications of natural language processing and deep learning, make our daily lives more convenient and pleasant, we should also begin to think about the emerging risks associated with these AI-enabled technologies. In particular, it is important to recognize the risks associated with the collection and transmission of data between consumer applications and their users.

The first risk factor is data quality, which arises from the technical side of AI. No matter how well-coded an AI algorithm is, its results depend heavily on the quality of the data entered as inputs. Volunteer sampling, for instance, may produce data that is not representative of the subjects’ attributes or that introduces unwanted bias. Duplicate, incorrect, incomplete, or improperly formatted records are bad data, and a data scrubbing tool can often remove them more cost-efficiently than fixing errors manually. Bad data is a major obstacle to employing AI and, as businesses increasingly embrace AI, the stakes will only get higher. For example, KenSci, a Seattle-based start-up, uses an AI platform to make health care recommendations to doctors and insurance companies based on medical datasets it has collected, classified, and labelled. If there are errors in the medical records, or in the training sets used to build its predictive models, the consequences could be fatal, as real patients’ lives are at stake.
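To make the idea of automated scrubbing concrete, the short sketch below shows one common approach: removing duplicate, incomplete, and improperly formatted records with the pandas library. The file name, column names, and cleaning rules are hypothetical illustrations only; this is not KenSci’s pipeline or any particular vendor’s tool.

```python
# A minimal data-scrubbing sketch. The CSV file and its columns
# ("patient_id", "diagnosis_code", "visit_date") are hypothetical.
import pandas as pd

def scrub(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Remove exact duplicate records.
    df = df.drop_duplicates()

    # Drop rows missing fields the downstream model cannot do without.
    df = df.dropna(subset=["patient_id", "diagnosis_code"])

    # Normalise inconsistent formatting (stray whitespace, letter case).
    df["diagnosis_code"] = df["diagnosis_code"].str.strip().str.upper()

    # Coerce dates; values that fail to parse become missing (NaT)
    # so they can be reviewed rather than silently guessed.
    df["visit_date"] = pd.to_datetime(df["visit_date"], errors="coerce")
    return df
```

Even a simple routine like this catches the kinds of errors that, left in a medical training set, could propagate into a predictive model’s recommendations.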

Unlike large and established players such as Microsoft, however, some companies may not realize the importance of good data until their projects are already underway. A well-curated collection of training data is critical to an AI system, yet companies may be unaware of the business risks arising from biases hidden within their training datasets. In 2015, for example, Google Photos’ facial recognition software tagged two African-Americans as gorillas. Companies must therefore be cautious about what data they use and how they use it, both to avoid public relations nightmares and to reduce the associated business risks.

The second risk arises from a legal perspective: consumers are increasingly concerned that service providers may infringe their privacy, for example by using data for unpermitted purposes, profiting from unauthorized transfers to third parties, or providing insufficient protection against hackers. Consumers want to know how, where, and for what purposes their personal data is used. Known and unknown gaps exist in the legal framework governing AI risks and liabilities, and the recent implementation of the European Union’s General Data Protection Regulation (GDPR) has started to fill them. Some types of big data analytics, such as profiling, can have intrusive effects on individuals. Under Article 22, the GDPR does not prohibit automated decision-making or profiling, but it does give individuals a qualified right not to be subject to decisions based solely on automated processing.

The GDPR also requires that the data controller use “appropriate mathematical or statistical procedures for the profiling” and take measures to prevent discrimination on the basis of race, ethnic origin, political opinions, religion or beliefs, trade union membership, genetics, health condition, or sexual orientation. Previously, users had little leverage to negotiate with companies over what data is collected or transmitted, by whom, for what purposes their personal data is used, or whether they consent to data sharing. Although such rights are now explicitly set out in the GDPR (e.g. Articles 15, 16, 17 and 21), there is arguably a strong inequality of bargaining power between consumers and powerful companies, which leaves consumers vulnerable and unable to effectively defend their privacy rights. The Supreme Court of Canada’s 2017 decision in Douez v Facebook is a good illustration.

Today, technology is changing so rapidly that new questions about AI risks arise regularly, posing legal challenges to lawyers, regulators, manufacturers, service providers and consumers. As early as 2013, the United Kingdom’s Information Commissioner’s Office issued a detailed report with suggestions for companies in anticipation of the GDPR reforms, and various online learning resources have since discussed those suggestions as ways to cope with the new challenges. In short, it is more cost-effective to build “privacy by design” into processes and systems from the outset than to be caught by new rules at a later adaptation stage. With “privacy by design” in mind, companies first need to determine the purpose of their data analytics, what data is required, and how to legally and effectively collect, transmit, store and use that data. Second, they need to avoid over-collecting personal data that is not required for the legitimate purpose. Third, they should be transparent about their collection and transmission of personal data by providing privacy notices that data subjects can understand, free of legal jargon, hidden notifications, or accessibility barriers caused by poor telecom infrastructure. Lastly, companies should ensure that data subjects can exercise their legal rights to give consent, withdraw consent, request a copy of their data, and have it corrected.
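Parts of this checklist can be enforced at the point of collection. The sketch below is a simplified, hypothetical illustration of data minimisation and consent handling, not legal advice and not a complete GDPR compliance mechanism; the purpose names, field lists, and consent register are invented for the example.

```python
# A simplified privacy-by-design sketch: keep only the fields declared for a
# stated purpose, and only for users who have consented to that purpose.
from typing import Dict, Set

# Fields a hypothetical analytics purpose is allowed to collect.
ALLOWED_FIELDS: Dict[str, Set[str]] = {
    "service_improvement": {"user_id", "feature_used", "timestamp"},
}

# A toy consent register keyed by user, listing purposes they agreed to.
CONSENT: Dict[str, Set[str]] = {
    "user-123": {"service_improvement"},
}

def collect(user_id: str, purpose: str, record: Dict[str, str]) -> Dict[str, str]:
    """Return only the data the company is entitled to keep for this purpose."""
    if purpose not in CONSENT.get(user_id, set()):
        raise PermissionError(f"{user_id} has not consented to {purpose}")
    allowed = ALLOWED_FIELDS.get(purpose, set())
    # Drop any field that was not declared for this purpose (data minimisation).
    return {key: value for key, value in record.items() if key in allowed}

def withdraw_consent(user_id: str, purpose: str) -> None:
    """Honour a withdrawal request by removing the purpose from the register."""
    CONSENT.get(user_id, set()).discard(purpose)
```

The point of the sketch is that over-collection and ignored consent withdrawals are not only legal failures but also engineering defaults that can be designed out early, which is far cheaper than retrofitting them later.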

Artificial Intelligence is a double-edged sword. If used well, it will continue to benefit the general public. If not properly managed, it might be leveraged by those with ill intent to cause harm, and companies should bear that in mind when deploying AI-enabled technologies.

Grace Wang is an IPilogue Editor and a JD candidate at Osgoode Hall Law School.