There is hardly any doubt that Artificial Intelligence (AI) holds plenty of promise. At the same time, there are concerns that the use of AI might have exclusionary effects. The first concern is economic: the jobs landscape will be altered, with only the companies owning AI able to flourish. The second is the apprehension that AI algorithms used for credit checks and information sorting will intensify discriminatory behaviour, for example in hiring or in granting loans. The third is the distress that may be caused by the fact that only a few companies own and can provide the high volumes of data that AI requires. This article discusses the third area of concern.
What makes AI “intelligent” is the data fed into it: just as protein molecules alone do not make up a human brain, it is experience, memory and identity that make a mind. Therefore, what makes AI more equitable will be the equitable availability of the data used to train it. What then are the ways to make the distribution of data more equitable? Interestingly, one such attempt has been the drafting of data protection laws to make data subjects the owners of the data concerning them, or at least to give them ownership-like control. The irony is that such ownership schemes have made it more difficult for personal data to be shared. A judicial decisions database, for example, may become inaccessible, creating an elite class of judges who hold a monopoly over how the law is meted out, and eliminating the possibility of using AI to monitor the judicial system and hold it accountable, to make sure it works for all and not only for the rich few. Where did we go wrong?
Who owns data about oneself?
The central tenet of data protection law, “one owns data about oneself, and therefore should have control over that data”, may sound good, but it is not always sustainable or compatible with others’ freedom of thought and expression. For example, the statement “K.S. Park is a professor” is data about this author that is known to many. It would be impossible in a free society for this author to control the circulation of such data, especially after he has introduced himself as a professor to countless people. When, and on what grounds, can this author then control perfectly lawful data that resides in another person’s head, assuming it is non-defamatory and non-confidential?
The maxim “one owns data about oneself” originates from the concept of “data surveillance”, a term coined by Alan Westin in his 1967 book, “Privacy and Freedom”.1 The idea is that when a person discloses data about themselves to governments and companies, the processing of such data for purposes not contemplated by the data subject, or its disclosure to agencies not thus contemplated, can constitute “surveillance”, in the sense that the data controller learns more about the data subject than the data subject intended to reveal. Of course, the term “surveillance” usually means the acquisition of data about another against their will, such as wiretapping, or search and seizure. But even voluntary disclosures of data can, if the conditions of the disclosures are not adhered to, reveal something about oneself against one’s will, hence the term “data surveillance”.
Westin, in an effort to protect people from data surveillance, proposed giving all data subjects a sort of property right in the data about them, because making promises about how the data will be used is insufficient: such promises are hard to enforce, and, more importantly, powerless individuals will have a hard time extracting those promises from governments and companies.2 With a property right, as opposed to a contractual one, the data subject’s control becomes the default position, from which the data controller can depart only by obtaining affirmative and explicit consent from the data subject for the purpose and scope of each data use and disclosure, just as someone borrowing a car has an affirmative duty to obtain the owner’s consent about its use. Westin’s proposal has persuaded an increasing number of countries and people around the world and manifested itself in the form of data protection laws. Against the background of this success, the property metaphor has hardened into “owning data about oneself.” Indeed, data protection laws are highly effective tools for protecting the rights of powerless individuals who, in disclosing data to a mega data controller, do not have the acumen to bargain for or enforce the conditions of such disclosure.
Ownership as a Correction to Market Failure
Data ownership was an attempt to address a market failure: the data subjects’ lack of bargaining power in data transactions. This is consistent with the genealogy of other forms of ownership. The idea of private land ownership originated from people’s experience with the tragedy of the commons.3 Institutions such as intellectual property were created along the same lines, i.e., to encourage arts and other cultural production which would otherwise be subject to rampant piracy and free-riding on others’ creations.
In order to call something a “market failure”, there have to be axiomatic values against which things are evaluated. In the case of the tragedy of the commons, the value was the production of livestock; in the case of copyrights and patents, the values were advances in the sciences and arts. In both cases, the values are unassailable as they stand. What is data ownership trying to protect? Efficiency, or perhaps more data usage? It is the privacy of powerless individuals.
The fact that these forms of ownership have axiomatic values to serve means that data ownership, too, should be limited to the extent that it serves those values. Since the concept of data ownership was concocted to compensate for the data subjects’ lack of bargaining power at the point of disclosure, and thereby to prevent unwanted subsequent use and disclosure of the data about them, it is important that it is not mechanistically applied to all data but only to data that has not been made available publicly. Publicly available data has no point of disclosure at which the concept of data ownership needs to intervene to strengthen the data subjects’ bargaining power. The paradigmatic situation works in the following manner: when a data subject has kept certain personal data within a zone of privacy and later transfers it out of that zone to governments and companies, the concept of data ownership kicks in to ensure that subsequent use or disclosure does not depart from the data subject’s original will, with a force that contract law alone does not provide.
This means that the concept should not be applied to personal data that has already been published to the public on a voluntary basis without any condition. That “K.S. Park is a professor” is exactly an example of such data. In the same vein, data lawfully compelled into disclosure (for instance, the publicly noticed data of a company owned by a data subject) should be included. Such a definition is consistent with the common-sense view that it is not surveillance to acquire data that everyone knows.
Indeed, a closer look at the world’s data protection laws reveals a thread of such philosophy: Australia4, Canada5, Singapore6, India7, and Belgium8 explicitly leave publicly available data out of the purview of their data protection laws. The 2004 APEC Privacy Framework also states that a data subject’s rights can be limited with respect to publicly available data. In 2000, the EU and the US entered into a safe harbor treaty9 on the application of the 1995 EU Data Protection Directive to U.S. data processors, which also left out publicly available data. Further upstream, the 1980 OECD Guidelines exclude from application data that poses no risk of infringing on privacy.10
More importantly, if data subjects can control even publicly available data about them, then data protection laws aimed at preventing surveillance will instead do the exact opposite: we would be censoring our colleagues, monitoring what data they acquire, and intervening whenever we wish. Such mutual interference will empower only the powerful, who can purchase censorship tools. Data protection law was supposed to be an equaliser, but if applied beyond its mandate, it can do the opposite in that sense as well.
To recap, data ownership must have its values straight to be justified as a correction to market failure. Data protection law, and its incidental metaphor of ownership, exists to protect privacy, in the same manner that copyright law serves to advance the arts, patent law to advance science, and real property law to maximise land use. These are all ownership or semi-ownership laws, but they are all clear on what they try to achieve. What of data ownership? If it is privacy, its application should be limited to non-publicly-available data; otherwise it can erode the very socially equalising function of empowering the powerless.
Inherent Limits of Ownership
Ownership itself has limitations as a metaphor. What does it mean to “own” something? Exclusive possession and control do not suffice, because those can be obtained by contract. Hegelian scholars got together in the 1980s in the United States to figure this out. Ironically, and somewhat tautologically, they concluded that when the statement “someone owns something” is made, what is meant is that he or she can disown it: he or she has the legal right to transfer ownership to another person, however restricted that transfer may be. Both real property and intellectual property satisfy this requirement. If a person is the author of a book, and therefore the owner of the intellectual property in that book, that person’s ownership means that they can transfer whatever Hohfeldian matrix of rights and privileges they hold to another person, such that the other person has the same rights and privileges, even to the exclusion of the transferor. This is a legal feat that no one other than the so-called owner of the book can achieve.
Then, is data ownership, whether by the data subject or someone else, a good idea at all? This article does not posit that data is not conducive to ownership because of its non-exclusive and non-rivalrous nature. Data can be owned just as copyright can be owned, although the practical contours of such ownership will differ.
However, who will own that data? Data is produced communally. Data is the result of perception. The Earth itself is not data; its presence becomes data only when there is interaction between nature and sentient beings. It is difficult for anyone, whether data controllers, data collectors, or database makers, to claim exclusive dominion over such a relational existence. For instance, this author is a professor only because there are “students” willing to listen to his lectures. The personal datum that “K.S. Park is a professor” is the result of other people’s recognition of him as a professor. His identity is meaningful only to the extent that other individuals, i.e., students, are aware of it. If the concept of data ownership aims to protect privacy, that privacy is limited by the constraint that some personal data are born communal, and are therefore not conducive to ownership by any one person.
This critique is most effective against data subjects’ ownership (basically, data protection laws) when it goes beyond its original goal of serving privacy and intervenes in the governance of publicly available data. This concept of data ownership by data subjects may be feasible for some parts of human civilisation but not for the rest. After all, human civilisation thrives on the transfer of personal data; human civilisation is, in essence, data transfer itself. Education, literature, and the arts are reflections of what we perceive and acknowledge about ourselves. Giving ourselves consent powers over these perceptions and acknowledgements can have terrible consequences for human civilisation.
We should think about data socialism: the idea that data is shared amongst people as much as possible for the maximum benefit of society. This is not necessarily in opposition to data protection laws, which grant data subjects ownership over data about them and thereby, on the surface, hamper the community use of personal data. It can instead be an improvement upon them: carving out a family of personal data that can be freely used for social discourse, for instance “publicly available data”, as Singapore, India, Canada and Australia do. The principles of open data and open government likewise demand the carving out of some personal data from the strictures of data protection law, such as court decisions databases and other records of government agencies taking or deliberating adverse actions against their own citizens.
These improvements on existing data protection laws can enhance the equitable availability of AI-training data. Currently, government agencies and companies justify building closed silos of personal data and not sharing them with people by citing the concerns of data protection laws. Some of these concerns are justified as they are, but others need to be moderated by, or balanced against, the people’s need for that data for participatory democracy. Case-by-case exceptions, already built into data protection laws, are not sufficient, because they maintain chilling effects on people wishing to use the data for AI and other socially beneficial uses. Categorical exceptions need to be carved out so that, within them, people can enjoy freedom of expression, open data, and democracy without worrying whether their discourse satisfies some collectivistic (or majoritarian) notion of public interest, which can often crush the pluralistic visions of a society beneath it.
In the film Ex Machina, the guru reveals towards the end where he got his AI training data from: the internet. What is on the internet will be determined by data governance rules, including data protection laws and the exceptions to them, as one can see from what the “right to be forgotten” does to the availability of data. To allow people around the world to benefit from this learning software called AI, data governance rules should allow access as open as possible to as much data as possible. Personal data may normally be deemed individually owned, but there are circumstances requiring social and communal ownership of personal data by all members of a society.
In this line of thought, Asian countries may fare better than the EU or the US, because they have worked towards instituting data protection laws that are comprehensive and at the same time carry exceptions for publicly available data, thereby allowing the balance necessary for data socialism. Still, many countries across the globe have not yet implemented data protection laws at all, or are only recently adopting them. These countries can benefit from what data-progressive countries have realised in hindsight, and create data protection laws that adhere to global standards while being customised to each country’s own requirements. In this manner, data protection laws will benefit data transfers between and within countries, having clearly demarcated norms that identify ownership, assure privacy, and secure against surveillance.
1. Westin, Alan. “Privacy and Freedom.” New York: Atheneum. 1967.
2. Ibid., pp. 324–325.
3. Hardin, Garrett. “The Tragedy of the Commons.” Science 162: 1243–1248. 1968.
4. “State and Territory Regulation of Privacy: Child Witnesses in Australian Jurisdictions.” ALRC. 16 August 2010.
5. Personal Information Protection and Electronic Documents Act (S.C. 2000, c. 5), Article 7: Regulation defining “publicly available information”.
6. Greenleaf, Graham. “Private Sector Uses of ‘Public Domain’ Personal Data in Asia: What’s Public May Still Be Private.” 127 Privacy Laws & Business International Report, 13-15; UNSW Law Research Paper No. 2014-27. 1 February 2014.
8. See the website for Loi Traitements de données à caractère personnel (LOIWET). Refer to art. 3, § 2. 8 December 1992.
9. See the website for Hunton Privacy Blog.
10. “OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data.” OECD. 1980.