Excerpts from session description:
Data is mostly seen as a tool: for decision-making, micro-targeted advertising, surveillance, and in some cases for social good, e.g. to increase transparency. However, data nowadays is also an infrastructure critical to social and economic development. Especially for the training of artificial intelligence, the availability of high-quality data is crucial, and its absence is one of the main barriers to the development of local AI-based solutions, especially in the Global South, where resources to acquire data are scarce. Both the availability of training data and AI-based solutions as such can play a major role in addressing current inequalities regarding access to knowledge, services, and the diversity of cultural expressions. Voice interaction is exemplary of impact-driven AI-based solutions: it has the potential to give millions of people access to information and services they do not yet have, preserve cultural heritage, make technology more inclusive, and ultimately foster local value creation as well as digital sovereignty. In this session, we would like to explore different initiatives aiming at creating data commons and digital public goods, to learn from their successes and challenges. We will discuss various governance models and ecosystem approaches, such as community governance and multi-stakeholder models, with the aim of democratising the potential of artificial intelligence for all.
Excerpts from a participant’s report here: https://dig.watch/sessions/let-there-be-data-exploring-data-public-good
Today’s frameworks treat data only as a commodity and ignore its potential for closing the global digital gap, which remains deeply problematic. For example, datasets used to train artificial intelligence (AI) algorithms exclude data from the Global South, which makes new technologies even less apt to improve global digital inclusion. Ms Lea Gimpel (Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ)) said that to harness the potential of technologies and data, we first need to close the data gap and gather data from as many different communities as possible. This will reveal the needs of those currently excluded. Secondly, we need to open up data and allow small entrepreneurs to innovate and train their locally built solutions. Thirdly, we need to ensure that data is not biased.
Ms Renata Ávila (Smart Citizen Foundation) added that data was once seen as a cultural good, whereas today it is framed as a commodity for profit. A profound cultural shift is needed to take back ownership of data as a common product and a common infrastructure on which to build projects, locally, regionally, or nationally.
Online content and interaction need to be available in as many languages as possible in order to serve the Global South.
Mr Alex Klepel (Open Innovation, Mozilla) explained that open voice and speech technology is a crucial access point to information and should be accessible to everyone. The major tech companies own most of the available data and use it to improve their products, while keeping smaller actors excluded. A good countermeasure is building public-private partnerships such as the one between Mozilla, GIZ, and local innovators in Africa, who together run a voice crowdsourcing initiative. Mr Audace Niyonkuru (Digital Umuganda) said that this wealth of speech data allows local entrepreneurs to produce solutions on the ground. This is particularly valuable for Africa, where spoken interaction is culturally preferred.
Ms Baratang Miya (Uhuru Spaces) warned about data biases. Data commons could give voice to marginalised groups such as women. Developing voice recognition technology is also important for preserving global linguistic diversity, as 40% of languages are in danger, according to Ms Irmgarda Kasinskaite-Buddeberg (UNESCO). Under-resourced and under-represented languages are economically unprofitable for companies to document, but we as a people should be interested in preserving the heritage of cultural and linguistic diversity, traditional knowledge, and the discovery of new traditional practices. We need to make better use of technology to reflect our global communities, and data as a common good can do that.
The roundtable asked whether data should be a commodity at all, or whether we should introduce a type of ‘data socialism’. If researchers interact with indigenous communities and collect data, it is communal data. But once the raw data becomes a new dataset to be exploited for any purpose, who owns it then? Mr Kyung-Sin Park (Open Net Korea) proposed overcoming the disconnect between users and their data through a new form of data ownership: public ownership. New datasets should not be a luxury, but a service for the community. Park argued that making some of our personal data communal makes sense because we are all social beings and, through interaction, our personal data inherently becomes communal data as well. Kasinskaite-Buddeberg emphasised that open or public data does not mean the same thing for every community, so context-based discussions on privacy and intellectual property must take place. If data becomes a common good, new data governance models will be needed. Everyone agreed that any governance model should be multistakeholder-based and owned by the communities, who would have to decide on their goals, purposes, expected outcomes, and structures. Kasinskaite-Buddeberg proposed focusing on seeing data not just as a commodity, but as a value for humanity. A proposal was made to make data free of charge for communities and non-commercial use, but to charge fees and require licenses for companies using community data. There are many calls for open data, but future debates should focus on understanding how to really incentivise data sharing, as well as on how to build skills for using and maintaining open databases.
One participant said that indigenous people often believe that “sharing data is only giving away identities” and therefore refuse to participate in the voice data crowdsourcing project. We need more capacity building. Their views may have been shaped by the modern copyright regime, which we should push back against in order to encourage crowdsourcing of voice data.
My idea of data socialism would strengthen social ownership of personal data and encourage companies to open up their data silos for communal benefit. One participant reported that, in Brazil, courts are making data available by banning the captchas that made it difficult for small companies to access public data.
There must be a distinction between private data and personal data. Why protect data? For privacy. As Anja Kovacs said, privacy is boundary management. There are contexts in which boundary management issues disappear because the data arise out of, and constitute, the relationship between people. For instance, in a structuralist sense, just as a tomato becomes a tomato in relation to things that are not tomatoes, teachers become teachers and obtain their identity as teachers only in relation to (and thanks to the existence of) students willing to listen to them. There cannot be boundary management issues between teachers and students, and therefore data protection law should not be used to restrict information about teachers’ identities as teachers, which is exactly the holding of the Spickmich case in Germany (a teacher reputation platform).
One participant responded, “the structuralist approach is good for communally created data, but even ethnicities, which are obviously communally created, sometimes need protection. For instance, one’s ethnic origin needs to be protected.” My response was (and is) that boundary management trumps how data is created. For instance, I can show up at academic conferences and choose not to reveal my professorship.
Another participant responded that social science itself is data extraction, and that the distinction between what is one’s intellectual contribution and what is raw data is very difficult to draw. Data socialism makes sense in that regard.
Yet another participant agreed that data is non-rivalrous and that, once disclosed, private data cannot be brought back. Others contributed ideas supporting the concept that data protection law should be used for protecting privacy. For instance, the gradations of privacy imbued in personal data can determine how data are protected under data protection law. Data submitted to obtain phone services should be protected because it constitutes only the relationship between phone companies and subscribers, and there is therefore no justification for disclosing it to others.