Open Net speaks at NED event on the risks of “AI sovereignty”

Dec 20, 2025 | Free Speech, Innovation and Regulation, Privacy

K.S. spoke at an NED expert consultation titled Democracy in an Era of “AI Sovereignty” on December 11, 2025, as follows:

We should be concerned about AI and democracy because people are accepting AI’s summaries without surfing the web and reading the websites themselves. AI outputs such as deepfakes and AI-generated summaries are being accepted without any check against reality. People are spending more time on AI than on the Internet.

The Internet is a proven machine for democracy in the sense that it gives every powerless individual the power of mass communication previously reserved to the elites who dominated the space and the airtime. Likewise, AI has the potential to be a machine for democracy in the sense that it gives every powerless individual the power of knowledge previously reserved to elites. AI is definitely an opportunity for democracy.

Of course, as AI takes the place of the Internet as the purveyor of knowledge, we should see AI as both an opportunity and a threat. However, a misguided choice to see AI only as a threat is producing the “AI sovereignty” discourse: not just a country’s ambition to develop its own AI capabilities, but the assertion of the local government’s control over AI and its output in order to protect and promote the locality.

Firstly, sovereignty discourse has been used to justify data localization. Since AI is a stochastic machine that analyzes huge volumes of human behavioral data and regurgitates the most statistically probable response to a human prompt, there is some truth in the claim that data is the “new oil”. Taking that literally, some countries are trying to keep data within their borders, as if it were nuclear weapons or natural resources to be kept under their control and within their territories.
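To make the “stochastic machine” point concrete, here is a minimal, purely illustrative sketch; the tokens and probabilities are invented, but it shows how a language model returns the statistically most probable continuation of a prompt, with a dose of randomness:

```python
import random

# Hypothetical next-token probabilities a model might assign after the
# prompt "Data is the new ..." -- every number here is invented.
next_token_probs = {"oil": 0.55, "gold": 0.25, "electricity": 0.15, "coal": 0.05}

def most_probable(probs):
    # Greedy decoding: always return the single most probable continuation.
    return max(probs, key=probs.get)

def sample(probs):
    # Stochastic decoding: sample in proportion to probability, which is why
    # the same prompt can yield different answers on different runs.
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(most_probable(next_token_probs))  # always "oil"
print(sample(next_token_probs))         # usually "oil", occasionally not
```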

We have already established that data localization hurts free speech and privacy. As I have written in a book chapter, data localization is equivalent to a partial internet shutdown, and any benefit from it is overwhelmed by the peril of facilitated censorship and surveillance. However, AI sovereignty is now being used as a stronger justification for data localization. The Philippines, with its poor connectivity infrastructure and a language barrier that leaves it vulnerable, and Indonesia, with its weak AI capability and the resulting urgency to upgrade it, can certainly use “sovereign AI” as a pretext for further control. India in 2022 toyed with the idea of LLM localization.

For AI to work for “you”, your behavioral data must be in the training data. For democracy, that is, for AI to work for everyone, everyone’s data must be in the training data. Amazon’s hiring data probably lacked enough examples of successful female executives. Diversity of the training data set is key to AI. Leaving local data out of the training data of a global AI will only hurt that locality’s ability to benefit from AI. If Korean content is not included in the training data of the global AI, it will not return results favorable to Koreans’ positions on world affairs, e.g., the Dokdo dispute.
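As a toy sketch with an invented corpus, auditing whose data a model will actually learn from can be as simple as counting:

```python
from collections import Counter

# An invented toy corpus: each record is tagged with the language of its source.
training_corpus = [
    {"lang": "en", "text": "…"}, {"lang": "en", "text": "…"},
    {"lang": "en", "text": "…"}, {"lang": "ko", "text": "…"},
]

# A model trained on this mix will reflect the majority perspective far more
# strongly than the minority one -- representation can be audited this simply.
counts = Counter(record["lang"] for record in training_corpus)
total = sum(counts.values())
for lang, n in counts.most_common():
    print(f"{lang}: {n}/{total} = {n / total:.0%} of the training data")
# en: 3/4 = 75% of the training data
# ko: 1/4 = 25% of the training data
```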

Secondly, sovereign AI discourse can be used to justify administrative censorship of AI training data, or of published online content that later becomes training data anyway. Administrative censorship has already become an emerging trend in the Southeast Asian region, triggered by the appearance of Germany’s NetzDG; it opens the door to a government directly shaping online discourse, which will likely affect both the input and the output of “sovereign AI”. The development of sovereign AI can be co-opted by authoritarian forces to intensify censorship through moderation of the training data (e.g., removing all anti-monarchy content from it). Thailand’s “sovereign AI” development seems government-heavy compared to Korea’s, which is largely led by the private sector (supported by the government).

All foundation models come with pre-training, which means they are already influenced by, and reflect, the foundational training data curated by the developer. DeepSeek, although open source, comes pre-trained on data censored by the Chinese government, which explains its inability to address some politically sensitive events within China. So some moderation of the training data, or other types of post-training, may be needed if you want a global AI to work for your country when the particular model has not been trained on, and does not reflect, local nuances.
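A minimal sketch of the pre-training/post-training distinction, using an invented one-parameter model in place of a real LLM: a “global” fit comes first, and a short “local” fine-tune then pulls the model toward the local data.

```python
# Illustration only (invented numbers): a one-parameter "model" y = w * x is
# fit first on a global dataset (pre-training), then nudged with a small
# local dataset (post-training). Real models differ mainly in scale.

def fit(w, data, lr=0.01, steps=500):
    # Plain gradient descent on mean squared error.
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

global_data = [(1, 2.0), (2, 4.1), (3, 5.9)]   # "pre-training" corpus (w ~ 2)
local_data = [(1, 3.0), (2, 6.2)]              # small local corpus (w ~ 3)

w = fit(0.0, global_data)         # pre-training: w converges near 2
print(f"pre-trained w: {w:.2f}")

w = fit(w, local_data, steps=50)  # post-training: a short local fine-tune
print(f"fine-tuned w: {w:.2f}")   # pulled toward the local relationship (~3)
```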

Even for other reasons, moderation of the training data is needed. Microsoft’s chatbot Tay became racist and sexist because it was trained on pre-existing Twitter data and learned that being controversial is a key to gaining followers. If there is a feature of human behavior that we as a collective do not want to see, we should deliberately remove it from the training data. This is because the current generation of AI is based on machine learning: it learns by trial and error, as a child learns. A child can never define a cat or a dog, but after being shown many, many photos of cats and many, many photos of dogs, the child knows a cat’s photo or a dog’s photo when it sees one. If you accept the metaphor of a child’s learning, what would you put in the textbooks used to teach that child? Would you not carefully curate what goes into them?
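To make the learning-by-example point concrete, here is a minimal nearest-neighbour sketch; the features and numbers are invented, and real systems operate on photos at vastly greater scale, but the principle is the same: the program is never given a definition, only labeled examples.

```python
import math

# Toy "learning by example": no definition of a cat or a dog is ever written
# down; the program only sees labeled examples. The features (invented here:
# body length and ear height, in arbitrary units) stand in for photo pixels.
examples = [
    ((6.0, 2.0), "cat"), ((5.5, 2.5), "cat"), ((6.5, 1.8), "cat"),
    ((9.0, 8.0), "dog"), ((10.0, 7.5), "dog"), ((8.5, 9.0), "dog"),
]

def classify(features, k=3):
    # k-nearest neighbours: label a new animal by majority vote of the
    # k most similar examples the program has already "seen".
    nearest = sorted(examples, key=lambda ex: math.dist(features, ex[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

print(classify((6.2, 2.1)))   # "cat" -- close to the cat examples
print(classify((9.5, 8.2)))   # "dog"
```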

However, the government should not be involved in sanitizing the training data. Civic groups and AI developers should work together to sanitize and curate it. Perhaps we need a regulation that mandates such a sanitization process without dictating what must be removed. Any such regulation should contain a full prohibition on data moderation by the government.
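What might such a sanitization process look like? A hypothetical sketch, in which every record shape and criterion is invented for illustration, of a filter whose exclusion rules come from a transparent civic list rather than a government mandate:

```python
from typing import Callable

# Hypothetical exclusion criteria contributed by civic groups and developers;
# the point is that the list is transparent and publicly auditable, and no
# government body supplies or approves its entries.
civic_criteria: list[Callable[[dict], bool]] = [
    lambda rec: rec.get("harassment_score", 0.0) > 0.9,
    lambda rec: rec.get("source") == "known_troll_farm",
]

def curate(corpus: list[dict]) -> list[dict]:
    # Keep a record unless it trips any agreed-upon exclusion criterion.
    return [rec for rec in corpus if not any(rule(rec) for rule in civic_criteria)]

corpus = [
    {"text": "ordinary post", "harassment_score": 0.1},
    {"text": "abusive post", "harassment_score": 0.95},
    {"text": "coordinated propaganda", "source": "known_troll_farm"},
]
print(len(curate(corpus)))  # 1 -- only the ordinary post survives
```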

______________________________________________________

Organizers’ prompt

While “AI sovereignty” has become a global buzzword, the implications for democracy are relatively undercovered. This hour-long, discussion-based virtual roundtable will provide an opportunity to reflect and hear from a global group of experts on how the push for greater government involvement in AI development could impact core democratic principles, from free speech and privacy to self-determination. We will consider, across a wide range of geographic settings, the goals and concerns driving the “sovereign AI” push, and pathways to address these concerns in a way that safeguards human rights and civic space.

Specific questions to explore include:

Positive visions: In partly free settings, what can civil society do to channel state interest in AI in directions that support goals like good governance and democratic accountability? In closed societies, can we find bottom-up approaches to AI that offer their own vision of self-determination?

Data risks and opportunities: The development of “sovereign AI” creates a new impetus for states to pursue data localization strategies. In authoritarian settings, however, data storage in-country may make local activists and dissenters more vulnerable to state suppression and control. How can digital rights advocates distinguish between justified economic and national-security approaches to data governance, and dangerous authoritarian power grabs?

Censorship or cultural preservation? At the content layer, states that have traditionally been importers of AI systems are seeking to develop models more suited to local contexts. This ambition reflects genuine concerns, including among digital rights advocates, but also provides a potential entry point for censorship. What kinds of safeguards (market-based, technical, political) can help uphold the right to free expression in a “Sovereign AI” era?

Sovereign AI and the “splinternet”: Government control over certain AI models presents greater risks where others are inaccessible. At the same time, from Russia’s deepening “sovereign internet” capabilities to Geedge Networks’ export of China’s “Great Firewall” tech, novel threats to global connectivity are intensifying. Over the coming 3–5 years, are more users likely to begin encountering “sovereign AI” technologies within the boundaries of closed digital ecosystems?
