Privacy regulation applied to AI training, inference, and feedback loop: Open Net at AI Action Summit

by | Mar 9, 2026 | Innovation and Regulation, Open Blog, Privacy

The Centre for Communications Governance (CCG) at the National Law University Delhi and the Global Network Initiative (GNI) organized the Strategic Multistakeholder Dialogue (Dialogue), a two-day event in New Delhi on February 16th and 17th, during the week of the India AI Impact Summit. These convenings aim to foster inclusive discourse around global AI governance, drawing especially on academia, civil society, think tanks, researchers, and underrepresented voices from the Global South. Through these events and the activities organized in the lead-up to them, the organizers aim to enable multistakeholder participation in the India AI Impact Summit, build cross-regional connections among actors engaged in AI governance, and establish infrastructure for sustained multistakeholder engagement in AI governance. The February 17th Reinforcements & Learning: Multistakeholder Convening on AI Governance has been recognised as an Official Satellite Event of the AI Impact Summit 2026.

KS Park spoke on February 16 at the following session: Protecting Privacy Amidst Rapidly Developing AI Systems: Myth or Mechanisms?

This session opened with a discussion of the technical and sociopolitical complexities of protecting privacy in the AI age. For example, what are the technical considerations of implementing user privacy in AI systems, including emerging agentic systems, given the issues of “unlearning” and memory? The discussion then turned to how existing privacy regulatory regimes are working in this new context. How are privacy frameworks holding up in practice against new AI systems? What tensions have been identified? What might need to evolve? The session also explored efforts to balance innovation, transparency, and privacy.

There were several different ways of building AI; now only one has survived: the one in which AI directly mimics humans. But there are different humans. Which among them? AI is a stochastic machine that averages over vast amounts of human behavioral data and regurgitates what it computes to be the most statistically probable human answer to a human prompt. Show it 10 billion cat photos alongside 10 billion non-cat photos, and from those photos the AI learns, much as a child does, what adult humans tend to recognize as cats. These human tendencies, technically encoded as vectors, are the only things the AI remembers. The AI has no understanding of felinity. In that sense, AI is a statistical tool with zero creativity. It is no wonder that, as Ben Affleck has observed, the product of AI converges on mediocrity, because AI is an averaging tool.

AI goes through three phases: a training phase, an inference phase, and a feedback/improvement phase. AI does not do the averaging all the time. Most of the averaging is done before the system is ever used, and we call this training. This is where the most computing power, electricity, and the most powerful chips are needed.

The training phase is theoretically not supposed to violate anyone’s privacy, because the training data are tokenized and only the relationships between tokens are memorized, not the data themselves. What is remembered is, for example, the close relationship between pointy ears and the word “cat”, not any particular photo.
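To make the tokenization point concrete, here is a toy Python sketch. It is nothing like a production LLM, which learns dense vectors rather than raw counts, but it illustrates the claim: only co-occurrence statistics between tokens survive “training”, not the sentences themselves.

```python
from collections import Counter
from itertools import combinations

def train(sentences):
    """Toy 'training': keep only token co-occurrence counts, discard the sentences."""
    cooc = Counter()
    for s in sentences:
        tokens = sorted(set(s.lower().split()))
        for a, b in combinations(tokens, 2):
            cooc[(a, b)] += 1
    return cooc  # the raw sentences are not retained anywhere

corpus = [
    "the cat has pointy ears",
    "a cat with pointy ears purred",
    "the dog has floppy ears",
]
model = train(corpus)
# 'cat' is more strongly associated with 'pointy' than 'dog' is
assert model[("cat", "pointy")] > model[("dog", "pointy")]
```

The model ends up holding pair counts such as `("cat", "pointy")`, a crude stand-in for the learned “vectors” the talk describes, while the original text is thrown away.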

But there is the problem of overfitting. If too many photos of Harry Potter are in the training data, the AI may average all requests for “a wizard student” into Harry Potter, in which case the output is Harry Potter’s personal data. To prevent such overfitting from exposing personal data, the training data should be anonymized.

The Korean company Scatter Lab ran Science of Love, a service that gave relationship advice to anyone who submitted transcripts of conversations with their partners. The company later trained an AI on those conversations and released Lee Luda, a chatbot. Lee Luda kept giving out the actual addresses of real people who appeared in the Science of Love transcripts. This was an example of overfitting. Of course, there is a question whether an address alone, without more, constitutes personal data, since it carries no semantic value by itself; but here the data did carry semantic value, namely that the address holder had used Science of Love. Anonymization should have taken place, whereby the actual addresses were removed from the training data.
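A minimal sketch of what such anonymization could look like, assuming a simple regex-based scrubber. Real pipelines would add named-entity recognition to catch street addresses and names; the patterns below are illustrative only.

```python
import re

# Hypothetical scrubber: the patterns are illustrative, not a complete PII detector.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(text):
    """Replace identifiable spans with placeholder tokens before training."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

line = "Call me at 010-1234-5678 or write to luda@example.com"
print(scrub(line))
# -> Call me at [PHONE] or write to [EMAIL]
```

The point is that scrubbing happens before training, so the model never has the chance to memorize, and later regurgitate, the raw identifiers.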

At the inference phase, the input consists of a human prompt and the contextual information for that specific session.  

How long the human prompt is retained should, of course, be controlled by the users, not just because retention can be privacy-infringing but because it is a feature of user experience. When you use Google Maps, you hate it when it keeps referring back to your old location in overzealous subservience to your needs. However, this type of targeting or information curation predates AI. It is not a distinctly AI problem.
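A retention limit of the kind suggested here could be sketched as follows. The 30-day window and the record format are assumptions for illustration; the talk’s point is precisely that the window should be set by the user.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: prompts older than a user-chosen window are purged
# before any session context is reused. 30 days is an illustrative default.
RETENTION = timedelta(days=30)

def purge_expired(prompts, now):
    """Drop stored prompts whose timestamp falls outside the retention window."""
    return [p for p in prompts if now - p["ts"] <= RETENTION]

now = datetime(2026, 3, 9, tzinfo=timezone.utc)
prompts = [
    {"text": "best cafes near my office", "ts": now - timedelta(days=2)},
    {"text": "directions to my old flat", "ts": now - timedelta(days=90)},
]
print([p["text"] for p in purge_expired(prompts, now)])
# -> ['best cafes near my office']
```

Making `RETENTION` a per-user setting rather than a constant is the regulatory ask: the user, not the operator, decides how long the prompt history lives.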

We already have an emerging jurisprudence on targeted advertising under data protection law. In 2025, a Korean court found, for the first time in the world, the advertising network operators Meta and Google responsible for not obtaining informed consent for such use of data. They did obtain consent, but it was not informed consent with respect to third-party behavioral data (i.e., the user’s behavioral data on third-party websites that had contracted with the advertising network and transferred the data to the network without the users’ knowledge). In the EU, this lack of knowledge was addressed by the rule that such websites must show consent banners before users start using them; outside the EU, no solution had been presented before this Korean case. Please note that targeted advertising affects only what you yourself see in the future. This means we have already decided that using my personal data in a transaction with the future “me” constitutes a privacy violation even if the data was never transferred to third parties or otherwise breached externally.

At the feedback/improvement phase, target labels and human evaluations are fed back into the AI to adjust the vectors (the retained human tendencies). This phase is important for privacy because it is where information from one user’s session can influence, or even be disclosed in, other users’ sessions. For instance, take the hypothetical of a modified Lee Luda that re-trains in real time on feedback from some of its users, adjusting its vectors accordingly. Those users’ feedback can then influence, or be disclosed in, Lee Luda’s sessions with other users. Again, there must be a filtering process whereby data recycled for re-training is stripped of personally identifiable information.
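One crude form such a filter could take is a gate that simply drops feedback records containing identifiable strings before they are recycled into re-training. The patterns are illustrative, not a real PII detector, and a production system would redact rather than discard.

```python
import re

# Hypothetical gate applied between user feedback and the re-training queue.
# Patterns are illustrative only: phone-number-like and email-like strings.
PII = re.compile(
    r"\b\d{2,3}-\d{3,4}-\d{4}\b"
    r"|\b[\w.+-]+@[\w-]+\.[\w.]+\b"
)

def filter_feedback(records):
    """Keep only feedback free of PII; flagged records never reach re-training."""
    return [r for r in records if not PII.search(r)]

feedback = [
    "Great answer, very helpful!",
    "My partner lives at 010-9876-5432, is that normal?",
]
print(filter_feedback(feedback))
# -> ['Great answer, very helpful!']
```

The design choice matters: the gate sits before the re-training step, so one user’s identifiable data cannot enter the vectors and surface in another user’s session.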

Conclusion: 

Agentic AI and conversational AI are qualitatively no different from existing foundation models in that they go through the same training and inference phases. AI is only a stochastic tool. Strong data protection law will suffice: first, by requiring anonymization of training data at both the training stage and the feedback stage; second, by limiting the retention period for human prompts, drawing on the similar jurisprudence on targeted advertising.

Now, the fact that AI is a tool does not mean that it cannot threaten privacy in a uniquely AI way. An agentic AI making a $500 purchase that the user would not have approved is possibly a privacy problem, if we understand privacy as informational self-determination. We need singularity control, i.e., regulation on endowing AI with independent volition and the ability to act on that volition, which could be introduced in the form of the institutional review boards used in bioethics.

Another control we need is to make the distribution of data more equitable; otherwise, privacy violations through the feedback stage become more likely when a small number of LLMs is used by a huge number of people. The real reason people are concerned about AI privacy is scale. Doubling down on intellectual property will not address that. Open data will. Data portability will. Open source AI will.
