Navigating the Challenges of ChatGPT: Privacy, Intellectual Property, and Education
Since the release of ChatGPT, the use of generative AI tools in educational contexts has been the subject of intense debate in universities and schools around the world. Understandably, the academic community's main concern is students using large language models (LLMs) to generate assessments and essays, with implications for academic honesty and the trustworthiness of content.
At LLInC, we have also been exploring the legal perspective in order to guide our teachers, researchers and students on the implications of using these technologies. Our analysis has focused on the data flow involved in generative AI: both the information fed into the model (the input) and the result it produces (the output) raise significant legal challenges.
Inputs and data protection
Earlier this year, the press widely reported that the Italian Data Protection Authority had temporarily suspended the use of ChatGPT due to a lack of age verification and other breaches of the General Data Protection Regulation (GDPR). OpenAI, the company behind ChatGPT, responded by making some changes to the service, and access for Italian users was restored within a month. However, some privacy concerns remain, which is why the Dutch Data Protection Authority is expected to start a review of OpenAI’s data processing activities.
Beyond the ChatGPT case and OpenAI’s GDPR compliance, there is a broader concern about user input to generative AI systems, particularly the instructions or questions known as ‘prompts’. Unsurprisingly, users often add personal data (and other confidential information) to prompts, whether consciously or not – and some people even encourage this (e.g. when recommending AI tools to improve your CV). In an academic context, prompts may include not only data identifying the researcher or student using the tool, but also personal data of other people involved in their research or assessment (e.g. medical records or survey responses).
Personal data used in prompts is not only sent to the company that owns the AI tool to generate an immediate output, but also used to continue training the model for future outputs (not just for the specific user, but for all users). ChatGPT’s Privacy Policy states this explicitly: users’ data may be used to improve the service. Following the suspension of its services in Italy, OpenAI added the option to opt out of having your data used for AI training, but because this is an opt-out, such processing will remain the norm, as most users are likely to keep the default settings.
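To make this data flow concrete, the sketch below shows one way a user could strip obvious identifiers from a prompt before it is sent to a third-party AI service. It is a minimal illustration only: the patterns and the `redact` helper are hypothetical, and real anonymisation of research or medical data requires far more than pattern matching.

```python
import re

# Minimal, illustrative patterns only -- identifying real personal data
# (names, medical details, free-text survey answers) needs far more than regex.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d \-]{7,}\d"),
}

def redact(prompt: str) -> str:
    """Replace obvious identifiers with placeholders before the prompt
    leaves the user's machine for a third-party AI service."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label.upper()} REDACTED]", prompt)
    return prompt

prompt = "Summarise the interview with jan.devries@example.org (tel. +31 6 12345678)."
print(redact(prompt))
# Summarise the interview with [EMAIL REDACTED] (tel. [PHONE REDACTED]).
```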
Outputs and intellectual property
Generative AI systems are pre-trained with information from various sources – for example, ChatGPT has been fed with 40 terabytes of text, the equivalent of 40 million books. However, these sources have not necessarily been used with the permission of their owners, which could constitute an infringement of intellectual property (IP) rights. Illustrators and a stock photo company have already announced copyright lawsuits for this reason. In principle, this should only be a problem for the companies that train AI models, but it may eventually have consequences for their users, as AI output can be considered an IP infringement if it is seen as an adaptation of a pre-existing (copyrighted) work. Users who reuse AI-generated material may therefore also be infringing third-party copyright.
The IP implications of using AI in an educational context are crucial: the results can not only be used by students, but also presented as their own creation, which constitutes plagiarism. In fact, under Leiden University regulations, the use of AI for assessments can be considered fraud. The main challenge for educators, however, is to use plagiarism checkers correctly, as incorrect detections (false positives and false negatives) are always possible. AI detection tools, in turn, should not be used by teachers as an indicator of fraud, as their limitations make them unreliable as evidence.
This is not to say that human review is infallible, either. One study comparing research abstracts from medical journals with ChatGPT-generated abstracts found that blinded human reviewers misidentified 14% of the original abstracts as generated and 32% of the generated abstracts as original. Recently, AI-generated images have won digital art and photography competitions, showing that AI tools can fool even experts. It is therefore important to understand the capabilities and limitations of these generative AI systems in order to manage student grading responsibly.
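To illustrate why a positive detection is weak evidence on its own, consider a rough base-rate calculation. Suppose (purely hypothetically) that 10% of submitted texts are AI-generated, and treat the reviewers’ error rates from the abstract study as if they were a detection tool’s. Bayes’ rule then gives the probability that a flagged text really is AI-generated:

```python
# Illustrative only: the 10% base rate is an assumption, and the error
# rates come from the human-reviewer study cited above, used here as if
# they were a detection tool's.
p_ai = 0.10                 # assumed share of AI-generated submissions
p_flag_given_human = 0.14   # original texts misidentified as generated
p_flag_given_ai = 1 - 0.32  # generated texts correctly flagged (68%)

# Bayes' rule: P(AI-generated | flagged)
p_flag = p_flag_given_ai * p_ai + p_flag_given_human * (1 - p_ai)
p_ai_given_flag = p_flag_given_ai * p_ai / p_flag
print(f"P(AI-generated | flagged) = {p_ai_given_flag:.2f}")  # ~0.35
```

Under these hypothetical assumptions, roughly two out of three flagged texts would in fact be human-written – a further reason not to treat a flag as proof of fraud.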
Challenges and opportunities
Generative AI should not be seen as the end of assessments or of education in general, but rather as an opportunity to improve them, since most of these challenges already existed. There are several other fraudulent practices in education, from ghostwriting to rewriting texts with the help of translation tools. Students and teachers already use online platforms and AI tools that process large amounts of personal data (e.g. social media, proctoring software), sometimes without considering how this data may be further used by companies (and sometimes governments).
The disruption caused by ChatGPT can provide the momentum to discuss and address all of these challenges, and to raise awareness of the legal and ethical implications of using AI, in order to prepare students for a (near) future in which these tools will be part of our lives.