Musings on AI, Learning, and Copyright

A photorealistic image representing the complexity and nuances of AI's relationship with copyright and content creation. The scene includes a human figure surrounded by a swirl of books, magazines, love letters, emails, essays, and scribbled notes, symbolizing the diverse content humans consume and internalize. Nearby, a large, abstract representation of an AI language model, depicted as a complex, digital brain-like structure, is absorbing a vast array of similar content, showing the immense scale of data it processes. The background is split into two halves: one side illustrates a traditional library, representing human learning and creativity, and the other side is a futuristic digital landscape, symbolizing the digital realm of AI. The central theme is the comparison of human and AI content consumption and creation, with an underlying question of copyright and originality, subtly represented in the image. — (Image: DALL-E 3)

One of the more contentious parts of the rise of AI is its relationship to the rightsholders for the content on which it is trained. Many consider it blatant copyright infringement.

I’m not so sure.

Before I dive into my musings, I want to be clear: I don’t know the answer. I’m not discounting the value of the content that’s being consumed, and I’m certainly not saying that content creators should be ignored in this process.

Let’s start by looking at how humans consume content.

Throughout our lives we read books, magazines, love letters, emails, essays, scribbled notes, and more. These come from a variety of sources. They might be things we’ve purchased, things we’ve borrowed, or things that are available for free and without restriction.

While we remember some of it, all of it has the potential of affecting exactly how and what we think. In a sense we build an internal model of thought based on all that we’ve consumed throughout our lives.

When we create, we use that internal model to generate our ideas, words, or other creations. While those creations might have similarities to what we’ve consumed in the past, they are our unique and original creations. Striking similarity is not uncommon: we do occasionally hear of musicians being accused of theft when the music in question was legitimately and independently created. Given the quantity of things being created, occasional synchronicity seems inevitable.

Now, compare that to how large language models (LLMs), and so-called AI systems, work.

They consume immense amounts of content. Far more than any human could in a lifetime. That content is publicly accessible. (Unless laws were actually broken, or other arrangements were made.)

They “remember” every word. Their internal model mimicking thought is literally based on everything they’ve consumed.

When they create, they use that internal model to mimic creating ideas, writing words, drawing images, or creating something else. While those creations might have similarities to what the LLM has consumed in the past, they are unique and original creations. Striking similarity is, I believe, rare, but much like the music example above, we do occasionally hear of a series of words being strung together by AI that happens to be identical to words written by a prior author. Again, given the quantity of things being created, occasional synchronicity seems inevitable.

My question is this: given the strong similarities between the processes used by LLMs and humans, where is the plagiarism? Where is the copyright violation?

What’s the difference between me reading your book and writing an essay on a similar topic, and an AI doing something similar?

I’m not saying that there isn’t something important to be explored here, but the existing concepts just don’t seem to me to apply.

If I asked ChatGPT to write me a story about a sea captain obsessed with killing the whale that cost him his leg, and it responded by spitting out Moby Dick verbatim, that would be plagiarism, no question. Indeed, even if only a few paragraphs of the result were unattributed word-for-word copies, it would be a problem.

But that’s not what’s happening. Something else is happening, and we need to come to grips with whatever that is, and build some kind of consensus about what is and what is not fair game.

Today’s copyright law and calls of plagiarism aren’t it.

Addendum

I asked ChatGPT. Its response is insightful, though in some ways it misses the crux of my concern. You’ll find that here.

1 thought on “Musings on AI, Learning, and Copyright”

Imants Vilks

December 4, 2023 at 6:07 am

The chatbot’s response is understandable and, IMHO, corresponds to reality: for most processes or events for which they have an external world model (EWM), humans have a unique, individual, personal experience (body feelings, emotions). If they are not trained in a special field, bots don’t have that. Nice answer, which says that soon they will surpass us.
You write: “One of the more contentious parts of the rise of AI is its relationship to the rightsholders for the content on which it is trained”. I learned in patent courses that a patent is considered used (copied) only when all of its properties are copied. But … if they are commonly known, then there is nothing to argue about.
