Is my AI-generated art original or copied?

Good artists copy, great artists steal!

2023 has not started off well for StabilityAI. The company has been the subject of two lawsuits. The first was filed by three artists, who may end up representing many more in a potential class action. The plaintiffs claim that the defendants infringed 17 U.S. Code § 106 (the exclusive rights in copyrighted works), violated the Digital Millennium Copyright Act, and violated unfair competition law. In other words, they claim Stable Diffusion (SD) copies the protected IP of the artists and distributes it in a way that competes with the original artists. So the question is: does SD copy the artists’ protected IP? Historically, an artist’s style is not protected IP; only individual pieces of art can be copyrighted. This lawsuit therefore hinges on whether SD copies and reproduces individual pieces of art, or only imitates their styles.

The second lawsuit is by Getty Images, which claims StabilityAI scraped and used its images to train SD without permission. The interesting wrinkle here is that scraping and viewing Getty’s copyrighted images is not illegal. However, using the images in any commercial work or removing the Getty watermark is illegal. So this lawsuit hinges on whether running Getty’s images through a model counts as having viewed the images and learned styles and concepts, or as having literally used the images.

So both lawsuits depend, in a way, on the same question: does SD memorize and reproduce images from its training dataset, or does it learn only the various styles and methods of artistry without learning any individual pieces of art?

Unfortunately, yet perhaps unsurprisingly, a team from Google and academia showed that diffusion models do memorize examples from their training datasets. While the vast majority of the images produced by the models are ‘original’, they found that diffusion models sometimes reproduce near-exact copies of examples from the training dataset. So if you are using Stable Diffusion or Midjourney to create AI art for your own personal or commercial use, it is a good idea to know how to tell whether the image you just generated is an original or a copy.

The text below uses some terms that may not make sense if you have never used a generative model like Stable Diffusion.

Intuition 1 - Memorization

Diffusion models work by sampling images from a distribution of possible images. Every image created by Stable Diffusion or Midjourney comes from two inputs: a text prompt and a seed number. The text prompt specifies a region in the model’s internal vector space, while the seed is a random number that picks a random point within that distribution. Thus by changing the seed we can get new images from the same distribution corresponding to the text prompt.
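
To make this concrete, here is a minimal sketch of sampling the same prompt with several seeds using Hugging Face’s diffusers library. The checkpoint name, prompt, and file names are illustrative assumptions, not anything specific from the lawsuits or the paper.

```python
# Minimal sketch: same prompt, different seeds -> different samples from the
# region of image space the prompt maps to. Assumes `torch`, `diffusers`, and
# a GPU; the checkpoint and prompt below are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse on a cliff at sunset, oil painting"

for seed in [0, 1, 2]:
    # The seed fixes the random starting noise; the prompt fixes the region
    # of the distribution the model samples from.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"lighthouse_seed{seed}.png")
```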

Thus each prompt maps to a region in embedding vector space, and all the points within this region are valid output images for the prompt. The mapped region can be of different sizes. If the model has a big imagination, each prompt maps to a very large region, i.e. the model can imagine many images that match a single description. Before training, the model is at one extreme, with zero memory and infinite imagination: any prompt can produce any possible image, which mostly means garbage, since the number of possible garbage images is vastly larger than the number of beautiful ones. Training is therefore the process by which the model learns the rules of making good images. Its imagination is reduced and its memory increases. The less imaginative the model, the fewer images each prompt corresponds to. In the extreme case, if the model has perfectly memorized just one image, then no matter the prompt, it will only return that same memorized image.

Practical models like Stable Diffusion should live in between these extremes: they should remember only the rules and principles of how to make good images, without remembering individual images in their entirety. We use large datasets and techniques such as regularization to make sure this happens.

The problem is that we can only set rules and hyperparameters that affect the whole model and dataset; we can’t really control, at the level of an individual image, whether it gets memorized. So while we have gotten very good at making models generalize, even the best models will memorize at least a few data points.

So how can you tell if an image that your generative AI model has created is memorized from the training set vs an original one imagined by the model?

Intuition 2 - Attractors

Remember, the more strongly the model has memorized an image, the smaller the space of images the corresponding prompt can generate. So how can we measure this space of images?

Remember, for diffusion models like Stable Diffusion, DALL·E, and Midjourney, each generation takes two inputs: the text prompt and a random seed. The random seed creates a random starting point, so even for the same prompt the model can imagine different images that fit that description. This provides some easy ways to measure the imagination of the model.

  1. Given a prompt, generate 500 images with different random seeds. If the resulting 500 images all look quite different from each other, the model has a large imaginative region corresponding to this prompt, and none of the produced images is likely to have been in the training data. In contrast, if a significant fraction of the images (say, more than 10%) are effectively the same image, it is very likely that this image has been memorized from the training dataset, since the model’s imagination is curtailed in its vicinity. (A rough sketch of this check appears after this list.)

  2. Given a prompt-image pair, we can narrow down the above search and get a more precise answer with fewer queries. Mask out an important part of the image and prompt the model to fill in the masked area, say 25 times, with different seeds. If the model keeps producing very similar fills for different seeds, it indicates the model is remembering rather than imagining this image. (A sketch of this check also follows the list.)
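
Here is a rough sketch of the first check, scaled down to 100 seeds to keep it cheap. The near-duplicate test below (distance between small grayscale thumbnails) is only a crude stand-in for a proper perceptual similarity measure, and the prompt and threshold are arbitrary placeholders.

```python
# Check 1 sketch: sample many seeds for one prompt and estimate how often the
# outputs collapse onto near-identical images. A high duplicate fraction is a
# warning sign that the prompt hits a memorized training image.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

def thumbnail(img, size=32):
    """Downsample to a small grayscale vector for cheap comparison."""
    small = img.convert("L").resize((size, size))
    return np.asarray(small, dtype=np.float32).flatten() / 255.0

def near_duplicate_fraction(pipe, prompt, n_seeds=100, threshold=0.05):
    thumbs = []
    for seed in range(n_seeds):
        gen = torch.Generator(device="cuda").manual_seed(seed)
        thumbs.append(thumbnail(pipe(prompt, generator=gen).images[0]))
    thumbs = np.stack(thumbs)
    # Count image pairs whose mean thumbnail difference falls below the threshold.
    dup, total = 0, 0
    for i in range(n_seeds):
        for j in range(i + 1, n_seeds):
            total += 1
            if np.abs(thumbs[i] - thumbs[j]).mean() < threshold:
                dup += 1
    return dup / total

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
frac = near_duplicate_fraction(pipe, "a lighthouse on a cliff at sunset, oil painting")
print(f"near-duplicate pair fraction: {frac:.2%}")
```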
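
And a sketch of the second check using the diffusers inpainting pipeline. The inpainting checkpoint, the central-square mask, and the input image are assumptions for illustration; in practice you would mask a salient region of the image you are worried about and judge the spread of the fills yourself.

```python
# Check 2 sketch: mask part of a generated image and ask the model to repaint
# it 25 times with different seeds. If the fills barely vary, the model is
# likely reconstructing a memorized image rather than imagining new content.
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse on a cliff at sunset, oil painting"
original = Image.open("lighthouse_seed0.png").convert("RGB").resize((512, 512))

# White pixels mark the square region the model must repaint.
mask = np.zeros((512, 512), dtype=np.uint8)
mask[128:384, 128:384] = 255
mask_image = Image.fromarray(mask)

fills = []
for seed in range(25):
    gen = torch.Generator(device="cuda").manual_seed(seed)
    out = pipe(prompt=prompt, image=original, mask_image=mask_image,
               generator=gen).images[0]
    patch = out.crop((128, 128, 384, 384))
    fills.append(np.asarray(patch, dtype=np.float32) / 255.0)

# Low variance across the 25 fills means the model keeps reproducing the same
# content in the masked area -- a sign the image may be memorized.
variance = np.stack(fills).std(axis=0).mean()
print(f"mean per-pixel std across fills: {variance:.4f}")
```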

So is this something you should be worried about? None of this is legal advice; that is something for the courts to decide. But if you just want to create content and want to know how likely it is that you are replicating data from the training set, here are the numbers. The Google/Berkeley team picked 350,000 random images from Stable Diffusion’s training dataset and tested whether the model tended to reproduce them from memory. They found a total of 109 images that seemed to have been memorized, or approximately 0.03%.

————————————————————————————————————
There you have it, my intuitions for how generative models trade off between memory and imagination, and how you can determine whether the image you generated was memorized or imagined. For more intuitions on AI/ML, subscribe below and follow me on Twitter. You can also check out my other blog and projects on nirsd.com.