CRITICAL TOPICS: AI IMAGES
CLASS 1: LOVE IN A TIME OF CHOLERA
The video below is a recording of the class lecture, followed by a slightly altered transcript rewritten for reading.
Lecturer: Eryk Salvaggio
Hello everyone! Welcome to Critical Topics: AI Images.
First off, I want to set some expectations of what we’re going to do in this course. IM 450 is a critical topics class. That means we’re going to do some thinking about AI images. What are they exactly? And what do they mean?
Now, if you’ve done any kind of design, you know that the materials and medium we work in are chosen with intention, right? We choose the medium of our expression carefully — we want to use the best tool for the story we want to tell. Think about photographs and movies, for example. If you want to show someone’s face or remember a specific moment in time, a photograph might do the job. But if you want to tell complex stories that unfold over time, with dialogue and motion, maybe you lean into film or video.
With AI images, we’re really at the birth of a new medium. That’s an exciting time to think about what we can do with it — anything you do makes you a kind of groundbreaking explorer of new artistic forms! That’s an important feeling to hold on to, because we want to use that to make things that make us — or audiences — think differently.
But we also want to be careful. In this class we’re going to look at the backstory of Artificial Intelligence. What does artificial intelligence mean, really? What impact does it have on the way we make art and images? What impact does it have on the way we think, and the way we work? Is AI all that new after all? We’re going to think critically about a lot of that. I want to be clear: I teach this class because I think AI images are exciting, and fun, and sometimes, almost magical. But as we go through the class, we might find out a lot of pretty nasty stuff about what goes on behind the scenes. It makes sense to balance these two things — what excites us, and what upsets us — so we can try to make things that we feel good about, and other people feel good about too.
I want to make sure that you, as artists and designers, are informed about this history. I want you to make intentional decisions about how you use these tools, and maybe, one day, how you design these tools or design with them. I want us to think about the way we imagine AI images: how they’re made, what they mean, and why we might use them.
You’ve probably seen a lot of these AI images created by tools like Stable Diffusion, DALLE2, or Midjourney. Maybe you’ve used Lensa, or even checked out some of the tools offered by Hugging Face or Runway. You probably know that these are images created by typing some words, called a prompt, into a window and getting images back in exchange.
Here’s an example of one, to the right: this is an image created by Stable Diffusion, which combines the style of the psychedelic film director Alejandro Jodorowsky with the Disney science fiction franchise Tron. It’s the result of putting some words together on a screen, describing the scene in detail, and probably tinkering with something in Photoshop. But the image was never staged or filmed; it’s the result of this AI tool, Stable Diffusion, just one of the many tools that are out there.
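If you’re curious what that “words in, picture out” exchange looks like under the hood, here’s a minimal sketch using the open-source diffusers library and a Stable Diffusion checkpoint. The checkpoint name and the prompt are placeholders of my own, not the settings used to make the image above, and the exact model identifier may have changed on the Hugging Face Hub.

```python
# A minimal sketch of prompting an open text-to-image model through the
# Hugging Face "diffusers" library. The checkpoint name and prompt are
# placeholders, not the settings used to make the image described above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # an openly released checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")                   # assumes a GPU is available

# Words go in, a picture comes out.
prompt = "a retro science fiction film still, psychedelic set design"
image = pipe(prompt).images[0]
image.save("generated.png")
```

The point isn’t the specific library: it’s that the entire interaction is a short text description handed to a statistical model, which hands back pixels.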
For our first class I want to go a little deeper into what these images actually represent. Because they aren’t photographs: there was no camera set up somewhere to take a picture of a real event. Some people suggest that they’re a collage: that bits and pieces of these images are taken from other images that are in the AI’s dataset. That’s not quite right, either.
So today, I want to talk about what I think is the better way to conceptualize AI images.
Part 1: Love in a Time of Cholera
In this class we’re going to look at AI generated images through a few different lenses.
The first is to think of them as infographics. You’re probably aware of infographics, or data visualizations.
On the left you have an image of two people kissing, generated by DALLE2. On the right you have a map of a small area of London, with black dots representing cholera cases from the 1854 outbreak. You may not think these images have much in common. But what I’ll suggest - and what this course is designed to make clear - is that they are essentially the same thing.
In 1854, a cholera epidemic was sweeping London. Nobody quite understood what the cause of the disease was. At the time, it was widely believed that cholera traveled through the air as an ethereal miasma. Nobody knew what germs were or how they spread. On the right, you have a map created by Dr. John Snow. This map visualized the locations of cholera deaths in the Broad Street area of London.
Famously, Snow’s map showed us, visually, that these deaths tended to surround a single community water pump. Today this is common sense, but at the time it was quite controversial, and he had to convince the local authorities to turn the pump off. When they did, they saw a corresponding decline in cholera cases. This map is often credited with creating the field of epidemiology in Europe.
The map is a powerful visualization of information. But I want to point out that this map showed us things we already knew. It represented information we already had, but it presented it in a new arrangement. It’s a data visualization. Here you see the kind of data Snow was using to make his map. Charts and tables. Only after he plotted deaths and illnesses as little black squares (in the third image) could he see that they tended to cluster around the center of the map, an area which happened to have a water pump.
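To make that tables-to-picture step concrete, here’s a toy sketch in Python that plots logged case locations the way Snow plotted deaths. The coordinates are invented for illustration; they are not Snow’s actual records.

```python
# A toy version of the reality -> data -> image pipeline behind Snow's map.
# The coordinates below are invented for illustration; they are not Snow's
# actual records.
import matplotlib.pyplot as plt

# Each pair is an (x, y) location of a reported case, as a doctor might log it.
cases = [(2.1, 3.0), (2.4, 2.8), (1.9, 3.2), (2.2, 2.9), (2.3, 3.1), (5.0, 1.0)]
pump = (2.2, 3.0)  # the suspect water pump

xs, ys = zip(*cases)
plt.scatter(xs, ys, marker="s", color="black", label="reported cases")
plt.scatter(*pump, marker="^", color="red", label="water pump")
plt.legend()
plt.title("Toy data: cases cluster around the pump")
plt.show()
```

Nothing in the table changes when you plot it; the cluster only becomes visible because the data has been arranged into an image.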
And so I want to bring your attention to something here. There are several layers at work. The first layer is the real world: people getting sick on the streets of London. The second layer is the information about those people, gathered by doctors, with each number representing one of those people. And finally, you have the map, where black squares represent the information the doctors gathered.
So remember the foundations of a data visualization. First you have reality. Then you have data describing that reality. And then you have images that show us that data in new ways. At each step along the way, you reduce things, cutting out information you don’t really need.
Maybe we can imagine it like this:
The pyramid shrinks as you move upward, to reflect the shrinking of information at each layer of the graphic. But it also leans, and that’s important. Because each of these layers represents somebody, or some thing, making a decision about what to include and exclude. We call this bias. And we’ll talk more about bias in this class - we will talk a lot about bias in this class. But for now, remember that whenever we shrink the world down to data, and shrink that data, what we include and exclude matters.
Part 2: The Datafication of a Kiss
Every AI generated image is an infographic about the dataset. AI images are data patterns inscribed into pictures, and just like John Snow’s map, they tell us stories about that dataset and the human decisions behind it. That's why AI images can become “readable” like infographics or maps. When people make images, they consciously and unconsciously encode them with certain meanings, and viewers decode those meanings. Even human made photographs draw their power from intentional assemblages of choices, steered toward the purpose of communication.
When we make images, we bring references. When we look at images, we make sense of them through references.
To the right is an example. It’s a photo that went viral, taken by Richard Lam. It shows a couple, on the ground, clearly in some kind of riot. The couple is kissing. At first glance, if you are an American like I am, you might think this is from 2020, when protests swept the US. You might see it as romantic, as a couple that was so into each other that they wouldn’t let a riot stop them from making out. You may see this image as something political and radical.
But this photograph is from the real world, and there’s a lot of information beyond the frame of the photo itself. In fact, it wasn’t taken in America; it’s from Vancouver. It’s not from 2020; it’s from 2011. It’s not a political protest: it’s a riot that broke out after a hockey game. And the couple isn’t making out. The woman had been knocked to the ground and was crying, and her boyfriend was making sure she was OK.
When we look at an AI image, we don’t have a real world where we can find out the real story. We only have the story we make up in our heads, and we can’t go and find out the facts about what’s going on. Unlike a photograph, an AI image isn’t anchored in the real world; it only has data. It draws pictures out of information about the real world.
Here are pictures of kissing, created by DALLE2. Kissing doesn’t make a lot of sense to a machine. The machine has scanned hundreds of thousands of pictures of people kissing. It understands patterns of pixels that get lumped together whenever the word “kissing” is in the caption of an image online. Yet, when it creates these images, there’s something we as humans sense is a little… weird.
So photographs already frame the world and tell us specific stories about that world. We can think of a real-world photograph as a photographer going out into the world in search of specific information to include inside the frame. The photographer will leave out irrelevant information and focus on the things that tell the story they want to tell.
In many ways, people do the same thing when they collect data about the world. When you think of a dataset, you might have an idea in mind — like the collection of information we saw earlier about the cholera epidemic: a category, and some numbers that count up the things in that category. A dataset can also be a collection of images, a series of social media posts, or videos. Data is really anything we want to collect from the world in order to learn about it. But it’s almost always a sample — even if it contains billions of pieces of data, it’s nothing compared to the complexity of the real world.
When we talk about AI, we will talk a lot about datasets. Datasets are really what we talk about when we talk about AI. But this word, “data,” might make you think it’s objective, or neutral — free of human bias. In fact, that’s impossible. When we make a dataset, we operate within specific cultural, political, social, and economic contexts. How we decide what to include in our datasets is just as biased as the way we decide what to keep inside the frame of a photograph. A photographer, like a scientist, a researcher, or an AI company, starts with a question to investigate: What’s going on during this sports riot? We find samples of the world that speak to that question: Well, here’s a couple kissing. We snap the photo when those samples come into a specific alignment, as if to say, “Here is a thing that happened at the sports riot.”
The photographer has to exclude a bunch of stuff. For example, we’d think it was weird if the newspaper ran an extreme close-up of a piece of gravel from the sports riot. But it was still there! That concrete was just as much a part of the story of the riot as the couple making out. But we focus on the information we find in the world that matters to the story we want to tell, or that we think will help us answer the question. Everything else, we kind of ignore.
We record data this way, too. We have questions: What’s going on with this cholera epidemic? We identify where the useful data for answering that question might be: Well, who gets cholera, and what do they have in common? We seek out ways to capture that data: Where do they live? Once the data is captured, we let machines work through the result, as in machine learning, or we work through it ourselves, as John Snow did.
In this class, we will think about AI images less as photographs, and more as data visualizations. There’s a reason for this: if you want to make AI images, you need to understand how they get made. And the way they get made is through a combination of datasets, algorithms, and content moderation decisions. As designers and artists using AI, it’s important to understand these systems, because by understanding them, you will learn how to steer them toward doing the things you want to do.
Along the way, we’re going to learn a lot about artificial intelligence, and data, and images. These things go well beyond making images, so we’re going to think critically about them, too, because artificial intelligence is a tool. With any tool, you want to know why you’re using it. Why use AI to make an image when you could use Photoshop or a camera? Knowing more about how AI works, what it means, and how it makes meaning can help you make better decisions about when and how to use it.
How to Read an AI Image
For a deeper read on this topic, I’ve written How to Read an AI Image. It was assigned as a reading for the class; the section below is a review.
We’re going to do something called media analysis. Typically we use media analysis for film, photographs, or advertisements. AI images are not really films or photographs. They're infographics for a dataset. But they don’t tell us how to read them. We have to figure that out for ourselves. It’s like reading an unlabeled map.
The first step is to find an image that seems interesting. I stumbled across pictures of people kissing and found them to be incredibly strange, so I wanted to explore them a bit. With DALLE2, you can enter a prompt into a window, and you get four pictures back. I asked it for “Studio photograph of humans kissing.” Here are some examples of what I got back. (It’s the same image you just saw; I’m adding it here for easy reference.)
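For reference, here is roughly what that prompt-to-four-images exchange looks like if you call a hosted model through OpenAI’s Python library. The API details are my assumption and may differ from the interface used when these class images were made; it’s only meant to show the prompt-in, images-out loop.

```python
# A sketch of asking a hosted model for four images from one prompt, using
# OpenAI's Python library. The exact API details may differ from what was
# used in class; this only shows the prompt-in, images-out exchange.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

response = client.images.generate(
    model="dall-e-2",
    prompt="Studio photograph of humans kissing",
    n=4,                    # ask for four candidate images, as in class
    size="1024x1024",
)

for i, item in enumerate(response.data):
    print(f"image {i + 1}: {item.url}")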
If we want to understand why the images are the way they are — weird — the first step is to ask: how does the machine know what kissing looks like? We’re going to go into this in more detail when we talk about GANs and diffusion models later. But for now, the short answer is: it has billions of pictures, paired with captions, and it identifies patterns that connect those pictures to those captions.
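If it helps to see that caption-and-picture pairing in code, here’s a small sketch that uses an open CLIP-style model from Hugging Face to score how well different captions match an image. This is not DALLE2’s actual training pipeline, and the filename is made up; it only illustrates the kind of caption/image association these systems learn.

```python
# A sketch of scoring how well captions match an image with an open
# CLIP-style model. This is not DALLE2's training pipeline; it only
# illustrates the kind of caption/image association such systems learn.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("couple.jpg")  # any local photo; the filename is made up
captions = ["two people kissing", "a map of London", "a bowl of fruit"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # higher = closer match

for caption, score in zip(captions, probs[0]):
    print(f"{caption}: {score.item():.2f}")
```

The model never “knows” what a kiss is; it only knows which word patterns tend to travel with which pixel patterns across its training data.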
Below is an example of one way to see this data. You have this big circle containing every image in the dataset - billions of image descriptions for something like DALLE2. Inside the dataset there are labels for photorealistic images; somewhere else there are labels for actions and verbs like running, jumping, cooking, kissing. There are labels for images of people. And somewhere between all of these things is the label for kissing.
When we type a prompt into the window and ask DALLE2 for that image, it assembles information from all of these categories and recreates pixels that fit these patterns.
So we may want to start asking some questions about these categories and where they came from. Where did this information come from? Does that change what we see or don’t see? What’s in the dataset, and what isn’t? And how was this data collected? Is it representative of the things we want to make? These are the types of questions we’re going to explore in this class as we think about how to get these tools to do the things we want them to do. So keep that mindset with AI images, and in the next few classes we’ll explore how to sharpen our thinking about these images, and about the way AI image models work.
Want More Stuff to Do?
For more on reading an AI image, I recommend reading “How to Read an AI Image,” linked above. If you prefer video content, I’ve recorded a lecture for students at Aarhus University that covers the entire process, step by step. It’s embedded below.