Critical Topics: AI Images, Class Twelve
Seeing Like a Dataset
This class doesn’t have a video, because it featured a GAN-training exercise using Runway ML, which is no longer online. Instead, it focuses on the opening of that class, which described my own practice — consider this an artist talk by Eryk Salvaggio — dealing with gathering datasets and generating images from them. In it, I blend a technical description of the process, as best I can, with the ideas I was working through conceptually. I think these things go hand in hand for most artists who work with these technologies.
Some of the text below is elaborated further in two publications. Infinite Barnacle was published in Leonardo, and Seeing Like a Dataset was published in ACM’s Interactions magazine. Both are paywalled, but accessible through most libraries.
In 2019, I was beginning to explore AI image making and I wanted to train my own dataset. At the time, I was running Dying Tomorrow, a workshop and event series in San Francisco about technologies and the preservation of memory after we die. I was curious to think about lost memories, and how they might be reassembled into something that goes on after we die.
At that point, I found an online archive of public domain photographs. Public domain means the images are no longer under copyright protection; in this case, because they are old enough for those copyrights to have expired. These images came from a Romanian photographer named Costică Acsinte. Acsinte was a studio photographer: people would pay him to sit and have their portrait taken and leave with a photograph. Acsinte kept the negatives, which were eventually rediscovered after years of neglect. They had mold, fire damage, water damage. Nonetheless, the Costică Acsinte Archive scanned all of these negatives, damage and all, and shared them on Flickr.
I was interested in this relationship between analog photographs and digital technologies: the way that AI might learn from mold and debris. I was inspired by a film called Decasia, assembled out of old, decaying silent films that were about to be tossed in the trash. The reels were scanned, mold and all, and the filmmaker, Bill Morrison, then went through hours of footage to find moments where the actors on screen seemed to interact with the decay, such as a boxer fighting a patch of mold. The film is pretty avant-garde, not a popcorn movie, obviously, but it’s fascinating to see how it becomes a story about our fight against being forgotten.
So I turned to the Costică Acsinte archive with this in mind. What if we trained a GAN on images full of decay, loss, and erasure? How might the system reassemble these images? So I downloaded the images in the archive, one by one. I knew that these images had to be square, and I knew that they had to follow particular patterns. Remember, GANs are looking for repetition in the dataset, things they can easily reproduce. What I liked about these images is that they all had the same background, the same portrait studio. But I still had to sort them: images with two people went into one category, people standing into another. I ended up focusing on people sitting, because there were so many of them.
I then made these images square, because that’s what StyleGAN needs to make a new series: they have to be 1024 by 1024 pixels. I did this using a Photoshop action, recording myself resizing one file and then running it on the whole directory of images. The result was a black bar added to each side of the rectangular image, padding it out to a square.
Finally, I took that directory — which had maybe 500 images — and I reversed it. I mean this literally: I had Photoshop open every image in the dataset and flip it horizontally. Mirroring the dataset basically doubled its size, because each reversed image was a new image that the GAN could learn from. Faces are mostly symmetrical, so the patterns are more or less the same.
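If you would rather do this preparation in code than with Photoshop actions, here is a minimal sketch of the same two steps using Python and Pillow: padding each rectangular image to a 1024 by 1024 square with black bars, then saving a mirrored copy of every image to double the dataset. The folder names are placeholders, not the ones I actually used.

```python
# A minimal sketch of the same preparation in Python with Pillow instead of a
# Photoshop action. Folder names are placeholders.
from pathlib import Path
from PIL import Image, ImageOps

SRC = Path("acsinte_scans")      # folder of downloaded rectangular scans
DST = Path("dataset_1024")       # folder of square, StyleGAN-ready images
DST.mkdir(exist_ok=True)

for i, path in enumerate(sorted(SRC.glob("*.jpg"))):
    img = Image.open(path).convert("RGB")

    # Pad the rectangle onto a square black canvas, then resize to 1024x1024.
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), (0, 0, 0))
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    square = canvas.resize((1024, 1024), Image.LANCZOS)
    square.save(DST / f"{i:05d}.png")

    # Mirror every image horizontally to double the size of the dataset.
    ImageOps.mirror(square).save(DST / f"{i:05d}_flipped.png")
```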
At the same time, I kept the images of mold and decay in the dataset. Arguably, if I had wanted a clean dataset, I would have excluded them. But I didn’t want that: I wanted to see the GAN work through that decay and mold, to see how it would make sense of it. This was an intentional decision — I didn’t know what would happen, but I wanted to find out. Most of the work I do is based on this kind of experimentation. But it was also a conceptual strategy: I wanted to incorporate the context of the original images into anything else the GAN made with them. I wanted this to be about the breakdown of photographs and memory. I didn’t want to clean the data of its original markings and history, even — perhaps especially! — if that made the images less commercial or visually appealing. I was happy for them to be weird, if it made sense to the overall project.
The results are not photorealistic, given that my dataset was still quite small, but that’s OK: they don’t need to be. They’re a little cartoonish, which I don’t like, but that leads me to questions about why, and to think more about what makes a realistic image. Lots of training data, obviously; but also, the noise in these images made it harder to find patterns. The rot, mold, and fire damage ran interference against any perfect patterns.
So that was my experiment with existing images, which I used to make my own dataset. After this, I turned to using my own digital collages as training data, which was also a fairly abstract exercise. Then I started to explore the idea of taking photographs specifically for a GAN.
Part 2: AI Photography
Being an AI photographer is not as futuristic as it sounds.
GANs were the first real AI image synthesis tool. By now, diffusion models rule the day. But it’s still worth knowing about GANs and building your own, because GANs show you the fundamentals of how AI image generation works. In that sense, it’s a bit like developing film in order to understand a digital camera. It helps.
GANs are also, in a sense, much more holistic. Ursula Franklin talks about holistic versus prescriptive technologies:
Holistic technologies are normally associated with the notion of craft. Artisans, be they potters, weavers, metalsmiths, or cooks, control the process of their own work from beginning to finish. Their hands and minds make situational decisions as the work proceeds, be it on the thickness of the pot, or the shape of the knife edge, or the doneness of the roast. These are decisions that only they can make while they are working. And they draw on their own experience, each time applying it to a unique situation. The products of their work are one of a kind. However similar pots may look to the casual observer, each piece is made as if it were unique.
GAN photography is a holistic way of working, because every image you take shapes the final product. As you develop this skill, you get better at understanding how your crafting of a dataset will influence the outcome. The first time you train a GAN — or today, perhaps, a LoRA — is like the first time you throw clay on a potter’s wheel. You will not be making beautiful pots on your first try. But you’ll understand how the craft works.
You’ll start to learn what the machine needs in order to work, which is helpful, because so often we think about algorithms as suggestions, as things that want something. That’s a prescriptive way to think about technology, in Ursula Franklin’s terms:
“Here, the making or doing of something is broken down into clearly identifiable steps. Each step is carried out by a separate worker, or group of workers, who need to be familiar only with the skills of performing that one step. This is what is normally meant by “division of labor”.”
Without customization, diffusion models rely on a division of labor. They are built from a whole range of people contributing tiny pieces of data to the overall project. The result is that it’s very hard to make something that is unique to you or to your craftsmanship. It’s hard to steer a diffusion model toward making something that nobody else could make.
These machines still need certain things from us, and they don’t tolerate ambiguity. My dog needs things, but she also wants things. An AI has no desire for anything. But it needs things before it can do what we ask it to do. As GAN photographers, we have to learn to understand what the AI needs, and how to fill those needs in ways that respect our own creative vision. Here are the basics.
First, you will need a camera with enough memory to take at least 500 to 1500 photographs. At first you might think that’s just too many photographs for a human to take, but you’re wrong. It’s actually too few. Nonetheless, when you get out and start taking these photographs, you’ll find it isn’t hard to do.
That’s because you aren’t taking pictures of interesting things; you’re taking pictures of the space around interesting things. You need to know that the AI is going to weigh everything in your photograph as a probability in relation to everything else. Once you see the world through machine vision, you can quickly take 500 or even 5,000 images. You don’t have to compose them, or find a subject. You want to find similar patterns.
It’s easier to understand with some examples.
Gather Photographs
Here we have a series of images I created with a GAN. You can look at these and quickly understand what was in the dataset, right? I took 500 photographs of these pussy willow branches with my iPhone. I set them up against a white wall, put a light on them, and then moved around them, taking pictures from many different angles. Then I rearranged the stems and did it again. I spent an afternoon doing this, until I had about 500 images, maybe more.
Here’s what I did not do:
I didn’t compose the images or look for meaningful assemblages. These were shot quickly, in rapid succession, sometimes using burst mode as I rotated around the arrangement.
I didn’t adjust the colors or do any cropping. I shot these images in square format, so I didn’t have to edit them. There was no post-production done on the original images.
Then I saved all of those images to a folder. It took me maybe an hour or two.
Here’s another example.
Sometime in 2021, with our time on the eastern side of Virginia coming to a close, my partner, my dog, and I made a journey to the Chesapeake Bay. We were in the midst of what would be a nomadic and tumultuous year. We had no idea where we might live or work after this place. That day, the sky was gray but the air was warm, and the Chesapeake Bay was a reprieve from our time spent at a boarding school surrounded by rural farmland.
I had lived near water for most of my life, until I left San Francisco for inland Canberra. Canberra has Lake Burley Griffin, but I did not see the ocean except to fly over it for 17 hours to and from there. My parents are in Maine, on the seacoast, where there is a walkway that takes you to a place where the river and the Atlantic meet.
So finding myself surrounded by slightly salty debris felt comfortable and familiar. As we walked around the empty shoreline I took photographs — hundreds of them — of the barnacles and logs and sand and kelp that washed ashore.
These photographs of the time and place would become both memories and data. It’s been useful in my thinking about the distinction between the two. The memories are fragments — my dog running ahead, splashing in the still-too-cold water. My partner looking for bald eagles in the trees through binoculars. The magic appearance of two swans.
The data was in my phone: five hundred or so photographs, taken in square format. The artifacts of memory, maybe, or else the debris. A photograph is a kind of tracing of light onto film; in a digital image, it is a tracing of light into data.
I often take my own photographs of natural patterns found on walks along beaches or forest trails: things like barnacles on wood, or leafy trails in autumn.
These are for building a dataset of that outing, training a model, and then generating an extended, simulated wandering. I take a few hundred of these photographs at a time, because that’s what you need to train a generative adversarial network (GAN). In turn, these GANs will make a study of those natural patterns, assign them coordinates and weights, and then reconstruct these clusters and patterns into new, unseen compositions.
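If you are curious what that “study” looks like mechanically, below is a compressed sketch of the adversarial loop at the heart of GAN training, written in PyTorch. It is a toy stand-in rather than StyleGAN itself, and the folder name is a placeholder: a generator proposes images from random noise, a discriminator compares them against the photographs in the dataset, and each network improves against the other.

```python
# A toy GAN training loop in PyTorch, for illustration only. "walk_photos" is a
# placeholder folder; ImageFolder expects images inside at least one subfolder,
# e.g. walk_photos/all/0001.jpg.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
latent_dim = 128

data = datasets.ImageFolder(
    "walk_photos",
    transform=transforms.Compose([
        transforms.Resize(64), transforms.CenterCrop(64),
        transforms.ToTensor(), transforms.Normalize([0.5] * 3, [0.5] * 3),
    ]),
)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

# Generator: a random latent vector in, a 64x64 image out.
G = nn.Sequential(
    nn.Linear(latent_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 3 * 64 * 64), nn.Tanh(),
    nn.Unflatten(1, (3, 64, 64)),
).to(device)

# Discriminator: an image in, a score for "came from the dataset" out.
D = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
).to(device)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for epoch in range(100):
    for real, _ in loader:
        real = real.to(device)
        n = real.size(0)
        fake = G(torch.randn(n, latent_dim, device=device))

        # Discriminator step: photographs are labeled 1, generated images 0.
        d_loss = bce(D(real), torch.ones(n, 1, device=device)) + \
                 bce(D(fake.detach()), torch.zeros(n, 1, device=device))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: try to make the discriminator label fakes as real.
        g_loss = bce(D(fake), torch.ones(n, 1, device=device))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice, StyleGAN implementations (or, at the time, Runway ML) wrap this loop in far more sophisticated architectures and training tricks; the sketch is only meant to show the push and pull underneath.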
As a result of this practice, my vision as a photographer has shifted. The rules of photographic composition are pointless to an AI eye. Just as the camera used to shape how and what I saw, the AI — and what the AI needs — shapes it, too.
One learns to think like a dataset. When you go out to take pictures, remember that you aren’t photographing things. You’re photographing patterns. And you need to find a relationship between consistency and variety. Too consistent, and you overfit: the model just makes the same image over and over again. Too much variation, and the patterns won’t make sense: the results will be blurred and abstracted, without any cohesion.
So, as you go out with your camera, you will seek continuities of patterns between each shot, with variations in composition. You want to keep similar proportions of the elements within the frame: balancing the splashes of apple-red maple leaves, patches of grass, and bursts of purple wildflowers, without introducing anything that isn’t in the other pictures.
You are dealing with bias: you’re deliberately biasing the dataset toward things that fit this checklist. Not too much variety, and not too much repetition.
This is an inversion of the photographer’s instincts, as well as the mushroom forager’s instinct. A photographer, and a mushroom hunter, will typically look for breaks in patterns. If I stumble across a mushroom, the instinct might be to capture it, on film or in a wicker basket. By contrast, the AI photographer looks away. The mushroom disrupts the patterns of the soil: it is an outlier in need of removal. The AI photographer wants the mud, grass, and leaves. We want clusters and patterns. We don’t focus on one image, we focus on the patterns across a sequence of images.
We want to give the system something predictable.
Another example I could talk about is what happens when you go to the other extreme. I mentioned that I used to make digital collages. All of the photos I would cut up came from the Internet Archive or Flickr Commons. I’d go, type in a keyword, and find interesting images. Then I’d download those images into a folder. So over the years of working this way, I had a bunch of folders that were sorted into categories, like flowers, circles, dancers.
One of the interesting things you can do with GANs is combine image archives. In this case, I combined all of the images I had of flowers and dancers. This is a way to see how GANs make sense of really different sets of data. You’ll see that it finds similarities between the patterns in the images. So what we see is not necessarily a combination of flowers and dancers, though that’s what it looks like. It’s actually depicting the common patterns that it identified in both the images of dancers and the images of flowers.
A key thing to remember if you combine datasets is that the model has no idea that you’re combining different kinds of pictures. It will never say to itself, “Oh, these are flowers and these are dancers.” Instead, it will look through the entire dataset for ways to predict a new image that would fit into that dataset. So you get the in-between kinds of images you see above.
This can also produce weird results, so it’s a risk. For example, in one botched experiment, I went to a toy store and bought plastic animals: little plastic cats, farm animals, zoo animals. Then I photographed all of them, individually, and asked the GAN to generate new plastic animals.
The problem here is that there simply weren’t enough photographs. There was also an issue of heads and tails pointing in different directions, which meant that half the dataset had heads on the left and half had tails on the left, and so the GAN literally couldn’t make heads or tails of the images. So this is another piece of advice: for best results, work with things that are symmetrical or have patterns across the entire image.
We’re building datasets that will create something predictable from a machine. Prediction is a matter of time and scale. If it sees enough images, a model will be able to predict what else might go into that dataset with greater accuracy. Take enough photographs of mushrooms and you can start generating images of mushrooms. The same is true of any kind of data we put together for AI systems: pick enough data points and you can find patterns to support any conclusion. The underlying principle is the same: as AI photographers, we look away from the unique aspects of the world, and look instead to the patterns we usually ignore.
This tells us something about the images and decisions an AI makes. On the one hand, we might view it as flattening the world. On the other hand, it heightens my awareness of the subtleties of the dull. The singular is beautiful: we take pictures of birds when they land on a tree branch, not just tree branches. We take pictures of mushrooms when we see them on a walk; we don’t usually take photographs of the grass. I take photographs of the person I love standing against a bright wall: I rarely take pictures of a blank wall.
Yet, the world behind them, the world we ignore, is exactly what a machine is looking for.
Much of this background world is lost to us through what psychologists call schematic processing, or schemas. We acknowledge that the soil is muddy and covered in leaves, and so we do not need to individually process every fallen leaf. Arrive at something novel in your environment, however, and you pause: what bird is that? Is that a mushroom rising from that log? What kind?
The benefit of schemas is also the problem with schemas. The world gets lost, until we consciously reactivate our attention. (Allen Ginsberg: “if you pay twice as much attention to your rug, you’ve got twice as much rug.”) An AI photographer is looking for the colors we don’t see, the patterns that demand our inattention.
The Model Mind
AI pioneer Marvin Minsky used schemas to organize computational processes. For Minsky, whatever the machine’s sensors picked up could be placed into the category of the ignorable or the interruptive:
“When one encounters a new situation (or makes a substantial change to one's view of a problem), one selects from memory a structure called a frame. This is a remembered framework to be adapted to fit reality by changing details as necessary. A frame is a data-structure for representing a stereotyped situation like being in a certain kind of living room or going to a child's birthday party. Attached to each frame are several kinds of information. Some of this information is about how to use the frame. Some is about what one can expect to happen next. Some is about what to do if these expectations are not confirmed.” (Source)
Minsky took a metaphor meant to describe how our brains work, and then codified it into a computational system. The brain, in fact, does not “do” all of this. Rather, schemas are a shorthand devised to represent whatever our brains are actually doing.
Brains are not machines and do not work the way machines work. There is no file system, no data storage, no code to sort and categorize input. Rather, we’ve developed stories for making sense of how brains work, and one of those stories is the metaphor of schemas. The schema metaphor inspired the way we built machines, and somewhere along the way we lost track of which inspired which.
Today, when we see machines behave in ways that align with stories of how human brains “work,” we think: “oh yes, it’s just like a brain!” But it is not behaving like a brain. It is behaving like one story of a brain, particularly the story of a brain that was most easily adapted into circuitry and computational logic. The machine brain behaves that way because someone explained human brains that way to Marvin Minsky, and Marvin Minsky built computers to match it.
There are many different ways that people think. Some people do think this way. But not everyone does. As you make images with GANs, it’s easy to connect this to the ways humans imagine things, or learn things, or create things. And it’s really important to say: that’s because everyone born after the 1950s has been taught one model of the way brains work, and that model is also the model that computer scientists and neuroscientists tend to use. But there is not one way that brains work. Some of us cannot visualize images in our head. Some of us have a running dialogue about the world around us. Some of us possess senses that others do not.
But as an indirect result of brain-metaphors being applied as instruction manuals for building complex neural networks, GANs behave in ways that align with and reflect these human schemas. Schemas are not always accurate, and information that works against existing schemas is often distorted to fit. Remember what happened when only a handful of people in the dataset had friends in their photographs with them? Enough people were present as “buddies” in the training data to show up, but not enough to repeat that pattern consistently. So images where faces were melting or distorted were allowed to pass through the discriminator.
Keep that in mind as you start thinking about taking photographs or using images for your GAN. We may not “see” a mushroom when we expect only to see leaves and weeds. Humans often distort reality to fit into preconceived notions. Likewise, GANs will create distorted images in the absence of enough information: if one mushroom exists within 500 photographs, its traces may appear in generated images, but they will be incomplete, warped to reconcile with whatever data is more abundant.
A careful photographer can learn to play with these biases: the art of picking cherries. The dataset can be skewed, and mushrooms may weave their way in. We may start calculating how many mushrooms we need to ensure they are legible to the algorithms but remain ambiguous. A practiced AI photographer can steer the biases of these models in idiosyncratic ways.
The AI-Human Eye
AI photography, or dataset building, is about series, permutation, and redundancy. It is designed to create predictable outputs from predictable inputs. It exists because digital technology has made digital images an abundant resource. It is simple to take 5,000 images, and ironic that this abundance is a precondition for creating 50,000 more.
As a result, the “value” of AI photography is pretty low.
GAN photography is the practice of going into the world with a camera, collecting 500 to 5,000 images for a dataset, cropping those images, creating variations (reversing, rotating, etc.) and training for a few thousand epochs to create even more extensions of those 500 to 5,000 images.
It all sounds wildly old-fashioned compared to typing a prompt into a window, but the technology was cutting edge just three years ago and remains the most customizable form of image generation as of this writing.
GAN photography is personal, like journaling or most poetry. The process is likely much more captivating for the artist than the eventual result is to any audience. GAN photography is a strangely contemplative and reflective practice.
The GAN photographer is constantly returning their attention to the details we are designed to drift away from, the sights we have learned not to see. The particular is often beautiful, but it’s not the only form of beauty. In search of one mushroom, we might neglect a hundred thousand maple leaves.
Beyond the Frame
There is something else at work, though. Beneath the images produced by GANs is a convergence of information and calculation, reduction and exclusion, that flattens the world. It’s one thing to produce images that acknowledge the ignorable. It’s another thing to live in a world where these patterns are enforced.
Beyond the photographic frame, this world would be wearisome and unimaginative. It’s a world crafted to reduce difference, to ignore exceptions, outliers, and novelty. It is bleak, if not dangerous: a world of uniformity, a world without diversity, a world of only observable and repeatable patterns.
The AI photographer develops a curious vision, steered by these technologies: a way of seeing that is aligned with the information flows that curate our lives. The GAN photographer learns to see like a dataset, to internalize its rules.
But as we’ve seen, you can also start to play with those rules, blending categories between your datasets, tilting your dataset ever so slightly in one direction or another. This is where you can start exploring creative ways to navigate the dataset. It’s all about how many pictures you have, and what the permutations of patterns look like across that dataset. For example, if you draw 500 images of flowers by hand and then add 100 images of starfish, you may end up with something slightly starfish-like. If you do 500 and 500, you’ll consistently end up with something halfway between starfish and flowers. These aren’t fixed rules, but they give you an idea of how you can steer these things. It’s about playing with the ratios and proportions of the types of data you include in your dataset.
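If you keep your prepared images in separate folders, one rough way to experiment with these ratios is simply to copy a chosen number of images from each folder into a combined training folder. Here is a minimal sketch in Python; the folder names and counts are placeholders for whatever collections you are working with.

```python
# A minimal sketch for mixing two image collections at a chosen ratio before
# training. Folder names and counts below are placeholders.
import random
import shutil
from pathlib import Path

def mix_datasets(src_a, src_b, dest, n_a, n_b, seed=0):
    """Copy n_a random images from src_a and n_b from src_b into dest."""
    random.seed(seed)
    dest = Path(dest)
    dest.mkdir(exist_ok=True)
    for label, src, n in [("a", Path(src_a), n_a), ("b", Path(src_b), n_b)]:
        files = sorted(src.glob("*.png")) + sorted(src.glob("*.jpg"))
        for i, f in enumerate(random.sample(files, min(n, len(files)))):
            shutil.copy(f, dest / f"{label}_{i:05d}{f.suffix}")

# 500 flowers against 100 starfish: expect something only slightly starfish-like.
mix_datasets("flowers", "starfish", "mixed_dataset", n_a=500, n_b=100)
```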
In another class, for example, we discussed Helena Sarin’s work: she used datasets of nature imagery and Japanese calligraphy to create GAN images that looked like trees made out of haiku. That’s done by balancing and mixing datasets the way you might mix colors on a palette. I definitely encourage you to play with that as you brainstorm and conceptualize a GAN art project. What can you mix? What proportions would be interesting? Just remember that you need to go out and get this data.
Through practice, the rules become instincts or habits. The data, meant to capture what we see, changes how we see.
And I encourage you to be mindful of how this exercise changes the way you see: whether it’s the stuff you are photographing, or the way you see the world of patterns, or how your eye shifts focus. None of this is wrong, but it’s interesting, and this self-awareness of how the process changes you is a unique way of thinking about this practice as a craft.
I want to take a brief sidetrack here to show part of a talk by Anna Ridler, who has been making a variety of GAN art projects, including one that cycles through the permutations of the datasets she builds and works with. When you make and train a dataset, you’ll be able to access something called a “latent space walk,” which is basically a short video in which many of the permutations of your output are cycled through very quickly. These are interesting to explore as products in themselves, and Anna Ridler’s work on Tulips takes it a step further, moving through a sort of synthetic tulip blooming driven by a piece of software. Basically, as the price of bitcoin rises and falls, the tulips in these images either bloom or wither.
I wanted to play a bit of Anna Ridler talking about GANs and datasets as well.
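For those curious how a latent space walk is typically produced once you have a trained model: you pick a handful of random latent vectors, interpolate between them, and render each intermediate point as a frame. Here is a hedged sketch in PyTorch; `G` and `latent_dim` are assumptions standing in for whatever generator you trained, and the saved frames can be assembled into a video afterward with any editing tool.

```python
# A hedged sketch of a latent space walk. `G` is assumed to be a trained
# generator that maps a batch of latent vectors to images; `latent_dim` is its
# input size. Frames are written as numbered PNGs to be assembled into video.
from pathlib import Path

import torch
from torchvision.utils import save_image

@torch.no_grad()
def latent_walk(G, latent_dim, keypoints=8, steps_between=30, out_dir="walk_frames"):
    Path(out_dir).mkdir(exist_ok=True)
    anchors = torch.randn(keypoints, latent_dim)   # random anchor points in latent space
    frame = 0
    for a, b in zip(anchors[:-1], anchors[1:]):
        for t in torch.linspace(0, 1, steps_between):
            z = (1 - t) * a + t * b                # slide smoothly between two anchors
            img = G(z.unsqueeze(0))                # generator: latent vector -> image
            save_image(img, f"{out_dir}/{frame:05d}.png", normalize=True)
            frame += 1
```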
So, consider all of this and try to put together a proposal for what you’d like to go out and document for your dataset. Will it be images you can find online? Images that you take yourself? What subject matter do you think fits these criteria? Where might you go to find them? And most importantly, what would be interesting about that process for you?