Critical Topics: AI Images, Class Fourteen
Have a Coke and a Smile
In 2023 Coca-Cola entered into an agreement with OpenAI that let users generate designs for Coca-Cola bottles. Designers could start with an image of the can and, using a special interface on the campaign website, generate AI images through a prompt window. They could then use GPT-4 to write a tagline or other text for that image and incorporate it directly into the design.
This is one of the first forays by a corporation of Coca-Cola’s size into the AI space for marketing and advertising, and there was a backlash. Many artists saw Coca-Cola embracing a tool built on datasets containing artists’ work that was never explicitly licensed for this purpose. We don’t know exactly what OpenAI’s dataset includes, but we know that similar models include the work of many artists who did not give permission, or who shared work under a Creative Commons license with no idea that they were granting permission to an AI system that might reproduce their work, or fold it into a model that could compete with those same artists and designers.
So Coca-Cola enters the space, and it opens up a broader question about the role that advertising will play in these models. These models are marketed as creative tools, like Photoshop or Unity, but rest assured that companies are interested in using them for advertising.
In its press release for the campaign, Coca-Cola writes that it is “still in the early days of assessing AI’s potential impact,” with Manolo Arroyo, Global Chief Marketing Officer, cited as saying:
“We’re just scratching the surface of what we believe will help create the industry’s most effective and efficient end-to-end marketing model,” he said. “We will begin to leverage OpenAI’s technology in our marketing function to re-imagine how we produce creative content, increasing the velocity from weeks to days. We see many applications of AI – including content creation and rapid iteration, hyper-personalizing content and messaging for consumers and customers, and driving two-way conversations with consumers.”
Let’s unpack that a bit. Hyper-personalizing content with AI — what could that mean? Well, we can imagine a science fiction version of marketing where, instead of ad text written to be delivered to you based on keywords in a social media profile, those keywords are fed into something like a generative AI model. A marketing algorithm on a social media site might start paying attention not just to the kinds of ads you respond to, but to the visual styles you respond to. That information could be fed back to create new imagery more or less instantaneously. We’re still a fair way from images being reliably generated in ways that are safe for brands, but it’s one way of thinking about this idea of hyper-personalization.
Another way to approach it is what’s literally on display here: giving consumers a tool to generate content that uses the brand or its logos. This is essentially an ad-supported version of DALL-E and GPT, free to use if you put a Coca-Cola bottle in the image or agree to see ads for Coca-Cola in your text output. Or perhaps it works like keyword search: one could easily imagine companies paying for the right to be worked, casually and unobtrusively, into text output, the way advertisers sometimes pay for products to appear in the background of film scenes.
This points to a future where these generative tools are underwritten by corporations in exchange for brand placements. That’s not science fiction — that’s literally what this Coca-Cola website is. So let’s talk about the current relationship we have with advertising. Think about a very simple billboard. For years, Coca-Cola had a billboard that was a picture of a can with the words “Have a Coke and a Smile.”
It's very easy to make fun of this stuff and say: why the hell are they paying $100,000 or more for billboards that say “have a Coke and a smile”? That doesn't make us want to go have a Coke, does it? Well, advertising is a multibillion-dollar industry. It shapes so much of our media and, by extension, our culture. It shapes the way our television is made: we have programs of a specific length, with specific story structures that break the story up at points designed to keep viewers in their seats through the advertisements.
It shapes the structure of content online, too. Every social media website or app is organized and built around identifying your habits and delivering ads. If it didn't work, this wouldn't happen. If advertising didn't have some effect, we would not be organizing every form of media around it.
But advertising's relationship to sales is indirect. It raises awareness, of course. But it also creates a context for the product. When you see a billboard that says “Have a Coke and a Smile,” it does not make you automatically go to the grocery store and buy a Coke so that you can smile. No one thinks this. No one ever crafts an ad based on the idea that that's what's going to happen.
Advertising and AI
So how does advertising work, and how could AI complicate that?
Well, there are a lot of theories about how it works. I will talk about one particular model, which is that ads shape associations — the more we see images of a bottle of Coca-Cola next to the word smile, the more we associate Coca-Cola with smiling, with positivity, with happy days playing soccer in the sunlight. It's a vibe. You might say it's an atmosphere that is created.
It is an association that is consistently reinforced in our heads. The billboard reinforces the television ad. These ads are not about buying a Coke and smiling. It's not a literal connection. It's a connection between a feeling and a product. It's about creating an imaginary relationship to the world depicted in these advertisements.
When we watch a television show, what are we doing? When we play a video game on a screen, what are we doing? How do we understand and respond to the images on that screen?
We do it through our imagination. We do it by saying: it is as if I were there. When we look at a video game, it is as if we are Mario in the Mushroom Kingdom. It is as if we are a character inside the story of Better Call Saul or Adventure Time. These are stories that activate our imagination, creating a space for us to immerse ourselves in.
Advertising creates that imagination-space, and then it puts its products in there. This is product placement. An advertisement is building an imaginary world, and then putting its products in there, so that when you think about its product, you think about that imaginary world and its atmosphere.
It is not literal, right? We are not robots, and no one thinks that we are robots. No one thinks that putting a Coca-Cola on a beach in an ad makes you believe that if you drink a Coca-Cola you will appear on a beach. Instead, it says: here is a world. Imagine yourself at the beach. Imagine yourself walking along the shore. Imagine yourself relaxed, on vacation, free of cares and troubles. Imagine yourself in the hot sun, reaching for an ice-cold Coca-Cola. Imagine popping the can open with that crack of the pop-top, and the anticipation of a cold, refreshing beverage in the hot sun, surrounded by beautiful people having fun.
It's that imagining of a story that lures you in, and it gives you certain associations with relaxation, pleasure, and relief. Of course, we aren’t purely passive. We can resist those narratives, and often we do. We see satires of ads almost as soon as we see ads. We also filter ads through personal experience: if you hate beaches, this ad doesn’t do much for you. We are the ones looking at these images and making sense of them.
This is what the media studies scholar Stuart Hall refers to as “encoding and decoding” the messages of popular culture. This goes beyond advertising and into any kind of media consumption. Information is created, and then organized, and then interpreted. And along the way we can negotiate meanings with these signals that we’re getting. We can say “oh, that’s a dumb ad,” or “that’s a dumb ad but I would love to hit the beach,” or “that’s a great ad, and I hate beaches.” There are all kinds of ways to respond to advertisements. They are different for everyone, and they’re negotiated through our personal experiences.
With a billboard or television, we don’t get to change what’s on the screen in ways that reflect that experience. But with digital media spaces, we move into personalization. Digital marketing data means there are algorithms that weigh your response to certain ads and then emphasize similar ads the next time you are on the site. The system breaks your behavior and online interactions down into categories, and then it sells those categories to advertisers as a way of reaching you.
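As a very rough sketch of that idea (this is illustrative Python, not any platform's actual system; the category names and weights are invented), a click-weighted profile of your behavior might look something like this:

```python
# Hypothetical sketch of ad-category profiling: every interaction nudges
# a category weight, and the highest-weighted category is shown next.
from collections import defaultdict

category_scores = defaultdict(float)

def record_interaction(ad_category: str, clicked: bool) -> None:
    """Raise a category's weight when its ad is clicked, lower it slightly when ignored."""
    category_scores[ad_category] += 1.0 if clicked else -0.1

def pick_next_ad_category() -> str:
    """Favor whichever category this user has responded to most."""
    return max(category_scores, key=category_scores.get)

# Example: a user clicks two travel ads and ignores a soda ad.
record_interaction("travel", clicked=True)
record_interaction("travel", clicked=True)
record_interaction("soda", clicked=False)
print(pick_next_ad_category())  # -> "travel"
```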
Interactive media means that your imagination can negotiate meaning from an ad in all sorts of ways, but by clicking or not clicking, you are constantly giving information to the system about how you are negotiating it. And now that media is more dynamic: the ads can negotiate back.
Today, this is built into AI. We don’t call it liking, we call it Reinforcement Learning from Human Feedback. If you like a picture created by the AI, you can download it or click on a heart or save it. This information is feedback, and companies can learn from this feedback. It means the model did something well, and it knows what you have responded to, so it can prioritize that kind of result in future requests.
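To make that loop concrete, here is a hypothetical sketch of how a single reaction, a download or a heart, might be stored as a preference record: the kind of raw material that feedback-driven tuning methods are built from. The field names and file format are assumptions for illustration, not any company's actual pipeline.

```python
# Hypothetical sketch: turning user reactions into logged preference data.
# Illustrative only; the record structure here is invented.
import json
import time

def log_feedback(prompt: str, image_id: str, reaction: str, path: str = "feedback.jsonl") -> None:
    """Append one feedback event (e.g. 'download', 'heart', 'dismiss') to a log file."""
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "image_id": image_id,
        "reaction": reaction,  # positive reactions mark this output as preferred
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: the user saves one of the four generated images.
log_feedback("a relaxing day at the beach", image_id="img_003", reaction="download")
```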
The data associated with this feedback is immensely valuable at the start of any new tech enterprise. Lyft and Uber were largely subsidized by collecting data about all of the rides their drivers were taking. That data was useful to sell — for training autonomous vehicles, improving GPS maps and suggestions, and more. AI is no different. It is a very expensive enterprise, and many companies are going into debt while the services they sell are priced artificially below cost. But the theory of these companies is that by collecting as much feedback as possible, they can shape their models faster, and then, inevitably, raise the price to a sustainable level.
What are we responding to? In many ways, AI is like a video game. We suspend disbelief, enter into a magic circle where we can lose our real-world self for a bit, and engage and interact with an imaginary story. Something happens on screen, and we respond to the action as if we were Pac-Man trapped in a maze with ghosts. But the meaning we make is something that happens in our heads, through the suggestion of these screens and images. The meanings of things are, one way or another, in “us” — not in the things.
This is how advertising works. It creates a space, you look at the space, you see the object in the space, and in your imagination you form that association, so that when you think I want to relax, those associations and connections fire up in your brain. Part of that wonderful relaxing moment is to reach for a Coke; that's going to really complete the sensation of relaxation. It's part of the vision of a relaxing day at the beach, because the ad has painted that picture. It has made you imagine it, made you think about that scene and place yourself in it, and that imagination has become part of your inner world.
Your inner model of what a relaxing day at the beach looks like is not built this directly. The process is almost accidental, sneaky in a way — but that's the theory behind why we pour billions of dollars into ads, and why our entire media infrastructure is built around creating scenes your imagination can enter, and then putting a product there for you to find. When you reach for that product, you reach for that scene; you reach for that imaginary.
Those worlds of advertising are constructed for a specific purpose: to put that bottle there. So it's interesting to think about the relationship that emerges when you are not passively watching ads — when you are not simply sitting on your couch, watching television or a live stream on your laptop.
The AI Imaginary
Are AI generation tools different? Don’t we have a sense that we are shaping their content? Don't we feel like it's interactive, the way a video game might be? You're pushing a button and something is happening, and you're seeing the thing on the screen change, now you have some images to look at and play around with, and now you're pushing other buttons to make other things happen.
Maybe this is how image generation feels for you. Maybe GPT feels like a conversation because you're imagining something talking to you: on the other side of that screen, you're engaging with something that is generating text. It's telling you stories, it's giving you information, and you're imagining something on the other side of that screen providing it to you. That's a powerful illusion. That feeling, in and of itself, is marketing for the AI companies. These are advertisements for themselves. When you imagine yourself creating in a space with an algorithmic output, you're part of a design that has been created to help you sustain this imagination, the same way a video game might sustain your imagination within its world to tell a story.
Now imagine that everything you create has a bottle of Coca-Cola — that's a deliberate manipulation of this imaginary relationship. There is product placement in video games, too. People pay for you to see things in your video game world. In 2014 Mercedes paid to have its cars appear as DLC in Mario Kart 8. That's an interactive world where your imagination is even more engrossed, because you are a participant in the game. You're negotiating that environmental space in your imagination but also with your body. You're in a world where you are a character; you're acting as if you're in that world, and your decisions actually change that world.
Your imagination is really active. This is not a problem. Imagination is not a problem. But when we are lost in an imagination that has been co-created with a machine, one built by companies that are putting things in there for you to buy or encouraging you to pay for their services, it's really important to be aware of that.
It's not a problem if you want to go buy a Coca-Cola on the beach — I will drink a root beer, because that's what I like. That's fine! But what are the associations that are possible when we look at a generative space like an AI image-generation tool? This is an important question. Another important question is: what are the things being given to us, what are the ways of shaping our imagination that exist in these tools even when advertising isn't there?
Right now when we look at these images, we are still in a creative space. We lose ourselves in the activity of making these images, or imagining ourselves in the pictures it makes. We imagine ourselves in the process of making. We're getting images that are not fully in our control, that are shaped by bigger, complex systems.
These are systems of machine learning — taking data, processing that data, assigning weights and biases. And this is cultural data. This is social data. As we've seen, the pictures of people and the ways people have been described are in that data, so they are being swept up into this imaginary too.
And there are scenes and depictions of things that are presented as normal — when we ask what a doctor looks like and we get four white-haired men in white lab coats, what does that reinforce or suggest about the world? We've mentioned this before: many of these systems now intervene by changing your prompt on the back end, adding random words describing people's genders, nationalities, and races in order to diversify the output. We've talked about how, without those interventions, the biases come through from the data — we've seen this in some of the examples where we looked at the relationships between datasets and the images they make.
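We don't know exactly how any given service implements these interventions, but a bare-bones sketch of the idea, rewriting the prompt behind the scenes before it reaches the image model, might look something like this (the descriptor lists, trigger words, and function names are all hypothetical):

```python
# Hypothetical sketch of a backend prompt intervention.
# Real systems are undisclosed; the descriptors and logic here are invented.
import random

DESCRIPTORS = ["South Asian", "Black", "East Asian", "Latina", "white", "Middle Eastern"]
GENDERS = ["woman", "man", "nonbinary person"]

def rewrite_prompt(user_prompt: str) -> str:
    """If the prompt mentions a person generically, append randomized descriptors."""
    generic_terms = ["doctor", "nurse", "teacher", "person", "CEO"]
    if any(term in user_prompt.lower() for term in generic_terms):
        descriptor = f"{random.choice(DESCRIPTORS)} {random.choice(GENDERS)}"
        return f"{user_prompt}, depicted as a {descriptor}"
    return user_prompt

print(rewrite_prompt("a portrait of a doctor"))
# e.g. -> "a portrait of a doctor, depicted as a Latina woman"
```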
So yes, that bias comes into those pictures that we’re making and they can influence the way we imagine the world or what we assume is normal and expected about the world. Some of these stories can be subtle, but some of them can be kind of complicated.
A Smile
In a brilliant article about smiles and culture, the writer Jenka Gurfinkel focuses on a viral Reddit post containing a slideshow of 18 images generated by Midjourney. The prompts described various cultural and historical groups as if they were taking selfies. The result is a series of AI-imagined pictures of smiling people from various cultures and points in history.
The author of the post (who is not the creator of the images, just to be clear) notes that many of these smiles are wholly inappropriate to the cultures and times they seem to represent.
“How we smile, when we smile, why we smile, and what it means is deeply culturally contextual.” — Jenka Gurfinkel
She looks to the difference between the meaning and expression of a smile in Russian culture versus American culture, citing an essay by Camille Baker:
Americans expect public figures to smile at them as a means of emphasizing social order and calm. Russians, on the other hand, find it appropriate for public officials to maintain a solemn expression in public, as their behavior is expected to mirror the serious nature of their work. A toothy “dominance smile” from an important American public figure inspires feelings of confidence and promise in Americans. Russians expect, instead, a stern look from their leaders meant to demonstrate “serious intentions, validity, and reliability.”
Now we can look at this and identify a kind of North American bias in the smiles, as opposed to a Russian cultural bias, and this makes sense: the training data for these systems, like much of the world’s online data, is generated largely in Western Europe and North America. So the data will probably be biased toward the cultural practices of those places.
But the author makes another really interesting point: the smiles in the training data are not people’s actual smiles. They are photographs of people smiling, which suggests that these smiles are forced, or at least somewhat inauthentic.
The inauthentic smile may seem natural to us if we are American. It’s a cultural practice, and it’s tied to American expectations of behavior: smiling doesn’t just signal happiness, it signals that you are not upset. If we aren’t smiling, it can be sort of suspect in the US. The smile is the expected default. And that’s tied to specific cultural changes from a particular time and place. Gurfinkel links it to rationalism and the incentive for individuals to make their own happiness.
What a smile means, and why and when we do it, are highly variable. So are the words we use to describe them, which means that certain expectations about users are embedded into generative systems when we write prompts. Gurfinkel cites another passage, this one from Lisa Feldman Barrett:
Different languages describe diverse human experience in different ways — emotions and other mental events, colors, body parts, direction, time, spatial relations, and causality. The diversity from language to language is astonishing…. Not all cultures understand emotions as mental states. The Ifaluk of Micronesia consider emotions transactions between people. To them, anger is not a feeling of rage, a scowl, a pounding fist, or a loud yelling voice, all within the skin of one person, but a situation in which two people are engaged in a script — a dance, if you will — around a common goal. In the Ifaluk view, anger does not “live” inside either participant.
— Barrett, Lisa Feldman. How Emotions Are Made: The Secret Life of the Brain. Houghton Mifflin, 2017.
Smiles seem like a small thing, but this is one specific example of a North American bias that is part of a dataset and the set of images produced from that dataset. Together, that process creates a series of images that stereotype — which, as we have discussed many times in this class, is always happening when we generate images of humans based on these categories. But it also normalizes and enforces this cultural artifact of a smile onto other people. And this is especially egregious when you consider the history of genocide against Native Americans. Creating a highly stereotyped composite of these images from the internet, compressing them into a single picture, and then forcing the people in it to smile is a wholly inappropriate exercise.
People from around the world and from different time periods appear in these pictures, but not as they actually were — that’s obvious. They are being created according to a specific, North American understanding and expectation of culture. It gets into our heads and reinforces the idea that the way North Americans see the world is the way the world is. It’s not just these images; it’s what these images suggest about the entire landscape of images that these tools create. They reinforce one point of view among many, and they tell us that this is the default, the expectation. That means the images these systems produce are most likely going to universalize American culture rather than diversify it.
One of the most powerful things we can learn in this world of AI is media literacy. How can we understand what an image is telling us — where that information is coming from, and who is shaping that information? Is it a company? Is it a set of cultural expectations and biases? Is it the norm of how your family did things? Is it the norms of your city, your country, a particular political ideology? Is it reflective of an attempt to sell you something? Is it reinforcing myths about the world, and about technology?
For example, is it reinforcing that technology can be creative in and of itself? Is it telling you that this artificial intelligence system has some sort of sentience or personality?
For example, ChatGPT refers to itself as an “I”.
Tell me about yourself?
As an AI language model, I don't have a physical body or personal experiences, but I'm designed to understand and respond to human language in a natural and engaging way. I have been trained on a large corpus of text from the internet, books, and other sources, which allows me to understand and generate text on a wide range of topics.
My primary function is to assist users in generating human-like responses to their queries or inputs. I can be used for a variety of tasks, such as answering questions, generating text, completing sentences, translating languages, and more.
I am constantly learning and improving through interactions with users and the data I'm trained on. My aim is to provide accurate, informative, and helpful responses to your inquiries.
Why do you call yourself an "I" if you don't have a body or personal experiences?
As an AI language model, I use the pronoun "I" to refer to myself, because it is a convention used in human language to denote the speaker or writer. While I don't have personal experiences or consciousness, I am designed to simulate natural human conversation and engage with users as if I were a person. Using "I" helps me communicate more effectively and makes it easier for users to interact with me in a more natural way.
This “I” is doing two things: first, it’s offering users a reference point to make sense of how to navigate the system. That’s pretty common with new technologies: you need to move the user from uncertainty about an interface toward understanding the interface. And you do that by referencing commonly understood ideas about the world. In this case, OpenAI is referencing personhood. At the same time, this choice of referring to itself as an “I” is reinforcing a frequently misunderstood idea of what artificial intelligence is. It suggests that ChatGPT has a sense of self, which is not true.
But personification makes it very easy to understand how to use it. Interfaces obscure realities about how systems work in order to make them easier to use. But at the same time, it’s tapping into our existing understanding of things and how they work, and kind of perpetuating these simple models of the world in ways that can sometimes be more confusing in the broader picture.
In this example, it’s easier for us to use ChatGPT in a chat window, and to have a system refer to itself and for us to say things like “It’s thinking.” These are shortcuts that help us learn how to use the system. But the system is more complex beneath the hood, and over time, if we accept these simplifications as reality, we lose track of what’s actually going on. And so today, you have people who think these machines really are sentient, human-created creatures, or you have people believing that self-aware machines are going to happen in a year or two, when realistically we are nowhere near it.
This is a design choice, and my key point is not whether it’s a good or a bad decision. My point is that it is a decision. It’s a decision that someone made to shape your imagination, to make you relate to this screen interaction in a very specific way: you will talk to it as if it's a person. That's what they want — a personal assistant, right? And this isn’t some secret; it's all right here in this window, in this conversation I just had with ChatGPT.
Media literacy is about understanding the decisions that go into these things that you watch, the interfaces you use, and the games you play — any designed thing that you interact with.
For example, we talked before about how the buttons and commands for Midjourney and Stable Diffusion are not labeled “generate pixel data into an assemblage resembling a prompt,” but “Imagine” or “Dream.”
When you label the button that way, or tell people to enter a command to “imagine,” you have something else going on. This is designed to make the tool accessible to new users. It's designed to give you a sense of orientation between a system you don't understand and something that you do understand. But these labels also, intentionally or not, shape the relationship you have with what the system is doing. They give you a guiding framework, a metaphor, to make sense of it.
It reinforces this idea that the data being processed behind the screen has some kind of resemblance to a human mind, and this is simply not true. It's a simplification, it's a design, it's a decision.
Those words suggest something about the system that doesn't really describe what it is doing or how it works. Even the phrase artificial intelligence is loaded with shorthands that uncomplicate complicated systems at the expense of understanding how they actually work. They create a lack of nuance, and while they offer us ways of quickly understanding how to use these systems, they also create frames that we internalize.
In a review of art made with these tools that ran in the Brooklyn Rail, Charlotte Kent writes:
Emily Tucker, the Executive Director at the Center on Privacy & Technology at Georgetown Law, announced in Spring 2022 that the center would no longer use the terms AI, artificial intelligence, or machine learning because those terms “obfuscate, alienate, and glamorize.” She urges users to identify the operating system of their practice, which is why artists often distinguish what softwares they use, and why. Their specificity has a politics. The term Artificial Intelligence was coined at MIT but, according to Herbert A. Simon in The Sciences of the Artificial (1969), researchers at Carnegie Mellon preferred “complex information processing.” How much clearer that term is. How much less frightening is the notion of complex information processors. Tucker continues:
That we are ignorant about, and deferential to, the technologies that increasingly comprise our whole social and political interface is not an accident.
Whenever we are using someone else's words to explain what's happening on a screen, or in a computer program, we should ask questions about where those words came from and why we choose to use them. Does the language we are asked to use shape our thinking about whether or not the systems think for themselves, whether or not the systems are objective? We know they don’t, we know they aren’t.
These frames are a kind of shorthand that sets up a metaphor — and we can easily start thinking in those metaphors. When we do, things that exist in reality but don’t fit the metaphor can become harder to visualize or integrate into our thinking. But these metaphors are designed to simplify. The imaginary stories we tell in ads, in video games, and in interfaces do not tell you anything about the system. Why would they? We want immersive experiences and stories; we don’t want to look at wireframes and math. The artist duo JODI actually did this — stripped all the imaginaries out of DOOM — and you can see what the game looks like stripped bare. It’s why we have imaginaries in the first place.
This is not an argument against simplicity, or against making things easier to understand. There are all kinds of reasons to design things with such shorthands. But we don’t have to agree to them. We can negotiate with them. If AI is designed around a particular mythology, at the expense of other metaphors and stories, we ought to push back in favor of the ones that resonate with us: the stories that help us understand it.
So it’s helpful to think about that: what metaphors are in play here, and why were they chosen? In the future, advertisements may call on us to make them ourselves, by generating images, videos, code, or games with brands, icons, and ideologies embedded in them. It might be obvious and it might be subtle, but either way, it’s worth looking for, so we can make our own decisions about the relationships we want to have with it.
Looking for Something Else to Do?
Read: Aarathi Krishnan, Angie Abdilla, A Jung Moon, Carlos Affonso Souza, Chelle Adamson, Eileen M. Lach, Farah Ghazal, Jessica Fjeld, Jennyfer Taylor, John C. Havens, Malavika Jayaram, Monique Morrow, Nagla Rizk, Paola Ricaurte Quijano, R. Buse Çetin, Raja Chatila, Ravit Dotan, Sabelo Mhlambi, Sara Jordan & Sarita Rosenstock (2022) AI Decolonial Manyfesto. (Link)
Read: Anja Kaspersen & Wendell Wallach (2023): Now is the Moment for a Systemic Reset of AI and Technology Governance. Carnegie Council. (Link)
Read: Dan McQuillan (2023) We Come to Bury ChatGPT, Not to Praise It. (Website).
Watch: Ali Alkhatib, To Live in Their Utopia: Why Algorithmic Systems Create Absurd Outcomes. (5 minutes)
Works Referenced
“Coca-Cola Invites Digital Artists to ‘Create Real Magic’ Using New AI Platform.” The Coca-Cola Company, 20 Mar. 2023, https://www.coca-colacompany.com/news/coca-cola-invites-digital-artists-to-create-real-magic-using-new-ai-platform.
Shaw, Adrienne. “Encoding and Decoding Affordances: Stuart Hall and Interactive Media Technologies.” Media, Culture, and Society, vol. 39, no. 4, 2017, pp. 592–602, doi:10.1177/0163443717692741.
Gurfinkel, Jenka. “AI and the American Smile.” Medium, 26 Mar. 2023, https://medium.com/@socialcreature/ai-and-the-american-smile-76d23a0fbfaf.
Baker, Camille. “What a Russian Smile Means.” Nautilus, June 2018, https://nautil.us/what-a-russian-smile-means-237120/.
Barrett, Lisa Feldman. How Emotions Are Made: The Secret Life of the Brain. Houghton Mifflin, 2017.
Kent, Charlotte. “Art’s Intelligence: AI and Human Systems.” The Brooklyn Rail, 1 Apr. 2023, https://brooklynrail.org/2023/04/art-technology/Arts-Intelligence-AI-and-Human-Systems.
Tucker, Emily. “Artifice and Intelligence.” Tech Policy Press, 17 Mar. 2022, https://techpolicy.press/artifice-and-intelligence/.
Williamson, Judith. Decoding Advertisements. Marion Boyars, 2014.