Critical Topics: AI Images, Class THREE:

From Expert Systems to The World Wide Web

Lecture by Eryk Salvaggio

In our previous session together we talked about the early history of AI, up to about the 1980s. We’ll pick up a bit of that history in this talk and think through the factors that got us to where we are today. An essential part of that story is the World Wide Web, and the massive amounts of data that soon became available for building AI systems. But before that, we’ll look at AI in the 1980s.

AI in the 1980s was less exciting to the art world. Development of AI was situated in industry, and dedicated primarily to building what were called “expert systems.” At the time, artificial intelligence work was all about taking the things people did for work and breaking them down into a series of steps that a computer could follow.

Beverly and William Thompson, from Byte Magazine Volume 10 (1985). Source.

Above is one example, from Beverly and William Thompson in 1985, of a “cardboard inference engine”: a series of notecards that you might create from a conversation with a botanist. Once you had these notecards, you could code their if/then rules into a language called PASCAL.
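To make that concrete, here is a minimal sketch of what those notecards might look like once translated into code. The Thompsons worked in PASCAL; this sketch uses Python instead, and the plant features and rule cards are invented for illustration rather than taken from the Byte article.

```python
# A "cardboard inference engine" as code: each rule card lists conditions
# and a conclusion, and the program walks the cards in order.
# The rules and plant names below are invented for illustration.

RULES = [
    # (observations that must all be true, conclusion)
    ({"stem": "woody", "leaves": "needles"}, "conifer"),
    ({"stem": "woody", "leaves": "broad"}, "broadleaf tree"),
    ({"stem": "soft", "flowers": "yes"}, "flowering herb"),
]

def identify(observations: dict) -> str:
    """Return the first conclusion whose conditions all match the observations."""
    for conditions, conclusion in RULES:
        if all(observations.get(key) == value for key, value in conditions.items()):
            return conclusion
    return "unknown -- ask the botanist for another rule card"

print(identify({"stem": "woody", "leaves": "needles"}))  # -> conifer
```

The point is not the specific rules but the shape of the program: every piece of the botanist’s expertise has to be flattened into a condition the computer can check.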

It was in the 1990s that an anthropologist named Diana Forsythe went into these computer companies and universities to study the people doing this work, which at the time was what counted as artificial intelligence. They were called “knowledge engineers.” Her paper on this was called “Engineering Knowledge: The Construction of Knowledge in Artificial Intelligence,” and it’s still relevant for understanding the culture of AI today, and for seeing how culture works differently from the way technologists often understand it.

Forsythe went to these labs and observed people. She asked questions, and reported on what she saw. At first, she was struck by the similarity between what these knowledge engineers did and what anthropologists such as herself did. Anthropologists go into places they aren’t familiar with, observe, take notes, and write about what they see.

So did these knowledge engineers. They’d go into offices or factories, look at the decisions workers were making, and take notes on what they saw. Anthropologists write books; knowledge engineers write software. The difference, Forsythe wrote, was that the anthropologist is chiefly concerned with culture, while the engineers were chiefly concerned with actions.

“The values and assumptions shared within a group constitute part of what anthropologists call “culture.” Culture defines what we take for granted, including explicit, formal truths of the sort embodied in scientific paradigms; the tacit values and assumptions that underlie formal theory, and the common-sense truths that ‘everybody knows’ within a given setting (or type of setting). Our cultural background influences the way in which we make sense of particular situations, as well as the actions perceived as possible and meaningful in any given situation.” (Forsythe, 1993, 448). 

Forsythe observed the ways this culture emerged through how these engineers approached their work. If they were trying to write a computer program based on watching what professionals did, they would observe these professionals first. 

These AI researchers had the idea that they could model human expertise by studying it and breaking it down into steps, or procedures. Once you had these steps, you could write a program that would follow them as a set of coded instructions. Then the program could apply those steps, and you’d automate the worker’s actions through a series of commands. This is a way of thinking about problems that is specifically focused on translating the actions of a human into something useful to one piece of technology: the computer. But to translate those actions, people had to explain what they were doing. They had to be able to describe their thinking about what to do in ways that could be turned into steps. And it turns out that such intuitive practices — cultural practices — were challenging to explain in that way.

Now, there are lots of ways to define culture, but for our purposes today, we’re going to think of culture as systems of knowledge shared by a relatively large group of people (source). Such knowledge isn’t always circulated as detailed, step-by-step instructions. A famous example is that few people are ever told to turn around and face the door when they get into an elevator. You just kind of pick up on this by watching other people. But walking into an elevator and staring straight ahead without turning around is, in some ways, a perfectly sensible thing to do.

So it’s telling that these knowledge engineers rarely asked why the rules they observed were followed, and weren’t looking at how things might be done differently. They simply wanted to move something they saw into a language that machines could understand and follow. In a sense, they were aiming to remove culture completely from what they saw, because machines don’t understand context and culture. They understand steps and instructions. So these knowledge engineers — and the culture of artificial intelligence they created — didn’t ask why people turned around to face the elevator doors. They only wrote down that people did.

So AI development was kind of frustrated by the cultural aspects of knowledge and learning, because culture couldn’t really be broken down into the procedures a computer needs in order to follow instructions. And that, Forsythe wrote in her paper, was its own kind of culture. We might imagine that in this culture, creativity was a frustration: if an expert tackled a problem in a different way every day, that posed a real challenge to translating it into a series of steps and commands. For this reason, and because of technical constraints, AI in the 1980s focused on work that tended to be repetitive, or that followed sequential steps to reach a goal. This is AI as a series of if/then statements: if this condition is met, do this; if not, move to the next question. The idea of asking an AI to make something really new, like art or a painting, was just too far off the radar. It’s challenging to describe the process of creativity as a series of steps.

Here’s an expert system flowchart for a medical diagnosis. Would you be able to follow these to become a doctor? Would you trust a machine that followed these steps to recommend a medication?

Flowchart of a medical diagnosis, 1985 (source)

Part Two: The World Wide Web to Deep Dream

Soon, though, these worlds of engineering and culture would fuse together in the development of the World Wide Web.

In the video we just saw, you hear an often-repeated description of the World Wide Web as “vast amounts of information at our fingertips.” When the Web emerged, there were a lot of utopian dreams about what was going to be possible and how to prepare the world for that transformation. Think about all the things that happened to make the Web possible. Computers were more common and affordable, and fit on a desk instead of filling a room. You could connect these small machines to other small machines and create new, extremely powerful networks of machines. All these connections allowed people to communicate and make decisions in real time in a way that would have been very hard to do before.

And in 1996, for the first time in human history, storing information on a computer became cheaper than storing it on paper. That cost keeps going down. So now more people are digitizing more information. They’re looking for ways to store it, and to make it accessible.

In Silicon Valley, there was also a movement afoot that saw the World Wide Web as reshuffling existing power structures. Distribute real-time decision making, they believed, and people would be free to coordinate themselves. It was going to change politics and media.

We’re going to watch a clip that explains some of this, particularly the question of social coordination and emergence. It introduces an early experiment at a conference where attendees were given colored paddles that an image sensor could track. Watch what happens.

So I want to point out something here, which is that everyone is acting of their own free will, and it’s coming together in this collective play of the video game. People described the World Wide Web as an emergent, global super-organism. Suddenly, this idea about neural nets becomes relevant again. Not because we are building neural nets into computers, but because the World Wide Web is seen as a way to harness this collective intelligence, where each person in the system is making decisions and reacting to information, often in real time. In other words, rather than building a neural network inside a machine, the Web was understood as using human beings to process information, for the sake of delivering more information to more people, who would process and respond to all of that information as a coordinated super-consciousness.

Instead of building a device that mimics the way humans think, now humans are the intelligence of the machine, this global machine called the Internet. Soon you saw a rise in companies trying to get people to create content: to tell stories about themselves, to post photos, to share news and observations.

But all this information flow needed to be structured. The Web wasn’t being built in university labs anymore, which is where AI efforts were mostly located. Instead, the Web was in the world, and to develop new ideas, you needed to find a way to make a profit off of anything you built.

There’s a lot of power and responsibility there. So in one sense, power starts moving away from things like traditional media outlets (remember that there were just a handful of TV channels in people’s homes) and toward people who could make and share content, and the companies that would build the infrastructure. And those companies could make money off of that infrastructure! There was a fight over who controlled the Web: it’s hard to believe now, but companies engaged in all kinds of illegal behavior in order to have the most-used web browser.

Over a few years, power starts getting concentrated into the hands of a few big companies: Google being one of the first big ones, and Microsoft too. By 2005 you have Facebook on the scene. A lot of these companies are driven by the idea that they can give users space to create content, and in exchange, they will collect data and use that data for advertising and marketing purposes. We start to hear this phrase, which had been around since the 1980s: Big Data.

These companies saw massive amounts of information moving through their networks. They could use that data for marketing, but also to power up their networks: to make decisions about what to show to whom, and to understand business-side questions, like where and when to increase server space, how to streamline their site infrastructure, and what people were actually interacting with and using.

By 2010, Eric Schmidt, then CEO of Google, claimed that humanity was producing as much information every two days as the human species had created up to 2003. Thirty years of video content are uploaded to YouTube every day — a baby born right now could live to be 90 years old watching nothing at all aside from the YouTube content uploaded this weekend.

So that is a lot of data, and there is a lot of useful information in that data, but no human worker is going to spend 90 years of their life categorizing it all. So, increasingly, we have turned to machines to do that for us. But we’ve also, consistently, been relying on people. The more data we have, the easier it is for us to build machines to learn from that data, find patterns in that data, and tell us something about that data. Today, that’s increasingly what drives the types of things we define as artificial intelligence.

Take a look at how these companies define artificial intelligence today. 

  • IBM: “artificial intelligence is a field, which combines computer science and robust datasets, to enable problem-solving.”

  • Google: “Artificial intelligence is a field of science concerned with building computers and machines that can reason, learn, and act in such a way that would normally require human intelligence or that involves data whose scale exceeds what humans can analyze.”

  • Meta: “AI is an evolving field of computer science that's already part of daily life for billions of people around the world. It can take single tasks that are tedious and time-consuming for humans and then scale them up to benefit more people in more places.”

What do we make of this? For me, each of these companies is defining artificial intelligence with regard to its own business strengths, all centered around the collection of data, analysis, and automation.

As companies gathered more and more data, sorting and learning from this grab bag of information became the predominant challenge. The focus of AI starts to shift from business automation to automated data analysis. Think about it: you have enough data coming through your servers that it would take a human being 30 years to look at it all, and much longer to find patterns or make sense of it. So engineers use the data they have, and they start looking at small samples of it to find patterns. They can then encode these patterns into sets of rules for labeling and categorizing the information that comes in.

That initial step is done by humans. There is a whole industry of what are called clickworkers, sometimes described as ghost work, where people are paid pennies for small tasks, such as looking at an image and identifying whether or not it shows a puppy. Because it’s so low-paying, this work tends to be shipped off to developing countries. It is also done by you and me. When you upload a picture of a puppy to Instagram or Facebook, you might tag it as a dog, and then the image becomes part of the data that Meta can use to identify puppies. And if you have ever solved a CAPTCHA that asks you to identify, say, three pictures of bicycles in a grid to confirm you are human, you have given a company verification information about what’s in those pictures.

Here’s a short interview with Mary Gray, who wrote the book, Ghost Work. We will talk more about Ghost Work and invisible labor in AI in later classes.

So people are generating this data and the companies are having people label and classify that data. Once you have data sorted into categories, you have a sense of the kind of information that is in, say, a picture of a puppy. This isn’t just for images, of course: you can also use this to figure things out like, is this person going to buy a toothbrush? Is this person going to respond to a specific kind of advertisement? 

Once you have enough data, and enough categories labeled with some degree of confidence, machine learning comes into play. We’ll go deeper into how this works in a later class, but for now, we can think of machine learning as a tool that automates data analysis. It looks at 10,000 pictures labeled “puppy” or “cat,” identifies which patterns tend to be present in the pixels of each, and then, when those patterns show up in a new image, it says: OK, that’s a puppy. This is where generative AI comes in: if you can identify a puppy based on the pixels present in an image, it’s a short step to writing those pixels yourself and making a new picture of a puppy.
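Here is a minimal sketch of that first half (learning pixel patterns from labeled examples), written in Python with scikit-learn. It uses the library’s small built-in handwritten-digit images as a stand-in for the puppy and cat photos; the principle is the same: many labeled examples go in, and the model finds the pixel patterns that go with each label.

```python
# A minimal sketch of "machine learning as automated data analysis":
# the model looks at many labeled images, finds pixel patterns that tend
# to go with each label, and then predicts labels for images it hasn't seen.
# scikit-learn's built-in handwritten digits stand in for the labeled photos.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                      # 8x8 grayscale images plus labels
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=2000)   # learns a weight per pixel, per class
model.fit(X_train, y_train)                 # "find the patterns in the pixels"

print("accuracy on unseen images:", model.score(X_test, y_test))
print("prediction for one new image:", model.predict(X_test[:1])[0])
```

Nothing in the code says what a “3” or a “7” looks like; that knowledge comes entirely from the labels people attached to the examples.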

Much of this work was originally done for the sake of image recognition. Teaching an AI system to see the world was helpful for all kinds of things: a car that drives on its own, for example, would need to recognize when something came into the road so it could stop. A vacuum cleaner would need to learn not to bump into things. Scientists had massive datasets of images to help identify species of plants and animals. There were also lots of images of handwriting, which could be used to learn how to move information from a pen into digital text.

And that’s where the first real breakthrough with generating images happens, in 2006. It’s not much to look at, but it’s an early example of a new idea: if you have a system with enough data to recognize shapes and predict what they mean, then you also have enough data to generate shapes. Yann LeCun, an AI researcher, had created image recognition software for handwriting in the 1990s. But here you see numbers being generated by a machine learning model that has identified patterns across lots and lots of samples of people writing down numbers.

From Hinton, G. E., Osindero, S. and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation 18, pp 1527-1554. (Source)

There’s some interest in this technology because it means that you can generate new data that already fits into labels. Instead of asking people to look at footage where there’s a rock in the road and label it for a car, you could theoretically generate that image, knowing that there would be a rock in the road. You could then create an infinite landscape for a car to travel through and respond to, accelerating the training of that car’s system on things like stop signs or boulders: you essentially plug the car’s computer system into a virtual reality and run it far faster than a real car could ever move through the real world.

Google Image Classification Demo, 2013

Let’s jump ahead to 2013. Google had a massive image database, and it needed to label and identify those images for search. In that year, the company finally arrived at pretty reliable image recognition, giving users the ability to upload an image and have a computer tag it, instead of a person. Here you see the example that Google used to talk about this breakthrough. It shows us that the system has identified a puppy, and identified a hat. Basically, the machine has seen enough hats and puppies by looking at massive datasets of labeled images. Every hat and puppy is read as pixel information. When certain arrangements of pixels are present, there’s a match: the computer can declare it is a hat.
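You can get a feel for this kind of tagging today with an off-the-shelf, pretrained network. This is not Google’s 2013 system, just a small sketch of the same idea using torchvision; “puppy.jpg” is a placeholder filename for whatever photo you want to tag.

```python
# A sketch of "upload an image, have a computer tag it," using a pretrained
# torchvision model rather than Google's internal 2013 system.
# "puppy.jpg" is a placeholder filename; swap in any photo you have.

import torch
from PIL import Image
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT          # trained on ImageNet's labeled photos
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()           # resize, crop, normalize pixel values

img = Image.open("puppy.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)        # shape: (1, 3, 224, 224)

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]  # confidence for each of 1,000 labels

top = probs.topk(3)
for p, idx in zip(top.values, top.indices):
    print(f"{weights.meta['categories'][int(idx)]}: {float(p):.2f}")
```

The printed tags are only ever drawn from the 1,000 labels in the training data, which is the point the lecture keeps returning to: the system can only “see” what its dataset already contains.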

The classifier has to look for patterns in the image based on what it’s already seen. But it wasn’t always clear how the system was identifying these things, which could lead to problems. For example, if you have a bunch of pictures of arctic wolves, and a bunch of pictures of dogs, how do you know if the system is recognizing the dog, or the snow? Or, as a personal anecdote, I was in front of an image classifier in Australia in 2020, wearing a mask, and it identified me as a Kookaburra, because it interpreted the mask as a large beak.

In 2015, Google engineers were looking for a way to see what the machine was seeing, so they could have better insight into how their pattern recognition models matched pixels to words. Eventually, they figured out something weird: they could design the system to amplify the patterns it detected in an image and present the result as a new image. This ends up being very surreal.

They call it DeepDream. You load an image of noise into the system, and then you ask it to detect patterns in that noise based on what it has encountered in the training data. For example, below is an image of noise paired with a training dataset of banana photographs.

Here you have an image of noise. And to the right, you have the images that a model trained to identify fruit sees as potentially a banana. Think of “trained with priors” as “given a particular set of starting assumptions about what to look for”; “priors,” in data science, basically just means your starting assumptions. So what’s happened is that we can now see how the image classifier is making sense of these pixels. These are predictions, based on clusters of pixels. But as you can see, the image on the right doesn’t have any bananas in it. So what was initially designed as a way of finding and identifying things becomes a way to produce images of those things.
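Here is a rough sketch of the mechanism behind that banana image: start from random pixels and repeatedly nudge them in whatever direction makes a pretrained classifier’s “banana” score go up. This is not Google’s original code; it uses an off-the-shelf torchvision model, and it leaves out the natural-image priors and preprocessing that made Google’s versions look less like static.

```python
# Sketch: generate a "banana" from noise by gradient ascent on a classifier's
# banana score. Off-the-shelf model, no smoothness priors, so the result is crude.

import torch
from torchvision.models import googlenet, GoogLeNet_Weights

weights = GoogLeNet_Weights.DEFAULT
model = googlenet(weights=weights).eval()
banana = weights.meta["categories"].index("banana")    # label learned from data

img = torch.rand(1, 3, 224, 224, requires_grad=True)   # pure noise to start

for step in range(200):
    score = model(img)[0, banana]   # how "banana-like" does the model find this?
    model.zero_grad()
    score.backward()
    with torch.no_grad():
        img += 0.05 * img.grad / (img.grad.abs().mean() + 1e-8)  # climb the gradient
        img.clamp_(0, 1)            # keep pixels in a displayable range
        img.grad.zero_()

# `img` now holds pixels the classifier reads as strongly banana-like,
# even though no banana photograph was ever loaded.
```

Even this crude version makes the point: the “banana” that emerges is entirely a product of what the classifier learned from its training data.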

They see some of the strange associations in the model, too. To the right is output from a model trained on images of dumbbells. But it turns out that the training set has clumped together arms holding dumbbells with the dumbbells themselves.

Almost as a goof, the engineers decide: what if we give the system something that isn’t noise? And they see that the system will look at that image through whatever training data they choose, and show them a kind of hybrid between the two states. So on the left you have a picture of a waterfall, and on the right you have a picture of what the model thinks it sees in that picture. DeepDream creates a lot of pictures of things that it has already seen and labeled, based on the training data that Google had collected up to that point. 

DeepDream is an image recognition system, but in a sense, it’s one that has been interrupted. Image recognition scans for patterns in an image based on how they overlap with the arrangements of pixels in the training data. Usually, the system would only report what it found once it reached a high degree of certainty. But with these images, we aren’t asking for certainty. We’re essentially interrupting the system in the midst of that scan, and asking to see what patterns it is finding in its various intermediate states. So if you jostle the image even a bit — zoom in on it, for example — you create a cascade of puppy-slugs.
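If you want to see that “interrupted recognition” in code, here is a rough sketch under the same assumptions as before: an off-the-shelf torchvision network stands in for Google’s model, “waterfall.jpg” is a placeholder photo, and the choice of layer (inception4c) and step sizes are arbitrary. The loop runs recognition partway, then pushes the photo to exaggerate whatever that layer is responding to.

```python
# Sketch of the DeepDream move: interrupt the network at an intermediate layer
# and amplify whatever patterns it is already half-seeing in a real photo.

import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import googlenet, GoogLeNet_Weights

model = googlenet(weights=GoogLeNet_Weights.DEFAULT).eval()

activations = {}
def grab(_module, _inp, out):            # forward hook: save the layer's output
    activations["target"] = out
model.inception4c.register_forward_hook(grab)

to_tensor = transforms.Compose([transforms.Resize(512), transforms.ToTensor()])
img = to_tensor(Image.open("waterfall.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

for step in range(50):
    model(img)                                   # run recognition...
    loss = activations["target"].pow(2).mean()   # ...stop partway, measure what it "sees"
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        img += 0.02 * img.grad / (img.grad.abs().mean() + 1e-8)  # exaggerate it
        img.clamp_(0, 1)
        img.grad.zero_()

# Feeding the result back in and repeating, or zooming slightly each pass,
# is what produces the cascading eyes, snouts, and scales in the videos.
```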

Here’s a video from artist Memo Akten from 2015. You can see exactly how it works.

The engineers also start taking the images that DeepDream produced and feeding them back into the same model, which sharpens and strengthens the patterns. It starts creating these really intense fractals, but if you look really closely, you see something a little... disturbing, which is that it is all lizard skin, animal eyes, and snouts. And this is basically what DeepDream does. Some artists get ambitious and give us short films rendered through DeepDream, by taking every frame of a video and running it through the tool.

So we’re going to watch this video because it is an early example of this new form of AI generated art, but we can also see how the image recognition system is making sense of what it sees. 

Based on what we’ve talked about, maybe the above video makes sense. It starts with noise, and then slowly you start to see something abstract — a recognition of basic shapes and patterns, like fabrics. It stays abstract for a while... and finally moves into a space that is a vague kind of stone tablet era... and then it gets a little lizard-like, a little peacock-like. It’s finding fragments of the stuff it has seen before and showing that to us: fragments of peacocks and lizards and dogs based on abstracted noise. It identifies a dog eye in the noise, and then a snout. Those identified patterns suggest a dog’s face, and so on.

So basically what you are seeing here is the machine’s internal bias toward finding something in these images, and a demonstration of how it sees them.

This is what kicks off the image generation shift in AI. Everything we have now is built off of this.

In a few weeks we are going to go deeper into what came next: it’s a lot, and this is a class about contemporary AI tools, so the big ones will get a whole class each. We’re also going to have a pretty introductory, high-level overview of machine learning and what it all means. 

But I want to conclude with another way of thinking about all this data. Think about how the images that we got out of DeepDream were completely shaped by the images that Google had already collected and classified. DeepDream knew a lot about a small set of images: if you wanted an image classification system for fruit, you had to collect the data for fruit for that system. Automatic classification would come later.

We talked earlier about the culture of AI — of knowledge engineers going into the world and recording what they saw other people doing, in order to translate those patterns for the sake of building machines. It’s worth coming back to that here, for a few reasons. First, where did the images used to train these systems come from? Why do you think it was Google that developed this tool?

One reason is that they had all these images to begin with, and were developing ways to analyze and label them. But also notable: the images didn’t belong to Google. They were indexed from the World Wide Web. I am flagging this here because there’s this cultural component around data, images, and artificial intelligence that has turned into a major controversy in AI circles. Think about this in cultural terms.

The culture of AI engineering, traditionally, was to go into the world and observe it, and then break those observations down into steps. The observations became data points, and rather than having engineers go into the world, the World Wide Web brought the world straight into Google’s buildings — into its servers and infrastructure. It would make sense to look at all of that data as free for them to use — after all, it was inside of their computer systems! So there’s a very particular relationship that Google has with images, which is different from the relationship that people have with images they post online. Initially, Google was building tools to analyze and label and sort this data. But then, the technology they developed got reversed: the data was being used to make new images. And that shifted, quite literally, the relationship Google had with culture: it moved from analysis to production.

Add to this another factor, which we’ll explore in coming classes: think about how datasets are constructed, what they include and leave out, and how they shape the thing they measure. When we are thinking about the data used in today’s image generation, it’s important to think about this stuff for a lot of reasons. 

But to start us off, I want to talk about the colors of birds.

Works Referenced in this Section:

  • Beverly Thompson and William Thompson (1985). Inside an Expert System. Byte Magazine. (Link)

  • Diana Forsythe (1993). Engineering Knowledge: The Construction of Knowledge in Artificial Intelligence. Social Studies of Science 23: 445.

  • CNN’s First Reports on the World Wide Web. YouTube.

  • Adam Curtis (2011). Loren Carpenter SIGGRAPH Experiment 1991. From the BBC Documentary, All Watched Over By Machines of Loving Grace.

  • MG Siegler (2010). Eric Schmidt: Every 2 Days We Create As Much Information As We Did Up To 2003. TechCrunch. (Link)

  • Digital Future Society (2021). Interview with Mary Gray. YouTube.

  • Hinton, G. E., Osindero, S. and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation 18, pp 1527-1554. (Paper)

  • Alexander Mordvintsev, Christopher Olah, and Mike Tyka (2015). Inceptionism: Going Deeper into Neural Networks. Google Research Blog. (Link)

  • Johan Nordberg (2015). Inside an Artificial Brain. Generative Artwork. (Link)

  • Abraham Werner (1812). A Nomenclature of Colours. William Blackwood Publishers. (Link)

  • Eryk Salvaggio (2020). Who Decided the Colors of Birds? Four Lessons About Data From the History of Color. Cybernetic Forests. (Link)