Critical Topics: AI Images, Class Four

Who Decided the Colors of Birds?

A comparison of an illustration of one bird, using the colors available in 1886 and 1814, respectively. The way we experience the world is a reflection of the technologies available to us — can the history of color offer lessons for our new era of data-driven technology? Illustration by Eryk Salvaggio. CC-BY-SA.

We talk a lot about data and datasets in this class. So I want to explore this idea a bit further, by talking about what data is and how it shapes our understanding of things. We’ll start by talking about color.

Nature creates colors, but humans organize and name them. That process involves human decisions, technological influences, and economic interests, which reflect how datasets are being assembled and distributed today, along with the hardening of certain subjectivities that have come to describe our world. In the image above you see a simulation of the same bird, painted with the inks that were affordable to printers at different times. If you were a person in the 1800s trying to learn about a bird you had never seen, the bird you pictured from an 1814 reference could look quite different from the one you pictured from an 1886 reference.

Let’s think about observation and description for a moment. How might you describe a bird in 10 words? The first thing I do is declare a dominant color: black with white stripes. This won’t do the job: you might be imagining a zebra. The stripes may go in different directions in your mind and on the bird. My perception of “black” might be a purple seen in shadows. In every case, we’re imagining different creatures. 

The following is a text I wrote in 2020, and I am sharing it here because it frames an important set of questions:

  • How might we read a dataset critically, the way we might read a text?

  • How might we interrogate the way these datasets are constructed, what they say, and what they leave out?

  • What do we lose in that process, and what could it teach us about the “datafication” of the world? 

This is a deep dive into the hexadecimal system for representing color on the World Wide Web: every web page uses a series of numbers to tell the browser what color to display. Where did that come from? How far back do its origins go? The answer takes us back at least as far as paint catalogs and color dictionaries. Each reflects human decisions, technological influences, and economic interests that describe how datasets are still being assembled and distributed today, along with the hardening of certain subjectivities that have come to describe our world.

It’s also a way of thinking about data, history, and the ways that ideas about the world can become embedded into technology — or totally disregarded. So to understand the colors of this web page, let’s go back to 1812.

The Nomenclature of Colours 

Before color photography, colors had to be described in an error-prone format: the written word (Lewis, 2014). The first attempt to standardize color names in the English language came in 1812 with mineralogist Abraham Werner’s Nomenclature of Colours, which named the colors observed in rocks. Later, this book was illustrated and republished with color patches by an Edinburgh flower painter named Patrick Syme.

The page of “Blues” in Werner and Syme’s 1821 edition of The Nomenclature of Colours. CC0, courtesy of the Internet Archive.

For the first time, people could look at the book and see the exact color being described. It wasn’t the first book about color, but it was an influential early reference. Goethe had published a work on color theory with illustrations, Zur Farbenlehre, two years earlier, but it wasn’t a reference guide. Isaac Newton came up with the color wheel, but it wasn’t designed to correspond to anything observed in nature: Newton mapped the colors of the rainbow onto musical notes, which gave him the idea of a color wheel in the first place. He even adjusted the color spectrum to fit the musical scale rather than his naked-eye observations: orange and indigo were reverse-engineered to make the colors line up with the scale (Fisher 2015).

Werner’s color system came from his background in mineralogy, so he sought to achieve a “standard” color system based on the colors of 79 known minerals, extended to 110 through observations of other natural phenomena (Werner 1821, p9). It also introduced some unfamiliar hierarchies: “Orange” was a subset of yellow, whereas “Purple” was a subset of blue (Werner 1821, p11), because many purple dyes were simply too costly to produce and obtain. You could reproduce the dictionary’s structure today in Excel: a series of columns and rows, with a number on the right to reference the row. Each color was named, and a sample of it was painted into a square box. Next to the swatch, the book listed an animal bearing the color in its plumage or fur, a vegetable or fruit, or a mineral that shared it. Some colors filled every category; others, just one.
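The row structure described above can be sketched as a small record type. This is a minimal illustration, assuming one record per color with optional animal, vegetable, and mineral references; the class name, field names, and sample rows are hypothetical placeholders, not transcriptions from Werner’s book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColorEntry:
    """One row of a Werner-style color dictionary (hypothetical field names)."""
    number: int                      # the reference number for the row
    name: str                        # the color's name
    animal: Optional[str] = None     # an animal bearing the color, if any
    vegetable: Optional[str] = None  # a vegetable or fruit bearing it, if any
    mineral: Optional[str] = None    # a mineral bearing it, if any

    def categories_filled(self) -> int:
        """Count how many of the three reference categories this row uses."""
        return sum(ref is not None for ref in (self.animal, self.vegetable, self.mineral))


# Placeholder rows, not real entries: one fills every category, one fills just one.
rows = [
    ColorEntry(1, "Example White", animal="(a bird)", vegetable="(a flower)", mineral="(a marble)"),
    ColorEntry(2, "Example Blue", mineral="(a mineral)"),
]

for row in rows:
    print(row.number, row.name, row.categories_filled())
```

Leaving a field empty (None) rather than forcing a value mirrors the book: an empty column is itself information about what the compilers could, or chose to, reference.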

The goal was to establish a reliable, common reference that left less to the subjective imagination than written words did. Werner writes: “How defective, therefore, must description be when the terms used are ambiguous; and where there is no regular standard to refer to,” (1814). As with any standard, the data was only useful if it was widely referenced, and in that regard it was wildly successful. The book may have been intended for mineralogists, but its most notable claim to history came from a completely different field: a young naturalist named Charles Darwin referenced it heavily as he described the colors of birds, mammals, and even the skies around the Galapagos. Michelle Nijhuis catalogs some examples Darwin pulled from the book in a 2018 New Yorker article:

Darwin describes cuttlefish as tinted with “hyacinth red and chestnut brown,” a sea slug as “primrose yellow,” and a type of soft coral as “light auricular purple.” 

The guide arrived during an explosion of discovery, exploration, and documentation, which quickly led to its use in describing all kinds of natural phenomena; it shows up in research by “naturalist Sir John Richardson, the botanist William Hooker, and the Arctic explorer William Parry” (St. Clair, 2018).

The book emerged as a standard reference for color, though it was by no means an objective or exhaustive list of colors. Werner and Syme would convince everyone to see the world in their personally selected shades: those useful for identifying minerals and a handful of plants. That was one constraint. The work was further constrained by what could be reproduced with available paints.

Colors had been lifted from the natural world and standardized into a world of language and symbols. To be understood, you matched your observations of the colors of minerals to their closest representations in the dictionary. This expanded the ways we could communicate, and it helped us understand new natural discoveries more accurately. But in the process, we lost something: gradients and shades of the world that didn’t fit the dictionary were folded into the nearest entry that did.

Lesson One:

  • Whenever you think data is “accurate”, remember a crow’s wings in sunshine. In a world where purple is too expensive, the tips of those wings are black. 
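The crow’s wing can be sketched as a nearest-match lookup against a fixed palette. This is a minimal illustration, assuming straight-line (Euclidean) distance in RGB as the measure of “closest,” which is a simplification, since perceptual color spaces are usually preferred for matching. The four-color palette and the RGB value for the wing tip are hypothetical, not taken from any historical dictionary.

```python
import math

# A hypothetical palette with no purple in it.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def nearest_name(rgb, palette=PALETTE):
    """Return the palette name closest to `rgb` by Euclidean distance in RGB."""
    return min(palette, key=lambda name: math.dist(rgb, palette[name]))


# An assumed purplish sheen, like sunlight on the tips of a crow's wings.
wing_tip = (70, 30, 90)

print(nearest_name(wing_tip))                                        # black
print(nearest_name(wing_tip, {**PALETTE, "purple": (128, 0, 128)}))  # purple
```

The observation never changes between the two calls; only the dictionary does. Whatever the palette cannot name is silently recorded as its nearest neighbor.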

Naming the Colors of Birds

One page of blues from Robert Ridgway’s 1886 A Nomenclature of Colors for Naturalists. CC0, courtesy of the Internet Archive.

If you had to imagine birds, even aided by an illustration, with reference only to the colors of rocks, you would picture a different bird than someone presented with 1,115 colors selected specifically for describing birds. Our world, and some of its birds, are more complex and vibrant than 110 rock-colored words can communicate.

In response, Robert Ridgway’s 1886 book, A Nomenclature of Colors for Naturalists, expanded the dictionary of color well beyond the colors of minerals. It contained 1,115 colors drawn from his study of birds, with names associated with those birds (“Jay Blue”) and other elements of nature; he rejected names that didn’t describe something natural (Lewis, 2014). Ridgway’s book updated Werner’s by making use of new technology: printing and dyes had advanced radically since Werner and Syme’s work. Ridgway wanted to build on their work, but he describes what we would today consider an “emulation” problem in updating their data: the colors in Werner’s books, which had been hand-painted into each volume, had faded or decayed. In other cases, the recipes for mixing those colors were lost.

Meanwhile, dye production had become a more industrial process. Cheaper, synthetic alternatives to ultramarine allowed wider access to vibrant blues by 1844, for example (Mertens 2004, 220), while increased trade in fertilizer serendipitously led to the discovery of cheaper, bolder purples (Blaszczyk 2012, 22); purples had been too pricey to include in Werner and Syme’s book. The same problem haunted a 2018 reprint of Werner’s book by the Natural History Museum in London, where the most vibrant reds of the original printings had faded until they were barely recognizable as anything but “brown” (St. Clair 2018).

So Ridgway “emulated” Werner’s colors, reimagining them from their original descriptions and bringing them into his updated data system (Ridgway 1886, p10). In naming these colors, Ridgway is clear that they inherently lose specificity: “if the ‘orange’, ‘lemon’, or ‘chestnut’ on the plates does not match exactly in color the particular orange, lemon, or chestnut which one may compare it with, it may (or in fact does) correspond with other specimens” (1886, p16). So, despite the broader palette, the user was still indexing colors to what had been selected, and what had been selected was what could be cheaply reproduced. Something as seemingly “natural” as color is in fact expanded or constrained by industry and scientific possibility.

Lesson Two:

  • More data means more detail, but bringing old datasets into new datasets requires adaptations. Those adaptations should reflect the observed world, not just “what it takes” to make those systems compatible.

Lesson Three:

  • Make an ingredients list for your “paints”. You don’t want someone to come across your data and have no idea how you got those colors (or results). Anticipate decay and obsolescence! 

The same world, with different references: a comparison of Werner and Syme’s page for “Blue,” based on minerals and paints from 1821 (left), with Ridgway’s page of “Blue,” based on observations of birds using paint from 1886 (right). Ridgway adds a page dedicated entirely to purples, so purple now has its own category. Both images CC0, courtesy of the Internet Archive.

From Pantone to Papayawhip 

A page of blues taken from the Pantone Color Matching System. Copyright Pantone, with fair use applied for educational purposes.

Ridgway’s system was a success. You can see traces of it in today’s Pantone system (Meier, 2016), which shows that the way we structure data can linger for decades, even centuries. Both simplify the dictionary into a block with the color on top, followed by text: a number and a name. Pantone’s system is used across printing, design, and paint companies to standardize color references; it’s even named in official documentation for national flags, such as Scotland’s standardization of its blue to Pantone 300 (Macqueen 2000). But these systems simply didn’t work for digital screens.

From the early 1980s into the 1990s, color came to computer screens. It was an advance for interfaces, but a centuries-backward step for the representation of color. Early computer screens were monochromatic, typically showing amber, lime green, or white on black. This was another result of the available technology: cathode ray tube monitors relied on phosphors that glowed on-screen in a single color, such as green or white. By the 1980s, home computers could mix red, green, and blue within a single pixel. Combined with greater computational power, this expanded the palette to a whopping 256 colors. The Pantone scale, with its thousands of options, now had to be reduced to fewer colors than Ridgway had presented in his 1886 color dictionary.

Designers needed to know how to create with those 256 colors, especially on the Web, because browsers could render “outside” colors in unpredictable ways. This led to the categorization of “web-safe” colors, which could consistently be reproduced across browsers and operating systems (Fuhrt, 2018). 
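The web-safe palette is commonly described as 216 colors rather than the full 256: each of red, green, and blue is limited to six evenly spaced levels (00, 33, 66, 99, CC, FF in hexadecimal), and 6 × 6 × 6 = 216, leaving room in a 256-color palette for colors reserved by operating systems. The sketch below generates that palette as #RRGGBB codes; the function name is hypothetical.

```python
from itertools import product

# The six channel levels of the web-safe palette: 0, 51, 102, 153, 204, 255.
LEVELS = [0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF]


def web_safe_palette():
    """Return every web-safe color as an uppercase #RRGGBB string."""
    return [f"#{r:02X}{g:02X}{b:02X}" for r, g, b in product(LEVELS, repeat=3)]


palette = web_safe_palette()
print(len(palette))  # 216
print(palette[:3])   # ['#000000', '#000033', '#000066']
```

Every color outside this grid, such as #123456, had no guaranteed appearance and could be dithered or shifted differently from one system to the next.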

Each color in the hexadecimal system is identified by its values for red, green, and blue. Each pixel on the screen can show some mix of those three, so designers needed to tell the pixel how much red, how much green, and how much blue to display. They did this with a six-digit hexadecimal number: #000000 is black (no intensity), which you can read as Red: 00, Green: 00, Blue: 00. Hexadecimal counts from 0 to 9 and then continues with the letters A to F, so each two-digit slot runs from 00 (none) to FF (255, full intensity). To add more of one color, you raised the value in the appropriate slot: #FF0000 is pure red, #00FF00 pure green, and so on. In a way, this naming convention was purely descriptive: the “name” of the color was literally identical to the values produced on screen.
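Because the “name” is just three numbers, converting between a hex code and its red, green, and blue intensities is mechanical. A minimal sketch, assuming only the plain six-digit #RRGGBB form (CSS also accepts other forms, such as the three-digit shorthand #F00, which this sketch does not handle); the function names are hypothetical.

```python
from typing import Tuple


def hex_to_rgb(code: str) -> Tuple[int, int, int]:
    """Split a #RRGGBB string into red, green, and blue intensities (0-255)."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hexadecimal digits, got {code!r}")
    # Each pair of hex digits is one channel: 00 is off, FF (255) is full intensity.
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Write red, green, and blue intensities (0-255) back out as #RRGGBB."""
    return f"#{r:02X}{g:02X}{b:02X}"


print(hex_to_rgb("#000000"))  # (0, 0, 0): black, no intensity
print(hex_to_rgb("#FF0000"))  # (255, 0, 0): pure red
print(rgb_to_hex(0, 255, 0))  # #00FF00: pure green
```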

To help humans, 16 core colors could be called by name: aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, purple, red, silver, teal, white, and yellow (W3C, 2020). These “color keywords” were themselves derived from a 16-color standard established in 1982 for a software system: a file called “rgb.txt,” written for MIT’s graphical user interface experiment, X10R3 (Sexton, 2014). In 1989, that X10R3 system was updated to X11, which dramatically expanded the range of named colors and would be the last update to rgb.txt. The W3C, the consortium responsible for maintaining Web standards, would later take that updated file and integrate it into the new color system for the Web.
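The sixteen keywords above each stand for a fixed hexadecimal value. A minimal lookup table, with values as listed in the CSS color specification, shows one quirk of inherited names: the keyword green is the half-intensity #008000, while full-intensity #00FF00 is called lime. The table and variable names are hypothetical.

```python
# The 16 basic CSS color keywords and their hexadecimal values.
BASIC_COLORS = {
    "aqua": "#00FFFF", "black": "#000000", "blue": "#0000FF", "fuchsia": "#FF00FF",
    "gray": "#808080", "green": "#008000", "lime": "#00FF00", "maroon": "#800000",
    "navy": "#000080", "olive": "#808000", "purple": "#800080", "red": "#FF0000",
    "silver": "#C0C0C0", "teal": "#008080", "white": "#FFFFFF", "yellow": "#FFFF00",
}

# Invert the table to find which name, if any, a hex value carries.
NAME_FOR_HEX = {code: name for name, code in BASIC_COLORS.items()}

print(len(BASIC_COLORS))        # 16
print(BASIC_COLORS["green"])    # #008000
print(NAME_FOR_HEX["#00FF00"])  # lime
```

Each name is a label attached on top of the numbers; the screen only ever sees the hex value.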

The list of colors in the final version of rgb.txt was named by Paul Raveling and John C. Thomas. Raveling had the first go, naming colors based on how they appeared on his uncalibrated home computer monitor compared with paint swatches he had lying around the house from a local paint company, Sinclair Paints, which offered a Pantone-like catalog (Sexton, 2014). Raveling posted that he had wanted to use more standardized color dictionaries, but gave up after his letter and check to the American National Standards Institute requesting catalogs went unanswered (Sexton, 2014). Instead, the names reflect Raveling’s own experiences with color: “Dodger Blue,” for an American baseball team’s uniform (Tveten, 2015); “Alice Blue,” for Alice’s dress from Alice in Wonderland (Sexton, 2014); and “Navajo White,” one of a few problematic Sinclair paint colors referencing Native Americans (Tveten 2015).

Expanding the list shortly afterward, John C. Thomas drew on the colors in a Crayola box he found on his floor. At the time, Crayola’s colors included names such as “Indian Red,” which have since been removed after an outcry that they reinforced cultural stereotypes.

Others were simply strange, unnatural choices, with “papayawhip,” “gainsboro,” and “peru” being particularly unintuitive. All of these nonetheless remain standard color names in CSS (Tveten 2015). When the names were called out in 2001, the W3C decided not to change them, because they were already too widely used on websites that would stop functioning if the names were removed (Tveten 2015).
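Why a published name is hard to retire can be sketched with a tiny resolver. This is a simplified model, not how any browser is implemented: it accepts either a #RRGGBB value or a keyword, matches keywords case-insensitively as CSS does, and uses a handful of the extended CSS keywords with their specified values. The function and table names are hypothetical.

```python
import re

# A few of the extended CSS color keywords inherited from X11's rgb.txt.
EXTENDED_COLORS = {
    "papayawhip": "#FFEFD5",
    "gainsboro": "#DCDCDC",
    "peru": "#CD853F",
    "dodgerblue": "#1E90FF",
    "aliceblue": "#F0F8FF",
    "navajowhite": "#FFDEAD",
}

HEX_PATTERN = re.compile(r"#[0-9A-Fa-f]{6}")


def resolve(value, table=EXTENDED_COLORS):
    """Return a #RRGGBB string for a hex code or keyword, or None if unknown."""
    value = value.strip()
    if HEX_PATTERN.fullmatch(value):
        return value.upper()
    # Keywords are matched case-insensitively: "PapayaWhip" equals "papayawhip".
    return table.get(value.lower())


print(resolve("PapayaWhip"))  # #FFEFD5
print(resolve("#cd853f"))     # #CD853F

# Drop one name from the table, and any stylesheet that used it gets nothing back.
trimmed = {k: v for k, v in EXTENDED_COLORS.items() if k != "papayawhip"}
print(resolve("PapayaWhip", trimmed))  # None
```

Once pages in the wild depend on a name, the table becomes a promise: removing an entry changes the meaning of documents its maintainers never see.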

The 256 “web-safe” colors and their names, cc-by-sa via Wikipedia.

Today, color names like these are a rare find in website code. In design tutorials and color wheels, you are much more likely to find a pure hexadecimal value than a name.

Lesson Four:

  • When compiling a dataset, look at other people’s crayon boxes. What else needs to be included for this data to be valuable? How do your decisions about translating data into labels reflect your subjectivity? What efforts have already been made, beyond your immediate experience? Bring other people in. 

The reason we’re talking about this is that colors became a dataset. Someone collected examples, put them together, and institutionalized their names. The colors in that book were constrained by the technologies available at the time: purple was expensive, so they didn’t use it. The colors were assembled by people with a specific field of expertise, so we have the colors named after minerals. Nothing wrong with that, but it’s there. They distributed this dataset as a book, which everyone referenced, even though it was incomplete.

Today, data really doesn’t work any differently. But we talk about it differently. Today we might say that a machine did something, or an algorithm did something. But in fact, people designed those machines, people decided what data to use and how to collect it, how to represent that information, what to include and exclude. 

Whoever collects data has enormous power over, and responsibility for, the way it is shared, used, and even understood. So we’re going to look at data from that perspective when we talk about artificial intelligence, because right now, as many of you know, there is a big scandal around how the datasets used for AI tools were collected. And it’s possible that, like the names of colors, the decisions behind these tools are going to be with us for a very long time.

When we talk about today’s AI, we are talking about data. Data is carved out of the world, by someone, for some purpose, with some limits. Data simplifies and reduces the world, which can be essential for communication. But what we observe and communicate should never be confused for what is “true.” Data of any kind, by definition, shrinks the world to discrete points on a chart or in a computer. 

Color dictionaries were designed to create a shared reference between the author’s observations of the world and people who couldn’t see that world. That’s something data scientists and engineers can learn from the experience of designers and artists: to think clearly about the information you’re collecting, to show people how it was collected, to think carefully about the tools you use to reproduce that data, and to weigh the choices you make about what to include or leave out.

Want Something Else to Do?

  • Designer Nicholas Rougeux has created a wonderful GDoc that connects the colors in Werner’s dictionary to their hexadecimal equivalents.

Works Referenced