The use of color in data visualization

While I was working on one of my latest visualizations, I needed much more time than I expected, because incorporating color turned out to be extremely complicated.

After each failed attempt, I could not consciously explain why the image felt wrong, yet I clearly sensed that it was. So I let my work rest for a while, to understand the origin of that strong though elusive belief.

Rather than resign myself to a mediocre visualization, I was determined to make the most of it. While reflecting on how to proceed, I found something sturdier to break the cycle of repeated mistakes: not long before, I had finished reading “Data at work”, and the book has a chapter dedicated to applying the principles of color use. Suddenly all those ideas came back to me, and I understood that this chapter was the origin of my discomfort with each of the visualizations I had created.

No one wants to share their mistakes, but let’s face it: there is no better way to understand something than to fail at it over and over. As Thomas Alva Edison said: “I have not failed. I’ve just found 10,000 ways that won’t work.”
Fortunately, this time it did not take 10,000 attempts, just a few.

But first, the data

The visualization is based on statistics about multiple births over the last three decades. The data spans a very wide range, with very large numbers for the birth rate and tiny numbers for triplet, quadruplet and quintuplet births.

First attempt: Add very, very few colors

To put color to work for us, I chose to add only a few colors, using a very simple palette. As you can see in the image below, the result does not do justice to the visualization, because it turns it into something extremely plain.

[Image with two colors]

Error: The colors are not suited to the task. The palette is so simple that it cannot show the variability of the data.

Second attempt: Stimuli intensity & suitability to the task

After understanding the origin of the mistake in the previous attempt, I looked for a palette composed of several colors, with different values and high contrast between them.

The result can be seen below. This time, even though the color selection was reasonably good, the outcome is a graph in which the data for higher-order multiple births (triplets and more) is represented with a single color.

[Image: incorrectusepalette2]

Even though the choice of palette had improved, it was still not the right one. This palette was inappropriate because it was not suited to the task: it did not show the sequence in the data (which clearly exists here, because there are more twin births and fewer quintuplet births), and it also kept the reader from perceiving the order of the data.

Third attempt: Correcting stimuli intensity and misrepresentation of data

Stimuli intensity and Palette

I finally chose a sequential palette of 9 blue shades from RColorBrewer, whose progression from light to dark conveys the order in this data.
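As a rough illustration of what makes such a palette sequential, here is a minimal Python sketch. It is not the code used for the graphs in this post. The hex values are the ColorBrewer 9-class Blues colors; the helper names `lightness` and `pick_ordered_colors` are hypothetical and exist only for this example. The sketch checks that lightness falls steadily from the first color to the last, and picks evenly spaced steps for a smaller number of ordered categories.

```python
import colorsys

# ColorBrewer 9-class "Blues" sequential palette, lightest to darkest.
BLUES_9 = [
    "#f7fbff", "#deebf7", "#c6dbef", "#9ecae1", "#6baed6",
    "#4292c6", "#2171b5", "#08519c", "#08306b",
]


def lightness(hex_color):
    """Return the HLS lightness (0.0 to 1.0) of a '#rrggbb' color."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return colorsys.rgb_to_hls(r, g, b)[1]


def pick_ordered_colors(palette, n):
    """Pick n evenly spaced colors from an ordered palette (hypothetical helper)."""
    if not 1 <= n <= len(palette):
        raise ValueError("n must be between 1 and the palette size")
    if n == 1:
        return [palette[-1]]
    step = (len(palette) - 1) / (n - 1)
    return [palette[round(i * step)] for i in range(n)]


# A sequential palette gets steadily darker from one end to the other.
values = [lightness(c) for c in BLUES_9]
is_sequential = all(a > b for a, b in zip(values, values[1:]))

# Four ordered categories, for example quintuplets up to twins (an assumption).
four_steps = pick_ordered_colors(BLUES_9, 4)
```

Which end of the palette goes to the largest category is a design choice; a common convention is to give darker shades to larger values.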

Variability of the data

To solve the problem of the huge range in the data, I generated three separate graphs, each grouping the data sets with similar values, in order to show the variations within each group.
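One possible way to decide which series share a graph is sketched below in Python, under stated assumptions. The numbers are made-up placeholders rather than the real birth statistics, the function `group_by_magnitude` and the threshold `max_ratio=50` are hypothetical, and the grouping in this post was done by eye rather than with this code. The idea is to sort the series by their largest value and start a new graph whenever a series is too small to be readable next to the largest one in the current group.

```python
def group_by_magnitude(series_peaks, max_ratio=50):
    """Greedily group series so that, within a group, the largest peak
    is at most `max_ratio` times any other peak. Peaks must be positive."""
    groups = []
    ordered = sorted(series_peaks.items(), key=lambda kv: kv[1], reverse=True)
    for name, peak in ordered:
        # groups[-1][0] is the largest series of the current group.
        if groups and groups[-1][0][1] / peak <= max_ratio:
            groups[-1].append((name, peak))
        else:
            groups.append([(name, peak)])
    return [[name for name, _ in group] for group in groups]


# Placeholder peak values, invented only to illustrate the range problem.
placeholder_peaks = {
    "all births": 4_000_000,
    "twins": 100_000,
    "triplets": 5_000,
    "quadruplets": 300,
    "quintuplets": 50,
}

panels = group_by_magnitude(placeholder_peaks)
for number, names in enumerate(panels, start=1):
    print(f"graph {number}: {', '.join(names)}")
```

Each resulting group can then be drawn on its own axis, so that the small series are not flattened by the large ones.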

[Image: correctingpalette_3separategraphs2]

For the first two graphs I used the colors of the Blues palette from RColorBrewer; for the third graph, however, I decided to replace the last color of the palette with a lighter one.

To help me choose a lighter color based on a specific color, the best tool I found was: http://www.color-hex.com
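For readers who prefer to compute a lighter variant instead of picking it by hand, here is a minimal Python sketch. It mixes each RGB channel toward white, which is one common way to produce a tint; it is an assumption of this example and not necessarily how the site above derives its lighter colors. The function name `lighten` and the `amount` parameter are hypothetical.

```python
def lighten(hex_color, amount):
    """Mix a '#rrggbb' color toward white.

    amount=0.0 returns the color unchanged, amount=1.0 returns white.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    channels = [int(hex_color[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(c + (255 - c) * amount) for c in channels]
    return "#{:02x}{:02x}{:02x}".format(*mixed)


# Darkest shade of the ColorBrewer 9-class Blues palette, lightened by 30%.
darkest_blue = "#08306b"
lighter_blue = lighten(darkest_blue, 0.3)
```

Small values of `amount` stay close to the original hue, so the replacement still reads as part of the same blue sequence.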

Final graph

I chose a palette with few colors but high contrast between them, simplified the design by dropping the extra columns for years, switched to a clearer typeface… And this is the result:

[Image: us_birth_availability-2]
