At the beginning of every visualization project, the most important step, along with defining the question to answer, is finding a reliable data source. Data can come from many origins with different degrees of quality and statistical reliability, ranging from official sources to data collected from social networks.
In this case, the data arrived almost at the same time as the question. While I was browsing the fantastic website http://statusofwomendata.org, I started wondering in which states the wage difference by gender and race is smallest.
Collecting the data
Although I found part of the data at http://statusofwomendata.org, the underlying source is the IWPR analysis of American Community Survey microdata from the Integrated Public Use Microdata Series (IPUMS), Version 5.0.
Assessing data availability
Once we have established that the source is reliable and accurate, we still have to check for missing values: a trustworthy source is not exempt from gaps in its data.
Rather than listing the states where information is lacking, it is more effective to create a graph that shows the missing data at a glance.
[Heatmap with all the information]
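A minimal Python sketch of how such a missing-data heatmap could be produced. It assumes the earnings were exported as a dictionary mapping each state to its group medians, with None where no value is reported; the group labels, state names and numbers are hypothetical placeholders, not IWPR figures, and the plotting function requires matplotlib.

```python
# Hypothetical layout: {state: {group: median annual earnings}}, None = not reported.
# State names and numbers are made-up placeholders, not IWPR/IPUMS figures.
GROUPS = ["All men", "White women", "Black women", "Hispanic women", "Asian women"]

earnings = {
    "State A": {"All men": 50000, "White women": 42000, "Black women": 36000,
                "Hispanic women": 33000, "Asian women": 45000},
    "State B": {"All men": 48000, "White women": 40000, "Black women": None,
                "Hispanic women": 31000, "Asian women": None},
}


def missing_matrix(data, groups):
    """Return sorted state names and a 0/1 row per state (1 = value missing)."""
    states = sorted(data)
    matrix = [[1 if data[s].get(g) is None else 0 for g in groups] for s in states]
    return states, matrix


def plot_missing(states, groups, matrix):
    """Draw the 0/1 matrix as a heatmap; dark cells are missing (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the data step runs without it

    fig, ax = plt.subplots(figsize=(len(groups) + 2, 0.4 * len(states) + 1.5))
    ax.imshow(matrix, cmap="Greys", vmin=0, vmax=1, aspect="auto")
    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(groups, rotation=45, ha="right")
    ax.set_yticks(range(len(states)))
    ax.set_yticklabels(states)
    ax.set_title("Missing values by state and group")
    fig.tight_layout()
    return fig


states, matrix = missing_matrix(earnings, GROUPS)
```

Calling `plot_missing(states, GROUPS, matrix)` returns a matplotlib figure in which dark cells mark the state/group combinations without data. A binary matrix keeps the figure about availability only, independent of the earnings values themselves.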
Adjusting the data
Given the question we want to answer, we will work with percentages instead of absolute values. For that reason, we will use the median earnings of men as the reference point against which we compare women of different ethnic origins.
[Heatmap with the remaining information]
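The conversion to percentages could look like the following Python sketch. It assumes a hypothetical dictionary layout `{state: {group: median}}` with None for missing values, and an "All men" entry standing in for the men's median (the post does not say exactly which men's median is used). Missing cells are simply dropped, which leaves only the remaining information; all names and numbers are placeholders.

```python
# Hypothetical layout: {state: {group: median annual earnings}}, None = missing.
# Assumption: "All men" is the reference median; numbers are placeholders.
earnings = {
    "State A": {"All men": 50000, "White women": 42000, "Black women": 36000},
    "State B": {"All men": 48000, "White women": 40000, "Black women": None},
}


def relative_to_men(data, reference="All men"):
    """Express each women's median as a percentage of the men's median in the same state.

    Missing cells (None) are dropped, so only the remaining information is kept.
    """
    result = {}
    for state, values in data.items():
        men = values.get(reference)
        if not men:
            continue  # no reference median, so no comparison is possible
        result[state] = {
            group: round(100 * value / men, 1)
            for group, value in values.items()
            if group != reference and value is not None
        }
    return result


ratios = relative_to_men(earnings)
```

A value of 100 would mean parity with men in that state; expressing everything as a percentage makes states with very different wage levels directly comparable.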
Exploring the data
To understand the data, a few simple, plain graphs (nothing extremely complex) are usually a terrific tool. I would like to present the following two graphs: the first shows the trends in salaries by gender without any separation by ethnic group, and the second shows the trends in salaries for women and men separated by ethnic group.
Both graphs are meant as working tools rather than as finished insights about the data: their purpose is to help us understand the trends behind the plain numbers.
Some loose ideas about the data:
– There is a wage gap by gender of around 20%.
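One common way to express such a gap, assumed here because the post does not spell out its formula, is relative to men's median earnings $M$, with $W$ the women's median:

```latex
\text{gap} = \frac{M - W}{M} \times 100\% = \left(1 - \frac{W}{M}\right) \times 100\%
```

For example, with hypothetical medians $M = 50{,}000$ and $W = 40{,}000$, the ratio $W/M = 0.8$ gives a gap of $(1 - 0.8) \times 100\% = 20\%$.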
[Graph 1: Trends for women and men]
[Graph 2: Trends for women and men separated by ethnic group]
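A Python sketch of how the data for both exploratory graphs could be prepared. It assumes long-format records of (year, gender, ethnic group, median earnings) in which "All" marks totals that are not split by ethnic group; the time axis, labels and values are assumptions for illustration, since the post does not state how the trends are indexed. Plotting requires matplotlib.

```python
from collections import defaultdict

# Hypothetical records: (year, gender, ethnic group, median earnings).
# "All" marks totals without an ethnic split; values are placeholders.
records = [
    (2000, "Men", "All", 40000), (2000, "Women", "All", 32000),
    (2010, "Men", "All", 46000), (2010, "Women", "All", 37000),
    (2000, "Women", "Black", 29000), (2010, "Women", "Black", 33000),
    (2000, "Women", "Hispanic", 26000), (2010, "Women", "Hispanic", 30000),
]


def build_lines(rows, by_group):
    """Group records into {line label: [(year, value), ...]} sorted by year.

    by_group=False keeps only the "All" totals (one line per gender, Graph 1);
    by_group=True keeps the per-group rows (one line per gender and group, Graph 2).
    """
    lines = defaultdict(list)
    for year, gender, group, value in rows:
        if (group == "All") == by_group:
            continue  # skip rows that belong to the other graph
        label = f"{gender}, {group}" if by_group else gender
        lines[label].append((year, value))
    return {label: sorted(points) for label, points in lines.items()}


def plot_lines(lines, title):
    """Draw one line per label (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the data step runs without it

    fig, ax = plt.subplots()
    for label, points in lines.items():
        years, values = zip(*points)
        ax.plot(years, values, marker="o", label=label)
    ax.set_xlabel("Year")
    ax.set_ylabel("Median annual earnings")
    ax.set_title(title)
    ax.legend()
    return fig


graph1 = build_lines(records, by_group=False)
graph2 = build_lines(records, by_group=True)
```

`plot_lines(graph1, "Trends for women and men")` and `plot_lines(graph2, "Trends by ethnic group")` would then draw the two figures. Keeping the grouping separate from the plotting makes it easy to check the series before looking at any chart.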
Answering the question
The general idea is to create a rich chart that presents the information to the reader while leaving room for them to analyze it.
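Before composing the final chart, the ranking that answers the original question can be computed. This Python sketch assumes women's median earnings have already been expressed as a percentage of men's median per state and group (hypothetical names and values) and treats the gap as 100 minus that percentage.

```python
# Hypothetical input: women's median earnings as a percentage of men's median,
# per state and group (placeholders, not IWPR figures).
ratios = {
    "State A": {"White women": 84.0, "Black women": 72.0, "Hispanic women": 66.0},
    "State B": {"White women": 83.3, "Hispanic women": 64.6},
    "State C": {"White women": 88.0, "Black women": 80.0, "Hispanic women": 70.0},
}


def smallest_gaps(data, group, top=3):
    """Return up to `top` (state, gap) pairs with the smallest gap for one group.

    The gap is 100 minus the percentage, i.e. how far women's median falls short
    of men's. States without a value for the group are skipped.
    """
    gaps = [
        (state, round(100 - values[group], 1))
        for state, values in data.items()
        if group in values
    ]
    return sorted(gaps, key=lambda pair: pair[1])[:top]


for group in ("White women", "Black women", "Hispanic women"):
    print(group, smallest_gaps(ratios, group))
```

Sorting by the gap rather than the raw percentage keeps "smaller is better" intuitive, and the top states per group could then be highlighted in the final chart.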