How to deal with problems as a team

How effectively a team deals with problems has a critical impact on whether a project succeeds or fails. It is too easy to ignore problems for too long and keep pushing forward on a project, hoping the problem will somehow go away or resolve itself. But problems eventually stop us from working, and they often result in us having to undo work we’ve already done.

Being hit with the need to diagnose a problem and then having to do rework to correct it can have a dramatic effect on our projects.

Mike Griffiths | PMI-ACP Exam Prep

It’s almost impossible to add anything to this paragraph because it summarizes many concepts and ideas very precisely: we see so many times how people (try to) ignore a problem until it becomes a crisis that explodes in their hands.

And when it happens, when a problem turns into a crisis, everyone panics, opening all the umbrellas they can, with the primary intention of covering themselves.

The real solution is to be proactive: if we find a problem, we must start working on it at that very moment, because sticking our heads, necks, and everything else into the sand is never the answer.

The real answer is to solve the problem.
And it applies to every aspect of our lives; the solution is never to deny reality. Excellent, good, average, or poor, it is what it is, and the only real way to improve something is to work on it.

The use of color in data visualization

While I was working on one of my latest visualizations, I needed much longer than expected because the incorporation of colors complicated everything.

After each failed attempt, I could not consciously articulate the reasons why each image felt wrong, although I clearly intuited that it was, so I let my work rest for a while to understand the origin of that strong though elusive belief.

Rather than resign myself to a mediocre visualization, I was determined to make the most of it. While reflecting on how to proceed, a sturdier element broke the cycle of repeated mistakes: not long before, I had finished “Data at Work”, and the book contains a chapter dedicated to the principles of the use of color. Suddenly all those ideas returned to my mind, and I understood that that chapter was the origin of my discomfort with each of the visualizations I had created.

No one wants to share the mistakes they made, but let’s face it: there is no better way to understand something than to fail over and over and over. As Thomas Alva Edison said: “I have not failed. I’ve just found 10,000 ways that won’t work.”
Fortunately, on this occasion it didn’t take 10,000 ways, just a few.

But first, the data

The visualization is based on statistics about multiple births over the last three decades. This data spans a wide range, with very large numbers for the birth rate and tiny numbers for triplet, quadruplet, and quintuplet births.

First attempt: Add very, very few colors

In order to put color to work for us, I chose to add a few colors, using a very simple palette. As you can see in the image below these lines, the result does not do justice to the visualization, because it turns it into something extremely plain.

[Image with two colors]

Error: lack of suitability of the color to the task. The excessive simplicity of the color palette renders it incapable of showing the variability of the data.

Second attempt: Stimuli intensity & suitability to the task

After understanding the origin of the mistake in the previous attempt, I looked for a palette composed of several colors, with different values and high contrast between them.

The final result can be seen below these lines. This time, even though the color selection was reasonably good, the final result is a graph where the data associated with multiple births (triplets and more) is represented with one single color.

[Image: second attempt, with the incorrect palette]

Even though the choice of palette improved, it is still not the appropriate, final palette. The use of this palette was inappropriate due to its lack of suitability to the task: it did not show the sequence in the data (which clearly exists here, because there are more twin births and fewer quintuplet births), and it also prevented the reader from perceiving the order of the data.

Third attempt: Correcting stimuli intensity and misrepresentation of data

Stimuli intensity and Palette

I finally chose a palette from RColorBrewer in blue shades (“Blues”, with 9 colors), whose ordered sequence matches the orderly display of this data.
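For reference, a minimal sketch of how such a palette can be generated in R with the RColorBrewer package (the variable name is mine):

#A minimal sketch, assuming the RColorBrewer package is installed
library(RColorBrewer)
blues <- brewer.pal(9, "Blues") #9 ordered shades, from light to dark
blues #prints the hex codes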

Variability of the data

To deal with the huge range in the data, I generated three separate graphs, grouping together the data sets with values similar to each other, in order to show the variations within each group.
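As a sketch of the grouping idea (the data frame and column names below are hypothetical, and each panel is simplified to a single series; the real panels are shown in the image below):

#A minimal sketch, assuming a data frame "births" with hypothetical columns
par(mfrow=c(1, 3)) #three panels side by side, one per magnitude group
plot(births$Year, births$Twins, type="l", main="Twins")
plot(births$Year, births$Triplets, type="l", main="Triplets")
plot(births$Year, births$Quadruplets, type="l", main="Quadruplets")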

[Image: three separate graphs with the corrected palette]

For the first two graphics I used the colors of the Blues palette from RColorBrewer; however, for the third graphic I decided to change the last color of the palette to a lighter one.

To help me choose a lighter color based on a specific color, the best tool I found was http://www.color-hex.com
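The swap itself is a one-line change; a minimal sketch, where the replacement hex value is purely illustrative (picked with a tool like color-hex.com):

#A minimal sketch; "#9ECAE1" is an illustrative lighter blue
library(RColorBrewer)
blues <- brewer.pal(9, "Blues")
blues[9] <- "#9ECAE1" #replace the last (darkest) color with a lighter one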

Final graph

Project step by step: What is the age range with the most players in the NFL?

A popular belief, considered valid throughout the centuries, is that the best way to understand an idea is to explain it to someone else. As Seneca said many centuries ago: “While we teach, we learn.”

Based on that, I chose a very simple project in order to implement a histogram: firstly, to break down the steps of how to use it; secondly, because through this simplification we obtain very small, manageable pieces that make it easier to understand the concepts behind each idea.

Definition of histogram

A histogram is a graphical method for displaying the shape of a distribution. It is particularly useful when there are a large number of observations.

The data

I will use a dataset with the 2,764 NFL players on all team rosters as of July 22, 2016. The information includes jersey number, name, position, age, height (in inches), weight (in lbs), years in the NFL, college they graduated from, NFL team, position grouping (OL, QB, tailback, TE, WR, Front 7, DB, special teams), side of the football (offense, defense, or special teams), and their experience level in years played.

I mention all the elements of the data because the completeness of this data set allows me to use it to answer other questions and implement other tools.

In this particular example, to answer the question “What is the age range with the most players in the NFL?”, I considered the ages of the 2,764 NFL players, which range between 20 and 43 years.

Preliminary analysis of the data

Interval’s Lower Limit   Interval’s Upper Limit   Class Frequency
20                       25                       1652
25                       30                       902
30                       35                       160
35                       40                       17
40                       42                       3

This table shows the ranges of ages (intervals). The first interval is from 20 to 25, the second from 25 to 30, and so on. In the next column, the number of players falling into each interval was counted to obtain the class frequencies.
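These class frequencies can be reproduced with cut() and table(); a minimal sketch, assuming the cleaned nfl_2016 data from the Code section below (the last break is widened to 45 so it covers the oldest players):

#A minimal sketch; left-closed bins matching the table above
bins <- cut(nfl_2016$Age, breaks=c(20, 25, 30, 35, 40, 45), right=FALSE)
table(bins)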

Code

After analyzing the data, we can create a histogram in R:

#Load the data
nfl_2016 <- read.csv("NFL_Players_2016.csv")

#Clean the data: drop rows with a missing age ("--"), then convert to numeric
nfl_2016 <- subset(nfl_2016, nfl_2016$Age != "--")
nfl_2016$Age <- as.numeric(as.character(nfl_2016$Age))

hist(nfl_2016$Age,
 main="Distribution of NFL Players by Age", 
 xlab="Age", 
 ylab="Quantity of players",
 col="blue",
 breaks=5,
 ylim=c(0,2000)
 )

Final result

[Image: resulting histogram of NFL players by age]

Sources

Project step by step: Has the U.S. multiple birth rate hit a record high?

After reading “Data at Work” I knew that I really wanted to try the sequential steps the book proposes for solving a problem, since they show a very logical path for answering my own questions.

I would like to add how extremely useful I found this book for understanding and solving good problems. You can read my review and my notes about the book. Even though the book is completely oriented toward solving problems using Excel 2016 as a tool, the steps involved (and the way of thinking) are entirely valid for any data visualization problem.

Collecting the data

Perhaps the most thankless task, because there are few chances of finding a good, reliable source of data.

Assessing data availability

After thorough research on the Internet looking through different government sources, I found several datasets that can be used as a source of information in “Vital Statistics of the United States, 1980. Volume I, Natality” and the subsequent reports submitted for each year until 2014.

Adjusting the data

Fortunately, the data was already normalized, so in this case there was no need to make any adjustments.

Exploring the data

As a first step, I examined the data by creating a few charts to understand it. Below these lines there are two graphs; even though they are very simple, they clearly show trends and proportions.

Distribution of the information

In this visualization, we can observe the distribution of the information by color, indicating how many multiple births happened per year and category: total, twins, triplets, quadruplets, and quintuplets.

[Image: distribution of multiple births by year and category]

Evolution over time

The rise in multiple birth rates has been associated with the expanded use of fertility therapies such as ovulation-inducing drugs and assisted reproductive technologies (ART). Older maternal age at childbearing also contributes to more multiple births, because of elevated FSH (follicle-stimulating hormone) as women age.

[Image: comparison of multiple birth rates over the last three decades]

Final conclusion

The answer to the initial question, “Has the U.S. multiple birth rate hit a record high?”, is yes: multiple birth rates have grown continuously over the last three decades.

Notes:

Project step by step: In which states is the wage gap between men and women smallest?

At the beginning of any visualization work, along with the question to answer, the most important thing is to look for a reliable data source. Among the different origins the data can have, there are different degrees of quality and statistical reliability, ranging from data obtained through official sources to data obtained from social networks.

In this case, the data arrived almost at the same time as the question: while I was reviewing the fantastic site http://statusofwomendata.org, I started wondering which are the states where there is the least difference between wages according to gender and race.

Collecting the data

Although I found part of the data at http://statusofwomendata.org, the original source is the IWPR analysis of American Community Survey data (Integrated Public Use Microdata Series, Version 5.0).

Assessing data availability

Having established that the source is reliable and accurate, those characteristics do not exempt it from containing missing values.

And even though I could list the states where information is lacking, it is better to create a graph that makes the missing data visible.

[Heatmap with all the information]
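One way to sketch such a graph in R (the file and column layout here are hypothetical; only the idea of mapping missing values to color comes from the text):

#A minimal sketch, assuming one row per state and one wage column per group
wages <- read.csv("wage_gap_by_state.csv", row.names=1) #hypothetical file
missing <- is.na(as.matrix(wages)) * 1 #1 = missing, 0 = present
heatmap(missing, Rowv=NA, Colv=NA, scale="none",
 col=c("grey90", "firebrick"),
 main="Missing values by state and group")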

Adjusting the data

Considering the question we want to answer, we will work with percentages instead of absolute values. For that reason, we will use the median wage corresponding to men as the point of comparison against women from different ethnic origins.

[Heatmap with the remaining information]
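A sketch of that adjustment, continuing the hypothetical layout above (a "Men" column holding men's median wage per state):

#A minimal sketch; women's medians expressed as a % of men's median
wages <- read.csv("wage_gap_by_state.csv", row.names=1) #hypothetical file
pct <- wages[, setdiff(names(wages), "Men")] / wages$Men * 100
round(head(pct), 1) #each value: women's median as a % of men's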

Exploring the data

In order to understand the data, creating some graphs (nothing extremely complex, just something very simple and plain) is usually a terrific tool. I would like to present the following graphs: the first shows the trends in salaries by gender without any separation by ethnic group; the second shows the trends in salaries for women and men separated by ethnic group.

The purpose of both graphs is to serve as a tool rather than as information in itself: an insight into the data, in order to understand the trends behind the plain numbers.

Some loose ideas about the data:
– There is a gap by gender. Mathematically, this value is around 20%.

[Graph 1: Trends for women and men]

[Graph 2: Trends for women and men separated by ethnic group]

Answering the question

The general idea is to create a rich chart that presents the information to the reader while leaving some room to analyze it.

Ideas:

Data at work

No man is an island

John Donne

There is no better way to explain the nature of data than in the context of relationships.

We have more data around us than ever in the history of mankind, but we don’t need more data, especially if it’s not accompanied by the right skills to transform it into better data.

“Data at Work” focuses on business visualization as a way to communicate complex information in a business context, using not beautiful charts but effective charts… and of course, if your graphs are effective, they can also be beautiful.

There are many aspects of this book that can be considered relevant; however, I would like to focus on how to use visualizations in a work context, the presentation of several basic concepts during the first chapters, and the practical approach of the book. One of the many concepts and ideas expressed in the introduction was:

Data visualization is not a science; it is a crossroads at which certain scientific knowledge is used to justify and frame subjective choices.

How I read this book

When I started reading this book I hoped to read it and do the exercises at the same time, but it has 400 pages of theory (good theory for someone who has zero idea about data visualization, with examples and concepts) and practice (not toy charts, but practical examples like the graphs you would use at work). So the first time, I read it to understand the main ideas and principles of data visualization theory, checked the graphs, and that was it.

The second time, I read it with the intention of understanding each graph and giving a second thought to the theory I had read before.

The third time, I read it while using another data set, taking the book as the theoretical and practical basis for my own visualizations.

That is the way that worked for me. It is not an easy book; rather, it is a very useful and rewarding one, full of ideas for your own work. At moments it is a little wordy, but that is the way the author found to present his ideas.

What about the companion site: dataatworkbook.com

I have nothing but words of acknowledgement for the companion site, because for each chapter the author presents additional, relevant information about the concepts and ideas introduced in the book.

During my second and third reads of the book I started to visit the site, and I found a lot of information, ideas, and research about each of the concepts presented in each chapter, and in most cases practical implementations created by The New York Times, The Wall Street Journal, and many different sources.

Creating an SVG Map: Women in Parliament

Nathan Yau’s book, Visualize This, is a book for learning by reading and doing. The theory and concepts are stated clearly and colloquially, as if it were a great master class; in the same way, the examples are explained step by step, detailing the reason behind each line of code.

I liked several graphs, their practical application, and the way they let you visualize the data. I have already included another example (inspired by the book) corresponding to the use of a heatmap.

In this case, I take as an example the practical case shown in the section “Map Countries” of Chapter 8 (Visualizing Spatial Relationships), about how to color countries using an SVG map.

1 SVG File for World

SVG files are XML files, so they can be easily edited with a text editor. The XML tells the browser what to show, such as the colors and the images, so you can change a country’s color directly in the file.

2 Coloring the countries

I used the designer.colors function in R (in the fields package) to make a linear scale of 256 ‘new’ colors.

#Build a linear scale of 256 colors with designer.colors (fields package)
library(fields)
Ncolors <- 256
ColRamp <- designer.colors(n=Ncolors, col=c("#CCEBC5", 
"#A8DDB5", "#7BCCC4", "#4EB3D3", "#08589E", "#08589E"))

3 Generate hexadecimal colors

#Match each country name to its alpha-2 country code
for (i in 1:nrow(wParliament2016)) {
 for (j in 1:nrow(Countries)) {
 
wParliament2016$Country_Code[i] <- ifelse(
    wParliament2016$Country_Name[i] == Countries$Name[j], 
    Countries$Code[j], wParliament2016$Country_Code[i] ) 

}#for j
}#for i

wParliament2016[is.na(wParliament2016)] <- 0

#Set up the vector that will save the CSS code per country
CSS <- rep("", nrow(wParliament2016))

#Divide the range of the % of women in parliament (X2014) into Ncolors bins
Bins <- seq(max(wParliament2016$X2014, na.rm=TRUE), 
     min(wParliament2016$X2014, na.rm=TRUE), length=Ncolors)

#Spot-check: the color assigned to the 100th country
ColRamp[which.min(abs(Bins-
 wParliament2016$X2014[100]))]

#Loop through all countries. Assign a color.
#Save the CSS text in a vector
for (i in 1:nrow(wParliament2016)) {
 #Find which Bin is closest to the country's X2014 value
 ColorCode <- ifelse(!is.na(wParliament2016$X2014[i]), 
    ColRamp[which.min(abs(Bins-
    wParliament2016$X2014[i]))], "white") 
 #Country_Code is the alpha-2 country code
 CSS[i] <- paste(".", tolower(wParliament2016$Country_Code[i]), 
 " { fill: ", ColorCode, " }", sep="")
}

write(CSS, "output2.txt", sep="\n")

4 Edit SVG file

The next step is to edit the SVG file for the map, changing each country’s fill attribute to the corresponding hexadecimal color according to that country’s representation of women.

.af { fill: #67C7D3 }
.al { fill: #86CFBD }
.dz { fill: #4BB0D2 }
.as { fill: #C8EAC2 }
.ad { fill: #1A71AE }
.ao { fill: #217BB4 }
.ag { fill: #B9E4BA }
.ar { fill: #2A87BC }

5 Edit the SVG map using Inkscape

Finally, the last step is to edit the resulting map using Inkscape (or Illustrator) in order to add a title, the source of the data, and extra information to the image.

6 Final result

[Image: final SVG map of women in parliament]

Hardening Iteration

A hardening iteration is an iteration that Agile project teams use to perform integrated testing on the product increments developed during the iterations of the current release so far, to ensure that they work together as a whole.

Osmotic Communication

Communication is the lifeblood of any successful Agile team, and osmotic communication makes certain that the whole team maintains a useful level of awareness about issues and developments with a minimal amount of overhead and formality. Methodologies such as Crystal Clear and XP rely heavily on osmotic communication. Crystal Clear needs people to be very close to each other so that they overhear useful information and get questions answered quickly (Olson J. S. et al., 2000), while the “Caves and Commons” room arrangement is recommended in XP.

“Commons” are areas maximized for osmotic communication, while “caves” are organized to give people a private place. Research by Ben Rich in Skunk Works (1994) demonstrates how teams have obtained good results by making everyone accountable for reducing the energy cost of detecting and transferring ideas.

Visualize this

In the same way that many people who become interested in Big Data start their search on the Internet, I discovered that there are blogs that are references on the subject, and one of them is FlowingData; after getting to know it and browsing it many times, I cannot but agree with the popular word of mouth.

The blog has been around for years and remains current, partly because of its tutorials and its simple way of explaining how to visualize concepts and ideas, and because despite its long history, or precisely because of it, it stays up to date, publishing about current events and how to analyze them using data visualization techniques.

The author of the blog, Nathan Yau, has published two books on the subject. The first is “Visualize This”, a foundation stone for beginners (think of knowledge as a building: we have to start somewhere). The book helps organize the knowledge you may have gathered from blogs, notes, and papers, since it presents in a simple, intuitive way the basic concepts needed to understand what big data is and why and how to visualize it; once those foundations are laid, it offers simple exercises developed step by step. The best way to learn: by doing.

The first three chapters (Chapter 1 — Telling Stories with Data, Chapter 2 — Handling Data, Chapter 3 — Choosing Tools to Visualize Data) present the idea of how to tell stories with data, how to handle data so that it becomes information, and how to choose tools to visualize data. In each of these chapters the idea is to show the reader the variety of tools and ways of working that currently exist and to give a general overview.

The following chapters are more practical. Chapter 4 — Visualizing Patterns over Time shows how to visualize information over time, since the information varies according to what happens. It also indicates that the type of graph to use varies according to the type of data at hand (discrete or continuous).

Chapter 5 — Visualizing Proportions is about data grouped by categories, subcategories, and population. This chapter shows how to represent the individual categories, but at the same time how each choice is related to the others. We see data as parts of a whole, and how to represent the information when proportions vary over time.

The most remarkable concept in this chapter is that the visualization should represent the proportions faithfully.

In Chapter 6 we see Visualizing Relationships between the data: the similarities between groups, within groups, and even within subgroups. Looking for relationships in your data can be challenging (an elegant adjective for laborious and difficult), but it is highly recommended, because the data shows its own story through relationships and interactions. As the author explains (and I totally agree), playing with data means exploring it, and perhaps during the process you find something interesting. When that happens, you can explain to your readers what you found. After all, in those cases it is the data that chooses to tell a story, instead of being forced to fit a previous idea.

Chapter 7 is about how to spot groups within a population and across multiple criteria, and how to spot the outliers (values far above or below the median) using common sense.

It is simple when you need to compare across a single variable, but you need more tools when the dataset has many variables for each object to compare.

Chapter 8 is about maps, and what can I write about maps that has not been written before? After all, they are an excellent way to visualize information because they are more than intuitive: everyone is familiar with maps, so looking for ways to show information with them is a step onto well-known land.

I really enjoyed this chapter because the results achieved using R at the beginning, and later Python and SVG, are amazing; add a few brushstrokes of Illustrator (or Inkscape) and the final results are outstanding and professional.

Chapter 9 closes the book, and it has a lot of recommendations. The most valuable is to remember that you design and present the information for other people, not for yourself: it’s your job and responsibility to set the stage.

Chapter 3

What software should I use to visualize my data? There are a lot of options; some are out-of-the-box, click-and-drag tools. Others require a little bit of programming.

Out-of-the-box Visualization

Copy and paste some data or load a CSV file and you’re done. Select the graph and voila!

  • Microsoft Excel | Google Spreadsheets
  • Many Eyes
  • Tableau Software: offers a lot of interactive visualization tools and does a good job with data management. There are two versions, one free and one paid; the free version offers a reduced set of graphs and makes the data behind each graph public, while the paid version keeps the information private and offers the complete set of tools and graphs.
  • Trade-offs: even though you gain some flexibility and can customize some things, there is only a small variety of options to choose from.

Programming

Even though it requires a considerable amount of effort and time to get started, once you reach a certain point you can do whatever you need with your data. Some of the tools you could choose:

    • Python / PHP
    • HTML / JavaScript and CSS
    • Trade-offs: it is like learning to speak a new language, with all the work, effort, and time that involves.

Illustration

If you are an engineer, well, you are out of your comfort zone, and this is another thing you need to learn. Nevertheless, you should know how to manage at least some of the best-known illustration tools in a comfortable way, because you gain a lot of control over the information you present to the public, and if you present polished data graphics, people can clearly see the story you are telling.

  • Illustrator: Adobe Illustrator is the industry standard. Every graphic that goes to print at The New York Times was created with it. You can do whatever you need in graphics terms; the downside, though, is that it is expensive.
  • Inkscape: a free alternative, very similar to Illustrator.
  • Trade-off: these are tools for illustration and graphics, not tools created for data manipulation; however, they are a necessary complement for your presentation work.

Chapter 4

How to visualize time series data? Time data is everywhere. It is simply natural to have data over time.

Temporal data could be categorized as discrete or continuous. Knowing which category your data belongs to can help you decide how to visualize it.

In the discrete case, values come from specific points or blocks of time, and there is a finite number of possible values. For example, people take a test, and that’s it; their scores don’t change afterward.

In the continuous case, the data is constantly changing; like the temperature, it can be measured at any time of day, and it changes.

Discrete points in time

  • Bar graph: simple but useful.
  • Stacked bar chart: bars subdivided by category within each time period.
  • Points using a scatterplot: each dot has an x and y value. This kind of graph is often used to visualize nontemporal data; for temporal data, time is represented on the horizontal axis, and values or measurements on the vertical axis. The value axis of a scatterplot doesn’t always have to start at zero, but it is a good practice (a quick sketch follows this list).
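A minimal sketch of discrete points over time, with made-up test scores:

#A minimal sketch; the years and scores are made-up values
years <- 2000:2010
scores <- c(71, 74, 73, 78, 80, 79, 83, 82, 85, 87, 86)
plot(years, scores, pch=19, xlab="Year", ylab="Score")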

Continuous data

Using a continuous line.

  • Smoothing and estimation: LOESS (locally weighted scatterplot smoothing) enables you to fit a curve to your data. LOESS starts at the beginning of the data and takes small slices; at each slice it estimates a low-degree polynomial for just the data in that slice. LOESS moves along the data, fitting a bunch of tiny curves, and together they form a single curve (a quick sketch follows).
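A minimal sketch of LOESS smoothing in base R, with synthetic data:

#A minimal sketch; scatter.smooth() draws the points plus a fitted curve
set.seed(1)
x <- 1:100
y <- sin(x / 10) + rnorm(100, sd=0.3)
scatter.smooth(x, y, span=0.3, col="grey50")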

Chapter 5

What do we look to visualize in proportions? The maximum, the minimum, and the overall distribution.

Parts of a Whole:

This is proportion in its simplest form: a set of proportions from 1 to 100.

  • Pie chart: simple, old school (dating from 1801, by William Playfair). Main recommendation: don’t put too many wedges in one pie (a quick sketch follows this list).
  • Donut chart: almost the same as a pie chart, but with a circle in the middle; usually that space is used for a label or some other content.
  • Stacked bar chart: to show data over time, or to show data by categories.
  • Hierarchy and rectangles: for tree-structured data.
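A minimal sketch of parts of a whole in base R, with made-up shares:

#A minimal sketch; the shares are made-up values summing to 100
shares <- c(A=45, B=30, C=25)
pie(shares, main="Parts of a whole")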

Proportions over time:

  • What happens if you have a set of proportions over time? Most commonly, those proportions vary, and there are different ways to show that:
    • Stacked continuous: we take the graphs corresponding to each time period and stack them one on top of the other.
    • Point-by-point: very similar to the stacked continuous graph, but one line represents each category and its variation over time. The result is a graph that is perhaps easier to read than the previous one.

Chapter 6

It is about visualizing relationships between variables. Along this chapter we’ll see three different concepts: correlation, distribution, and comparison.

  • Correlation: when one thing tends to change in a certain way as another thing changes. In all cases, but especially in those involving correlation, the graph is important, but even more important is the interpretation of the results.
    • Relationship between two variables: we will use a scatterplot to find it.
    • Relationship among several variables: we will use a scatterplot matrix, especially useful during exploration phases. It is also possible to create a scatterplot matrix with fitted LOESS curves (see the sketch after this list).
    • Bubbles: even though scatterplots are the workhorse for correlation, you can use bubble graphs to add a third variable to the same graphic: the area of the bubbles, in addition to the x-axis and y-axis positions.
  • Distribution: we’ll see graphs that visualize everything about your data, in order to see the full distribution.
    • Distribution bars, or histogram
    • Density plots
  • Comparison: sometimes it’s useful to compare multiple distributions rather than just the mean, median, and mode. In those cases a histogram matrix is useful. At this point, the book presents different cases, but in the end the most important concept in this section is: refine your graph to avoid interpretation problems for your readers; you need to do your best to explain the data and take extra care in telling the story.
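A minimal sketch of a scatterplot matrix with fitted curves, using R's built-in mtcars data as a stand-in:

#A minimal sketch; panel.smooth adds a lowess fitted curve to each panel
pairs(mtcars[, c("mpg", "hp", "wt")], panel=panel.smooth)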

Chapter 7

This chapter is about how to spot groups within a population and across multiple criteria, and spot the outliers using common sense.

With a lot of common sense, the author explains that if you want to compare the square footage of two houses, it’s easy because it’s one single variable; but what happens when you want to compare the number of bathrooms, floors, and perhaps more variables? In the end, it’s tricky, and that is why we look for a way to compare across multiple variables.

Comparing across multiple variables

  • Showing it all at once: instead of the numbers, you can use colors to indicate values, making it easy to find high and low values based on color.
    • Create a heatmap to show different groups of variables, indicating by color how high or low each value is. Remember that a heatmap enables you to see all your data at once; however, the focus is on individual points.
    • Create Chernoff faces: you can use faces to show multivariate data, seeing each unit as a whole instead of split up into several metrics; however, this method is a little nerdy, and just confusing for a general public.
    • Create a star chart: you can use an abstract object whose shape is modified to match the data values. The center is the minimum for each variable, and the ends represent the maximum. It is possible to represent several units on a single chart, but it becomes useless in a hurry, which makes for a poorly told story.
  • Running in parallel, to identify groups or variables that could be related.
    • Parallel coordinates: one line per unit; after connecting the dots, you can look for common trends across multiple units. With relative scales, the axes span the minimum and maximum of each variable. Because of the quantity of variables and lines, this graphic can be a little confusing, so as good practice the last step should be editing the graph in Illustrator (or similar) to add colors, labels, blurbs, and text in order to obtain a clear result.
  • Reducing dimensions using multidimensional scaling, to put together the entities with the most similar variables. Nevertheless, this graph is really abstract and perhaps not for a general audience.
  • Searching for outliers: all the previous cases in this chapter were about how units of data belong to certain groups; in this section we focus on the units that don’t belong to certain groups. These points are called outliers. Sometimes they can be the most interesting part of your story, or they can just be typos with a missing zero. The point is that you don’t want to build a graph on the premise of an outlier, because in the end the resulting graph doesn’t make any sense.
  • You can use specific functions, but nothing is better than common sense, basic plots, and knowledge of the data you are managing. Once you find the outliers, you can use varied colors, provide pointers, or use thicker borders to highlight them in a graph (if that is your intention, of course; otherwise, if an outlier doesn’t add any relevant information, you can eliminate it).
    • You can also use a box plot, which shows the quartiles of a distribution. Box plots automatically highlight points that are more than 1.5 times the interquartile range above or below the upper and lower quartiles (a quick sketch follows this list).
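A minimal sketch of how a box plot flags outliers in base R, with synthetic data:

#A minimal sketch; the data is synthetic, with one obvious outlier
set.seed(1)
x <- c(rnorm(50), 8)
boxplot(x) #the outlier is drawn as an isolated point
boxplot.stats(x)$out #the values flagged as outliers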

Chapter 8: Maps

Maps, and a review of this subject using R, Python, and SVG. Using maps is almost the same as using statistical graphics; instead of x and y coordinates, you are dealing with latitude and longitude.

It also gets quite interesting when we introduce time. One map can represent a slice of time, so several maps represent several slices of time.

  • Specific locations: just map a list of locations based on latitude and longitude.
    • Map with dots: a map of specific points
    • Map with lines: to connect the dots on your map
    • Scaled points: we use the map with points, but apply the principles of the bubble plot on a map.
  • Regions: to represent not only single locations but counties, states, and countries as entire regions.
    • Color by data: choropleth maps are the most common way to map regional data. Based on some metric, regions are colored following a color scale that you define. Variations of colors, categories, and symbols allow you to tell the complete story, as well as to annotate your maps to highlight specific regions or features, and to aggregate to zoom in on countries.
  • Incorporation of time, in order to visualize the data over space and time.
    • Small multiples maps, one map for each slice of time.
    • Take the difference: it is not always necessary to create multiple maps to show changes. Sometimes it makes more sense to visualize the actual difference in a single map, highlighting changes instead of single slices in time. It is especially useful to add a legend, source, and title if the graphic is for a wider audience.
  • Animation: one of the most obvious ways to visualize changes over space and time is to animate your data. Instead of showing slices in time with individual maps, it is possible to show the changes as they happen on a single interactive map.

Chapter 9: Design with a purpose

How you design your graphics affects how readers interpret the underlying data.

Visualization is about communicating data, so it is necessary to take the time to learn what forms the basis of each graphic.

Important highlights:

Know the data; after all, how can you explain interesting points in a dataset when you don’t know the data?

Learn about numbers and metrics.

Figure out where they came from and how they were estimated, and see if they even make sense.

Take the time (and it will surely take time) to get to know your data and learn the context of the numbers.

Punch some numbers in R to understand what each metric represents.
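A minimal sketch of that kind of number punching, using R's built-in mtcars data as a stand-in:

#A minimal sketch; quick checks to see what each metric represents
str(mtcars) #variable types and dimensions
summary(mtcars$mpg) #min, quartiles, mean, max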

After you learn all you can about the data, you are ready to design your graphics: if you learn about the data, the visual storytelling will come naturally.

Prepare your readers:

The objective of a data designer is to communicate what you know to your audience. Assume that your readers receive the graph without any context, so accompanying the graph with labels, titles, and colors is vital.

Conclusion

It is a book worth reading: short, but not too short, oriented toward showing concepts and explaining in a practical way how to create graphics based on data. Well organized, well laid out internally, and with good graphics, it is an excellent starting point for anyone interested in data visualization.