Technical Illustration & Infographics | Visualizing Data

Visualizing Data

Explore Feltron.com

You might think graphs and charts are pretty boring. They might conjure images of corporate reports, PowerPoint presentations, or high school economics classes.

Graphs and charts are indeed generated for those purposes (and designers and illustrators are hired to make them!), but they go beyond that. Data reflects life.

Data are numbers that record our activity and identities in real life. They detail how we live, how we die, how we eat, how we love, and so many other endeavors. And charts can help us and others understand our lives.

Nicholas Felton, a graphic designer, pursued that kind of understanding by charting his own life's activities. In a nod to corporate reports, Felton created the "Feltron" reports (he added an "r" to make his own name sound more like a corporation), in which he meticulously recorded and visualized data from his life, such as his drinking and eating habits, his travels, and his phone calls.

Felton's charts and graphs are beautiful, charming, and excellent examples of how data and data visualizations are compelling windows into our lives.

In this lecture, you are going to learn how to use the principles of data visualization, and data visualization's staple forms of charts and graphs, to turn jumbles of numbers into something clear, compelling, and maybe even beautiful.

In this lecture, you can expect to:

Explore the principles of data visualization.
Learn how to encode data into visual properties.
Analyze the conventions of common graphical forms.
Learn how to deconstruct the anatomy of the chart.
Review how to create graphs and charts in Adobe Illustrator.
Learn Edward Tufte's guidelines for chart design.

Rescuing Numbers

Numbers are most commonly visualized with graphs and charts, and they remain the staple of infographics. Numbers by themselves are often difficult to understand, and trends remain hidden.

If you look at a table of numbers, the rows and columns will likely look like just a jumble of characters.

Even summary statistics can fail to reveal patterns or to show how one dataset differs from another.

A famous example is known as Anscombe's Quartet, shown below:

The Quartet is a set of four datasets (labeled here with Roman numerals, I-IV), each made up of paired x and y values.

The four datasets are nearly identical in these statistical properties: the mean (or average) of the x and y values, their variance, and the correlation between the two variables. On their own, these calculations tell you nothing about how the datasets differ.
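The near-identical statistics can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the numbers are the commonly published values of Anscombe's Quartet (they are not copied from the figure in this lecture), and the helper names are illustrative.

```python
from math import sqrt

# Commonly published Anscombe values; datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Sample variance (divides by n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return covariation / spread


def summarize(xs, ys):
    """Collect the summary statistics discussed in the text."""
    return {
        "mean_x": mean(xs), "mean_y": mean(ys),
        "var_x": variance(xs), "var_y": variance(ys),
        "corr": correlation(xs, ys),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(name, {key: round(val, 2) for key, val in summarize(xs, ys).items()})
```

Rounded to two decimal places, all four rows print nearly the same figures, even though the underlying points are arranged very differently.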

This is where data visualization comes to the rescue. Actually plotting the data produces remarkably different-looking graphs.

When you plot the x value on the horizontal axis and the y value on the vertical axis (we call that a scatter plot, which you'll learn about in this lesson), the four datasets show very different patterns.

Data visualization is all about seeing patterns in the data, and graphs and charts are the graphical forms for doing so.

Encoding Data into Visual Properties

This is the first illustration from Cleveland and McGill's groundbreaking study on graphical perception, detailing the basic visual encodings we use to organize and perceive information.

Visualization of data involves, in essence, the encoding of numeric, quantitative information into visual properties by the designer. The variation of numbers is translated into a darker or lighter color, a longer or shorter rectangle, or a smaller or larger circle, for example.

Readers, in turn, decode the information from the graphics. Which visual property works best for this process of encoding and decoding?

The answer lies in an oft-cited academic study in information design by Cleveland and McGill on graphical perception. That study, published in 1984 in the Journal of the American Statistical Association, is "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods."

Cleveland and McGill studied how accurately people were able to decode the information from various kinds of visual encodings (shapes, colors, sizes). Their ranking of the graphical encodings, from most accurate to least accurate, was as follows:

 
 
  1. Position
  2. Length
  3. Angle
  4. Area
  5. Density, saturation, and color hue
 
 

Let's look at each of these one by one.

Position

Readers who looked at graphs that used position to encode data, like the one below, were able to make the most accurate comparisons. Imagine a sample of four data points, A, B, C, and D, where A=21, B=32, C=33, and D=14. The four points are arranged vertically by their numerical value. It's quite easy to see that C is higher than B, A, and D.

Length

Now look at the same data using length as the visual property. The length of each rectangle represents the numerical value of A, B, C, and D.

Again, it's quick and easy to see that C is larger than the others, though it may take a little longer, and gauging exactly how much larger C is than B may be a little harder than in the previous graph, which encodes the data by position.
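To make the length encoding concrete, here is a small Python sketch that draws the same four values as text bars, one character per unit. The function name `text_bars` is made up for this illustration.

```python
# The four sample values used throughout this section.
DATA = {"A": 21, "B": 32, "C": 33, "D": 14}


def text_bars(data, symbol="#"):
    """Return one line per item; the number of symbols equals the value."""
    return [f"{label} {symbol * value} {value}" for label, value in data.items()]


if __name__ == "__main__":
    print("\n".join(text_bars(DATA)))
```

Because every bar starts at the same left edge, the one-character difference between B and C is still visible, but only just.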

Angle

The next graph encodes the data with angles.

The angles that make up the slices for B and C look very similar, and it is very difficult to see which one is larger than the other.
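The arithmetic behind the slices shows why B and C are so hard to tell apart. This is a rough sketch, assuming each slice's angle is simply the value's share of 360 degrees; `slice_angles` is an illustrative name.

```python
# The four sample values used throughout this section.
DATA = {"A": 21, "B": 32, "C": 33, "D": 14}


def slice_angles(data):
    """Return each item's pie-slice angle in degrees (its share of 360)."""
    total = sum(data.values())
    return {label: 360 * value / total for label, value in data.items()}


if __name__ == "__main__":
    for label, angle in slice_angles(DATA).items():
        print(f"{label}: {angle:.1f} degrees")
```

B's slice spans 115.2 degrees and C's spans 118.8 degrees, a difference of only 3.6 degrees.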

Area

This graph encodes the four data points by the area of a circle. Bubbles are actually very common in infographics because, as design elements, they are dynamic and look attractive.

However, from an information design point of view, they are notoriously ineffective at allowing readers to make accurate comparisons. How much larger is C than B below?

The answer is that it's very hard to tell! If you wanted to answer that question, you would have much better luck by presenting the data contrasts through position or length.
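A quick calculation suggests why. In this sketch, which assumes each circle's area (not its radius) is proportional to the value, the visible size of a bubble grows only with the square root of the number. The function name is illustrative.

```python
from math import pi, sqrt

# The four sample values used throughout this section.
DATA = {"A": 21, "B": 32, "C": 33, "D": 14}


def radius_for_area(value):
    """Radius of a circle whose area equals value (arbitrary square units)."""
    return sqrt(value / pi)


if __name__ == "__main__":
    ratio = radius_for_area(DATA["C"]) / radius_for_area(DATA["B"])
    print(f"C's radius is {ratio:.4f} times B's radius")
```

C's area is about 3% larger than B's, but its radius is only about 1.5% larger, which is a very small difference to judge by eye.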

Hue, Saturation, and Density

Finally, the graphs that were found to be the least effective (in other words, the readers had the most trouble making accurate comparisons) used hue, saturation, and density, all of which are components of color. Hue is the color (red, yellow, green), saturation is how dull or vivid a color appears, and density can be thought of as value, which is how dark or light a color is.

The following graphs use values of gray to encode the numerical information in A, B, C, and D. As you can see, distinguishing fine differences in gray is nearly impossible, as is making comparisons with color.
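As an illustration, the sketch below maps the four values to 8-bit gray levels. The linear mapping (largest value is black, zero would be white) is an assumption made for this example, not the method used to produce the lecture's graphic.

```python
# The four sample values used throughout this section.
DATA = {"A": 21, "B": 32, "C": 33, "D": 14}


def gray_levels(data):
    """Map each value to a gray level from 0 (black, largest value) to 255 (white, zero)."""
    largest = max(data.values())
    return {label: round(255 * (1 - value / largest)) for label, value in data.items()}


if __name__ == "__main__":
    print(gray_levels(DATA))
```

Under this mapping B and C land at gray levels 8 and 0, two nearly identical shades of black.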

Color has no natural order, meaning we don't intuitively think of one color as being greater or larger than another color. In the graphic below, is red larger than blue? Who knows? One of the few contexts where we readily accept an ordering of colors is a weather map, where convention dictates that color corresponds to temperature.

Cleveland and McGill's results aren't meant to be a definitive answer on what to use and what not to use in your infographics. Their conclusions don't mean that all your graphs should use position to encode your data. They note in their study:

The ordering does not result in a precise prescription for displaying data but rather is a framework within which to work.

The operative word here is "framework." Knowing what works well and what doesn't work as well helps us deal with a variety of data and balance different goals for different audiences.

The bottom line is that you have to know these basic rules of information design before you can bend or break them.

Graphical Forms

Now that we've spent some time learning about what Cleveland and McGill call the "elementary perceptual tasks"—the basic kinds of visual properties used for encoding data—let's look at how those translate into forms of graphs and charts.

Bar/Column Chart

The most common graph is the bar or column chart, which uses length to encode information. Bar charts are best for comparing values across categories.

For example, if you wanted to know the differences in salary between college majors, then a bar chart would be the most suitable. Each bar represents a different major, and the height of the column or length of the bar represents the salary for that major. The following bar graphs are from National Public Radio.

View online at the NPR Web site.

Line Chart

If your data involves changes across time (we say that the data is continuous, rather than categorical), then use a line chart. A line chart plots time along the horizontal axis and each data point's value along the vertical axis, then connects the points with a line.

The following chart shows the gradual increase in median monthly rents in Brooklyn from 2008 to 2014, showing how Brooklyn rents have slowly crept up toward Manhattan rents. Line charts effectively show trends over time because the connected lines trace the ups and downs in the data.

Scatter Plot

When you want to show a relationship between two variables, then use a scatter plot. A scatter plot places each data point according to the first variable along the horizontal axis and the second variable along the vertical axis.

The pattern of dots reveals the relationship between the two variables. For example, the following graph plots gun ownership levels with suicide rates. Each dot represents a state.

Image credit: Washington Post

The mass of dots forms a general upward pattern, which suggests a correlation between gun ownership and suicide. As gun ownership increases, so do suicide rates.

Keep in mind, however, that correlation does not mean causation. Umbrella sales are correlated with traffic accidents, but that doesn't mean that buying an umbrella causes an accident.

Bubble Plot

Hans Rosling's bubble plot presentations turn data visualization into a spectator sport. (Image credit: TED)

Scatter plots are great for showing relationships between two variables, but if you want to see the relationships between more than two variables, you can use a graph called a bubble plot.

Bubble plots are like scatter plots, but they encode more data dimensions using the size of the circles and sometimes with the color of the circles.

Bubble plots have been recently popularized by Hans Rosling, whose dynamic presentations at TED talks have put a spotlight on data visualizations, and in particular, animated bubble plots.

In Rosling's chart below, individual countries are plotted. Each country's health spending per person is plotted on the horizontal axis, and its life expectancy is plotted on the vertical axis.

The question this graph is asking is, "Do countries that spend more on health care have populations that live longer?" A third variable is also included. The area of each bubble reflects the GDP (Gross Domestic Product) of each country: the larger the bubble, the richer the country.

As you might guess, the richer countries spend more on health care (the larger bubbles are on the right-hand side of the graph). And in general, the more money spent on health care, the longer the life expectancy (the bubbles on the right-hand side of the graph are also higher up on the graph).

There are exceptions to this general trend, however, and the visualization reveals them. Exceptions are known as outliers, because they lie clearly and visibly outside the normal spread of data, and they make for interesting stories. What are these outliers doing differently to buck the trend?

For example, Bangladesh is a relatively poor country that doesn't spend a whole lot on health care, yet its life expectancy is pretty high, outpacing many other countries that spend much more, like South Africa. Good visualizations clarify complex data and allow for these kinds of further explorations.

If you want Illustrator to recognize a number as a label rather than data to plot (for example, if you want the year 2014 to be a label), then put quotation marks around the number (as in "2014").

Stacked Bars or Lines

Normally bars and lines are drawn with their baselines at zero. Measuring all your bars or lines from the same baseline helps you compare them.

However, when you have several categories that are parts of a larger whole, then stacking them on top of each other lets you see the contributions of each category to the whole.

For example, in the following graph from the New York Times, the columns show total deportations. Stacked columns show the different types of deportations. The second graph shows gray columns, representing deportations from inside the U.S., stacked on top of the tan columns, which show deportations from the border.

View infographic and the article on the New York Times Web site.

Another type of stacked bar is the stacked percentage chart, which shows the columns as percentages of 100%, highlighting their proportionality to the whole rather than their actual numbers. The three kinds of bar or column charts are shown below.

The clustered column, stacked column, and stacked percentage column with the same data, explained on Visual.ly, an online infographics site.
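The conversion behind a stacked percentage chart is simple: divide each category's value by its column total. Here is a minimal sketch of that step; the category names and counts are hypothetical and made up purely for illustration.

```python
# Hypothetical counts for two columns, invented for this example only.
COLUMNS = {
    "Year 1": {"Category X": 30, "Category Y": 70},
    "Year 2": {"Category X": 90, "Category Y": 110},
}


def to_percentages(column):
    """Convert one stacked column's values to percentages of that column's total."""
    total = sum(column.values())
    return {category: 100 * value / total for category, value in column.items()}


if __name__ == "__main__":
    for name, column in COLUMNS.items():
        print(name, to_percentages(column))
```

In this made-up data, Category X triples in absolute terms (30 to 90), yet its share of the whole rises only from 30% to 45%. A stacked column chart emphasizes the first fact; a stacked percentage chart emphasizes the second.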

You can do the same kind of stacking to better show proportions with line charts as well. When you stack lines on top of each other, color the area below each of the lines.

The following stacked line graph (also called a stacked area graph) shows the different sectors where the federal government spends its money.

Since the data shows spending over time, a line graph is a good choice. And since it's important to see how spending priorities have changed (note how the proportion of defense spending has decreased while Medicare and Social Security have increased), stacking the lines helps readers see those changing proportions.

View this stacked area graph online at NPR.

Pie Charts

Though many information designers condemn pie charts because of their low accuracy with readers, they remain very popular because of readers' familiarity with them.

Pie charts show different parts to the whole, and each part is represented by a different slice of a circle. The following pie chart from the Pew Research Center shows the proportion of working mothers and stay-at-home mothers, and how those proportions have changed from 1970 to 2012.

Visit the Eager Eyes post on "Understanding Pie Charts" for further discussion about the merits and pitfalls of using pie charts.

Anatomy of the Chart

Each kind of chart encodes the information with a different visual property, but every kind of chart has a set of common features. Let's look at the anatomy of a typical chart.

Chart adapted from www.asymco.com

Charts are usually put on a set of axes. The horizontal line is the x-axis and the vertical line is the y-axis. Tick marks show the spacings between units on the axes.

Sometimes horizontal and vertical lines extend from the tick marks into the graph area—we call those grid lines. Both axes should be labeled and the tick marks should be labeled.

The whole graphic should have a title or headline. If there are color or symbol encodings, a legend or direct labels should explain to the reader what those encodings mean.



Creating Charts in Illustrator

Now that you know the anatomy of a chart, the different kinds of charts, and the basic concepts of visual encoding, it's time to actually make a chart. Follow along!

There are many different software programs that can generate charts from data, but you'll use Adobe Illustrator, which gives a designer more opportunity to customize the chart and integrate it into larger infographics.

Choosing the Graph Type

The Tools panel in Adobe Illustrator contains the Graph tool, which allows you to quickly generate various kinds of graphs.

Click and hold the small triangle at the lower-right corner of the Graph tool icon to reveal the graph tool options.

Decide which graph would best serve your data and select that graph type. In this example, choose the Column Graph tool (the first option).

Now define the size of your graph. You can either click and drag a rectangular area on your artboard, or you can simply click and enter width and height dimensions in the dialog box that appears.

Illustrator automatically creates a graph with a single data point as its default value. A window appears that looks like a spreadsheet, where you enter your data points. The only data point is 1.0, in the top left corner of the spreadsheet.

Adobe Illustrator's graph generator

At this point, you can enter data in each cell of the spreadsheet, but it's much easier and more accurate to copy and paste the data from a spreadsheet program like Microsoft Excel.

Adding Data

We'll provide the data for this example. The data is from the Bureau of Labor Statistics for the average annual salaries of select designers.

Copy the two columns and seven rows of data, and in Illustrator, paste it in the top left corner of the spreadsheet window. When you're done, click the checkmark icon to apply.

Close the spreadsheet window to see the resulting graph.

It's not pretty, but it's a good start.

Obviously, we have to do some more editing to get the text legible, but we've got a graph! Illustrator plots all the values for each occupation as columns. The vertical and horizontal axis labels are positioned automatically.
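If your data lives in a script rather than a spreadsheet, you can prepare text for pasting yourself. The sketch below assumes the spreadsheet window accepts tab-separated text, one row per line, which is what a copy from a program like Excel produces. The rows are placeholders, not the Bureau of Labor Statistics figures used in this example, and the function names are hypothetical.

```python
def label_cell(text):
    """Wrap an all-digit label in straight quotes so it is read as a label, not a value."""
    return f'"{text}"' if text.isdigit() else text


def to_tab_delimited(rows):
    """Turn (label, value) pairs into one tab-separated line per row."""
    return "\n".join(f"{label_cell(str(label))}\t{value}" for label, value in rows)


# Placeholder rows for illustration only.
ROWS = [("Occupation 1", 50000), ("Occupation 2", 62000), ("2014", 48000)]

if __name__ == "__main__":
    print(to_tab_delimited(ROWS))
```

The third row shows a purely numeric label wrapped in quotation marks so that it is treated as a label rather than plotted as data.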

Editing the Graph Type

Now, let's do some cleanup. The names of the professions collide with each other, which makes them impossible to read. Perhaps a bar graph, where the rectangles run from left to right, would work better than a column graph.

You can change the graph type at any time. Right-click (Windows) or Control-click (Mac) your graph and choose Type.

In the Graph Type dialog box that appears, choose the Bar Graph type and click OK.

Select a horizontal bar graph.

Illustrator changes your column graph into a bar graph. The names of the professions now run horizontally and are easier to read, though we'll still need to do some editing for the top of the graph.

If, at any point in time, you need to edit the data itself, you can Right-click/Control-click your graph and choose Data.

The spreadsheet window will appear again, and you can copy and paste new data or edit the existing data.

Graph Options

Now let's explore some of the graph options. Right-click/Control-click your graph and choose Type again.

The Graph Type dialog box appears, which gives you options for the placement of the legend, the width and spacing of the bars, the position of the axes, the appearance of the tick marks and grid lines, and other preferences.

If it's not already selected, choose On Bottom Side from the Value Axis dropdown menu. Deselect the Add Legend Across Top option, and enter 100% for Bar Width and 90% for Cluster Width. (Illustrator keeps the same settings from the most recent graph that was created, so your initial settings may be different.)

Click OK. Your graph changes, reflecting your settings in the Graph Type dialog box.

Now things are looking legible! There are a few more adjustments you can make. Right-click/Control-click your graph and choose Type again. Change the top pulldown menu to Value Axis. Choose Full Width for the Length of the Tick Marks, enter 3 to draw more subdivisions, and enter a dollar sign ($) in the Prefix field. Click OK.

The graph now adds a dollar sign before each of the labels on the horizontal axis, so it's clear that the graph is about money. The tick marks extend the full width of the graph.

Customization

The final graph is a group, which Illustrator treats as a single object. To make any modifications, you can use the Direct Selection tool.

Click an element of your graph to select it. You can use the menu option, Select > Same to choose other elements that share similar properties.

For example, choose one of the black bars. Now choose Select > Same > Fill Color. All the black bars, including the one in the legend, become selected. You can also Alt-Click (Windows) or Option-click (Mac) the same element multiple times to select similar elements.

Select all the black rectangles, and change the fill color to an attractive green color as a reference to the money theme of the graph.

Continue making customizations to polish your graph. Select the grid lines and choose a light gray for the stroke color. Add a comma to each of the labels on the horizontal axis to separate the thousands, and change the font family and size, if you wish.
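For reference, this is the label format you are aiming for: a dollar-sign prefix plus a comma between thousands. The one-line Python helper below is only an illustration of the target text (the name `money_label` is made up); in Illustrator the commas still have to be typed into the labels by hand.

```python
def money_label(value):
    """Format a whole-dollar amount with a $ prefix and thousands separators."""
    return f"${value:,}"


if __name__ == "__main__":
    print(money_label(50000))
```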

If you want to delete a graph element, you have to ungroup the group, which you can do by pressing Shift+Ctrl+G (Windows) or Shift+Command+G (Mac). Unfortunately, once you ungroup the graph group, you can no longer make global edits to the data or change graph style options. You've been warned!

The final graph example has been ungrouped, the legend has been deleted, and a title added. Congratulations, you've made your first graph!

Read more about how to generate and customize charts in Illustrator from FlowingData and consult Adobe's documentation on the Graph Tool.

Tufte's Principles for Charts and Graphs

Making charts and graphs in Illustrator is a straightforward and fairly simple process. But making effective charts is a whole different story.

Edward Tufte, in his landmark book, The Visual Display of Quantitative Information, lays out essential principles of graphical integrity that are the foundation for modern information design. Let's examine his most important ideas, starting with "data-ink ratio."

Data-Ink Ratio

Tufte defines his concept of "data-ink ratio" as the proportion of marks that are used to present actual data compared to the marks used to show the entire graph.

Tufte urges designers to maximize the data-ink ratio so that, essentially, we see more data and less interface or framing elements, such as grid lines, redundant labels, and extraneous borders. Tufte's data-ink ratio is about data density and striving for clean, minimalist graphics.
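Written as a formula, the definition above reads as follows. This is a paraphrase of Tufte's wording rather than a quantity you would normally measure precisely:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

Maximizing the ratio means erasing ink that carries no data while leaving the data marks untouched.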

This graph has a low data-ink ratio because much of the ink is used in the grid lines and as the background color.

Low data-ink ratio.

This graph of the same data has been edited to demonstrate a high data-ink ratio. The ink that we see is used mostly for the data elements.

High Data-Ink ratio. Read the full article.

Chart Junk

A concept related to the data-ink ratio is chart junk, which refers to any extra decoration on a chart. Tufte is an ardent critic of any kind of decoration because it doesn't contribute to the communication of the data. Chart junk is basically extra ink.

Chart junk includes any frivolous elements like illustrations, ornamental fonts, or fancy framing. Chart junk also includes deliberate distortions to the graphic for the purposes of decoration or style. For example, the bars of a chart that are made to appear 3D may look cool, but they make it difficult to compare lengths.

The following chart is a classic example of chart junk. The monster gobbling the chart is rendered with its teeth as the data columns. Although a memorable illustration, as a data visualization, it emphasizes style over data, and the distortions counter the point of using lengths to compare information.

An example of chart junk.

Small Multiples

Tufte champions the idea of small multiples, which are small, similar graphs placed in a series. This visualization, from the Chronicle of Higher Education, is an example of how small multiples make it easy to compare spending on higher education by the state (gold) or by the student (blue).

By putting each state side by side, readers can quickly spot the differences and variations between graphs.

An example of small multiples from the Chronicle of Higher Education.

Humans are attuned to visual differences and to any violations in a pattern. It's how our brains are wired, and it allowed our ancestors to survive in the wild. Using small multiples takes advantage of that natural way of seeing.

Successful Data Visualization Principles

Edward Tufte summarized his ideas on effective data visualization in the following list of principles:

 
 
  1. Have a properly chosen format and design

  2. Use words, numbers and drawing together

  3. Reflect a balance, a proportion, a sense of relevant scale

  4. Display an accessible complexity of detail

  5. Often have a narrative quality, a story to tell about the data

  6. Draw in a professional manner, taking care with the technical details of production

  7. Avoid content-free decoration, including chart junk
 
 

These are wise words that should guide your thinking and practice of data visualization.

In the next exercise, you'll have a chance to give form to these ideas and explore the different ways charts and graphs can communicate data.

     
Learn how to apply the concept of visual hierarchy to information design.
Explore different approaches to visual hierarchy in infographics.
Learn how to use abstraction to organize visual information.
Explore iconography in information design.
Study examples of process instruction design.
 

Discussion
Share your thoughts and opinions with other students at the Discussion Boards.

Exercise
Create a chart that organizes complex information into a well-balanced, professional, accessible story.