The Visual Display of Quantitative Information

Edward R. Tufte


Graphical practice. Theory of data graphics.


Mentioned in questions and answers.

Histograms and scatterplots are great methods of visualizing data and the relationship between variables, but recently I have been wondering about what visualization techniques I am missing. What do you think is the most underused type of plot?

Answers should:

  1. Not be very commonly used in practice.
  2. Be understandable without a great deal of background discussion.
  3. Be applicable in many common situations.
  4. Include reproducible code to create an example (preferably in R). A linked image would be nice.

Check out Edward Tufte's work, and especially this book.

You can also try to catch his travelling presentation. It's quite good, and it includes a bundle of four of his books. (I swear I don't own his publisher's stock!)

By the way, I like his sparkline data visualization technique. Surprise! Google has already written an implementation and put it out on Google Code.

I am a devoted R (r-project.org) user, and love infographics.

I just came across this article, which gives a long list of resources for information designers: http://www.noupe.com/design/fantastic-information-architecture-resources.html

And it raised in me the desire to do more beautiful (not just informative) R plots.

Do you have any suggestions or resources on how to make this leap?

What books/software/skills do I need to have/develop in order to be able to make beautiful infographics?

Here's a list of resources that I would suggest:

Tufte's books are really excellent, although my favorite is actually his second book: Envisioning Information. Separately, I always found the periodic table of visualization methods to be entertaining. Ross Ihaka also taught a course on this subject in the past.

For R, learn ggplot2. The learnr.wordpress.com blog is an excellent resource for this. You might consider the ggplot book and the original Grammar of Graphics book.

Here's another useful article from the same site that you linked in your question: Data Visualization: Modern Approaches.

Some good blogs on the subject:

In some cases, you might want to do your data manipulation in R, but create the visualization with another tool (see, for instance, this list). Here are some of the best tools that I have found over the years:

Lastly, an interesting open visualization platform is available at Many Eyes.

If the R side of your skills is pretty good, then you'll definitely want to start reading Edward Tufte's books, particularly The Visual Display of Quantitative Information and Beautiful Evidence, both of which provide excellent insights into how to present data effectively and efficiently.

Be forewarned, however, that everyone has a different idea of "beautiful." Tufte is a big believer in maximizing a quantity he calls the "data-ink ratio": how much of the page's ink is dedicated to data instead of what he calls "chartjunk". This gives his work a sleek, minimalist aesthetic that certainly makes everything easier to digest, but that some people may find too utilitarian. For Tufte, though, function and form are pretty close to one thing: the more it helps you, the more beautiful and elegant it is.
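For reference, the ratio described above can be written out as a formula. This is a restatement of the definition Tufte gives in The Visual Display of Quantitative Information, not a new measure:

```latex
\[
\text{data-ink ratio} \;=\; \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
\]
```

Equivalently, it is one minus the proportion of the graphic that can be erased without loss of data-information.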

I am making a distinction here between User Interaction Experience and pure User Interface (UI) design, even though the two often correspond. You can have great user interaction even with a "boring" grey interface (note that a boring interface is not a requirement!).

My bookshelf contains the following:

What other books or resources would you add to this list?

I have a matrix of numbers, which is displayed in a DataGridView. I need an algorithm that calculates the BackColor of each cell based on the cell's value, so that cells with close values get similar colors. That way, when looking at the DataGridView, I can see the whole distribution of values visually.

There's a lot of literature on that (e.g. Tufte, Ware, Bertin)

Basically, you have to decide what you want to see:

  • If you want to recognize similar values, a colourmap (like the one @belisarius showed) is good. Looking at the color distribution, you can easily see "ok, the point in the middle has about the same value as the one at the bottom right". But looking at the colors, you can't see which value is higher than the other, or where the maximal/minimal values are. Hue has no intuitive order.
  • If you want to display order information, a greyscale is ideal.

    Here you can immediately see the local and global maximal/minimal values. Downside: you can only compare brightness values well when they're close together. You can't visually compare the color in the center and the color at the bottom right. Lots of optical illusions are based on that.

  • If you want to highlight small differences between adjacent cells, using a derivative for the brightness might be better (similar to topographic maps)

Finally, if your target users are a typical male population, you'll have to think about color-blind people. Even people who are not color-blind are a lot better at comparing lengths than they are at comparing colors.
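The greyscale option above can be sketched in a few lines. The question is about a DataGridView, but the example is written in base R to keep it short; `value_to_grey` is a hypothetical helper name, and the sketch assumes a numeric matrix with no missing values. The resulting hex strings would still have to be converted to the grid's own colour type.

```r
# Map each value of a numeric matrix to a greyscale colour.
# Low values become dark, high values become light.
value_to_grey <- function(m) {
  rng <- range(m)
  # Guard against a constant matrix, which would otherwise divide by zero.
  span <- if (rng[2] > rng[1]) rng[2] - rng[1] else 1
  level <- (as.vector(m) - rng[1]) / span   # rescale to [0, 1]
  cols <- grey(level)                       # "#RRGGBB" strings from base R
  matrix(cols, nrow = nrow(m))              # restore the matrix shape
}

# Hypothetical 2 x 3 matrix of cell values.
m <- matrix(c(0, 5, 10, 2.5, 7.5, 10), nrow = 2)
cell_colours <- value_to_grey(m)
print(cell_colours)
```

Equal values always get the same colour, and the ordering of values is preserved as an ordering of brightness, which is the property the greyscale bullet relies on.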

I have some time series data that represents cumulative sums of a number of data series over time, basically money flowing in and out of a market. Some are positive and some are negative, but the different strands of data do of course sum to the money flow for the market as a whole. I have been mulling over how to visualise this using ggplot and so far small multiples seem to be the clearest way to go - see below for image and code.

Does anybody have any other suggestions for a striking visualisation with such data, using R and (preferably) ggplot? I have tried using geom_area but that gets very messy and I can't seem to work out a way to have each data series shown clearly, even after playing with the stack keyword.

small multiples

require(ggplot2)
require(scales)
require(gridExtra)

mymelt <- structure(list(mydate = structure(c(15340, 15340, 15340, 15340,
15340, 15340, 15340, 15340, 15340, 15340, 15340, 15340, 15371,
15371, 15371, 15371, 15371, 15371, 15371, 15371, 15371, 15371,
15371, 15371, 15400, 15400, 15400, 15400, 15400, 15400, 15400,
15400, 15400, 15400, 15400, 15400, 15431, 15431, 15431, 15431,
15431, 15431, 15431, 15431, 15431, 15431, 15431, 15431, 15461,
15461, 15461, 15461, 15461, 15461, 15461, 15461, 15461, 15461,
15461, 15461, 15492, 15492, 15492, 15492, 15492, 15492, 15492,
15492, 15492, 15492, 15492, 15492, 15522, 15522, 15522, 15522,
15522, 15522, 15522, 15522, 15522, 15522, 15522, 15522, 15553,
15553, 15553, 15553, 15553, 15553, 15553, 15553, 15553, 15553,
15553, 15553), class = "Date"), variable = c("b", "bc", "f",
"in", "it", "l", "of", "o", "pr", "s", "total", "tr", "b", "bc",
"f", "in", "it", "l", "of", "o", "pr", "s", "total", "tr", "b",
"bc", "f", "in", "it", "l", "of", "o", "pr", "s", "total", "tr",
"b", "bc", "f", "in", "it", "l", "of", "o", "pr", "s", "total",
"tr", "b", "bc", "f", "in", "it", "l", "of", "o", "pr", "s",
"total", "tr", "b", "bc", "f", "in", "it", "l", "of", "o", "pr",
"s", "total", "tr", "b", "bc", "f", "in", "it", "l", "of", "o",
"pr", "s", "total", "tr", "b", "bc", "f", "in", "it", "l", "of",
"o", "pr", "s", "total", "tr"), value = c(-23, 6.90000000000001,
459.799999999999, -403.6, -56.1, -95, -13.8, 32.6, 121.5, -15.7,
26.2000000000007, 12.5, -25.1, 238.3, 1047.2, -803.2, -151.5,
-260.5, -59.6, -93.8, 461.5, -37.7, 26.7999999999993, -288.8,
-46.4, 249, 1289.8, -783.2, -188.1, -414.9, -77.7, -61, 928.4,
-36.8, 17.4000000000015, -841.7, -46.5, 276.2, 1384.8, -541.1,
-71.8999999999999, -433.3, -61.3, -28.3, 494.699999999999, -23.4,
-14.5999999999985, -964.5, -46.1, 376.2, 1020.1, -119.4, 56.8000000000001,
-447.7, -9.50000000000001, 14.2, -9.20000000000164, 2.5, -42.7999999999993,
-880.6, -52.9, 345.5, 892.599999999999, -241.8, 144.3, -428.2,
-3.30000000000001, 91.9, -294.800000000002, -5.19999999999999,
-42.1999999999971, -490.1, -64.5, 379.7, 679.299999999999, -143.1,
185.9, -419.8, -4.30000000000001, 182.4, -421.900000000002, 1.80000000000001,
-59.8999999999978, -435.2, -80.2, 422.2, 645.499999999998, -391.4,
76.6000000000001, -387.4, -1.70000000000001, 211.2, -131.500000000002,
-10.6, -40.8999999999978, -393.6), fill = c("#A4D3EE80", "#A478AB80",
"#01AEF080", "#8DC73F80", "#F8931D80", "#FFAAAA80", "#8C8C8C",
"#D38D5F80", "#23238E80", "#77B9B780", "#C8373780", "#EEDD8280",
"#A4D3EE80", "#A478AB80", "#01AEF080", "#8DC73F80", "#F8931D80",
"#FFAAAA80", "#8C8C8C", "#D38D5F80", "#23238E80", "#77B9B780",
"#C8373780", "#EEDD8280", "#A4D3EE80", "#A478AB80", "#01AEF080",
"#8DC73F80", "#F8931D80", "#FFAAAA80", "#8C8C8C", "#D38D5F80",
"#23238E80", "#77B9B780", "#C8373780", "#EEDD8280", "#A4D3EE80",
"#A478AB80", "#01AEF080", "#8DC73F80", "#F8931D80", "#FFAAAA80",
"#8C8C8C", "#D38D5F80", "#23238E80", "#77B9B780", "#C8373780",
"#EEDD8280", "#A4D3EE80", "#A478AB80", "#01AEF080", "#8DC73F80",
"#F8931D80", "#FFAAAA80", "#8C8C8C", "#D38D5F80", "#23238E80",
"#77B9B780", "#C8373780", "#EEDD8280", "#A4D3EE80", "#A478AB80",
"#01AEF080", "#8DC73F80", "#F8931D80", "#FFAAAA80", "#8C8C8C",
"#D38D5F80", "#23238E80", "#77B9B780", "#C8373780", "#EEDD8280",
"#A4D3EE80", "#A478AB80", "#01AEF080", "#8DC73F80", "#F8931D80",
"#FFAAAA80", "#8C8C8C", "#D38D5F80", "#23238E80", "#77B9B780",
"#C8373780", "#EEDD8280", "#A4D3EE80", "#A478AB80", "#01AEF080",
"#8DC73F80", "#F8931D80", "#FFAAAA80", "#8C8C8C", "#D38D5F80",
"#23238E80", "#77B9B780", "#C8373780", "#EEDD8280")), .Names = c("mydate",
"variable", "value", "fill"), row.names = c(NA, 96L), class = "data.frame")

# Rows for the last date in mymelt (should always be the same as plotenddate, as we subset earlier).
myvals <- mymelt[mymelt$mydate == mymelt$mydate[nrow(mymelt)], ]
# Order the facets by each series' value on that last date, largest first.
mymelt <- within(mymelt, variable <- factor(variable, as.character(myvals[order(myvals$value, decreasing = TRUE), ]$variable), ordered = TRUE))

# One filled-area panel per series, four panels per row.
p <- ggplot(mymelt, aes(x = mydate, y = value)) +
     geom_area(aes(fill = variable), position = "stack") +
     facet_wrap(~ variable, ncol = 4) +
     theme(axis.text.x = element_text(size = 8, angle = 90, colour = "grey50"))
print(p)

Normally I would advise you to stack the panels horizontally so that every time series shares a common x-axis. However, that's not going to work if you don't want to change the scales, as @GavinSimpson suggested. In this case it's probably better to place the panels next to each other, while removing some unnecessary ink (see Tufte, 2001).

Generally, you don't need a legend, because the panel name already tells you the name of each variable. That also removes the need for rainbow colours. I would also avoid geom_area and use geom_line instead: your effects still stand out without overfilling the plot with a heavy filled area. After that come the small details: remove the minor grid to decrease grid density, change the axis text size, and decrease the width of the line. Change the theme to theme_bw to remove the gray background. Finally, in this specific case it helps if the height of the plot is roughly 50% of its width. The only issue with this solution is that the date labels on the x-axis are quite small.

p <- ggplot(mymelt, aes(x = mydate, y = value)) +
  geom_line(lwd = 0.3) +                  # thin line instead of a filled area
  facet_grid(. ~ variable) +              # all panels side by side in one row
  theme_bw() +                            # white panel background
  theme(axis.text.x = element_text(size = 5, angle = 90),
        axis.text.y = element_text(size = 8),
        axis.title.x = element_text(vjust = 0),
        axis.ticks = element_blank(),
        panel.grid.minor = element_blank())  # drop the minor grid lines
print(p)
# Height is half the width, as recommended above.
ggsave(plot = p, filename = "plot.png", width = 8, height = 4)


Is there any way to create an emboss effect for a tablix, or any other "cool" effects for a tablix in SSRS 2008 R2?

Emboss is not available as an out-of-the-box option in SSRS. If you really need that effect, you can simulate it by placing rectangles in a table, and then placing a textbox in each of them. You can then shade different sides of the textbox in different colors.

I personally would avoid embossed-looking things. I think they usually look bad, especially when printed. If you want reports that look cool and are still easy to build in SSRS, I recommend the following, which goes in a more "metro" direction.

Use Color
Choose and use a consistent color palette for a project, across multiple reports. If you aren't a designer-type, go to the company website, and sample some of the colors there.

Typically you can find:

  • a dark color to use for title text
  • a lighter color for subheadings
  • a medium color for occasional accent, such as behind the title, or a line below the title
  • a much lighter color (used in sidebars on websites) that you can set as the background to your table headers.

Pay attention to Type
Use the same font for all pieces of the report. You might get away with using a different font for the title, but usually it will look bad. Use italics and bold as needed. Italic type should be used for incidental data details that are not the focus of the report. Bold should be used for titles, subheadings, and a key data element, especially if the data is more than one line long.

Consistent spacing
Report elements that are close in size to each other should be exactly the same size. This means that you won't be able to cram in as much data, but the report will be much more pleasant and professional to look at. I try to use either a half-inch or 0.75-inch grid and make everything a multiple of that size. BIDS doesn't really support this, so you may have to type in the sizes by hand.

If there are graphics, line those up with the grid as much as possible.
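Since the sizes have to be typed in by hand, it can help to compute the grid-aligned values first. The following is only a minimal sketch in R; `snap_to_grid` and the example widths are hypothetical and are not part of SSRS or BIDS.

```r
# Round sizes (in inches) to the nearest multiple of the grid step.
snap_to_grid <- function(size, grid = 0.5) {
  stopifnot(grid > 0)
  round(size / grid) * grid
}

widths <- c(1.1, 1.4, 2.6)            # hypothetical element widths, in inches
half_inch <- snap_to_grid(widths)     # half-inch grid
three_quarter <- snap_to_grid(widths, 0.75)  # 0.75-inch grid
print(half_inch)
print(three_quarter)
```

Elements that start out close in size tend to land on the same grid multiple, which gives the consistent sizing described above.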

If you want to get serious about this, start reading books by Edward Tufte. In particular, The Visual Display of Quantitative Information is often considered a classic of information design.

Reports that are developed with this "less is more" attitude will look fresh and usable longer than reports full of distractions, a.k.a. chartjunk.

One of my projects needs to show users where they rank in certain calculations. I inherited the graph structure from the previous programmer and had to leave it alone while I worked on other parts of the site.

It's time to make the graphs more meaningful, so I'm looking for books/websites/etc about graphs. (Not graph theory!) Charts that convey comparisons at a glance.

Everyone suggests The Visual Display of Quantitative Information by Edward Tufte and that's spot on for what I'm looking for, so anything related to that would be great.

Naturally, personal experience about what to do or not would be helpful as well.

I found Stephen Few's book "Show Me the Numbers" very helpful.