Tutorial 6: Visualizations

DPI R Bootcamp

Jared Knowles

Overview

In this lesson we hope to learn:

Graphics Matter

Visualization Philosophy

Our data for this exercise

Base graphics

hist(df$readSS)
plot of chunk basehist

Base graphics are simple

Base graphics have some limitations

plot(df$readSS, df$mathSS)
plot of chunk basescatter
plot(df$readSS, df$mathSS)
# lowess() reads a formula as y ~ x, so mathSS goes on the left to match plot(readSS, mathSS)
lines(lowess(df$mathSS ~ df$readSS), col = "red")
plot of chunk scatterbaseline
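To try the base graphics workflow without the tutorial's data file, here is a minimal sketch on simulated scores. The data frame `sim` and its values are hypothetical stand-ins for `df`; only the column names follow the tutorial. Note that `lowess(x, y)` takes its arguments in the same order as `plot(x, y)`, whereas a formula is read as `y ~ x`.

```r
# Hypothetical stand-in for the tutorial's df: simulated scores, made-up values
set.seed(1)
n <- 300
sim <- data.frame(readSS = rnorm(n, mean = 500, sd = 50))
sim$mathSS <- 100 + 0.8 * sim$readSS + rnorm(n, sd = 30)

pdf(NULL)  # null device: lets the sketch run without a screen or output file

# Each base graphics call draws immediately on the current device
hist(sim$readSS, main = "Reading scale scores", xlab = "readSS")

# plot(x, y) starts a new plot; lines() then adds to it
plot(sim$readSS, sim$mathSS, xlab = "readSS", ylab = "mathSS")
fit <- lowess(sim$readSS, sim$mathSS)  # x first, y second, as in plot()
lines(fit, col = "red")

invisible(dev.off())
```

Because base graphics draw as they go, there is no plot object to modify afterwards; to change the first call you redraw everything.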

Basic Plot

library(ggplot2)
qplot(readSS, mathSS, data = df)
plot of chunk plot1

Now, how?

What are the basic plot types?

plot of chunk ggplot2plottypes

What are some advanced plot types?

plot of chunk ggplot2plottypesadv

Your Turn: What are some examples of interesting visualizations we could use?

Understanding the Grammar of Graphics through a Scatterplot

library(eeptools)  # theme_dpi() comes from the eeptools package
qplot(readSS, mathSS, data = df, alpha = I(0.3)) + theme_dpi()
plot of chunk smallscatter
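One way to see the grammar at work is to write the same scatterplot twice: once in shorthand, and once with the layer's geom, stat, and position, plus the scales and coordinate system, spelled out. This is a sketch on hypothetical simulated data (`sim`) for a recent ggplot2 release (not the 0.9.2 version in the session info), using `ggplot()` and `layer()` rather than `qplot()`; `theme_dpi()` is left out so it runs with ggplot2 alone.

```r
library(ggplot2)

# Hypothetical simulated stand-in for the tutorial's df
set.seed(2)
sim <- data.frame(readSS = rnorm(200, 500, 50))
sim$mathSS <- 100 + 0.8 * sim$readSS + rnorm(200, sd = 30)

# Shorthand: most grammar components are filled in by defaults
p_short <- ggplot(sim, aes(x = readSS, y = mathSS)) +
  geom_point(alpha = 0.3)

# Longhand: data + aesthetic mapping + one layer (geom, stat, position)
# + explicit scales + coordinate system
p_long <- ggplot(data = sim, mapping = aes(x = readSS, y = mathSS)) +
  layer(geom = "point", stat = "identity", position = "identity",
        params = list(alpha = 0.3, na.rm = FALSE)) +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian()
```

Both objects describe the same plot; the shorthand is what you will write day to day, the longhand shows which pieces the grammar is made of.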

Grammar of Graphics

Geoms

Geom Quiz

qplot(mathSS, readSS, data = df) + theme_dpi()
plot of chunk geomquiz
qplot(mathSS, data = df) + theme_dpi()
plot of chunk geomquiz
qplot(factor(grade), mathSS, data = df, geom = "line", group = stuid, alpha = I(0.2)) + 
    theme_dpi()
plot of chunk geomquiz
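A possible answer key for the quiz, written with explicit geoms instead of relying on `qplot()` defaults. It assumes hypothetical simulated longitudinal data with the tutorial's column names (`stuid`, `grade`, `mathSS`, `readSS`) and a recent ggplot2 release; `bins = 30` is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical longitudinal data: 50 students observed in grades 3 to 8
set.seed(3)
sim <- expand.grid(stuid = 1:50, grade = 3:8)
sim$mathSS <- 350 + 25 * sim$grade + rnorm(nrow(sim), sd = 20)
sim$readSS <- sim$mathSS + rnorm(nrow(sim), sd = 25)

# Two continuous variables: qplot() defaults to points
quiz1 <- ggplot(sim, aes(mathSS, readSS)) + geom_point()

# One continuous variable: qplot() defaults to a histogram
quiz2 <- ggplot(sim, aes(mathSS)) + geom_histogram(bins = 30)

# One line per student across grades; group = stuid keeps the lines separate
quiz3 <- ggplot(sim, aes(factor(grade), mathSS, group = stuid)) +
  geom_line(alpha = 0.2)
```

Without `group = stuid`, the discrete x axis would put each grade in its own group and no lines would connect across grades.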

Aesthetics

ggplot(df, aes(x = readSS, y = mathSS)) + geom_point()
# Identical to: qplot(readSS, mathSS, data = df)
plot of chunk extended

Examples of Aesthetics

data(mpg)
qplot(displ, cty, data = mpg) + theme_dpi()
plot of chunk plot2
qplot(displ, cty, data = mpg, size = cyl) + theme_dpi()
plot of chunk plot2
qplot(displ, cty, data = mpg, shape = drv, size = I(3)) + theme_dpi()
plot of chunk plot2
qplot(displ, cty, data = mpg, color = class) + theme_dpi()
plot of chunk plot2
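The examples above map a variable to an aesthetic. Setting an aesthetic to a constant is different, and in `qplot()` that is what the `I()` wrapper signals (as in `size = I(3)`). A minimal sketch of the distinction with `ggplot()` and the same `mpg` data; the color `"steelblue"` is an arbitrary choice.

```r
library(ggplot2)  # also provides the mpg data set

# Mapping: colour varies with a variable, so it goes inside aes() and gets a legend
p_mapped <- ggplot(mpg, aes(displ, cty, colour = class)) + geom_point()

# Setting: one fixed colour and size for every point, so they go outside aes()
p_set <- ggplot(mpg, aes(displ, cty)) + geom_point(colour = "steelblue", size = 3)

# Inspect the colours ggplot2 actually assigned to each point
mapped_cols <- ggplot_build(p_mapped)$data[[1]]$colour
set_cols <- ggplot_build(p_set)$data[[1]]$colour
```

A common mistake is writing `aes(colour = "steelblue")`: that maps a one-value constant through the color scale, so the points get a default palette color and a legend labeled "steelblue".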

Experiment with Aesthetics

Draw some plots with different aesthetics using our student-level data set.

Some Considerations with Aesthetics

Aesthetics Considerations (ordered)

qplot(mathSS, readSS, data = df[1:100, ], size = race, alpha = I(0.8)) + theme_dpi()
plot of chunk racesizemapping

Another Aesthetics Concern (ordered)

df$proflvl2 <- factor(df$proflvl, levels = c("advanced", "basic", "proficient", 
    "below basic"))
df$proflvl2 <- ordered(df$proflvl2)
qplot(mathSS, readSS, data = df[1:100, ], color = proflvl2, size = I(3)) + scale_color_brewer(type = "seq") + 
    theme_dpi()
plot of chunk proflvlcolor
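The order of a factor's levels is the order in which a sequential palette hands out its colors, so it is worth checking that order explicitly. A small base R sketch with hypothetical proficiency labels (the level names match `proflvl`; the values are invented):

```r
# Hypothetical proficiency labels
prof <- c("basic", "advanced", "below basic", "proficient", "basic")

# Without explicit levels, factor() sorts the labels alphabetically,
# which is not the substantive order of the categories
alpha_order <- levels(factor(prof))

# Listing the levels from lowest to highest makes the order explicit;
# ordered = TRUE also allows comparisons such as < and max()
prof_ord <- factor(prof,
                   levels = c("below basic", "basic", "proficient", "advanced"),
                   ordered = TRUE)
levels(prof_ord)
```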

Aesthetics Concern 2 (discrete and continuous)

qplot(factor(grade), readSS, data = df[1:100, ], color = mathSS, geom = "jitter", 
    size = I(3.2)) + theme_dpi()
plot of chunk badcontinuousmapping

Aesthetics Concern 2

qplot(factor(grade), readSS, data = df[1:100, ], color = dist, geom = "jitter", 
    size = I(3.2)) + theme_dpi()
plot of chunk baddiscretemap

Thinking about Aesthetics

| Aesthetic | Discrete | Continuous |
| --- | --- | --- |
| Color | Disparate colors | Sequential or divergent colors |
| Size | Unique size for each value | Linear or logarithmic mapping to radius of value |
| Shape | A shape for each value | does not make sense |

Another consideration is whether the values are ordered or unordered:

| Aesthetic | Ordered | Unordered |
| --- | --- | --- |
| Color | Sequential or divergent colors | Rainbow |
| Size | Increasing or decreasing radius | does not make sense |
| Shape | **does not make sense** | A shape for each value |
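The color rows of these tables translate into scale choices. The sketch below uses hypothetical simulated data and one reasonable pairing per case: a gradient for a continuous variable, a qualitative Brewer palette for an unordered one, and a sequential Brewer palette for an ordered one. The palettes (`"Set1"`, `"Blues"`) and gradient endpoints are arbitrary choices.

```r
library(ggplot2)

# Hypothetical data: two continuous scores, an unordered category (dist),
# and an ordered category (proflvl); all values are simulated
set.seed(4)
n <- 120
sim <- data.frame(
  readSS = rnorm(n, 500, 50),
  mathSS = rnorm(n, 500, 50),
  dist = sample(c("A", "B", "C"), n, replace = TRUE),
  proflvl = factor(sample(c("basic", "proficient", "advanced"), n, replace = TRUE),
                   levels = c("basic", "proficient", "advanced"), ordered = TRUE)
)
base <- ggplot(sim, aes(readSS, mathSS))

# Continuous: a sequential gradient from light to dark
p_cont <- base + geom_point(aes(colour = mathSS)) +
  scale_colour_gradient(low = "grey80", high = "darkblue")

# Unordered discrete: qualitative palette of clearly distinct hues
p_unord <- base + geom_point(aes(colour = dist)) +
  scale_colour_brewer(palette = "Set1")

# Ordered discrete: sequential palette that follows the level order
p_ord <- base + geom_point(aes(colour = proflvl)) +
  scale_colour_brewer(palette = "Blues")
```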

Scales

plot of chunk scaleexample

Scales Caveats

# Scales also apply to color and fill
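One scale caveat that is easy to miss: setting limits on a scale is not the same as zooming. Values outside a scale's limits are converted to `NA` before any stat runs (so a smoother would be fitted on fewer points), while `coord_cartesian()` keeps all the data and only changes the visible window. A sketch on hypothetical simulated data; the limits 450 to 550 are arbitrary.

```r
library(ggplot2)

# Hypothetical simulated scores
set.seed(5)
sim <- data.frame(readSS = rnorm(300, 500, 50))
sim$mathSS <- 100 + 0.8 * sim$readSS + rnorm(300, sd = 30)
base <- ggplot(sim, aes(readSS, mathSS)) + geom_point()

# Scale limits: x values outside 450-550 become NA before stats are computed,
# so those points are dropped (ggplot2 warns about removed rows when drawing)
p_scale <- base + scale_x_continuous(limits = c(450, 550))

# Coordinate limits: every row is kept; only the visible window changes
p_coord <- base + coord_cartesian(xlim = c(450, 550))

x_scale <- ggplot_build(p_scale)$data[[1]]$x
x_coord <- ggplot_build(p_coord)$data[[1]]$x
sum(is.na(x_scale))  # rows lost to the scale limits
```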

Layers

qplot(readSS, mathSS, data = df) + facet_wrap(~grade) + theme_dpi(base_size = 12) + 
    geom_smooth(method = "lm", se = FALSE, size = I(1.2))
plot of chunk smallfacets

We can also facet across more attributes

qplot(readSS, mathSS, data = df) + facet_grid(ell ~ grade) + theme_dpi(base_size = 12) + 
    geom_smooth(method = "lm", se = FALSE, size = I(1.2))
plot of chunk smallfacets2
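A self-contained sketch of the two faceting functions on hypothetical simulated data with the same grouping columns (`grade`, `ell`). It passes `formula = y ~ x` to `geom_smooth()` only to make the fitted model explicit, and leaves the line width at its default.

```r
library(ggplot2)

# Hypothetical data with the grouping columns used above
set.seed(6)
sim <- data.frame(grade = sample(3:8, 600, replace = TRUE),
                  ell = sample(c("No", "Yes"), 600, replace = TRUE),
                  readSS = rnorm(600, 500, 50))
sim$mathSS <- 100 + 0.8 * sim$readSS + rnorm(600, sd = 30)

# Shared layers: points plus a separate linear fit in every panel
base <- ggplot(sim, aes(readSS, mathSS)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# facet_wrap(): one panel per grade, wrapped into a grid
p_wrap <- base + facet_wrap(~grade)

# facet_grid(): rows by ell, columns by grade
p_grid <- base + facet_grid(ell ~ grade)
```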

A few pro tips

Colors in R

plot of chunk ggplot2colors

The R Colorspace is Huge

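# dropbox_source() is not a base R function; it must be defined or loaded
# before this runs, and the Dropbox link may no longer resolve
colwheel <- "https://dl.dropbox.com/u/1811289/colorwheel.R"
dropbox_source(colwheel)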
col.wheel("magenta", nearby = 2)
plot of chunk colorwheel
##  [1] "plum"        "violet"      "darkmagenta" "magenta4"    "magenta3"   
##  [6] "magenta2"    "magenta"     "magenta1"    "orchid4"     "orchid"
col.wheel("orange", nearby = 2)
plot of chunk colorwheel
##  [1] "salmon1"       "darksalmon"    "orangered4"    "orangered3"   
##  [5] "coral"         "orangered2"    "orangered"     "orangered1"   
##  [9] "lightsalmon2"  "lightsalmon"   "peru"          "tan3"         
## [13] "darkorange2"   "darkorange4"   "darkorange3"   "darkorange1"  
## [17] "linen"         "bisque3"       "bisque1"       "bisque2"      
## [21] "darkorange"    "antiquewhite3" "antiquewhite1" "papayawhip"   
## [25] "moccasin"      "orange2"       "orange"        "orange1"      
## [29] "orange4"       "wheat4"        "orange3"       "wheat"        
## [33] "oldlace"
col.wheel("brown", nearby = 2)
plot of chunk colorwheel
##  [1] "snow1"       "snow2"       "rosybrown"   "rosybrown1"  "rosybrown2" 
##  [6] "rosybrown3"  "rosybrown4"  "lightcoral"  "indianred"   "indianred1" 
## [11] "indianred3"  "brown"       "brown4"      "brown1"      "brown3"     
## [16] "brown2"      "firebrick"   "firebrick1"  "chocolate"   "chocolate4" 
## [21] "saddlebrown" "seashell3"   "seashell2"   "seashell4"   "sandybrown" 
## [26] "peachpuff2"  "peachpuff3"
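If the remote color-wheel script is unavailable, the built-in functions `colors()`, `col2rgb()`, and `rgb()` cover much of the same ground: list R's named colors, search them, and convert names to hex codes. A minimal sketch; the helper `to_hex()` is a hypothetical name, not part of base R.

```r
# All of R's built-in color names
all_cols <- colors()

# Every named color containing "orange"
oranges <- grep("orange", all_cols, value = TRUE)

# Hypothetical helper: convert color names to hex codes, which ggplot2
# scales and base graphics both accept
to_hex <- function(cols) {
  m <- col2rgb(cols)  # 3-row matrix of red, green, blue values (0 to 255)
  rgb(m["red", ], m["green", ], m["blue", ], maxColorValue = 255)
}
to_hex(c("red", "darkorange", "steelblue"))
```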

Some Practical Advice

Above and Beyond

plot of chunk premier

Scary R Code

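library(ggplot2)
library(grid)  # viewport() and unit() for placing the inset

# Main plot: race-specific densities of readSS; position = "fill" stacks them
# and rescales so they sum to 1 at each readSS value
p1 <- qplot(readSS, ..density.., data = df, fill = race, position = "fill", 
    geom = "density") + scale_fill_brewer(type = "qual", palette = 2)

# Inset: the same plot with legend, axis text, ticks, and labels removed
p2 <- qplot(readSS, ..density.., data = df, fill = race, position = "fill", geom = "density") + 
    scale_fill_brewer(type = "qual", palette = 2) + ylim(c(0, 1)) + theme_bw() + 
    theme(legend.position = "none", axis.text.x = element_blank(), axis.text.y = element_blank(), 
        axis.ticks = element_blank(), panel.margin = unit(0, "lines")) + ylab("") + 
    xlab("")

# Viewport in normalized parent coordinates (npc): centered at (0.65, 0.73),
# 20% of the device wide and tall
vp <- viewport(x = unit(0.65, "npc"), y = unit(0.73, "npc"), width = unit(0.2, 
    "npc"), height = unit(0.2, "npc"))
print(p1)           # draw the main plot
print(p2, vp = vp)  # draw the inset on top, inside the viewport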
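The viewport approach above draws two separate plots on one device. An alternative, sketched here on hypothetical simulated data for a recent ggplot2 release, converts the inset to a grob with `ggplotGrob()` and places it inside the main panel with `annotation_custom()`, so the result is a single ggplot object. It follows the pattern of Exercise 1 (a histogram of readSS inside a scatterplot of readSS and mathSS); the inset position (top-left, 35% of each axis range) is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical simulated scores standing in for df
set.seed(7)
sim <- data.frame(readSS = rnorm(300, 500, 50))
sim$mathSS <- 100 + 0.8 * sim$readSS + rnorm(300, sd = 30)

# Main plot: scatterplot of readSS and mathSS
main <- ggplot(sim, aes(readSS, mathSS)) + geom_point(alpha = 0.4)

# Inset: histogram of readSS with axes, labels, and background removed
inset <- ggplot(sim, aes(readSS)) + geom_histogram(bins = 20) + theme_void()

# annotation_custom() positions a grob using the main plot's data coordinates;
# here the inset fills the top-left 35% of each axis range
rx <- range(sim$readSS)
ry <- range(sim$mathSS)
combined <- main +
  annotation_custom(ggplotGrob(inset),
                    xmin = rx[1], xmax = rx[1] + 0.35 * diff(rx),
                    ymin = ry[2] - 0.35 * diff(ry), ymax = ry[2])
```

Call `print(combined)` to draw it. Because the inset is anchored in data coordinates, it moves with the axes if the main plot's limits change.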

Exercises

  1. Embed one plot in another plot in R using two different data elements from our data set. For example, plot a histogram of readSS inside a scatterplot of readSS and mathSS.

  2. Explore some examples on the ggplot2 website. What are some ways to overlay more than 3 dimensions of data in a single plot?

  3. What types of data work best for what types of visualizations?

References

  1. Hadley Wickham’s JSM 2012 Presentation
  2. Hadley Wickham’s ggplot2 Intro Presentation
  3. The ggplot2 Homepage
  4. ggplot2 Documentation
  5. Quick R: Basic Graphs
  6. Quick R: Advanced Graphs

Session Info

It is good practice to include your session info with any analysis you share. This document was produced with knitr version 0.8; here is my session info:

print(sessionInfo(), locale = FALSE)
## R version 2.15.2 (2012-10-26)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## attached base packages:
## [1] splines   grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] Hmisc_3.10-1     snow_0.3-10      gbm_1.6-3.2      survival_2.36-14
##  [5] caret_5.15-044   foreach_1.4.0    cluster_1.14.3   reshape_0.8.4   
##  [9] lme4_0.999999-0  Matrix_1.0-10    lattice_0.20-10  xtable_1.7-0    
## [13] gridExtra_0.9.1  sandwich_2.2-9   quantreg_4.91    SparseM_0.96    
## [17] mgcv_1.7-22      eeptools_0.1     mapproj_1.1-8.3  maps_2.2-6      
## [21] proto_0.3-9.2    stringr_0.6.1    plyr_1.7.1       ggplot2_0.9.2.1 
## [25] lmtest_0.9-30    zoo_1.7-9        knitr_0.8       
## 
## loaded via a namespace (and not attached):
##  [1] codetools_0.2-8    colorspace_1.2-0   compiler_2.15.2   
##  [4] dichromat_1.2-4    digest_0.5.2       evaluate_0.4.2    
##  [7] formatR_0.6        gtable_0.1.1       iterators_1.0.6   
## [10] labeling_0.1       markdown_0.5.3     MASS_7.3-22       
## [13] memoise_0.1        munsell_0.4        nlme_3.1-105      
## [16] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.2      
## [19] stats4_2.15.1      tools_2.15.1

Attribution and License

Public Domain Mark
This work (R Tutorial for Education, by Jared E. Knowles), in service of the Wisconsin Department of Public Instruction, is free of known copyright restrictions.