Tutorial 7: Exporting Your Work

DPI R Bootcamp

Jared Knowles

Overview

In this lesson we hope to learn:

Why does Export Matter?

Generating a basic report

  1. Include the data, source code, and output together in one package
  2. Have source code available for every step from raw data to finished product
  3. Present figures, tables, and code in a single document

A few terms

Beginning

Get the tools

A Simple Example

#' This is some text
#'

#+ myplot, dev='svg', out.width='500px', out.height='400px'

library(ggplot2)
data(diamonds)
qplot(carat, price, data = diamonds, alpha = I(0.3), color = clarity)

#' Diamond size is clearly related to price, but not in a linear fashion.
#'

Converting the Script

library(knitr)
# Convert the script to R Markdown without running it, then knit to HTML
o <- spin("C:/Path/To/myscript.R", knit = FALSE)
knit2html(o, envir = new.env())
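
The conversion step can be tried without any files on disk. What follows is a minimal sketch, assuming knitr is installed: it passes a small in-memory script to spin() through its text argument instead of a file path. The script contents and the chunk label demo are made up for illustration.

```r
library(knitr)

# A tiny script held in memory; in practice this would be your .R file
script <- c(
  "#' This is some text",
  "#+ demo, echo=TRUE",
  "x <- 1:10",
  "mean(x)"
)

# knit = FALSE stops after the conversion, so the R Markdown can be inspected
rmd <- spin(text = script, knit = FALSE)
cat(rmd, sep = "\n")
```

Lines starting with #' come out as ordinary text, the #+ line becomes a chunk header, and everything else ends up inside a code chunk.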

Example Script II

#' This is some text that I want to explain
#' For example, this plot is important, let's look below

#+ myplot, dev='svg', out.width='500px', out.height='400px', warning=FALSE, message=FALSE

library(ggplot2)
load("PATH/TO/MY/DATA.rda")
qplot(readSS, mathSS, data = df, alpha = I(0.2)) + geom_smooth()

#' There is not a linear relationship, but it sure is close.
#' Let's do some regression
#'

test <- lm(mathSS ~ readSS + factor(grade), data = df)
summary(test)

#' It's all statistically significant

Spin 2

o <- spin("C:/Path/To/myscript2.R", knit = FALSE)
knit2html(o, envir = new.env())
# envir = new.env() runs the analysis in a fresh environment rather than
# in the current workspace
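
The effect of envir = new.env() can be checked directly. This is a small sketch, assuming knitr is installed; the document text and the object name report_value are hypothetical. It knits a one-chunk document held in memory and then looks for the object it created.

```r
library(knitr)

# Build the chunk delimiter programmatically to keep this example easy to paste
fence <- strrep("`", 3)
rmd <- c(paste0(fence, "{r}"), "report_value <- 42", fence)

# Knit inside a dedicated environment
report_env <- new.env()
md <- knit(text = rmd, envir = report_env, quiet = TRUE)

# The object should live in the report's environment ...
exists("report_value", envir = report_env, inherits = FALSE)
# ... and should not have been written into the global workspace
exists("report_value", envir = globalenv(), inherits = FALSE)
```

Keeping the report's objects out of the workspace means a report cannot silently depend on something left over from an earlier interactive session.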

Stitch

Stitch Example

## title: My Super Report
## author: Mr. Data

# A plot and some text
library(ggplot2)
load("PATH/TO/MY/DATA")
qplot(readSS, mathSS, data = df, alpha = I(0.2)) + geom_smooth()
[Figure: plot of chunk stitching]

# Now a linear model
test <- lm(mathSS ~ readSS + factor(grade), data = df)
summary(test)
## 
## Call:
## lm(formula = mathSS ~ readSS + factor(grade), data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -170.47  -43.35    1.21   45.45  194.45 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     198.343      7.252   27.35  < 2e-16 ***
## readSS            0.478      0.015   31.96  < 2e-16 ***
## factor(grade)4   30.837      4.324    7.13  1.3e-12 ***
## factor(grade)5   34.225      4.112    8.32  < 2e-16 ***
## factor(grade)6   62.517      4.418   14.15  < 2e-16 ***
## factor(grade)7   72.468      4.265   16.99  < 2e-16 ***
## factor(grade)8   96.530      4.650   20.76  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 64.4 on 2693 degrees of freedom
## Multiple R-squared: 0.497,   Adjusted R-squared: 0.496 
## F-statistic:  444 on 6 and 2693 DF,  p-value: <2e-16

# Ok!

Stitch Spin

# Markdown
stitch("PATH/TO/MY/SCRIPT", system.file("misc", "knitr-template.Rmd", package = "knitr"))
knit2html("Path/To/My/Markdown.md")

# Directly to HTML
stitch("PATH/TO/MY/SCRIPT", system.file("misc", "knitr-template.html", package = "knitr"))

# Directly to PDF (requires LaTeX)
stitch("PATH/TO/MY/SCRIPT")
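
For a self-contained trial run, the sketch below stitches a throwaway script into an HTML report inside a temporary directory. It assumes a reasonably recent knitr, where the wrapper stitch_rhtml() applies the HTML template that ships with the package. The script name stitch_demo.R and its contents are hypothetical.

```r
library(knitr)

# Work in a temporary directory so generated files do not clutter a project
old_wd <- setwd(tempdir())

# A throwaway script; the title and author comments are read by stitch()
writeLines(c(
  "## title: My Super Report",
  "## author: Mr. Data",
  "x <- c(2, 4, 6, 8)",
  "summary(x)"
), "stitch_demo.R")

# stitch_rhtml() inserts the script into knitr's HTML template and knits it
out <- stitch_rhtml("stitch_demo.R")
out_path <- normalizePath(out)

# Restore the original working directory
setwd(old_wd)
file.exists(out_path)
```

Because this route goes straight to HTML, it needs no LaTeX installation, which makes it a convenient first test that stitching works on a given machine.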

R Flavored Markdown

R Report Concepts

A quick example

# Start .Rmd file on next line
My Super Report on Student Testing
------------------------------------
Dr. Debateman
==============

In this report I plan to show you all the results of student testing in Myoming.

#```{r chunksetup, include=FALSE} (remove # in actual document)
load("PATH/TO/MY/DATA.rda")
source("myscript.R")
library(ggplot2)
#```

The most important thing to look at is this plot:
#```{r plot1,dev='png',fig.width=9,fig.height=6}
qplot(readSS,mathSS,data=df)
#```

And my model output can be included a few ways because it is so great.
#```{r mystatmodel,results='markup'}
mymod<-lm(readSS~mathSS+factor(grade),data=df)
summary(mymod)
#```

#```{r mystatmodel2,results='asis'}
library(xtable)
mymod<-lm(readSS~mathSS+factor(grade),data=df)
print(xtable(summary(mymod)),type='html')
#```

And because I am awesome, I am done.

Check for Understanding

knit("PATH/TO/myscript.Rmd", envir = new.env())
knit2html("Path/To/Myscript.md")

Making the Report a Bit Prettier

<style type="text/css">
body, td {
   font-size: 14px;
}
code.r {
  font-size: 10px;
}
pre {
  font-size: 10px;
}
</style>

Exporting a Single Plot

# PDF
pdf(file = "PATH/TO/MYPLOT.PDF", width = 10, height = 8)
print(qplot(readSS, mathSS, data = df, alpha = I(0.2)))
dev.off()
# PNG
png(filename = "PATH/TO/MYPLOT.png", width = 1200, height = 900)
print(qplot(readSS, mathSS, data = df, alpha = I(0.2)))
dev.off()
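
The open, draw, close pattern above is easy to get wrong if an error occurs before dev.off() runs. One possible way to guard against that is sketched here. The helper save_pdf() is hypothetical, not part of any package, and the example uses base graphics with the built-in cars data so it runs without ggplot2.

```r
# Hypothetical helper: open a PDF device, draw, and always close the device
save_pdf <- function(path, draw, width = 10, height = 8) {
  pdf(file = path, width = width, height = height)
  on.exit(dev.off(), add = TRUE)  # runs even if draw() stops with an error
  draw()
  invisible(path)
}

plot_file <- file.path(tempdir(), "myplot.pdf")
save_pdf(plot_file, function() plot(cars$speed, cars$dist))
file.exists(plot_file)
```

For a ggplot2 figure the drawing function would wrap the plot in print(), for example function() print(qplot(readSS, mathSS, data = df)).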

Exporting data

Here’s an Example

library(foreign)  # provides write.dta() for Stata files
write.csv(df, file = "PATH/TO/MY.csv")
write.dta(df, file = "PATH/TO/MY.dta")
# Save in R's native format
save(df, file = "PATH/TO/MY.rda", compress = "xz")
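
A quick way to trust an export is to read it back and compare it with the original. The sketch below does this for the CSV and .rda formats, using a small made-up data frame called scores and temporary files.

```r
scores <- data.frame(id = 1:3, readSS = c(350.5, 410, 388.25))

csv_file <- file.path(tempdir(), "scores.csv")
rda_file <- file.path(tempdir(), "scores.rda")

# row.names = FALSE avoids an extra unnamed column of row numbers in the CSV
write.csv(scores, file = csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

# Load the .rda into its own environment so nothing in the workspace is overwritten
save(scores, file = rda_file, compress = "xz")
loaded <- new.env()
load(rda_file, envir = loaded)

all.equal(scores, from_csv)
identical(scores, loaded$scores)
```

The .rda file preserves R-specific details such as factor levels and column classes, while the CSV is the safer choice for sharing with people who use other software.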

Exporting Tables

table(df$female, df$schid)
##    
##       6  15  45  66  75 105
##   0 219 222 225 225 234 243
##   1 231 228 225 225 216 207

We need XTABLE

    6  15  45  66  75 105
0 369 387 384 378 375 390
1  81  63  66  72  75  60

           schoolhigh schoolavg schoollow readSS mathSS
schoolhigh       1.00     -0.52     -0.23  -0.03   0.02
schoolavg       -0.52      1.00     -0.71   0.04  -0.07
schoollow       -0.23     -0.71      1.00  -0.02   0.06
readSS          -0.03      0.04     -0.02   1.00   0.63
mathSS           0.02     -0.07      0.06   0.63   1.00

How?

library(xtable)
print(xtable(table(df$ell, df$schid)), type = "html")
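
The same call can be tried on data that ships with R. This sketch assumes the xtable package is installed and uses the built-in mtcars data as a stand-in for df. It captures the HTML as a string and writes it to a temporary file, which is one way to reuse the table outside a knitr report.

```r
library(xtable)

# Built-in data stand in for the tutorial's df (an assumption for illustration)
counts <- table(mtcars$cyl, mtcars$gear)

# print.results = FALSE returns the HTML as a string instead of printing it
html <- print(xtable(counts), type = "html", print.results = FALSE)

html_file <- file.path(tempdir(), "counts.html")
writeLines(html, html_file)
```

Inside an R Markdown chunk the string does not need to be saved: printing with type = "html" in a chunk that sets results='asis' places the table directly in the report.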

Other options

Reproducibility Matters

Advanced Topics

References

  1. How to Use knitr
  2. CRAN Taskview: Reproducible Research
  3. A Sweave Demo
  4. Donald Knuth on Literate Programming

Session Info

It is good practice to include the session info so that readers can see which software versions produced the results. This document, for example, was produced with knitr version 0.8. Here is my session info:

print(sessionInfo(), locale = FALSE)
## R version 2.15.2 (2012-10-26)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## attached base packages:
## [1] splines   grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] snow_0.3-10      gbm_1.6-3.2      survival_2.36-14 caret_5.15-044  
##  [5] foreach_1.4.0    cluster_1.14.3   reshape_0.8.4    lme4_0.999999-0 
##  [9] Matrix_1.0-10    lattice_0.20-10  xtable_1.7-0     gridExtra_0.9.1 
## [13] sandwich_2.2-9   quantreg_4.91    SparseM_0.96     mgcv_1.7-22     
## [17] eeptools_0.1     mapproj_1.1-8.3  maps_2.2-6       proto_0.3-9.2   
## [21] stringr_0.6.1    plyr_1.7.1       ggplot2_0.9.2.1  lmtest_0.9-30   
## [25] zoo_1.7-9        knitr_0.8       
## 
## loaded via a namespace (and not attached):
##  [1] codetools_0.2-8    colorspace_1.2-0   compiler_2.15.2   
##  [4] dichromat_1.2-4    digest_0.5.2       evaluate_0.4.2    
##  [7] formatR_0.6        gtable_0.1.1       iterators_1.0.6   
## [10] labeling_0.1       markdown_0.5.3     MASS_7.3-22       
## [13] memoise_0.1        munsell_0.4        nlme_3.1-105      
## [16] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.2      
## [19] stats4_2.15.1      tools_2.15.1
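
When a report is not produced with knitr, the same information can be stored next to the exported files. A minimal sketch using base R only; the file name session-info.txt is an arbitrary choice.

```r
# Capture the printed session info and write it to a text file
info_file <- file.path(tempdir(), "session-info.txt")
info_lines <- capture.output(print(sessionInfo(), locale = FALSE))
writeLines(info_lines, info_file)

# Show the first few lines that were saved
head(readLines(info_file), 3)
```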

Attribution and License

Public Domain Mark
This work (R Tutorial for Education, by Jared E. Knowles), in service of the Wisconsin Department of Public Instruction, is free of known copyright restrictions.