Basic Statistical Concepts

R Bootcamp HTML Slides

Jared Knowles

Introduction

We will review the following statistical concepts here through the lens of R:

What is statistics?

“Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data. It deals with all aspects of this, including the planning of data collection”

Key Concepts in Statistical Thinking

  1. Probability
  2. How to describe data
  3. Drawing inferences from data
  4. Causation

Probability

Probability’s uses

But…

Interdependence and Independence
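Here is a minimal sketch of independence in base R. It uses two fair six-sided dice as an illustrative example, which is not from the original slides. Enumerating all 36 outcomes with `expand.grid()` shows that independent events satisfy P(A and B) = P(A)P(B), while dependent events do not.

```r
# All 36 equally likely outcomes of rolling two fair dice
outcomes <- expand.grid(d1 = 1:6, d2 = 1:6)

# Independent events: A = first die shows 6, B = second die shows 6
p_a  <- mean(outcomes$d1 == 6)
p_b  <- mean(outcomes$d2 == 6)
p_ab <- mean(outcomes$d1 == 6 & outcomes$d2 == 6)
all.equal(p_ab, p_a * p_b)   # TRUE: 1/36 = 1/6 * 1/6

# Dependent events: A = first die shows 6, C = the sum is at least 10
p_c  <- mean(outcomes$d1 + outcomes$d2 >= 10)
p_ac <- mean(outcomes$d1 == 6 & outcomes$d1 + outcomes$d2 >= 10)
c(joint = p_ac, product = p_a * p_c)   # 1/12 versus 1/36
```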

Describing Data

Spread of the Data

[Figure: bar plot]

Let’s try another

[Figure: diamondplot]

How about this one?

[Figure: diamondplot2]

Still more

[Figure: diamondplot3]

Graphical Depictions of Data

Levels of Measurement

Quiz 1

Levels of measurement matter
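One way levels of measurement show up in R is sketched below with made-up data. Nominal variables are stored as plain factors and ordinal variables as ordered factors, and R allows order comparisons only on ordered factors.

```r
# Nominal: categories with no natural order
eye <- factor(c("brown", "blue", "green", "brown"))
table(eye)   # counts per category are meaningful

# Ordinal: categories with a known order but no fixed spacing
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size < "large"   # TRUE FALSE TRUE TRUE
max(size)        # large

# Order comparisons on nominal data return NA with a warning
eye < "green"
```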

Describing Data with Numbers

Level of Measurement   Typical Statistics
Nominal                mode, chi-squared
Ordinal                median, percentile
Interval               mean, standard deviation, correlation, ANOVA
Ratio                  geometric mean, harmonic mean, logarithms
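Below is a minimal base R sketch that computes one statistic from each row of the table. All four vectors are hypothetical illustration data. `stat_mode()`, `geo_mean()`, and `harm_mean()` are helper names introduced here because base R has no built-in function for the statistical mode or for these two means. R's own `mode()` reports an object's storage type instead.

```r
# Hypothetical helpers (not part of base R)
# Nominal: the most frequent category, returned as a character label
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}
# Ratio: geometric and harmonic means require strictly positive values
geo_mean  <- function(x) exp(mean(log(x)))
harm_mean <- function(x) 1 / mean(1 / x)

# Hypothetical data, one vector per level of measurement
colors  <- c("red", "blue", "blue", "green", "blue")   # nominal
ratings <- c(1, 3, 4, 4, 5, 2, 4)                      # ordinal (1-5 scale)
temps_f <- c(68, 71, 75, 80, 66)                       # interval (degrees F)
incomes <- c(20000, 35000, 50000, 120000)              # ratio

stat_mode(colors)                          # "blue"
median(ratings)                            # 4
quantile(ratings, 0.9, type = 1)           # a percentile that is an observed value
c(mean = mean(temps_f), sd = sd(temps_f))  # mean is 72
c(geometric = geo_mean(incomes), harmonic = harm_mean(incomes))
```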

Let’s talk about these statistics

Measures of Central Tendency

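[Figure: central tendency plot]
library(ggplot2)  # provides the mpg dataset
library(xtable)
# Frequency table of highway fuel economy, rendered as HTML
print(xtable(table(mpg$hwy)), type = "html")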
hwy (mpg)   Count
12          5
14          2
15          10
16          7
17          31
18          10
19          13
20          11
21          2
22          7
23          7
24          13
25          15
26          32
27          14
28          7
29          22
30          4
31          7
32          4
33          2
34          1
35          2
36          2
37          1
41          1
44          2

The Mean

The Median

The Mode
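You can check the three measures for the highway mileage data in base R. This sketch rebuilds the 234 values from the frequency table above with `rep()`. That assumes the table is an exact tabulation of `mpg$hwy`, and it avoids loading ggplot2.

```r
# Rebuild the highway-mpg values from the frequency table
hwy_values <- c(12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
                28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 41, 44)
hwy_counts <- c(5, 2, 10, 7, 31, 10, 13, 11, 2, 7, 7, 13, 15, 32, 14,
                7, 22, 4, 7, 4, 2, 1, 2, 2, 1, 1, 2)
hwy <- rep(hwy_values, times = hwy_counts)

mean(hwy)     # arithmetic mean, about 23.44
median(hwy)   # middle of the sorted values: 24
# Mode: the most frequent value (26, seen 32 times)
hwy_mode <- as.numeric(names(which.max(table(hwy))))
hwy_mode
```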

Measures of spread

Quantiles
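Here is a short sketch of quantiles in base R on a small made-up vector. `quantile()` implements nine definitions through its `type` argument. The values in the comments assume the default, `type = 7`.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

quantile(x)                        # 0%, 25%, 50%, 75%, 100%: 2, 4, 6, 9.25, 12
quantile(x, probs = c(0.1, 0.9))   # any other percentiles
IQR(x)                             # 75th minus 25th percentile: 5.25
range(x)                           # smallest and largest values
```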

Standard Deviation and Variance
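The sketch below shows what `var()` and `sd()` compute, using a made-up vector. The sample variance is the sum of squared deviations from the mean divided by n - 1. The standard deviation is its square root.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Squared deviations from the mean
dev_sq <- (x - mean(x))^2

# Sample variance divides by n - 1; the standard deviation is its square root
var_by_hand <- sum(dev_sq) / (n - 1)
sd_by_hand  <- sqrt(var_by_hand)

all.equal(var_by_hand, var(x))   # TRUE
all.equal(sd_by_hand, sd(x))     # TRUE

# Dividing by n instead gives the population variance (4 for these data)
sum(dev_sq) / n
```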

Skew
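Base R has no skewness function, so this sketch defines a hypothetical helper, `skewness()`. It uses the moment coefficient: the average cubed deviation divided by the cubed standard deviation, with n in both denominators. Other definitions add small-sample corrections and give slightly different values.

```r
# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(1)
right_skewed <- rexp(1000)    # long right tail
symmetric    <- rnorm(1000)

skewness(right_skewed)   # positive; the exponential's theoretical skewness is 2
skewness(symmetric)      # near 0
# In right-skewed data the mean is pulled above the median
mean(right_skewed) > median(right_skewed)
```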

Distributions of Data

Pictures of distributions

Normal

library(ggplot2)
library(eeptools)  # provides theme_dpi()
qplot(rnorm(3000), geom = "density", adjust = 2) + theme_dpi()
[Figure: bell curve]

Uniform

qplot(runif(1e+05, min = -5, max = 5), geom = "histogram") + theme_dpi()
[Figure: uniform distribution]

Poisson

qplot(rpois(3000, lambda = 3), geom = "density", adjust = 2) + theme_dpi()
[Figure: Poisson distribution]

Binomial

qplot(rbinom(3000, 1, 0.5), geom = "bar") + theme_dpi()
[Figure: binomial distribution]

Weibull

qplot(rweibull(3000, shape = 18, scale = 1), geom = "density") + theme_dpi()
[Figure: Weibull distribution]
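Each of these distributions comes with four R functions that share a prefix convention. `d` gives the density or probability mass, `p` the cumulative probability, `q` the quantile, and `r` random draws. Here is a brief sketch:

```r
dnorm(0)                          # density of N(0, 1) at 0, about 0.399
pnorm(1.96)                       # P(Z <= 1.96), about 0.975
qnorm(0.975)                      # inverse of pnorm, about 1.96
dpois(2, lambda = 3)              # P(X = 2) for a Poisson with mean 3, about 0.224
pbinom(0, size = 1, prob = 0.5)   # P(X <= 0) for one fair coin flip: 0.5

set.seed(42)
draws <- rweibull(5, shape = 18, scale = 1)   # five random Weibull draws
draws
```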

Distribution Demo

Sampling
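Here is a minimal sketch of sampling in base R. `sample()` draws a simple random sample, and `replicate()` repeats a draw many times to approximate a sampling distribution. The exponential population, the sample size of 30, and the 2,000 replications are arbitrary choices for illustration.

```r
set.seed(2013)
# Simple random sample of 5 IDs out of 100, without replacement
ids <- sample(1:100, size = 5)
ids

# Sampling distribution of the mean: many samples of size 30
# from an exponential population with mean 1 and sd 1
n <- 30
sample_means <- replicate(2000, mean(rexp(n, rate = 1)))

mean(sample_means)   # close to the population mean of 1
sd(sample_means)     # close to the standard error 1 / sqrt(30), about 0.18
```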

Measures of association

Let’s look at Pearson’s Coefficient
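Pearson's coefficient is the covariance of two variables divided by the product of their standard deviations. This sketch computes it by hand on the built-in `mtcars` data, an example not used in the original slides, and checks the result against `cor()`.

```r
x <- mtcars$wt    # car weight (1000 lbs)
y <- mtcars$mpg   # fuel economy (miles per gallon)

# Pearson's r: co-variation scaled by the spread of each variable
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_by_hand
cor(x, y)        # same value, about -0.87
cor.test(x, y)   # adds a test of H0: r = 0 and a confidence interval
```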

We can identify other types of correlation though:

[Figure: sophcorr]
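`cor()` also computes two rank-based alternatives through its `method` argument. In this made-up example, y always increases with x but along a curve. Spearman's and Kendall's coefficients therefore reach 1, while Pearson's does not.

```r
x <- 1:20
y <- exp(x / 4)   # always increasing, but not along a straight line

cor(x, y)                        # Pearson: below 1 because of the curvature
cor(x, y, method = "spearman")   # 1: correlation of the ranks
cor(x, y, method = "kendall")    # 1: every pair of points is concordant
```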

Regression and Statistical Models
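Here is a minimal regression sketch with `lm()`, again using the built-in `mtcars` data as an assumed example. With a single predictor, the fitted slope and R-squared can be recovered from Pearson's r, which connects the model back to correlation.

```r
fit <- lm(mpg ~ wt, data = mtcars)   # fuel economy as a linear function of weight

coef(fit)      # intercept and slope
summary(fit)   # standard errors, t-tests, R-squared

# With one predictor: slope = r * sd(y) / sd(x), and R-squared = r^2
r <- cor(mtcars$wt, mtcars$mpg)
all.equal(unname(coef(fit)["wt"]), r * sd(mtcars$mpg) / sd(mtcars$wt))
all.equal(summary(fit)$r.squared, r^2)

predict(fit, newdata = data.frame(wt = 3))   # predicted mpg for a 3,000 lb car
```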

Hypothesis testing

Most Common Applications

# One-sample t-test: does the mean of wt differ from 187?
wt <- c(190.5, 189, 195.5, 187, 191, 190.4, 186, 183, 193, 188)
t.test(wt, mu = 187, alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  wt 
## t = 2.067, df = 9, p-value = 0.06866
## alternative hypothesis: true mean is not equal to 187 
## 95 percent confidence interval:
##  186.8 191.9 
## sample estimates:
## mean of x 
##     189.3
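The sketch below shows where the `t.test()` numbers come from. It recomputes the t statistic, p-value, and 95% confidence interval by hand for the same `wt` data and hypothesized mean of 187.

```r
wt <- c(190.5, 189, 195.5, 187, 191, 190.4, 186, 183, 193, 188)
mu <- 187
n  <- length(wt)
se <- sd(wt) / sqrt(n)   # standard error of the mean

# t = (sample mean - hypothesized mean) / standard error
t_stat <- (mean(wt) - mu) / se
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)
# 95% confidence interval: mean plus or minus the critical t times the SE
ci <- mean(wt) + c(-1, 1) * qt(0.975, df = n - 1) * se

c(t = t_stat, p = p_val)   # about 2.067 and 0.06866
round(ci, 1)               # 186.8 191.9
```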

Questions


Let’s look at an example

DEMO

Some practical advice about statistical significance
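A frequently cited caution is that a p-value depends on sample size as well as effect size. This sketch is an idealized calculation, not a simulation. It assumes a hypothetical effect of 0.05 standard deviations, and it assumes the sample mean and standard deviation land exactly on their true values. It then computes the two-sided one-sample t-test p-value for several sample sizes.

```r
# Hypothetical effect: the true mean sits 0.05 standard deviations from mu
effect_size <- 0.05
n <- c(25, 100, 1000, 10000, 100000)

# Under the idealization, t = effect_size * sqrt(n)
t_stat <- effect_size * sqrt(n)
p_val  <- 2 * pt(-t_stat, df = n - 1)

data.frame(n = n, t = t_stat, p = signif(p_val, 3))
```

Under these assumptions the p-value falls from about 0.8 at n = 25 to well below 0.001 at n = 10,000, even though the effect itself never changes.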

Session Info

It is good practice to include your session info; for example, this document was produced with knitr version 1.1. Here is my session info:

print(sessionInfo(), locale = FALSE)
## R version 2.15.2 (2012-10-26)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] xtable_1.7-1    mgcv_1.7-22     eeptools_0.2    ggplot2_0.9.3.1
## [5] knitr_1.1       Cairo_1.5-2     shiny_0.4.0    
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-5       caTools_1.14       colorspace_1.2-1  
##  [4] dichromat_2.0-0    digest_0.6.3       evaluate_0.4.3    
##  [7] formatR_0.7        grid_2.15.2        gtable_0.1.2      
## [10] labeling_0.1       lattice_0.20-15    MASS_7.3-23       
## [13] Matrix_1.0-11      munsell_0.4        nlme_3.1-108      
## [16] plyr_1.8           proto_0.3-10       RColorBrewer_1.0-5
## [19] reshape2_1.2.2     RJSONIO_1.0-2      scales_0.2.3      
## [22] stringr_0.6.2      tools_2.15.1       websockets_1.1.7

Attribution and License

Public Domain Mark
This work (R Tutorial for Education, by Jared E. Knowles), in service of the Wisconsin Department of Public Instruction, is free of known copyright restrictions.