Tidy Inference in R

# Tidy Inference in R
## Data Science Programming
### Alison Hill
### 2018-03-09

---

# Install Packages

```r
install.packages("tidyverse")
install.packages("broom") 
# today I'll use the development version
install.packages("remotes")
remotes::install_github("andrewpbray/infer", ref = "develop")
install.packages("devtools")
devtools::install_github("sfirke/janitor")
```

---

## Load Packages

```r
library(tidyverse)
library(infer)
library(janitor)
library(broom)
```

---
class: inverse, middle, center

---

# Research Question

If you see someone else yawn, are you more likely to yawn?

In an episode of the show **Mythbusters**, they tested the myth that yawning is contagious.

---

# Participants and Procedure

- `$N = 50$` adults who thought they were being considered for an appearance on the show.

- Each participant was interviewed individually by a show recruiter ("confederate") who either
    - yawned, `$n = 34$`
    - or not, `$n = 16$`.

- Participants then sat by themselves in a large van and were asked to wait.

- While in the van for some set amount of time (unknown), the Mythbusters watched to see if the unaware participants yawned.

---

# Data

Two group design:

- `$n = 34$` saw the confederate yawn (*seed*)
- `$n = 16$` did not see the confederate yawn (*control*)

```r
group <- c(rep("control", 12), rep("seed", 24), 
            rep("control", 4), rep("seed", 10))
yawn <- c(rep(0, 36), rep(1, 14))

yawn_myth <- data_frame(subj = seq(1, 50), group, yawn) %>% 
  mutate(yawn = as.factor(yawn))

glimpse(yawn_myth)
```

```
Observations: 50
Variables: 3
$ subj  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
$ group <chr> "control", "control", "control", "control", "control", "...
$ yawn  <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
```

---
class: middle, center

<div id="htmlwidget-02a1908619c15f21276f" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-02a1908619c15f21276f">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50"],[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50],["control","control","control","control","control","control","control","control","control","control","control","control","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed","control","control","control","control","seed","seed","seed","seed","seed","seed","seed","seed","seed","seed"],["0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","1","1","1","1","1","1","1","1","1","1","1","1","1"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>subj<\/th>\n      <th>group<\/th>\n      <th>yawn<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":8,"columnDefs":[{"className":"dt-right","targets":1},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[8,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

---

# Results

```r
yawn_myth %>% 
  tabyl(group, yawn) %>% 
  adorn_percentages() %>% 
  adorn_pct_formatting() %>% 
  adorn_ns()
```

```
   group          0          1
 control 75.0% (12) 25.0%  (4)
    seed 70.6% (24) 29.4% (10)
```

---
class: inverse, middle, center

## Conclusion

## *Finding: CONFIRMED*<sup>1</sup>

.footnote[
[1] http://www.discovery.com/tv-shows/mythbusters/mythbusters-database/yawning-contagious/]

---

# Really?

> "Though that's not an enormous increase, since they tested 50 people in the field, the gap was still wide enough for the MythBusters to confirm that yawning is indeed contagious."<sup>1</sup>

.footnote[
[1] http://www.discovery.com/tv-shows/mythbusters/mythbusters-database/yawning-contagious/]

---

# State the hypotheses

`$H_0$`:

> There is no difference between the seed and control groups in the proportion of people who yawned.

`$H_1$` (directional):

> More people (relatively) yawned in the seed group than in the control group.

---

# Test the hypothesis

Which type of hypothesis test would you conduct here?

- Independent samples t-test
- Two proportion test
- Chi-square test of independence
- Analysis of Variance
- I don't know!

---

# Test the hypothesis

Which type of hypothesis test would you conduct here?

- Independent samples t-test
- Two proportion test
- Chi-square test of independence
- Analysis of Variance
- I don't know!

<br>

***

Answer: 
- Two proportion test

---
class: center, middle

# Two proportion test

`$H_0: p_{seed} = p_{control}$`

`$H_1: p_{seed} > p_{control}$`

---

# The observed difference

```r
yawn_myth %>%
  group_by(group) %>%
  summarize(prop = mean(yawn == 1))
```

```
# A tibble: 2 x 2
  group    prop
  <chr>   <dbl>
1 control 0.250
2 seed    0.294
```

```r
(obs_diff <- yawn_myth %>%
  group_by(group) %>%
  summarize(prop = mean(yawn == 1)) %>%
  summarize(diff(prop)) %>%
  pull())
```

```
[1] 0.04411765
```

---
class: inverse, middle, center

## Is this difference *meaningful*?

## Different question:

## Is this difference *significant*?

---

# Modeling the null hypothesis

If...

`$$H_0: p_{seed} = p_{control}$$`

is true, then whether or not the participant saw someone else yawn does not matter: there is no association between exposure and yawning.

---
class: inverse, center, middle

![](http://directorsnotes.com/wp-content/uploads/2011/11/another_earth_03.jpg)
---

```
# A tibble: 12 x 3
    subj group   yawn 
   <int> <chr>   <fct>
 1     1 control 0    
 2     2 control 0    
 3     3 control 0    
 4     4 control 0    
 5     5 control 0    
 6     6 control 0    
 7    15 seed    0    
 8    16 seed    0    
 9    17 seed    0    
10    18 seed    0    
11    19 seed    0    
12    20 seed    0    
```

```
   group  0  1 Total
 control 12  4    16
    seed 24 10    34
   Total 36 14    50
```
]

```
# A tibble: 12 x 3
    subj group   alt_yawn
   <int> <fct>   <fct>   
 1     1 control 0       
 2     2 control 0       
 3     3 control 1       
 4     4 control 0       
 5     5 control 0       
 6     6 control 0       
 7    15 seed    0       
 8    16 seed    1       
 9    17 seed    0       
10    18 seed    0       
11    19 seed    0       
12    20 seed    1       
```

```
   group  0  1 Total
 control 12  4    16
    seed 24 10    34
   Total 36 14    50
```
]

---

# 1000 parallel universes

```
# A tibble: 1,000 x 2
   replicate    stat
       <int>   <dbl>
 1         1 -0.140 
 2         2 -0.0478
 3         3 -0.0478
 4         4 -0.0478
 5         5 -0.140 
 6         6  0.228 
 7         7 -0.140 
 8         8  0.136 
 9         9 -0.232 
10        10  0.0441
# ... with 990 more rows
```
]

```
# A tibble: 11 x 2
   replicate    stat
       <int>   <dbl>
 1       990 -0.232 
 2       991 -0.0478
 3       992 -0.232 
 4       993  0.136 
 5       994 -0.140 
 6       995  0.136 
 7       996 -0.232 
 8       997  0.0441
 9       998  0.0441
10       999 -0.232 
11      1000 -0.140 
```
]

---
## The parallel universe distribution

![](inference_slides_files/figure-html/unnamed-chunk-12-1.png)

The distribution of 1000 differences in proportions, if the null hypothesis were *true* and yawning was not contagious.

In how many of our "parallel universes" is the difference as big or bigger than the one we observed (0.0441176)?

---
## Calculating the p-value

![](inference_slides_files/figure-html/unnamed-chunk-13-1.png)

That proportion is the p-value!

```
# A tibble: 1 x 3
  n_as_big n_total p_value
     <int>   <int>   <dbl>
1      512    1000   0.512
```

---
class: middle, center

[![Only One Test](../images/downey.png)](http://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html)

---

# The tidy way

# Use the infer package

---

![](../images/infer.073.jpeg)