---
title: "Statistics Lab"
output: html_document
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, include = FALSE}
library(tidyverse)
library(broom)
library(jhur)
library(psych)
library(corrr)
```
# Part 1
Read in the following infant mortality data from http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv
```{r}
mort = read_csv("http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv")
```
or use
```{r}
mort = read_mortality()
```
Then change the first column name to `country`"
```{r}
mort = mort %>%
rename(country = X1)
```
or
```{r}
colnames(mort)[1] = "country"
```
1. Compute the correlation between the `1980`, `1990`, `2000`, and `2010` mortality data.
No need to save this in an object. Just display the result to the screen. Note any `NA`s. Then compute using `use = "complete.obs"`.
```{r}
```
2. a. Compute the correlation between the `Myanmar`, `China`, and `United States` mortality data. Store this correlation matrix in an object called `country_cor`
b. Extract the Myanmar-US correlation from the correlation matrix.
```{r}
```
3. Is there a difference between mortality information from `1990` and `2000`?
Run a t-test and a Wilcoxon rank sum test to assess this.
Hint: to extract the column of information for `1990`, use `mort$"1990"`
```{r}
```
# Part 2
Read in the Kaggle cars auction dataset from http://johnmuschelli.com/intro_to_r/data/kaggleCarAuction.csv with the argument `col_types = cols(VehBCost = col_double())`:
```{r}
cars = read_csv("http://johnmuschelli.com/intro_to_r/data/kaggleCarAuction.csv",
col_types = cols(VehBCost = col_double()))
```
4. Fit a linear regression model with vehicle cost (`VehBCost`) as the outcome and
vehicle age (`VehicleAge`) and size (`Size`) as predictors as well as their interaction. Save the model fit in an object called `lmfit_cars` and display the summary table.
```{r}
```
5. Create a variable called `expensive` in the `cars` data that indicates if the
vehicle cost is over `$10,000`. Use a chi-squared test to assess if there is a
relationship between a car being expensive and it being labeled as a "bad buy" (`IsBadBuy`).
```{r}
```
6. Fit a logistic regression model where the outcome is "bad buy" status and predictors are the `expensive` status and vehicle age (`VehicleAge`).
Save the model fit in an object called `logfit_cars` and display the summary table.
Use summary or `tidy(logfit_cars, conf.int = TRUE, exponentiate = TRUE)`
or `tidy(logfit_cars, conf.int = TRUE, exponentiate = FALSE)` for log odds ratios
```{r}
```
7. Randomly sample `10,000` observations (rows) with replacement from the `cars` dataset and store this new dataset in an object called `cars_subsample`. Hint: `sample_n` function
```{r}
```
8. Fit the same logistic regression model as in problem 6 above
with this sample and display the summary table using the `tidy` function. How do the results compare? Call this model `logfit_cars_sub`:
```{r}
```