--- title: "Statistics Lab" output: html_document editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} knitr::opts_chunk\$set(echo = TRUE) ``` ```{r, include = FALSE} library(tidyverse) library(broom) library(jhur) library(psych) library(corrr) ``` # Part 1 Read in the following infant mortality data from http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv ```{r} mort = read_csv("http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv") ``` or use ```{r} mort = read_mortality() ``` Then change the first column name to `country`" ```{r} mort = mort %>% rename(country = X1) ``` or ```{r} colnames(mort) = "country" ``` 1. Compute the correlation between the `1980`, `1990`, `2000`, and `2010` mortality data. No need to save this in an object. Just display the result to the screen. Note any `NA`s. Then compute using `use = "complete.obs"`. ```{r} ``` 2. a. Compute the correlation between the `Myanmar`, `China`, and `United States` mortality data. Store this correlation matrix in an object called `country_cor` b. Extract the Myanmar-US correlation from the correlation matrix. ```{r} ``` 3. Is there a difference between mortality information from `1990` and `2000`? Run a t-test and a Wilcoxon rank sum test to assess this. Hint: to extract the column of information for `1990`, use `mort\$"1990"` ```{r} ``` # Part 2 Read in the Kaggle cars auction dataset from http://johnmuschelli.com/intro_to_r/data/kaggleCarAuction.csv with the argument `col_types = cols(VehBCost = col_double())`: ```{r} cars = read_csv("http://johnmuschelli.com/intro_to_r/data/kaggleCarAuction.csv", col_types = cols(VehBCost = col_double())) ``` 4. Fit a linear regression model with vehicle cost (`VehBCost`) as the outcome and vehicle age (`VehicleAge`) and whether it's an online sale (`IsOnlineSale`) as predictors as well as their interaction. Save the model fit in an object called `lmfit_cars` and display the summary table. ```{r} ``` 5. Create a variable called `expensive` in the `cars` data that indicates if the vehicle cost is over `\$10,000`. Use a chi-squared test to assess if there is a relationship between a car being expensive and it being labeled as a "bad buy" (`IsBadBuy`). ```{r} ``` 6. Fit a logistic regression model where the outcome is "bad buy" status and predictors are the `expensive` status and vehicle age (`VehicleAge`). Save the model fit in an object called `logfit_cars` and display the summary table. Use summary or `tidy(logfit_cars, conf.int = TRUE, exponentiate = TRUE)` or `tidy(logfit_cars, conf.int = TRUE, exponentiate = FALSE)` for log odds ratios ```{r} ``` 7. Randomly sample `10,000` observations (rows) with replacement from the `cars` dataset and store this new dataset in an object called `cars_subsample`. Hint: `sample_n` function ```{r} ``` 8. Fit the same logistic regression model as in problem 6 above with this sample and display the summary table using the `tidy` function. How do the results compare? Call this model `logfit_cars_sub`: ```{r} ```