In this assignment, we will be working with the infant mortality data set, found here: http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv.
The packages listed below are simply suggestions, but please edit this list as you see fit.
## you can add more, or change...these are suggestions
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
Read the data using read_csv() and name it mort. Rename the first column to country using the rename() command in dplyr. Create an object named year by extracting the column names (using colnames()) and converting them to integers (using as.integer()), excluding the first column either with string manipulations or bracket subsetting or subsetting with
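The read, rename, and year-extraction steps can be sketched on a small stand-in file. This is only an illustration under stated assumptions: the toy CSV and its values are made up, the first header is assumed to be blank, and the first column is renamed by position so the sketch does not depend on whatever name readr assigns to it.

```r
library(readr)
library(dplyr)

# Toy stand-in for the real CSV (hypothetical values); the first header is blank
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  ",1760,1770,1780",
  "Afghanistan,2.21,2.21,2.21",
  "Albania,1.55,,1.50"
), tmp)

# suppressMessages() hides readr's column-specification and name-repair notes
mort <- suppressMessages(read_csv(tmp))

# Rename the first column by position
mort <- rename(mort, country = 1)

# Bracket subsetting drops the first column name before converting to integer
year <- as.integer(colnames(mort)[-1])
year
```

Renaming by position is one option; renaming by the column's current name works too once you have checked what read_csv() called it.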
Reshape the data so that there is a variable named year corresponding to year (key) and a column of the mortalities named mortality (value), using the tidyr package and its gather() function. Name the output long and make year a numeric variable.
Hint: remember that -COLUMN_NAME removes that column, so gather all the columns except country.
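A minimal sketch of the reshape on a toy wide table follows. The data values are hypothetical; only the column layout (a country column followed by one column per year) mirrors the assignment.

```r
library(tidyr)
library(dplyr)

# Toy wide data: one row per country, one column per year (hypothetical values)
mort <- data.frame(
  country = c("A", "B"),
  "1760" = c(2.2, 1.5),
  "1770" = c(2.1, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Gather every column except country into key/value pairs
long <- gather(mort, key = "year", value = "mortality", -country)

# The key column comes back as character, so convert it to numeric
long <- mutate(long, year = as.numeric(year))
long
```

Each country/year pair becomes one row, so a wide table with 2 countries and 2 year columns yields 4 rows.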
Read in this tab-delimited file and call it pop: http://johnmuschelli.com/intro_to_r/data/country_pop.txt. The file contains population information on each country. Rename the second column to "Country" and the column "% of world population", to
Determine the population of each country in pop using arrange(). Get the order of the countries based on this (first is the highest population), and extract that column and call it pop_levels. Make a variable in the long data set named sorted that is the country variable coded as a factor based on pop_levels.
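The population steps above can be sketched as follows. Everything in the toy file is an assumption: the headers other than "% of world population" are invented, the values are made up, and `pct_world` is a placeholder name chosen only for this illustration.

```r
library(readr)
library(dplyr)

# Toy stand-in for the tab-delimited population file (hypothetical headers/values)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Rank\tCountry (or territory)\tPopulation\t% of world population",
  "62\tChile\t18000000\t0.24",
  "1\tChina\t1377000000\t18.8",
  "83\tHaiti\t11000000\t0.15"
), tmp)
pop <- suppressMessages(read_tsv(tmp))

# Second column renamed by position; pct_world is a placeholder, not the assigned name
pop <- rename(pop, Country = 2, pct_world = `% of world population`)

# Largest population first, then keep the country order as the factor levels
pop <- arrange(pop, desc(Population))
pop_levels <- pop$Country

# Toy long data; countries absent from pop_levels would become NA in sorted
long <- data.frame(country = c("Haiti", "China", "Chile", "China"),
                   stringsAsFactors = FALSE)
long <- mutate(long, sorted = factor(country, levels = pop_levels))
levels(long$sorted)
```

Setting the levels explicitly is what makes later plot legends follow population order rather than alphabetical order.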
Parts a, b, and c below are only broken up here for clarity, but all three components can be addressed in one chunk of code/as one function, using %>% as necessary.
a. Subset long based on years 1975-2010, including 1975 and 2010, and call this long_sub, using & or the
b. Further subset long_sub for the following countries using dplyr::filter() and the %in% operator on the sorted country factor (sorted): c("Venezuela", "Bahrain", "Estonia", "Iran", "Thailand", "Chile", "Western Sahara", "Azerbaijan", "Argentina", "Haiti").
c. Lastly, remove missing rows for
Hint: Be sure to assign your final object created from a through c as long_sub so you can use it in questions 6 and 7.
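One way to chain parts a through c in a single pipeline is sketched below on toy data. The values, the short `keep` vector, and the choice of mortality as the column checked for missing values are assumptions made for this illustration.

```r
library(dplyr)

# Toy long data (hypothetical values)
long <- data.frame(
  country = c("Chile", "Chile", "Haiti", "Haiti", "Peru"),
  year = c(1970, 1980, 1990, 2010, 2000),
  mortality = c(0.09, 0.04, NA, 0.07, 0.05),
  stringsAsFactors = FALSE
)
long$sorted <- factor(long$country, levels = c("Peru", "Chile", "Haiti"))

# Shortened stand-in for the assignment's list of countries
keep <- c("Chile", "Haiti")

long_sub <- long %>%
  filter(year >= 1975 & year <= 2010) %>%  # a. inclusive year range with &
  filter(sorted %in% keep) %>%             # b. keep only the listed countries
  filter(!is.na(mortality))                # c. drop rows with missing mortality (assumed column)
long_sub
```

The three filter() calls could also be merged into one call with the conditions separated by commas; keeping them apart just makes each part easier to check.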
Plotting: create “spaghetti”/line plots for the countries in long_sub, with a different color for each country, based on sorted. The x-axis should be year, and the y-axis should be mortality. Make the plot using a. qplot and b. ggplot.
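Both plotting styles can be sketched on a toy long_sub. The data values are hypothetical, and the object names `p_q` and `p_g` are chosen only for this example. Note that qplot() is deprecated in recent ggplot2 releases, so the call is wrapped in suppressWarnings() here.

```r
library(ggplot2)

# Toy long_sub (hypothetical values)
long_sub <- data.frame(
  year = c(1980, 1990, 2000, 1980, 1990, 2000),
  mortality = c(0.04, 0.03, 0.02, 0.20, 0.15, 0.10),
  sorted = factor(rep(c("Chile", "Haiti"), each = 3))
)

# a. qplot: the quick interface (deprecated in newer ggplot2, hence suppressWarnings)
p_q <- suppressWarnings(
  qplot(year, mortality, data = long_sub, colour = sorted, geom = "line")
)

# b. ggplot: mapping colour to a factor also groups the lines by country
p_g <- ggplot(long_sub, aes(x = year, y = mortality, colour = sorted)) +
  geom_line()

print(p_g)
```

Because sorted is a factor, each country gets its own line and legend entry without an explicit group aesthetic.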
Bonus: load the plotly package (library(plotly)) and assign the plot from question 6 to g and run