Read in Data

library(readr)
death = read_csv(
  "http://johnmuschelli.com/intro_to_r/data/indicatordeadkids35.csv")
death[1:2, 1:5]
# A tibble: 2 x 5
  X1          `1760` `1761` `1762` `1763`
  <chr>        <dbl>  <dbl>  <dbl>  <dbl>
1 Afghanistan     NA     NA     NA     NA
2 Albania         NA     NA     NA     NA

Read in Data: jhur

jhur::read_mortality()
# A tibble: 197 x 255
   X1    `1760` `1761` `1762` `1763` `1764` `1765` `1766` `1767` `1768`
   <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Afgh…     NA     NA     NA     NA     NA     NA     NA     NA     NA
 2 Alba…     NA     NA     NA     NA     NA     NA     NA     NA     NA
 3 Alge…     NA     NA     NA     NA     NA     NA     NA     NA     NA
 4 Ango…     NA     NA     NA     NA     NA     NA     NA     NA     NA
 5 Arge…     NA     NA     NA     NA     NA     NA     NA     NA     NA
 6 Arme…     NA     NA     NA     NA     NA     NA     NA     NA     NA
 7 Aruba     NA     NA     NA     NA     NA     NA     NA     NA     NA
 8 Aust…     NA     NA     NA     NA     NA     NA     NA     NA     NA
 9 Aust…     NA     NA     NA     NA     NA     NA     NA     NA     NA
10 Azer…     NA     NA     NA     NA     NA     NA     NA     NA     NA
# … with 187 more rows, and 245 more variables: `1769` <dbl>,
#   `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
#   `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
#   `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
#   `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>,
#   `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>, `1794` <dbl>,
#   `1795` <dbl>, `1796` <dbl>, `1797` <dbl>, `1798` <dbl>, `1799` <dbl>,
#   `1800` <dbl>, `1801` <dbl>, `1802` <dbl>, `1803` <dbl>, `1804` <dbl>,
#   `1805` <dbl>, `1806` <dbl>, `1807` <dbl>, `1808` <dbl>, `1809` <dbl>,
#   `1810` <dbl>, `1811` <dbl>, `1812` <dbl>, `1813` <dbl>, `1814` <dbl>,
#   `1815` <dbl>, `1816` <dbl>, `1817` <dbl>, `1818` <dbl>, `1819` <dbl>,
#   `1820` <dbl>, `1821` <dbl>, `1822` <dbl>, `1823` <dbl>, `1824` <dbl>,
#   `1825` <dbl>, `1826` <dbl>, `1827` <dbl>, `1828` <dbl>, `1829` <dbl>,
#   `1830` <dbl>, `1831` <dbl>, `1832` <dbl>, `1833` <dbl>, `1834` <dbl>,
#   `1835` <dbl>, `1836` <dbl>, `1837` <dbl>, `1838` <dbl>, `1839` <dbl>,
#   `1840` <dbl>, `1841` <dbl>, `1842` <dbl>, `1843` <dbl>, `1844` <dbl>,
#   `1845` <dbl>, `1846` <dbl>, `1847` <dbl>, `1848` <dbl>, `1849` <dbl>,
#   `1850` <dbl>, `1851` <dbl>, `1852` <dbl>, `1853` <dbl>, `1854` <dbl>,
#   `1855` <dbl>, `1856` <dbl>, `1857` <dbl>, `1858` <dbl>, `1859` <dbl>,
#   `1860` <dbl>, `1861` <dbl>, `1862` <dbl>, `1863` <dbl>, `1864` <dbl>,
#   `1865` <dbl>, `1866` <dbl>, `1867` <dbl>, `1868` <dbl>, …
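library(dplyr)  # provides %>% and rename(); only readr has been loaded so far
# the unnamed first column shows up as X1 here (readr 2.0 and later name it ...1)
death = death %>% rename(country = X1)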
death[1:2, 1:5]
# A tibble: 2 x 5
  country     `1760` `1761` `1762` `1763`
  <chr>        <dbl>  <dbl>  <dbl>  <dbl>
1 Afghanistan     NA     NA     NA     NA
2 Albania         NA     NA     NA     NA

Data are not Tidy!

Tidying data: reshape the data

After reshaping the data to long format, we can plot the data from a single data.frame:

library(tidyverse)
long = gather(death, key = year, value = deaths, -country)
long = long %>% filter(!is.na(deaths))
head(long)   # note that year is a character column
# A tibble: 6 x 3
  country        year  deaths
  <chr>          <chr>  <dbl>
1 Sweden         1760    2.21
2 United Kingdom 1760    2.20
3 Sweden         1761    2.30
4 United Kingdom 1761    2.35
5 Sweden         1762    2.79
6 United Kingdom 1762    2.32
long = long %>% mutate(year = as.numeric(year))
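
gather() still works, but tidyr 1.0.0 and later mark it as superseded in favor of pivot_longer(). As a minimal sketch, assuming tidyr >= 1.0.0 is installed, the same reshape might look like the following. The toy table and its values are made up for illustration and are not taken from the mortality file.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table in the same shape as `death` (values are made up)
toy = tibble(
  country = c("A", "B"),
  `1760` = c(2.1, NA),
  `1761` = c(2.3, 2.5)
)

toy_long = toy %>%
  pivot_longer(cols = -country, names_to = "year", values_to = "deaths") %>%
  filter(!is.na(deaths)) %>%        # drop missing values, as with gather() above
  mutate(year = as.numeric(year))   # year starts out as character

toy_long
```

Here names_to and values_to play the roles of key and value in gather().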

Plot the long data

swede_long = long %>% filter(country == "Sweden")
qplot(x = year, y = deaths, data = swede_long)

Plot the long data only up to 2012

qplot(x = year, y = deaths, data = swede_long, xlim = c(1760,2012))

ggplot2

ggplot2 is a popular and powerful plotting package built on the grammar of graphics. Its qplot (“quick plot”) function works much like base R's plot:

library(ggplot2)
qplot(x = year, y = deaths, data = swede_long)

ggplot2

The generic plotting function is ggplot, which uses aesthetics:

# general form: ggplot(data, aes(args))
g = ggplot(data = swede_long, aes(x = year, y = deaths))

g is an object, which you can adapt into multiple plots!

ggplot2

Common aesthetics:

  • x
  • y
  • colour/color
  • size
  • fill
  • shape

If you set these inside aes, you map them to a variable in the data. If you want one fixed value for all observations, set them in a geom, outside aes.
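
To make the difference concrete, here is a minimal sketch using a small made-up data set (toy and its values are hypothetical). One plot maps colour to a variable inside aes, the other sets a fixed colour in the geom.

```r
library(ggplot2)

# Hypothetical data for illustration (values are made up)
toy = data.frame(
  year = rep(c(1760, 1761, 1762), times = 2),
  deaths = c(2.2, 2.3, 2.8, 2.2, 2.4, 2.3),
  country = rep(c("A", "B"), each = 3)
)

# Inside aes(): colour is mapped to a variable, so each country gets its own colour
p_mapped = ggplot(toy, aes(x = year, y = deaths, colour = country)) +
  geom_point()

# In the geom, outside aes(): colour and size are fixed for all points
p_fixed = ggplot(toy, aes(x = year, y = deaths)) +
  geom_point(colour = "blue", size = 3)

print(p_mapped)
print(p_fixed)
```

The mapped version also gets a legend, because the colours carry information about the data; the fixed version does not.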

ggplot2

You can do most of this with qplot, but qplot chooses the plot type for you: a scatterplot if both x and y are specified, and a histogram if only x is specified:

q = qplot(data = swede_long, x = year, y = deaths)
q

Like g, q is an object, which you can adapt into multiple plots!
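
A minimal sketch of those two defaults, using made-up data (toy is hypothetical). Note that qplot is deprecated as of ggplot2 3.4.0, so recent versions may emit a warning here.

```r
library(ggplot2)

# Hypothetical data for illustration (values are made up)
toy = data.frame(
  year = 1760:1769,
  deaths = c(2.2, 2.3, 2.8, 2.5, 2.4, 2.6, 2.1, 2.0, 2.3, 2.2)
)

q_scatter = qplot(x = year, y = deaths, data = toy)  # x and y given: points
q_hist = qplot(x = deaths, data = toy)               # only x given: histogram

print(q_scatter)
print(q_hist)
```

To get a different plot type from qplot you would pass its geom argument, for example geom = "line".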

ggplot2: what’s a geom?

g on its own can’t show any data; we have to add layers, usually with geom_ commands:

  • geom_point - add points
  • geom_line - add lines
  • geom_density - add a density plot
  • geom_histogram - add a histogram
  • geom_smooth - add a smoother
  • geom_boxplot - add a boxplot
  • geom_bar - bar charts
  • geom_tile - rectangles/heatmaps
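
A few of these geoms in use, as a minimal sketch on made-up data (toy and the object names are hypothetical):

```r
library(ggplot2)

# Hypothetical data for illustration (values are made up)
toy = data.frame(
  year = 1760:1779,
  deaths = seq(3, 2, length.out = 20) + rep(c(0.1, -0.1), times = 10)
)

base = ggplot(toy, aes(x = year, y = deaths))

p_points = base + geom_point()                               # scatterplot
p_smooth = base + geom_point() + geom_smooth(method = "lm")  # points plus a linear trend

# geom_histogram needs only an x aesthetic
p_hist = ggplot(toy, aes(x = deaths)) + geom_histogram(bins = 5)

print(p_points)
print(p_smooth)
print(p_hist)
```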

ggplot2: adding a geom and assigning

You “add” things to a plot with a + sign (not the pipe!). Assigning a plot to an object does not draw it; call print on the object (or type its name) to display it.

gpoints = g + geom_point(); print(gpoints) # one line for slides
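
Automatic printing only happens at the top level of the console. Inside a for loop or a function, a plot is not drawn unless print is called explicitly. A minimal sketch with made-up data (toy and plots are hypothetical names):

```r
library(ggplot2)

# Hypothetical data for illustration (values are made up)
toy = data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))
p = ggplot(toy, aes(x = x, y = y))

# Two plots built from the same base object
plots = list(points = p + geom_point(), line = p + geom_line())

for (name in names(plots)) {
  print(plots[[name]])   # without print(), nothing is drawn inside the loop
}
```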

ggplot2: adding a geom

If you do not assign the plot, it prints by default. This time it’s a line:

g + geom_line()

ggplot2: adding a geom

You can add multiple geoms:

g + geom_line() + geom_point()
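
Layers are drawn in the order they are added, so in the call above the points sit on top of the line. A minimal sketch on made-up data (toy is hypothetical), with different fixed settings for each layer:

```r
library(ggplot2)

# Hypothetical data for illustration (values are made up)
toy = data.frame(
  year = 1760:1765,
  deaths = c(2.2, 2.3, 2.8, 2.5, 2.4, 2.6)
)

p = ggplot(toy, aes(x = year, y = deaths)) +
  geom_line(colour = "grey50") +   # added first, drawn underneath
  geom_point(size = 2)             # added second, drawn on top of the line

length(p$layers)   # number of layers added so far
print(p)
```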