Data Days

What is R?

An OPEN-SOURCE programming LANGUAGE and free software ENVIRONMENT for STATISTICAL COMPUTING and GRAPHICS

Learn more at https://www.r-project.org/

Download R at https://cran.r-project.org/

The R Interface

R uses a command line interface.

How does R work?

Basic Usage

"R doesn’t protect you from yourself: you can easily shoot yourself in the foot. As long as you don’t aim the gun at your foot and pull the trigger, you won’t have a problem."

- Hadley Wickham (Advanced R; 2014)

Hadley Wickham https://github.com/hadley

RStudio https://www.rstudio.com/

The RStudio IDE

Functional Approach

R is a functional programming language. Every operation is a function call.

1 + 2
[1] 3
`+`(1, 2)
[1] 3
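Because operators are ordinary functions, they can be passed around like any other object. The following is a minimal base R sketch of that idea; the variable names are illustrative only.

```r
# Operators are functions, so they can be handed to sapply() directly.
plus_ten <- sapply(1:3, `+`, 10)   # calls `+`(1, 10), `+`(2, 10), `+`(3, 10)
plus_ten

# Subsetting is a function call as well.
x <- c(a = 1, b = 2, c = 3)
second <- `[`(x, 2)                # same as x[2]
second

# Even control flow is a function call: `if`(condition, yes, no).
label <- `if`(second > 1, "big", "small")
label
```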

"To understand computations in R, two slogans are helpful:

  • Everything that exists is an object
  • Everything that happens is a function call"

- John Chambers

Package Functions

Don't reinvent the wheel!

R is highly extensible through packages

Many amazing packages come installed with R. See the Packages pane in RStudio or run the following command in the R console: installed.packages()

To access the functions (and other objects) from a package, first load the package using the library() function.

# fails because ggplot2 has not been loaded yet
qplot(x = Petal.Width, y = Petal.Length, data = iris)
Error: could not find function "qplot"
# load the package, then the same call works
library(ggplot2)
qplot(x = Petal.Width, y = Petal.Length, data = iris)

* You can also call a function without loading its package by using the double-colon operator, for example: ggplot2::qplot()
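The double-colon form can be tried with a package that ships with R, so nothing needs to be installed first. This sketch uses stats::median purely as a convenient example.

```r
# stats ships with R, so this runs without installing anything.
# pkg::fun() calls the function without attaching the package via library().
m <- stats::median(c(3, 1, 2))
m

# The prefix also makes it explicit which package a name comes from,
# which helps when two attached packages export the same function name.
identical(stats::median, median)
```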

Help!

Most R functions come packaged with robust documentation, which you can access by using the help() function:

# pull up help page for the mean function
help(mean)
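A few other base R ways to reach the same documentation are sketched below. The choice of mean and median as lookup targets is only for illustration.

```r
?mean                          # shorthand for help(mean)
example(mean)                  # run the examples from the help page

matches <- apropos("^mean")    # names of visible objects that start with "mean"
matches

args(median)                   # quick look at a function's arguments
```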

Also find terrific community help online:

Google

GitHub

StackOverflow

Package Overview

Fundamental unit of reproducible R code

Includes:

  • R functions
  • Documentation
  • Sample data

Created according to shared standards

May depend on other packages

Typically domain specific
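The pieces of a package listed above can be inspected from the console. The sketch below looks at stats and datasets, two packages that ship with R, and assumes they are attached as they are in a default session.

```r
# Metadata from the package's DESCRIPTION file
desc <- packageDescription("stats")
desc$Version
desc$Imports                      # other packages this one depends on, if any

# Functions and other objects the package exports
exported <- ls("package:stats")
head(exported)

# Sample data shipped in a package
datasets_index <- data(package = "datasets")$results
head(datasets_index[, "Item"])
```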

CRAN

The Comprehensive R Archive Network

A collection of sites ("mirrors") that carry identical material, consisting of the R distribution(s), the contributed extensions, documentation for R and binaries.

(i.e., the main repository for R and its packages)

Master site: https://CRAN.R-project.org/

Download the latest version of R

Over 7,800 packages available

Install packages from CRAN:

install.packages("dplyr")
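Installing a package that is already present repeats work, so scripts often check first. The sketch below is one possible way to do that; missing_pkgs is a hypothetical helper written for this example, not a function from R or CRAN.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
# from the current library paths.
missing_pkgs <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

wanted <- c("dplyr", "ggplot2")
to_install <- missing_pkgs(wanted)

# Only install in an interactive session, and only what is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```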

Essential Packages: dplyr

Grammar for Data Manipulation

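library(dplyr)
mtcars %>% 
    # keep cars that get at least 15 miles per gallon
    filter(mpg >= 15) %>% 
    # form one group per number of cylinders
    group_by(cyl) %>% 
    # one summary row per group; am is 0/1, so its mean is the share of manuals
    summarise(numCARS = n(),
              avgMPG = mean(mpg),
              avgHP = mean(hp),
              medWT = median(wt),
              pctMANUAL = mean(am)) %>% 
    # sort the result by number of cylinders
    arrange(cyl)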
# A tibble: 3 × 6
    cyl numCARS   avgMPG     avgHP medWT pctMANUAL
  <dbl>   <int>    <dbl>     <dbl> <dbl>     <dbl>
1     4      11 26.66364  82.63636 2.200 0.7272727
2     6       7 19.74286 122.28571 3.215 0.4285714
3     8       9 16.47778 198.77778 3.570 0.2222222

https://github.com/hadley/dplyr
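To see what each dplyr verb contributes, it can help to compare the pipeline with a rough base R equivalent. The sketch below is not how dplyr works internally; it only computes the same summary with split() and lapply(), and the intermediate names are illustrative.

```r
# filter(): keep cars with at least 15 mpg
cars15 <- mtcars[mtcars$mpg >= 15, ]

# group_by(): one data frame per cylinder count (split() sorts the groups)
by_cyl <- split(cars15, cars15$cyl)

# summarise(): one row of summaries per group
summary_tbl <- do.call(rbind, lapply(by_cyl, function(d) {
  data.frame(cyl       = d$cyl[1],
             numCARS   = nrow(d),
             avgMPG    = mean(d$mpg),
             avgHP     = mean(d$hp),
             medWT     = median(d$wt),
             pctMANUAL = mean(d$am))
}))
summary_tbl
```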

Essential Packages: ggplot2

Publication Quality Graphics

library(ggplot2)
ggplot(data = mtcars, aes(x = hp, y = mpg)) + 
    geom_point() + 
    stat_smooth(method = lm)

https://github.com/hadley/ggplot2
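The smoothing layer in the plot above fits a straight line with lm. As a minimal sketch, the same model can be fitted directly in base R to inspect its coefficients; the horsepower values used for prediction are arbitrary examples.

```r
# Fit the same linear model that the smoothing layer draws: mpg as a function of hp
fit <- lm(mpg ~ hp, data = mtcars)
coef(fit)                     # intercept and slope of the fitted line

# Predicted mpg for two example horsepower values
new_cars <- data.frame(hp = c(100, 200))
predicted <- predict(fit, newdata = new_cars)
predicted
```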

Essential Packages: rmarkdown

Dynamic Documents, Presentations and Reports

  • Combine markdown with R code/output
  • Fully reproducible output
  • Many output formats (HTML, PDF, etc.)
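As a minimal sketch of the workflow, the code below writes a tiny R Markdown file and renders it when the tooling is available. The file name and report text are made up for this example, and rendering is skipped if the rmarkdown package or pandoc is missing.

```r
fence <- strrep("`", 3)  # three backticks, the delimiter of an R Markdown code chunk

rmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Average fuel economy in the mtcars data:",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence
)

# Write the document to a temporary location (hypothetical file name)
rmd_path <- file.path(tempdir(), "example-report.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering needs the rmarkdown package and pandoc; skip if either is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
  out_file
}
```

The markdown text and the code chunk live in the same file, so re-rendering regenerates the output from the code each time.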

Reference Guide

Cheat Sheet

https://github.com/rstudio/rmarkdown

Essential Packages: leaflet

Interactive HTML Maps

library(leaflet)
leaflet() %>% 
    addTiles() %>% 
    setView(lng = -81.6925,
            lat = 41.50132,
            zoom = 17) %>% 
    addMarkers(lng = -81.695174,
               lat = 41.501313,
               popup = paste0("<b>HIMSS Innovation Center</b>",
                              "<br>4th floor of the Global Center for Health Innovation",
                              "<br>1 St Clair Ave NE",
                              "<br>Cleveland, OH 44114"))

https://rstudio.github.io/leaflet/

Essential Packages: DT

Interactive HTML Tables

library(DT)
datatable(iris,
          extensions = "Scroller",
          options = list(
              scrollY = 320,
              scrollCollapse = TRUE
          ),
          rownames = FALSE)

http://rstudio.github.io/DT/

Essential Packages: dygraphs

Interactive HTML Time Series Plots

library(dygraphs)
lungDeaths <- cbind(mdeaths, fdeaths)
dygraph(lungDeaths) %>% 
    dyLegend(width = 300,
             show = "always") %>%
    dyRangeSelector(dateWindow = c("1974-01-01",
                                   "1979-12-31"))

http://rstudio.github.io/dygraphs/
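The object passed to dygraph() above is an ordinary multivariate time series built from two datasets that ship with R. This base R sketch inspects it; the 1975 window is an arbitrary example.

```r
# Combine the two monthly series into one multivariate time series
lungDeaths <- cbind(mdeaths, fdeaths)
class(lungDeaths)
start(lungDeaths)             # first observation (year, month)
end(lungDeaths)               # last observation (year, month)
frequency(lungDeaths)         # observations per year

# Extract a sub-range, similar in spirit to narrowing the date window
deaths_1975 <- window(lungDeaths, start = c(1975, 1), end = c(1975, 12))
deaths_1975
```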

Essential Packages: RODBC

ODBC Database Access

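library(RODBC)
# open a connection to the ODBC data source named "Adhoc"
ch <- odbcConnect("Adhoc")
# run a query and return the result as a data frame
x <- sqlQuery(ch, "select * from dbo.MyTable;")
# close the connection when finished
close(ch)
head(x)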
  id   random1     random2
1  1 0.3550420 -0.17346265
2  2 0.4017450  0.02835218
3  3 0.2567329  0.79680834
4  4 0.6899611  1.39265608
5  5 0.8510193 -0.30869383
6  6 0.6148350 -0.27166393

RODBC Documentation

http://www.unixodbc.org/
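If a query fails partway through, the connection in the example above would stay open. One possible way to guard against that is sketched below; fetch_query is a hypothetical wrapper written for this example, and it assumes that a failed connection does not return an RODBC object and that a failed query does not return a data frame.

```r
# Hypothetical wrapper: connect, run one query, and always close the connection.
fetch_query <- function(dsn, sql) {
  ch <- RODBC::odbcConnect(dsn)
  if (!inherits(ch, "RODBC")) {
    stop("could not connect to data source: ", dsn)
  }
  # on.exit() runs even if the query below raises an error
  on.exit(RODBC::odbcClose(ch), add = TRUE)

  res <- RODBC::sqlQuery(ch, sql)
  if (!is.data.frame(res)) {
    # assumed failure case: report whatever was returned instead of a data frame
    stop(paste(res, collapse = "\n"))
  }
  res
}

# Usage, with the data source and table from the example above:
# x <- fetch_query("Adhoc", "select * from dbo.MyTable;")
```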

Essential Packages: XLConnect

Book1.xlsx

Excel Connector for R

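library(XLConnect)
# read cells B2:C3 of Sheet1 (rows 2-3, columns 2-3) without a header row
x <- readWorksheetFromFile(file = "../xlsx/Book1.xlsx", 
                           sheet = "Sheet1", 
                           startRow = 2, 
                           startCol = 2, 
                           endRow = 3, 
                           endCol = 3, 
                           header = FALSE)
# add 4 to every cell that was read
y <- x + 4
wb <- loadWorkbook("../xlsx/Book1.xlsx")
# do not apply any cell styles when writing
setStyleAction(object = wb, 
               type = XLC$"STYLE_ACTION.NONE")
# write the result with its top-left cell at D5 (row 5, column 4)
writeWorksheet(object = wb,
               data = y,
               sheet = "Sheet1",
               startRow = 5,
               startCol = 4,
               header = FALSE)
# save the changes back to Book1.xlsx
saveWorkbook(wb)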

XLConnect Documentation

Thank You!