Creating Interactive Graphics for the Web using R

Carson Sievert

Slides: https://talks.cpsievert.me

Twitter: @cpsievert
GitHub: @cpsievert
Email: cpsievert1@gmail.com
Web: https://cpsievert.me/

Slides released under Creative Commons

1 / 30

Data Science Workflow

2 / 30

I love this diagram from the R for Data Science book.

Concisely captures the main components.

Expository vis

The web is the preferred medium for communicating results.

Assuming you know exactly what you want to visualize, many good JavaScript frameworks exist!

3 / 30

Exploratory vis

JavaScript lacks tools for modeling/transformation

Too often, analysts juggle several technologies (R, Python, JavaScript)

4 / 30

JavaScript lacks tools for iteration (necessary for exploration/discovery!)

It is all too easy for statistical thinking to be swamped by programming tasks. -- Brian D. Ripley

5 / 30

So, this is me, in my 2nd year of grad school, deciding to learn D3 & JavaScript.

It took me 6+ months to implement a single interactive visualization.

And let me tell you, you guys, no joke, believe me, I arose from the swamp and decided I alone will...

☝ 🍊

6 / 30

How to drain the swamp?

R package(s) for creating interactive web graphics which:

  1. Don't require knowledge of web technologies (start-up cost)
  2. Produce standalone HTML whenever possible (hosting/maintenance cost)
  3. Work well with other "tidy" tools in R (iteration cost)
  4. Link to external vis libraries (startover cost)
  5. Make interactive techniques that support data analysis tasks easy to use (discovery cost)
7 / 30

How to drain the swamp?

We can all get behind this, right?
8 / 30

How to drain the swamp?

As opposed to a client-server framework
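
A minimal sketch of what "standalone HTML" can look like in practice, assuming the plotly and htmlwidgets packages are installed and pandoc is available (htmlwidgets needs it to embed dependencies in one file). The dataset (`mtcars`) and the output file name are placeholders chosen for illustration, not part of the talk.

```r
library(plotly)
library(htmlwidgets)

# Any htmlwidget works here; this is a small scatterplot of a built-in dataset
p <- plot_ly(mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")

# Write a single HTML file with the JavaScript and data embedded in it
out <- file.path(tempdir(), "scatter.html")
saveWidget(p, out, selfcontained = TRUE)
```

The resulting file can be emailed or placed on any static host, with no R process running behind it.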
9 / 30
10 / 30

How to drain the swamp?

Let me give you an example with plotly
11 / 30
12 / 30
library(tidyverse)
library(ggplot2)

# read and clean data
d <- read_csv('GEOSTAT_grid_POP_1K_2011_V2_0_1.csv') %>%
  rbind(
    read_csv('JRC-GHSL_AIT-grid-POP_1K_2011.csv') %>%
      mutate(TOT_P_CON_DT = '')
  ) %>%
  # parse latitude/longitude out of the grid cell ID
  mutate(
    lat = as.numeric(gsub('.*N([0-9]+)[EW].*', '\\1', GRD_ID)) / 100,
    lng = as.numeric(gsub('.*[EW]([0-9]+)', '\\1', GRD_ID)) *
      ifelse(gsub('.*([EW]).*', '\\1', GRD_ID) == 'W', -1, 1) / 100
  ) %>%
  filter(lng > 25, lng < 60) %>%
  # total population per rounded (lat, lng) cell
  group_by(lat = round(lat, 1), lng = round(lng, 1)) %>%
  summarize(value = sum(TOT_P, na.rm = TRUE)) %>%
  ungroup() %>%
  tidyr::complete(lat, lng)

# visualize: one line per latitude, raised in proportion to population
ggplot(d, aes(lng, lat + 5 * (value / max(value, na.rm = TRUE)))) +
  geom_line(
    aes(group = lat, text = paste("Population:", value)),
    size = 0.4, alpha = 0.8, color = '#5A3E37', na.rm = TRUE
  ) +
  coord_equal(0.9) +
  ggthemes::theme_map()
13 / 30
library(tidyverse)
library(plotly)

# read and clean data (same steps as the static version)
d <- read_csv('GEOSTAT_grid_POP_1K_2011_V2_0_1.csv') %>%
  rbind(
    read_csv('JRC-GHSL_AIT-grid-POP_1K_2011.csv') %>%
      mutate(TOT_P_CON_DT = '')
  ) %>%
  mutate(
    lat = as.numeric(gsub('.*N([0-9]+)[EW].*', '\\1', GRD_ID)) / 100,
    lng = as.numeric(gsub('.*[EW]([0-9]+)', '\\1', GRD_ID)) *
      ifelse(gsub('.*([EW]).*', '\\1', GRD_ID) == 'W', -1, 1) / 100
  ) %>%
  filter(lng > 25, lng < 60) %>%
  group_by(lat = round(lat, 1), lng = round(lng, 1)) %>%
  summarize(value = sum(TOT_P, na.rm = TRUE)) %>%
  ungroup() %>%
  tidyr::complete(lat, lng)

# make each latitude "highlight-able"
sd <- crosstalk::SharedData$new(d, ~lat)

ggplot(sd, aes(lng, lat + 5 * (value / max(value, na.rm = TRUE)))) +
  geom_line(
    aes(group = lat, text = paste("Population:", value)),
    size = 0.4, alpha = 0.8, color = '#5A3E37', na.rm = TRUE
  ) +
  coord_equal(0.9) +
  ggthemes::theme_map()

# convert the most recent ggplot into an interactive plotly graph
ggplotly()
14 / 30
15 / 30

How to drain the swamp?

Let's see an example with plotly and leaflet
16 / 30

plotly & leaflet

17 / 30
library(plotly)
library(leaflet)
library(crosstalk)

# use uniform/standard data structures!
sd <- SharedData$new(quakes)

p <- plot_ly(sd, x = ~depth, y = ~mag) %>%
  add_markers(alpha = 0.5) %>%
  highlight("plotly_selected", dynamic = TRUE)

map <- leaflet(sd) %>%
  addTiles() %>%
  addCircles()

bscols(widths = c(6, 6), p, map)
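
The same shared data object can also drive crosstalk's filter widgets. The sketch below is an illustrative extension of the plotly and leaflet example, not something shown in the talk: it adds a magnitude slider that filters both views at once. The `"mag"` id, the slider label, and the column widths are arbitrary choices.

```r
library(plotly)
library(leaflet)
library(crosstalk)

# One SharedData object is the common currency between the widgets
sd <- SharedData$new(quakes)

# A slider on earthquake magnitude; it filters every view built on `sd`
slider <- filter_slider("mag", "Magnitude", sd, ~mag, step = 0.1)

p <- plot_ly(sd, x = ~depth, y = ~mag) %>%
  add_markers(alpha = 0.5)

map <- leaflet(sd) %>%
  addTiles() %>%
  addCircles()

# Lay out the slider, scatterplot, and map side by side (12 columns in total)
bscols(widths = c(2, 5, 5), slider, p, map)
```

Because all three pieces only know about `sd`, none of them needs to know the others exist, which is what makes it possible to mix libraries this way.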
18 / 30

How to drain the swamp?

Hard problem -- statisticians should be more involved here!
19 / 30

Interactive graphics software must be opinionated

Not enough statisticians influence design/implementation

We used to be better at this!!! http://stat-graphics.org/movies/

20 / 30

Interactive graphics software must be opinionated

Should we be teaching more JavaScript and less C? 🙀

21 / 30

Techniques that support data analysis (Cook et al. 1996)

22 / 30

Finding Gestalt & posing queries

23 / 30

Making comparisons

24 / 30

Finding Gestalt & "making comparisons"

25 / 30

Important design questions

What kind of visualizations should be possible/easy?

26 / 30

Important design questions

How do we help users find the right view?

27 / 30

Important design questions

How to best enable dynamic answers to (statistical) questions via interactivity?

28 / 30

In summary

  • There is a lack of tools for exploratory data visualization on the web.

  • JavaScript is not designed to do statistical computing.

  • Let's create R interfaces that combine the computing resources of R with the interactivity of JavaScript!

29 / 30

Thanks! Questions?

Slides: https://talks.cpsievert.me

Learn more

Plotly book: https://plotly-book.cpsievert.me
PhD thesis: https://github.com/cpsievert/phd-thesis

Contact

Twitter: @cpsievert
GitHub: @cpsievert
Email: cpsievert1@gmail.com
Web: https://cpsievert.me/

30 / 30
