
Augmenting data exploration with interactive graphics

Carson Sievert

Slides: https://talks.cpsievert.me

@cpsievert
cpsievert1@gmail.com
https://cpsievert.me/

Slides released under Creative Commons


About me

  • PhD in statistics with Heike Hofmann & Di Cook (Dec 2016)

  • CEO of Sievert Consulting LLC (Jan 2017)

    • Clients: plotly, NOAA, Sandia Labs, O'Reilly
  • I ❤️ interactive data visualization

    • Maintain/author R 📦s: plotly, LDAvis, animint

Data science workflow


Expository vis

Exploratory vis

Problem: analysts juggle many languages (R, JavaScript, Python)


Interactive graphics can augment exploratory analysis, but are only practical when we can iterate quickly


Interactive graphics augment exploration [1]

[1]: Statisticians were building (very advanced!) interactive graphics systems decades ago; see http://stat-graphics.org/movies/

[2]: Worried about inference? See the visual (Majumder et al. 2013) and post-selection (Berk et al. 2013) inference frameworks.


Interactive graphics augment exploration


Generally useful for comparing within/across panels!


An example with Texas housing prices

library(dplyr)
library(plotly)
tx <- txhousing %>%
  select(city, year, month, median) %>%
  filter(city %in% c("Galveston", "Midland", "Odessa", "South Padre Island"))
tx
#> # A tibble: 748 x 4
#>    city       year month median
#>    <chr>     <int> <int>  <dbl>
#>  1 Galveston  2000     1  95000
#>  2 Galveston  2000     2 100000
#>  3 Galveston  2000     3  98300
#>  4 Galveston  2000     4 111100
#>  5 Galveston  2000     5  89200
#>  6 Galveston  2000     6 108600
#>  7 Galveston  2000     7  99000
#>  8 Galveston  2000     8  96200
#>  9 Galveston  2000     9 104000
#> 10 Galveston  2000    10 118800
#> # ... with 738 more rows

Compare within and across cities

# key the shared data by year, so selecting one line selects that year in every panel
TX <- crosstalk::SharedData$new(tx, ~year)
p <- ggplot(TX, aes(month, median, group = year)) +
  geom_line() +
  facet_wrap(~city, ncol = 2)
highlight(ggplotly(p), dynamic = TRUE, selectize = TRUE)
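
The linking above hinges on the key (~year): rows that share a key value are selected together, whichever panel they are drawn in. A minimal base R sketch of that idea follows. It is an illustration only, not how crosstalk or plotly implement it, and the `toy` data frame (including its median values) is made up for the example.

```r
# Toy stand-in for the tx data (values are invented for illustration)
toy <- data.frame(
  city   = rep(c("Galveston", "Midland"), each = 4),
  year   = rep(c(2005, 2005, 2006, 2006), times = 2),
  month  = rep(1:2, times = 4),
  median = c(100, 110, 120, 130, 90, 95, 105, 115)
)

# The key plays the role of ~year in SharedData$new(tx, ~year)
key <- toy$year

# Selecting any 2006 point selects every row that shares that key
selected_keys <- 2006
toy$highlighted <- key %in% selected_keys

# Each panel (city) ends up with its own highlighted rows
highlighted_by_city <- split(toy$highlighted, toy$city)
print(highlighted_by_city)
```

Because the key is the year rather than the row, a single selection highlights the same year in every city panel, which is what makes within- and across-panel comparisons possible.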

Share (default) selections

highlight(ggplotly(p), defaultValues = 2006)

Produces a standalone webpage!


Easier to share, scale, & maintain


Have lots of panels?

Check out TrelliscopeJS with Plotly


Beyond trellis (i.e. facet) displays


Query missing values by city

demo("crosstalk-highlight-pipeline", package = "plotly")

The 'data pipeline'


Control how selections are rendered (code)


The implementation

library(plotly)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
# shared data will make the polygons "query-able"
ncsd <- crosstalk::SharedData$new(nc)
p <- ggplot(ncsd) +
  geom_sf(aes(fill = AREA, text = paste0(NAME, "\n", "FIPS: ", FIPS))) +
  ggthemes::theme_map()
# use highlight function to draw polygon outline on hover
ggplotly(p, tooltip = "text") %>%
  highlight(
    "plotly_hover",
    opacityDim = 1,
    selected = attrs_selected(line = list(color = "black"))
  )

Works with 'aggregated' traces


The implementation

library(plotly)
# one shared data object links all three views
d <- crosstalk::SharedData$new(mpg)
dots <- plot_ly(d, color = ~class, x = ~displ, y = ~cyl)
boxs <- plot_ly(d, color = ~class, x = ~class, y = ~cty) %>% add_boxplot()
bars <- plot_ly(d, x = ~class, color = ~class)
# scatterplot and box plots on top, bar chart below
subplot(dots, boxs) %>%
  subplot(bars, nrows = 2) %>%
  layout(barmode = "overlay") %>%
  highlight("plotly_selected")

plotly.js dynamically recomputes summary stats as a function of selection
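
To make "recomputes summary stats as a function of selection" concrete, here is a rough base R sketch of the equivalent calculation: the statistics behind a box plot and the counts behind a bar chart, computed once for all rows and once for a selected subset. This is only an analogy for what happens in the browser, not plotly.js code. The built-in `mtcars` data stands in for `mpg` so the sketch runs without extra packages, and `box_stats` is a hypothetical helper name.

```r
# Hypothetical helper: the five-number summary a box plot displays
box_stats <- function(x) stats::fivenum(x)

# Summary over every row (what is drawn before any selection)
all_cars <- box_stats(mtcars$mpg)

# Pretend the user brushed the 4-cylinder cars in a linked scatterplot
selected <- mtcars$cyl == 4

# The same summary, recomputed on the selected rows only
sel_cars <- box_stats(mtcars$mpg[selected])

# Counts per gear for the selection, i.e. what an overlaid bar chart would show
sel_counts <- table(mtcars$gear[selected])

print(rbind(all = all_cars, selected = sel_cars))
print(sel_counts)
```

The point of doing this client-side is that the statistical traces (box plots, bars) stay consistent with the brushed points without a round trip to R.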


Interactive graphics augment exploration!


See relationships evolve over time (made via ggplot2)


Interactively plot models in data space (code)


Works both without and with shiny!


Summary

Interactive graphics can augment exploratory analysis, but are only practical when we can iterate quickly

Quickly pose queries about data and make comparisons with plotly + crosstalk
