2017-01-19

Slides available at http://bit.ly/tcrug

This work is released under Creative Commons

About me

Past

  • PhD in statistics from Iowa State University (December 2016)
  • From the Twin Cities (TC) area, hoping to stay here long-term

Present

  • Freelance software engineer, data scientist, and educator
    • Maintaining the plotly R package (~2 years!)
      • Looking for real-world applications.
    • Developing rerddap (interface to NOAA data).
    • Authoring an O'Reilly Oriole (an online tutorial mixing video, text, and code).

Why interactive graphics?

Why interactive graphics on the web?

  • Portable (i.e., cross-platform)
  • Simple to share (especially self-contained HTML)
  • Encourages composability (e.g., reports, dashboards)
  • Enables integration of multiple systems (1 + 1 > 2)

The problem with web graphics for data analytics

  • Great for conveying information (when the right visualization is known in advance), but impractical for exploration (when it is not).

  • The reality: analysts have to learn and juggle many technologies.
  • My goal: an R interface that makes 80% of these techniques seamless, quick, and easy, without requiring knowledge of web technologies.

A demo of the workflow

2016 Election Outcomes by County (Politico)

#  | County   | State   | TotalVotes | Clinton | Johnson | Stein | Trump | Population |    Area
---|----------|---------|------------|---------|---------|-------|-------|------------|--------
1  | autauga  | alabama |      24661 |   0.240 |   0.022 | 0.004 | 0.734 |      54571 |  594.44
2  | baldwin  | alabama |      94090 |   0.196 |   0.026 | 0.005 | 0.774 |     182265 | 1589.78
3  | barbour  | alabama |      10390 |   0.467 |   0.009 | 0.002 | 0.523 |      27457 |  884.88
4  | bibb     | alabama |       8748 |   0.214 |   0.014 | 0.002 | 0.770 |      22915 |  622.58
5  | blount   | alabama |      25384 |   0.085 |   0.013 | 0.004 | 0.899 |      57322 |  644.78
6  | bullock  | alabama |       4701 |   0.751 |   0.005 | 0.002 | 0.242 |      10914 |  622.81
7  | butler   | alabama |       8685 |   0.428 |   0.007 | 0.001 | 0.563 |      20947 |  776.83
8  | calhoun  | alabama |      47376 |   0.279 |   0.024 | 0.006 | 0.692 |     118572 |  605.87
9  | chambers | alabama |      13778 |   0.418 |   0.012 | 0.003 | 0.566 |      34215 |  596.53
10 | cherokee | alabama |      10503 |   0.145 |   0.014 | 0.002 | 0.839 |      25989 |  553.70
... with 3,101 more rows
Clinton, Johnson, Stein, and Trump give each candidate's share of TotalVotes.
  • Is there a relationship between population density and voting preference?

Choropleth map of voter turnout

Linking proportions with geography

Querying missing values

Querying seasons directly/indirectly

Brushing plus animation

Linked Tree Brushing

The bigger picture

  • All these examples:
    • Are self-contained HTML (easy to share/deploy/embed!)
    • Use three types of view manipulation: focusing, arranging, and/or linking.
  • Buja, Cook, & Swayne (1996): {focusing, linking, arranging} views => {find Gestalt, pose queries, make comparisons}
    • Linking has different interpretations (e.g., as a database query).
    • Focusing is deceptively general (some forms would require shiny).
    • Arrangement must (currently) be specified at runtime.
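One way to make the "database query" reading of linking concrete: a brush in one view defines a predicate over rows, and every linked view highlights the rows whose shared key satisfies it. The sketch below is hypothetical. It is Python, not plotly's actual implementation; `brush` and `linked_view` are illustrative names, and the five counties are copied from the election table.

```python
# Hypothetical sketch of "linking as a database query": a brush in one view
# acts as a predicate, and every linked view highlights rows sharing the
# selected keys. Not plotly's implementation; all names are illustrative.

counties = {
    # key: (state, clinton_share, trump_share, population, area)
    "autauga": ("alabama", 0.240, 0.734, 54571, 594.44),
    "barbour": ("alabama", 0.467, 0.523, 27457, 884.88),
    "blount": ("alabama", 0.085, 0.899, 57322, 644.78),
    "bullock": ("alabama", 0.751, 0.242, 10914, 622.81),
    "calhoun": ("alabama", 0.279, 0.692, 118572, 605.87),
}


def brush(table, predicate):
    """Return the keys of rows that satisfy a brush predicate (the 'query')."""
    return {key for key, row in table.items() if predicate(row)}


def linked_view(table, selected, value):
    """Return (key, value, highlighted) triples for a second, linked view."""
    return [(key, value(row), key in selected) for key, row in table.items()]


# Brush the Trump-share axis of a scatterplot: select counties above 70%.
selected = brush(counties, lambda row: row[2] > 0.70)

# A linked "map" view shows Clinton share and highlights the same keys.
map_view = linked_view(counties, selected, lambda row: row[1])
for key, clinton, highlighted in map_view:
    print(f"{key:>8}  clinton={clinton:.3f}  {'HIGHLIGHT' if highlighted else ''}")
```

Because a selection is just a set of keys, any number of views can share it; that is why linked views need a common key column.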

Future work

  • Keep adding documentation and examples to the plotly for R book
  • Further advance plotly's support for linking views (without shiny).
  • Support more popular ggplot2 extension packages, such as ggrepel and ggraph.
    • Integrating plotly's support for linking tree-structures with ggraph would be particularly interesting.

Thank you