class: center, middle, inverse, title-slide

# Graphical Insights from Data: Motivation and Introduction
### Heike Hofmann

---
class: middle,center,inverse

# Motivation

---
class:inverse
background-color: #000

- Communication
- Ability to see the data can lead to new insights

???

When drawn well, statistical graphics help us understand our data and see things that we didn't know were there.

The relationship between star magnitude, star color, and spectral class wasn't well understood until someone created a chart like this that showed the color index (or spectral class) against the absolute brightness. Then it was much easier to see that the chart described a life cycle: stars start out in the main sequence, and then become giants, dwarfs, or slowly change spectral class over time as they cool down.

Well-designed graphs can help us understand the natural phenomena behind the raw numerical data we've collected.

---
class: center, middle, inverse

# Overview: Types of Graphs

---

## Basic

.pull-left[
**One Variable**

- Discrete
    - Bar Chart
    - Pie Chart
- Continuous
    - Stem and Leaf Plots
    - Histograms
]

.pull-right[
**Two Variables**

- Continuous X, Continuous Y
    - Scatterplots (w/ and w/o trend lines)
    - Maps
]

---

# More "Fancy"

.pull-left[
- Parallel Coordinate Plots
- Mosaic Plots
- Radar Charts
- Heat Maps
]
.pull-right[
- Density Plots
- Violin Plots
- Social Network Plots
]

---

## `ggplot2`: Grammar of Graphics in R

- Wildly popular package for statistical graphics: over 2.5 million downloads from CRAN in 2017 (several thousand per day); daily downloads are now up to 125k! Have a look at the [CRAN statistics](https://ipub.com/dev-corner/apps/r-package-downloads/)
- Developed by Hadley Wickham (an ISU alumnus and one of my students :))
- Designed to adhere to good graphical practices
- Constructs plots using the concept of layers
- Supports a wide variety of plot types and extensions
- Python clone: [plotnine](https://monashdatafluency.github.io/python-workshop-base/modules/plotting_with_ggplot/) (link to online workshop slides)

### References

- ggplot2: Elegant Graphics for Data Analysis (book) or [online version](https://ggplot2-book.org/) for reference
- [Cheat Sheet for ggplot2](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf) (two-page reference card)

---

## Grammar of Graphics

A graphical representation (plot) consists of:

1. **mappings** (aesthetics; aes): data variables are mapped to graphical elements
2. **layers**: geometric elements (geoms, such as points, lines, rectangles, text, ...) and statistical transformations (stats, such as identity, counts, bins, ...)
3. **scales**: map values in the data space to values in an aesthetic space (e.g. color, size, shape, but also position)
4. **coordinate system** (coord): normally Cartesian, but pie charts, for example, use polar coordinates
5. **faceting**: for small multiples (subsets) and their arrangement
6. **themes**: fine-tune display items, such as font and its size, color of background, margins, ...
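---

## Grammar of Graphics: a sketch in code

The six components listed above can be written out explicitly in a single plot specification. This is a minimal sketch, not part of the original example set: it assumes ggplot2's built-in `diamonds` data, and the specific choices (log scale, faceting by `color`, `theme_bw()`) are illustrative only.

```r
library(ggplot2)

# Assumption: the built-in `diamonds` data stands in for any data frame.
p <- ggplot(data = diamonds,
            # 1. mappings: data variables -> aesthetics
            aes(x = carat, y = price, color = cut)) +
  # 2. layer: point geom (its default stat is "identity")
  geom_point(alpha = 0.2) +
  # 3. scale: put the y position on a log10 scale
  scale_y_log10() +
  # 4. coordinate system: Cartesian (also the default)
  coord_cartesian() +
  # 5. faceting: one panel per diamond color grade
  facet_wrap(~ color) +
  # 6. theme: white background, larger base font
  theme_bw(base_size = 14)

p
```

Leaving out steps 3 to 6 still produces a plot, because ggplot2 fills in default scales, coordinates, faceting, and theme.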
---

## A short example, dissected

.pull-left[
```r
library(classdata)
library(ggplot2)
ggplot(data = fbiwide,
       # Aesthetic mappings
       aes(x = Burglary, y = Murder)) +
  # layer - plot using points
  geom_point()
```

Other quantities (scales, stats, coordinate systems) are chosen **automatically**, using smart defaults that are usually visually appealing.
]

.pull-right[
<!-- -->
]

---

## Adding additional information

.pull-left[
```r
library(classdata)
library(ggplot2)
ggplot(data = fbiwide,
       # Aesthetic mappings
       aes(x = Burglary, y = Murder, color = Year)) +
  # layer - plot using points
  geom_point()
```
]

.pull-right[
<!-- -->
]

---

## Adding additional information

.pull-left[
```r
library(classdata)
library(ggplot2)
ggplot(data = fbiwide,
       # Aesthetic mappings
       aes(x = Burglary, y = Murder, color = State)) +
  # layer - plot using points
  geom_point() +
  # Skip the legend for now
  guides(color = "none")
```

Scales are chosen automatically for continuous and discrete color variables.
]

.pull-right[
<!-- -->
]

---
class: inverse

## Your turn

```r
library(classdata)
library(ggplot2)
ggplot(data = fbiwide,
       # Aesthetic mappings
       aes(x = Burglary, y = Murder, color = State)) +
  # layer - plot using points
  geom_point() +
  # Skip the legend for now
  guides(color = "none")
```

- Try mapping variables to other aesthetics such as shape and size
- What questions can you answer by modifying this graph?

---

# What is a Layer?

- It determines the physical representation of the data
- A plot may have multiple layers
- Usually all the layers on a plot have something in common, e.g. they show different views of the same data
- A layer is composed of four parts:
    1. data and aesthetic mapping
    2. a statistical transformation (stat)
    3. a geometric object (geom)
    4. a position adjustment

???
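---

## A layer, written out

The four parts of a layer are usually hidden behind a `geom_*()` shortcut. As a sketch (assuming ggplot2's built-in `diamonds` data, which is not the data used in the earlier examples), the same bar layer can be written once with the shortcut and once with ggplot2's lower-level `layer()` function, where each part is named explicitly.

```r
library(ggplot2)

# Shortcut: geom_bar() supplies stat = "count" and position = "stack"
p_short <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar()

# The same layer with all four parts spelled out
p_long <- ggplot() +
  layer(
    data     = diamonds,                      # 1. data ...
    mapping  = aes(x = cut, fill = clarity),  #    ... and aesthetic mapping
    stat     = "count",                       # 2. statistical transformation
    geom     = "bar",                         # 3. geometric object
    position = "stack"                        # 4. position adjustment
  )

p_long
```

Changing a single part, for example `position = "dodge"`, changes how the same counts are drawn without touching the data or the mapping.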
---

# ggplot2: A layered grammar

<img src="00-ggplot-intro_files/figure-html/plots-4-1.png" />

<style>
.column-left{
  float: left;
  width: 32%;
  text-align: left;
}
.column-center{
  display: inline-block;
  width: 34%;
  text-align: left;
}
.column-right{
  float: right;
  width: 32%;
  text-align: left;
}
</style>

.column-left[
data: diamonds

layer:
- mapping: x = cut, y = count, fill = cut
- geom: bar

coordinates: Cartesian
]

.column-center[
data: diamonds

layer:
- mapping: x = 1, y = count, fill = cut
- geom: fill-bar

coordinates: Cartesian
]

.column-right[
data: diamonds

layer:
- mapping: x = 1, y = count, fill = cut
- geom: fill-bar

coordinates: Polar
]

---

## Summary

In ggplot2, every graph is composed of layers of:

- aesthetic mappings (represented by geoms) with scales and coordinate systems
- plot annotations (legend, theme, title, etc.)

<img alt="Building a ggplot with aesthetics, geoms, scales, coordinate systems, and annotations" src="images/build1.png" style="min-width:72%;min-height:150px"/><img alt="Final result" src="images/build2.png" style="min-width:24%;min-height:150px"/>
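
---

## From bar chart to pie chart

The three `diamonds` specifications on the layered grammar slide differ only in the mapping of `x` and in the coordinate system. The sketch below is one way to express them; it reads "fill-bar" as `geom_bar(position = "fill")`, which is an assumption about how the original figure was drawn.

```r
library(ggplot2)

# Left: x = cut, one bar per cut, Cartesian coordinates
bars <- ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar()

# Center: x = 1, a single bar whose segments are scaled to proportions
stacked <- ggplot(diamonds, aes(x = 1, fill = cut)) +
  geom_bar(position = "fill")

# Right: the same layer, with y mapped to the angle in polar coordinates
pie <- stacked + coord_polar(theta = "y")

pie
```

Only the last line changes between the stacked bar and the pie chart: the data, mapping, stat, and geom are identical.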