class: center, middle, inverse, title-slide

# Data Visualization
## Using ggplot2
### Haley Jeppson and Sam Tyner

---
class: center, middle

# MOTIVATION

---
# Why visualize?

- The sole purpose of visualization is communication
- Visualization offers an alternative way of communicating numbers

![](images/Minard.png)

---
# Tables and lists vs. Charts and graphs

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Sepal.Length </th>
   <th style="text-align:right;"> Sepal.Width </th>
   <th style="text-align:right;"> Petal.Length </th>
   <th style="text-align:right;"> Petal.Width </th>
   <th style="text-align:left;"> Species </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 5.1 </td>
   <td style="text-align:right;"> 3.5 </td>
   <td style="text-align:right;"> 1.4 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4.9 </td>
   <td style="text-align:right;"> 3.0 </td>
   <td style="text-align:right;"> 1.4 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4.7 </td>
   <td style="text-align:right;"> 3.2 </td>
   <td style="text-align:right;"> 1.3 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4.6 </td>
   <td style="text-align:right;"> 3.1 </td>
   <td style="text-align:right;"> 1.5 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5.0 </td>
   <td style="text-align:right;"> 3.6 </td>
   <td style="text-align:right;"> 1.4 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5.4 </td>
   <td style="text-align:right;"> 3.9 </td>
   <td style="text-align:right;"> 1.7 </td>
   <td style="text-align:right;"> 0.4 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
</tbody>
</table>

---
# Tables and lists
vs. Charts and graphs

![](images/table-modified.png)

---
# Tables and lists vs. Charts and graphs

<img src="1-GraphicsIntro_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

Visualizations can aid communication and make the data easier to perceive

---
class: center, middle, inverse

# Survey of types of graphs

---
# Beginner

.pull-left[
**One Variable**

- Discrete
    - Bar Chart
    - Pie Chart
- Continuous
    - Stem and Leaf Plots
    - Histograms
]

.pull-right[
**Two variables**

- Continuous X, Continuous Y
    - Scatterplots
    - Maps
]

<br/>

![](1-GraphicsIntro_files/figure-html/beg-1.png)<!-- -->

---
# Intermediate

- Parallel Coordinate Plots
- Mosaic Plots
- Radar Charts
- Heat Maps

<br/>
<br/>
<br/>
<br/>

![](1-GraphicsIntro_files/figure-html/int-1.png)<!-- -->

---
# Advanced

- Density Plots
- Violin Plots
- Social Network Plots

<br/>
<br/>
<br/>
<br/>

![](1-GraphicsIntro_files/figure-html/adv-1.png)<!-- -->

---
class: center, middle, inverse

# Grammar of Graphics

---
# Grammar of Graphics

What is the grammar of graphics?

- Developed by Leland Wilkinson, it is a set of grammatical rules for creating perceivable graphs
- Rather than thinking about a limited set of graphs, think about graphical forms
- Charts are instances of much more general objects
- An abstraction that makes thinking about, reasoning about, and communicating graphics easier

![](images/grammar.png)

???

We often talk about types of graphs – bar plots, pie charts, scatterplots – as though they are unrelated, but most graphs share many aspects of their structure. We can think of graphs as visual representations of (possibly transformed) data, along with labels (like axes and legends) that make the meaning clear.

Much like the grammar of a language allows you to combine words into meaningful sentences, a grammar of graphics provides a structure to combine graphical elements into figures that display data in a meaningful way.
The grammar of graphics was developed in order to produce a flexible system that can create a rich variety of charts as simply as possible, without duplication of methods. A grammar provides a strong foundation for understanding a diverse range of graphics. A grammar may also help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics

---
# Grammar of Graphics

Different types of graphs may appear completely distinct, but in actuality share many common elements.

By making different visual choices, you can use graphs to highlight different aspects of the same data.

For example, here are three ways of displaying the same data:

<img src="1-GraphicsIntro_files/figure-html/plots-3-1.png" />

???

add in description of how they are different

Different types of graphs may, at first glance, appear completely distinct. But in fact, graphs share many common elements, such as coordinate systems and using geometric shapes to represent data. By making different visual choices (Cartesian or polar coordinates, points or lines or bars to represent data), you can use graphs to highlight different aspects of the same data.

For example, here are three ways of displaying the same data:

---
# Grammar of Graphics

Statistical graphic specifications are expressed in six statements:

1) **DATA**: a set of data operations that create variables from datasets
2) **TRANS**: variable transformations
3) **SCALE**: scale transformations
4) **COORD**: a coordinate system
5) **ELEMENT**: graphs (points) and their aesthetic attributes (color)
6) **GUIDE**: one or more guides (axes, legends, etc.)

???
The internal processes that constitute the syntax of the grammar of graphics

<br/>
<br/>

![](images/grammar.png)

A data flow diagram that shows what the stages are, how they must be ordered, and what data are required along the way.

It provides us with the ingredients and the dependencies among them, but it does not tell us how to assemble the ingredients.

The Recipe:

1. Create variables – extract data into variables
2. Apply algebra –
3. Apply scales – e.g. categorical
4. Compute statistics
5. Construct geometry
6. Apply coordinates – e.g. for polar: send (x, y) to (r, theta)
    - by postponing the coordinates operation as late in the pipeline as possible, we have made the system more flexible
7. Compute aesthetics
    - aesthetic functions translate a graph into a graphic, which is a set of drawing instructions for a renderer, e.g. position, color, label

---
# Limitations to the Grammar

- tells us what words make up our graphical “sentences,” but offers no advice on how to write well
- is not about good taste, practice, or graphic design
- while very useful, the grammar is not all-encompassing
    - does not include interactive graphics
    - does not include a few interesting and useful charts

---
class: center, middle, inverse

# ggplot2
## A layered grammar of graphics

---
## A layered grammar vs The Grammar of Graphics

ggplot2 is based on The Grammar of Graphics

![](images/layered-grammar.png)

In both grammars, the components are independent, meaning that we can generally change a single component in isolation

---
# What is a graphic?

ggplot2 uses the idea that you can build every graph with graphical components from three sources

1. the **data**, represented by **geoms**
2. the **scales** and **coordinate system**
3. 
the **plot annotations**

- to display values, map variables in the data to visual properties of the geom (**aesthetics**) like **size**, **color**, and **x** and **y** locations

![](images/build1.png)![](images/build2.png)

---
# ggplot2: A layered grammar

The layered grammar defines the components of a plot as:

1. a default dataset and set of mappings from variables to aesthetics
2. one or more layers, each layer having one geometric object, one statistical transformation, one position adjustment, and optionally, one dataset and set of aesthetic mappings
3. one scale for each aesthetic mapping used
4. a coordinate system
5. the facet specification

.pull-right[
![](images/l.png)
]

???

---
# What is a Layer?

- it determines the physical representation of the data
- a plot may have multiple layers
- usually all the layers on a plot have something in common, i.e. different views of the same data
- a layer is composed of four parts:
    1. data and aesthetic mapping
    2. a statistical transformation (stat)
    3. a geometric object (geom)
    4. a position adjustment

???
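The four parts of a layer listed on this slide can be written out one by one. What follows is a minimal sketch, not the only way to build a layer: it assumes ggplot2 is installed, uses its `layer()` function with the built-in `diamonds` data, and borrows the `carat` and `price` variables from the examples later in the deck. The object names `explicit_layer`, `p_explicit`, and `p_shortcut` are illustrative only.

```r
library(ggplot2)

# A layer spelled out in full: each of the four parts from the slide is
# passed to layer() by name instead of being hidden behind a geom_*() call.
explicit_layer <- layer(
  data     = diamonds,                   # 1. data ...
  mapping  = aes(x = carat, y = price),  #    ... and aesthetic mapping
  stat     = "identity",                 # 2. statistical transformation
  geom     = "point",                    # 3. geometric object
  position = "identity"                  # 4. position adjustment
)

# Adding the layer to an empty plot gives a scatterplot
p_explicit <- ggplot() + explicit_layer

# The everyday shortcut: geom_point() fills in stat = "identity" and
# position = "identity" by default
p_shortcut <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
```

Printing `p_explicit` and `p_shortcut` should draw the same scatterplot. In practice the `geom_*()` and `stat_*()` shortcuts are what you would normally type; the long form is shown here only to make the four parts visible.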
---
# ggplot2: A layered grammar

<img src="1-GraphicsIntro_files/figure-html/plots-4-1.png" />

<style>
.column-left{
  float: left;
  width: 32%;
  text-align: left;
}
.column-center{
  display: inline-block;
  width: 34%;
  text-align: left;
}
.column-right{
  float: right;
  width: 32%;
  text-align: left;
}
</style>

<div class="column-left">
data: diamonds <br/>
layer: <br/>
- mapping: x = cut, y = count, fill = cut <br/>
- geom: bar <br/>
coordinates: Cartesian
</div>

<div class="column-center">
data: diamonds <br/>
layer: <br/>
- mapping: x = 1, y = count, fill = cut <br/>
- geom: fill-bar <br/>
coordinates: Cartesian <br/>
</div>

<div class="column-right">
data: diamonds <br/>
layer: <br/>
- mapping: x = 1, y = count, fill = cut <br/>
- geom: fill-bar <br/>
coordinates: Polar <br/>
</div>

---
class: center, middle, inverse

# Make your first figure

---
# We begin with the data

```r
ggplot(data = diamonds)
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---
# Then we specify the aesthetic mappings

```r
ggplot(data = diamonds, aes(x = carat, y = price))
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

???

or what relationships we want to see

---
# Then we choose a geom

```r
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

---
# And add an aesthetic

```r
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = cut))
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

---
# And add another layer...
```r
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = cut)) +
  geom_smooth()
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

---
# Mapping aesthetics vs setting aesthetics

```r
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = cut), size = 2, alpha = .5) +
  geom_smooth(aes(fill = cut), colour = "lightgrey")
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

---
# Scale transformations can be specified

```r
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = cut), size = 2, alpha = .5) +
  geom_smooth(aes(fill = cut)) +
  scale_y_log10()
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-19-1.png)<!-- -->

---
# As can facet variables

```r
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = cut), size = 2, alpha = .5) +
  geom_smooth() +
  scale_y_log10() +
  facet_wrap(~cut)
```

![](1-GraphicsIntro_files/figure-html/unnamed-chunk-21-1.png)<!-- -->

---
class: inverse

# Today's Outline

BASICS

1. Why is data visualization important?
2. Data Types, Formats, and Structures
3. Formatting your data: A tidy data discussion

BUILDING PLOTS

1. Geoms, Stats, Coordinates, and Faceting

PERCEPTION

1. Basics of cognitive visual perception
2. What makes a good graphic?
3. Aesthetics and scales in `ggplot2`

POLISHING PLOTS

1. Setting themes
2. Modifying elements of a plot
3. Making plots interactive!
4. Saving your work
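
---
# Mapping vs setting: a closer look

As a follow-up to the "Mapping aesthetics vs setting aesthetics" slide, here is a minimal sketch of the difference, assuming ggplot2 is loaded and using the built-in `diamonds` data. The colour `"steelblue"` and the object names (`mapped_layer`, `set_layer`, `p_mapped`, `p_set`) are arbitrary choices for illustration, not part of the original example.

```r
library(ggplot2)

# Mapping: colour is tied to a variable inside aes(), so ggplot2 builds
# a colour scale and draws a legend for it.
mapped_layer <- geom_point(aes(colour = cut))

# Setting: colour and alpha are constants given outside aes(), so every
# point looks the same and no legend is created for them.
set_layer <- geom_point(colour = "steelblue", alpha = 0.5)

# Both layers inherit x and y from the plot-level mapping
p_mapped <- ggplot(diamonds, aes(x = carat, y = price)) + mapped_layer
p_set    <- ggplot(diamonds, aes(x = carat, y = price)) + set_layer
```

A rule of thumb: if the value comes from a column in the data, it goes inside `aes()`; if it is a fixed value you picked yourself, it goes outside.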