2022-05-16

The Happy data from GSS

The General Social Survey (GSS) has been run by NORC since 1972 (almost every year until 1994, every other year since then) to keep track of current opinions across the United States.

An excerpt of the GSS data called happy is available from the classdata package:

# install.packages("remotes")  # needed once if remotes is not installed yet
remotes::install_github("heike/classdata")
library(classdata)
head(happy)
##           happy year age    sex       marital         degree       finrela
## 1 not too happy 1972  23 female never married       bachelor       average
## 2 not too happy 1972  70   male     separated lt high school above average
## 3  pretty happy 1972  48 female     separated    high school       average
## 4 not too happy 1972  27 female     separated       bachelor       average
## 5  pretty happy 1972  61 female     separated    high school above average
## 6  pretty happy 1972  26   male never married    high school above average
##      health polviews          partyid wtssall
## 1      good     <NA>     ind,near dem  0.4446
## 2      fair     <NA> not str democrat  0.8893
## 3 excellent     <NA>      independent  0.8893
## 4      good     <NA> not str democrat  0.8893
## 5      good     <NA>  strong democrat  0.8893
## 6      good     <NA>     ind,near dem  0.4446

You can find a codebook with explanations for each of the variables at https://gssdataexplorer.norc.org/

Your Turn

Load the happy data from the classdata package.

  • how many variables and how many observations does the data have? What do the variables mean?

  • Plot the variable happy. Introduce a new variable nhappy that has values 1 for not too happy, 2 for pretty happy, 3 for very happy and NA for missing values. There are multiple ways to do this; avoid for loops.

  • Based on the newly introduced numeric scores, what is the average happiness of respondents?
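One way to build nhappy without a loop, sketched on a small made-up factor instead of the real data. The as.integer() route assumes that the levels of happy$happy are exactly not too happy, pretty happy, very happy, in that order (check with levels(happy$happy)); match() is shown as an alternative that does not depend on the level order.

```r
# Made-up stand-in for happy$happy (not GSS data).
# Assumption: levels ordered not too happy < pretty happy < very happy.
happy_toy <- factor(
  c("pretty happy", "very happy", NA, "not too happy", "pretty happy"),
  levels = c("not too happy", "pretty happy", "very happy")
)

# Option 1: a factor's integer codes are the positions of its levels,
# so the three levels become 1, 2, 3 and NA stays NA.
nhappy1 <- as.integer(happy_toy)

# Option 2: match() looks each label up in an explicit vector,
# so the result does not rely on the factor's level order.
nhappy2 <- match(as.character(happy_toy),
                 c("not too happy", "pretty happy", "very happy"))

nhappy1  # 2, 3, NA, 1, 2

# Average happiness: na.rm = TRUE drops missing answers before averaging.
mean(nhappy1, na.rm = TRUE)  # 2
```

On the real data the same idea could be written as happy %>% mutate(nhappy = as.integer(happy)), again under the level-order assumption; without na.rm = TRUE, mean() returns NA as soon as one answer is missing.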

Your Turn

  • how does average happiness change over the course of a lifetime? Is this relationship different for men and women? Draw plots.

  • are people now happier than ten years ago? How is happiness related to time?
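Both questions come down to averaging nhappy within groups. A minimal dplyr sketch, assuming nhappy has already been added; the small data frame below is made up only so the code runs and says nothing about the real GSS.

```r
library(dplyr)

# Made-up toy data standing in for happy with an nhappy column.
toy <- data.frame(
  year   = c(2008, 2008, 2008, 2018, 2018, 2018),
  age    = c(25, 25, 60, 25, 60, 60),
  sex    = c("female", "male", "female", "male", "female", "male"),
  nhappy = c(2, 3, NA, 1, 3, 2)
)

# One average per combination of age and sex; missing scores are ignored.
by_age_sex <- toy %>%
  group_by(age, sex) %>%
  summarise(avg = mean(nhappy, na.rm = TRUE), .groups = "drop")

# The same idea over time: one average per survey year.
by_year <- toy %>%
  group_by(year) %>%
  summarise(avg = mean(nhappy, na.rm = TRUE))

by_age_sex
by_year
```

The summaries can then be plotted, for example with ggplot2 using ggplot(by_age_sex, aes(x = age, y = avg, colour = sex)) + geom_line(). A group in which every score is missing gets NaN as its average.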

Your Turn

  • Are Republicans or Democrats happier? Compare average happiness levels over partyid.

  • How is finrela (respondents' view of their family income compared with American families in general) associated with average happiness levels? Is this association different for men and women?

  • For each of the summaries, find a plot that shows the differences.
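A sketch of both comparisons, assuming nhappy exists. The values in the toy data frame are invented only to make the code run; they do not reflect which party or income group is happier in the GSS. The second summary computes, within each level of finrela, the difference between the average for women and the average for men.

```r
library(dplyr)

# Made-up toy data (not GSS values).
toy <- data.frame(
  partyid = c("strong democrat", "strong republican", "strong democrat",
              "strong republican", "independent", "independent"),
  finrela = c("average", "average", "above average",
              "above average", "average", "above average"),
  sex     = c("female", "male", "male", "female", "female", "male"),
  nhappy  = c(2, 3, 2, 3, 1, NA)
)

# Average happiness per party identification, highest average first.
by_party <- toy %>%
  group_by(partyid) %>%
  summarise(avg = mean(nhappy, na.rm = TRUE)) %>%
  arrange(desc(avg))

# Women minus men within each finrela level:
# positive values mean women report a higher average.
gap <- toy %>%
  group_by(finrela) %>%
  summarise(gap = mean(nhappy[sex == "female"], na.rm = TRUE) -
                  mean(nhappy[sex == "male"], na.rm = TRUE))

by_party
gap
```

If a finrela level has no women (or no men) with a score, the mean of an empty vector is NaN and so is the gap for that level.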

Your Turn: asking questions

  • What other variable(s) might be associated with happiness? Plot it.

Helper functions (1)

  • n() provides the number of rows of a subset:
library(dplyr)
happy %>% group_by(sex) %>% summarise(n = n())
## # A tibble: 2 × 2
##   sex        n
##   <fct>  <int>
## 1 female 34904
## 2 male   27562
  • tally() is a shortcut for summarise(n = n()):
happy %>% group_by(sex) %>% tally()
## # A tibble: 2 × 2
##   sex        n
##   <fct>  <int>
## 1 female 34904
## 2 male   27562
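n() is most useful inside a larger summarise() call, where the group size can sit next to other summaries. This sketch on made-up data also shows that n() counts rows with a missing nhappy, while sum(!is.na(nhappy)) counts only the answered ones.

```r
library(dplyr)

# Made-up toy data with one missing happiness score.
toy <- data.frame(
  sex    = c("female", "female", "male", "female", "male"),
  nhappy = c(2, NA, 3, 1, 2)
)

summ <- toy %>%
  group_by(sex) %>%
  summarise(
    n          = n(),                  # all rows in the group, NA included
    n_answered = sum(!is.na(nhappy)),  # rows with a non-missing score
    avg        = mean(nhappy, na.rm = TRUE)
  )

# tally() gives the same counts as n() for this grouping.
tallied <- toy %>% group_by(sex) %>% tally()

summ
tallied
```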

Helper functions (2)

  • count() is a further shortcut of group_by and tally:
happy %>% count(sex, degree)
##       sex         degree     n
## 1  female lt high school  7500
## 2  female    high school 18419
## 3  female junior college  2047
## 4  female       bachelor  4731
## 5  female       graduate  2112
## 6  female           <NA>    95
## 7    male lt high school  5825
## 8    male    high school 13598
## 9    male junior college  1425
## 10   male       bachelor  4279
## 11   male       graduate  2357
## 12   male           <NA>    78
  • count() doesn’t introduce any grouping
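The difference in grouping can be checked with group_vars(). A sketch on made-up data, assuming dplyr 1.0 or later: count() on an ungrouped data frame returns an ungrouped result, while group_by() followed by tally() behaves like summarise() and keeps every grouping variable except the last one.

```r
library(dplyr)

# Made-up toy data.
toy <- data.frame(
  sex    = c("female", "male", "female", "male"),
  degree = c("bachelor", "bachelor", "graduate", "bachelor")
)

counted <- toy %>% count(sex, degree)
tallied <- toy %>% group_by(sex, degree) %>% tally()

group_vars(counted)  # character(0): no grouping
group_vars(tallied)  # "sex": only the last grouping level was dropped

# The counts themselves agree.
identical(counted$n, tallied$n)
```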

Grouping and Ungrouping

  • ungroup() removes the grouping structure from a data set

  • necessary to make changes to a grouping variable (such as re-ordering or re-labelling)
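A leftover grouping also changes what later steps compute. In this sketch on made-up data, the summary is still grouped by sex, so sum(n) is taken within each sex; after ungroup() the same mutate() computes shares of the whole data set.

```r
library(dplyr)

# Made-up toy data.
toy <- data.frame(
  sex    = c("female", "female", "male", "female", "male"),
  degree = c("bachelor", "graduate", "bachelor", "bachelor", "graduate")
)

# .groups = "drop_last" makes the default explicit: result grouped by sex.
counts <- toy %>%
  group_by(sex, degree) %>%
  summarise(n = n(), .groups = "drop_last")

# Still grouped: proportions add up to 1 within each sex.
within_sex <- counts %>% mutate(prop = n / sum(n))

# Ungrouped: proportions add up to 1 over all rows.
overall <- counts %>% ungroup() %>% mutate(prop = n / sum(n))

within_sex
overall
```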