© Stanford University Board of Trustees, 1943

Four years on the farm

I thought it would be fun to try visualizing my time at Stanford with some personal data. I already know this data well, of course, but I’ve generally found that running a quick analysis is a good way to see things from a new perspective.

I started by organizing data about my courses and dorms in a Google Sheets document, which I queried with the googlesheets package by Jennifer Bryan. Luckily, I already had a spreadsheet which recorded the course numbers and unit counts of all the classes I took, so it was mostly a matter of remembering which building each class was in and how often it met.

I used ggmap::geocode() to query the Google Maps API for the longitude and latitude of each of my classrooms and was pretty pleased with the results. But due to some trouble with the API’s query limit, I had to hack together the complete data frame in parts — it wasn’t too difficult, but the code isn’t very elegant, so I won’t include it here. I simplified things by writing the final data frame into a .csv and copying the lon and lat data into my Google sheet.
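The original workaround isn't shown, but the general approach can be sketched in a few lines of base R. The `geocode_in_batches()` helper below is hypothetical: it geocodes each distinct location only once, in small batches with a pause in between. It assumes the geocoder you pass in takes a character vector and returns a data frame with `lon` and `lat` columns, which matches the default output of `ggmap::geocode()` (that function also needs a registered Google API key). The batch size and pause are placeholder values, not the API's actual limits.

```r
# Hypothetical helper: geocode each distinct location once, in batches,
# pausing between batches to stay under a per-second query limit.
# `geocoder` must take a character vector and return `lon`/`lat` columns.
geocode_in_batches <- function(locations, geocoder, batch_size = 10, pause = 1) {
  places <- unique(locations)
  # Split the distinct places into consecutive chunks of `batch_size`
  batches <- split(places, ceiling(seq_along(places) / batch_size))
  coords <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    res <- geocoder(batches[[i]])
    coords[[i]] <- data.frame(
      location = batches[[i]],
      lon = res$lon,
      lat = res$lat,
      stringsAsFactors = FALSE
    )
    # Wait before the next batch (no need to wait after the last one)
    if (i < length(batches)) Sys.sleep(pause)
  }
  do.call(rbind, coords)
}
```

Something like `coords <- geocode_in_batches(data_geo$location, ggmap::geocode)` followed by a join on `location` would then attach coordinates to every course. Geocoding distinct buildings rather than every course row also cuts the number of queries to one per building.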

First steps

Let’s start by loading in some libraries and reading in the data.

library(tidyverse)
library(ggmap)
library(sf)
library(googlesheets)
library(leaflet)
library(leaflet.extras)
# Source of fix_name(), custom_theme, custom_palette and text_color (not shown)
source("~/Desktop/Summer Projects/customize_script.R")
# Static basemap; the interactive map below uses leaflet tiles instead
map <- get_map(location = "Stanford, CA", zoom = 15, maptype = "watercolor")

# `url` holds the address of my Google Sheet (not shown here)
sheet_key <- extract_key_from_url(url)
gs <- gs_key(sheet_key)

data_geo <- 
  gs %>% 
  gs_read(ws = "Academic") %>% 
  rename_all(fix_name) %>% 
  mutate(
    # Letter grade to grade points; "P" (credit/no-credit) becomes NA
    gpa = case_when(
      grade == "A+" ~ 4.3,
      grade == "A" ~ 4.0,
      grade == "A-" ~ 3.7,
      grade == "B+" ~ 3.3,
      grade == "B" ~ 3.0
    )
  ) %>% 
  select(quarter, course, units, days, grade, gpa, location, lon, lat)

data_res <- 
  gs %>% 
  gs_read(ws = "Residential") %>% 
  rename_all(fix_name)

The first thing I was curious about was which building I spent the most time in for class. I wasn’t surprised to see that Wallenberg was the top destination by a wide margin, especially since my four quarters of Chinese language all met there five times a week. I was surprised (and a little saddened), however, to see how much time I spent in the basement of Sloan Hall, aka the Math Corner of Stanford’s Main Quad. It’s not a cheery place by any means.

data_geo %>% 
  # days = class meetings per week; times 10 weeks in a quarter
  mutate(days = days * 10) %>% 
  count(location, wt = days) %>% 
  arrange(desc(n)) %>% 
  rename(days = n) %>% 
  knitr::kable()
| location | days |
|:---|---:|
| Wallenberg Hall | 280 |
| Sloan Hall | 120 |
| Hewlett Teaching Center | 100 |
| Jordan Hall | 90 |
| Lane Hall | 80 |
| Lathrop Library | 70 |
| Encina Hall | 50 |
| Gilbert Biological Sciences | 50 |
| Cemex Auditorium | 40 |
| Pigott Hall | 40 |
| Bishop Auditorium | 20 |
| Braun Music Building | 20 |
| Cubberly Education Library | 20 |
| Geology Corner | 20 |
| Hume Center | 20 |
| Landau Economics Building | 20 |
| Margaret Jacks | 20 |
| McMurtry Building | 20 |
| Robert Crown Library Building | 20 |
| Braun Music Buidling | 10 |
| Cedro, Wilbur | 10 |
| Dinkelspiel Auditorium | 10 |
| French House | 10 |
| Gates Building | 10 |
| St. Hugh’s College, Oxford | 10 |
| Stanford House, Oxford | 10 |
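The table lists Braun Music Building twice because one row is spelled "Braun Music Buidling" in the sheet, so its 30 days are split 20/10 across two rows. A minimal base R sketch of cleaning that up before counting is to map known misspellings onto a canonical name. The `fix_location()` helper is hypothetical, and its correction table only contains the one misspelling visible here.

```r
# Hypothetical helper: replace known misspellings with a canonical name.
# Only the typo visible in the table above is listed.
fix_location <- function(location) {
  corrections <- c("Braun Music Buidling" = "Braun Music Building")
  hit <- location %in% names(corrections)
  location[hit] <- unname(corrections[location[hit]])
  location
}

# The two Braun rows from the table, plus one other building
locs <- c("Braun Music Building", "Braun Music Buidling", "Lane Hall")
days <- c(20, 10, 80)
totals <- tapply(days, fix_location(locs), sum)
totals[["Braun Music Building"]]  # 30
```

In the pipeline above, `mutate(location = fix_location(location))` placed before `count()` would merge the two rows into one.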

Visualization

Next, I used the leaflet and sf packages to create an interactive map of my life on campus. Although it would have been nice to retain the street/building labels, I’m a big fan of Stamen Design’s watercolor tiles.

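Each red bubble is a class, sized by how many days a week it met, so the buildings where I spent the most class time show up as the biggest clusters of overlapping bubbles. You can see that most of my time was spent around the northern wing of the Main Quad. The blue dots are where I lived each year. Both layers leave out Fall 2016, the quarter I spent at Oxford.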

data_geo %>% 
  # Leave out Fall 2016, the quarter I spent at Oxford
  filter(quarter != "Fall 2016") %>% 
  st_as_sf(coords = c("lon", "lat"), crs = 4326, remove = FALSE) %>% 
  leaflet(
    options = leafletOptions(minZoom = 14, maxZoom = 16)
  ) %>% 
  addProviderTiles(provider = providers$Stamen.Watercolor) %>% 
  # One red circle per course; leaflet circle radii are in meters
  addCircles(
    radius = ~ days * 10,
    color = NA,
    weight = 2,
    fillColor = "red",
    fillOpacity = 0.7,
    label = ~ course
  ) %>% 
  # One blue dot per residence
  addCircles(
    radius = 25,
    color = NA,
    weight = 1,
    fillColor = "blue",
    fillOpacity = .7,
    label = ~ residence,
    data = data_res %>% filter(quarter != "Fall 2016")
  ) %>% 
  addLegend(
    position = "topright", 
    colors = c("red", "blue"), 
    labels = c("Class", "Dorm")
  )
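Because the code above draws one circle per course, a building with several classes ends up with a stack of overlapping circles rather than a single bubble sized by its total. A minimal sketch of aggregating first uses base R's `aggregate()` on a few made-up rows that stand in for `data_geo`: it sums meetings per week by building and scales the radius by the square root of that total, so that circle area (rather than radius) grows in proportion to class time. The constant 40 is an arbitrary size choice; leaflet's `addCircles()` interprets `radius` in meters.

```r
# Made-up rows standing in for data_geo: one row per course
classes <- data.frame(
  course = c("COURSE 1", "COURSE 2", "COURSE 3"),
  location = c("Hall X", "Hall X", "Hall Y"),
  lon = c(-122.170, -122.170, -122.165),
  lat = c(37.428, 37.428, 37.425),
  days = c(5, 3, 2)  # class meetings per week
)

# One row per building, with meetings per week summed across its courses
by_building <- aggregate(days ~ location + lon + lat, data = classes, FUN = sum)

# Square-root scaling keeps circle area proportional to total meetings;
# 40 is an arbitrary size constant (leaflet radii are in meters)
by_building$radius <- 40 * sqrt(by_building$days)
by_building
```

The result could then replace the per-course data in the red layer, for example `addCircles(data = by_building, radius = ~ radius, label = ~ location, ...)`, keeping the other styling arguments as they are.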

I also looked at how my course load and performance changed over the years. Winter quarters were typically heavier than fall and spring, and I was a little overzealous as a sophomore, nearly hitting the 20-unit mark each quarter. I rewarded myself with a 12-unit quarter abroad during my junior fall in Oxford, and with a lighter schedule senior year (although sadly, I filled that extra time by writing a thesis and hunting for jobs).

data_geo %>% 
  # "Fall 2014" -> quarter = "Fall", year = "2014"
  separate(quarter, into = c("quarter", "year")) %>% 
  mutate(
    # First month of each quarter, used to place it on a date axis
    month = case_when(
      quarter == "Fall" ~ 10,
      quarter == "Winter" ~ 1,
      quarter == "Spring" ~ 4
    ),
    date = str_c(1, month, year, sep = "/") %>% lubridate::dmy(),
    # Academic years run from Fall through Spring
    ac_year = case_when(
      quarter == "Fall" ~ str_c(year, as.character(as.numeric(year) + 1), sep = "-"),
      TRUE ~ str_c(as.character(as.numeric(year) - 1), year, sep = "-")
    ),
    class_year = case_when(
      ac_year == "2014-2015" ~ "Freshman",
      ac_year == "2015-2016" ~ "Sophomore",
      ac_year == "2016-2017" ~ "Junior",
      TRUE ~ "Senior"
    ),
    class_year = fct_relevel(class_year, "Freshman", "Sophomore", "Junior")
  ) %>% 
  # Total units per quarter
  count(class_year, date, wt = units) %>% 
  ggplot(aes(date, n, group = class_year, color = class_year)) +
  # Dashed reference line at the median quarterly load
  geom_hline(aes(yintercept = median(n)), size = .3, lty = 2, color = text_color) +
  geom_line() + 
  geom_point() + 
  scale_y_continuous(
    breaks = c(5L, 10L, 15L, 20L),
    labels = c(5L, 10L, 15L, 20L)
  ) +
  scale_color_manual(values = custom_palette[c(1:3, 5)]) +
  custom_theme + 
  labs(
    color = NULL,
    x = NULL,
    y = "Course units",
    title = "Course load by quarter"
  ) 
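The quarter-to-date and class-year logic in this block is repeated verbatim in the GPA plot below. One way to avoid the duplication is a small lookup helper; the `term_info()` sketch here is hypothetical and written in base R. It assumes quarter labels always look like "Fall 2014", hard-codes the same start months (October, January, April) as the code above, and uses `first_fall = 2014` to mirror the original `case_when()`, which treats 2014-2015 as freshman year.

```r
# Hypothetical helper: turn labels like "Fall 2014" into plotting fields.
# Assumes every label is "<Season> <Year>" with Season in Fall/Winter/Spring.
term_info <- function(term, first_fall = 2014L) {
  parts <- strsplit(term, " ", fixed = TRUE)
  season <- vapply(parts, `[`, character(1), 1)
  year <- as.integer(vapply(parts, `[`, character(1), 2))
  # First month of each quarter, as in the case_when() above
  month <- unname(c(Fall = 10L, Winter = 1L, Spring = 4L)[season])
  # Fall opens an academic year; Winter and Spring close the one that
  # started in the previous calendar year
  start_year <- ifelse(season == "Fall", year, year - 1L)
  class_levels <- c("Freshman", "Sophomore", "Junior", "Senior")
  # Anything past the fourth year is "Senior", like TRUE ~ "Senior"
  class_year <- class_levels[pmin(start_year - first_fall + 1L, 4L)]
  data.frame(
    term = term,
    season = season,
    date = as.Date(sprintf("%d-%02d-01", year, month)),
    ac_year = paste(start_year, start_year + 1L, sep = "-"),
    class_year = factor(class_year, levels = class_levels),
    stringsAsFactors = FALSE
  )
}

term_info(c("Fall 2014", "Winter 2016", "Spring 2018"))
```

With a helper like this, `left_join(data_geo, term_info(unique(data_geo$quarter)), by = c("quarter" = "term"))` would attach `date`, `ac_year`, and `class_year` to every course in one step.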

You can also see what deciding to study statistics and data science in my sophomore fall did to my GPA.

data_geo %>% 
  group_by(quarter) %>% 
  summarise(
    # weighted.mean() takes its weights as `w` (a `wt =` argument is
    # silently ignored); "P" courses have gpa = NA and are dropped
    qtr_gpa = weighted.mean(gpa, w = units, na.rm = TRUE),
    # Total units that quarter, used for point size
    n = sum(units)
  ) %>% 
  separate(quarter, into = c("quarter", "year")) %>% 
  mutate(
    month = case_when(
      quarter == "Fall" ~ 10,
      quarter == "Winter" ~ 1,
      quarter == "Spring" ~ 4
    ),
    date = str_c(1, month, year, sep = "/") %>% lubridate::dmy(),
    ac_year = case_when(
      quarter == "Fall" ~ str_c(year, as.character(as.numeric(year) + 1), sep = "-"),
      TRUE ~ str_c(as.character(as.numeric(year) - 1), year, sep = "-")
    ),
    class_year = case_when(
      ac_year == "2014-2015" ~ "Freshman",
      ac_year == "2015-2016" ~ "Sophomore",
      ac_year == "2016-2017" ~ "Junior",
      TRUE ~ "Senior"
    ),
    class_year = fct_relevel(class_year, "Freshman", "Sophomore", "Junior")
  ) %>% 
  # summarise() returns quarters in alphabetical order ("Fall 2014",
  # "Fall 2015", ...), so sort by date before taking the running mean
  arrange(date) %>% 
  mutate(gpa = cummean(qtr_gpa)) %>% 
  ggplot(aes(date, gpa, color = class_year)) + 
  geom_step(lty = 2, size = .3, color = text_color) + 
  geom_point(aes(y = qtr_gpa, size = n)) + 
  guides(size = guide_legend(override.aes = list(color = text_color))) +
  scale_color_manual(values = custom_palette[c(1:3, 5)]) +
  custom_theme + 
  labs(
    color = "Year",
    size = "Units",
    title = "GPA progression by quarter",
    y = "GPA (Four point scale)",
    x = NULL
  )
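The running GPA in the plot above comes from `cummean()`, which weights every quarter equally, so a light quarter moves the line as much as a 20-unit one. Assuming the usual definition of a GPA as total grade points over total graded units, the running value would instead weight each quarter by its graded units. The sketch below compares the two on made-up numbers for three quarters (illustrative values, not real data).

```r
# Made-up quarterly GPAs and graded units (credit/no-credit units excluded)
qtr_gpa <- c(4.0, 3.4, 3.7)
graded_units <- c(10, 20, 15)

# Equal weight per quarter: the same thing dplyr::cummean() computes
running_mean <- cumsum(qtr_gpa) / seq_along(qtr_gpa)

# Weight per quarter proportional to graded units:
# (sum of gpa * units so far) / (sum of units so far)
running_weighted <- cumsum(qtr_gpa * graded_units) / cumsum(graded_units)

round(running_mean, 3)      # [1] 4.0 3.7 3.7
round(running_weighted, 3)  # [1] 4.000 3.600 3.633
```

Using the weighted version in the real pipeline would mean recording graded units per quarter as well (for example `sum(units[!is.na(gpa)])`), since the `n` used for point size also counts credit/no-credit units.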

And here’s my take on a visual transcript, using the lovely viridis color palette (“P” means I passed a credit/no-credit course, which didn’t count toward my GPA).

data_geo %>% 
  mutate( 
    # Chronological order, reversed so the first quarter sits at the top
    quarter = fct_relevel(
      quarter,
      "Fall 2014", "Winter 2015", "Spring 2015",
      "Fall 2015", "Winter 2016", "Spring 2016",
      "Fall 2016", "Winter 2017", "Spring 2017",
      "Fall 2017", "Winter 2018", "Spring 2018"
    ) %>% 
      fct_rev(),
    grade = fct_relevel(
      grade, "A+", "A", "A-", "B+", "B", "P"
    )
  ) %>% 
  # Each course's position within its quarter becomes its column
  group_by(quarter) %>% 
  mutate(
    row_num = row_number()
  ) %>% 
  ggplot(aes(row_num, quarter, fill = grade)) + 
  geom_tile() + 
  viridis::scale_fill_viridis(discrete = TRUE, direction = -1) + 
  custom_theme + 
  theme(axis.text.x = element_blank()) + 
  labs(
    x = NULL, 
    y = NULL,
    fill = NULL,
    title = "Quarter-by-quarter performance"
  )

Despite my occasional underperformance, I enjoyed most of my courses. One of the things I loved most about Stanford as an undergrad was the freedom to design my schedule each quarter: there were very few restrictions on what classes I could take, and political science had relatively few requirements as well. Here you can see which fields and departments I spent most of my time exploring.

data_geo %>% 
  # Department code is the part of the course number before the space
  separate(course, into = c("dept", "num"), sep = " ") %>% 
  count(dept, wt = units) %>% 
  mutate(
    discipline = case_when(
      dept %in% c("STATS", "CS", "MATH", "CEE", "ENGR") ~ "STEM",
      dept %in% c("POLISCI", "DDRL", "PSYC", "ECON") ~ "Social Sciences",
      TRUE ~ "Humanities"
    )
  ) %>% 
  ggplot(aes(reorder(dept, n), n, fill = discipline)) + 
  geom_col() + 
  scale_y_continuous(
    breaks = seq(0, 50, by = 10),
    labels = seq(0, 50, by = 10)
  ) +
  scale_fill_manual(values = custom_palette) +
  coord_flip() + 
  custom_theme + 
  theme(legend.position = c(.75, .6)) + 
  labs(
    x = NULL,
    y = NULL,
    fill = NULL,
    title = "Distribution of units across fields and departments"
  )
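To put numbers on the chart, the same department-to-field mapping can also be summarized as shares of total units. Here is a rough base R sketch; the `discipline_of()` helper is hypothetical (it reuses the department lists from the `case_when()` above), and the course list is made up for illustration.

```r
# Hypothetical helper mirroring the case_when() above: listed departments
# map to STEM or Social Sciences; everything else falls through to Humanities
discipline_of <- function(dept) {
  stem <- c("STATS", "CS", "MATH", "CEE", "ENGR")
  social <- c("POLISCI", "DDRL", "PSYC", "ECON")
  ifelse(dept %in% stem, "STEM",
         ifelse(dept %in% social, "Social Sciences", "Humanities"))
}

# Made-up courses standing in for data_geo
courses <- data.frame(
  course = c("STATS 116", "POLISCI 1", "HISTORY 1", "CS 106A"),
  units = c(5, 5, 5, 5)
)

# Department code is everything before the first space, like separate() above
dept <- sub(" .*$", "", courses$course)
units_by_field <- tapply(courses$units, discipline_of(dept), sum)

# Percentage of all units in each field
shares <- round(100 * units_by_field / sum(units_by_field), 1)
shares
```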

Conclusion

I really enjoyed putting this together, and I think it genuinely helped me see these past four years in a new light. Of course, there’s a lot about my time at Stanford that doesn’t fit into a spreadsheet, but even the limited things that do fit have helped me reflect more thoughtfully on my college experience.

While this was a very personal project, I think the lesson is universal: if there’s data in your life that you think might be meaningful, take the time to write it down! You never know what you might find.