D2M Syllabus

From Data to Manuscript (D2M) in R

Winter 2024: CHDV 20550/30550, MACS 30550, MAPS 30550, PSYC 20550/30550

Overview

Description

This course tackles the basic skills needed to build an integrated research report with the R programming language. We will cover every step from data to manuscript including: Using R’s libraries to clean up and re-format messy datasets, preparing data sets for analysis, running statistical tools, generating clear and attractive figures and tables, and knitting those bits of code together with your manuscript writing. The result will be a reproducible, open-science friendly report that you can easily update after finishing data collection or receiving comments from readers. Never copy-paste your way through a table again! The R universe is large, so this course will focus specifically on: The core R libraries, the tidyverse library, and R Markdown. Students will also learn about the use of GitHub for version control.

Weekly meetings:

  • Tuesdays and Thursdays 11am - 12:20pm
  • 1155 E 60th St, Room 289B

Instructor:

  • Dr. Natalie Dowling
  • Email: ndowling@uchicago.edu
  • GitHub: @nrdowling
  • Office: 1155 Building, Room 404
  • Office hours: Tues/Thurs 4pm - 5pm - sign up via GCal

Teaching assistant:

  • Grace Huang
  • Email: ysh@uchicago.edu
  • GitHub: @ysh627
  • Office: Green 208
  • Office hours: Wednesdays 12-1:30pm (email for alternative appointment)

Hubs:

  • Course site & syllabus (you are here!)
  • Piazza
  • Canvas
  • Example repo: schelling-games
    • Duplicate this repo to poke around a (mostly) functional GitHub Pages site created entirely using R Markdown. Most of the .Rmd and .R files include extensive commenting, including some ideas for how you can practice coding by revising or adding to those R scripts directly. (I recommend duplicating rather than mirroring. Set up a new repo via “import” using the same URL copied from the green code tab. You’ll have a completely different repo – no need to worry about messing up the original. If you just clone it, your copy will still be attached to the original, and you won’t be able to push your changes because you aren’t a collaborator.)
  • Class repo: d2m-2024
    • Begins empty at the start of the quarter. Will be updated with in-class demos and data as we go.

Course materials

Dataset

Nearly all assignments in this course will involve working with a dataset of your choice. We will begin with setting up a repo for your data and R scripts, then move to tidying the data, then to analyzing and plotting the data, and finally to interpreting the results based on those data. In other words, having an appropriate dataset as early as possible is essential. You will get the most out of this class if you work with your own data for a project you will continue to be invested in after the end of the quarter, like a BA or MA thesis, grant proposal, or manuscript intended for publication. If you are not currently working on this type of project, you should consider taking this class next year (or whenever you do have data). Whether or not you use your own data, you are responsible for providing your dataset. Read the dataset selection guidelines for more information.

Grading

Students enrolled in this course will be graded on the following basis:

  • Short assignments: 60% (graded complete/incomplete, 3% each)
  • Scientific report: 20% (including 2 ungraded preparatory tasks)
  • Participation: 20% (including self-assessment)

Assignments

There is a short assignment for each lecture. Each assignment involves working with your own data, data provided for that assignment, or both, and is designed to give you hands-on practice with R. Assignments typically begin in class and may be completed after class; each one is highlighted in orange and graded on a pass/fail basis. Students must push their completed assignment to the appropriate GitHub repository, tagging Dr. Dowling (@nrdowling) and Grace (@ysh627) with the assignment number so that they receive a notification. There are 20 assignments total, each worth 3% of your final grade.

Assignment details are included at the end of the lecture slides.

When are the assignments due? We will usually start assignments in class. Anything you don’t finish during class time should be completed before the start of the next class meeting. If an assignment is listed under “pre-class preparation,” it should (hopefully unsurprisingly) be completed in preparation for the start of that class.

Need help? We will do much of this work in class, so ask your questions while we’re together. Each student will be assigned to a support group of other students to help answer questions (in addition to help from Dr. Dowling and Grace during office hours). A large part of this course is developing the search skills you need to debug your code: look for advice and solutions from others who have faced similar problems on sites like Stack Overflow and GitHub! Week 2 slides will include a formal procedure to follow for troubleshooting help.

Scientific report

Your final assignment will be a scientific report in R Markdown that is developed via a GitHub repository and includes:

  1. data read-in, pre-processing, and analysis
  2. at least 2 figures and 1 table with captions and in-text references
  3. at least 2 in-text R code references
  4. BibTeX citations
  5. at least 1500 words of the manuscript in at least four subsections (Introduction, Methods, Results, Discussion; unless otherwise agreed with Dr. Dowling).

Additional details about report requirements will be provided throughout the quarter.

Items on the syllabus in green indicate preparatory tasks for your final report: receiving approval for your dataset (by Tuesday of Week 2) and negotiating a manuscript plan (by Friday of Week 7). Each requires you to submit something informal via Canvas. These tasks are not graded, but grading for your final report will be contingent on completing both on time. Failing to meet with me for a report meeting will result in an automatic 10 points off your final report grade.

Participation

Students earn participation credit through their attendance and participation in their support group and discussion of course content in office hours. Students are generally expected to come to class prepared to practice using R together. That means you should come with your charged laptop, prepared with any data you are using and any installed software that is required (see the ‘preparation’ note for each class). Be prepared to share your screen with your support group and with the instructors. That said, please stay in touch about your limitations regarding in-person participation and/or technology access.

This grade is not an attendance grade. It is much more important that you care for your mental and physical health and the health of those around you than it is to check the “came to class” box. Please do not come to class if you are ill, but DO contact Dr. Dowling and Grace at least 24 hours in advance of an absence, when possible. Missing classes without notifying us may result in a lower participation grade. If your attendance is low enough to impact your work (with or without notifying us) you will need to meet with Dr. Dowling to discuss how it may affect your grade.

Self-assessment

At the end of the quarter you will write a brief reflection (1-2 paragraphs) on your participation in this class and assign yourself a grade (out of 20).

Your self-assigned grade may or may not end up as your final participation grade, but it will be strongly taken into consideration.

Grading

Participation grades will be posted to this assignment at the same time your final scientific report is graded. Once assigned, participation grades are final. If you are concerned about your grade, reach out to Dr. Dowling or Grace before the end of the quarter; either can give you guidance on ways to increase this part of your grade.

Late work & attendance

Life happens. Family emergencies, exams for other courses, COVID-19, roommates in crisis…

If you need an extension, just ask. Send an email to Dr. Dowling (cc Grace) as far in advance as possible explaining what’s going on (vague!) and how much extra time you need (specific). It is never necessary to provide details about your physical or mental health. You are always welcome to have an open discussion about what you’re dealing with in office hours, but in your emails about absences it is more than sufficient to just say you’re sick or having a personal emergency.

With prior notice and instructor approval, late work will not be penalized. Students who register for the class after one or more assignments were already due should contact Dr. Dowling to arrange an extended deadline.

Without notice and/or approval, or after the agreed upon extended deadline:

  1. Short assignments will be graded as incomplete, receiving 0 (of 3) points.
  2. Final scientific reports will lose 1 (of 20) points per 12 hours late.

If you will be missing class for any reason, email both Dr. Dowling and Grace before the start of class. Please do not come to class if you are feeling at all ill. You do not need to be specific about why you will be absent, but please do give a guess as to when you’ll be back in class and whether you’ll be able to work remotely while you’re out.

Stressed out by emailing professors? Here are some good examples of what is helpful for your instructors.

I have exams in two other classes 5th week. Is it ok if I take an extra 48 hours to turn in Assignment 5?

I just found out I need to quarantine after a COVID exposure. No positive test as of now, so I’ll plan on missing classes Tuesday and Thursday but completing the assignments at home. I’ll let you know if anything changes.

I have a family emergency and need to travel home. I won’t be able to work while I’m home and I’m not sure how long I’ll need to be away. Can we meet over Zoom to make a plan to handle late assignments and absences?

Course schedule

This schedule is subject to change. Refer to this page for syllabus updates (including new and updated links, files, etc.) throughout the quarter. Slides will typically be posted the morning of class.

For each day of class the schedule below includes an overview of what we’ll cover in class and any key information you’ll need for that day.

The accordion boxes contain detailed information about all tasks you should complete. These boxes will describe things you need to do both before and after that class, so you should always read details at least one class ahead. Tasks are color coded:

  • Ungraded assignments, preparatory tasks, readings, and resources are in burgundy. Required readings are marked with an asterisk and should be completed before the start of class.
  • Graded assignments due before the start of next class (unless otherwise noted) are in orange.
  • Preparatory tasks for the final project are in green.
  • Extra information necessary for that class period is in blue.

Module 1: Fundamentals of R and GitHub

Thursday, January 4 2024 (1.1+1.2)

Pre-quarter preparation

Because Winter 2024 begins on a Wednesday, we will not meet for the first lecture (which would otherwise be Tuesday Jan 2). It is critical that you come to the first class meeting on Thursday January 4th having completed the getting started tasks – start by watching the short recorded lecture! – and Assignment 1.

Class meeting

slides

  • Introductions & Course Structure
  • Lecture: R Scripts & Packages, and Markdown
  • Groups: Assign support groups & discuss datasets
  • Assignment: (2) Initialize a papaja .Rmd

Submit your dataset for approval by Friday Jan 5 at 11:59pm. This should be done via email, not GitHub. Friday is the absolute deadline, but the earlier the better! After you receive approval, complete the “Select dataset and get approval” assignment on Canvas to receive completion credit.

Week 1 Details

  • Set up your GitHub repo w/ RStudio
    • Create a local R project by cloning a GitHub repo
    • Invite @nrdowling (Natalie) and @ysh627 (Grace) as collaborators to your repo
      • If you intend to use a private repo with protected data, invite collaborators now, but wait on uploading the data.
      • Once you are assigned to a support group you will also invite your group members as collaborators.
    • Make 1+ change to the README, commit changes with an informative message, and push to GitHub
    • REVERSE a change on GitHub, pull to your local project/repo, and review the changes
    • Create a GitHub issue (not a commit!) with the text:
    Title: Assignment 1.1
    Body: @nrdowling @ysh627 I have completed the following:
    [x] I have invited you both as repo collaborators
    [x] I have cloned this repo to create an RStudio Project
    [x] I have made at least 1 commit
  • Install the tidyverse and papaja packages
  • Create an R script file (.R) that:
    • Loads the tidyverse and papaja packages
    • REQUIRES 1+ other package of your choice
    • Includes 1+ lines of commented text
    • Assigns 1+ numeric variable and 1+ string/character variable
    • BONUS: Create a dataframe
  • Create an R Notebook with the Papaja template
    • Change the title and author fields at the top
    • source() your R script in the first code chunk
  • FOR THIS AND ALL GRADED ASSIGNMENTS: Commit (with an informative message) and push to GitHub tagging Dr. Dowling and Grace
  • Ungraded final step (no submission): knit .Rmd to Word doc or PDF and troubleshoot as needed
  • Above-and-beyond: Experiment with creating .Rmd files without papaja
    • We’ll only use papaja in this class, but knowing how to make use of .Rmds with other outputs might be very useful for you, for example if you want to make an HTML page (like this one!) or produce manuscripts in non-APA styles.
  • Papaja installation is trickier than other packages for a few reasons:
    • UPDATE: Papaja is now available through CRAN! Hooray! (It previously was not, and so could not be installed with the install.packages() function.) That means you can install it just like anything else, with the function install.packages("papaja"). These directions for installing it directly from the source will still work, but the CRAN installation should be much simpler.
    • It requires manually setting up a TeX distribution – like TinyTex (recommended), MikTeX, MacTeX, or TeX Live – which can have their own installation issues.
    • Because the installation is a relatively involved process, it can be difficult to determine exactly where in the process errors are occurring.
  • So what are you supposed to do about it?
    • Follow the installation guide in papaja’s detailed documentation. I recommend starting with this guide rather than going to it when you hit a problem. Past students have encountered an issue where once their TeX distribution was installed in an incompatible way, they couldn’t undo it.
    • If it doesn’t install smoothly, set a timer for 1 hour and do your best to troubleshoot the issue. If you can’t fix it in that time, make notes on what you’ve done so far and what you think might still be going wrong and then stop messing with it.
  • Adjust your assignment submission as follows:
    • Create an .Rmd with the default Notebook template (& update the YAML title field)
    • Include the line of code that would load the papaja package, then comment out that line
    • Add your notes from above about the state of the installation
  • Review the dataset guidelines
  • Once you are confident your dataset meets the requirements, request approval via email by Friday Jan 5 at 11:59pm:
    • Send the email to Dr. Dowling (ndowling @ uchicago.edu) and cc Grace (ysh @ uchicago.edu)
    • Subject line should be: (Your first & last name) Dataset Selection
  • Your email should include:
    • A brief (2-4 sentences) description of your data
    • The data itself, in whole or part; this may be an attached .csv file, a link to a shared document, or instructions for where to find the data in the GitHub repo you have already invited us to
      • If you are unable to share the data with us via email (due to IRB limitations or anonymization concerns), explain this issue and give a tentative plan for how you will get the data into a sharable state (i.e., any data necessary to run your analyses is included in a private GitHub repo) within one week
    • At least one research question you tentatively plan to ask that can be answered with this dataset in the state it is currently in
      • Specifically, your question should be answerable with an outcome variable in your data right now
      • It’s ok if it’s a boring question, and it’s ok if you do not end up answering this question in your final project
  • Once you receive a reply approving your data from either Dr. Dowling or Grace, complete the Canvas assignment to receive completion credit
    • Due before class on Tuesday Jan 9th 11am, meaning you will need to have gotten approval over the weekend
    • In the text box, include a brief but descriptive name for your dataset and the sender (Dr. Dowling/Grace) and timestamp of the approval email
  • NOTE: If you received approval for your dataset when you requested enrollment consent, you do not need to request it again. Simply complete the Canvas assignment to receive credit.

Tuesday, January 9 2024 (2.1)

Pre-class preparation

Take the weekend to catch up on any pre-quarter preparation tasks not yet completed. It is critical that you are fully caught up by today or you will not be able to follow along with demos and assignments. Additionally, confirm that your dataset approval is marked as complete on Canvas (contact Dr. Dowling if it is not marked as complete by start of class).

Class meeting

slides

  • Lecture: Best Practices & Troubleshooting
  • Assignment: (3) Create and edit your README.md and .gitignore files

Week 2 / Class 1 Details

  • Part 1: README
    • Create a README.md file in the top level of your repo if it doesn’t exist
    • Edit your README to include:
      • A brief description of your data
      • A brief overview of your planned project
      • A real or hypothetical file tree
        • Format of your choosing
        • This can be extremely simple, but should demonstrate you’ve thought about how you plan to keep your repo organized.
    • At a minimum, your README should include the following markdown elements:
      • 1 header (any level)
      • bold, italics, or strikethrough
  • Part 2: .gitignore
    • Create a .gitignore file in the top level of your repo if it does not exist
      • You can start with GitHub’s default R .gitignore template
    • Edit the .gitignore to include:
      • A dedicated directory called localonly
      • All files of specific types that you want on your computer but don’t want to upload to GitHub
      • One specific file that is not in an ignored folder and is not an ignored filetype
    • Add a comment for each

NOTE: A primary goal of this assignment is for you to practice answering your own questions. We have not formally covered either of these files in detail. It’s your job to figure out the point and the implementation of each. There will be a slide in the W2C2 lecture with “verbose” instructions for the .gitignore. Do your best at this point, then make adjustments as needed after next class.

Thursday, January 11 2024 (2.2)

Pre-class preparation
  • Confirm tidyverse and papaja packages are successfully installed and load without errors.
    • Remember that papaja in particular often requires troubleshooting!
  • Read:
    • Intr2R Chapters 7 & 8
    • R Markdown Definitive Guide Sections 2.5, 2.6, & 3.2
Class meeting

slides

  • Lecture: Fundamentals of R & Markdown
  • Assignments:
    • (4) Create hello_world() function
    • (5) Update .Rmd YAML, markdown, and code chunk
    • Note that these are 2 separate assignments! You will reference the .R file you create in Assignment 4 in your Assignment 5 .Rmd file, but they will count as two assignment credits because each will be as much effort as the single assignment most other days. The hello_world() function in (4) may be particularly challenging if you are brand new to programming. Plan accordingly!!

Week 2 / Class 2 Details

  • Create a hello_world() function in an .R script dedicated to defining functions
    • This should be the script you eventually source in your .Rmd with project-specific functions (if necessary)
  • Your function should include:
    • 1+ object assignment
    • 1+ conditional statement
    • CHALLENGE: 1+ for or while loop
  • Your function should take at least 1 argument, such as:
    • name (string)
    • time_of_day (numeric or POSIX)
    • is_morning (boolean)
    • return_n_greetings (integer)
  • Your function should have at least 2 possible return values, such as:
    • hello, class
    • Good morning, Dr. Dowling!
    • Sup?
    • bonjour mes amis
  • Feel free to get creative here! This will feel less tedious if you can make yourself laugh while you do it. (It will make grading more enjoyable, too.)
  • Update the YAML header of your .Rmd, minimally:
    • Title
    • Short title
    • Author(s)
  • Call your hello_world() function within your .Rmd:
    • source() the R script where you defined the function
    • Add a chunk that calls the function
    • Don’t forget to follow best practices for naming and placing chunks!
  • Add/edit markdown to include at least (lorem ipsum is ok):
    • 1st, 2nd, & 3rd headers (1 of each)
    • 1 unordered list & 1 ordered list
    • Bold & italicized text
    • 1 linked URL
    • 1 HTML-style comment
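
As a starting point for Assignment (4), here is a minimal sketch of what a hello_world() function could look like. The argument names come from the examples in the list above; the default values and the exact greetings are placeholders chosen for illustration, not a required solution.

```r
# Hypothetical sketch: argument names follow the examples listed above,
# and the greetings and defaults are placeholders.
hello_world <- function(name = "class", is_morning = FALSE, return_n_greetings = 1L) {
  # Object assignment inside a conditional statement
  if (is_morning) {
    greeting <- paste0("Good morning, ", name, "!")
  } else {
    greeting <- paste0("hello, ", name)
  }

  # Challenge item: a for loop that repeats the greeting
  greetings <- character(0)
  for (i in seq_len(return_n_greetings)) {
    greetings <- c(greetings, greeting)
  }

  greetings
}

hello_world("Dr. Dowling", is_morning = TRUE)
hello_world(return_n_greetings = 2L)
```

If you save this in your functions script, you can source() that script in your .Rmd and call hello_world() from a named chunk.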

Module 2: The Tidyverse

Tuesday, January 16 2024 (3.1)

Pre-class preparation
  • Read chapters 7 (data import) and 3 (data transformation) in r4ds.
  • Download and preview the cheatsheets for readr and dplyr.
    • Grab the ones for ggplot2, tidyr, forcats, and stringr if you want to look ahead!
  • Ensure that your data - or a portion of it - is available in a tabular format (like a .csv file or 1 sheet of an Excel file or Google Sheets document)
  • Have RStudio and your tabular data file available to work with at the start of class!
Class meeting

slides

  • Lecture: Welcome to the Tidyverse
  • Assignment: (6) Data read-in and -out, tidy data evaluation

Week 3 / Class 1 Details

  • Create (or update) a code chunk in your .Rmd for data read-in
    • Use readr to read in your data (and any supplementary data files you may have)
    • Don’t forget to give your chunk a unique and informative name!
  • Examine your dataset in RStudio. Answer these questions with comments in your read-in chunk
    • Is it tidy? Remember that “tidy” means:
      • Each column is a single variable
      • Each row is a single observation
      • Each cell is a single measurement
    • If not, what data wrangling is necessary to tidy it?
    • Which variables, observations, and measurements in your data are absolutely necessary for your plans? Which are (likely) extraneous?
  • Create (and name!) a code chunk to create an intermediate dataset
    • Use 2+ dplyr functions
    • Use readr to write your intermediate dataset to a .csv or other delimited text file (readr does not write .xlsx files, so that format requires a separate package)
  • Examine your new intermediate data file in Excel (or similar)
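
The read-in, wrangle, and write-out steps above might look roughly like the sketch below. The column names and values are invented stand-ins, and the data is written to a temporary file only so the example runs on any machine; in your project you would point read_csv() at your own file instead.

```r
library(readr)
library(dplyr)

# Invented stand-in data, written to a temporary file so the sketch runs anywhere
raw_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    subject_id = c(1, 2, 3, 4),
    age_months = c(14, 18, NA, 22),
    n_gestures = c(5, 9, 7, 12),
    notes      = c("ok", "fussy", "ok", "ok")
  ),
  raw_path
)

# Read-in with readr
raw_data <- read_csv(raw_path, show_col_types = FALSE)

# Intermediate dataset built with two dplyr functions:
# drop rows missing age, then keep only the columns needed later
intermediate <- raw_data %>%
  filter(!is.na(age_months)) %>%
  select(subject_id, age_months, n_gestures)

# Write-out with readr
out_path <- tempfile(fileext = ".csv")
write_csv(intermediate, out_path)
```

In an .Rmd, the read-in and the intermediate-dataset steps would each live in their own named chunk.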

Thursday, January 18 2024 (3.2)

Pre-class preparation
  • Review the cheatsheets for core tidyverse packages
  • Read chapters 5, 14, & 16 (& optionally ch15) in r4ds
  • Pull the most recent version of the d2m-2024 repo to try out the tidyverse functions demo code in demo-snippets.R.
Class meeting

slides

  • Lecture: Data Preparation in the Tidyverse
  • Assignment: (7) Mock up a preliminary dataset & start data prep

Week 3 / Class 2 Details

  • Conceptualize the dataset you’ll need to create (this is just to get started, not a submission). Some ideas (not tasks!) to help get you thinking:
    • Draw out a couple mock figures you’d like to include and/or articulate your research questions and hypotheses
    • How will you need to organize your variables (and within-variable groups) to produce those figures and/or analyses? What data types will your variables need to be? Does anything that currently looks like string, numeric, or logical data need to be handled as a factor?
  • Mock up a preliminary dataset & begin prep - PUSH TO YOUR REPO:
    • A tabular mock-up of your ideal data structure. Create this in Excel (or similar) either by manipulating your actual data manually or creating fully mock data
      • You can do this in R if you want, but the goal is more to get a solid concept of data structure rather than actually executing the data wrangling
    • Building on or replacing the code you wrote in assignment (6), create an .R script called data-prep.R:
      • Use comments to write out in plain English the prep steps you expect to carry out in order to make your existing data look like your mock-up, including data import, wrangling, and exporting intermediate datasets
      • Make note of anything that you know you’ll need to do conceptually but don’t yet know how to execute in R
      • Begin to fill in the parts of your data prep you feel comfortable starting, e.g., import/export, filtering/selecting, object assignment

Tuesday, January 23 2024 (4.1)

Pre-class preparation
  • No new material this week! We’ll walk through a data-prep demo and work in groups on Assignment (8) (and your own data prep time permitting).
    • Tiny bit of new material: combining data with binds and joins
  • Come prepared to make the most of your time by being thoughtful and thorough in completing Thursday’s assignment (7) & having quick access to the tidyverse documentation and cheatsheets from last week
  • Office hours appointments for the required final report planning meetings begin this week and continue through Friday of 7th week (Feb 16). Book a 20 minute slot here. See box below for more details.
Class meeting

slides

  • Demo: Data Manipulation in the Tidyverse
    • Plus super-quick overview of combining data in R
  • Assignments:
    • (8) Recreate this dataset
    • Continue your data preparation

Week 4 / Class 1 Details

  • Office hours appointments for the required final report planning meetings begin this week and continue through Friday of 7th week (Feb 16).
  • Book a 20 minute slot here. See box below for more details.
  • The intention of these meetings is to set clear expectations for your final project, based both on general course requirements and your specific project goals. You should come prepared to discuss your project’s aims and current status, but it is not necessary to bring in any concrete work. Please sign up for regular office hours outside these appointments if you’d like more hands-on support for your project.
  • Note that this appointment schedule includes literally all my availability. Do not delay booking an appointment, as I will not be able to accommodate requests to meet outside these times.
  • Recreate the modified starwars dataset:
    • Copy the file starwars.R in the d2m-2024 repo into your own project repo
    • Follow the instructions in the file to import and examine the “goal” tibble
    • Manipulate the built-in starwars tibble to create a new object called sw.wrangled
      • When you type sw.wrangled into the console, you should see exactly what’s in the image: d2m-2024/images/starwars-goal.png
      • Don’t forget to check number of rows and column data types
    • If you can’t figure everything out, add comments explaining what you would like to do but aren’t sure how

Continue prepping your data! We will finish data wrangling on Thursday. Your data should be fully prepped and ready for analysis and visualization in the tidyverse by Week 5.

Thursday, January 25 2024 (4.2)

Pre-class preparation
  • No new material! Aside from a brief discussion about final projects, the whole class will be time to work on data preparation with support from your group and instructors. Review and adjust your plans from assignment (7) ahead of time to make the best use of your time.
Class meeting
  • Brief discussion of final report requirements
  • Lab: Data Manipulation in the Tidyverse
  • Assignments:
    • (9) Finish prepping your data!
      • Before your final commit, be sure that your scripts include sufficient and informative comments
    • (10) Imagine two plots you would like to see of your data. Add comment blocks into your .Rmd describing (in plain English, not code) what elements will be necessary to create each plot.

Week 4 / Class 2 Details

Finish preparing your data for analysis and presentation! Use tidyverse tools to turn your actual data into the mock-up structure you created in Assignment (7). Be sure your data frame(s) are in a tidy, long format. In particular, as you imagine plots you’d like to create in Assignment (10), be sure that any measures you want to include on the x or y axes are each in one column and that any groups you might want to compare against each other are in one column.

For example: you want to compare the number of gestures used by parents and children. Your original wide, untidy dataset includes 2 measurement columns: parent_gestures and child_gestures. Your long, tidy dataset has one speaker column with values parent and child and one n_gestures column with the number of gestures produced by that subject at that observation.
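
One way to do this wide-to-long reshaping is with tidyr's pivot_longer(). The sketch below uses the column names from the gesture example; the family_id column and all of the values are made up for illustration.

```r
library(tidyr)

# Made-up wide data: one row per family, one measurement column per speaker
wide <- data.frame(
  family_id       = c(1, 2),
  parent_gestures = c(10, 14),
  child_gestures  = c(6, 3)
)

# Reshape so that speaker and n_gestures each get a single column
long <- pivot_longer(
  wide,
  cols          = c(parent_gestures, child_gestures),
  names_to      = "speaker",
  names_pattern = "(.*)_gestures",  # keep only "parent" / "child"
  values_to     = "n_gestures"
)

long
```

The long version has one row per family per speaker, so speaker can go straight onto a plot's x axis or be used as a grouping variable.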

  • Mock up 2 plots (of 2 different types) you could include in your final report
    • Add code chunks to your .Rmd for each plot and give each a unique, informative name
    • Add plain-English comments within your chunks explaining the goal for the plot. To start thinking about this, ask yourself questions like:
      • What variables are on the axis/axes?
      • What kind of plot is it?
      • Will you need to compare across groups?
      • What would you anticipate seeing if your hypotheses were borne out in your analyses?
  • Create a visual of each mock-up that renders in your .Rmd when you knit. This can be either:
    • A first attempt at the ggplot code itself if you are already familiar with ggplot
    • .jpg/.gif/.png files created outside R (e.g., in Excel, in MS paint, sketched by hand or on an iPad) and called in the text below the corresponding code chunk using R markdown
      • Look back at the slides or the R Markdown cheatsheet if you don’t remember how to insert an image file!
  • Due by 5pm Friday Week 7: Feb 16, 2024
  • After your report plan meeting with Dr. Dowling, write up a brief summary of what you discussed and submit to the “Report plan” assignment on Canvas. Your write-up should include:
    • The specific plans for your analysis, figure, and table chunks
    • A general description for how you will distribute the 1500+ words across 4 sections
    • At least 2 elements of code produced in code chunks that you can reference in text (e.g., a p-value from the model you run in your analysis chunk, a value within the table you created in the table chunk, a summary value like mean or median that you calculated and stored as a variable)
    • 2-4 sentences explaining your personal goals for the final report. What do you want to have accomplished or learned by the time you submit your report at the end of the quarter?
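
For the in-text code references mentioned in the report plan, a common pattern is to store a summary value as a named object in a chunk and then call it with inline R code in your prose. The sketch below uses made-up numbers and hypothetical object names.

```r
# Made-up values standing in for a column of your own data
n_gestures <- c(10, 6, 14, 3)

# Store summary values as named objects in a code chunk...
mean_gestures   <- mean(n_gestures)
median_gestures <- median(n_gestures)

# ...then reference them in the manuscript text with inline R code, e.g.:
#   The mean number of gestures was `r round(mean_gestures, 2)`.
round(mean_gestures, 2)
```

Because the number is computed when you knit, the sentence updates itself whenever the underlying data changes.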

Module 3: Presenting Data

Tuesday, January 30 2024 (5.1)

Pre-class preparation
  • Remember to book your final report meeting with Dr. Dowling and submit the meeting summary to Canvas (see details in W4C2 box above)
  • Read chapters on data visualization and ggplot in r4ds: 1* and 9, 10, 11*
  • Review the ggplot2 cheatsheet
Class meeting

slides

  • Walkthrough of Assignment (8) Recreate this dataset
    • Follow along with script in d2m-2024 repo
  • Lecture: Introduction to Data Visualization and ggplot2
  • Assignment: (11) Create basic plots (2 parts: recreate these plots & add 1 plot to .Rmd)
    • Submit BOTH to receive credit for Assignment (11)! Commit your final submission of both at the same time in just one commit

Week 5 / Class 1 Details

  • submit BOTH tasks to receive credit for Assignment (11)! Commit your final submission of both at the same time in just one commit
  • Part 1: Recreate these plots (basic)
    • In your existing starwars.R file from Assignment (8), replicate the following three plots using dplyr and ggplot2 functions (these can be found in the d2m-2024 repo)
    • You do not need to save these as image files
    • If plotting the dataset you created in Assignment (8) isn’t working as expected, reproduce it with the code provided during the class walkthrough
  • Part 2: Add a plot to your .Rmd
    • Add code to one chunk created in Assignment (10)
    • This can be very basic for now, but should at an absolute minimum contain the appropriate geom layer with correctly specified aesthetics
    • Add/edit comments to make note of what you want to do to “fine tune” or “pretty up” your plot (e.g., changing theme, adding labels, changing how axis and legend labels appear, grouping by color, faceting, etc.)
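
As a rough illustration of the "absolute minimum" described in Part 2, the sketch below uses the starwars tibble that ships with dplyr as a stand-in for your own data. The variables (height, mass) and the object names (sw, basic_plot) are assumptions chosen for the example, not requirements of the assignment.

```r
library(dplyr)
library(ggplot2)

# Stand-in data: dplyr's built-in starwars tibble (swap in your own dataset)
sw <- starwars %>%
  filter(!is.na(height), !is.na(mass)) %>%  # remove rows that could not be plotted anyway
  filter(mass < 1000)                       # set aside one extreme mass value so the rest is readable

# Minimum viable plot: data + aesthetic mappings + one geom layer
basic_plot <- ggplot(sw, aes(x = height, y = mass)) +
  geom_point()

# Notes for later fine tuning (comments like these count for Part 2):
# - relabel the axes with labs()
# - try a built-in theme such as theme_minimal()
# - map a grouping variable to color, or facet by it

basic_plot
```

The comment list at the end is the kind of "to do" note the assignment asks for; the plot itself only needs the geom layer and correctly specified aesthetics at this stage.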

Thursday, February 1 2024 (5.2)

Pre-class preparation
  • Refer to the ggplot2 resources from last class!
Class meeting
  • Lecture: Aesthetics & Layers
    • Going rogue: Messing around with ggplot live on air
  • Assignment: (12) Create intermediate plots (2 parts: recreate these plots & add 1 (more) plot to .Rmd)
    • submit BOTH to receive credit for Assignment (12)! Commit your final submission of both at the same time in just one commit

Week 5 / Class 2 Details

  • submit BOTH tasks to receive credit for Assignment (12)! Commit your final submission of both at the same time in just one commit
  • Part 1: Recreate these plots (intermediate)
    • In your existing starwars.R file from Assignments (8) & (11), replicate the following three plots using dplyr and ggplot2 functions (these can be found in the d2m-2024 repo)
    • You do not need to save these as image files
    • If plotting the dataset you created in Assignment (8) isn’t working as expected, reproduce it with the code provided during the class walkthrough
  • Part 2: Tune up your plot and add another to your .Rmd
    • Tune up the plot you made in Assignment (11), for example think about:
      • Did you use the best kind of plot to tell the story you’re hoping to tell?
      • Did you have to do some wonky wrangling to plot what you wanted?
      • Are there adjustments you could make to scale, coord, or labs layers that would make things easier to read?
      • Are you effectively making use of groups? Could you communicate your point better by grouping in a different way (e.g., color, shape, fill, facet)?
      • Add/edit comments to make note of what you want to do to make your plot “pretty” like customizing colors or fonts, tweaking the theme, or relabeling in the legend
    • Add in the second plot you planned in Assignment (10). At a minimum, replicate what you did in Assignment (11) for the first plot. Add some “tune-ups” if you can!
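
One possible shape for a "tune-up" is sketched below, again using dplyr's starwars tibble as placeholder data. The grouping variable (gender), the axis breaks, and the label text are illustrative assumptions; the point is only to show where grouping, scale, and labs layers slot into the plot code.

```r
library(dplyr)
library(ggplot2)

# Placeholder data: keep rows with the three variables used below
sw <- starwars %>%
  filter(!is.na(height), !is.na(mass), !is.na(gender), mass < 1000)

tuned_plot <- ggplot(sw, aes(x = height, y = mass, color = gender)) +  # group by color
  geom_point(alpha = 0.7) +                                  # fixed transparency, set outside aes()
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # one linear trend line per group
  scale_x_continuous(breaks = seq(50, 250, by = 50)) +       # easier-to-read axis breaks
  labs(
    x = "Height (cm)",
    y = "Mass (kg)",
    color = "Gender"
  )

# Alternative grouping to compare: drop the color mapping and add facet_wrap(~ gender)

tuned_plot
```

Trying the same data with a different grouping (color versus facets) is a quick way to answer the "could you communicate your point better" question in the checklist.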

Tuesday, February 6 2024 (6.1)

Pre-class preparation
  • Refer to the ggplot2 resources from last week
  • Read up on ggplot theme layer customization in the documentation and ggplot2-book
    • These are going to be essential references when you do Assignment (13)!
  • If you haven’t signed up for a report meeting, do that right now. Slots are filling up fast, and report meeting slots during week 7 are competing with MAPSS pre-registration meeting slots. I cannot meet outside the listed appointment times. Failing to meet with me for a report meeting will result in an automatic 10 points off your final report grade.
Class meeting

slides

  • Lecture: “Pretty Plots” & .Rmd Figure Chunks
  • Assignment: (13) Create advanced plots (2 parts: recreate this hideous plot & tune up your own plots)
    • submit BOTH to receive credit for Assignment (13)! Commit your final submission of both at the same time in just one commit

Week 6 / Class 1 Details

  • submit BOTH tasks to receive credit for Assignment (13)! Commit your final submission of both at the same time in just one commit
  • Part 1: Recreate this plot
    • In your existing starwars.R file from Assignments (8), (11), & (12), replicate this using dplyr and ggplot2 functions (these can be found in the d2m-2024 repo)
    • You do not need to save this as an image file
    • If plotting the dataset you created in Assignment (8) isn’t working as expected, reproduce it with the code provided during the class walkthrough
    • There’s a lot going on visually in this plot. Start by creating a base plot with just the required aes() mappings and geoms, then make a commented list of all the different modifications you can spot, whether or not you have any idea how to replicate them. Some hints:
      • You’ll need 13 elements in your theme() layer after setting one of the built-in themes
      • Install, load, and investigate the ggsci package to choose a scale layer
      • Look for a list of “web-safe font families” to find the right fonts
      • Neither the limits nor breaks arguments are required in your axis scales
      • Remember that facet text can’t be modified in the labs() layer
      • You can use a color selector (i.e., eye-dropper) tool like this one to identify precise colors
        • If something doesn’t look right even with the precise hex code, it could be an effect of transparency (“alpha”) somewhere along the way
    • Try to match as much as you can, but don’t let it drive you crazy. Set a timer to keep working, then add/edit comments to describe what is still left to be done. Your completion grade is going to be based on putting in a good-faith effort, not getting a perfect replication
  • Part 2: Tune up your plots & .Rmd code chunks
    • Tune up the plots you made in Assignments (11) & (12), for example think about:
      • Did you use the best kind of plot to tell the story you’re hoping to tell?
      • Did you have to do some wonky wrangling to plot what you wanted?
      • Are there adjustments you could make to scale, coord, or labs layers that would make things easier to read?
      • Are you effectively making use of groups? Could you communicate your point better by grouping in a different way (e.g., color, shape, fill, facet)?
    • “Prettify” your plots with scale_*(), theme(), and labs() layers, as well as “aesthetic” non-aesthetics (i.e., visual attributes outside the aes() and so not mapped to your data)
    • Use code chunks in your .Rmd to render publication-ready plots when knit that:
      • Are in their own uniquely named chunks
      • Have a caption defined and referenced in the chunk’s fig.cap option
      • Include appropriate and reader-friendly labels and text (no inscrutable axis or level names!)
      • Include a theme layer with 1+ customized element (e.g., font, legend placement, background color)
    • Bonus: Add a chunk that reads in one image file, like a photo, diagram, or saved ggplot
      • Give it a chunk caption just like with your plot chunks
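
A minimal sketch of a publication-oriented figure chunk follows, assuming placeholder data (dplyr's starwars) and a hypothetical chunk name and caption. The chunk header is shown as a comment because it lives in the .Rmd rather than in the R code itself; the theme elements chosen here are just examples of the "1+ customized element" requirement.

```r
library(dplyr)
library(ggplot2)

# In the .Rmd this code would sit in its own uniquely named chunk, with a header
# along the lines of (name and caption are placeholders):
#   {r mass-height-fig, fig.cap = "Body mass by height for Star Wars characters."}

sw <- starwars %>%
  filter(!is.na(height), !is.na(mass), !is.na(gender), mass < 1000) %>%
  mutate(gender = tools::toTitleCase(gender))   # reader-friendly level names for the legend

pretty_plot <- ggplot(sw, aes(x = height, y = mass, color = gender)) +
  geom_point(size = 2, alpha = 0.7) +           # size and alpha are fixed, not mapped to data
  scale_color_brewer(palette = "Dark2") +
  labs(x = "Height (cm)", y = "Mass (kg)", color = "Gender") +
  theme_minimal(base_size = 12) +               # start from a built-in theme...
  theme(                                        # ...then customize individual elements
    legend.position = "bottom",
    text = element_text(family = "serif"),
    panel.grid.minor = element_blank()
  )

pretty_plot

# Bonus image chunk: same idea, but the chunk body calls
#   knitr::include_graphics("path/to/your-image.png")   # hypothetical path
```

Note that theme() comes after theme_minimal(); a built-in theme added later would overwrite the customized elements.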

Thursday, February 8 2024 (6.2)

Pre-class preparation

Class meeting

slides

  • Lecture: Tables & Kables
  • Assignment:
    • Recreate this kable
    • (14) Add a table/kable to your .Rmd

Week 6 / Class 2 Details

  • Add a table/kable chunk to your .Rmd. Include:
    • A unique chunk name
    • An informative caption
    • Reader-friendly column names
    • At least 3 style modifications, for example (but not limited to):
      • grouped header rows
      • font or color changes
      • table alignment or positioning
      • scaled table size
  • Reference your table and both your figures from your recent assignments in the text of your .Rmd. Be sure that when you knit the document you see the figures numbered in the order you expect.
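
The requirements above could be met with something like the sketch below, which assumes the kableExtra package for styling and PDF output (format = "latex"). The data, column names, caption, and chunk names are all placeholders.

```r
library(dplyr)
library(knitr)
library(kableExtra)

# Placeholder summary table built from dplyr's starwars tibble
size_by_species <- starwars %>%
  filter(!is.na(species)) %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mean_height = mean(height, na.rm = TRUE),
    mean_mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n >= 2) %>%
  arrange(desc(n))

size_table <- kable(
  size_by_species,
  format = "latex",    # assumes knitting to PDF; use "html" if knitting to HTML
  booktabs = TRUE,
  digits = 1,
  col.names = c("Species", "N", "Height (cm)", "Mass (kg)"),   # reader-friendly names
  caption = "Mean height and mass for species with at least two characters."
) %>%
  kable_styling(position = "center", font_size = 10) %>%   # positioning and scaled text size
  add_header_above(c(" " = 2, "Mean body size" = 2)) %>%   # grouped header row
  row_spec(0, bold = TRUE)                                 # bold column names

size_table

# Assuming bookdown-style cross-references (which papaja documents support), a chunk
# named size-table can be cited in the text as Table \@ref(tab:size-table), and a
# figure chunk named mass-height-fig as Figure \@ref(fig:mass-height-fig).
```

The widths passed to add_header_above() have to add up to the number of columns in the table (here 2 + 2 = 4).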

Tuesday, February 13 2024 (7.1)

Pre-class preparation
  • (15) Imagine (at least) two analyses you would like to run with your data, including 1+ descriptive analysis and 1+ hypothesis testing analysis. Add comment blocks into your .Rmd describing (in plain English, not code) the analyses you intend to run, for example listing the independent and dependent variables, explaining your hypothesized results, and identifying precisely which tests are appropriate.
    • This is a good place to check in with your co-authors or advisor before moving forward!
  • REMINDER: You must meet with Dr. Dowling by the end of this week (i.e., before end of day Friday, February 16th) to negotiate your plan for the scientific report assignment, which is due on Tuesday of Finals Week.
    • Don’t delay! More than 25 of these meetings need to happen; don’t count on available meeting times at the end of 7th week. You can sign up for office hours any time in weeks 5-7 to make your report plan. Extra slots will be added to my office hours sign-up schedule for these weeks. Look out for a Canvas announcement sometime in week 3 or 4 for when those times are posted.
    • Failing to meet with me for a report meeting will result in an automatic 10 points off your final report grade.
Class meeting

No slides this week! Demo only.

  • Demo: Data Analysis Part 1
    • Refer to stats-demo.Rmd in the d2m-2024 repo
  • Assignment: (16) Add 2 analysis chunks - 1 descriptive, 1 hypothesis testing
    • And complete Assignment (15) in the pre-class prep if you didn’t before class!

Week 7 / Class 1 Details

Imagine (at least) two analyses you would like to run with your data, including 1+ descriptive analysis and 1+ hypothesis testing analysis. Add comment blocks into your .Rmd describing (in plain English, not code) the analyses you intend to run, for example listing the independent and dependent variables, explaining your hypothesized results, and identifying precisely which tests are appropriate.

  • Based on your plans from Assignment (15) (go complete that if you haven’t already!) fill in your planning chunks with basic analyses
    • One chunk should include descriptive statistics and another should include a hypothesis test
      • More than one of either or both works too! Just make sure you have at least one of each
    • For anything we covered in class today (descriptives and basic hypothesis testing), create as complete and polished code as possible
    • For anything more advanced, explore the psych, lme4, and stats packages and give it your best shot based on the info you can find. You’ll clean it up after Thursday’s class.
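
For orientation, here is a minimal sketch of what the two chunks might contain, using the built-in mtcars data as a stand-in. The choice of variables and of a t-test is an assumption made for the example; your own plan from Assignment (15) determines which statistics and tests are appropriate.

```r
library(dplyr)

# Descriptive chunk: summary statistics for fuel efficiency by transmission type
mpg_by_am <- mtcars %>%
  mutate(transmission = if_else(am == 1, "manual", "automatic")) %>%
  group_by(transmission) %>%
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    median_mpg = median(mpg)
  )

mpg_by_am

# Hypothesis-testing chunk: Welch two-sample t-test (the t.test() default)
model <- t.test(mpg ~ am, data = mtcars)

model$statistic   # t value
model$parameter   # degrees of freedom
model$p.value     # stored values like these can be referenced in text later
```

Storing the results in named objects (rather than only printing them) is what makes them available for in-line references in the manuscript text.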

Thursday, February 15 2024 (7.2)

Pre-class preparation

Class meeting

No slides this week! Demo only.

  • Demo: Data Analysis Part 2
  • Assignment: (17) Create a partial draft of methods/results with 5 in-line code references

Week 7 / Class 2 Details

  • Complete the code chunks for the two analysis chunks you have been working on in Assignments (15) and (16)
  • In the narrative of your R Markdown manuscript, include at least 5 in-line code references
    • By “in-line code” I mean using R Markdown to interpret some text outside of a code chunk as R code, so that it runs as code and displays the resulting output fully in-line with your regular old text
      • Look at the stats-demo.Rmd file in the d2m-2024 repo for lots of examples! Anything that starts with `r is what I mean here.
      • Remember that just using backticks without the r is just formatting, not code. (Everything on this page that looks like code in monospaced font is done that way.)
    • At least 1 of these in-line code refs should “do something.” In other words, it should be more than just a value reference; it should include a function.
      • Examples of value calls:
        • `r model$p.value`
        • `r median_mpg`
        • `r mass_height_corr3$ci2[["r"]]`
      • Examples of in-line function code:
        • `r apa_p(model$p.value)`
        • `r nrow(filter(mtcars, mpg > median(mpg)))`
        • `r round(mass_height_corr3$ci2[["r"]], 3)`
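
To make the chunk-to-text connection concrete, the sketch below creates two objects in ordinary R code and shows, as comments, how narrative sentences might reference them in-line. The object names and the sentences are invented for illustration, and mtcars stands in for your own data.

```r
# Objects created inside a regular code chunk
median_mpg <- median(mtcars$mpg)
mpg_wt_cor <- cor.test(mtcars$mpg, mtcars$wt)

# In the narrative text (outside any chunk), sentences could then look like:
#   The median fuel efficiency was `r median_mpg` mpg.                 <- value reference
#   Weight and fuel efficiency were correlated,
#   r = `r round(mpg_wt_cor$estimate, 2)`.                             <- includes a function
#   The sample included `r nrow(mtcars)` cars.                         <- includes a function

# Running the same expressions in the console is a quick check of what will be printed
median_mpg
round(mpg_wt_cor$estimate, 2)
nrow(mtcars)
```

If an in-line reference prints something unexpected when knit, evaluating the bare expression in the console like this usually shows why.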

Module 4: Creating a Manuscript

Tuesday, February 20 2024 (8.1)

Pre-class preparation
  • Review the readings & references box below to get a head start installing Zotero, Better BibTeX, and the citr package.
  • Pull the most recent version of the d2m-2024 repo and browse the bibtex-demo directory.
Class meeting

slides

  • Lecture: BibTeX References, citr, & Zotero
  • Workshop: Set up your ideal citation environment and start citing
  • Assignment: (18) Create a .bib file with 5+ references

Week 8 / Class 1 Details

  • Finish downloading and setting up Zotero, Better BibTeX, and the citr package add-on
  • Create a .bib file
    • Since you’ve been working in a papaja document all quarter, it’s likely a .bib file was automatically generated or that you made a blank one so that your .Rmd would knit properly. If so, you can continue to use that one or create a new one, just make sure it is named appropriately and referenced correctly in the .Rmd where needed
    • Use the papaja default (r-references.bib) or give it some other sensible name
  • Add 5+ relevant references for your manuscript
    • Try to do different entry types if you can (journal article, book, book chapter, conference paper, etc.)
    • Already wrote a lit review for your project/proposal? Great! You don’t need to rewrite it. You can copy and paste your existing work into your R Markdown notebook, edit formatting as needed, and then swap out the in-text citations you wrote out manually for dynamic BibTeX references
  • Knit your .Rmd to .pdf and confirm that the citations render as expected
    • Error messages? Add/edit/check the name of your .bib file in the YAML header
    • Unexpected formatting?
      • Check your .bib file to see if the problem is in that file (in which case, correct it)
      • If not, debug by pinpointing the problem and following the troubleshooting guidelines from W2
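
If you want to see what a well-formed .bib file looks like before adding your own literature, one option is to let knitr generate entries for R packages and inspect the result. This is only a sketch: it writes to a temporary file (an assumption made so it cannot overwrite your real r-references.bib), and the entries it creates are package citations, not the 5+ manuscript references the assignment asks for, which would typically come from Zotero and Better BibTeX.

```r
library(knitr)

# Write to a temporary location so an existing r-references.bib is left untouched
bib_path <- file.path(tempdir(), "r-references.bib")

# write_bib() creates BibTeX entries for installed packages, with keys prefixed "R-"
write_bib(c("base", "knitr"), file = bib_path)

# Inspect the entry types and citation keys that were written
bib_lines <- readLines(bib_path)
grep("^@", bib_lines, value = TRUE)

# For citations to render, the .Rmd's YAML header needs a bibliography field that
# names the file, e.g.
#   bibliography: r-references.bib
# and entries are cited in the text by key, e.g. [@R-base]
```

Comparing the generated entries with ones exported from your reference manager is a reasonable way to spot missing or misattributed fields.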

Thursday, February 22 2024 (8.2)

Pre-class preparation

Come to class with all your BibTeX tools from last class installed and fully functional! Troubleshoot any knitting issues with Assignment (18) before the beginning of class.

Class meeting

slides - same slides as last class

  • Lecture: BibTeX References, citr, & Zotero (continued)
  • Assignment: (19) Add 10+ in-text references to your manuscript in at least 3 different formats

Week 8 / Class 2 Details

Building on Assignment (18), continue adding BibTeX references to your literature review (or anywhere in your narrative text). Your repo should now include (among other things):

  • A .bib file with at least 10 BibTeX entries
    • With at least 3 different source types (e.g., article, book, conference paper, policy memo, etc.)
  • A papaja .Rmd with at least 10 in-text bibtex references
    • With at least 3 different formats (e.g., adding extra text, using year only, including chapters, etc.)
    • You can rewrite or remove unnecessary extra formats later for your final project, but use them in this assignment for the sake of practice
  • A .pdf file created when you knit your papaja notebook rendering all bibtex references error free
    • Everything is correctly APA formatted both in the text (e.g., alphabetical order, no random text, last names only, etc.) and the entries in the references section
    • There are no formatting or input errors in the bibtex entries (e.g., broken special characters, incorrect capitalization, missing or misattributed fields, etc.)

Tuesday, February 27 2024 (9.1)

Pre-class preparation

Class meeting

slides

  • Housekeeping: Final reports, self-evaluation, last class planning
  • Lecture: Putting it Together
  • Assignment:
    • (20) Clean up your repo & revise your README as needed
    • Revise your .Rmd (if needed) to include required elements

Week 9 / Class 1 Details

  • Clean up your repo
    • Revise your README to reflect the current state of the project:
      • Narrative description of project purpose, may include project abstract
      • Representation of current directory structure
    • Remove or archive old files (add to .gitignore as needed)
    • Review and update your .gitignore
  • Revise your .Rmd so that it knits to PDF without errors and minimally includes:
    • 1 table
    • 1 figure
    • 1 extracted value reference
    • 1 in-line function (or math)
    • 10 bibtex citations

Thursday, February 29 2024 (9.2)

Pre-class preparation

Class meeting
  • Lecture: Polishing & Publishing
  • Assignment: Final reports due next week!

Final report & participation self-assessment

Tuesday, March 5 2024

Scientific report due. Push all materials to GitHub AND submit a .pdf (.doc acceptable if you absolutely insist) of your knit report to Canvas.

Participation self-assessments due, submitted via Canvas.

Course policies

Accessibility and Accommodation

I am committed to making this course accessible to students of all backgrounds, identities, and abilities. If there are circumstances that make aspects of this course difficult for you to access, please contact me so we can discuss how to accommodate your needs. This includes, but is not limited to, accommodations around the format of course materials, the use of Canvas and other digital resources, the classroom and other physical resources, and the structure of assignments.

I will work with you to create an accessible learning environment whether or not you disclose your disability or personal circumstances. If you choose to disclose personal information to me, I will keep those discussions confidential. For certain accommodations you may need to contact Student Disability Services at (773) 702-6000 or disabilities@uchicago.edu. If you have a documented disability (or think you may have a disability) and, as a result, need a reasonable accommodation to participate in class, complete course requirements, or benefit from the University’s programs or services, I encourage you to contact Student Disability Services.

Diversity, Inclusion, and Community

We will commit as a class to creating a welcoming, respectful, and productive classroom. We will expect each other to be mutually respectful of our meaningful identities. I expect that when we engage with each other in discussion we are considerate of the diversity of our classroom with regard to gender, sexuality, disability, race, ethnicity, religion, socioeconomic status, immigration status, and language. It is critical that we maintain respectful dialogue in the classroom, which includes using correct names and pronouns. If a member of our community – including myself – is creating an unwelcome space for you, I hope you will bring this to my attention immediately.