From Data to Manuscript in R
Winter 2025 Home
- Syllabus
- Grading Policies
- Objective-based Assessment
- Communication Policy
- Classroom Climate
- AI*2: Academic Integrity and Artificial Intelligence
Professor
- Dr. Natalie Dowling
- Email: ndowling@uchicago.edu
- GitHub: @nrdowling
- Office: 1155 Building, Room 404
Section 1
- Weekly meetings: Mondays and Fridays 1:30pm - 2:50pm
- 1155 E 60th St, Room 289B
- TA: Mian Li
- Email: lim1an@uchicago.edu
- GitHub: @lim1an
Section 2
- Weekly meetings: Mondays and Fridays 3:00pm - 4:20pm
- 1155 E 60th St, Room 289B
- TA: Yuchen Jin
- Email: yuchenjin@uchicago.edu
- GitHub: @regenchen
Office hours
Students are welcome to attend any office hours for help with ongoing work or general support, regardless of section. For discussion about grades, you should meet with your section TA or Dr. Dowling. TAs cannot discuss grades with students outside of their section.
- Dr. Dowling: Thursday 2pm - 4pm; 1155 E 60th St, Room 404
- sign up via GCal or email me to request an alternative time
- Mian Li: Friday 9am - 10:30am; 1155 E 60th St, MAPSS 4th Floor Lounge (no signup required)
- Yuchen Jin: Monday 4:30pm - 6pm; 1155 E 60th St, MAPSS 4th Floor Lounge (no signup required)
Hubs
View the main syllabus for details.
- Course site & syllabus
- Slack workspace - moderated, requires UChicago email
- Canvas
- Centralized assessment repo
- Example repo: schelling-games
Course schedule
This schedule is subject to change. Refer to this page for syllabus updates (including new and updated links, files, etc.) throughout the quarter. Slides will typically be posted the morning of class and remain accessible through the end of the academic year.
The accordion boxes contain detailed information about each class meeting. Tasks are color coded:
- Ungraded assignments, preparatory tasks, readings, and resources are in blue boxes.
- Class meeting plans, lecture topics, and links to slides/materials are in orange boxes.
- Reminders and extra information relevant to that class period are in red boxes.
- General to-dos and optional exercises to complete following each class are in teal boxes.
Unit 1: Fundamentals of GitHub, R, and Quarto
Week 1: Getting Started
Setting Up; Solving Problems
Week 1 | Class 1
Monday, January 6, 2025
Readings & Resources:
- Resources page: Get a head start by browsing some of the listed resources. The D2MR guides are going to be especially helpful for the first week or two of class.
- Syllabus (you are here): We’ll go over the syllabus in our first class, but the structure of this class is pretty unusual. You’ll want to look over the syllabus policies (especially grading and assessment) in advance.
Tasks:
- Create a GitHub education account
- Complete the pre-quarter survey
- Join the Slack workspace (invites sent to your .edu address)
Materials:
Class plans:
- Introductions
- Syllabus Review
- Lecture: Say Hello to RStudio, GitHub, and Quarto
- Workshop: Get started on “getting started”
Don’t procrastinate on selecting your dataset! Ideally you will have entered the class with a dataset ready to go. If you didn’t, that needs to be your top priority. The whole course is centered around progressing on your final research project, and you can’t do anything without a dataset.
When you have a dataset in mind, review the dataset selection guidelines to be sure it meets the requirements. If you have questions, send an email to Dr. Dowling and cc your section TA.
There are a lot of setup tasks you need to complete before our second class meeting. We started in class, but you’ll need to finish on your own.
What exactly do you need to do? Look ahead to the “Before class” tab for next class. This is what you should always do to see what you’re responsible for doing before that class meeting.
Week 1 | Class 2
Friday, January 10, 2025
Readings & Resources:
- Best practices for programming in R (in D2MR)
- Tidyverse style guide
- Natalie’s D2MR style guide
- Using GitHub with RStudio
- Setting Up RStudio
- Quarto and apaquarto
- More resources on best practices, troubleshooting, and debugging in the info box below.
Tasks:
- Download and install R and RStudio
- Join GitHub & the class Slack workspace
- Slack invites are sent to your UChicago email; email Dr. Dowling if you don’t have one
- Say hi in the intro channel (at a minimum, we need you to say your full name, GitHub username, and section number; please give more of an intro if you’re up for it!)
- Connect GitHub and RStudio (guide)
- You can use your final project repo for this, but consider using a “tester” to get the hang of it first.
- Create a GitHub repo and RStudio project for your final project that uses Quarto and the apaquarto extension (guide)
- There are (at least) 3 ways to do this! Refer to the guide to decide which method you want to use:
1. Start totally 100% from scratch
2. Add just the required apaquarto extension files to an existing project
3. Add all apaquarto files to an existing project
- Confirm you can pull from and push to your remote repo on GitHub
- Add Dr. Dowling (@nrdowling) & section TA (S1: Mian, @lim1an; S2: Yuchen, @regenchen) as collaborators
- Fork the assessments repo (guide in the repo’s README)
- Clone to an RStudio project
- Confirm you can pull from the upstream repo and push to your fork
- Add Dr. Dowling (@nrdowling) & section TA (S1: Mian, @lim1an; S2: Yuchen, @regenchen) as collaborators
Materials:
- Slides (v1 1/9/25 @4:15pm)
Class plans:
- Lecture: Best Practices & Troubleshooting
- Activities: (time permitting)
- Troubleshooting setup tasks
- Set up and experiment with Copilot in RStudio
Reminders:
- Dataset: If you don’t yet have a dataset, this needs to be a top priority for you right now. Review the dataset selection guidelines and email Dr. Dowling with any questions.
Even more resources:
- Best practices:
- Troubleshooting:
- Relevant subreddits:
To-do:
- Finish any setup tasks you didn’t complete before class
- Look ahead to next Monday’s class materials. Lectures next week are long and dense, so reviewing resources ahead of time is going to be very important.
Recommended exercises:
- Create a README.md and .gitignore for your final project repo. Practice using the problem-solving strategies from today’s class to figure out 1) what these things are, 2) why they’re important, and 3) how to create them. Keep best practices in mind as you do this, especially regarding comments and transparency.
- Set up Copilot in RStudio and experiment a little. What do you think? How could it be useful to you? Do any immediate benefits or drawbacks jump out at you? Is it something you want to make use of in your work?
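For the README.md and .gitignore exercise, one possible starting point is to create the file from R itself. The sketch below is only an illustration: the temporary "tester-project" directory and the data/raw/ folder are hypothetical stand-ins, and the other entries are the session and project-state files that RStudio projects commonly ignore. What belongs in your own .gitignore depends on your project.

```r
# Hypothetical sketch: write a starter .gitignore for an RStudio project.
# A temporary directory stands in for your real project root.
project_dir <- file.path(tempdir(), "tester-project")
dir.create(project_dir, showWarnings = FALSE)

ignore_lines <- c(
  "# R session files",
  ".Rhistory",
  ".RData",
  "# RStudio's per-user project state",
  ".Rproj.user/",
  "# Raw data that should not be shared (hypothetical folder name)",
  "data/raw/"
)

gitignore_path <- file.path(project_dir, ".gitignore")
writeLines(ignore_lines, gitignore_path)

# Read the file back to confirm what was written
written <- readLines(gitignore_path)
print(written)
```

Lines starting with `#` inside a .gitignore are comments, which is one way to keep the file transparent about why each entry is there.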
Week 2: Fundamentals
GitHub & R Essentials; Quarto & Markdown Essentials
Week 2 | Class 1
Monday, January 13, 2025
Readings & Resources:
- R/RStudio
- An Introduction to R, Douglas et al. (2024)
- GitHub
- An Introduction to R, Douglas et al. (2024)
- R for the Rest of Us: How to Use Git/GitHub with R
- Bryan J. 2017. Excuse me, do you have a moment to talk about version control? PeerJ Preprints
- .gitignore Documentation
*Required reading
Materials:
- Slides (v2 1/13/25 4:30pm)
- Programmer’s groceries demo (this is in the d2mr-assessment repo, so make sure you can pull from upstream!)
Class plans:
- Lecture: GitHub & R Essentials
- Note: This is a long, dense lecture that we probably won’t make it all the way through, so be sure to review the slides and read the required readings before class!
- Activities:
- Probably none
- If by some chance we get through the whole lecture with time to spare we’ll work on some simple programming exercises
Announcements:
- Office hours: For the time being I need to limit my office hours to a maximum of 2 appointments per quarter. I am hopeful that at some point in February I will be able to add more hours each week so that I don’t have to have a per-quarter cap. I’ll keep you posted if that’s the case!
Reminders:
- Copilot: Using copilot in RStudio (or any AI/LLM) is fully optional for this course.
- I do recommend setting it up and trying it out, but you are not required to do so, and it’s not worth driving yourself up a wall for. You do not need Copilot for any material on the syllabus, and it does not have to be part of your troubleshooting/debugging workflow. The integrated Copilot in RStudio isn’t designed for debugging anyway, since it’s essentially a fancy autocomplete and not something you can prompt with questions.
- The one thing I’ll add here is a link to RStudio’s documentation for using Copilot. If you’re just getting started that’s the place to go first. (Also, if you started there but have since hit issues and tried a bunch of other things, it never hurts to just download a fresh copy of RStudio and start from scratch! The “have you tried turning it off and on again” approach solves more problems than you might expect.)
- Please continue to ask for help setting it up on Slack, but know that we will not cover this in class.
- Assessment repo: Be sure to pull from the assessments repo frequently. This is where I will add/update both mini-project guidelines and materials for in-class demos.
- Exercises: The slides for most lectures include at least 1 suggested exercise related to the lecture’s topic. These vary in scope and complexity. They are not required and there is nothing to submit.
- All exercises include a box on the slide listing the numbers of the assessed and unassessed learning objectives I intend for the exercise to primarily address. This is just for your reference, so you can decide what to prioritize and get a sense of what kinds of things demonstrate each objective. You can see which numbers go with which objectives either by rendering the assessment.md file or by referring to the website.
- Some exercises additionally reference an associated mini-project, which usually provides additional guidelines for completing a moderately more complex version of the exercise. For exercises that do not have an associated mini-project, you are welcome to expand the exercise into a full mini-project. Follow the “off-the-menu” instructions to do so.
To-do:
- Programmer’s groceries demo
- The groceries.qmd notebook walks through examples of writing conditionals and defining functions based on a very old and cliche joke. The notebook includes both extensive markdown explanation and code in chunks you can run individually.
- Start by working through the notebook and executing the code, then experiment a little with the code to see if you can make things more efficient or introduce new scenarios.
- Next class on R Markdown & Quarto will be another dense lecture, so be sure to read the required readings and preview the slides before class. Slides will be posted by Wednesday.
Recommended exercises: View slides for additional details for each exercise.
- GitHub required files:
- Revisit the README.md and .gitignore files you created in the W1C2 exercises. With your new knowledge of the whats and whys of GitHub and these files, revise the README to make it more informative and revise the .gitignore to include any files or folders that fall into the categories we discussed in class.
- Simple .R script:
- Write a simple .R script that (minimally) loads the tidyverse packages, requires a package, includes commented text, and assigns at least one numeric and one string variable. To go further, use vectors to create a data frame or tibble and/or source() your R script into a code chunk in an R Markdown document.
- hello_world() function:
- Define a hello_world() function in an .R script that includes object assignment, conditional logic, and (ideally) a for or while loop. The function should take at least one argument and return at least two possible values.
- Associated mini-project: 06_r-programming/01_hello-world
- Skeleton repo:
- Create a GitHub repo connected to an R project for either a planned or fictional research project. In the R Project, create a “skeleton” that includes all the files and folders you anticipate needing for the project. Structure your repo to exclude all files and directories that should (for whatever reason) be excluded. As you work, use git best practices for version control, including frequent informative commit messages and functional, up-to-date README and .gitignore files.
- Associated mini-project: 07_git-github/01_skeleton-repo
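As a rough illustration of the hello_world() exercise, the sketch below shows one of many shapes such a function could take. It is not the mini-project solution; the argument names (`name`, `times`) and the messages are invented for this example.

```r
# A sketch of one possible hello_world(); names and messages are illustrative.
hello_world <- function(name = "world", times = 1) {
  # Conditional logic: return a different value when the input is unusable
  if (!is.character(name) || length(name) != 1 || nchar(name) == 0) {
    return("I need a name to say hello to!")
  }

  greeting <- paste0("Hello, ", name, "!")  # object assignment
  out <- character(0)

  # for loop: repeat the greeting `times` times
  for (i in seq_len(times)) {
    out <- c(out, greeting)
  }

  paste(out, collapse = " ")
}

hello_world("D2MR")
hello_world("D2MR", times = 2)
hello_world("")
```

The function takes two arguments and has two distinct return paths, which covers the "at least one argument, at least two possible values" requirement. A while loop or vectorized `rep()` would work equally well in place of the for loop.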
Week 2 | Class 2
Friday, January 17, 2025
Readings & Resources:
- Intro2R, Ch8 Reproducible Reports with R Markdown*
- R for Data Science, ch28: Quarto*
- Quarto Documentation:
- Markdown basics
- Output (YAML options): PDF & Word
- Quarto Getting Started tutorial
- Note: this tutorial uses the visual editor, which you should not use in this class!
- Quarto cheatsheet
- Quarto for R Markdown Users
- YAML:
- Quarto Authoring > Front Matter
- YAML Tutorial (YAMLine)
Materials:
Class plans:
- Lecture: Markdown & Quarto Fundamentals
- Activities
- Probably none, but if by some chance we get through the whole lecture with time to spare…
- Review the groceries notebook from last class
- Walk through creating the example hello_world() function
Reminders:
- No class on Monday, January 20th (MLK Day)
- When you submit mini-projects, make sure you read the assessment document instructions carefully. Specifically, know when to mark “objective attempt” and “unique objective attempt” in your submission. (You’ll never mark “objective met”, which is for graders.)
Additional information:
- The no-deadlines-do-what-you-want assessment style of this class is supposed to make it easier, not harder, to learn. If you’re feeling overwhelmed, please let us know so we can help you get back on track! I suggest coming up with a loose timeline for when you’d like to submit mini-projects and research project drafts throughout the quarter, anticipating changes as we go.
- Recommendations to get you started:
- Submit your first mini-project by the end of Week 3, and plan on 1 every 2 weeks after that
- Submit a first (very incomplete!) draft of your research project by the end of Week 4, then additional drafts every 2-3 weeks as needed
- Your “draft” can be very, very simple. By the end of Week 4, you should be able to confidently demonstrate some skills across multiple categories – GitHub, R programming, markdown, and tidyverse, at least! You do not (and probably should not) wait to submit your first draft until it has all the components you want in the final project. Treat this as iterative and cumulative work.
To-do:
- Double check that (1) you invited Dr. Dowling and your section’s TA to your repo(s) and (2) they accepted the invitation. If we didn’t accept, the invite link may have expired and you’ll need to re-invite us.
Recommended exercises: View slides for additional details for each exercise.
- Create apaquarto repos: Create directories for 3 (temporary) apaquarto projects using each of the 3 setup methods. Connect each to a (temporary) GitHub repository. Play around to find where there is flexibility and where things can break.
- Simple .qmd notebook: Create a new Quarto notebook (.qmd), and follow all best practices for notebooks and code chunks as you work. Add a YAML header, at least 1 setup code chunk, at least 1 non-setup code chunk, and narrative text formatted with markdown.
- Associated mini-project: 05_data-communication/02_simple-qmd
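To make the pieces of the simple .qmd notebook concrete (YAML header, a setup chunk, a non-setup chunk, and narrative text), here is a minimal sketch that generates such a file from R. The file name, title, chunk labels, and `format: html` are assumptions chosen for illustration; a project using apaquarto would use that extension's own format options instead.

```r
# Sketch: generate a bare-bones .qmd file from R (file name is hypothetical).
fence <- strrep("`", 3)  # build the chunk fence programmatically

qmd_lines <- c(
  "---",
  'title: "Simple notebook"',
  "format: html",
  "---",
  "",
  paste0(fence, "{r}"),
  "#| label: setup",
  "#| include: false",
  "library(tidyverse)",
  fence,
  "",
  "## Overview",
  "",
  "Narrative text with *markdown* formatting goes here.",
  "",
  paste0(fence, "{r}"),
  "#| label: summarize-mtcars",
  "summary(mtcars$mpg)",
  fence
)

qmd_path <- file.path(tempdir(), "simple-notebook.qmd")
writeLines(qmd_lines, qmd_path)

# Print the file so you can see the structure a renderer would receive
cat(readLines(qmd_path), sep = "\n")
```

In practice you would write the .qmd by hand in the source editor; generating it from a script is just a compact way to show all four required parts in one place.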
Unit 2: Managing data in the tidyverse
Week 3: Welcome to the Tidyverse
Tidy Data & Core Tidyverse Packages
Week 3 | Class 1
Monday, January 20, 2025
All UChicago classes are cancelled for Martin Luther King Jr. Day.
Week 3 | Class 2
Friday, January 24, 2025
Feeling lost with the basics? The open-access book Introduction to R has a fantastic “getting started” overview. Review the Preface and Chapter 1 for the “how do I literally do anything?!” stuff and Chapter 2 for the “I’m doing things but I have no idea what the things I’m doing are” stuff.
Readings & resources
- Tidyverse guides & cheatsheets
- r4ds intro chapters (recommend you read in this order - 7, 3, 5):
- 7 - Data import*
- 3 - Data transformation*
- 5 - Data tidying: skim this for now, read it thoroughly before W4C1
- For R converts coming from other data analysis tools:
- Stata to R :: cheatsheet
- R for Stata Users (Muenchen & Hilbe)
- R for SAS & SPSS Users (Muenchen)
- From SPSS to R (Wolf)
- Switching from Matlab to R (Richards)
- R for MATLAB users (cheatsheet-ish)
Materials:
- Slides (v1 1/23/25 9:40am)
- dplyr pipeline demo: 00_in-class-materials/demo-snippets.R
Class plans:
- Lecture: Welcome to the Tidyverse Part 1: readr, tibble, dplyr
- In-class exercise: reading and writing practice
- Demo (time permitting): dplyr pipelines
Reminders:
Get your forked d2mr-assessment repo set up and functional!! Top priority!! Attend TA office hours if you’re still having trouble.
Additional information:
- Mini-projects should be fully distinct from your final project
- If you want to use the same data in a very different way ask for an exception via the mini-project proposal process
- Even for exceptions, your MP should not be contained within your final project repo
- In most cases, demonstrating the GitHub skills via a mini-project will require creating a new repo for the purposes of the project
- MPs should have a “home” in your assessment repo that contains (minimally) the assessment file
- Expect your MP grades to be lower early on; as the quarter goes on, you’ll be able to show more and more skills within a single project
- Final project repo names – make them informative!! This is part of git best practices.
- “early-childhood-shrugs” not “final-project-d2mr”
To-do: Download and store the tidyverse cheatsheets
Recommended exercises: View slides and mini-project repo files for additional details for each exercise.
- Read/write practice:
- In a new notebook, use readr and related functions to read in a tabular dataset (.csv/.tsv), transform the data in some way, then write to a new .csv file. Replicate the process in an .R script, then source that script in the notebook.
- Find or create a variety of datasets to repeat the process, like different file types (e.g., .csv, Google Sheets, .xlsx, .sav), different “extras” (e.g., multiple tabs, rich formatting, non-data text, header rows), and different combinations of file locations (e.g., notebook and script in the same folder, one in the root folder and one in a subdirectory, writing/reading files in a lower- or higher-level subdirectory).
- Pay attention to which packages are necessary to run which functions, which functions include which arguments (and the default argument values), and how relative paths differ depending on function and file locations.
- Data cleaning structured exercises:
- Walkthrough: Clean the iris Dataset
- Demo only! Walks you step by step through a cleaning process, setting you up to complete the Level 1 or 2 mini-projects.
- Associated mini-project: 01_data-cleaning/00_cleaning-walkthrough
- Level 1: Clean the mtcars Dataset
- Work with a simple dataset, focusing on basic data cleaning tasks. Most tasks are outlined in the assignment script.
- Associated mini-project: 01_data-cleaning/01_cleaning-level-1
- Level 2: Clean the midwest Dataset
- Work with a more complex dataset and perform more involved cleaning tasks. General guidance is given without much structured support.
- Associated mini-project: 01_data-cleaning/02_cleaning-level-2
- Group “uncleaning”:
- Work together with a classmate to each “unclean” a dataset for the other(s) to clean. Can be done in partners or small groups.
- Associated mini-project: 01_data-cleaning/03_group-unclean
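For the read/write practice exercise, the basic read, transform, write loop can be sketched as follows. This is a minimal example under stated assumptions: it uses the built-in mtcars data written to a temporary folder so it runs anywhere, and the file names and the `kpl` column are made up for illustration. Your own version would point at a real dataset and project-relative paths.

```r
library(readr)
library(dplyr)

# Write a built-in dataset to a temporary .csv so there is something to read.
# Note that write_csv() does not keep row names (the car names are dropped).
in_path  <- file.path(tempdir(), "mtcars-raw.csv")
out_path <- file.path(tempdir(), "mtcars-clean.csv")
write_csv(mtcars, in_path)

# Read the file back in as a tibble
cars_raw <- read_csv(in_path, show_col_types = FALSE)

# Transform: keep 4-cylinder cars, add a converted column, keep a few columns
cars_clean <- cars_raw |>
  filter(cyl == 4) |>
  mutate(kpl = mpg * 0.4251) |>   # miles per gallon to km per litre
  select(mpg, kpl, cyl, wt)

# Write the transformed data to a new .csv
write_csv(cars_clean, out_path)
```

To extend this toward the rest of the exercise, move the transformation into its own .R script and `source()` it from a notebook, then see which paths still resolve when the two files live in different folders.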
Week 4: Manipulating Data
Tidying & Wrangling Data; Data Preparation Workshop
Week 4 | Class 1
Monday, January 27, 2025
Readings & Resources:
- dplyr joins
- tidyr
- forcats
- stringr
- purrr (advanced, low priority for d2mr)
Materials:
- Slides (updated 1/27/25 @12:15pm)
Class plans:
- Lecture: Tidying & wrangling data
- Demo code to be run along with the lecture (combining data, tidyr, stringr, and forcats) is found in the demo-snippets.R script
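The demo script itself lives in the class repo and is not reproduced here. As a rough illustration of the kinds of operations this lecture covers (combining data, tidyr, stringr, and forcats), a minimal sketch on made-up data might look like the following. Every table, column name, and value below is invented for this example.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(forcats)

# Two small made-up tables (all values are invented for illustration)
scores_wide <- tibble(
  id = c(1, 2, 3),
  score_pre = c(10, 12, 9),
  score_post = c(14, 15, 13)
)
groups <- tibble(
  id = c(1, 2, 3),
  condition = c("control", "treatment", "treatment")
)

scores_long <- scores_wide |>
  # tidyr: one row per participant per time point
  pivot_longer(
    cols = starts_with("score_"),
    names_to = "time",
    values_to = "score"
  ) |>
  # stringr: strip the shared prefix from the new column
  mutate(time = str_remove(time, "^score_")) |>
  # forcats: put "pre" before "post" rather than alphabetical order
  mutate(time = fct_relevel(time, "pre", "post")) |>
  # dplyr: combine with the second table by the shared key
  left_join(groups, by = "id")

scores_long
```

Each step maps onto one package from the reading list, which can be a useful way to organize your own data-prep script.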
Reminders:
- Beginning this week, we will not spend class time addressing questions about class logistics, assignments, or other administrative topics. You will still have questions, and you will still get answers! But they will be asked and answered on Slack only.
- I will make announcements on Slack in the dedicated #announcements channel. You are responsible for all the information communicated this way, even if I never bring it up in class. Be sure you have notifications turned on for Slack. At a minimum, you should get a notification when someone tags you or @everyone, when someone directly replies to a thread you’re in, and when someone posts in the #announcements channel.
- There is/was an out-of-date version of the assessment doc in the cleaning-level-1 mini-project. Please read the announcement in the Slack channel for more info and what to do if you have completed or intend to complete that project.
To-do: Prepare for Friday’s workshop
Note: The “conceptualize your dataset” exercise is designed to set you up for the workshop. I strongly recommend completing that exercise before Friday and spending your class time working on the to-do/to-learn list you generate, but you’re free to work on other projects if you prefer.
- Select something to work on for the data preparation workshop on Friday. This can be a mini-project that involves cleaning and wrangling (not necessarily from the cleaning/wrangling categories) or part of the data prep needed for your final research project.
- Create a to-do list for what you’d like to accomplish in the 80 minute workshop. Note which parts you can do on your own and which parts you’d like help with. You’ll have Dr. Dowling, your section TA, and your classmates to collaborate with; the more specific you can be identifying what you need help with ahead of time the more efficiently you’ll be able to get that help.
Recommended exercises: View slides and mini-project repo files for additional details for each exercise.
- Conceptualize your dataset:
- Conceptualize the dataset you’ll need to create for your final research project. Imagine what kinds of figures you’d like to create or analyses you’d like to run. What variables and observations would you need for each? Articulate your research questions and hypotheses and see how they can map onto your dataset’s structure.
- Create a tabular mockup of your goal dataset(s) using Excel, pencil and paper, or any non-programmy thing you like. Don’t worry about whether you know how to get there yet. Focus on structure, not code.
- In your final research project repo, create an .R script called data-prep.R (or similar). Use comments to write out in plain English the steps you’ll need to take to turn your current dataset into your goal dataset.
- Identify parts of the comments you can fill in now with R code you already know. Create a to-learn list for parts that you’ll need to learn new functions or techniques.
- Data wrangling structured exercises:
- Walkthrough: A wrangling walkthrough doesn’t actually exist, at least not right now. Between the cleaning walkthrough, the cleaning projects, the in-class demos, and the highly structured format, you should be able to approach Level 1 wrangling on your own. I may add one later if I see there is a need for it.
- Level 1: Recreate a starwars Dataset
- Work with a medium-sized dataset (starwars) and transform to match a goal dataset. Most tasks are outlined in the assignment script.
- Associated mini-project: 02_data-wrangling/01_recreate-level-1
- Level 2: Recreate a gapminder Dataset
- Work with a more complex dataset and perform more involved wrangling and transformation tasks. General guidance is given without much structured support.
- Associated mini-project: 02_data-wrangling/02_recreate-level-2
- Wrangling group make-a-mess:
- Work together with a classmate to each dramatically transform/wrangle a dataset for the other(s) to recreate. Each partner will choose a publicly available dataset and transform it using tidyverse functions. You will then swap transformed datasets with your partner and attempt to recreate their work.
- Associated mini-project: 02_data-wrangling/03_group-make-a-mess
- Create a data wrangling function:
- Define a function to execute procedural data wrangling tasks. Your final product should be an .R script that defines your function, then runs your function in a variety of test cases.
- “Wrangling” is loosely defined here. It should be a function that serves a useful and plausible purpose for data preparation.
- Note: You may use your final research project data to guide the development of your function in this mini-project, but code you submit for this mini-project will not also count toward demonstrating objectives on your final project.
- Associated mini-project: 06_r-programming/03_wrangling-function
- Recreate an existing function:
- Recreate one or more existing functions from base R or a package of your choice. Your final product should be an .R script that defines your function, then compares the output with that of the original function.
- The goal is not to literally recreate the actual function, but to define a function that conceptually matches the function and achieves a similar result.
- Note: This may sound intimidating, but it really is an any-level project. You can work towards a fully equivalent function that produces exactly the same output in all cases, or you can take a function and just break it down fully conceptually without writing any code at all. Or (I expect in most cases) anywhere in between.
- Associated mini-project: 06_r-programming/04_recreate-function
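To give a sense of scale for the "recreate an existing function" exercise, here is a minimal sketch that re-implements base R's mean() conceptually and compares its output with the original. The name `my_mean` and the test cases are made up for illustration, and the real mean() handles many more input types than this version does.

```r
# Sketch: a hand-rolled stand-in for base R's mean() (the name my_mean is made up)
my_mean <- function(x, na.rm = FALSE) {
  # Optionally drop missing values, mirroring mean()'s na.rm argument
  if (na.rm) {
    x <- x[!is.na(x)]
  }

  # Accumulate the sum with an explicit loop, then divide by the count
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total / length(x)
}

# Compare with the original function across a few test cases
test_cases <- list(c(1, 2, 3, 4), c(2.5, NA, 7.5), c(10))
for (case in test_cases) {
  print(all.equal(my_mean(case, na.rm = TRUE), mean(case, na.rm = TRUE)))
}
```

Using `all.equal()` rather than `==` for the comparison allows for tiny floating-point differences between the two implementations.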
Week 4 | Class 2
Friday, January 31, 2025
(same as “after class” above, just worth saying twice!)
- Select something to work on in class. This can be a mini-project that involves cleaning and wrangling (not necessarily from the cleaning/wrangling categories) or part of the data prep needed for your final research project.
- Create a to-do list for what you’d like to accomplish in the 80 minute workshop. Note which parts you can do on your own and which parts you’d like help with. You’ll have Dr. Dowling, your section TA, and your classmates to collaborate with; the more specific you can be identifying what you need help with ahead of time the more efficiently you’ll be able to get that help.
Class plans:
- Catch-up: If needed, we’ll catch up on any slides we didn’t get through on Monday.
- Workshop: Data preparation - work solo, together with classmates, and with the support of Dr. Dowling & the TAs to make progress on either the data prep tasks necessary for your final project or a mini-project relating to (at least in part) data cleaning and wrangling.
Reminders:
Additional information:
To-do:
Recommended exercises:
Unit 3: Analyzing and visualizing data
Week 5: Data Visualization with ggplot2
A Grammar of Graphics; Visualization Workshop
Week 5 | Class 1
Monday, February 3, 2025
Readings & Resources:
- Chapters on data visualization and ggplot in r4ds: 1 and 9, 10, 11
- ggplot2 cheatsheet
- ggplot2-Book (note: 3e is currently “under construction” but it’s still a great resource!)
Materials:
- Slides (v1 2/3/25 12:30pm)
- Cloning the assessment repo
- ggplot2 walkthrough part 1
- You can access this and run it as a stand-alone document without the rest of the repo
Class plans:
- Lecture: Introduction to Data Visualization with ggplot2 (A grammar of graphics)
- Activities:
- Fix the fork problem
- ggplot2 walkthrough part 1 (together with slides)
Reminders:
- Assessment repo changes: We are moving to a new system where you will clone the assessment repo and then set the upstream repo to the main class repo. This is an alternative to forking that gives us (instructors) more control over the upstream repo and you more control over your own repo. See the guide here.
- This change is mandatory. I know this is coming at a bad time and will be very frustrating for some people. I’m here to help! Please post issues in the Slack so we can resolve things quickly.
- What do I do with my fork? No matter what, create a local backup first. I suggest deleting your fork when you are confident the new clone system is set up and working correctly. That will help avoid confusion (on your end and mine) about which repo is the “real” one, it will prevent accidental pull requests to the main repo, and it will ensure you’re the only one with full control over your work.
- I’m working on a mini-project now in my fork, what do I do? If you clone your forked repo, all your ongoing work will be in the new repo. Double check that the cloned files are up to date, then continue working in the clone and abandon the work in your fork.
- What if I already submitted mini-projects with my fork? Not a problem. You have two options: 1) Leave your fork up for now until your projects have been graded (and finalized by Dr. Dowling). When your grade is finalized, copy the graded assessment.md file into your new clone so that you have a record of your progress in your in-use repo. 2) Resubmit your mini-project to Canvas with a link to the directory in your new clone repo. In either case, once there are no outstanding submitted Canvas assignments with links to your fork, I recommend deleting your fork.
- Friday’s workshop: Friday’s class will be a data visualization workshop. Unlike the Week 4 workshop, this won’t be a “free work” period. You should come prepared to discuss your needs and goals for visualizing data for your final research project. We’ll work as a class to conceptualize what your plots should look like and prepare to make them in ggplot.
- If needed, we will devote about 15 minutes to resolving issues with the GitHub change, but we have to limit that time. Please use Slack and your troubleshooting resources to resolve as many issues as possible before Friday.
To-do:
- Download the ggplot2 cheatsheet
- Migrate from the forked GitHub assessment repo to a new cloned repo, following the directions in this guide and getting support on Slack if needed
- Prepare for Friday’s workshop by picking 2 tentative plans to visualize data for your final research project. Be ready to share in small groups and as a whole class!
Recommended exercises:
- Data visualization structured exercises:
- Walkthrough: Introduction to data visualization with ggplot2 (pt1)
- Demo only! This is the demo the lectures follow along with. Part 2 will be added next week.
- Associated mini-project: 03_data-viz/00_viz-walkthrough
- Level 1: Visualize the mtcars Dataset
- Work with a simple dataset to recreate 3 basic plots (using only data, aes, and geoms) and 3 intermediate plots that add other commonly used elements.
- Associated mini-project: 03_data-viz/01_viz-level-1
- Level 2: Visualize the starwars Dataset
- Work with a more complex dataset to recreate 1 advanced (and hideous) plot using many layers and theme elements.
- Associated mini-project: 03_data-viz/02_viz-level-2
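The Level 1 description distinguishes basic plots (only data, aes, and geoms) from intermediate plots that add other commonly used elements. A minimal sketch of that distinction, using the built-in mtcars data, might look like this. The particular variables, labels, and theme are arbitrary choices for illustration and are not taken from the mini-project.

```r
library(ggplot2)

# Basic: data + aesthetic mapping + geom, the three required pieces
p_basic <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Intermediate: add a color mapping, axis and legend labels, and a theme
p_labeled <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  ) +
  theme_minimal()

p_labeled  # printing the object draws the plot
```

Because each plot is built by adding layers with `+`, you can start from the basic version and add one element at a time to see what each layer changes.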
Week 5 | Class 2
Friday, February 7, 2025
Readings & resources:
- from Data to Viz (interactive guide for selecting appropriate visualization)
- Be Awesome in ggplot2: A Practical Guide to be Highly Effective
- Designing effective data visualization (Johns Hopkins Libraries)
- Ten Simple Rules for Better Figures
Tasks:
- Plan to workshop 1 or 2 plots from your final research project data in class, in both large group and small group settings. The “from Data to Viz” link above can be a good starting point.
- Take a look at the new mini-projects that have been posted this week (there are a lot!). In addition to projects about data visualization, there are instructions for most MPs in the “miscellaneous” category and a handful of others in other categories. If you see a project that is interesting to you but see that you don’t know everything you would need to complete 100% of the project, remember it’s totally ok to simplify the projects or focus just on the skills you do know instead of waiting until you’ve got everything mastered.
Class plans:
- Activities: (no lecture)
- (if needed) 15 minutes of support for GitHub clone setup
- Large group visualization workshop
- Small group visualization collaboration
Reminders:
- Mid-quarter engagement grades: Provisional community engagement grades are posted Friday Week 5. This is to give you a sense of what you can expect your final engagement grade to be if you continue participating in class at the same level you have been so far. Remember:
- Grades are assigned by Dr. Dowling, with input from TAs about how they have seen students engage that Dr. Dowling may not see
- These grades will be fully replaced at the end of the quarter, and no matter what your grade is now there is the potential to gain all 10 points by the end of the quarter.
- The easiest way to bump up your grade is by participating more on Slack. You can see more suggestions for how to raise this component of your grade here.
- Feel like your grade doesn’t accurately reflect your engagement thus far? Two possibilities:
- You could be wrong. Not that you’re not trying, but that you may be overestimating your engagement relative to the course expectations. Take a minute to reflect a bit on your participation thus far, read over the suggestions for ways to engage, and consider that this grade starts at 0, not 10 (you earn points; you don’t lose them).
- You could be right! If you think you’re engaging in ways that Dr. Dowling and the TAs haven’t noticed, you can let us know! The “Community Engagement” assignment on Canvas (where you see the provisional grade) has an optional submission component. You can use this submission as a way to show/tell us how you’ve been an active part of the class community in ways we aren’t aware of. You can submit and resubmit to this assignment as many times as you like over the rest of the quarter, but you won’t receive a re-evaluated numeric grade until the end of the quarter.
- Aside from giving us more information about your own engagement, please use the assignment submission to let us know how you have seen your classmates positively engage with the community! Someone start a study group? Help you debug your code for 2 hours on a Sunday night out of the goodness of their heart? Give you particularly useful insight during a class workshop? Use this as a place to give them a shout out so they can get the points they deserve.
To-do: Make a submission plan & timeline
You’re (more than) halfway through the quarter. How far along are you toward earning the 50 mini-project points and 40 research project points? Given that projects allow you to show cumulatively built skills, it’s reasonable to expect that it will be easier to earn more points in the second half of the quarter once you’ve got a lot of the basics under your belt.
Take a few minutes to make a plan for how you’re going to earn the points you need for the grade you want. Are the mini-projects overwhelming? Maybe you want to turn in a lot of small projects instead of a couple big ones. Are they boring? Design your own! Confused by the grading? The sooner you turn in projects, the sooner you get a sense of how grading works, and the more time you leave yourself to earn all the points you want.
Same goes for the final research project. You’ve got 4 tries to earn the full 40 points, and you do not have to aim to meet all 40 at once. In fact, I strongly suggest you do not wait until you’re at a point where you can reasonably get everything at once. By that point we’ll be very late in the quarter, and you probably won’t have time to submit more than one try.
Recommended exercises:
- Data visualization exercises:
- Data visualization Level 1 and Level 2 mini-projects (posted after last class)
- Associated mini-project: 03_data-viz/01_viz-level-1
- Associated mini-project: 03_data-viz/02_viz-level-2
- Group/pairs swap plots
- Work with a partner to create and recreate data visualizations using ggplot2. Each partner will create a complex visualization from a dataset, then challenge their partner to recreate it using only the dataset and a static image of the final plot.
- Associated mini-project: 03_data-viz/04_group-swap-plots
- Create a plotting function
- Define a function that creates customized plots using ggplot2. Your final product should be an .R script that defines your function, then demonstrates its use with various datasets and argument combinations. The function should serve a useful purpose in data visualization while incorporating tidyverse principles.
- Associated mini-project: 06_r-programming/02_plotting-function
- Other recently published exercises:
- Data wrangling
- (Off the syllabus) Wrangling non-rectangular and nested data - 02_data-wrangling/04_ots-nonrect-nested
- Data communication
- Transpose an existing document to quarto - 05_data-communication/01_transpose-to-quarto
- Create a demo for a class topic - 05_data-communication/03_class-topic-demo
- R Programming
- (Off the syllabus) Create and host a package - 06_r-programming/05_ots-create-package
- Unassessed/miscellaneous
- Create and maintain a debugging journal - 08_unassessed-misc/02_debugging-journal
- Communicating concepts - 08_unassessed-misc/04_concepts
- Explore and apply LaTeX styling - 08_unassessed-misc/05_latex
- Contribute to Stack Overflow or other crowdsource sites - 08_unassessed-misc/06_contribution
- Create a tutorial on a topic of your choice - 08_unassessed-misc/07_tutorial
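For the “Create a plotting function” exercise listed above, the overall shape of such a script might look like the sketch below. The function name `plot_scatter()` and its arguments are hypothetical, made up for illustration; your own function should do something you actually find useful. Column names are passed as strings and looked up with the `.data` pronoun, which is one common tidy-evaluation pattern (not the only one).

```r
library(ggplot2)

# Hypothetical helper: scatterplot of two columns, optionally colored by a third.
# `x`, `y`, and `color_by` are column names given as strings.
plot_scatter <- function(data, x, y, color_by = NULL, title = NULL) {
  # Fail early with a clear error if inputs don't match the data.
  stopifnot(is.data.frame(data), all(c(x, y, color_by) %in% names(data)))

  p <- ggplot(data, aes(x = .data[[x]], y = .data[[y]]))

  # Only add a color mapping when a grouping column was requested.
  if (is.null(color_by)) {
    p <- p + geom_point()
  } else {
    p <- p + geom_point(aes(color = .data[[color_by]]))
  }

  p +
    labs(title = title, x = x, y = y, color = color_by) +
    theme_minimal()
}

# Demonstrate the function with different datasets and argument combinations.
p1 <- plot_scatter(mtcars, "wt", "mpg")
p2 <- plot_scatter(iris, "Sepal.Length", "Petal.Length",
                   color_by = "Species", title = "Iris measurements")
print(p1)
print(p2)
```

Passing column names as strings keeps the function easy to call from loops or other functions; an alternative design would accept bare column names using the embrace operator (`{{ }}`).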
Week 6: Presenting data for publication
Pretty plots; Tables
Week 6 | Class 1
Monday, February 10, 2025
Readings & Resources:
- General ggplot2 data visualization:
- Data Viz with ggplot2: facets
- ggplot2 documentation: themes
- ggplot2-Book:
- Layers (especially ch3, 4, 5)
- Scales
- The Grammar
- The R Graph Gallery
- Figures in APA & Quarto
Tasks:
Access the d2mr-apaquarto repository on GitHub. Optionally, clone the repo to play around with the files. You can set up a relationship to the upstream repo to pull in changes if you want, but it’s not necessary if you don’t want the git headache.
Materials:
- Slides (v1 2/20/25 @11am)
- Data viz walkthrough part 2 (03_data-viz/00_viz-walkthrough-2)
- Example apaquarto manuscript (d2mr-apaquarto repo)
Class plans:
- Lecture: Pretty plots
- Small groups: .qmd figures
Grading notes
- Meeting objectives: To make sure you get credit for the objectives you attempt to demonstrate, read through the assessment doc before submission to see if you’ve done the suggested tasks for each. For example:
- (5) Find, install, require, and load R packages: Did you use more than one function?
- (6) Use arithmetic, comparison, and logical operators: Did you use all 3 types?
- (8) Parse and write conditional statements and/or loops: Do your conditionals use more than one function in more than one context?
- (9) Use readr functions to read in and write out data: Will your script read in data for your grader (use relative paths!)?
- (22) Create and effectively use code chunks following best practices: Do all your chunks have unique and informative labels?
- (26) Create and maintain a quarto document YAML header: Does your header include options for an APA styled document?
- You can still get credit for an objective without exactly matching the recommended tasks, but you’ll need to show effort and capability that is essentially the equivalent. If you believe you’ve done so, make a note of how you’ve shown comparable skills in your assessment document.
- Objective categories: When you’re attempting objectives, be conscious of the category each is under. Keep the broader context in mind. For example:
- Git and GitHub: These objectives should show your abilities to create and maintain projects using git. Working in a git repository that you didn’t make or editing a readme file that you didn’t create doesn’t showcase your skills (it showcases mine). Effectively using version control doesn’t just mean you pushed to GitHub; it means git and GitHub are doing the work you would otherwise have to do without them.
- Notebooks and code chunks: These objectives should demonstrate your ability to effectively use markdown notebooks to execute skills demonstrated elsewhere. Maybe you made beautiful ggplots that met all plotting objectives, but to get (24) you need to integrate those figures into a notebook, including referencing them in the text. It’s not about the plotting skills; it’s about the markdown and notebook skills.
- R Markdown and Quarto: These objectives should show your ability to create and maintain publication quality documents — the manuscript part of “Data to Manuscript.” Like with Git/GitHub, you’ll need to show your abilities to create and maintain (not mine), and you’ll need to show how the skills work in a publication-styled document. For example, to show “(27) Use quarto R Markdown to compose an academic manuscript,” it’s not enough to just use bold formatting and create lists; you need to show the markdown formatting in an academic manuscript.
- Ask us! If you’re feeling uncertain about what “counts” or you don’t understand your grades, ask Dr. Dowling and your TA for help understanding.
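To illustrate the “use relative paths!” note under objective (9), here is a minimal sketch of writing out and reading in data with readr. It assumes your script runs from the project root and that a `data` folder lives inside it; the folder and file names are hypothetical.

```r
library(readr)

# Hypothetical project layout: a "data" folder inside the project root.
dir.create("data", showWarnings = FALSE)

# Write out with a path relative to the project root.
# file.path() builds the path with the right separator on any OS.
write_csv(mtcars, file.path("data", "mtcars-example.csv"))

# Read back in with the same relative path. This works on your grader's
# machine too, unlike an absolute path such as "C:/Users/me/...".
cars <- read_csv(file.path("data", "mtcars-example.csv"), show_col_types = FALSE)

head(cars)
```

Because the path never mentions anything outside the project folder, anyone who clones the repo and opens the project can run the script unchanged.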
Tasks:
- Using public data? Download everything you need locally. The politics of research right now is unpredictable and horrifying. Control what little you can by creating a local version of data or any other materials needed for your work.
Recommended exercises:
- Data visualization walkthrough (part 2)
- Continuation of the ggplot demo. Part 2 covers more advanced skills, including the scales, facets, and theme layers.
- Associated files: 03_data-viz/00_data-viz-walkthrough-2
- Create a ggplot2 theme
- Create a custom ggplot2 theme that reflects your personal visualization style while maintaining clarity and professionalism (or not). Your theme should be reusable across different types of plots and demonstrate understanding of ggplot2’s theming system, including effective uses of the theme() layer and (optionally) non-data-dependent visual elements created as arguments of non-theme layers.
- Associated mini-project: 03_data-viz/03_ggplot-theme
- (OTS) Non-ggplot figures
- Create a data visualization project that demonstrates your ability to create publication-quality plots using R packages other than ggplot2. You may use alternatives to ggplot2 or extensions.
- Associated mini-project: 03_data-viz/06_ots-non-ggplot
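For the “Create a ggplot2 theme” exercise, one common way to make a theme reusable is to wrap it in a function that starts from a built-in theme and overrides a few elements with `theme()`. The sketch below is only a starting point; the name `theme_d2m_example()` and the specific styling choices are made up for illustration.

```r
library(ggplot2)

# Hypothetical reusable theme: start from theme_minimal() and override a few elements.
theme_d2m_example <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),     # bold titles
      panel.grid.minor = element_blank(),           # drop minor gridlines
      axis.line = element_line(color = "grey30"),   # add visible axis lines
      legend.position = "bottom"                    # legend under the plot
    )
}

# Apply the theme like any other layer.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Custom theme demo", color = "Cylinders") +
  theme_d2m_example()

print(p)
```

Because the theme is a function, it can be added to a scatterplot, bar chart, or faceted plot without repeating any of the `theme()` arguments.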
Week 6 | Class 2
Friday, February 14, 2025
Readings & documentation:
- R Graph Gallery’s guide to table packages
- Pandoc tables
- r4ds: 28.8 Tables
- apaquarto: Tables
- Quarto: Tables
- RMarkdown Cookbook: Ch10 Tables
Packages for table-making:
- knitr (kables) and kableExtra (better kables)
- stargazer package for statistical reporting & tutorial
- flextable package and extensive guide
- gt (Grammar of Tables), gtsummary, & gtextras
Check-in: How’s your final research project going? How many drafts have you submitted? How many points off the full 40 are you? How many points do you think you’d get if you turned in what you have right now? How confident are you that you know what you need to get the grade you want? If you’re feeling at all unsure about your final project, turn something in now. If it’s a mess and you would get a bad grade, wouldn’t you rather find that out now while you have several weeks to turn that bad grade into a perfect score?
Tasks:
- Before Monday, decide on at least 1 descriptive analysis you would like to conduct with your data. It’s time to make a preliminary analysis plan for your research project data. On Monday we’ll talk about executing simple descriptive statistics in R, and on Friday we’ll cover simple hypothesis testing. Neither of these sessions is intended to be a stats class. We’ll learn how to execute analyses in R and we’ll go over the absolute basics of what different kinds of tests are used for, but it is up to you to know what you need to do in the best interests of your project.
Week 7: Basics of data analysis (for psychologists+)
Descriptive Statistics; Hypothesis Testing
Week 7 | Class 1
Monday, February 17, 2025
Readings & resources:
- Stats and R
- Correlogram in R: how to highlight the most correlated variables in a dataset
- Correlation coefficient and correlation test in R
- What statistical test should I do?
- With a much bigger and better flowchart than mine!!
- Descriptive statistics in R
- A Practical Extension of Introductory Statistics in Psychology using R
- Ch 6 Correlation
- Ch 7 One-sample t-test
- Ch 8 Dependent samples t-test
- Ch 9 Independent samples t-test
- Learning Statistics with R
- Ch 5 Descriptive Statistics
- Ch 13 Comparing two means
- Desc. Stats. in R: Intro to Descriptive Statistics (Princeton Research Guide)
- Descriptives in R (Datacamp)
- Intro to R for Adv. Research Methods: Ch5 Descriptive Statistics and T-Tests in R (University of Galway)
Essential Stats Packages for Psychologists:
Materials:
- Slides
- Demo only: d2mr-assessment/00_in-class-materials/stats-demo-1.qmd
Class plans:
- Stats part 1 demo (descriptive statistics)
Notice:
We will not meet for class today. Review the descriptive stats demo on your own. The TAs will be available for real-time help in the #help-stats-and-analysis channel during our normal class time.
To-do:
Take some time to plan out what statistics you’ll want to run on your data for your final research project, both descriptive and inferential. Remember that you’ll need to include both for full points. We’ll cover how to implement common hypothesis testing/inferential statistics on Friday (e.g., ANOVA, chi2, linear regression, glm), and you’ll want to be ready to try them out with your data during class. If you know you’ll need to include a specific kind of analysis that’s not one of the “basics,” drop a message in the #help-stats-and-analysis channel with info about what you’ll be using and I may add it to the materials for Friday if it’s something I think others would benefit from covering too.
Recommended exercise:
- Descriptive statistics structured exercise:
- Practice using R to calculate and interpret descriptive statistics. Unlike in previous assignments, you will not use a built-in dataset. Instead, you will use data from a published, open-access dataset capturing relationships between math anxiety and self-perception. The exercise includes simple tasks with tables and ggplot.
- Associated files: 04_data-analysis/01_descriptive
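As a rough preview of what “calculating descriptive statistics in R” can look like, here is a small sketch using dplyr. It uses the built-in mtcars data purely as a stand-in (the structured exercise itself uses the math anxiety dataset), and the choice of grouping variable and summary statistics is an assumption for illustration.

```r
library(dplyr)

# Grouped descriptives: count, mean, SD, and median of mpg for each cylinder group.
desc_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    median_mpg = median(mpg),
    .groups = "drop"
  )

print(desc_by_cyl)

# Pearson correlation between two continuous variables.
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)
print(r_wt_mpg)
```

The result of `summarise()` is itself a data frame, so it can be passed straight to a table-making function or to ggplot.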
Week 7 | Class 2
Friday, February 21, 2025
Readings & Resources:
- Stats and R
- A Practical Extension of Introductory Statistics in Psychology using R
- Ch 5 Simple Linear Regression
- Ch 10 One-way ANOVA
- Ch 12 GLM Approach Summary
- Learning Statistics with R
- Ch 12 Categorical data analysis
- Ch 14 Comparing several means (1-way ANOVA)
- Ch 15 Linear regression
- Brown VA. (2021) An Introduction to Linear Mixed-Effects Modeling in R. Advances in Methods and Practices in Psychological Science.
- Linear & Mixed-effects model tutorials
Materials:
- Slides
- Demo only: d2mr-assessment/00_in-class-materials/stats-demo-2.qmd
Class plans:
- Stats part 2 demo (inferential statistics)
- Group work: selecting analyses
I get the impression that many of you aren’t feeling confident about how to earn all the points for your mini-projects. If I’m right, I want to fix this as soon as possible since we’re nearing the end of the quarter. The 2 biggest things I want to communicate in this long message are:
- Talk to me if you need help.
- Don’t let flexibility be a distraction.
Not sure how many projects to do? Which projects to do? What counts as assessed and unassessed and off the menu and off the syllabus or any of the other terms I’m throwing around? Please ask me.
This is the first time using an alternative grading system in this class, and it’s important to me that it is a better experience for you than the old system. Clearly not everything is going 100% smoothly, and I’m learning as we go what parts of the course design need fixing. That’s all well and good for future students who will benefit from changes made going forward, but I have a responsibility to make sure the less-than-helpful parts of the system aren’t holding you back right now.
If you are feeling unsure about how to get the grade you want, please email me. We can work out a plan to give you more structure and guidance to get you where you want to go. You don’t have to just flail and guess and hope for the best.
In addition to encouraging you to ask for help with your specific circumstances, here are some general bits of advice:
- Again, don’t let flexibility be a distraction. If you’re looking at this vast “menu” and all the many times I say “do whatever!” on the website and it’s making you feel like you don’t know where to start, tune it out. Pick the mini-projects you’re going to do now from the ones already published, ignore the optional/challenge parts, and set yourself deadlines for submission (you can ask me or the TAs to help you set goals).
- You only have to meet 20 unique objectives across all mini-projects. Pick which 20 you want those to be and work toward them. Frustrated by git and/or apaquarto and dreading the idea of making a new repo for those purposes? Don’t include those objectives in your 20.
- I expect most students will need to complete 3-5 mini-projects to earn 50 points. If you’re not sure how many you should try, budget time for 6-7.
- To help you plan, I’ve put together a list of all published mini-projects along with the essential info about each. View that list here.
- If you’re unsure how to earn a point for some particular objective, look at the recommendations in the assessment file and follow them to the letter. In some cases that will mean forcing in some arbitrary elements just to prove you can.
- For the other 20 points, there’s a good chance that in the work you’re already doing you are checking off some of the “unassessed” objectives. Adding lots of comments? Using internally consistent style? Planning out your plots with the grammar of graphics? Transforming your data into a tidy format because you can identify that it needs to be tidy for plotting? These are all points. When you complete the assessment for a project, look at that list carefully and see if there are any points you already did without even thinking about it and/or if there are some you could hit with minimal additional effort.
- You do not need to push yourself to complete projects to exceptional standards. Remember: Done is better than good. If you have to choose between getting something done and making something “good,” choose done. In this system you are cumulatively earning points, not losing points for imperfections. There’s no reason not to turn in “bad” work if it’s going to get you to your ultimate goal.
- If the “off-the-menu” and “off-the-syllabus” project ideas are confusing, just ignore them. You don’t need either. These are there for people who are already motivated to pursue them for their own interests, not to distract anyone from focusing on the basics.
I genuinely want every person in this class to earn an A, and from what I’ve seen so far I think you all can. Again, it’s my responsibility to make sure this new class structure is not so chaotic that it harms your learning or your grades, but I can’t help you if you don’t let me know there’s a problem.
To-do: Make/review your plan
- It’s the end of 7th week — how close are you to the 50 mini-project points? The 40 research project points? What do you need to do to get the grade you want by the end of the quarter?
- Deadlines:
- Mini-projects due March 5th (Wed Week 9)
- Final research projects due March 12th (Wed Finals Week)
Recommended exercise:
- Inferential statistics structured exercise:
- Practice using R to analyze data with common inferential statistics techniques. The exercise includes some work with tables and ggplot2.
- Associated files: 04_data-analysis/02_hypothesis
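For orientation, the sketch below shows how the common inferential tests mentioned for this week (t-test, ANOVA, chi-square, linear regression, glm) can be called in base R. It uses the built-in mtcars data as a stand-in, and the variable pairings are arbitrary illustrations, not recommendations for your own analysis plan.

```r
# Independent-samples (Welch) t-test: mpg by transmission (am: 0 = automatic, 1 = manual).
t_res <- t.test(mpg ~ am, data = mtcars)

# One-way ANOVA: mpg across cylinder groups.
aov_res <- aov(mpg ~ factor(cyl), data = mtcars)

# Chi-square test of independence on a contingency table.
# With counts this small, R warns that the approximation may be inaccurate.
chi_res <- chisq.test(table(mtcars$cyl, mtcars$am))

# Simple linear regression: mpg predicted by weight.
lm_res <- lm(mpg ~ wt, data = mtcars)

# Generalized linear model (logistic regression): transmission predicted by weight.
glm_res <- glm(am ~ wt, data = mtcars, family = binomial)

print(t_res)
print(summary(aov_res))
print(chi_res)
print(summary(lm_res))
print(summary(glm_res))
```

Each call returns an object you can store and inspect (for example, `t_res$p.value` or `coef(lm_res)`), which is what makes it possible to report results dynamically in a Quarto manuscript instead of copying numbers by hand.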
Unit 4: Disseminating research dynamically
Week 8: Creating your manuscript
Dynamic text; BibTeX & citr
Week 8 | Class 1
Monday, February 24, 2025
Readings & Resources:
Tasks:
Reminders:
Additional information:
To-do:
Recommended exercises:
Week 8 | Class 2
Friday, February 28, 2025
Readings & Resources:
Tasks:
Reminders:
Additional information:
To-do:
Recommended exercises:
Week 9: Publishing & polishing
Making the most of Quarto; Your Data to Manuscript Workflow
Week 9 | Class 1
Monday, March 3, 2025
Readings & Resources:
Tasks:
Reminders:
Additional information:
To-do:
Recommended exercises:
Week 9 | Class 2
Friday, March 7, 2025
Readings & Resources:
Tasks:
Reminders:
Additional information:
To-do:
Recommended exercises: