Overview
Description
This course tackles the basic skills needed to build an integrated
research report with the R programming language. We will cover every
step from data to manuscript including: Using R’s libraries to clean up
and re-format messy datasets, preparing data sets for analysis, running
statistical tools, generating clear and attractive figures and tables,
and knitting those bits of code together with your manuscript writing.
The result will be a reproducible, open-science friendly report that you
can easily update after finishing data collection or receiving comments
from readers. Never copy-paste your way through a table again! The R
universe is large, so this course will focus specifically on: The core R
libraries, the tidyverse library, and R Markdown. Students will also
learn about the use of GitHub for version control.
Weekly meetings:
- Tuesdays and Thursdays 11am - 12:20pm
- 1155 E 60th St, Room 289B
Instructor:
- Dr. Natalie Dowling
- Email: ndowling@uchicago.edu
- GitHub: @nrdowling
- Office: 1155 Building, Room 404
- Office hours: Tues/Thurs 4pm - 5pm - sign up via
GCal
Teaching assistant:
- Grace Huang
- Email: ysh@uchicago.edu
- GitHub: @ysh627
- Office: Green 208
- Office hours: Wednesdays 12-1:30pm (email for alternative
appointment)
Hubs:
- Course site &
syllabus (you are here!)
- Piazza
- Canvas
- Example repo:
schelling-games
- Duplicate this repo to poke around a (mostly) functional GitHub
Pages site created entirely using R Markdown. Most of the .Rmd and .R
files include extensive commenting, including some ideas for how you can
practice coding by revising or adding to those R scripts directly. (I
recommend duplicating rather than mirroring. Set up a new repo via “import”
using the same URL copied from the green Code button. You’ll have a
completely different repo – no need to worry about messing up the
original. If you just clone it, it will still be attached to the
original, and your changes won’t push because you aren’t a
collaborator.)
- Class repo:
d2m-2024
- Begins empty at the start of the quarter. Will be updated with
in-class demos and data as we go.
Course materials
- Most required readings will be drawn from one of the following great
(and free! and online!) resources:
- Additional resources can be found on the resources page
Dataset
Nearly all assignments in this course will involve working with a
dataset of your choice. We will begin with setting up a repo for your
data and R scripts, then move to tidying the data, then to analyzing and
plotting the data, and finally to interpreting the results based on
those data. In other words, having an appropriate dataset as early as
possible is essential. You will get the most out
of this class if you work with your own data for a project you will
continue to be invested in after the end of the quarter, like a BA or MA
thesis, grant proposal, or manuscript intended for publication.
If you are not currently working on this type of project, you should
consider taking this class next year (or whenever you do have
data). Whether or not you use your own data, you are responsible for
providing your dataset. Read the dataset selection guidelines for more
information.
Grading
Students enrolled in this course will be graded on the following
basis:
- Short assignments: 60% (graded complete/incomplete, 3% each)
- Scientific report: 20% (including 2 ungraded preparatory tasks)
- Participation: 20% (including self-assessment)
Assignments
There is a short assignment for each lecture. Each assignment will
involve working with your own data, data provided for that assignment,
or both and is designed to give you hands-on practice with using R.
Assignments typically begin in class and may be completed after class.
Each one is highlighted in orange on the schedule and is graded
on a pass/fail basis. Students must push their completed assignment to
the appropriate GitHub repository, tagging Dr. Dowling (@nrdowling)
and Grace (@ysh627) with the assignment number so that they receive a
notification. There are 20 assignments total, each worth 3% of your
final grade.
Assignment details are included at the end of the lecture slides.
When are the assignments due? We will usually start
assignments in class. Anything you don’t finish during class time should
be completed before the start of the next class meeting. If an
assignment is listed under “pre-class preparation,” it should (hopefully
unsurprisingly) be completed in preparation for the start of that
class.
Need help? We will do much of this work in class, so ask
your questions while we’re together. Each student will be assigned to a
support group of other students to help answer questions (in addition to
access to help from Dr. Dowling and Grace during office hours). A large
portion of the course is developing your search skills for
debugging your code: look for advice and
solutions from others who have faced similar problems on sites like
Stack Overflow and GitHub! Week 2 slides will include a formal
procedure to follow for troubleshooting help.
Scientific report
Your final assignment will be a scientific report in R Markdown that
is developed via a GitHub repository and includes:
- data read-in, pre-processing, and analysis
- at least 2 figures and 1 table with captions and in-text
references
- at least 2 in-text R code references
- BibTeX citations
- at least 1500 words of the manuscript in at least four subsections
(Introduction, Methods, Results, Discussion; unless otherwise agreed
with Dr. Dowling).
Additional details about report requirements will be provided
throughout the quarter.
Items on the syllabus in green
indicate preparatory tasks for your final report: receiving approval for
your dataset (by Tuesday of Week 2) and negotiating a manuscript plan
(by Friday of Week 7). Each requires you to submit something informal
via Canvas. These tasks are not graded, but grading for your final
report will be contingent on completing both on time. Failing to meet with me for a report meeting will
result in an automatic 10 points off your final report grade.
Participation
Students earn participation credit through their attendance and
participation in their support group and discussion of course content in
office hours. Students are generally expected to come to class prepared
to practice using R together. That means you should come with your
charged laptop, prepared with any data you are using and any installed
software that is required (see the ‘preparation’ note for each class).
Be prepared to share your screen with your support group and with the
instructors. That said, please stay in touch about your limitations
regarding in-person participation and/or technology access.
This grade is not an attendance grade. It is much more important that
you care for your mental and physical health and the health of those
around you than it is to check the “came to class” box. Please do not
come to class if you are ill, but DO contact Dr. Dowling and Grace at
least 24 hours in advance of an absence, when possible. Missing classes
without notifying us may result in a lower participation grade. If your
attendance is low enough to impact your work (with or without notifying
us) you will need to meet with Dr. Dowling to discuss how it may affect
your grade.
Self-assessment
At the end of the quarter you will write a brief reflection (1-2
paragraphs) on your participation in this class and assign yourself a
grade (out of 20).
Your self-assigned grade may or may not end up as your final
participation grade, but it will be strongly taken into
consideration.
Grading
Participation grades will be posted to this assignment at the same
time your final scientific report is graded. Once assigned,
participation grades are final. If you are concerned
about your grade, you should reach out to Dr. Dowling or Grace before the
end of the quarter. Either of us can give you guidance on ways to raise this
part of your grade.
Late work & attendance
Life happens. Family emergencies, exams for other courses, COVID-19,
roommates in crisis…
If you need an extension, just ask. Send an email to Dr. Dowling (cc
Grace) as far in advance as possible explaining what’s going on
(vague!) and how much extra time you need (specific). It is
never necessary to provide details about your physical or mental
health. You are always welcome to have an open discussion about
what you’re dealing with in office hours, but in your emails about
absences it is more than sufficient to just say you’re sick or having a
personal emergency.
With prior notice and instructor approval, late work will not be
penalized. Students who register for the class after one or more
assignments were already due should contact Dr. Dowling to arrange an
extended deadline.
Without notice and/or approval, or after the agreed upon extended
deadline:
- Short assignments will be graded as incomplete, receiving 0 (of 3)
points.
- Final scientific reports will lose 1 (of 20) points per 12 hours
late.
If you will be missing class for any reason, email
both Dr. Dowling and Grace before the start of class.
Please do not come to class if you are
feeling at all ill. You do not need to be specific about
why you will be absent, but please do give a guess as to when you’ll be
back in class and whether you’ll be able to work remotely while you’re
out.
Stressed out by emailing professors? Here are some good examples
of what is helpful for your instructors.
I have exams in two other classes 5th week. Is it ok if I take an
extra 48 hours to turn in Assignment 5?
I just found out I need to quarantine after a COVID exposure. No
positive test as of now, so I’ll plan on missing classes Tuesday and
Thursday but completing the assignments at home. I’ll let you know if
anything changes.
I have a family emergency and need to travel home. I won’t be able to
work while I’m home and I’m not sure how long I’ll need to be away. Can
we meet over Zoom to make a plan to handle late assignments and
absences?
Course schedule
This schedule is subject to change. Refer to this page for syllabus
updates (including new and updated links, files, etc.) throughout the
quarter. Slides will typically be posted the morning of class.
For each day of class the schedule below includes an overview of what
we’ll cover in class and any key information you’ll need for that
day.
The accordion boxes contain detailed information about all
tasks you should complete. These boxes will describe things you
need to do both before and after that class, so you should
always read details at least one class ahead. Tasks are color coded:
- Ungraded assignments, preparatory tasks,
readings, and resources are in burgundy. Required readings are marked
with an asterisk and should be completed before the start of
class.
- Graded assignments due before the start
of next class (unless otherwise noted) are in orange.
- Preparatory tasks for the final project
are in green.
- Extra information necessary for that
class period is in blue.
Module 1: Fundamentals of R and GitHub
Thursday, January 4 2024 (1.1+1.2)
Pre-quarter preparation
Because Winter 2024 begins on a Wednesday, we will not meet for the
first lecture (which would otherwise be Tuesday Jan 2). It is critical
that you come to the first class meeting on Thursday January 4th having
completed the getting started tasks – start by watching the short
recorded lecture! – and Assignment 1.
Class meeting
slides
- Introductions & Course Structure
- Lecture: R Scripts & Packages, and Markdown
- Groups: Assign support groups & discuss datasets
- Assignment: (2) Initialize a papaja .Rmd
Submit your dataset for approval by
Friday Jan 5 at 11:59pm. This should be done via email, not GitHub.
Friday is the absolute deadline, but the earlier the
better! After you receive approval, complete the “Select dataset and get
approval” assignment on Canvas to receive completion credit.
- Install the tidyverse and papaja packages
- Create an R script file (.R) that:
- Loads the tidyverse and papaja packages
- REQUIRES 1+ other package of your choice
- Includes 1+ lines of commented text
- Assigns 1+ numeric variable and 1+ string/character variable
- BONUS: Create a dataframe
- Create an R Notebook with the papaja template
- Change the title and author fields at the top
- source() your R script in the first code chunk
- FOR THIS AND ALL GRADED ASSIGNMENTS: Commit (with
an informative message) and push to GitHub tagging Dr. Dowling and
Grace
- Ungraded final step (no submission): knit .Rmd to Word doc or PDF
and troubleshoot as needed
- Above-and-beyond: Experiment with creating .Rmd
files without papaja
- We’ll only use papaja in this class, but knowing how to make use of
.Rmds with other outputs might be very useful for you. For example, you
might want to make an HTML page (like this one!) or produce manuscripts in
non-APA styles.
- Papaja installation is trickier than other packages for a few
reasons:
- It used to be unavailable through CRAN, so it could not be
installed with the install.packages() function. Papaja is now
available through CRAN! Hooray! That means you can install it just like
anything else, with the function install.packages("papaja"). These
directions for installing it directly from the source will still work,
but the CRAN installation should be much simpler.
- It requires manually setting up a TeX distribution – like TinyTeX
(recommended), MiKTeX, MacTeX, or TeX Live – which can have their own
installation issues.
- Because the installation is a relatively involved process, it can be
difficult to determine exactly where in the process errors are
occurring.
- So what are you supposed to do about it?
- Follow the installation guide in papaja’s detailed documentation. I
recommend starting with this guide rather than going to it when you hit
a problem. Past students have encountered an issue where once their TeX
distribution was installed in an incompatible way, they couldn’t undo
it.
- If it doesn’t install smoothly, set a timer for 1 hour and do your
best to troubleshoot the issue. If you can’t fix it in that time, make
notes on what you’ve done so far and what you think might still be going
wrong, and then stop messing with it.
- Adjust your assignment submission as follows:
- Create an .Rmd with the default Notebook template (& update the
YAML title field)
- Include the line of code that would load the papaja package, then
comment out that line
- Add your notes from above about the state of the installation
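If you want a quick way to see where you stand before troubleshooting, the sketch below checks which of the course packages are already installed. It is only an illustration of the installation notes above: the helper name `missing_packages()` and the `course_packages` vector are made up for this example, and the install commands are left commented out so that nothing is installed without you choosing to run it.

```r
# Packages this course expects; edit the vector to match your own setup.
course_packages <- c("tidyverse", "papaja")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(course_packages)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install the missing packages from CRAN:
  # install.packages(to_install)
} else {
  message("All course packages are installed.")
}

# papaja also needs a TeX distribution to knit to PDF.
# TinyTeX can be installed from within R (uncomment to run):
# install.packages("tinytex")
# tinytex::install_tinytex()
```

If the check still reports papaja as missing after you have tried to install it, paste the message into your troubleshooting notes.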
- Review the dataset guidelines
- Once you are confident your dataset meets the requirements, request
approval via email by Friday Jan 5 at 11:59pm:
- Send the email to Dr. Dowling (ndowling @ uchicago.edu) and cc Grace
(ysh @ uchicago.edu)
- Subject line should be:
(Your first & last name) Dataset Selection
- Your email should include:
- A brief (2-4 sentences) description of your data
- The data itself, in whole or part; this may be an attached .csv
file, a link to a shared document, or instructions for where to find the
data on the github repo you have already invited us to
- If you are unable to share the data with us via email (due to IRB
limitations or anonymization concerns), explain this issue and a
tentative plan for how you will get the data into a sharable state
(i.e., any data necessary to run your analyses is included in a private
github repo) within one week
- At least one research question you tentatively plan to ask
that can be answered with this dataset in the state it is currently in
- Specifically, your question should be answerable with an outcome
variable in your data right now
- It’s ok if it’s a boring question, and it’s ok if you do not end up
answering this question in your final project
- Once you receive a reply approving your data from either Dr. Dowling
or Grace, complete the Canvas assignment to receive completion credit
- Due before class on Tuesday Jan 9th 11am, meaning you will need to
have gotten approval over the weekend
- In the text box, include a brief but descriptive name for your
dataset and the sender (Dr. Dowling/Grace) and timestamp of the approval
email
- NOTE: If you received approval for your dataset
when you requested enrollment consent, you do not need to request it
again. Simply complete the Canvas assignment to receive credit.
Tuesday, January 9 2024 (2.1)
Pre-class preparation
Take the weekend to catch up on any pre-quarter preparation tasks not
yet completed. It is critical that you are fully caught up by today or
you will not be able to follow along with demos and assignments.
Additionally, confirm that your dataset approval is marked as complete
on Canvas (contact Dr. Dowling if it is not marked as complete by start
of class).
Class meeting
slides
- Lecture: Best Practices & Troubleshooting
- Assignment: (3) Create and edit your README.md and .gitignore files
- Part 1: README
- Create a README.md file in the top level of your repo if it doesn’t
exist
- Edit your README to include:
- A brief description of your data
- A brief overview of your planned project
- A real or hypothetical file tree
- Format of your choosing
- This can be extremely simple, but should demonstrate you’ve thought
about how you plan to keep your repo organized.
- At a minimum, your README should include the following markdown
elements:
- 1 header (any level)
- bold, italics, or strikethrough
- Part 2: .gitignore
- Create a .gitignore file in the top level of your repo if it does
not exist
- You can start with GitHub’s default R .gitignore template
- Edit the .gitignore to include:
- A dedicated directory called localonly
- All files of specific types that you want on your computer but don’t
want to upload to GitHub
- One specific file that is not in an ignored folder and is not an
ignored filetype
- Add a comment for each
NOTE: A primary goal of this assignment is for you
to practice answering your own questions. We have not formally covered
either of these files in detail. It’s your job to figure out the point
and the implementation of each. There will be a slide in the W2C2
lecture with “verbose” instructions for the .gitignore. Do your best at
this point, then make adjustments as needed after next class.
Thursday, January 11 2024 (2.2)
Pre-class preparation
- Confirm tidyverse and papaja packages are successfully installed and
load without errors.
- Remember that papaja in particular often requires
troubleshooting!
- Read:
- Intr2R Chapters 7 & 8
- R Markdown Definitive Guide Sections 2.5, 2.6, & 3.2
Class meeting
slides
- Lecture: Fundamentals of R & Markdown
- Assignments:
- (4) Create hello_world() function
- (5) Update .Rmd YAML, markdown, and code chunk
- Note that these are 2 separate
assignments! You will reference the .R file you create in
Assignment 4 in your Assignment 5 .Rmd file, but they will count as two
assignment credits because each will be as much effort as the single
assignment most other days. The hello_world() function in
(4) may be particularly challenging if you are brand new to programming.
Plan accordingly!!
- Create a hello_world() function in an .R script dedicated to
defining functions
- This should be the script you eventually source in your .Rmd with
project-specific functions (if necessary)
- Your function should include:
- 1+ object assignment
- 1+ conditional statement
- CHALLENGE: 1+ for or while loop
- Your function should take at least 1 argument, such as:
- name (string)
- time_of_day (numeric or POSIX)
- is_morning (boolean)
- return_n_greetings (integer)
- Your function should have at least 2 possible return values, such
as:
- hello, class
- Good morning, Dr. Dowling!
- Sup?
- bonjour mes amis
- Feel free to get creative here! This will feel less tedious if you
can make yourself laugh while you do it. (It will make grading more
enjoyable, too.)
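To make the requirements above more concrete, here is one possible shape for the function. Treat it as a sketch rather than a model answer: the argument names are borrowed from the examples in the list, the default values and greeting text are arbitrary choices, and your own version can (and should) look different.

```r
# One possible hello_world(): takes three arguments, uses a conditional,
# assigns objects, and (for the challenge) uses a for loop.
hello_world <- function(name = "class", is_morning = FALSE,
                        return_n_greetings = 1L) {
  # Conditional + object assignment: pick the opening words
  if (is_morning) {
    opening <- "Good morning"
  } else {
    opening <- "hello"
  }
  greeting <- paste0(opening, ", ", name, "!")

  # Loop: collect the greeting the requested number of times
  greetings <- character(0)
  for (i in seq_len(return_n_greetings)) {
    greetings <- c(greetings, greeting)
  }
  greetings
}

hello_world()                                  # "hello, class!"
hello_world("Dr. Dowling", is_morning = TRUE)  # "Good morning, Dr. Dowling!"
hello_world(return_n_greetings = 3)            # the default greeting, 3 times
```

Once a function like this lives in your functions script, source() that script in your .Rmd and call the function from a named chunk.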
- Update the YAML header of your .Rmd, minimally:
- Title
- Short title
- Author(s)
- Call your hello_world() function within your .Rmd:
- source() the R script where you defined the function
- Add a chunk that calls the function
- Don’t forget to follow best
practices for naming and placing chunks!
- Add/edit markdown to include at least (lorem ipsum is ok):
- 1st, 2nd, & 3rd headers (1 of each)
- 1 unordered list & 1 ordered list
- Bold & italicized text
- 1 linked URL
- 1 HTML-style comment
Module 2: The Tidyverse
Tuesday, January 16 2024 (3.1)
Pre-class preparation
- Read chapters 7 (data import) and 3 (data transformation) in
r4ds.
- Download and preview the cheatsheets for readr and dplyr.
- Grab the ones for ggplot2, tidyr, forcats, and stringr if you want
to look ahead!
- Ensure that your data - or a portion of it - is available in a
tabular format (like a .csv file or 1 sheet of an Excel file or
GoogleSheets document)
- Have RStudio and your tabular data file available to work with at
the start of class!
Class meeting
slides
- Lecture: Welcome to the Tidyverse
- Assignment: (6) Data read-in and -out, tidy data evaluation
- Create (or update) a code chunk in your .Rmd for data read-in
- Use readr to read in your data (and any supplementary data files you
may have)
- Don’t forget to give your chunk a unique and informative name!
- Examine your dataset in RStudio. Answer these questions with
comments in your read-in chunk
- Is it tidy? Remember that “tidy” means:
- Each column is a single variable
- Each row is a single observation
- Each cell is a single measurement
- If not, what data wrangling is necessary to tidy it?
- Which variables, observations, and measurements in your data are
absolutely necessary for your plans? Which are (likely) extraneous?
- Create (and name!) a code chunk to create an intermediate dataset
- Use 2+ dplyr functions
- Use readr to write your intermediate dataset to a .csv, .tsv, or
other delimited text file (readr itself does not write .xlsx files)
- Examine your new intermediate data file in Excel (or similar)
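The read-in, wrangle, and write-out steps above might look roughly like the sketch below. Everything about the data here is invented for illustration (the column names, the values, and the temporary file paths), and the toy file is written first only so the example runs on its own. In your project you would point read_csv() at your real data file instead.

```r
library(readr)
library(dplyr)

# Hypothetical raw data, written to a temporary file so the example is
# self-contained. Replace this with the path to your own data.
raw_path <- tempfile(fileext = ".csv")
writeLines(
  c("subject_id,condition,score,notes",
    "1,control,10,ok",
    "2,treatment,14,ok",
    "3,treatment,NA,missed session"),
  raw_path
)

# Read in the data with readr
raw_data <- read_csv(raw_path, show_col_types = FALSE)

# Is it tidy? Here each row is one subject and each column one variable.
# The notes column is likely extraneous for analysis.

# Create an intermediate dataset using two dplyr functions
intermediate <- raw_data |>
  filter(!is.na(score)) |>               # drop rows with no score
  select(subject_id, condition, score)   # keep only the needed columns

# Write the intermediate dataset back out with readr
out_path <- tempfile(fileext = ".csv")
write_csv(intermediate, out_path)
```

In an .Rmd, the read-in and the intermediate-dataset steps would each go in their own named chunk.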
Thursday, January 18 2024 (3.2)
Pre-class preparation
- Review the cheatsheets for core tidyverse packages
- Read chapters 5, 14, & 16 (& optionally ch15) in r4ds
- Pull the most recent version of the d2m-2024 repo to try out the
tidyverse functions demo code in demo-snippets.R.
Class meeting
slides
- Lecture: Data Preparation in the Tidyverse
- Assignment: (7) Mock up a preliminary dataset & start data prep
- Tidyverse cheat sheets:
- R for Data Science:
- Conceptualize the dataset you’ll need to create (this is just to get started, not a
submission). Some ideas (not tasks!) to help get you thinking:
- Draw out a couple mock figures you’d like to include and/or
articulate your research questions and hypotheses
- How will you need to organize your variables (and within-variable
groups) to produce those figures and/or analyses? What data types will
your variables need to be?
- Does anything that currently looks like
string, numeric, or logical data need to be handled as factor?
- Mock up a preliminary dataset & begin prep - PUSH TO YOUR REPO:
- A tabular mock-up of your ideal data structure. Create this in Excel
(or similar) either by manipulating your actual data manually or
creating fully mock data
- You can do this in R if you want, but the goal is more to
get a solid concept of data structure rather than actually executing the
data wrangling
- Building on or replacing the code you wrote in assignment (6),
create an .R script called data-prep.R:
- Use comments to write out in plain English the prep steps you expect
to carry out in order to make your existing data look like your mock-up,
including data import, wrangling, and exporting intermediate
datasets
- Make note of anything that you know you’ll need to do conceptually
but don’t yet know how to execute in R
- Begin to fill in the parts of your data prep you feel comfortable
starting, e.g., import/export, filtering/selecting, object
assignment
Tuesday, January 23 2024 (4.1)
Pre-class preparation
No new material this week! We’ll walk through a data-prep
demo and work in groups on Assignment (8) (and on your own data prep, time
permitting).
- Tiny bit of new material: combining data with binds and
joins
- Come prepared to make the most of your time by being thoughtful and
thorough in completing Thursday’s assignment (7) & having quick
access to the tidyverse documentation and cheatsheets from last
week
- Office hours appointments for the
required final report planning meetings begin this week
and continue through Friday of 7th week (Feb 16). Book a 20 minute
slot here. See box below for more details.
Class meeting
slides
- Demo: Data Manipulation in the Tidyverse
- Plus super-quick overview of combining data in R
- Assignments:
- (8) Recreate this dataset
- Continue your data preparation
- Office hours appointments for the required final
report planning meetings begin this week and continue through Friday of
7th week (Feb 16).
- Book a 20
minute slot here. See box below for more details.
- The intention of these meetings is to set clear expectations for
your final project, based both on general course requirements and your
specific project goals. You should come prepared to discuss your
project’s aims and current status, but it is not necessary to
bring in any concrete work. Please sign up for regular
office hours outside these appointments if you’d like more hands-on
support for your project.
- Note that this appointment schedule includes literally all
my availability. Do not delay booking an appointment, as I will
not be able to accommodate requests to meet outside these
times.
- Recreate the modified starwars dataset:
- Copy the file starwars.R in the d2m-2024 repo into your
own project repo
- Follow the instructions in the file to import and examine the “goal”
tibble
- Manipulate the built-in starwars tibble to create a new object
called sw.wrangled
- When you type sw.wrangled into the console, you should
see exactly what’s in the image: d2m-2024/images/starwars-goal.png
- Don’t forget to check number of rows and column data types
- If you can’t figure everything out, add comments explaining what you
would like to do but aren’t sure how
Continue prepping your data! We will finish data wrangling on
Thursday. Your data should be fully prepped and ready for analysis and
visualization in the tidyverse by Week 5.
Thursday, January 25 2024 (4.2)
Pre-class preparation
- No new material! Aside from a brief discussion about final projects,
the whole class will be time to work on data preparation with support
from your group and instructors. Review and adjust your plans from
assignment (7) ahead of time to make the best use of your time.
Class meeting
- Brief discussion of final report requirements
- Lab: Data Manipulation in the Tidyverse
- Assignments:
- (9) Finish prepping your data!
- Before your final commit, be sure that your scripts include
sufficient and informative comments
- (10) Imagine two plots you would like to
see of your data. Add comment blocks into your .Rmd describing (in plain
English, not code) what elements will be necessary to create each
plot.
Finish preparing your data for analysis and presentation! Use
tidyverse tools to create the mock-up data you created in Assignment
(7). Be sure your data frame(s) is/are in a tidy, long format. In
particular, as you imagine plots you’d like to create in Assignment
(10), be sure that any measures you want to include on x or y axes are
each in one column and that any groups you might want to compare against
each other are in one column.
e.g.: You want to compare number of gestures used by parents and
children. Your original wide, untidy dataset includes 2 measurement
columns: parent_gestures and child_gestures. Your long, tidy dataset has
one speaker column with values parent and child and one n_gestures
column with the number of gestures produced by that subject at the
observation.
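The gestures example can be sketched in code with tidyr's pivot_longer(). The numbers and the family_id column below are invented purely for illustration. Only the column names parent_gestures, child_gestures, speaker, and n_gestures come from the example above.

```r
library(tibble)
library(tidyr)

# Hypothetical wide, untidy data: one row per family,
# one gesture-count column per speaker
wide <- tibble(
  family_id = c(1, 2),
  parent_gestures = c(12, 7),
  child_gestures = c(5, 9)
)

# Long, tidy version: one row per speaker per family
long <- wide |>
  pivot_longer(
    cols = c(parent_gestures, child_gestures),
    names_to = "speaker",               # new grouping column
    names_pattern = "(.*)_gestures",    # keep only "parent" / "child"
    values_to = "n_gestures"            # new measurement column
  )

long
```

With the data in this shape, speaker can go straight onto an axis, color, or facet in ggplot2, and n_gestures is a single column you can plot or summarize.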
- Mock up 2 plots (of 2 different types) you could include in your
final report
- Add code chunks to your .Rmd for each plot and give each a unique,
informative name
- Add plain-English comments within your chunks explaining the goal
for the plot. To start thinking about this, ask yourself questions like:
- What variables are on the axis/axes?
- What kind of plot is it?
- Will you need to compare across groups?
- What would you anticipate seeing if your hypotheses were borne out
in your analyses?
- Create a visual of each mock-up that renders in your .Rmd when you
knit. This can be either:
- A first attempt at the ggplot code itself if you are already
familiar with ggplot
- .jpg/.gif/.png files created outside R (e.g., in Excel, in MS paint,
sketched by hand or on an iPad) and called in the text below the
corresponding code chunk using R markdown
- Look back at the slides or the R Markdown cheatsheet if you don’t
remember how to insert an image file!
- Due by 5pm Friday Week 7: Feb 16, 2024
- After your report plan meeting with Dr. Dowling, write up a brief
summary of what you discussed and submit to the “Report plan” assignment
on Canvas. Your write-up should include:
- The specific plans for your analysis, figure, and table chunks
- A general description for how you will distribute the 1500+ words
across 4 sections
- At least 2 elements of code produced in code chunks that you can
reference in text (e.g., a p-value from the model you run in your
analysis chunk, a value within the table you created in the table chunk,
a summary value like mean or median that you calculated and stored as a
variable)
- 2-4 sentences explaining your personal goals for the final report.
What do you want to have accomplished or learned by the time you submit
your report at the end of the quarter?
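As a small illustration of the "elements of code you can reference in text" idea, the sketch below stores two summary values as variables inside a code chunk. The scores vector and the variable names are hypothetical. The comment at the end shows the standard R Markdown inline-code syntax you would type in the body text of your .Rmd.

```r
# Hypothetical data standing in for a variable from your own dataset
scores <- c(4.1, 5.3, 3.8, 6.0, 4.7)

# Store summary values as variables so the text can refer to them
mean_score <- round(mean(scores), 2)
median_score <- median(scores)

# In the prose of the .Rmd (outside any chunk) you could then write:
#   The mean score was `r mean_score` and the median was `r median_score`.
# When the document is knit, each `r ...` is replaced by the current value.
```

Because the values are computed at knit time, the sentence updates itself whenever the data change, which is the whole point of avoiding copy-paste.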
Module 3: Presenting Data
Tuesday, January 30 2024 (5.1)
Pre-class preparation
- Remember to book your final report
meeting with Dr. Dowling and submit the meeting summary to Canvas (see
details in W4C2 box above)
- Read chapters on data visualization and ggplot in r4ds: 1* and 9, 10, 11*
- Review the ggplot2 cheatsheet
Class meeting
slides
- Walkthrough of Assignment (8) Recreate this dataset
- Follow along with script in d2m-2024 repo
- Lecture: Introduction to Data Visualization and ggplot2
- Assignment: (11) Create basic plots (2
parts: recreate these plots & add 1 plot to .Rmd)
- submit BOTH to receive credit for
Assignment (11)! Commit your final submission of both at
the same time in just one commit
- Part 1: Recreate these plots (basic)
- In your existing starwars.R file from Assignment (8),
replicate the following three plots using dplyr and ggplot2 functions
(these can be found in the d2m-2024 repo)
- You do not need to save these as image files
- If plotting the dataset you created in Assignment (8) isn’t working
as expected, reproduce it with the code provided during the class
walkthrough
- Part 2: Add a plot to your .Rmd
- Add code to one chunk created in Assignment (10)
- This can be very basic for now, but should at an absolute minimum
contain the appropriate geom layer with correctly specified
aesthetics
- Add/edit comments to make note of what you want to do to “fine tune”
or “pretty up” your plot (e.g., changing theme, adding labels, changing
how axis and legend labels appear, grouping by color, faceting,
etc.)
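For Part 2, the "absolute minimum" described above (a geom layer with correctly specified aesthetics) might look like the following sketch. It uses the starwars data that ships with dplyr only because everyone has it. The choice of variables and of geom_point() is an assumption for illustration, and this is not one of the three target plots from Part 1.

```r
library(ggplot2)
library(dplyr)

# starwars is built into dplyr; drop rows missing either plotted variable
plot_data <- starwars |>
  filter(!is.na(height), !is.na(mass))

# Minimum viable plot: data + aesthetics + one geom layer
basic_plot <- ggplot(plot_data, aes(x = height, y = mass)) +
  geom_point()

# To fine-tune later: axis labels, theme, color by group, handle outliers
basic_plot
```

The trailing comment is the kind of "fine tune" note the assignment asks you to leave for yourself.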
Thursday, February 1 2024 (5.2)
Pre-class preparation
- Refer to the ggplot2 resources from last class!
Class meeting
- Lecture: Aesthetics & Layers
- Going rogue: Messing around with ggplot live on air
- Assignment: (12) Create intermediate
plots (2 parts: recreate these plots & add 1 (more) plot to
.Rmd)
- submit BOTH to receive credit for
Assignment (12)! Commit your final submission of both at
the same time in just one commit
- Part 1: Recreate these plots (intermediate)
- In your existing starwars.R file from Assignments (8)
& (11), replicate the following three plots using dplyr and ggplot2
functions (these can be found in the d2m-2024 repo)
- You do not need to save these as image files
- If plotting the dataset you created in Assignment (8) isn’t working
as expected, reproduce it with the code provided during the class
walkthrough
- Part 2: Tune up your plot and add another to your
.Rmd
- Tune up the plot you made in Assignment (11), for example think
about:
- Did you use the best kind of plot to tell the story you’re hoping to
tell?
- Did you have to do some wonky wrangling to plot what you
wanted?
- Are there adjustments you could make to scale, coord, or labs layers
that would make things easier to read?
- Are you effectively making use of groups? Could you communicate your
point better by grouping in a different way (e.g., color, shape, fill,
facet)?
- Add/edit comments to make note of what you want to do to make your
plot “pretty” like customizing colors or fonts, tweaking the theme, or
relabeling in the legend
- Add in the second plot you planned in Assignment (10). At a minimum,
replicate what you did in Assignment (11) for the first plot. Add some
“tune-ups” if you can!
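The tune-up questions above map fairly directly onto ggplot2 layers. The sketch below shows one possible combination of a scale, a grouping aesthetic, a facet, and a labs layer. It again assumes dplyr's bundled starwars data, and every choice here (log scale, grouping variable, label text) is just an example, not the expected answer.

```r
library(dplyr)
library(ggplot2)

# Illustrative data only: keep rows with everything the plot needs
plot_data <- starwars %>%
  filter(!is.na(height), !is.na(mass), !is.na(gender))

tuned_plot <- ggplot(plot_data, aes(x = height, y = mass, color = gender)) +
  geom_point(alpha = 0.7) +      # alpha is set, not mapped, so it sits outside aes()
  scale_y_log10() +              # scale layer: keeps one extreme mass from squashing the rest
  facet_wrap(vars(gender)) +     # grouping by facet as well as by color
  labs(
    x = "Height (cm)",
    y = "Mass (kg, log scale)",
    color = "Gender"
  )

tuned_plot
```

Whether color, facets, or both communicate the grouping best depends on your data; trying each in turn is usually quicker than reasoning about it in the abstract.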
Tuesday, February 6 2024 (6.1)
Pre-class preparation
- Refer to the ggplot2 resources from last week
- Read up on ggplot theme layer customization in the documentation
and ggplot2-book
- These are going to be essential references when you do Assignment
(13)!
- If you haven’t signed up for a report
meeting, do that right now. Slots are filling up
fast, and report meeting slots during week 7 are competing with MAPSS
pre-registration meeting slots. I cannot meet outside the listed
appointment times. Failing to meet with
me for a report meeting will result in an automatic 10 points off your
final report grade.
Class meeting
slides
- Lecture: “Pretty Plots” & .Rmd Figure Chunks
- Assignment: (13) Create advanced plots
(2 parts: recreate this hideous plot & tune up your own plots)
- submit BOTH tasks to receive credit
for Assignment (13)! Commit your final submission of
both at the same time in just one commit
- Part 1: Recreate this plot
- In your existing starwars.R file from Assignments (8), (11), & (12),
replicate this plot using dplyr and ggplot2 functions (it can be found
in the d2m-2024 repo)
- You do not need to save this as an image file
- If plotting the dataset you created in Assignment (8) isn’t working
as expected, reproduce it with the code provided during the class
walkthrough
- There’s a lot going on visually in this plot. Start by creating a
base plot with just the required aes() mappings and geoms, then make a
commented list of all the different modifications you can spot, whether
or not you have any idea how to replicate them. Some hints:
- You’ll need 13 elements in your theme() layer after setting one of
the built-in themes
- Install, load, and investigate the ggsci package to choose a scale
layer
- Look for a list of “web-safe font families” to find the right
fonts
- Neither the limits nor breaks arguments are required in your axis
scales
- Remember that facet text can’t be modified in the labs() layer
- You can use a color selector (i.e., eye-dropper) tool like this one to identify
precise colors
- If something doesn’t look right even with the precise hex code, it
could be an effect of transparency (“alpha”) somewhere along the
way
- Try to match as much as you can, but don’t let it drive you crazy.
Set a timer to keep working, then add/edit comments to describe what is
still left to be done. Your completion grade is going to be based on
putting in a good-faith effort, not getting a perfect
replication
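Two of the hints above are easy to trip over: element-level theme() changes need to come after the built-in theme, and facet text is relabeled through a labeller rather than labs(). Here is a small sketch of both, not a solution to the replication. It assumes dplyr's starwars data with its gender values of "feminine" and "masculine"; the colors, fonts, and label text are made up for illustration.

```r
library(dplyr)
library(ggplot2)

plot_data <- starwars %>%
  filter(!is.na(height), !is.na(mass), !is.na(gender))

# Facet strip text comes from the data, so it is relabeled with a
# labeller; labs() has no argument for it.
gender_labels <- c(
  feminine = "Feminine characters",
  masculine = "Masculine characters"
)

themed_plot <- ggplot(plot_data, aes(x = height, y = mass)) +
  geom_point(color = "#1B9E77", alpha = 0.6) +
  facet_wrap(vars(gender), labeller = labeller(gender = gender_labels)) +
  theme_minimal() +   # built-in theme first...
  theme(              # ...then overrides; the other order would reset them
    text = element_text(family = "serif"),
    strip.text = element_text(face = "bold", size = 12),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "grey95", color = NA)
  )

themed_plot
```

This sketch changes only four theme elements. The same pattern extends to as many as you need, and the theme() help page lists every element name.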
- Part 2: Tune up your plots & .Rmd code chunks
- Tune up the plots you made in Assignments (11) & (12), for
example think about:
- Did you use the best kind of plot to tell the story you’re hoping to
tell?
- Did you have to do some wonky wrangling to plot what you
wanted?
- Are there adjustments you could make to scale, coord, or labs layers
that would make things easier to read?
- Are you effectively making use of groups? Could you communicate your
point better by grouping in a different way (e.g., color, shape, fill,
facet)?
- “Prettify” your plots with scale_*(), theme(), and labs() layers, as
well as “aesthetic” non-aesthetics (i.e., visual attributes set outside
aes() and so not mapped to your data)
- Use code chunks in your .Rmd to render, when knit, publication-ready
plots that:
- Are in their own uniquely named chunks
- Have a caption defined and referenced in the chunk’s fig.cap
option
- Include appropriate and reader-friendly labels and text (no
inscrutable axis or level names!)
- Include a theme layer with 1+ customized element (e.g., font, legend
placement, background color)
- Bonus: Add a chunk that reads in one image file,
like a photo, diagram, or saved ggplot
- Give it a chunk caption just like with your plot chunks
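To make the chunk requirements concrete, here is a sketch of the body of one figure chunk. The two `#|` lines use knitr's in-chunk option syntax (available in recent knitr versions) to set the chunk name and caption; the same options can instead go in the chunk header. The chunk name, caption text, and image path are all hypothetical, and the data is again dplyr's starwars.

```r
#| label: height-mass-plot
#| fig.cap: "Mass plotted against height for Star Wars characters with known measurements."

library(dplyr)
library(ggplot2)

height_mass_plot <- starwars %>%
  filter(!is.na(height), !is.na(mass)) %>%
  ggplot(aes(x = height, y = mass)) +
  geom_point() +
  labs(x = "Height (cm)", y = "Mass (kg)") +          # reader-friendly labels
  theme_classic() +
  theme(axis.title = element_text(face = "bold"))     # one customized theme element

height_mass_plot

# Bonus idea, in a separate captioned chunk (path is hypothetical):
# knitr::include_graphics("images/diagram.png")
```

Hyphens are safer than underscores in chunk names, since bookdown-style cross-references (which papaja documents typically support) can fail on underscores. With a name like the one above, the figure could be referenced in the text as `\@ref(fig:height-mass-plot)`.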
Thursday, February 8 2024 (6.2)
Class meeting
slides
- Lecture: Tables & Kables
- Assignment:
Recreate this kable
- (14) Add a table/kable to your .Rmd
- Add a table/kable chunk to your .Rmd. Include:
- A unique chunk name
- An informative caption
- Reader-friendly column names
- At least 3 style modifications, for example (but not limited to):
- grouped header rows
- font or color changes
- table alignment or positioning
- scaled table size
- Reference your table and both your figures from
your recent assignments in the text of your .Rmd. Be sure that when you
knit the document you see the figures numbered in the order you
expect.
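A table chunk along these lines might look like the sketch below. It assumes dplyr's starwars data and PDF output (hence format = "latex"); the summary itself and the caption are illustrative. The styling step uses the kableExtra package and is skipped if that package is not installed.

```r
library(dplyr)
library(knitr)

# Illustrative summary table: height by gender
summary_tbl <- starwars %>%
  filter(!is.na(height), !is.na(gender)) %>%
  group_by(gender) %>%
  summarise(n = n(), mean_height = mean(height), sd_height = sd(height))

tbl <- kable(
  summary_tbl,
  format = "latex",
  booktabs = TRUE,
  col.names = c("Gender", "N", "Mean", "SD"),   # reader-friendly column names
  digits = 1,
  align = "lrrr",
  caption = "Character height (cm) by gender."
)

# Style modifications (assumes kableExtra is available)
if (requireNamespace("kableExtra", quietly = TRUE)) {
  tbl <- tbl %>%
    kableExtra::kable_styling(position = "center", font_size = 10) %>%
    kableExtra::add_header_above(c(" " = 2, "Height (cm)" = 2))  # grouped header row
}

tbl
```

Here the grouped header, the font size, and the centered position would count as three style modifications. In a bookdown-style document, a chunk named, say, height-table could be referenced in the text as `\@ref(tab:height-table)`.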
Tuesday, February 13 2024 (7.1)
Pre-class preparation
- (15) Imagine (at least) two analyses you
would like to run with your data, including 1+ descriptive analysis and
1+ hypothesis testing analysis. Add comment blocks into your .Rmd
describing (in plain English, not code) the analyses you intend to run,
for example listing the independent and dependent variables, explaining
your hypothesized results, and identifying precisely which tests are
appropriate.
- This is a good place to check in with your co-authors or advisor
before moving forward!
- REMINDER: You must meet with
Dr. Dowling by the end of this week (i.e., before end of day Friday,
February 16th) to negotiate your plan for the scientific report
assignment, which is due on Tuesday of Finals Week.
- Don’t delay! More than 25 of these meetings need to happen; don’t
count on available meeting times at the end of 7th week. You can sign up
for office hours any time in weeks 5-7 to make your report plan. Extra
slots will be added to my office hours sign-up schedule for these weeks.
Look out for a Canvas announcement sometime in week 3 or 4 for when
those times are posted.
- Failing to meet with me for a report meeting
will result in an automatic 10 points off your final report
grade.
Class meeting
No slides this week! Demo only.
- Demo: Data Analysis Part 1
- Refer to
stats-demo.Rmd
in the d2m-2024
repo
- Assignment: (16) Add 2 analysis chunks -
1 descriptive, 1 hypothesis testing
- And complete Assignment (15) in the pre-class prep if you didn’t
before class!
Imagine (at least) two analyses you would like to run with your data,
including 1+ descriptive analysis and 1+ hypothesis testing analysis.
Add comment blocks into your .Rmd describing (in plain English, not code)
the analyses you intend to run, for example listing the independent and
dependent variables, explaining your hypothesized results, and
identifying precisely which tests are appropriate.
- Based on your plans from Assignment (15) (go complete that if you
haven’t already!) fill in your planning chunks with basic analyses
- One chunk should include descriptive statistics and another should
include a hypothesis test
- More than one of either or both works too! Just make sure you have
at least one of each
- For anything we covered in class today (descriptives and basic
hypothesis testing), create as complete and polished code as
possible
- For anything more advanced, explore the psych, lme4, and stats
packages and give it your best shot based on the info you can find.
You’ll clean it up after Thursday’s class.
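For a sense of scale, one descriptive chunk and one hypothesis-testing chunk can each be just a few lines. The sketch below uses the built-in mtcars data and a Welch t-test purely as stand-ins; your own variables and the appropriate test will differ.

```r
library(dplyr)

# Descriptive chunk: group sizes, means, and SDs of mpg by transmission type
# (am is 0 for automatic, 1 for manual in mtcars)
mpg_summary <- mtcars %>%
  group_by(am) %>%
  summarise(n = n(), mean_mpg = mean(mpg), sd_mpg = sd(mpg))

mpg_summary

# Hypothesis-testing chunk: does mpg differ by transmission type?
# t.test() runs Welch's unequal-variance test by default.
mpg_ttest <- t.test(mpg ~ am, data = mtcars)

mpg_ttest
```

Saving the results as named objects, rather than only printing them, is what lets you pull individual values into your text later.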
Thursday, February 15 2024 (7.2)
Class meeting
No slides this week! Demo only.
- Demo: Data Analysis Part 2
- Assignment: (17) Create a partial draft
of methods/results with 5 in-line code references
- Complete the code for the two analysis chunks you have been
working on in Assignments (15) and (16)
- In the narrative of your R Markdown manuscript, include at least
5 in-line code references
- By “in-line code” I mean using R Markdown to interpret some text
outside of a code chunk as R code, so that it runs as code and the
resulting output is displayed in-line within your regular old text
- Look at the stats-demo.Rmd file in the d2m-2024 repo for lots of
examples! Anything that starts with r is what I mean here.
- Remember that just using backticks without the r is just formatting,
not code. (Everything on this page that looks like code in monospaced
font is done that way.)
- At least 1 of these in-line code refs should “do something.” In
other words, it should be more than just a value reference; it should
include a function.
- Examples of value calls:
`r model$p.value`
`r median_mpg`
`r mass_height_corr3$ci2[["r"]]`
- Examples of in-line function code:
`r apa_p(model$p.value)`
`r nrow(filter(mtcars, mpg > median(mpg)))`
`r round(mass_height_corr3$ci2[["r"]], 3)`
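In-line references only work if the objects they name already exist, so they depend on a chunk earlier in the document. The sketch below sets up objects like those in the examples above using the built-in mtcars data; the correlation test standing in for `model` is an assumption made for illustration, not the model from the class demo.

```r
library(dplyr)

# Objects created in a chunk earlier in the .Rmd
median_mpg <- median(mtcars$mpg)
model <- cor.test(mtcars$mpg, mtcars$wt)   # stand-in for whatever "model" is in your report

# What the in-line references would evaluate when the document is knit:
# value call:      `r median_mpg`
# function call:   `r nrow(filter(mtcars, mpg > median(mpg)))`
# function call:   `r round(model$p.value, 3)`
n_above_median <- nrow(filter(mtcars, mpg > median(mpg)))
rounded_p <- round(model$p.value, 3)
```

In the narrative this might read: "The median fuel efficiency was `r median_mpg` mpg, and `r nrow(filter(mtcars, mpg > median(mpg)))` cars exceeded it." If the data changes, the sentence updates itself on the next knit.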
Module 4: Creating a Manuscript
Tuesday, February 20 2024 (8.1)
Pre-class preparation
- Review the readings & references box below to get a head start
installing Zotero, Better BibTeX, and the citr package.
- Pull the most recent version of the d2m-2024 repo and browse the
bibtex-demo directory.
Class meeting
slides
- Lecture: BibTeX References, citr, & Zotero
- Workshop: Set up your ideal citation environment and start
citing
- Assignment: (18) Create a .bib file with
5+ references
- Finish downloading and setting up Zotero, Better BibTeX, and the
citr package add-on
- Create a .bib file
- Since you’ve been working in a papaja document all quarter, it’s
likely a .bib file was automatically generated or that you made a blank
one so that your .Rmd would knit properly. If so, you can continue to
use that one or create a new one, just make sure it is named
appropriately and referenced correctly in the .Rmd where needed
- Use the papaja default (r-references.bib) or give it some other
sensible name
- Add 5+ relevant references for your manuscript
- Try to do different entry types if you can (journal article, book,
book chapter, conference paper, etc.)
- Already wrote a lit review for your project/proposal?
Great! You don’t need to rewrite it. You can copy and paste your
existing work into your R Markdown notebook, edit formatting as needed,
and then swap out the in-text citations you wrote out manually for
dynamic bibtex references
- Knit your .Rmd to .pdf and confirm that the citations render as
expected
- Error messages? Add/edit/check the name of your .bib file in the
YAML header
- Unexpected formatting?
- Check your .bib file to see if the problem is in that file (in which
case, correct it)
- If not, debug by pinpointing the problem and following the
troubleshooting guidelines from W2
Thursday, February 22 2024 (8.2)
Pre-class preparation
Come to class with all your BibTeX tools from last class installed
and fully functional! Troubleshoot any knitting issues with Assignment
(18) before the beginning of class.
Class meeting
slides
- same slides as last class
- Lecture: BibTeX References, citr, & Zotero (continued)
- Assignment: (19) Add 10+ in-text
references to your manuscript in at least 3 different
formats
Building on Assignment (18), continue adding BibTeX references to
your literature review (or anywhere in your narrative text). Your repo
should now include (among other things):
- A .bib file with at least 10 BibTeX entries
- With at least 3 different source types (e.g., article, book,
conference paper, policy memo, etc.)
- A papaja .Rmd with at least 10 in-text bibtex references
- With at least 3 different formats (e.g., adding extra text, using
year only, including chapters, etc.)
- You can rewrite or remove unnecessary extra formats later for your
final project, but use them in this assignment for the sake of
practice
- A .pdf file created when you knit your papaja notebook rendering all
bibtex references error free
- Everything is correctly APA formatted both in the text (e.g.,
alphabetical order, no random text, last names only, etc.) and the
entries in the references section
- There are no formatting or input errors in the bibtex entries (e.g.,
broken special characters, incorrect capitalization, missing or
misattributed fields, etc.)
Tuesday, February 27 2024 (9.1)
Class meeting
slides
- Housekeeping: Final reports, self-evaluation, last class
planning
- Lecture: Putting it Together
- Assignment:
- (20) Clean up your repo & revise
your README as needed
- Revise your .Rmd (if needed) to include required elements
- Clean up your repo
- Revise your README to reflect the current state of the project:
- Narrative description of project purpose, may include project
abstract
- Representation of current directory structure
- Remove or archive old files (add to .gitignore as needed)
- Review and update your .gitignore
- Revise your .Rmd so that it knits to PDF without errors and minimally
includes:
- 1 table
- 1 figure
- 1 extracted value reference
- 1 in-line function (or math)
- 10 bibtex citations
Thursday, February 29 2024 (9.2)
Class meeting
- Lecture: Polishing & Publishing
- Assignment: Final reports due next
week!
Final report & participation self-assessment
Tuesday, March 5 2024
Scientific report due. Push all materials
to GitHub AND submit a .pdf (.doc acceptable if you
absolutely insist) of your knit report to Canvas.
Participation self-assessments due,
submitted via Canvas.
Course policies
Accessibility and Accommodation
I am committed to making this course accessible to students of all
backgrounds, identities, and abilities. If there are circumstances that
make aspects of this course difficult for you to access, please contact
me so we can discuss how to accommodate your needs. This includes, but
is not limited to, accommodations around the format of course materials,
the use of Canvas and other digital resources, the classroom and other
physical resources, and the structure of assignments.
I will work with you to create an accessible learning environment
whether or not you disclose your disability or personal circumstances.
If you choose to disclose personal information to me, I will keep
those discussions confidential. For certain accommodations you may need
to contact Student Disability Services at (773) 702-6000 or
disabilities@uchicago.edu. If you have a documented disability (or think
you may have a disability) and, as a result, need a reasonable
accommodation to participate in class, complete course requirements, or
benefit from the University’s programs or services, I encourage you to
contact Student Disability Services.
Diversity, Inclusion, and Community
We will commit as a class to creating a welcoming, respectful, and
productive classroom. We will expect each other to be mutually
respectful of our meaningful identities. I expect that when we engage
with each other in discussion we are considerate of the diversity of our
classroom with regard to gender, sexuality, disability, race,
ethnicity, religion, socioeconomic status, immigration status, and
language. It is critical that we maintain respectful dialogue in the
classroom, which includes using correct names and pronouns. If a member
of our community – including myself – is creating an unwelcome space for
you, I hope you will bring this to my attention immediately.