GLHLTH 590, Duke University, Spring 2021

This course will introduce you to data science and data visualization in R. The core content of the course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, inference, modelling, and effective communication of results. My goal is to bring you from zero to being able to work in a team on a fully reproducible data science project analyzing a dataset of your choice and answering questions you care about.

Teaching Team

Eric Green, PhD

Sid Zadey, TA


The course content is organized into five units:

Unit 1 - Hello world: This unit will introduce you to the content, pedagogy, and toolkit of the course.

Unit 2 - Exploring data: This unit will focus on data visualization and data wrangling. Specifically, we’ll cover fundamentals of data and data visualization, confounding variables, and Simpson’s paradox, as well as the concept of tidy data, data import, data cleaning, and data curation. We’ll end the unit with web scraping and introduce the idea of iteration in preparation for the next unit. Also in this unit, you’ll get an introduction to the toolkit: R, RStudio, R Markdown, Git, and GitHub.

Unit 3 - Data science ethics: In this unit, we’ll discuss misrepresentation of findings (particularly in data visualizations), breaches of data privacy, and algorithmic bias.

Unit 4 - Making rigorous conclusions: In this unit, you’ll learn about modelling and statistical inference for making data-based conclusions. We’ll discuss building, interpreting, and selecting models, visualizing interaction effects, and prediction and model validation.

Unit 5 - Looking forward: In the last unit, we’ll take a sneak peek at a few advanced topics to leave you eager to continue your learning.


Week  Date          Unit                         Topic
1     Jan 21        Hello world                  Welcome
2     Jan 26/28     Hello world                  Meet the toolkit
3     Feb 2/4       Exploring data               Visualizing data
4     Feb 9/11      Exploring data               Wrangling and tidying data
5     Feb 16/18     Exploring data               Importing and recoding data
6     Feb 23/25     Exploring data               Communicating data science results effectively
7     Mar 2/4       Exploring data               Web scraping and programming
-     Mar 9/11      -                            No class/catch up
8     Mar 16/18     Data science ethics          Data science ethics
9     Mar 23/25     Making rigorous conclusions  Modelling data
10    Mar 30/Apr 1  Making rigorous conclusions  Classification and model building
11    Apr 6/8       Making rigorous conclusions  Model validation and uncertainty quantification
12    Apr 13/15     Looking further              Chef’s choice
13    Apr 20/22     Looking further              Wrap up

Links will be posted the Saturday before each week begins.


There are no prerequisites to take this course aside from a good dose of curiosity and interest in learning about R, data visualization, and data science. All you need to participate is a computer with an internet connection. We’ll use RStudio Server to run R in your browser, Zoom to connect each week, and Campuswire to communicate and share ideas (and struggles).

To log in to R, please go to, authenticate with your Duke NetID, and click to open your RStudio environment. You can (and should) also download and install R and RStudio Desktop on your computer, but please use the browser version for coursework.


You’ll prepare for each class session by watching recorded video lectures and coding alongside. During class, we’ll use our time together to review the lessons and complete application exercises and labs. You’ll then complete one homework assignment each week that will be due Sunday at 10pm ET. To tie it all together, you’ll work alone or with a partner to complete a final project with a dataset of your choosing. There are no exams (including no final exam).


You will be evaluated on the basis of your weekly assignments (50%) and independent visualization project (50%). Ranges for letter grades will be set at the end of the semester. Cumulative scores of at least 90, 80, and 70 will be guaranteed at least an A-, B-, and C-, respectively.


Equity and Inclusion at DGHI

On June 10, 2020, DGHI formed the Equity Task Force (ETF) to identify and address structural inequities related to global power dynamics, race, ethnicity, gender, and all marginalized identities throughout the institute. While it is expected that these goals will be met with gradual progress, the urgency for change and the need to learn from past mistakes must be acted upon immediately.

As our institute begins the process of making these changes, I will attempt to do the same throughout my courses. We as a class will honor all requests to address students by the pronouns or names with which they are most comfortable, and I welcome your input on things I can do to make the course materials, my lessons, and the classroom experience more inclusive. I invite you to keep an open line of communication with me throughout this process about any areas where you think I am or the institute is excelling or could use some improvement. If you don’t feel comfortable reaching out to me about areas of concern, please contact the ETF co-chairs (Kate Whetten, Kim McNeil) or the relevant steering committee members.

If you require any additional accommodations or arrangements (hearing, vision, English language comprehension, extenuating personal or family circumstances, etc.) please let me know ASAP and, if relevant, please contact the Student Disability Access Office to ensure the accommodation/arrangements can be implemented in a timely fashion.