DA 101 - Introduction to Data Analytics

Fall 2022

Your Professor:

Matt Lavin

My Email:

lavinm@denison.edu

My Office:

Burton D. Morgan Center 411

Office Hours

10 to 11:20 a.m. MW by appointment; drop-ins 2:30 p.m. to 3:30 p.m. on Thursdays

Our Classroom:

Burton D. Morgan Center 219

When We Meet:

MWF 11:30 a.m. - 12:20 p.m.

When the Lab Meets:

Tuesdays, 1:30 p.m. - 4:20 p.m.

Course Description

Many of the most pressing problems in the world can be addressed with data. We are awash in data, and modern citizenship demands that we become literate in how to interpret data, what assumptions and processes are necessary to analyze data, as well as how we might participate in generating our own analyses and presentations of data. Consequently, data analytics is an emerging field with skills applicable to a wide variety of disciplines. This course introduces analysis, computation, and presentation concerns through the investigation of data-driven puzzles in a wide array of fields – political, economic, historical, social, biological, and others. No previous experience is required.

By the end of the course, you should be able to:

  • Identify, describe, and use different formats of data and data sources in class discussion and during lab projects
  • Collect, clean, store, and extract data needed for an analysis during lab projects
  • Write basic computer programs using RStudio for a reproducible data analysis workflow
  • Create data visualizations and extract and interpret meaning from the visual information
  • Perform statistical analysis on a dataset, and interpret the results
  • Reflect on and evaluate ethical, social, and legal issues in data collection, analysis, and security in discussion and in class projects using real datasets
  • Communicate and interpret all aspects of data analysis (data, cleaning, analysis, results) to a diverse, technical or non-technical audience, in oral, visual, and written format
  • Synthesize the above skills to create & present a new, independent data analysis project

Office Hours

This semester, I will be using a mix of drop-in office hours and in-person appointments via Google Calendar. For office hours by appointment, visit my appointment page, where you will see a real-time account of when I am available. My standard appointment slots will be divided into 20-minute blocks from 10 to 11:20 a.m. on Mondays and Wednesdays. Note that these appointment slots will disappear from my calendar once I've been booked. Please book appointments at least 24 hours in advance. If I ever need to cancel by-appointment office hours on a given day (say, for example, if I'm ill), I will update the calendar and email anyone with an appointment.

Drop-in office hours will be held in my office from 2:30 to 3:30 p.m. on Thursdays. For these, you will not need an appointment, but I will see students in the order they arrive, so there is no guarantee that I will have time for everyone on a given day. If your question is time sensitive, you should make an appointment. If I ever need to cancel office hours on a given drop-in day (say, for example, if I'm ill), I will e-mail the entire class.


Additional Norms and Policies


Here you will find information on required readings, important university policies, and course-specific policies like attendance and cell phone use.

Required Texts

R for Data Science (Wickham and Grolemund), ISBN-13: 978-1491910399
Free online (http://r4ds.had.co.nz) or order print edition online by matching the ISBN
Additional selected readings will be made available as html or pdf, and linked to the course website or shared via Canvas

Software

All projects in this course will be scripted and analyzed using R, an open source data analysis language and environment. No previous experience with R, statistical software packages, or computer programming is required. Specifically, we will be using RStudio as our programming environment. Instructions for installing R and RStudio will be posted to the calendar below. Note: The company behind RStudio is being renamed Posit in October; the RStudio software itself keeps its name.

Grading and Feedback

Since there are multiple sections of DA 101 every semester, the various instructors work hard to make sure there is approximate parity in terms of content, workload, and expectations. However, it is also true that each professor has their strengths, areas of interest, and priorities, and I'm sure I'm no exception. We'll have to spend some time together for you to get a real sense of what I value, how I grade, etc., but I look forward to that process, and to getting to know you all better more generally. One of the big advantages of a school like Denison is that, if you want to work with me again, you'll probably be able to, whether in another data analytics course, a summer research fellowship, or some other capacity. 

As a general rule, the expectations in this course are high, and I'm confident you can all do great work. The feedback I provide on assignments is designed to help you get there. My goal is to provide specific, relevant, and honest feedback when I grade your work. This will include constructive criticism, strategies for improvement, and guidance on how students can achieve success. I will not do "compliment sandwiches" just to begin and end on a positive remark, but this means that, when I praise your work, it's an honest (and I think more meaningful) act of praise. 

Regarding the major assignment rubric, it is adapted from the standards that the data analytics program uses for all its majors. I don't expect your work to meet the same standards as a graduating senior, but I think using the same categories on our rubric will give you a better idea of what it might mean to major in data analytics, as many of you are hoping to do. If you don't want to be a data analytics major, these criteria are still highly relevant to almost any program of study. 

Item Description
Assignment Process: All materials are turned in on time and in the right place. Assignment directions are followed. Required components are all present and submitted on time.
Attention to Detail: The project is well organized, flows logically, and follows all the formatting guidelines, including attention to proofreading, proper citations, and language that is appropriate to a well-informed, non-technical reader.
Research Question and Research Design: The project has a focused and well defined research question that can be addressed with computational, data-driven analysis. The focal data set and method(s) are appropriate for the research question.
Data, Visuals, and Code: The data are fully described, properly sourced, and presented in appropriate ways. Visuals (tables, charts, graphs) are used effectively to describe multiple aspects of the research project (data, methods, or results). The paper provides sufficient details and/or points to supplementary materials that make the research reproducible by a technical reader (e.g., detailed footnotes, appendices, GitHub, code).
Data Analysis Methods: The method(s) used to test the research question are justified, validated, and applied appropriately; the student appropriately describes the strengths and weaknesses of the methods used; outside sources are used to justify how the methods are used and interpreted.
Reporting and Interpretation of Results: The results are interpreted correctly and clearly address the research question; the project discusses its limitations, the extent to which it can be generalized, and expansion to further research.
Ethical Considerations: The writing thoughtfully engages any ethical considerations of using the data, methods, and implications of communicating the findings.

Grade Breakdown

Item (Percentage of Grade): Comments
Attendance and Participation (15%): Attendance will be taken every day. Late arrival counts as half an absence. Participation will be assessed using a mix of preparedness, speaking during class discussions, remaining attentive during lectures, and completing in-class assignments.
Oral Presentation (5%): Individual assignment
Quizzes (10%): Individual assignments
Data and Code Ledger (10%): Individual assignments
Lab Work and Lab Reports (40%): Individual and team-based assignments
Final Project (20%): Individual assignment (see assignment description)
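
As an illustration of how these percentages combine into a course grade, here is a minimal sketch in Python (the course itself uses R, but the arithmetic is the same). The weights come from the breakdown above. The function name and the component scores are hypothetical, chosen only for the example, and the sketch assumes every component is scored on a 0-100 scale.

```python
# Weights from the grade breakdown (percent of the course grade).
WEIGHTS = {
    "Attendance and Participation": 15,
    "Oral Presentation": 5,
    "Quizzes": 10,
    "Data and Code Ledger": 10,
    "Lab Work and Lab Reports": 40,
    "Final Project": 20,
}


def course_grade(scores):
    """Return the weighted average of component scores (each 0-100).

    Hypothetical helper for illustration; not part of any course tooling.
    """
    # The weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / 100


# Hypothetical component scores for one student.
example_scores = {
    "Attendance and Participation": 90,
    "Oral Presentation": 100,
    "Quizzes": 80,
    "Data and Code Ledger": 85,
    "Lab Work and Lab Reports": 88,
    "Final Project": 92,
}

print(course_grade(example_scores))
```

Because lab work carries 40% of the weight, a change of a few points in the lab average moves the course grade more than the same change in any other component.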

The Quantitative GE Requirement

The goal of the quantitative reasoning requirement is to develop the skills of all students in the descriptive, analytical, and predictive aspects of quantitative reasoning. A course fulfilling this requirement must utilize numerical quantities and employ, as an integral and sustained part of the course, at least one of the following forms of quantitative reasoning.

  1. the application of mathematical models to describe or predict the behavior of systems, and the design, construction, and interpretation of graphical representations of mathematical models.
  2. the utilization, numerical analysis, and interpretation of the significance and limitations of data to answer questions, test hypotheses, or solve problems, and the design, construction, and interpretation of graphical representations of numerical data.

Late Work

The pace of this class is fast, especially in the first five weeks of the semester when we are focusing on core proficiencies. If you have a legitimate emergency such as a serious illness, a mental health emergency, or a death in the family, I will grant an appropriate extension with a new due date. The trade-off is that work turned in this way is probably not going to end up in my hands when I grade everything else, so it's going to get very sparse feedback. If you miss a deadline entirely without getting an extension, you will automatically lose 10 points off the top of your grade for each day it is late, in addition to any points you lose for the quality of the work. Retroactive and last-minute extensions will not be granted.
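
The late penalty described above can be sketched as simple arithmetic. The following Python snippet is only an illustration under stated assumptions: the function name is hypothetical, lateness is counted in whole days, the assignment is scored out of 100, and the result is floored at zero (the policy does not say whether a grade can go below zero).

```python
def late_adjusted_score(quality_score, days_late):
    """Apply a 10-point-per-day late penalty to a score out of 100.

    Hypothetical helper: whole-day counting and the floor at zero
    are assumptions, not stated course policy.
    """
    penalty = 10 * days_late
    return max(0, quality_score - penalty)


# Work that would have earned 85 on quality, turned in two days late.
print(late_adjusted_score(85, 2))
```

Under these assumptions, work that would have earned an 85 receives a 65 when it is two days late.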

Distractions

Cell phones should be off and put away. Laptops are okay for notes and such but, during Friday discussions, I will ask you to put them away, as you won't need to take notes on those days. When laptops are being used, you should not be messaging, using Facebook, etc. I will check screens regularly and give you a verbal warning on your first offense. After that, I reserve the right to ask you to leave class and mark you absent if you are creating a distraction.

Being Prepared for Class

Coming to class prepared means that you have the day's reading in hand (printed or digital) and have come to class with a way to take notes (printed or digital). If you are not prepared for class, I reserve the right to grade as if you were absent for that day. Anything due on a given day is due at the start of class. Any digital submission of material is due by the time class starts on the day the hard copy is due.

Disability Resources

If you are a student who feels you may need an accommodation based on the impact of a disability, you should contact me privately as soon as possible to discuss your specific needs. I rely on the Academic Resource Center in 020 Higley Hall to verify the need for reasonable accommodations based on documentation on file in that office.

Academic Integrity

Proposed and developed by Denison students, passed unanimously by DCGA and Denison’s faculty, the Code of Academic Integrity requires that instructors notify the Associate Provost of cases of academic dishonesty. Cases are typically heard by the Academic Integrity Board, which determines whether a violation has occurred, and, if so, its severity and the sanctions. In some circumstances the case may be handled through an Administrative Resolution Procedure. Further, the code makes students responsible for promoting a culture of integrity on campus and acting in instances in which integrity is violated.

Academic honesty, the cornerstone of teaching and learning, lays the foundation for lifelong integrity. Academic dishonesty is intellectual theft. It includes but is not limited to providing or receiving assistance in a manner not authorized by the instructor in the creation of work to be submitted for evaluation. This standard applies to all work ranging from daily homework assignments to major exams. Students must clearly cite any sources consulted--not merely for quoted phrases, but also for ideas and information that are not common knowledge. Neither ignorance nor carelessness is an acceptable defense in cases of plagiarism. It is the student’s responsibility to follow the appropriate format for citations. Students should ask their instructors for assistance in determining what sorts of materials and assistance are appropriate for assignments and for guidance in citing such materials clearly.

Our Commitment to Liberal Arts Education

Denison's mission statement articulates an explicit commitment to liberal arts education. It emphasizes active learning, which defines students as active participants in the learning process, not passive recipients. Denison seeks to foster self-determination and to demonstrate the transformative power of education. A crucial aspect of this approach is what Denison's mission statement refers to as "a concern for the whole person," which is why the university provides a "living-learning environment" based on individual needs and an overriding concern for community. This community is based on "a firm belief in human dignity and compassion unlimited by cultural, racial, sexual, religious or economic barriers, and directed toward an engagement with the central issues of our time."

In this class, we will discuss inequality directly. In many cases, you will be asked to apply quantitative reasoning skills to this subject, which can be difficult because there is always the potential for the available data to complicate or contradict something you may feel very passionate about. In these cases, you should aspire to adopt an attitude of critical skepticism, i.e., wary of claims that are not supported by evidence but potentially willing to be persuaded by evidence if you find it compelling, and willing to give that evidence a fair hearing.

How we treat one another will be a cornerstone of these conversations. Denison's "Guiding Principles" speak of "a community in which individuals respect one another and their environment." Further, "each member of the community possesses a full range of rights and responsibilities. Foremost among these is a commitment to treat each other and the environment with mutual respect, tolerance, and civility." It's easy to treat someone this way when you like them and agree with their ideas, but the real challenge is treating those who differ from us with the same compassion and respect. However, I consider disruptive, deceitful, or hateful behavior to be breaches of these responsibilities. Bullying, trolling, hate speech, and harassment of any kind will not be tolerated.

Teaching Assistants

Our TA for this section of DA 101 will be announced shortly. The TA will be available by appointment to help answer questions about the course or particular assignments. The TA's office hours are also TBA and will be posted on Canvas, followed by office hours for all other DA 101 TAs. While each TA is scheduled to work with a particular faculty instructor and attend their lab sections, our program encourages students to visit any TA during scheduled office hours, especially if that TA's availability better fits your schedule. In general, you should meet with a TA before coming to my office hours with a code-related or assignment-related question, as the TA should be able to answer most of these.

Discrimination, Sexual Misconduct, and Sexual Assault

Essays, journals, and other coursework submitted for this class are generally considered confidential pursuant to the University’s student record policies. However, students should be aware that University employees are required by University policy to report allegations of discrimination based on sex, gender, gender identity, gender expression, sexual orientation or pregnancy to the Title IX Coordinator or a Deputy Title IX Coordinator. This includes reporting all incidents of sexual misconduct, sexual assault and suspected abuse/neglect of a minor. Further, employees are to report these incidents that occur on campus and/or that involve students at Denison University whenever the employee becomes aware of a possible incident in the course of their employment, including via coursework or advising conversations. There are others on campus to whom you may speak in confidence, including clergy and medical staff and counselors at the Wellness Center. More information on Title IX and the University’s Policy prohibiting sex discrimination, including sexual harassment, sexual misconduct, stalking and retaliation, including support resources, how to report, and prevention and education efforts, can be found at: https://denison.edu/campus/title-ix.


Assignments

Oral Presentation (5% of grade)

Each of you will be responsible for giving a PechaKucha presentation this semester. A PechaKucha is a type of lightning talk where a presenter shows 20 slides with 20 seconds of commentary each (6 minutes and 40 seconds total). For your topic and date, I will provide a sign-up sheet with a list of topics. Each topic will be designed to connect to the material we will be discussing that week in class. Before the day you are giving your presentation, you will be asked to share a copy of your slides. Google Slides, PowerPoint, or PDF formats are acceptable. Include your notes for each slide, either in the slide notes section or in a separate document. These presentations are meant to be a low-pressure assignment, so I will grade on a check, check-plus, check-minus scale. I reserve the right to give harsher grades to students who simply don't do the required work or seem to be blowing off the assignment.

Quizzes (10% of grade)

This course has intermittent quizzes on material from readings and lectures. Quizzes are designed to measure how well you are integrating the material. They will generally consist of 10-20 multiple choice, fill-in-the-blank, and short answer style questions. Quizzes will be taken in class without access to books or notes. Between the "Weekly Rhythm" table below (which states that all quizzes will be held on Wednesdays) and the Weekly Calendar (which notes the weeks with quizzes) you should never be surprised by a quiz date.

Data and Code Ledger (10% of grade)

The purpose of the Data and Code Ledger is to take notes and examples in your own words to consolidate the gains you have made in learning how to code in R for data analysis. The ledger should review and demonstrate mastery of the range of commands, visualization techniques, and analytical methods you have learned this semester. The Data and Code Ledger is a living document that each of you will assemble over the course of the semester. You will turn it in on two occasions in lieu of a lab report for that week. The first submission will involve completing about 50% of the assigned material. The second submission will involve correcting issues raised during the first submission and completing the rest of the assignment. Grading will be weighted so that the second, cumulative submission counts more than the first. I will distribute the assignment template via GitHub Classroom.

Lab Work and Lab Reports (40% of grade)

This portion of your grade consists of participating in the lab and completing all lab assignments. Assignment descriptions, datasets, and starter code will be made available through GitHub and GitHub Classroom. Labs will be a mix of individual and team-based assignments. In general, lab assignments will be due one week from the day they are assigned, by the start of our next lab. The lab schedule has more details about our weekly topics and which weeks have lab reports.

Final Project (20% of course grade)

The purpose of the final project is to combine the core skills you have gained in the class in an application of the data cycle, producing a short, polished report on a question of your choosing, ideally something that you’re passionate about or is relevant to your life or interests (some suggestions are offered toward the end). This final project will incorporate all of the steps involved in the data cycle as we have studied it. You will be in charge of stating an interesting question that doesn’t duplicate previous projects/labs, conducting exploratory analyses using skills from the entire semester, building a model that you will interpret, then communicating your key findings in a polished, professional narrative. There are two assignments related to the final project: a Final Project Scaffolding Assignment, and a Final Project Rough Draft due Tuesday, December 6. The final project is due at the start of our scheduled exam block (2:00 p.m. Thursday, December 15, 2022). Note: Denison policy does not permit me to give extensions on this assignment, so any late submission will receive a 0 grade.


Lab Schedule

Lab 1: Tuesday, August 30, 2022

Introduction to R, RStudio, Executing .r files, The File System, and File Paths

Due Next Week: Individual Submission

Lab 2: Tuesday, September 6, 2022

(CANCELED)

Due Next Week: None

Lab 3: Tuesday, September 13, 2022

Invasive Species Part 1

Due Next Week: Individual Submission

Lab 4: Tuesday, September 20, 2022

Book Reviews as Data (+Activity on Collecting Data)

Due Next Week: Turn in Data and Code Ledgers (Individual Submission)

Lab 5: Tuesday, September 27, 2022

Invasive Species Part 2

Due Next Week: Team Lab Report

Lab 6: Tuesday, October 4, 2022

Substance Use

Due Next Week: Lab Report

Lab 7: Tuesday, October 11, 2022

Audio Books

Due Next Week: Nothing due. Turn in lab assignment October 25.

Lab 8: Tuesday, October 18, 2022

No Lab This Week. Enjoy Your Day Off!

Due Next Week: Team Lab Report

Lab 9: Tuesday, October 25, 2022

Political Polarization Part 1

Due Next Week: Work on polarization lab

Lab 10: Tuesday, November 1, 2022

Political Polarization Part 2

Due Next Week: Team Lab Report

Lab 11: Tuesday, November 8, 2022

Examples of Reproducible Code

Due Next Week: Turn in Data and Code Ledgers

Lab 12: Tuesday, November 15, 2022

Authorship Attribution

Due Next Week: Team Lab Report

Lab 13: Tuesday, November 29, 2022

Final Projects Sprint

Due Next Week: Rough drafts of final projects

Lab 14: Tuesday, December 6, 2022

General Social Survey Data

Due Next Week: Complete IRB training

Lab 15: Tuesday, December 13, 2022

Refactoring Workshop


Weekly Calendar

Weekly Rhythm

Monday: Student presentations + lecture, coding practice, etc.
Tuesday: Lab day. Turn in previous lab assignment by start of class.
Wednesday: Quiz day + lecture, coding practice, etc. Complete weekly reference reading by start of class.
Friday: Discussion day. Complete weekly discussion reading and write notes on discussion topics by start of class.

Week 1: Introducing Data Analytics

Monday, August 29, 2022

In Class: Student Introductions

Homework: Sign up for GitHub, Complete Course Survey

Tuesday, August 30, 2022

First lab day: See lab schedule above

Wednesday, August 31, 2022

In Class: Discuss survey results; discuss "the promise of data"

Homework: Read "The Dream of Prediction" from Everything is Obvious* (once you know the answer) (pdf on Canvas)

Friday, September 2, 2022

In Class: Discuss "The Dream of Prediction"

Homework: Complete the first lab by Tuesday

*Note: For all subsequent weeks, this calendar does not reflect daily due dates. Use the "Weekly Rhythm" table to align the week's material with day-to-day expectations.

Week 2: Data, Metadata and Quantification
(Monday, September 05, 2022 - Friday, September 09, 2022)

This Week's Reference Reading: Read R for Data Science "1. Introduction" (to the book); "2. Introduction" (to the "Explore" section); and "4. Workflow: Basics" (https://r4ds.had.co.nz/)

This Week's Discussion Reading: Cathy O'Neil, "Arms Race" (from Weapons of Math Destruction)

Week 3: Working with Data
(Monday, September 12, 2022 - Friday, September 16, 2022)

This Week's Reference Reading: Read R for Data Science "5. Data Transformation" (https://r4ds.had.co.nz/)

This Week's Discussion Reading: Richard Jean So, "All Models Are Wrong"

Notes and Reminders: Written response on O'Neil due Monday (counts as quiz grade)

Week 4: Data Visualization
(Monday, September 19, 2022 - Friday, September 23, 2022)

This Week's Reference Reading: Read R for Data Science "3. Data Visualization" (https://r4ds.had.co.nz/)

This Week's Discussion Reading: Alberto Cairo, "The Five Qualities of Great Visualizations" (from The Truthful Art)

Week 5: Descriptive Statistics 1
(Monday, September 26, 2022 - Friday, September 30, 2022)

This Week's Reference Reading: Read R for Data Science "7. Exploratory Data Analysis" (https://r4ds.had.co.nz/)

This Week's Discussion Reading: Dubner and Levitt, "Introduction: The Hidden Side of Everything" (from Freakonomics)

Notes and Reminders: Quiz on Wednesday

Week 6: Descriptive Statistics 2
(Monday, October 03, 2022 - Friday, October 07, 2022)

This Week's Reference Reading: No reference reading this week

This Week's Discussion Reading: No discussion reading this week

Week 7: Statistical Significance
(Monday, October 10, 2022 - Friday, October 14, 2022)

This Week's Reference Reading: No reference reading this week

This Week's Discussion Reading: Kahneman, "The Illusion of Understanding," "The Illusion of Validity," and "Intuitions vs. Formulas" (from Thinking, Fast and Slow)

Notes and Reminders: Quiz on Wednesday

Week 8: Bayesian Methods
(Monday, October 17, 2022 - Friday, October 21, 2022)

This Week's Reference Reading: No reference reading this week

This Week's Discussion Reading: Nate Silver, "Less and Less Wrong" (from The Signal and the Noise)

Notes and Reminders: No class or lab on Monday and Tuesday

Week 9: Predictive Methods 1
(Monday, October 24, 2022 - Friday, October 28, 2022)

This Week's Reference Reading: No reference reading this week

This Week's Discussion Reading: No discussion reading this week

Notes and Reminders: Quiz on Wednesday

Week 10: Predictive Methods 2
(Monday, October 31, 2022 - Friday, November 04, 2022)

This Week's Reference Reading: No reference reading this week

This Week's Discussion Reading: Gebru et al., "Datasheets for Datasets"

Notes and Reminders: Final Project Scaffolding Assignment in Class Wednesday

Week 11: Case studies 1: computational biology
(Monday, November 07, 2022 - Friday, November 11, 2022)

This Week's Reference Reading: Read William Stafford Noble "A Quick Guide to Organizing Computational Biology Projects" (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424)

This Week's Discussion Reading: D'Ignazio and Klein, "What Gets Counted Counts" (from Data Feminism)

Week 12: Case studies 2: digital humanities
(Monday, November 14, 2022 - Friday, November 18, 2022)

This Week's Reference Reading: Read José Nilo G. Binongo "Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution" (pdf on Canvas)

This Week's Discussion Reading: Andrew Piper, "There Will Be Numbers"

Notes and Reminders: Quiz on Wednesday

Week 13: Thanksgiving Break
(Monday, November 21, 2022 - Friday, November 25, 2022)

Notes and Reminders: No class

Week 14: Case studies 3: psychology
(Monday, November 28, 2022 - Friday, December 02, 2022)

This Week's Reference Reading: Harris et al., "Two Failures to Replicate High-Performance-Goal Priming Effects" (pdf on Canvas)

This Week's Discussion Reading: No discussion reading this week

Week 15: Case studies 4: political science
(Monday, December 05, 2022 - Friday, December 09, 2022)

This Week's Reference Reading: Excerpt from "Bowling Alone"

This Week's Discussion Reading: Stephens-Davidowitz, "Big Data, Big Schmata? What It Cannot Do" (from Everybody Lies)

Week 16: Farewells
(Monday, December 12, 2022 - Friday, December 16, 2022)

This Week's Reference Reading: No reference reading this week

This Week's Discussion Reading: No discussion reading this week

Notes and Reminders: Wrap-up (Tuesday's lab is our last class)