Your Professor:
Matt Lavin
My Email:
lavinm@denison.edu
My Office:
Burton D. Morgan Center 411
Office Hours:
MTWF 9:30-11 a.m.; MW 3:20-4:40 p.m.
Our Classroom:
Class in William H. Doane Library A07; Lab Component Remote on Zoom
When We Meet:
MWF, 8:00-8:50 a.m.
When the Lab Meets:
Tuesdays, 1:50-4:40 p.m.
Many of the most pressing problems in the world can be addressed with data. We are awash in data, and modern citizenship demands that we become literate in how to interpret data, what assumptions and processes are necessary to analyze data, and how we might participate in generating our own analyses and presentations of data. Consequently, data analytics is an emerging field with skills applicable to a wide variety of disciplines. This course introduces analysis, computation, and presentation concerns through the investigation of data-driven puzzles in a wide array of fields: political, economic, historical, social, biological, and others. No previous experience is required.
By the end of the course, you should be able to:
Note that I am not planning to hold in-person office hours this semester. I am using Google Calendar for virtual office hours, by appointment. If you go to my appointment page, you will see a real-time account of when I am available. My standard appointment slots are MTWF 9:30-11 a.m. and MW 3:20-4:40 p.m. Note that an appointment slot will disappear once it has been booked. If I ever need to cancel office hours on a given day (say, for example, if I'm ill), I will update the calendar and email anyone with an appointment. If I find that I have more requests for appointments than I have availability, I may convert some of these slots to virtual hangouts, where anyone would be welcome to drop in, but for the moment I will hold these as opportunities for one-on-one discussion.
If my office hours by appointment do not work for your schedule, you can also email me to request an appointment at another time. When sending me such an email (or really any email), please follow some basic conventions of formality and politeness. There's no need to construct the equivalent of a business letter, but please don't begin your message with "hey," and please take an extra moment to make sure you spelled my name correctly. I promise to show you the same courtesy. I will do my best to reply within 48 hours, barring any emergency circumstances.
Here you will find information on required readings, important university policies, and course-specific policies like attendance and cell phone use.
R for Data Science (Wickham and Grolemund), ISBN-13: 978-1491910399. Free online (http://r4ds.had.co.nz), or order the print edition by matching the ISBN.
Additional selected readings will be made available as HTML or PDF and linked to the course website.
All projects in this course will be scripted and analyzed using R, an open source data analysis language and environment. No previous experience with R, statistical software packages, or computer programming is required. Specifically, we will be using RStudio as our programming environment. Instructions for installing R and RStudio will be posted to the calendar below.
Since there are multiple sections of DA 101 every semester, the various instructors work hard to make sure there is approximate parity in terms of content, workload, and expectations. However, it is also true that each professor has their strengths, areas of interest, and priorities, and I'm sure I'm no exception. We'll have to spend some time together for you to get a real sense of what I value, how I grade, etc., but I look forward to that process, and to getting to know you all better more generally. One of the big advantages of a school like Denison is that, if you want to work with me again, you'll probably be able to, whether in another data analytics course, a summer research fellowship, or some other capacity.
As a general rule, the expectations in this course are high, and I'm confident you can all do great work. The feedback I provide on assignments is designed to help you get there. My goal is to provide specific, relevant, and honest feedback when I grade your work. This will include constructive criticism, strategies for improvement, and guidance on how you can achieve success. I will not do "compliment sandwiches" just to begin and end on a positive remark, which means that, when I praise your work, it's an honest (and I think more meaningful) act of praise.
Regarding the major assignment rubric, it is adapted from the standards that the data analytics program uses for all its majors. I don't expect your work to meet the same standards as a graduating senior, but I think using the same categories on our rubric will give you a better idea of what it might mean to major in data analytics, as many of you are hoping to do. If you don't want to be a data analytics major, these criteria are still highly relevant to almost any program of study.
Item | Description |
---|---|
Assignment Process: | All materials are turned in on time and in the right place. Assignment directions are followed. Required components are all present and submitted on time. |
Attention to Detail: | The project is well organized, flows logically, and follows all the formatting guidelines, including attention to proofreading, proper citations, and language that is appropriate to a well-informed, non-technical reader. |
Research Question and Research Design: | The project has a focused and well defined research question that can be addressed with computational, data-driven analysis. The focal data set and method(s) are appropriate for the research question. |
Data, Visuals, and Code: | The data are fully described, properly sourced, and presented in appropriate ways. Visuals (tables, charts, graphs) are used effectively to describe multiple aspects of the research project (data, methods, or results). The paper provides sufficient details and/or points to supplementary materials that make the research reproducible by a technical reader (e.g., detailed footnotes, appendices, GitHub, code, etc.). |
Data Analysis Methods: | The method(s) used to test the research question are justified, validated, and applied appropriately; the student appropriately describes the strengths and weaknesses of the methods used; outside sources are used to justify how the methods are used and interpreted. |
Reporting and Interpretation of Results: | The results are interpreted correctly and clearly address the research question; the project discusses its limitations, the extent to which it can be generalized, and expansion to further research. |
Ethical Considerations: | The writing thoughtfully engages any ethical considerations of using the data, methods, and implications of communicating the findings. |
Letter Grade | Percentage | Description |
---|---|---|
A+ | 97-100 | Superior achievement in all aspects |
A | 94-96 | Superior achievement in most areas |
A- | 90-93 | Superior achievement in at least one area |
B+ | 87-89 | Exceeds expectations in all aspects |
B | 84-86 | Exceeds expectations in most areas |
B- | 80-83 | Exceeds expectations in at least one area |
C+ | 77-79 | Meets expectations in all aspects |
C | 74-76 | Meets expectations in most areas |
C- | 70-73 | Meets expectations in at least one area |
D | 65-69 | Does the assigned task but does not meet expectations or work is not appropriate for college level |
F | 0-64 | Unexcused late work, does not do the assigned task, not complete, or quality is significantly below expectations |
Item | Percentage | Comments |
---|---|---|
Oral Presentation | 5 | Individual assignment |
Quizzes | 10 | Individual assignments |
Data and Code Ledger | 10 | Team-based assignments |
Lab Work and Lab Reports | 50 | Team-based assignments |
Final Project | 25 | Individual assignment. Four separate components (see assignment description) |
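To make the arithmetic behind the two tables above concrete, here is a minimal sketch of how the category weights could combine into a course percentage and a letter grade. It is an illustration, not the official grade calculation: it is written in Python rather than R, the function names and example scores are hypothetical, and rounding the percentage to the nearest whole number before the lookup is an assumption (the syllabus does not say how fractional scores are handled).

```python
# Weights from the grade-breakdown table (percent of the course grade).
WEIGHTS = {
    "Oral Presentation": 5,
    "Quizzes": 10,
    "Data and Code Ledger": 10,
    "Lab Work and Lab Reports": 50,
    "Final Project": 25,
}

# Lower bound of each band from the letter-grade table, highest first.
LETTER_BANDS = [
    (97, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
    (65, "D"), (0, "F"),
]


def course_percentage(scores):
    """Weighted average of category scores, each given on a 0-100 scale."""
    assert sum(WEIGHTS.values()) == 100  # the five weights cover the whole grade
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / 100


def letter_grade(percentage):
    """Return the letter whose band contains the percentage.

    ASSUMPTION: the percentage is rounded to a whole number first,
    because the table lists whole-number bands only.
    """
    rounded = round(percentage)
    for lower_bound, letter in LETTER_BANDS:
        if rounded >= lower_bound:
            return letter
    return "F"


# Hypothetical scores, invented purely for illustration.
example_scores = {
    "Oral Presentation": 90,
    "Quizzes": 85,
    "Data and Code Ledger": 100,
    "Lab Work and Lab Reports": 88,
    "Final Project": 92,
}

total = course_percentage(example_scores)
print(total, letter_grade(total))
```

Because lab work carries half the weight, a change of one point in the lab category moves the course percentage by half a point, ten times the effect of one point on the oral presentation.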
The goal of the quantitative reasoning requirement is to develop the skills of all students in the descriptive, analytical, and predictive aspects of quantitative reasoning. A course fulfilling this requirement must utilize numerical quantities and employ, as an integral and sustained part of the course, at least one of the following forms of quantitative reasoning.
Retroactive and last-minute extensions will not be granted. At the same time, life happens. Sometimes something just isn’t going to get done. If you speak to me at least a week ahead of time and I approve an extension, I will consider assigning a new due date and hold you to it. The trade-off is that work turned in this way is probably not going to end up in my hands when I grade everything else, so it’s going to get less feedback. If you miss a deadline entirely without getting an extension, you will automatically receive a 0 for your grade.
Cell phones should be off and put away. Laptops are okay for notes and such, but you should not be messaging, using Facebook, etc. I’ll check screens regularly and give you a verbal warning on your first offense. After that, I reserve the right to ask you to leave class and mark you absent if you are creating a distraction.
Coming to class prepared means that you have the day's reading in hand (printed or digital) and have come to class with a way to take notes (printed or digital). If you are not prepared for class, I reserve the right to grade as if you were absent for that day. Anything due on a given day is due at the start of class. Any digital submission of material is due by the time class starts on the day the hard copy is due. These policies apply to both in-person and remote participation.
In these unprecedented times, our section of DA 101 will have student participants in various locations and time zones. We can also expect that one or more students may need to miss class because of illness or quarantine protocol. As a result, there will be a no-permission-needed policy of allowing students to participate remotely or asynchronously, as long as they complete all the work for each day of class. I only ask that you keep me informed and meet with me and/or our TA, as needed, in order to keep pace with the course work. Note, however, that Denison's university-wide attendance policy still applies. This means, among other things, that if a class is missed, for any reason, the student is responsible for determining what occurred in the missed class. Additionally, absence from a class will not be accepted as an excuse for not knowing class material.
If you need to participate in a particular class remotely but synchronously, you can do so by joining the remote feed for our course, which will be password protected but online for every class period. (Our lab will be completely remote and conducted in the same way.) If you need to participate in a particular class remotely and asynchronously, you will be able to access a video recording of the day's Zoom broadcast via Notebowl. You should also look at the daily calendar and complete any readings, quizzes, homework, lab reports, etc. If you are missing a team-based assignment, you should coordinate your participation with your teammates. To get access to any lectures, or to make up a peer review, you should email me about whether to make a virtual appointment with me or our TA.
If all classes, at some point in the term, are forced to switch entirely to remote learning, I will provide detailed instructions on how to complete all the remaining assignments.
If you are a student who feels you may need an accommodation based on the impact of a disability, you should contact me privately as soon as possible to discuss your specific needs. I rely on the Academic Resource Center in 020 Higley Hall to verify the need for reasonable accommodations based on documentation on file in that office.
Proposed and developed by Denison students, passed unanimously by DCGA and Denison’s faculty, the Code of Academic Integrity requires that instructors notify the Associate Provost of cases of academic dishonesty. Cases are typically heard by the Academic Integrity Board, which determines whether a violation has occurred, and, if so, its severity and the sanctions. In some circumstances the case may be handled through an Administrative Resolution Procedure. Further, the code makes students responsible for promoting a culture of integrity on campus and acting in instances in which integrity is violated.
Academic honesty, the cornerstone of teaching and learning, lays the foundation for lifelong integrity. Academic dishonesty is intellectual theft. It includes, but is not limited to, providing or receiving assistance in a manner not authorized by the instructor in the creation of work to be submitted for evaluation. This standard applies to all work ranging from daily homework assignments to major exams. Students must clearly cite any sources consulted--not merely for quoted phrases, but also for ideas and information that are not common knowledge. Neither ignorance nor carelessness is an acceptable defense in cases of plagiarism. It is the student’s responsibility to follow the appropriate format for citations. Students should ask their instructors for assistance in determining what sorts of materials and assistance are appropriate for assignments and for guidance in citing such materials clearly.
Denison's mission statement articulates an explicit commitment to liberal arts education. It emphasizes active learning, which defines students as active participants in the learning process, not passive recipients. Denison seeks to foster self-determination and to demonstrate the transformative power of education. A crucial aspect of this approach is what Denison's mission statement refers to as "a concern for the whole person," which is why the university provides a "living-learning environment" based on individual needs and an overriding concern for community. This community is based on "a firm belief in human dignity and compassion unlimited by cultural, racial, sexual, religious or economic barriers, and directed toward an engagement with the central issues of our time."
In this class, we will discuss inequality directly. In many cases, you will be asked to apply quantitative reasoning skills to these subjects, which can be difficult because there is always the potential for the available data to complicate or contradict something you may feel very passionate about. In these cases, you should aspire to adopt an attitude of critical skepticism, i.e., to be wary of claims that are not supported by evidence but willing to give evidence a fair hearing and to be persuaded by it if you find it compelling.
How we treat one another will be a cornerstone of these conversations. Denison's "Guiding Principles" speak of "a community in which individuals respect one another and their environment." Further, "each member of the community possesses a full range of rights and responsibilities. Foremost among these is a commitment to treat each other and the environment with mutual respect, tolerance, and civility." It's easy to treat someone this way when you like them and agree with their ideas, but the real challenge is treating those who differ from us with the same compassion and respect. However, I consider disruptive, deceitful, or hateful behavior to be breaches of these responsibilities. Bullying, trolling, hate speech, and harassment of any kind will not be tolerated.
Our TA for this section of DA 101 will be Lam Tran (tran_l7[at]denison.edu). Lam is a double major in data analytics and economics, with a minor in computer science, and she will graduate in Spring 2021. She has worked extensively in Python and R, and she has experience with data visualization, data analysis, and database design. She will attend our weekly lab (remotely), and she will be available by appointment to help answer questions about the course or particular assignments. Her TA hours are Wednesdays, 9-10 p.m., and Thursdays, 6-7 p.m. You can connect on Zoom on either day, and in person on Thursdays (in Burton D. Morgan 405). Lam's office hours are also posted on NoteBowl with a Zoom link and a password. Office hours for all other DA 101 TAs will follow. While each TA is scheduled to work with a particular faculty instructor and attend their lab sections, our program encourages students to visit any TA during scheduled office hours, especially if that TA's availability better fits your schedule.
Each of you will be responsible for giving a PechaKucha presentation this semester. A PechaKucha is a type of lightning talk in which a presenter shows 20 slides with 20 seconds of commentary each (6 minutes and 40 seconds total). Read more about the PechaKucha presentation.
This course has intermittent quizzes on material from readings and lectures. Quizzes are designed to measure how well you are integrating the material. They will generally consist of 10-20 multiple choice, fill-in-the-blank, and short answer style questions. There are four quizzes for this course, one for each quarter. Normally, I would do these in class, but in the interests of reducing hand-offs of pieces of paper, all quizzes will be online and open book, to be completed outside of class. There are also four take-home assignments, one for each quarter, and each of these counts as a quiz grade. Quizzes and take-home assignments will be conducted through Notebowl or Github Classroom.
The Data and Code Ledger is a living document that you and your lab teammates will assemble over the course of the semester. You will turn it in periodically in lieu of a lab report for that week. Read more about the Data and Code Ledger.
This portion of your grade consists of participating in the lab and completing all lab assignments. Assignment descriptions, datasets, and starter code will be made available through Github and Github Classroom. Links to individual lab assignments can be found on the Lab Schedule below.
The final project is divided into four components, each due during a different quarter of the semester. Each component is worth 25% of your final project grade. Read more about the components of the Final Project Assignment.
Note: The lab component of this course will be conducted remotely using Zoom. Most labs require online submission of a team-based lab report, which is always due by 5 p.m. on the following Friday. Where noted, individual submissions are required but also due on Fridays. I'll add links to each lab as we go. The Zoom link and password are in NoteBowl. Click for general information about labs.
Lab 1: Tuesday, August 18, 2020
Invasive Species Part 1
Due Thursday: Individual Submission
Lab 2: Tuesday, August 25, 2020
Collecting and Coding Data
Due Thursday: Individual Submission
Lab 3: Tuesday, September 1, 2020
Invasive Species Part 2
Due Thursday: Team Lab Report
Lab 4: Tuesday, September 8, 2020
Book Reviews
Due Thursday: Turn in Data and Code Ledgers (Individual Submission)
Lab 5: Tuesday, September 15, 2020
Political Polarization Part 1
Due Friday: No homework
Lab 6: Tuesday, September 22, 2020
Political Polarization Part 2
Due Friday: Team Lab Report
Lab 7: Tuesday, September 29, 2020
Substance Use
Due Friday: Team Lab Report
Lab 8: Tuesday, October 6, 2020
Simpson's Paradox
Due Next Week: Turn in Data and Code Ledgers (Individual Submission)
Lab 9: Tuesday, October 13, 2020
Airbnb
Due Next Week: Team Lab Report
Lab 10: Tuesday, October 20, 2020
Meta-Analysis
Due Next Week: No homework
Lab 11: Tuesday, October 27, 2020
Reproducible Code
Due Next Week: Turn in Data and Code Ledgers (Individual Submission)
Lab 12: Tuesday, November 3, 2020
Authorship Attribution
Due Next Week: Team Lab Report
Lab 13: Tuesday, November 10, 2020
P-Hacking
Due Next Week: Complete IRB Training
Lab 14: Tuesday, November 17, 2020
Gender
Completed During Class Time: Course Evaluations
Quarter 1: Introduction to Data Analytics
Week 1: Data Literacy
In Class: Introductions
Homework: Sign up for Github, make Google Drive Folder, Complete Course Survey
In Class: Discuss survey results
Homework: Watch Three TED Talks ... Stacy Smith, The Data Behind Hollywood's Sexism (15:36); JP Rangaswami, Information is Food (7:48); John Wilbanks, Let's Pool Our Medical Data (16:11)
Discussion: The Promise of Data
Homework: Read R for Data Science "1. Introduction" (to the book) and "2. Introduction" (to the "Explore" section)
Week 2: Data, Metadata and Quantification
Slides/Discussion: Data and Data Analysis
Homework: Read Definitions of Data, from Data Science: An Introduction; Fill in some details on your Github profiles, including profile photos. (Note: You are not required to share a photo of yourself if you don't want to. The idea here is just to replace the default profile photo with something more personalized.)
Slides/Discussion: Data vs. Metadata; The Data Lifecycle
Homework: Read "Chapter 4: Field of Ignorance" from Moneyball (pdf)
Discussion: The Problems of Quantification
Homework: Read R for Data Science "5. Data transformation"
Week 3: Working with Data
Slides/Discussion: Transforming Data
Homework: Sign up for oral presentations (Google drive doc, link in NoteBowl); complete Quiz 1: Data Literacy, Metadata, and the Data Lifecycle
Slides/Discussion: Scaling, resampling, bootstrapping
Homework: Read Richard Jean So, "All Models Are Wrong" (pdf)
Discussion: All Models Are Wrong. Some Models Are Useful.
Homework: Read R for Data Science "3. Data visualization"
Week 4: Data Visualization
Slides/Discussion: Reviewing data visualization in R (ggplot2), common visualization types
Homework: Watch Hans Rosling, TED Talk, "The Best Stats You've Ever Seen"; Watch David McCandless, "The Beauty of Data Visualization"
PechaKucha: The Five Hat Racks Principle
Slides/Discussion: Storytelling with Data
Homework: Read Cairo, The Truthful Art, 41-65 (pdf)
PechaKucha: Accessible Data Visualizations
Discussion: The Five Qualities of Great Visualizations
Homework: Complete the Topic Exploration Assignment. Submit via Google Drive folder and Bring a Laptop to Next Class
Quarter 2: Data and Communication
Week 5: Descriptive Analytics 1
PechaKucha: Mean, Median, and Mode
In Class: Peer Feedback on Topic Exploration Assignment
Homework: Read R for Data Science "7. Exploratory Data Analysis"
PechaKucha: Variance, Standard Deviation & Interquartile Range
Slides/Discussion: Using data to generate questions and pursue insight
Homework: Read "Introduction: the Hidden Side of Everything", from Freakonomics (pdf)
PechaKucha: Z-scores and Z-differences
Discussion: Norms, trends, individuals, and outliers
Homework: Take-home assignment (counts as quiz grade)
Week 6: Descriptive Analytics 2
PechaKucha: Bias in Sampling
Slides/Discussion: Distributions, sampling, central limit theorem
Homework: Complete Quiz 2: Working with Data, Data Visualization, Sampling, and Populations
Slides/Discussion: Descriptive Statistics Continued
Homework: Cathy O'Neil, "Arms Race: Going to College" from Weapons of Math Destruction (pdf)
PechaKucha: Information vs. Entropy
Discussion: Information, Overfitting, and Overconfidence
Homework: Read excerpt from Erich L. Lehmann, Fisher, Neyman, and the Creation of Classical Statistics (Springer, 2011) (pdf)
Week 7: Hypothesis Testing
PechaKucha: ANOVA Analysis
Slides/Discussion: T-Tests, P-Values
Homework: Complete the Project Plan Assignment. Submit via Google Drive folder and Bring a Laptop to Next Class
PechaKucha: Familywise Comparison Error
In Class: Peer Review of Project Plan Assignment
Slides/Discussion: Confidence Intervals, Effect Size
Homework: Read R for Data Science "22. Introduction" (to the "Model" section) and "23. Model Basics"
PechaKucha: Chi-Squared Test
In Class: Mid-term Evaluation and Progress Report
Discussion: Frequentism and its Assumptions
Homework: No-work Weekend
Quarter 3: Data Analysis
Week 8: Predictive Analytics 1
In Class: Mid-term Evaluation and Progress Report
Slides/Discussion: Goodness of Fit
Homework: Complete Quiz 3: Hypothesis Testing, Fitting Models, P-Values, Effect Size, Confidence Intervals
PechaKucha: Pearson vs. Spearman Correlation
Slides/Discussion: Logistic and Linear Regression
Homework: Read "Scientific Realism," Stanford Encyclopedia of Philosophy, Parts 1.1, 1.2, 1.3, and 4.1
PechaKucha: Cohen's D Test
Discussion: Instrumentalism and its assumptions
Homework: Read Trevor Martin, Dissecting Trump's Most Rabid Online Following, FiveThirtyEight.com, 23 March 2017.
Week 9: Predictive Analytics 2
PechaKucha: Cosine Similarity
Slides/Discussion: Unsupervised learning
Homework: Take-home assignment (counts as quiz grade)
Slides/Discussion: Unsupervised learning continued
Homework: Read "Interpretations of Probability," Stanford Encyclopedia of Philosophy
PechaKucha: K-Means Clustering
Discussion: What is Probability?
Homework: Read Nate Silver, "Less and Less Wrong," The Signal and the Noise (Penguin, 2012) 221-247 (pdf)
Week 10: Bayesian Analytics
PechaKucha: Principal Component Analysis
Slides/Discussion: Bayes' Theorem and Bayesian Data Analysis
Homework: Watch Alex Edmans, TEDx Talk, What to Trust in a Post-Truth World (17:47)
Slides/Discussion: How to Think Like a Bayesian
Homework: Complete Final Project Visualization Component. Submit via Google Drive folder and Bring a Laptop to Next Class. (Final version due Friday at 5 p.m.)
PechaKucha: Hidden Markov Models
Discussion: Peer Review of Final Project Visualization Component
Homework: Read William Stafford Noble "A Quick Guide to Organizing Computational Biology Projects"
Quarter 4: Case Studies and Student Projects
Week 11: Case Studies 1, Computational Biology
PechaKucha: Factor Analysis
Slides/Discussion: Organizing Computational Projects
Homework: Complete Quiz 4: Regression, Predictive Analytics, Classification, and Clustering
Slides/Discussion: Organizing Computational Projects Continued
Homework: Read D'Ignazio and Klein, "The Numbers Don't Speak for Themselves", Data Feminism Cambridge: MIT Press, 2020: Open Access Edition Online. https://data-feminism.mitpress.mit.edu/pub/czq9dfs5/release/2
PechaKucha: Decision Tree Classifiers
Discussion: What is Data Feminism?
Homework: Read José Nilo G. Binongo "Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution" (pdf)
Week 12: Case Studies 2, authorship attribution
PechaKucha: Support Vector Machine Classifiers
Slides/Discussion: Authorship attribution methods
Homework: Take-home assignment (counts as quiz grade)
PechaKucha: Neural Networks / Deep Learning
Slides/Discussion: Author, Date, Gender, and Genre Signals
Homework: Read Andrew Piper, "There Will Be Numbers"
PechaKucha: Random Forest Classifiers
Discussion: Can culture be analyzed quantitatively?
Homework: Read Harris et al., "Two Failures to Replicate High-Performance-Goal Priming Effects" (pdf)
Week 13: Case Studies 3, Priming Effects
Slides/Discussion: Priming Effects and the Replication Crisis
Homework: Read Longino, "Values and Objectivity," Science as Social Knowledge (pdf)
Discussion: The curse of dimensionality, the file drawer problem, and other blind spots
Homework: Read Helena Mihaljevic et al., "Reflections on Gender Analyses of Bibliographic Corpora" Frontiers in Big Data 2 (August 28, 2019): 29
In Class: Discussion of Gender Bias
Homework: Complete First Draft of Final Project Written Analysis. Submit via Google Drive folder and Bring a Laptop to Next Class
Week 14: Case Studies 4, Gender Prediction
Slides/Discussion: Peer Review of Final Project Written Analysis
Homework: Take-home assignment (counts as quiz grade)
Discussion: Data Analytics at Denison
Homework: Read Daniel Kahneman, "The Illusion of Understanding," "The Illusion of Validity," and "Intuitions vs. Formulas," Thinking, Fast and Slow (pdf)
Discussion: The Utility of Data; The Limits of Quantification
Thanksgiving Break
Week 15: Remote Finals Week
Reading and Study Day
Reading and Study Day
Homework: Complete and Submit Final Version of Written Analysis and Reflection via Google Drive by 11:59 p.m. EST