Your Professor:
Matt Lavin
My Email:
lavinm@denison.edu
My Office:
Burton D. Morgan Center 411
Office Hours:
MW 3:30-5:00 p.m.; Th 1:00-3:00 p.m.; and F 1-2 p.m.
Our Classroom:
Burton D. Morgan Center 315
When We Meet:
MWF 11:30 a.m.-12:20 p.m.
When the Lab Meets:
Tuesdays, 1:50-4:40 p.m.
Many of the most pressing problems in the world can be addressed with data. We are awash in data, and modern citizenship demands that we become literate in how to interpret data, what assumptions and processes are necessary to analyze data, and how we might participate in generating our own analyses and presentations of data. Consequently, data analytics is an emerging field with skills applicable to a wide variety of disciplines. This course introduces analysis, computation, and presentation concerns through the investigation of data-driven puzzles in a wide array of fields – political, economic, historical, social, biological, and others. No previous experience is required.
By the end of the course, you should be able to:
Note that I am not planning to do "in-person" office hours this semester. I am using Google Calendar for virtual office hours, by appointment. If you go to my appointment page, you will see a real-time account of when I am available. My standard appointment slots are MW 3:30-5:00 p.m.; Th 1:00-3:00 p.m.; and F 1-2 p.m. Note that these appointment slots will disappear once I've been booked. If I ever need to cancel office hours on a given day (say, for example, if I'm ill), I will update the calendar and email anyone with an appointment. If I find that I have more requests for appointments than I have availability, I may convert some of these slots to virtual hangouts, where anyone would be welcome to drop in, but for the moment I will hold these as opportunities for one-on-one discussion.
If my office hours by appointment do not work for your schedule, you can also email me to request an appointment at another time. When sending me such an email (or really any email), please follow some basic conventions of formality and politeness. There's no need to construct the equivalent of a business letter, but please don't begin your message with "hey," and please take an extra moment to make sure you spelled my name correctly. I promise to show you the same courtesy. I will do my best to reply within 48 hours, barring any emergency circumstances.
Here you will find information on required readings, important university policies, and course-specific policies like attendance and cell phone use.
R for Data Science (Wickham and Grolemund), ISBN-13: 978-1491910399. Free online (http://r4ds.had.co.nz), or order the print edition online by matching the ISBN.
Additional selected readings will be made available as HTML or PDF, and linked to the course website or shared via Notebowl.
All projects in this course will be scripted and analyzed using R, an open source data analysis language and environment. No previous experience with R, statistical software packages, or computer programming is required. Specifically, we will be using RStudio as our programming environment. Instructions for installing R and RStudio will be posted to the calendar below.
Since there are multiple sections of DA 101 every semester, the various instructors work hard to make sure there is approximate parity in terms of content, workload, and expectations. However, it is also true that each professor has their strengths, areas of interest, and priorities, and I'm sure I'm no exception. We'll have to spend some time together for you to get a real sense of what I value, how I grade, etc., but I look forward to that process, and to getting to know you all better more generally. One of the big advantages of a school like Denison is that, if you want to work with me again, you'll probably be able to, whether in another data analytics course, a summer research fellowship, or some other capacity.
As a general rule, the expectations in this course are high, and I'm confident you can all do great work. The feedback I provide on assignments is designed to help you get there. My goal is to provide specific, relevant, and honest feedback when I grade your work. This will include constructive criticism, strategies for improvement, and guidance on how students can achieve success. I will not do "compliment sandwiches" just to begin and end on a positive remark, but this means that, when I praise your work, it's an honest (and I think more meaningful) act of praise.
Regarding the major assignment rubric, it is adapted from the standards that the data analytics program uses for all its majors. I don't expect your work to meet the same standards as a graduating senior, but I think using the same categories on our rubric will give you a better idea of what it might mean to major in data analytics, as many of you are hoping to do. If you don't want to be a data analytics major, these criteria are still highly relevant to almost any program of study.
Item | Description |
---|---|
Assignment Process: | All materials are turned in on time and in the right place. Assignment directions are followed. Required components are all present and submitted on time. |
Attention to Detail: | The project is well organized, flows logically, and follows all the formatting guidelines, including attention to proofreading, proper citations, and language that is appropriate to a well-informed, non-technical reader. |
Research Question and Research Design: | The project has a focused and well defined research question that can be addressed with computational, data-driven analysis. The focal data set and method(s) are appropriate for the research question. |
Data, Visuals, and Code: | The data are fully described, properly sourced, and presented in appropriate ways. Visuals (tables, charts, graphs) are used effectively to describe multiple aspects of the research project (data, methods, or results). The paper provides sufficient details and/or points to supplementary materials that make the research reproducible by a technical reader (e.g., detailed footnotes, appendices, GitHub, code, etc.). |
Data Analysis Methods: | The method(s) used to test the research question is justified, validated, and applied appropriately; the student appropriately describes the strengths and weaknesses of the methods used; outside sources are used to justify how the methods are used and interpreted. |
Reporting and Interpretation of Results: | The results are interpreted correctly and clearly address the research question; the project discusses its limitations, the extent to which it can be generalized, and expansion to further research. |
Ethical Considerations: | The writing thoughtfully engages any ethical considerations of using the data, methods, and implications of communicating the findings. |
Item | Percentage | Comments |
---|---|---|
Oral Presentation | 5 | Individual assignment |
Quizzes | 10 | Individual assignments |
Data and Code Ledger | 15 | Team-based assignments |
Lab Work and Lab Reports | 40 | Team-based assignments |
Final Project | 30 | Individual assignment. Four separate components (see assignment description) |
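To make the arithmetic behind the table concrete, here is a minimal sketch in Python of how the category weights could combine into a course percentage. It assumes each category is scored from 0 to 100 and that categories are combined by a simple weighted average; the function name, the example scores, and the absence of rounding or letter-grade cutoffs are all illustrative assumptions, not stated course policy.

```python
# Category weights copied from the grading table (percent of the course grade).
WEIGHTS = {
    "Oral Presentation": 5,
    "Quizzes": 10,
    "Data and Code Ledger": 15,
    "Lab Work and Lab Reports": 40,
    "Final Project": 30,
}


def course_grade(scores):
    """Return the weighted course percentage from a 0-100 score per category.

    Assumption: a plain weighted average, with no rounding or curve applied.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # the table's weights add up to 100
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical scores, for illustration only.
example_scores = {
    "Oral Presentation": 90,
    "Quizzes": 80,
    "Data and Code Ledger": 85,
    "Lab Work and Lab Reports": 92,
    "Final Project": 88,
}

print(course_grade(example_scores))
```

Because lab work carries 40 percent of the weight, a change in the lab score moves the overall percentage eight times as much as the same change in the oral presentation score.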
The goal of the quantitative reasoning requirement is to develop the skills of all students in the descriptive, analytical, and predictive aspects of quantitative reasoning. A course fulfilling this requirement must utilize numerical quantities and employ, as an integral and sustained part of the course, at least one of the following forms of quantitative reasoning.
Retroactive and last-minute extensions will not be granted. At the same time, life happens. Sometimes something just isn’t going to get done. If you speak to me at least a week ahead of time and I approve an extension, I will consider assigning a new due date and hold you to it. The trade-off is that work turned in this way is probably not going to end up in my hands when I grade everything else, so it’s going to get less feedback. If you miss a deadline entirely without getting an extension, you will automatically receive a 0 for your grade.
Cell phones should be off and put away. Laptops are okay for notes and such, but you should not be messaging, using Facebook, etc. I’ll check screens regularly and give you a verbal warning on your first offense. After that, I reserve the right to ask you to leave class and mark you absent if you are creating a distraction.
Coming to class prepared means that you have the day's reading in hand (printed or digital) and have come to class with a way to take notes (printed or digital). If you are not prepared for class, I reserve the right to grade as if you were absent for that day. Anything due on a given day is due at the start of class. Any digital submission of material is due by the time class starts on the day the hard copy is due. These policies apply for in-person and remote participation.
This class has a transitional learning policy, which means that we will conduct class over Zoom for the first several weeks (up to four weeks but not more than that) and then convert to an in-person course for all those willing and able to attend in person. Our section of DA 101 will have student participants in various locations and time zones. We can also expect that one or more students may need to miss class because of illness or quarantine protocol. As a result, there will be a no-permission-needed policy of allowing students to participate remotely. This section of DA 101 and its lab, however, will require synchronous participation. That is, all students, regardless of time zone, will be expected to log in live for class and lab. If this expectation will not work with your schedule, there are two additional sections of DA 101 that may be a better option. I also ask that you keep me informed and meet with me and/or our TA, as needed, in order to keep pace with the course work. Note, however, that Denison's university-wide attendance policy still applies. This means, among other things, that if a class is missed, for any reason, the student is responsible for determining what occurred in the missed class. Additionally, absence from a class will not be accepted as an excuse for not knowing class material.
If you need to participate in any particular class remotely but synchronously, you can do so by joining the remote feed for our course, which will be password protected but online for every class period. (Our lab will be completely remote and conducted in the same way.) If you need to participate in a particular class remotely and asynchronously, you will be able to access a video recording of the day's Zoom broadcast via Notebowl. Watching videos rather than attending class should be considered an option of last resort, and active engagement and participation are expected. You should also look at the daily calendar and complete any readings, quizzes, homework, lab reports, etc. If you are missing a team-based assignment, you should coordinate your participation with your teammates. To get access to any lectures, or to make up a peer review, you should email me about whether to make a virtual appointment with me or our TA.
If all classes, at some point in the term, are forced to switch entirely to remote learning, I will provide detailed instructions on how to complete all the remaining assignments.
If you are a student who feels you may need an accommodation based on the impact of a disability, you should contact me privately as soon as possible to discuss your specific needs. I rely on the Academic Resource Center in 020 Higley Hall to verify the need for reasonable accommodations based on documentation on file in that office.
Proposed and developed by Denison students, passed unanimously by DCGA and Denison’s faculty, the Code of Academic Integrity requires that instructors notify the Associate Provost of cases of academic dishonesty. Cases are typically heard by the Academic Integrity Board, which determines whether a violation has occurred, and, if so, its severity and the sanctions. In some circumstances the case may be handled through an Administrative Resolution Procedure. Further, the code makes students responsible for promoting a culture of integrity on campus and acting in instances in which integrity is violated.
Academic honesty, the cornerstone of teaching and learning, lays the foundation for lifelong integrity. Academic dishonesty is intellectual theft. It includes, but is not limited to, providing or receiving assistance in a manner not authorized by the instructor in the creation of work to be submitted for evaluation. This standard applies to all work ranging from daily homework assignments to major exams. Students must clearly cite any sources consulted--not merely for quoted phrases, but also for ideas and information that are not common knowledge. Neither ignorance nor carelessness is an acceptable defense in cases of plagiarism. It is the student’s responsibility to follow the appropriate format for citations. Students should ask their instructors for assistance in determining what sorts of materials and assistance are appropriate for assignments and for guidance in citing such materials clearly.
Denison's mission statement articulates an explicit commitment to liberal arts education. It emphasizes active learning, which defines students as active participants in the learning process, not passive recipients. Denison seeks to foster self-determination and to demonstrate the transformative power of education. A crucial aspect of this approach is what Denison's mission statement refers to as "a concern for the whole person," which is why the university provides a "living-learning environment" based on individual needs and an overriding concern for community. This community is based on "a firm belief in human dignity and compassion unlimited by cultural, racial, sexual, religious or economic barriers, and directed toward an engagement with the central issues of our time."
In this class, we will discuss inequality directly. In many cases, you will be asked to apply quantitative reasoning skills to these subjects, which can be difficult because there is always the potential for the available data to complicate or contradict something you may feel very passionate about. In these cases, you should aspire to adopt an attitude of critical skepticism, i.e., wary of claims that are not supported by evidence but potentially willing to be persuaded by evidence if you find it compelling, and willing to give that evidence a fair hearing.
How we treat one another will be a cornerstone of these conversations. Denison's "Guiding Principles" speak of "a community in which individuals respect one another and their environment." Further, "each member of the community possesses a full range of rights and responsibilities. Foremost among these is a commitment to treat each other and the environment with mutual respect, tolerance, and civility." It's easy to treat someone this way when you like them and agree with their ideas, but the real challenge is treating those who differ from us with the same compassion and respect. However, I consider disruptive, deceitful, or hateful behavior to be breaches of these responsibilities. Bullying, trolling, hate speech, and harassment of any kind will not be tolerated.
Our TA for this section of DA 101 will be posted shortly. The TA will be available by appointment to help answer questions about the course or particular assignments. His/her TA hours are also TBA. You can connect on Zoom on either day, and in person on Thursdays (in Burton D. Morgan 405). The TA's office hours will also be posted on NoteBowl with a Zoom link and a password. Office hours for all other DA 101 TAs will follow as well. While each TA is scheduled to work with a particular faculty instructor and attend their lab sections, our program encourages students to visit any TA during scheduled office hours, especially if their availability better fits your schedule.
Each of you will be responsible for giving a PechaKucha presentation this semester. A PechaKucha is a type of lightning talk where a presenter shows 20 slides with 20 seconds of commentary each (6 minutes and 40 seconds total). (More about the PechaKucha Presentation on NoteBowl.)
This course has intermittent quizzes on material from readings and lectures. Quizzes are designed to measure how well you are integrating the material. They will generally consist of 10-20 multiple-choice, fill-in-the-blank, and short-answer questions. There are four quizzes for this course, one for each quarter. Normally, I would do these in class, but in the interests of reducing hand-offs of pieces of paper, all quizzes will be online and open book, to be completed outside of class. There are also four take-home assignments, one for each quarter, and each of these counts as a quiz grade. Quizzes and take-home assignments will be conducted through Notebowl or GitHub Classroom.
The Data and Code Ledger is a living document that you and your lab teammates will assemble over the course of the semester. You will turn it in periodically in lieu of a lab report for that week. (More about the Data and Code Ledger on NoteBowl.)
This portion of your grade consists of participating in the lab and completing all lab assignments. Assignment descriptions, datasets, and starter code will be made available through GitHub and GitHub Classroom. Links to individual lab assignments can be found on the Lab Schedule below.
The final project is divided into four components, each due during a different quarter of the semester. Each component is worth 25% of your final project grade. (More about the Final Project Assignment on NoteBowl.)
Note: The lab component of this course will be conducted remotely using Zoom. Most labs require online submission of a team-based lab report, which is always due by the start of the following lab (one week later). Where noted, individual submissions are required but also due by the next lab. The Zoom link and password for labs are posted to NoteBowl.
Lab 1: Tuesday, February 2, 2021
Introduction to R, RStudio, Executing .r files, The File System, and File Paths
Due Next Week: Individual Submission
Lab 2: Tuesday, February 9, 2021
Invasive Species Part 1
Due Next Week: Individual Submission
Lab 3: Tuesday, February 16, 2021
Collecting and Representing Data
Due Next Week: Team Submission
Lab 4: Tuesday, February 23, 2021
Book Reviews as Data
Due Next Week: Turn in Data and Code Ledgers (Individual Submission)
Lab 5: Tuesday, March 2, 2021
Invasive Species Part 2
Due Next Week: Team Lab Report
Lab 6: Tuesday, March 9, 2021
Substance Use
Due Next Week: No Homework
Lab 7: Tuesday, March 16, 2021
No Lab This Week. Enjoy Your Day Off!
Due Next Week: Turn in Data and Code Ledgers (Individual Submission)
Lab 8: Tuesday, March 23, 2021
Audio Books
Due Next Week: Team Lab Report
Lab 9: Tuesday, March 30, 2021
Political Polarization Part 1
Due Next Week: No Homework
Lab 10: Tuesday, April 6, 2021
Political Polarization Part 2
Due Next Week: Team Lab Report
Lab 11: Tuesday, April 13, 2021
Examples of Reproducible Code
Due Next Week: Turn in Data and Code Ledgers
Lab 12: Tuesday, April 20, 2021
Authorship Attribution
Due Next Week: Team Lab Report
Lab 13: Tuesday, April 27, 2021
Oral Presentations for Final Projects
Due Next Week: Complete IRB Training
Lab 14: Tuesday, May 4, 2021
Oral Presentations for Final Projects
Quarter 1: Introduction to Data Analytics
Week 1: Data Literacy
In Class: Introductions
Homework: Sign up for GitHub, make Google Drive Folder, Complete Course Survey
In Class: Discuss survey results
Homework: Watch Three TED Talks ... Stacy Smith, The Data Behind Hollywood's Sexism (15:36); JP Rangaswami, Information is Food (7:48); John Wilbanks, Let's Pool Our Medical Data (16:11)
In Class: The Promise of Data
Homework: Read R for Data Science "1. Introduction" (to the book) and "2. Introduction" (to the "Explore" section)
Week 2: Data, Metadata and Quantification
In Class: Lecture on Data and Data Analysis
Homework: Take-home assignment (counts as quiz grade)
In Class: Data vs. Metadata; The Data Lifecycle
Homework: Read "Chapter 4: Field of Ignorance" from Moneyball (pdf on Notebowl)
In Class: The Problems of Quantification
Homework: complete Quiz 1: Data Literacy, Metadata, and the Data Lifecycle (on Notebowl)
Week 3: Working with Data
In Class: Review of R, RStudio, and concepts so far
Homework: Read R for Data Science "5. Data transformation"
In Class: Transforming Data
Homework: Read Richard Jean So, "All Models Are Wrong" (pdf on Notebowl)
In Class: All Models Are Wrong. Some Models Are Useful.
Homework: Read R for Data Science "3. Data visualization"
Week 4: Data Visualization
In Class: Review of data visualization in R (ggplot2), common visualization types
Homework: Watch Hans Rosling, TED Talk, "The Best Stats You've Ever Seen"; Watch David McCandless, "The Beauty of Data Visualization"
In Class: Storytelling with Data
Homework: Read Cairo, The Truthful Art, 41-65 (pdf on Notebowl)
In Class: The Five Qualities of Great Visualizations
Homework: Complete the Topic Exploration Assignment. Submit on Notebowl and Bring a Laptop to Next Class
Quarter 2: Data and Communication
Week 5: Descriptive Statistics 1
In Class: Peer Feedback on Topic Exploration Assignment (final version due Friday at 11:59 p.m.)
Homework: Read R for Data Science "7. Exploratory Data Analysis"
In Class: Using data to generate questions and pursue insight
Homework: Read "Introduction: the Hidden Side of Everything", from Freakonomics (pdf on Notebowl)
In Class: Norms, trends, individuals, and outliers
Homework: Complete Quiz 2: Working with Data, Data Visualization, Sampling, and Populations (on Notebowl)
Week 6: Descriptive Statistics 2
In Class: Distributions, sampling, central limit theorem
Homework: Take-home assignment (counts as quiz grade)
In Class: Descriptive Statistics Continued, Shapiro-Wilk Test
Homework: Read Cathy O'Neil, "Arms Race: Going to College" from Weapons of Math Destruction (pdf on Notebowl)
In Class: Overconfidence
Homework: Mid-term Evaluation and Progress Report
Week 7: Statistical Significance
In Class: Review of Key Concepts; Discuss Project Plan Assignment
Homework for Friday: Read Duncan Watts, "The Dream of Prediction," from Everything Is Obvious (pdf on Notebowl)
Reminder: No Lab This Week, and No Class on Wednesday.
No Class Today. Enjoy Your Time Off!
In Class: What is Probability?
Homework: Complete the Project Plan Assignment. Submit via Google Drive folder and Bring a Laptop to Next Class
Quarter 3: Data Analysis
Week 8: Predictive Analytics 1
In Class: T-Tests, P-Values; Peer Review of Project Plan Assignment (final version due Friday at 11:59 p.m.)
Homework: Sign up for oral presentations (link to sign-up sheet on NoteBowl)
In Class: Confidence Intervals, Effect Size
Homework: Read Daniel Kahneman, "The Illusion of Understanding," "The Illusion of Validity," and "Intuitions vs. Formulas" from Thinking, Fast and Slow (pdf on Notebowl)
In Class: Problems of Intuition, How and When Quantification Fails
Homework: Submit final version of Project Plan Assignment by Monday at 11:30 a.m.
Week 9: Predictive Analytics 2
In Class: Logistic and Linear Regression
Homework: Complete take-home assignment 3 (counts as quiz grade)
In Class: Logistic and Linear Regression Continued
Homework: Read Gebru et al., "Datasheets for Datasets" (online); Complete team lab report
In Class: Data Ethics and Machine Learning
Homework: Watch Alex Edmans, TEDx Talk, What to Trust in a Post-Truth World (17:47)
Week 10: Bayesian Analytics
In Class: Bayes' Theorem and Bayesian Data Analysis
Homework: Complete Final Project Visualization Component. Submit via Google Drive folder and Bring a Laptop to Next Class. (Final version due Monday at 11:30 a.m.)
In Class: Peer Review of Final Project Visualization Component (Final version due Monday at 11:30 a.m.)
Homework: Read Nate Silver, "Less and Less Wrong," from The Signal and the Noise (Penguin, 2012) 221-247 (pdf on Notebowl)
In Class: How to Think Like a Bayesian
Homework: Read William Stafford Noble "A Quick Guide to Organizing Computational Biology Projects"; "The importance of local context in COVID-19 models"; "Data curation during a pandemic and lessons learned from COVID-19" (these are short but may be difficult to read, so plan accordingly)
Quarter 4: Case Studies and Student Projects
Week 11: Case Studies 1, Computational Biology
In Class: Organizing Computational Projects
Homework: None
In Class: Computational Biology, Data Curation and Modeling in Various Disciplines
Homework: Read D'Ignazio and Klein, "What Gets Counted Counts", Data Feminism, Cambridge: MIT Press, 2020: Open Access Edition Online. https://data-feminism.mitpress.mit.edu/pub/h1w0nbqp/release/2?readingCollection=0cd867ef
In Class: What is Data Feminism?
Homework for Tuesday: Read José Nilo G. Binongo "Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution" (pdf on Notebowl)
Reminder: No Class on Monday.
Week 12: Case Studies 2, Authorship Attribution
No Class Today. Enjoy Your Time Off!
In Class: Authorship attribution methods
Homework: Read Andrew Piper, "There Will Be Numbers"
In Class: Can culture be analyzed quantitatively?
Homework: Read Harris et al., "Two Failures to Replicate High-Performance-Goal Priming Effects" (pdf on Notebowl)
Week 13: Case Studies 3, Priming Effects
In Class: Priming Effects and the Replication Crisis
Homework: Take-home assignment (counts as quiz grade)
In Class: Type I errors, p-hacking, the curse of dimensionality, the file drawer problem
Homework: Read Longino, "Values and Objectivity," from Science as Social Knowledge (pdf on Notebowl)
In Class: What is scientific objectivity?
Homework: Complete First Draft of Final Project Written Analysis. Submit via Google Drive folder and Bring a Laptop to Next Class
Week 14: Case Studies 4, Gender Prediction
In Class: Peer Feedback on Topic Exploration Assignment ... revise and turn in final drafts by May 12, 2021
Homework: Read Helena Mihaljevic et al., "Reflections on Gender Analyses of Bibliographic Corpora," Frontiers in Big Data 2 (August 28, 2019): 29
In Class: Gender bias and questioning binarism
Homework: Complete Course Evaluations
In Class: Data Analytics at Denison; Moving Forward as a Student and a Human
Homework: Complete Final Projects
Week 15: Exam Week
Homework: Complete and Submit Final Version of Final Projects (Written Analysis and Reflection) via GitHub by 2 p.m. Eastern Time