Your Professor:
Matt Lavin
My Email:
lavinm@denison.edu
My Office:
Knapp 205-B
Office Hours:
1:30-3:00 p.m. MW by appointment and Th 10-11:30 drop-in
Our Classroom:
BMRG 219
When We Meet:
MWF 12:30-1:20
Cultural analytics is a field of study that systematically examines contemporary and historical cultures using data-driven, quantitative (and often computational) tools and methods. Scholars in this fledgling discipline are endeavoring to tackle big questions about the customs, beliefs, rules, identities, and institutions of one or more cultures. How can we train a machine learning model to predict a user's rating for a film or TV show based on that user's previous ratings, without any other information about the user or the film or TV show? Can large-scale data analysis demonstrate how two seemingly similar TV series visually and narratively frame their female leads in radically different ways? What kinds of text-mining methods have been used to detect bias in reviews on ratemyprofessor.com, and how have these biases changed over time? In what ways can a deeper understanding of cultural objects interrogate and deepen our understanding of data analytics? These are some examples of the kinds of research questions you will find in this introduction to cultural analytics. Students in this class will code in Python to collect and wrangle data in a variety of formats. They will use data analytics methods to locate, interpret, and communicate meaningful patterns in a range of cultural objects, such as literature, artwork, TV episodes, TED Talks, or podcasts. The course will include foundational skill building (Python programming, reproducible code, etc.), one or more team-based assignments, and a self-directed, individual research project.
This semester, I will be using a mix of drop-in office hours and in-person appointments via Google Calendar. For office hours by appointment, visit my appointment page, where you will see a real-time account of when I am available. My standard appointment slots will be divided into 20-minute blocks from 1:30 to 3 p.m. on Mondays and Wednesdays. Note that these appointment slots will disappear from my calendar once I've been booked. Please book appointments at least 24 hours in advance. If I ever need to cancel by-appointment office hours on a given day (say, for example, if I'm ill), I will update the calendar and email anyone with an appointment.
Drop-in office hours will be held in my office from 10 to 11:30 a.m. on Thursdays. For these, you will not need an appointment, but I will see students in the order they arrive, so there is no guarantee that I will have time for everyone on a given day. If your question is time-sensitive, you should make an appointment. If I ever need to cancel office hours on a given drop-in day (say, for example, if I'm ill), I will email the entire class.
Here you will find information on required readings, important university policies, and course-specific policies like attendance and cell phone use.
No required textbook. Selected readings will be made available on Notebowl, and linked to the course website (expect to spend a little more than normal on printing fees) |
Self-directed reading as part of your final project |
Computers: Students are required to provide their own laptops and to install free and open source software on those laptops. Support will be provided by the instructor in the installation of any useful or required software. If at any time you don’t have access to a laptop, please contact the instructor and the Data Analytics Program can provide you with a loan from the laptop cart. In class, please use eduroam to connect to the Internet instead of Denison Guest. Please be respectful with your use of laptops and technology in class. I request that you only use them for class-related purposes, as I and others may find them distracting (for example, no email or social media should be open in your browser tabs!). Cell phones should be kept silent and put away. |
GitHub, Programming Languages, Software: We will be using git and GitHub for version control and collaboration, as well as GitHub Classroom as a means for you to access assignment templates and turn in your work. All programming required for the course assumes that you are using the latest version of Python 3 bundled with the Anaconda Distribution (3.9 as of January 2022). Programs should generally work on Mac, Linux, or Windows, but there may be small differences depending on your operating system, your computer's CPU, etc. |
Our teaching assistant this semester is Raina (Chengjun) Zhang. Raina is a senior and a double major in data analytics and computer science, with a minor in math. She is fascinated with NLP, unsupervised learning, and reinforcement learning. Her first internship was at a mobile game company in China. During the internship, she used K-means and RFM models to analyze users' in-game behavior data. She also tried to classify mobile games on the App Store by genre based on Latent Semantic Analysis. Raina's office hours will be 5 to 6 p.m. on Tuesdays and Thursdays in the Data Analytics Lab (BMRG 405). (If you want to meet up with Raina this week or next, email her to arrange a location.)
Item | Percentage | Comments |
---|---|---|
Individual and Team-Based Coding Challenges | 32% | Week-to-week coding challenges designed to be completed individually or in teams, depending on the difficulty level. Eight projects in total, 4 points each. Designed to help students develop competencies required for success in the class. Work will be assessed on the quality of your solution, the elegance of your code, and the level of critical thinking demonstrated in written materials. |
Team Projects | 20% | Larger scale |
10 points | Authorship Attribution Project | |
10 points | Sound Project | |
Individual Research Project | 40% | |
5 points | Scaffolding Assignment 1: Research Plan and Data Curation | |
5 points | Scaffolding Assignment 2: Initial Findings, Data, and Code | |
5 points | Scaffolding Assignment 3: Peer Review of First Drafts | |
25 points | Final Project Paper and Code Submission | |
Attendance and Participation | 8% | |
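As a rough illustration of how the percentages in the table add up, here is a minimal Python sketch. The names `WEIGHTS`, `final_grade`, and `example_scores` are hypothetical, the example scores are made up, and the sketch assumes each category simply contributes its listed percentage; it is not an official grade calculator.

```python
# Category weights taken from the grading table (percent of the final grade).
WEIGHTS = {
    "coding_challenges": 32,         # eight challenges, 4 points each
    "team_projects": 20,             # two projects, 10 points each
    "research_project": 40,          # 5 + 5 + 5 + 25 points
    "attendance_participation": 8,
}


def final_grade(earned_fractions):
    """Combine per-category scores (each from 0.0 to 1.0) into a 0-100 grade."""
    return sum(WEIGHTS[name] * earned_fractions[name] for name in WEIGHTS)


# Sanity check: the four categories account for the whole grade.
assert sum(WEIGHTS.values()) == 100

# Made-up scores for a hypothetical student, as fractions of each category.
example_scores = {
    "coding_challenges": 0.90,
    "team_projects": 0.85,
    "research_project": 0.80,
    "attendance_participation": 1.00,
}
print(round(final_grade(example_scores), 1))
```

The sketch ignores late penalties and any rounding rules, which are described elsewhere in this syllabus or left to the instructor.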
As with any class, you will have to spend some time with me to get a real sense of what I value and how I grade, but I look forward to that process, and to getting to know you all better more generally. One of the big advantages of a school like Denison is that, if you want to work with me again, you'll probably be able to, whether in another data analytics course, a summer research fellowship, or some other capacity.
As a general rule, the expectations in this course are high, and I'm confident you can all do great work. The feedback I provide on assignments is designed to help you get there. My goal is to provide specific, relevant, and honest feedback when I grade your work. This will include constructive criticism, strategies for improvement, and guidance on how students can achieve success. I will not do "compliment sandwiches" just to begin and end on a positive remark, but this means that, when I praise your work, it's an honest (and I think more meaningful) act of praise.
In general, you will be graded on a mix of criteria that includes assignment process (following directions, turning things in on time); technical proficiency (transparency of method, effective research design, proper validation methods, reproducible code, etc.); and use of effective communication strategies (sufficient levels of detail, clarity, depth of research, proper citations, etc.). Individual and team-based coding challenges will typically include a "Grading Criteria" section, and major assignments for this class will be distributed with custom rubrics.
The pace of this class is fast, especially in the first five weeks of the semester when we are focusing on core proficiencies. If you have a legitimate emergency such as a serious illness, a mental health emergency, or a death in the family, I will grant an appropriate extension with a new due date. The trade-off is that work turned in this way is probably not going to end up in my hand when I grade everything else, so it’s going to get less feedback. If you miss a deadline entirely without getting an extension, you will automatically lose 10 points off the top of your grade for each day it is late, in addition to any points you lose for the quality of the work.
Cell phones should be off and put away. Laptops are okay for notes and such, but you should not be messaging, using Facebook, etc. I’ll check screens regularly and give you a verbal warning on your first offense. After that, I reserve the right to ask you to leave class and mark you absent if you are creating a distraction.
Coming to class prepared means that you have the day's reading in hand (printed or digital) and have come to class with a way to take notes (printed or digital). If you are not prepared for class, I reserve the right to grade as if you were absent for that day. Anything due on a given day is due at the start of class. Any digital submission of material is due by the time class starts on the day the hard copy is due. If students do not come to class prepared, I reserve the right to add reading quizzes to the day's work. I very much prefer not to exercise this option and am asking you to help make it unnecessary.
Proposed and developed by Denison students, passed unanimously by DCGA and Denison’s faculty, the Code of Academic Integrity requires that instructors notify the Associate Provost of cases of academic dishonesty. Cases are typically heard by the Academic Integrity Board, which determines whether a violation has occurred, and, if so, its severity and the sanctions. In some circumstances the case may be handled through an Administrative Resolution Procedure. Further, the code makes students responsible for promoting a culture of integrity on campus and acting in instances in which integrity is violated.
Academic honesty, the cornerstone of teaching and learning, lays the foundation for lifelong integrity. Academic dishonesty is intellectual theft. It includes, but is not limited to, providing or receiving assistance in a manner not authorized by the instructor in the creation of work to be submitted for evaluation. This standard applies to all work ranging from daily homework assignments to major exams. Students must clearly cite any sources consulted--not merely for quoted phrases, but also for ideas and information that are not common knowledge. Neither ignorance nor carelessness is an acceptable defense in cases of plagiarism. It is the student’s responsibility to follow the appropriate format for citations. Students should ask their instructors for assistance in determining what sorts of materials and assistance are appropriate for assignments and for guidance in citing such materials clearly.
Denison's mission statement articulates an explicit commitment to liberal arts education. It emphasizes active learning, which defines students as active participants in the learning process, not passive recipients. Denison seeks to foster self-determination and to demonstrate the transformative power of education. A crucial aspect of this approach is what Denison's mission statement refers to as "a concern for the whole person," which is why the university provides a "living-learning environment" based on individual needs and an overriding concern for community. This community is based on "a firm belief in human dignity and compassion unlimited by cultural, racial, sexual, religious or economic barriers, and directed toward an engagement with the central issues of our time."
In this class, we will discuss inequality directly. In many cases, you will be asked to apply quantitative reasoning skills to these subjects, which can be difficult because there is always the potential for the available data to complicate or contradict something you may feel very passionate about. In these cases, you should aspire to adopt an attitude of critical skepticism, i.e., wary of claims that are not supported by evidence but potentially willing to be persuaded by evidence if you find it compelling, and willing to give that evidence a fair hearing.
How we treat one another will be a cornerstone of these conversations. Denison's "Guiding Principles" speak of "a community in which individuals respect one another and their environment." Further, "each member of the community possesses a full range of rights and responsibilities. Foremost among these is a commitment to treat each other and the environment with mutual respect, tolerance, and civility." It's easy to treat someone this way when you like them and agree with their ideas, but the real challenge is treating those who differ from us with the same compassion and respect. However, I consider disruptive, deceitful, or hateful behavior to be breaches of these responsibilities. Bullying, trolling, hate speech, and harassment of any kind will not be tolerated.
Week-to-week coding challenges designed to be completed individually or in teams, depending on the difficulty level. Eight projects in total, 4 points each. Designed to help students develop competencies required for success in the class. Work will be assessed on the quality of your solution, the elegance of your code, and the level of critical thinking demonstrated in written materials.
The first of two team-based projects. In this assignment, your team will be asked to assess whether a novel written with multiple narrators can be successfully identified as the work of its actual author when a machine learning model is trained on the true author as well as a distractor set of three alternative authorship candidates. Training and test data for this assignment will be provided by the professor, but your team will need to use Natural Language Processing (NLP) methods to prepare the data for analysis.
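To give a feel for the kind of text preparation and comparison this project involves, here is a minimal sketch using only the Python standard library. It is not the required method for the assignment: the word list, the function names (`tokenize`, `profile`, `distance`, `attribute`), and the toy texts are all hypothetical, and the comparison is a simple nearest-profile match on function-word frequencies rather than a trained machine learning model.

```python
import re
from collections import Counter

# Hypothetical function-word list; a real study would use a much longer one.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "i", "it"]


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def distance(p, q):
    """Mean absolute difference between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def attribute(unknown_text, candidates):
    """Return the candidate name whose profile is closest to the unknown text."""
    target = profile(unknown_text)
    return min(candidates, key=lambda name: distance(target, profile(candidates[name])))


# Made-up toy samples standing in for each candidate author's known writing.
candidates = {
    "Author A": "the the the of of and",
    "Author B": "i i i it it to",
}
print(attribute("the of the and the", candidates))
```

In the actual project, the candidate set would hold the true author plus the three distractor authors, and the texts would be full-length works rather than toy strings.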
The second of two team-based projects. In this assignment, your team will be asked to train a model to detect different types of well-known, regional English accents. Audio data for this assignment will be provided by the professor, but your team may need to provide data labels, partition the data into a training and test set, and design your computation so that the overall strength of your model can be assessed.
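Partitioning labeled data into a training set and a test set can be sketched in a few lines of standard-library Python. The function name `train_test_split`, its parameters, the file names, and the accent labels below are all hypothetical placeholders; the real audio data and labels will come with the assignment.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A fixed seed makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (filename, accent_label) pairs standing in for the real clips.
clips = [
    (f"clip_{i:02d}.wav", "accent_a" if i % 2 == 0 else "accent_b")
    for i in range(10)
]
train, test = train_test_split(clips, test_fraction=0.2)
print(len(train), len(test))
```

This simple version does not stratify by label, so a small test set may under-represent some accents; holding the test set out until the end is what lets the overall strength of the model be assessed fairly.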
An individual assignment in which you will be expected to articulate a research question related to the subject of cultural analytics, design a computational approach to your question, identify and/or develop a dataset for your analysis, execute a computation, and write a paper that describes your work using the IMRAD template (Introduction, Methods, Results, Analysis, and Discussion). Includes several smaller assignments with due dates throughout the semester, the purpose of which is to ensure the success of your project.
Week 1: Python Core Skills - Review
In Class: Syllabus, Introductions
Homework: Github accounts, Survey
In Class: Discuss Survey Results; Installing Python
Homework: Read "The NYT Spelling Bee Gives Me L-I-F-E"
Why We're Reading This: This article explains the motivation behind our Spelling Bee Problem
In Class: Python Fundamentals worksheet; Discuss Spelling Bee Problem
Homework: Complete Spelling Bee Problem; refer to "Programming in Python" from Introduction to Cultural Analytics & Python as needed
Week 2: Python Core Skills - Pandas
In Class: NO CLASS
Homework: Spelling Bee Problem
In Class: Discuss Spelling Bee Problem Results
Homework: Read "What Is the Most Valuable State on Jeopardy!?"
Why We're Reading This: This article explains the motivation behind our Jeopardy problem
In Class: Begin Pandas worksheet
Homework: Complete Pandas worksheet; refer to "Data Analysis (Pandas)" from Introduction to Cultural Analytics & Python as needed
Week 3: NLP
In Class: Tokenization, normalization, Counters
Homework: Start Jeopardy Problem
In Class: Bag of words, VSM in pandas
Homework: Jeopardy Problem
In Class: Discuss Jeopardy Problem Results
Homework: Read Underwood and Sellers, "The Emergence of Literary Diction"
Why We're Reading This: This article explains the motivation behind our OED problem and is a nice introduction to cultural analytics
Week 4: Humanities Data
In Class: Requests, BeautifulSoup, HTML, time
Homework: Complete scraping practice worksheet; refer to "Data Collection" from Introduction to Cultural Analytics & Python as needed
In Class: Discuss scraping practice worksheet
Homework: OED problem
In Class: Discuss OED problem results
Homework: Read Martin Eve, "Close Reading with Computers" chapter excerpt (Notebowl); look at Authorship Attribution project assignment description
Why We're Reading This: This chapter is the basis of our authorship attribution project
Week 5: Text Analysis
In Class: Authorship attribution, function words, Burrows' Delta
Homework: Start Authorship Attribution project
In Class: Authorship Attribution continued. Scikit Learn.
Homework: Continue Authorship Attribution project
In Class: Authorship Attribution continued. Scikit Learn.
Homework: Finish Authorship Attribution project; Read Piper, "There Will Be Numbers"
Why We're Reading This: This reading defines cultural analytics
Week 6: Text Analysis
In Class: Discuss Problem results. From AA to CA
Homework: Begin Cather Letters Classification Problem
In Class: Looping XML to create rectangular data
Homework: Read Ignatow, Gabe, and Rada Mihalcea. "The Philosophy and Logic of Text Mining" (Notebowl)
Why We're Reading This: This reading provides a really effective summary of the big ideas behind text mining
In Class: Discuss Reading.
Homework: Complete Cather Letters Classification Problem
Week 7: Text Analysis
In Class: Discuss Problem Results
Homework: (recommended but not required) "Exploring the Intersection of Personal and Public Authorial Voice in the Works of Willa Cather", Digital Scholarship in the Humanities, Volume 30, Issue supplement 1, December 2015, Pages i36–i42, https://doi.org/10.1093/llc/fqv041
In Class: Class canceled. Catch up on other work.
Homework: Read Milo Beckman, "These Are The Phrases Each GOP Candidate Repeats Most"
Why We're Reading This: This reading demonstrates TF-IDF by example
In Class: TF-IDF; Dictionary-based approaches. (Refer to "Analyzing Documents with TF-IDF")
Homework: Begin neologisms problem
Recommended but Not Required: "Exploring and Analyzing Network Data with Python"
Week 8: Network Analysis
In Class: Network basics: nodes, edges, edge weights (Guest lecturer John Ladd)
Homework: Read Seth Stephens-Davidowitz, "Zooming In," from Everybody Lies (Notebowl)
In Class: Cosine similarity
Homework: Finish Neologisms Problem
In Class: Discuss Problem Results
Homework: Work on final projects
Week 9: SPRING BREAK - NO CLASS
Week 10: Synthesis 1
In Class: Review/Big Picture/Demo
Homework: Read Ignatow, Gabe, and Rada Mihalcea. "Research Design and Basic Tools" (Notebowl)
Why We're Reading This: This reading covers approaches to research design for computational text analysis, but it generalizes well to the process of designing effective cultural analytics research
In Class: Review/Big Picture/Demo
Homework: Scaffolding Assignment 1
In Class: Discuss Research Plans
Homework: Read "The Bill Watterson Interview"
Why We're Reading This: This interview helps explain the motivation behind our Image Classification Problem
Week 11: Images
In Class: Image libraries, common tasks
Homework: Work on final projects
In Class: Working with bounding boxes
Homework: Start Images Problem
In Class: Work on Images Problem with teammates
Homework: Finish Images Problem
Week 12: Audio
In Class: Discuss Images Problem Results
Homework: Read "Measured Applause"
Why We're Reading This: This article helps frame the potential of audio analysis in cultural analytics research
In Class: Discuss reading, sound libraries, sound features
Homework: Sound Project
In Class: Work on sound project with teammates
Homework: Work on sound project
Week 13: Video
In Class: Discuss Working with Video
Homework: Read "Visual Style in Two Network-Era Sitcoms"
Why We're Reading This: This article helps frame the potential of video analysis in cultural analytics research
In Class: Discuss Reading. Distant Viewing Toolkit
Homework: Work on Sound Problem
In Class: Work on Sound Problem with teammates
Homework: Complete Sound Problem
Week 14: Synthesis 2
In Class: Discuss Problem Results
Homework: Work on final projects
In Class: Review/Big Picture/Demo
Homework: Scaffolding Assignment 2
No Class: Awards Convocation
Week 15: Synthesis 3
In Class: Discuss Initial Results
Homework: Work on final projects
In Class: Review/Big Picture/Demo
Homework: Scaffolding Assignment 3
In Class: Peer review
Homework: Work on final projects
Week 16: Synthesis 3
In Class: Review/Big Picture/Demo
Exam Week
Homework: Final Project due by 8:30 p.m. on Wednesday, May 4