DA 245 - Introduction to Cultural Analytics

Spring 2022

Your Professor:

Matt Lavin

My Email:

lavinm@denison.edu

My Office:

Knapp 205-B

Office Hours

1:30-3:00 p.m. MW by appointment and Th 10-11:30 drop-in

Our Classroom:

BMRG 219

When We Meet:

MWF 12:30-1:20

Course Description

Cultural analytics is a field of study that systematically examines contemporary and historical cultures using data-driven, quantitative (and often computational) tools and methods. Scholars in this fledgling discipline are endeavoring to tackle big questions about the customs, beliefs, rules, identities, and institutions of one or more cultures. How can we train a machine learning model to predict a user's rating for a film or TV show based on that user's previous ratings, without any other information about the user or the film or TV show? Can large-scale data analysis demonstrate how two seemingly similar TV series visually and narratively frame their female leads in radically different ways? What kinds of text-mining methods have been used to detect bias in reviews on ratemyprofessor.com, and how have these biases changed over time? In what ways can a deeper understanding of cultural objects interrogate and deepen our understanding of data analytics? These are some examples of the kinds of research questions you will find in this introduction to cultural analytics. Students in this class will code in Python to collect and wrangle data in a variety of formats. They will use data analytics methods to locate, interpret, and communicate meaningful patterns in a range of cultural objects, such as literature, artwork, TV episodes, TED Talks, or podcasts. The course will include foundational skill building (Python programming, reproducible code, etc.), one or more team-based assignments, and a self-directed, individual research project.


Office Hours

This semester, I will be using a mix of drop-in office hours and in-person appointments via Google Calendar. For office hours by appointment, visit my appointment page, where you will see a real-time account of when I am available. My standard appointment slots will be divided into 20-minute blocks from 1:30 to 3 p.m. on Mondays and Wednesdays. Note that these appointment slots will disappear from my calendar once I've been booked. Please book appointments at least 24 hours in advance. If I ever need to cancel by-appointment office hours on a given day (say, for example, if I'm ill), I will update the calendar and email anyone with an appointment.

Drop-in office hours will be held in my office from 10 to 11:30 a.m. on Thursdays. For these, you will not need an appointment, but I will see students in the order they arrive, so there is no guarantee that I will have time for everyone on a given day. If your question is time sensitive, you should make an appointment. If I ever need to cancel office hours on a given drop-in day (say, for example, if I'm ill), I will e-mail the entire class.


Additional Norms and Policies


Here you will find information on required readings, important university policies, and course-specific policies like attendance and cell phone use.

Required Texts

No required textbook. Selected readings will be made available on Notebowl and linked to the course website (expect to spend a little more than normal on printing fees).
Self-directed reading as part of your final project

Computers and Software

Computers: Students are required to provide their own laptops and to install free and open source software on those laptops. The instructor will provide support for installing any useful or required software. If at any time you don’t have access to a laptop, please contact the instructor, and the Data Analytics Program can provide you with a loan from the laptop cart. In class, please use eduroam to connect to the Internet instead of Denison Guest. Please be respectful with your use of laptops and technology in class. I request that you only use them for class-related purposes, as I and others may find them distracting (for example, no email or social media should be open in your browser tabs!). Cell phones should be kept silent and put away.
GitHub, Programming Languages, Software: We will be using Git and GitHub for version control and collaboration, as well as GitHub Classroom as a means for you to access assignment templates and turn in your work. All programming required for the course assumes that you are using the latest version of Python 3 bundled with the Anaconda Distribution (3.9 as of January 2022). Programs should generally work on Mac, Linux, or Windows, but there may be small differences depending on your operating system, your computer's CPU, etc.

Our Teaching Assistant

Our teaching assistant this semester is Raina (Chengjun) Zhang. Raina is a senior and a double major in data analytics and computer science, with a minor in math. She is fascinated with NLP, unsupervised learning, and reinforcement learning. Her first internship was in a mobile game company in China. During the internship, she used K-means and RFM models to analyze users' in-game behavior data. In addition, she also tried to classify mobile games on the App Store by genres based on Latent Semantic Analysis. Raina's office hours will be 5 to 6 p.m. on Tuesdays and Thursdays in the Data Analytics Lab (BMRG 405). (If you want to meet up with Raina this week or next, email her to arrange a location.)

Grade Breakdown

Item | Percentage | Comments
Individual and Team-Based Coding Challenges | 32% | Week-to-week coding challenges designed to be completed individually or in teams, depending on the difficulty level. Eight projects in total, 4 points each. Designed to help students develop competencies required for success in the class. Work will be assessed on the quality of your solution, the elegance of your code, and the level of critical thinking demonstrated in written materials.
Team Projects | 20% | Larger scale
  10 points: Authorship Attribution Project
  10 points: Sound Project
Individual Research Project | 40%
  5 points: Scaffolding Assignment 1: Research Plan and Data Curation
  5 points: Scaffolding Assignment 2: Initial Findings, Data, and Code
  5 points: Scaffolding Assignment 3: Peer Review of First Drafts
  25 points: Final Project Paper and Code Submission
Attendance and Participation | 8%

Grading and Feedback

As with any class, you will have to spend some time with me to get a real sense of what I value and how I grade, but I look forward to that process, and to getting to know you all better more generally. One of the big advantages of a school like Denison is that, if you want to work with me again, you'll probably be able to, whether in another data analytics course, a summer research fellowship, or some other capacity.

As a general rule, the expectations in this course are high, and I'm confident you can all do great work. The feedback I provide on assignments is designed to help you get there. My goal is to provide specific, relevant, and honest feedback when I grade your work. This will include constructive criticism, strategies for improvement, and guidance on how students can achieve success. I will not do "compliment sandwiches" just to begin and end on a positive remark, but this means that, when I praise your work, it's an honest (and I think more meaningful) act of praise.

In general, you will be graded on a mix of criteria that includes assignment process (following directions, turning things in on time); technical proficiency (transparency of method, effective research design, proper validation methods, reproducible code, etc.); and use of effective communication strategies (sufficient levels of detail, clarity, depth of research, proper citations, etc.). Individual and team-based coding challenges will typically include a "Grading Criteria" section, and major assignments for this class will be distributed with custom rubrics.

Late Work

The pace of this class is fast, especially in the first five weeks of the semester when we are focusing on core proficiencies. If you have a legitimate emergency such as a serious illness, a mental health emergency, or a death in the family, I will grant an appropriate extension with a new due date. The trade-off is that work turned in this way is probably not going to end up in my hand when I grade everything else, so it’s going to get less feedback. If you miss a deadline entirely without getting an extension, you will automatically lose 10 points off the top of your grade for each day it is late, in addition to any points you lose for the quality of the work.

Distractions

Cell phones should be off and put away. Laptops are okay for notes and such, but you should not be messaging, using Facebook, etc. I’ll check screens regularly and give you a verbal warning on your first offense. After that, I reserve the right to ask you to leave class and mark you absent if you are creating a distraction.

Being Prepared for Class

Coming to class prepared means that you have the day's reading in hand (printed or digital) and have come to class with a way to take notes (printed or digital). If you are not prepared for class, I reserve the right to grade as if you were absent for that day. Anything due on a given day is due at the start of class. Any digital submission of material is due by the time class starts on the day the hard copy is due. If students do not come to class prepared, I reserve the right to add reading quizzes to the day's work. I very much prefer not to exercise this option and am asking you to help make it unnecessary.

Academic Integrity

Proposed and developed by Denison students, passed unanimously by DCGA and Denison’s faculty, the Code of Academic Integrity requires that instructors notify the Associate Provost of cases of academic dishonesty. Cases are typically heard by the Academic Integrity Board, which determines whether a violation has occurred, and, if so, its severity and the sanctions. In some circumstances the case may be handled through an Administrative Resolution Procedure. Further, the code makes students responsible for promoting a culture of integrity on campus and acting in instances in which integrity is violated.

Academic honesty, the cornerstone of teaching and learning, lays the foundation for lifelong integrity. Academic dishonesty is intellectual theft. It includes, but is not limited to, providing or receiving assistance in a manner not authorized by the instructor in the creation of work to be submitted for evaluation. This standard applies to all work ranging from daily homework assignments to major exams. Students must clearly cite any sources consulted--not merely for quoted phrases, but also for ideas and information that are not common knowledge. Neither ignorance nor carelessness is an acceptable defense in cases of plagiarism. It is the student’s responsibility to follow the appropriate format for citations. Students should ask their instructors for assistance in determining what sorts of materials and assistance are appropriate for assignments and for guidance in citing such materials clearly.

Our Commitment to Liberal Arts Education

Denison's mission statement articulates an explicit commitment to liberal arts education. It emphasizes active learning, which defines students as active participants in the learning process, not passive recipients. Denison seeks to foster self-determination and to demonstrate the transformative power of education. A crucial aspect of this approach is what Denison's mission statement refers to as "a concern for the whole person," which is why the university provides a "living-learning environment" based on individual needs and an overriding concern for community. This community is based on "a firm belief in human dignity and compassion unlimited by cultural, racial, sexual, religious or economic barriers, and directed toward an engagement with the central issues of our time."

In this class, we will discuss inequality directly. In many cases, you will be asked to apply quantitative reasoning skills to these subjects, which can be difficult because there is always the potential for the available data to complicate or contradict something you may feel very passionate about. In these cases, you should aspire to adopt an attitude of critical skepticism, i.e., wary of claims that are not supported by evidence but potentially willing to be persuaded by evidence if you find it compelling, and willing to give that evidence a fair hearing.

How we treat one another will be a cornerstone of these conversations. Denison's "Guiding Principles" speak of "a community in which individuals respect one another and their environment." Further, "each member of the community possesses a full range of rights and responsibilities. Foremost among these is a commitment to treat each other and the environment with mutual respect, tolerance, and civility." It's easy to treat someone this way when you like them and agree with their ideas, but the real challenge is treating those who differ from us with the same compassion and respect. However, I consider disruptive, deceitful, or hateful behavior to be breaches of these responsibilities. Bullying, trolling, hate speech, and harassment of any kind will not be tolerated.


Assignments

Individual and Team-Based Coding Challenges

Week-to-week coding challenges designed to be completed individually or in teams, depending on the difficulty level. Eight projects in total, 4 points each. Designed to help students develop competencies required for success in the class. Work will be assessed on the quality of your solution, the elegance of your code, and the level of critical thinking demonstrated in written materials.

Authorship Attribution Project

The first of two team-based projects. In this assignment, your team will be asked to assess whether a novel written with multiple narrators can be successfully identified as work belonging to its actual author when a machine learning model is trained on the true author, as well as a distractor set of three alternative authorship candidates. Training and test data for this assignment will be provided by the professor, but your team will need to use Natural Language Processing (NLP) methods to prepare the data for analysis.
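
To give a rough sense of what "preparing the data" and "identifying an author" can look like, here is a minimal sketch using only the Python standard library. It is not the required method for the project: the function-word list, the function names, and the simple Manhattan distance between word-frequency profiles are all assumptions made for illustration, and the sample texts are invented.

```python
import re
from collections import Counter

# Hypothetical, very short function-word list (a real study would use many more).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]

def closest_author(disputed_text, candidates):
    """Return the candidate whose profile is nearest by Manhattan distance."""
    target = profile(disputed_text)

    def distance(author):
        return sum(abs(a - b) for a, b in zip(target, profile(candidates[author])))

    return min(candidates, key=distance)

# Invented toy texts standing in for each candidate's known writing.
candidates = {
    "Author A": "the the the cat and the dog",
    "Author B": "it was of a kind that it was in",
}
guess = closest_author("the bird and the fish and the tree", candidates)
print(guess)
```

With real novels, each candidate would be represented by far more text, and the comparison would usually be handled by a trained classifier rather than a single distance calculation.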

Sound Project

The second of two team-based projects. In this assignment, your team will be asked to train a model to detect different types of well-known, regional English accents. Audio data for this assignment will be provided by the professor, but your team may need to provide data labels, partition the data into a training and test set, and design your computation so that the overall strength of your model can be assessed.
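
As a rough illustration of partitioning labeled data and assessing a model, here is a minimal sketch using only the Python standard library. The file names, accent labels, split fraction, and function names are hypothetical, and accuracy is only one of several ways to measure a model's strength.

```python
import random

def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(true_labels, predicted_labels):
    """Share of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Hypothetical (file name, accent label) pairs standing in for the audio data.
clips = [(f"clip_{i}.wav", "accent_a" if i % 2 == 0 else "accent_b") for i in range(8)]
train, test = train_test_split(clips)
print(len(train), len(test))
```

The model is fit on `train` only, and `accuracy` is then computed on predictions for `test`, so the score reflects clips the model has not seen.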

Individual Research Project

An individual assignment in which you will be expected to articulate a research question related to the subject of cultural analytics, design a computational approach to your question, identify and/or develop a dataset for your analysis, execute a computation, and write a paper that describes your work using the IMRAD template (Introduction, Methods, Results, Analysis, and Discussion). Includes several smaller assignments with due dates throughout the semester, the purpose of which is to ensure the success of your project.

 


Weekly Calendar

Week 1: Python Core Skills - Review

Monday, January 17, 2022

In Class: Syllabus, Introductions

Homework: Github accounts, Survey

Wednesday, January 19, 2022

In Class: Discuss Survey Results; Installing Python

Homework: Read "The NYT Spelling Bee Gives Me L-I-F-E"

Why We're Reading This: This article explains the motivation behind our Spelling Bee Problem

Friday, January 21, 2022

In Class: Python Fundamentals worksheet; Discuss Spelling Bee Problem

Homework: Complete Spelling Bee Problem; refer to "Programming in Python" from Introduction to Cultural Analytics & Python as needed

Week 2: Python Core Skills - Pandas

Monday, January 24, 2022

In Class: NO CLASS

Homework: Spelling Bee Problem

Wednesday, January 26, 2022

In Class: Discuss Spelling Bee Problem Results

Homework: Read "What Is the Most Valuable State on Jeopardy!?"

Why We're Reading This: This article explains the motivation behind our Jeopardy problem

Friday, January 28, 2022

In Class: Begin Pandas worksheet

Homework: Complete Pandas worksheet; refer to "Data Analysis (Pandas)" from Introduction to Cultural Analytics & Python as needed

Week 3: NLP

Monday, January 31, 2022

In Class: Tokenization, normalization, Counters

Homework: Start Jeopardy Problem

Wednesday, February 2, 2022

In Class: Bag of words, VSM in pandas

Homework: Jeopardy Problem

Friday, February 4, 2022

In Class: Discuss Jeopardy Problem Results

Homework: Read Underwood and Sellers, "The Emergence of Literary Diction"

Why We're Reading This: This article explains the motivation behind our OED problem and is a nice introduction to cultural analytics

Week 4: Humanities Data

Monday, February 7, 2022

In Class: Requests, BeautifulSoup, HTML, time

Homework: Complete scraping practice worksheet; refer to "Data Collection" from Introduction to Cultural Analytics & Python as needed

Wednesday, February 9, 2022

In Class: Discuss scraping practice worksheet

Homework: OED problem

Friday, February 11, 2022

In Class: Discuss OED problem results

Homework: Read Martin Eve, "Close Reading with Computers" chapter excerpt (Notebowl); look at Authorship Attribution project assignment description

Why We're Reading This: This chapter is the basis of our authorship attribution project

Week 5: Text Analysis

Monday, February 14, 2022

In Class: Authorship attribution, function words, Burrows' Delta

Homework: Start Authorship Attribution project

Wednesday, February 16, 2022

In Class: Authorship Attribution continued. Scikit Learn.

Homework: Continue Authorship Attribution project

Friday, February 18, 2022

In Class: Authorship Attribution continued. Scikit Learn.

Homework: Finish Authorship Attribution project; Read Piper, "There Will Be Numbers"

Why We're Reading This: This reading defines cultural analytics

Week 6: Text Analysis

Monday, February 21, 2022

In Class: Discuss Problem results. From AA to CA

Homework: Begin Cather Letters Classification Problem

Wednesday, February 23, 2022

In Class: Looping XML to create rectangular data

Homework: Read Ignatow, Gabe, and Rada Mihalcea. "The Philosophy and Logic of Text Mining" (Notebowl)

Why We're Reading This: This reading provides a really effective summary of the big ideas behind text mining

Friday, February 25, 2022

In Class: Discuss Reading.

Homework: Complete Cather Letters Classification Problem

Week 7: Text Analysis

Monday, February 28, 2022

In Class: Discuss Problem Results

Homework: (recommended but not required) "Exploring the Intersection of Personal and Public Authorial Voice in the Works of Willa Cather", Digital Scholarship in the Humanities, Volume 30, Issue supplement 1, December 2015, Pages i36–i42, https://doi.org/10.1093/llc/fqv041

Wednesday, March 2, 2022

In Class: Class canceled. Catch up on other work.

Homework: Read Milo Beckman, "These Are The Phrases Each GOP Candidate Repeats Most"

Why We're Reading This: This reading demonstrates TF-IDF by example

Friday, March 4, 2022

In Class: TF-IDF; Dictionary-based approaches. (Refer to "Analyzing Documents with TF-IDF")

Homework: Begin neologisms problem

Recommended but Not Required: "Exploring and Analyzing Network Data with Python"

Week 8: Network Analysis

Monday, March 7, 2022

In Class: Network basics: nodes, edges, edge weights (Guest lecturer John Ladd)

Homework: Read Seth Stephens-Davidowitz, "Zooming In," from Everybody Lies (Notebowl)

Wednesday, March 9, 2022

In Class: Cosine similarity

Homework: Finish Neologisms Problem

Friday, March 11, 2022

In Class: Discuss Problem Results

Homework: Work on final projects

Week 9: SPRING BREAK - NO CLASS

Week 10: Synthesis 1

Monday, March 21, 2022

In Class: Review/Big Picture/Demo

Homework: Read Ignatow, Gabe, and Rada Mihalcea. "Research Design and Basic Tools" (Notebowl)

Why We're Reading This: This reading covers approaches to research design for computational text analysis, but it generalizes well to the process of designing effective cultural analytics research

Wednesday, March 23, 2022

In Class: Review/Big Picture/Demo

Homework: Scaffolding Assignment 1

Friday, March 25, 2022

In Class: Discuss Research Plans

Homework: Read "The Bill Watterson Interview"

Why We're Reading This: This interview helps explain the motivation behind our Image Classification Problem

Week 11: Images

Monday, March 28, 2022

In Class: Image libraries, common tasks

Homework: Work on final projects

Wednesday, March 30, 2022

In Class: Working with bounding boxes

Homework: Start Images Problem

Friday, April 1, 2022

In Class: Work on Images Problem with teammates

Homework: Finish Images Problem

Week 12: Audio

Monday, April 4, 2022

In Class: Discuss Images Problem Results

Homework: Read "Measured Applause"

Why We're Reading This: This article helps frame the potential of audio analysis in cultural analytics research

Wednesday, April 6, 2022

In Class: Discuss reading, sound libraries, sound features

Homework: Sound Project

Friday, April 8, 2022

In Class: Work on sound project with teammates

Homework: Work on sound project

Week 13: Video

Monday, April 11, 2022

In Class: Discuss Working with Video

Homework: Read "Visual Style in Two Network-Era Sitcoms"

Why We're Reading This: This article helps frame the potential of video analysis in cultural analytics research

Wednesday, April 13, 2022

In Class: Discuss Reading. Distant Viewing Toolkit

Homework: Work on Sound Problem

Friday, April 15, 2022

In Class: Work on Sound Problem with teammates

Homework: Complete Sound Problem

Week 14: Synthesis 2

Monday, April 18, 2022

In Class: Discuss Problem Results

Homework: Work on final projects

Wednesday, April 20, 2022

In Class: Review/Big Picture/Demo

Homework: Scaffolding Assignment 2

Friday, April 22, 2022

No Class: Awards Convocation

Week 15: Synthesis 3

Monday, April 25, 2022

In Class: Discuss Initial Results

Homework: Work on final projects

Wednesday, April 27, 2022

In Class: Review/Big Picture/Demo

Homework: Scaffolding Assignment 3

Friday, April 29, 2022

In Class: Peer review

Homework: Work on final projects

Week 16: Synthesis 3

Monday, May 2, 2022

In Class: Review/Big Picture/Demo

Exam Week

Homework: Final Project due by 8:30 p.m. on Wednesday, May 4