DA 351 - Advanced Descriptive Methods for Data Analytics

Spring 2025

Your Professor:

Matt Lavin

My Email:

lavinm@denison.edu

My Office:

BMRG 411

Office Hours

10:30 - 11:30 a.m. M W by appointment, 12:30-1:30 p.m. T walk-in

Our Classroom:

Talbot 229

When We Meet:

1:30-2:50 p.m. MW

Course Description

Advanced Descriptive Methods, in parallel with DA 352 and 353, is designed to develop students' understanding of the cutting-edge methods and algorithms of data analytics and how they can be used to answer questions about real-world problems. While all advanced methods for Data Analytics can be applied in a variety of capacities, descriptive analytics emphasizes using Natural Language Processing (NLP) methods to work with text as data, modeling for interpretability, and designing and deploying Computer Vision (CV) systems. In DA 351, students will examine both supervised and unsupervised methods, including topics such as advanced regression, K nearest neighbors, hierarchical clustering, ranked cosine similarity, and deep learning.

Shared Learning Goals for All Advanced Methods Courses:

  1. assess model performance under uncertainty
  2. align research questions, target data, and methods
  3. handle large data and/or computations by writing efficient code
  4. remain flexible when encountering and adopting new models and methods
  5. maintain well-organized code, project spaces, and documentation

Specific Learning Goals for Advanced Descriptive Methods:

  1. establish a working toolkit for advanced data analytics in Python
  2. gain experience selecting supervised and unsupervised machine learning models for interpretability, and interpreting the results when such models are used
  3. apply natural language processing (NLP) and computer vision (CV) strategies in Data Analytics contexts

Office Hours

I am always happy to see students during my office hours, whether it's to discuss this class, majoring in DA, how I can contribute to your learning at Denison, or your plans for life after graduation (career, graduate school, etc.). Like many professors, I offer a mix of in-person appointments (via Google Calendar) and drop-in office hours.

For office hours by appointment, visit my appointment page, where you will see a real-time account of when I am available. You can book the appointment with one or two clicks by selecting any time when I'm listed as available. My standard appointment slots are divided into 20-minute blocks from 10:30 to 11:30 a.m. on Mondays and Wednesdays. Note that these appointment slots will disappear from my calendar once I've been booked, and you are required to book appointments at least 24 hours in advance.

Drop-in office hours will be held in my office from 12:30 to 1:30 p.m. on Tuesdays. For these, you will not need an appointment, and I encourage you to drop by or work in or near the lab space across the hall from me. I see students in the order they arrive, so there is no guarantee that I will have time for everyone on a given day. In other words, if you have a very specific or time-sensitive question or concern, it's best to make an appointment or email me.

If I ever need to cancel by-appointment office hours on a given day (say, for example, if I'm ill), I will update the calendar and email anyone with an appointment. If I ever need to cancel office hours on a given drop-in day, I will post to Canvas or e-mail the entire class.


Additional Norms and Policies

Required Texts

No book purchases are required. Selected readings will be made available as HTML or PDF files, and linked from the course website or shared via Canvas.

Software and Platforms

All assignments in this course will be scripted and analyzed using Python. All of my demos will use Jupyter Notebooks as a programming environment. You are welcome to use Jupyter Notebook, JupyterLab, or any other Integrated Development Environment (IDE) of your choice when writing code. Most assignments, however, will require you to turn in a .py or .ipynb file and/or a written document (.docx, .pdf). Many assignments will require specific Python dependencies, such as a particular module or library, so it is recommended that you install the Anaconda platform, which includes Python, Jupyter Notebooks, and most if not all of the libraries we will use. Lastly, for our unit on relational databases, we will use the desktop software DB Browser for SQLite, which is free to download and use.

Assignments will be shared via GitHub Classroom, which provides a collaboration and version control system via GitHub. Typically, GitHub Classroom assignments will contain datasets, assignment template files, and/or starter code. In some cases, if a dataset is too large for GitHub's file size limits, assignments will come with instructions on how to download the data online or from a Google Drive folder. Completed assignments will be turned in for grading on Canvas, and all grades and student feedback will be issued on Canvas in order to maintain educational privacy.

Grading and Feedback

As a general rule, the expectations in this course are high, and I'm confident you can all do great work. The feedback I provide on assignments is designed to help you get there. My goal is to provide specific, relevant, and honest feedback when I grade your work. This will include constructive criticism, strategies for improvement, and guidance on how students can achieve success. I will not do "compliment sandwiches" just to begin and end on a positive remark, but this means that, when I praise your work, it's an honest (and I think more meaningful) act of praise.

Grade Breakdown

Item | Percentage | Comments
Attendance and Participation | 16 | See description below
Quizzes and Homework | 24 | Full list of assignments on Canvas
Test 1 | 10 | In-class, individual assignment, cumulative (covers material up to the class before the exam)
Test 2 | 10 | In-class, individual assignment, cumulative (covers material up to the class before the exam)
Project 1 | 8 | Team-based assignment
Project 2 | 8 | Team-based assignment
Team Project Presentation | 2 | Teams of students will present their Project 2 findings in class
Final Exam | 22 | In-person, individual assignment, cumulative (covers material from the entire course)
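
As a rough illustration of how the percentages above combine, here is a short Python sketch of a weighted course grade. It is only a sketch, not the official calculation: it assumes every item is scored on a 0-100 scale, the function name weighted_grade is made up for this example, and the sample scores are hypothetical.

```python
# Weights taken from the Grade Breakdown (percent of the final grade).
WEIGHTS = {
    "Attendance and Participation": 16,
    "Quizzes and Homework": 24,
    "Test 1": 10,
    "Test 2": 10,
    "Project 1": 8,
    "Project 2": 8,
    "Team Project Presentation": 2,
    "Final Exam": 22,
}


def weighted_grade(scores):
    """Combine per-item scores (assumed 0-100) into one course percentage."""
    # Sanity check: the weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Attendance and Participation": 95,
    "Quizzes and Homework": 88,
    "Test 1": 82,
    "Test 2": 85,
    "Project 1": 90,
    "Project 2": 92,
    "Team Project Presentation": 100,
    "Final Exam": 84,
}
print(f"Weighted course grade: {weighted_grade(example_scores):.2f}")
```

Any attendance or late-work penalties described elsewhere in this syllabus would be applied separately from this weighted average.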

Attendance and Timeliness

In a liberal arts context, it is particularly important that students arrive on time and come prepared to engage with the community we will be building in our classroom. I expect you to attend every class meeting, and I expect you to arrive on time. Attendance will be taken every day. If you cannot attend class, it is your responsibility to get (from a classmate) all written notes about what we discussed in class, including in-class announcements.

Note: Late arrival counts as half an absence. If you have a commitment immediately before this class that will force you to come late (say, a class in Mitchell), you should either drop that class or drop this one.

Since regular attendance is a prerequisite to passing this course, missing more than five classes will result in an additional penalty of -5% from your final course grade for each unexcused absence beyond the fifth.
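
To make the arithmetic of this penalty concrete, here is a minimal Python sketch. It reflects one possible reading of the policy and is not an official calculator: the function name attendance_penalty is hypothetical, and it assumes that each late arrival adds half an absence to the unexcused total.

```python
def attendance_penalty(unexcused_absences, late_arrivals=0):
    """Return the percentage deducted from the final course grade."""
    # Assumption: each late arrival counts as half an unexcused absence.
    effective_absences = unexcused_absences + 0.5 * late_arrivals
    # No penalty through the fifth absence; 5% for each absence beyond it.
    return 5.0 * max(0.0, effective_absences - 5)


# Hypothetical cases, for illustration only.
print(attendance_penalty(5))                   # at the limit
print(attendance_penalty(7))                   # two absences past the limit
print(attendance_penalty(5, late_arrivals=2))  # two late arrivals added
```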

Participation and Distractions

Participation will be assessed using a mix of preparedness, speaking during class discussions, remaining attentive during lectures, and completing in-class assignments. An unexcused absence will result in a participation score of zero for the day, and coming late will result in a maximum participation score of 50 for the day.

Creating distractions in class is especially frowned upon. This includes disruptions such as falling asleep, leaving class excessively, and talking over the professor. Cell phones should be off and put away. Laptops are okay for notes and such but, when laptops are being used, you should not be messaging, using Facebook, etc. In the event of a disruption, I will typically give a verbal warning for your first offense. After that, I reserve the right to ask you to leave class and mark you absent for the day.

Late Work

If you have a legitimate emergency such as a serious illness, a mental health emergency, or a death in the family, I will grant an appropriate extension with a new due date. The trade-off is that work turned in this way is probably not going to end up in my hands when I grade everything else, so it's going to get very sparse feedback. If you miss a deadline entirely without getting an extension, you will automatically lose 10 points off the top of your grade for each day it is late, in addition to any points you lose for the quality of the work. Retroactive and last-minute extensions will not be granted.
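
The per-day deduction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the grading procedure itself: the function name late_adjusted_score is hypothetical, the assignment is assumed to be graded out of 100 points, and flooring the result at zero is an assumption the policy does not spell out.

```python
def late_adjusted_score(quality_score, days_late):
    """Apply the 10-points-per-day late deduction to a graded score."""
    # 10 points come off the top for each day past the deadline.
    penalty = 10 * days_late
    # Assumption: a score cannot drop below zero.
    return max(0, quality_score - penalty)


# Hypothetical case: work that would have earned 88 points, turned in two days late.
print(late_adjusted_score(88, 2))
```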

Being Prepared for Class

Coming to class prepared means that you have the day's reading in hand (printed or digital) and have come to class with a way to take notes (printed or digital). If you are not prepared for class, I reserve the right to grade as if you were absent for that day. Anything due on a given day is due at the start of class. Any digital submission of material is due by the time class starts on the day the hard copy is due.

Peer Learning Strategists Program

The Peer Learning Strategists (PLS) program was developed by Denison students and faculty for those in introductory science classes. It is an initiative of a larger program called RAISE (Readiness and Inclusion in Science Education) and is a great resource to learn how to study more efficiently and learn more effectively. The PLS program employs peer-to-peer mentoring focused on teaching overarching learning strategies crucial to success in college science classes. Trained science majors work as PLS mentors to help hone your learning approach, since the skills most helpful in college often differ from the skills that led to high achievement in high school. Students meet one-on-one with a PLS mentor for one hour weekly for at least three weeks, with some students continuing beyond the three-session recommendation. PLS mentors are not tutors, content is not course-specific, and conversations provide space for attaining skills for lifelong learning and success. Contact Science Initiatives Coordinator Jeni Miller or Dr. Melanie Lott with additional questions.

Academic Integrity

Proposed and developed by Denison students, passed unanimously by DCGA and Denison's faculty, the Code of Academic Integrity requires that instructors notify the Associate Provost of cases of academic dishonesty. Cases are typically heard by the Academic Integrity Board, which determines whether a violation has occurred, and, if so, its severity and the sanctions. In some circumstances the case may be handled through an Administrative Resolution Procedure. Further, the code makes students responsible for promoting a culture of integrity on campus and acting in instances in which integrity is violated.

Academic honesty, the cornerstone of teaching and learning, lays the foundation for lifelong integrity. Academic dishonesty is intellectual theft. It includes but is not limited to providing or receiving assistance in a manner not authorized by the instructor in the creation of work to be submitted for evaluation. This standard applies to all work ranging from daily homework assignments to major exams. Students must clearly cite any sources consulted--not merely for quoted phrases, but also for ideas and information that are not common knowledge. Neither ignorance nor carelessness is an acceptable defense in cases of plagiarism. It is the student's responsibility to follow the appropriate format for citations. Students should ask their instructors for assistance in determining what sorts of materials and assistance are appropriate for assignments and for guidance in citing such materials clearly.

Note on Technology: Unauthorized use of technology (including, but not limited to, artificial intelligence sites and translation programs) in the preparation or submission of academic work can be considered a form of cheating and/or plagiarism. Instructors may at their discretion create assignments that incorporate the use of supporting technologies and will inform students of acceptable uses of technology in their courses. It is the responsibility of the student to ask the instructor for clarification whenever they are unclear about the parameters of a specific assignment and to understand that presenting the work of artificial intelligence as your own constitutes a violation of Denison's Code. Cases of suspected inappropriate use of technology may be submitted to the Academic Integrity Board to initiate an investigation of academic dishonesty. For further information about the Code of Academic Integrity, see https://denison.edu/academics/curriculum/integrity .

Our Commitment to Liberal Arts Education

Denison's mission statement articulates an explicit commitment to liberal arts education. It emphasizes active learning, which defines students as active participants in the learning process, not passive recipients. Denison seeks to foster self-determination and to demonstrate the transformative power of education. A crucial aspect of this approach is what Denison's mission statement refers to as "a concern for the whole person," which is why the university provides a "living-learning environment" based on individual needs and an overriding concern for community. This community is based on "a firm belief in human dignity and compassion unlimited by cultural, racial, sexual, religious or economic barriers, and directed toward an engagement with the central issues of our time."

In this class, we will discuss inequality directly. In many cases, you will be asked to apply quantitative reasoning skills to these subjects, which can be difficult because there is always the potential for the available data to complicate or contradict something you may feel very passionate about. In these cases, you should aspire to adopt an attitude of critical skepticism, i.e., wary of claims that are not supported by evidence but potentially willing to be persuaded by evidence if you find it compelling, and willing to give that evidence a fair hearing.

How we treat one another will be a cornerstone of these conversations. Denison's "Guiding Principles" speak of "a community in which individuals respect one another and their environment." Further, "each member of the community possesses a full range of rights and responsibilities. Foremost among these is a commitment to treat each other and the environment with mutual respect, tolerance, and civility." It's easy to treat someone this way when you like them and agree with their ideas, but the real challenge is treating those who differ from us with the same compassion and respect. However, I consider disruptive, deceitful, or hateful behavior to be breaches of these responsibilities. Bullying, trolling, hate speech, and harassment of any kind will not be tolerated.

Discrimination, Sexual Misconduct, and Sexual Assault

Essays, journals, and other coursework submitted for this class are generally considered confidential pursuant to the University's student record policies. However, students should be aware that University employees are required by University policy to report allegations of discrimination based on sex, gender, gender identity, gender expression, sexual orientation or pregnancy to the Title IX Coordinator or a Deputy Title IX Coordinator. This includes reporting all incidents of sexual misconduct, sexual assault and suspected abuse/neglect of a minor. Further, employees are to report these incidents that occur on campus and/or that involve students at Denison University whenever the employee becomes aware of a possible incident in the course of their employment, including via coursework or advising conversations. There are others on campus to whom you may speak in confidence, including clergy and medical staff and counselors at the Wellness Center. More information on Title IX and the University's Policy prohibiting sex discrimination, including sexual harassment, sexual misconduct, stalking and retaliation, including support resources, how to report, and prevention and education efforts, can be found at: https://denison.edu/campus/title-ix .


Assignments

TBA


Weekly Calendar

[This Section is Under Development and Subject to Change]

Weekly Rhythm

Monday: Typically reading assignments should be done by this day. Coding practice and other activities.
Wednesday: Coding homeworks turned in and discussed in class. Instructor slide presentations. Tests will also take place on Wednesdays.

Week 1: Introducing Advanced Descriptive Methods for DA
(Monday, January 20, 2025 - Wednesday, January 22, 2025)

Learning Outcomes: Understand professor's expectations; identify the focus of the class

By Next Monday: Sign up for GitHub; complete Course Survey; read Keshav, "How to Read a Paper" (on Canvas); install Anaconda; install GitHub Desktop or set up credentials on the command line

Week 2: Core Concepts
(Monday, January 27, 2025 - Wednesday, January 29, 2025)

Learning Outcomes: Establish a working toolkit for advanced data analytics in Python

By Wednesday: Read "Machine Learning Using Python" chapter 3, "Numpy and Pandas" (https://faculty.washington.edu/otoomet/machinelearning-py/numpy-and-pandas.html)

By Next Monday: Read "Machine Learning Using Python" chapter 10, "Linear Regression" (https://faculty.washington.edu/otoomet/machinelearning-py/linear-regression.html)

Week 3: Core Concepts
(Monday, February 03, 2025 - Wednesday, February 05, 2025)

Learning Outcomes: Establish a working toolkit for advanced data analytics in Python

By Wednesday: Coding Homework 1

By Next Monday: Read excerpt from Boettcher, Emma. "Predicting the Difficulty of Trivia Questions Using Text Features." 2016. (Canvas)

Week 4: Natural Language Processing Fundamentals
(Monday, February 10, 2025 - Wednesday, February 12, 2025)

Learning Outcomes: Understand NLP fundamentals, and how NLP fits into data analytics

By Wednesday: Coding Homework 2

By Next Monday: Read "Introduction to OpenCV," "Core Operations," "Image Processing in OpenCV" (https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html)

Week 5: Computer Vision Fundamentals
(Monday, February 17, 2025 - Wednesday, February 19, 2025)

Learning Outcomes: Understand major feature extraction approaches and their differences

On Wednesday: Test 1

By Next Monday: Read Ollie Lueck, "Ready, Set, Data! Preparing Denison’s Data Analytics Graduates for Employment" (Canvas)

Week 6: Classification (NLP)
(Monday, February 24, 2025 - Wednesday, February 26, 2025)

Learning Outcomes: Implement text classification in Python; use cross-validation; understand KNN vs. logistic regression

On Monday: Introduce Project 1; form teams; schedule 1-2 outside-of-class meetings with teammates

By Wednesday: Coding Homework 3

By Next Monday: Read "Face Recognition with OpenCV," https://docs.opencv.org/4.x/da/d60/tutorial_face_main.html

Week 7: Classification (CV)
(Monday, March 03, 2025 - Wednesday, March 05, 2025)

Learning Outcomes: Implement image classification in Python

By Wednesday: Coding Homework 4

Week 8: PCA
(Monday, March 10, 2025 - Wednesday, March 12, 2025)

Learning Outcomes: Use PCA for descriptive data analysis, visualization, and dimension reduction

By Wednesday: Complete project 1

By Monday after Break: Read Kenneth Ward Church & Patrick Hanks, "Word Association Norms, Mutual Information, and Lexicography" (https://aclanthology.org/J90-1003/)

Spring Break
(Monday, March 17, 2025 - Wednesday, March 19, 2025)

Reminder: No class

Week 9: Word Co-Occurrence
(Monday, March 24, 2025 - Wednesday, March 26, 2025)

Learning Outcomes: Understand Bag-of-Words vs. N-Gram features; use mutual information to identify entities and phrases

By Wednesday: Coding Homework 5

By Next Monday: Read Goran Glavaš et al., "A resource-light method for cross-lingual semantic textual similarity," Knowledge-Based Systems (2018), https://doi.org/10.1016/j.knosys.2017.11.041

Week 10: Word Embeddings
(Monday, March 31, 2025 - Wednesday, April 02, 2025)

Learning Outcomes: Manipulate word embeddings; use embeddings as features

On Monday: Introduce Project 2

By Wednesday: Coding Homework 6

By Next Monday: Read Clark et al. (2019), "Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts," https://aclanthology.org/P19-1264/

Week 11: Transformers and LLMs
(Monday, April 07, 2025 - Wednesday, April 09, 2025)

Learning Outcomes: Apply understanding of word embeddings to LLMs; use LLMs critically

On Wednesday: Test 2

By Next Monday: Read Gerster et al. (2020), "Computer vision supported pedestrian tracking: A demonstration on trail bridges in rural Rwanda," https://doi.org/10.1371/journal.pone.0241379

Week 12: CNNs
(Monday, April 14, 2025 - Wednesday, April 16, 2025)

Learning Outcomes: Understand theory of CNNs; learn to implement in Python

By Next Monday: Read Monarch, Human-in-the-Loop Machine Learning, 1-48 (on Canvas)

Week 13: Synthesis
(Monday, April 21, 2025 - Wednesday, April 23, 2025)

Learning Outcomes: Understand big ideas for performing DA at scale; employ HITL strategies to improve model performance

By Wednesday: Coding Homework 8

By Sunday at midnight: Complete Project 2

Week 14: Wrapping Up
(Monday, April 28, 2025 - Wednesday, April 30, 2025)

Learning Outcomes: Revisit learning goals, share results

In Class Monday and Wednesday: Project Presentations

Week 15: Exam Week
(Monday, May 05, 2025 - Wednesday, May 07, 2025)

Reminder: Last day of classes is Monday, May 5

Reminder: Final exam, in person, Wednesday, May 7, 9:00-11:00 a.m.