DA 210 - Data Systems (cross-listed as CS 181)

Fall 2024

Your Professor:

Matt Lavin

My Email:

lavinm@denison.edu

My Office:

Burton D. Morgan Center 411

Office Hours

9:15-10:30 a.m. WF by appointment, 2-3:30 p.m. T group drop-in (DA lab)

Our Classroom:

Talbot 222

When We Meet:

11:30 to 12:20 p.m. MWF

Course Description

This course provides a broad perspective on the access, structure, storage, and representation of data. It encompasses traditional database systems, but extends to other structured and unstructured repositories of data and their access/acquisition in a client-server model of Internet computing. Also developed are an understanding of data representations amenable to structured analysis, and the algorithms and techniques for transforming and restructuring data to allow such analysis. The primary programming language used in the course will be Python.

Learning Outcomes

At the end of the course you should be able to:

  1. Describe and make use of a "working toolkit" for data analytics in Python, including but not limited to:
    1. designing and deploying complex functions
    2. mixing standard Python approaches with well-established data analysis libraries, and
    3. employing the split-apply-combine strategy in Python
  2. Develop reproducible workflows using version control for code and data analysis.
  3. Demonstrate critical thinking about the potential ethical concerns of data documentation and storage, and the potential role of code and algorithms in managing these data.
  4. Identify and describe tabular, hierarchical, and relational data models, including tidy data practices.
  5. Understand and communicate the affordances of tabular, hierarchical, and relational data models.
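
As a rough illustration of the split-apply-combine strategy named in outcome 1, here is a minimal sketch in standard Python. The records, the field names, and the helper `mean_by_group` are made up for illustration and are not taken from any course assignment.

```python
from collections import defaultdict

# Hypothetical records; the field names and values are invented for illustration.
rows = [
    {"species": "cat", "weight_kg": 4.0},
    {"species": "dog", "weight_kg": 20.0},
    {"species": "cat", "weight_kg": 5.0},
    {"species": "dog", "weight_kg": 30.0},
]


def mean_by_group(records, key, value):
    """Return the mean of `value` for each distinct `key` in `records`."""
    # Split: collect the values that belong to each group.
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    # Apply: reduce each group to its mean.
    # Combine: gather the per-group results into a single dictionary.
    return {name: sum(values) / len(values) for name, values in groups.items()}


means = mean_by_group(rows, "species", "weight_kg")
print(means)
```

With pandas, the same idea is usually expressed with `groupby`, for example `df.groupby("species")["weight_kg"].mean()`, assuming the records have been loaded into a DataFrame named `df`.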

Office Hours

I am always happy to see students during my office hours, whether it's to discuss this class, majoring in DA, how I can contribute to your learning at Denison, or your plans for life after graduation (career, graduate school, etc.). Like many professors, I offer a mix of in-person appointments (via Google Calendar) and drop-in office hours.

For many of your questions related to assignments for the class, our teaching assistant will be your first and best source of assistance. Our TA for this class is Ethan Tecson (tecson_e1 [at] denison.edu). Ethan will attend some classes, hold drop-in office hours, and help answer questions about the course and particular assignments. Ethan's office hours will be held in the DA Lab (Burton Morgan 405) from 10 to 11:30 a.m. on Tuesdays and 2 to 3:30 p.m. on Sundays. In general, you should meet with our TA before coming to my office hours with a code-related or assignment-related question. Ethan's lab hours will be first-come, first-served, and for any question you bring to him, you are expected to do some preliminary work before seeking help. If you still need help after visiting our TA, or you have a question that seems more appropriate for the professor, you can make an appointment or come to drop-in hours.

For office hours by appointment, visit my appointment page, where you will see a real-time account of when I am available. You can book the appointment with one or two clicks by selecting any time when I'm listed as available. My standard appointment slots are divided into 15-minute blocks from 9:15 to 10:30 a.m. on Wednesdays and Fridays. Note that these appointment slots will disappear from my calendar once I've been booked. Please book appointments at least 24 hours in advance.

Drop-in office hours will be held in the DA lab from 2 to 3:30 p.m. on Tuesdays. For these, you will not need an appointment, and I encourage you to drop by or work in or near the lab space whether or not you have a specific question. If you are hoping to ask me a specific question, I will see students in the order they arrive, so there is no guarantee that I will have time for everyone on a given day. In other words, if your question is very specific or time sensitive, it's best to make an appointment. 

If I ever need to cancel by-appointment office hours on a given day (say, for example, if I'm ill), I will update the calendar and email anyone with an appointment. If I ever need to cancel office hours on a given drop-in day, I will post to Canvas or e-mail the entire class.


Additional Norms and Policies


Here you will find information on required readings, important university policies, and course-specific policies like attendance and cell phone use.

Required Texts

The Pragmatic Programmer, 20th Anniversary Edition by David Thomas and Andrew Hunt (Pearson, 2019): $39.99 ($28.99 ebook). Print or eBook, buy online or at the university bookstore. Note that you need to get the 20th anniversary edition. Match to ISBN-10: 0135957052 or ISBN-13: 978-0135957059 when ordering online.
Introduction to Data Systems: Building from Python (Spring 2020). Free to read and download online. You can buy the print edition online if you want, but it's not required.
Selected readings will be made available as HTML or PDF, and linked to the course website or shared via Canvas.

Software and Platforms

All assignments in this course will be scripted and analyzed using Python. All of my demos will use Jupyter Notebooks as a programming environment. You are welcome to use Jupyter Notebook, JupyterLab, or any other Integrated Development Environment (IDE) of your choice when writing code. Most assignments, however, will require you to turn in a .py or .ipynb file and/or a written document (.docx, .pdf). Many assignments will require specific Python dependencies, such as a particular module or library, so it is recommended that you install the Anaconda platform, which includes Python, Jupyter Notebooks, and most if not all of the libraries we will use. Lastly, for our unit on relational databases, we will use the desktop software DB Browser for SQLite, which is free to download and use.
Assignments will be shared via GitHub Classroom, which provides a collaboration and version control system via GitHub. Typically, GitHub Classroom assignments will contain datasets, assignment template files, and/or starter code. In some cases, if a dataset is too large for GitHub's file size limits, assignments will come with instructions on how to download the data online or from a Google Drive folder. Completed assignments will be turned in for grading on Canvas, and all grades and student feedback will be issued on Canvas in order to maintain educational privacy.
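
One way to confirm that your environment has the dependencies an assignment needs is a short check like the sketch below. The module list and the helper name `missing_modules` are assumptions for illustration; each assignment will state its own requirements.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list for illustration only; adjust it to match a given assignment.
wanted = ["sqlite3", "json", "pandas"]

print("Python version:", sys.version.split()[0])
print("Missing modules:", missing_modules(wanted))
```

An empty list means every module was found. If a module is reported as missing, installing it through Anaconda is the likely fix.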

Grading and Feedback

As a general rule, the expectations in this course are high, and I'm confident you can all do great work. The feedback I provide on assignments is designed to help you get there. My goal is to provide specific, relevant, and honest feedback when I grade your work. This will include constructive criticism, strategies for improvement, and guidance on how students can achieve success. I will not do "compliment sandwiches" just to begin and end on a positive remark, but this means that, when I praise your work, it's an honest (and I think more meaningful) act of praise. 

Grade Breakdown

Item | Percentage | Comments
Attendance and Participation | 12 | Attendance will be taken every day. Late arrival counts as half an absence. Participation will be assessed using a mix of preparedness, speaking during class discussions, remaining attentive during lectures, and completing in-class assignments.
Digital Poster and Presentation | 5 | Pairs of students; digital poster with a low-stakes, informal presentation in class
Quizzes and Homework | 25 | Individual assignments (3 pts x 5 quizzes; 2 pts x 5 homework)
Midterm Exam | 15 | In-class, individual assignment, cumulative (covers material up to the class before the exam)
Project-Based Assignments | 18 | Team-based assignments (6 pts x 3)
Final Exam | 25 | In-person, individual assignment, cumulative (covers material from the entire course)
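
The percentages in the breakdown above sum to 100, so a course grade can be read as a weighted average of the six categories. The sketch below illustrates that arithmetic only; it is not the official grading calculation, and the example scores and the function name `course_grade` are invented.

```python
# Weights copied from the grade breakdown; everything else here is illustrative.
WEIGHTS = {
    "Attendance and Participation": 12,
    "Digital Poster and Presentation": 5,
    "Quizzes and Homework": 25,
    "Midterm Exam": 15,
    "Project-Based Assignments": 18,
    "Final Exam": 25,
}


def course_grade(scores):
    """Return the weighted grade for per-category scores on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / total_weight


# Invented scores for a hypothetical student.
example_scores = {
    "Attendance and Participation": 100,
    "Digital Poster and Presentation": 90,
    "Quizzes and Homework": 80,
    "Midterm Exam": 85,
    "Project-Based Assignments": 90,
    "Final Exam": 80,
}

print(course_grade(example_scores))
```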

Late Work

If you have a legitimate emergency such as a serious illness, a mental health emergency, or a death in the family, I will grant an appropriate extension with a new due date. The trade-off is that work turned in this way is probably not going to end up in my hands when I grade everything else, so it's going to get very sparse feedback. If you miss a deadline entirely without getting an extension, you will automatically lose 10 points off the top of your grade for each day it is late, in addition to any points you lose for the quality of the work. Retroactive and last-minute extensions will not be granted.

Distractions

Cell phones should be off and put away. Laptops are okay for notes and such but, when laptops are being used, you should not be messaging, using Facebook, etc. I will check screens regularly and give you a verbal warning on your first offense. After that, I reserve the right to ask you to leave class and mark you absent if you are creating a distraction.

Being Prepared for Class

Coming to class prepared means that you have the day's reading in hand (printed or digital) and have come to class with a way to take notes (printed or digital). If you are not prepared for class, I reserve the right to grade as if you were absent for that day. Anything due on a given day is due at the start of class. Any digital submission of material is due by the time class starts on the day the hard copy is due. 

Disability Resources

If you are a student who feels you may need an accommodation based on the impact of a disability, you should contact me privately as soon as possible to discuss your specific needs. I rely on the Academic Resource Center in 020 Higley Hall to verify the need for reasonable accommodations based on documentation on file in that office.

Peer Learning Strategists Program

The Peer Learning Strategists (PLS) program was developed by Denison students and faculty for those in introductory science classes. It is an initiative of a larger program called RAISE (Readiness and Inclusion in Science Education) and is a great resource to learn how to study more efficiently and learn more effectively. The PLS program employs peer-to-peer mentoring focused on teaching overarching learning strategies crucial to success in college science classes. Trained science majors work as PLS mentors to help hone your learning approach since skills most helpful in college often differ from skills that led to high achievement in high school. Students meet one-on-one with a PLS mentor one hour weekly for at least three weeks, with some students continuing beyond the three-session recommendation. PLS mentors are not tutors, content is not course-specific, and conversations provide space for attaining skills for lifelong learning and success. Contact Science Initiatives Coordinator Jeni Miller or Dr. Melanie Lott with additional questions.

Academic Integrity

Proposed and developed by Denison students, passed unanimously by DCGA and Denison’s faculty, the Code of Academic Integrity requires that instructors notify the Associate Provost of cases of academic dishonesty. Cases are typically heard by the Academic Integrity Board, which determines whether a violation has occurred, and, if so, its severity and the sanctions. In some circumstances the case may be handled through an Administrative Resolution Procedure. Further, the code makes students responsible for promoting a culture of integrity on campus and acting in instances in which integrity is violated.

Academic honesty, the cornerstone of teaching and learning, lays the foundation for lifelong integrity. Academic dishonesty is intellectual theft. It includes but is not limited to providing or receiving assistance in a manner not authorized by the instructor in the creation of work to be submitted for evaluation. This standard applies to all work ranging from daily homework assignments to major exams. Students must clearly cite any sources consulted--not merely for quoted phrases, but also for ideas and information that are not common knowledge. Neither ignorance nor carelessness is an acceptable defense in cases of plagiarism. It is the student’s responsibility to follow the appropriate format for citations. Students should ask their instructors for assistance in determining what sorts of materials and assistance are appropriate for assignments and for guidance in citing such materials clearly.

Note on Technology: Unauthorized use of technology (including, but not limited to, artificial intelligence sites and translation programs) in the preparation or submission of academic work can be considered a form of cheating and/or plagiarism. Instructors may at their discretion create assignments that incorporate the use of supporting technologies and will inform students of acceptable uses of technology in their courses. It is the responsibility of the student to ask the instructor for clarification whenever they are unclear about the parameters of a specific assignment and to understand that presenting the work of artificial intelligence as your own constitutes a violation of Denison's Code. Cases of suspected inappropriate use of technology may be submitted to the Academic Integrity Board to initiate an investigation of academic dishonesty. For further information about the Code of Academic Integrity, see https://denison.edu/academics/curriculum/integrity.

Our Commitment to Liberal Arts Education

Denison's mission statement articulates an explicit commitment to liberal arts education. It emphasizes active learning, which defines students as active participants in the learning process, not passive recipients. Denison seeks to foster self-determination and to demonstrate the transformative power of education. A crucial aspect of this approach is what Denison's mission statement refers to as "a concern for the whole person," which is why the university provides a "living-learning environment" based on individual needs and an overriding concern for community. This community is based on "a firm belief in human dignity and compassion unlimited by cultural, racial, sexual, religious or economic barriers, and directed toward an engagement with the central issues of our time."

In this class, we will discuss inequality directly. In many cases, you will be asked to apply quantitative reasoning skills to these subjects, which can be difficult because there is always the potential for the available data to complicate or contradict something you may feel very passionate about. In these cases, you should aspire to adopt an attitude of critical skepticism, i.e., wary of claims that are not supported by evidence but potentially willing to be persuaded by evidence if you find it compelling, and willing to give that evidence a fair hearing.

How we treat one another will be a cornerstone of these conversations. Denison's "Guiding Principles" speak of "a community in which individuals respect one another and their environment." Further, "each member of the community possesses a full range of rights and responsibilities. Foremost among these is a commitment to treat each other and the environment with mutual respect, tolerance, and civility." It's easy to treat someone this way when you like them and agree with their ideas, but the real challenge is treating those who differ from us with the same compassion and respect. However, I consider disruptive, deceitful, or hateful behavior to be breaches of these responsibilities. Bullying, trolling, hate speech, and harassment of any kind will not be tolerated.

Discrimination, Sexual Misconduct, and Sexual Assault

Essays, journals, and other coursework submitted for this class are generally considered confidential pursuant to the University’s student record policies. However, students should be aware that University employees are required by University policy to report allegations of discrimination based on sex, gender, gender identity, gender expression, sexual orientation or pregnancy to the Title IX Coordinator or a Deputy Title IX Coordinator. This includes reporting all incidents of sexual misconduct, sexual assault and suspected abuse/neglect of a minor. Further, employees are to report these incidents that occur on campus and/or that involve students at Denison University whenever the employee becomes aware of a possible incident in the course of their employment, including via coursework or advising conversations. There are others on campus to whom you may speak in confidence, including clergy and medical staff and counselors at the Wellness Center. More information on Title IX and the University’s Policy prohibiting sex discrimination, including sexual harassment, sexual misconduct, stalking and retaliation, including support resources, how to report, and prevention and education efforts, can be found at: https://denison.edu/campus/title-ix.


Assignments

Digital Poster and Presentation (5% of grade)

This assignment has two main purposes: to help you practice oral presentation and communication skills, and to increase your ability to use data visualizations and other visual design elements effectively in service of a central goal. Working in pairs, students will choose a topic from a list provided by the professor, and will create a digital poster that provides a concise but thorough overview of their topic, including some information about theory and application. Pairs will also give a low-stakes, informal presentation on their poster.

Note: The intention of this assignment is to collect your poster submissions for grading and share them with the rest of the class in the hope that they will be useful overviews or study guides for projects and exams.

Quizzes and Homework (25% of grade)

This course has intermittent quizzes on material from readings and lectures. Quizzes are designed to measure how well you are integrating the material. There will be five quizzes in total, each of which will take place on a Friday.

During weeks without quizzes, there will usually be a code-based homework assignment, either a worksheet or a more open-ended take-home assignment.

Midterm Exam (15% of grade)

The midterm exam will be an in-class, individual assignment. It will be cumulative to the date of the exam and will focus on questions that test your synthesis of the course content rather than information recall or rote learning.

Project-Based Assignments (18% of grade)

Our three major project assignments will be problem-focused and will require working with data, writing Python code to solve a problem or analyze a question, and explaining your work in the form of a written report. Each of these assignments will have more detailed written instructions, which will be shared on GitHub Classroom. Due dates for these projects align approximately with the ending of each of our primary units: tabular data, relational data, and hierarchical data.

Final Exam (25% of course grade)

As with the midterm, the final exam will be an in-person, individual assignment. It will take place on our scheduled final exam date, and it will cover material from the entire course. It will focus on questions that test your synthesis of the course content rather than information recall or rote learning.

Note: Denison policy does not permit me to accommodate requests to reschedule your exam, except in the case of a medical or family emergency. Travel plans such as plane tickets do not count as a valid reason to reschedule, so please plan accordingly.


Weekly Calendar

Weekly Rhythm

Monday: Lab, activity, or coding. Previous assignment due by class time.
Wednesday: Oral presentations, lecture.
Friday: Discuss readings. Occasional quizzes.

Week 1: Fluent Python
(Friday, August 30, 2024 - Friday, September 06, 2024)

Learning Outcomes: Understand professor's expectations; identify the focus of the class

By Monday: Sign up for GitHub; complete course survey

By Wednesday: Install Anaconda; install GitHub Desktop or set up credentials on the command line

By Friday, September 6: Read "The Zen of Python" (https://peps.python.org/pep-0020/) and "Software Entropy," "Stone Soup and Boiled Frogs," "Good Enough Software," "Communicate!", Pragmatic Programmer pp. 6-13; 19-24 (on Canvas)

By Next Monday: Complete homework 1

Week 2: Fluent Python
(Monday, September 09, 2024 - Friday, September 13, 2024)

Learning Outcomes: Verify that all software dependencies are covered; review file paths, Git/GitHub workflows, Jupyter Notebooks.

By Friday: Read "Version Control," "Debugging," *Pragmatic Programmer* pp. 84-97 (on Canvas); sign up for digital poster topic

By Next Monday: Complete homework 2

Week 3: Fluent Python
(Monday, September 16, 2024 - Friday, September 20, 2024)

Learning Outcomes: Understand and apply advanced Python concepts (without pandas)

On Wednesday: Posters 1-2

On Friday: Quiz 1

Week 4: Tabular Data
(Monday, September 23, 2024 - Friday, September 27, 2024)

Learning Outcomes: Explain the philosophy of tabular data analysis in Python, pandas startup

On Wednesday: Posters 3-4

By Friday: Read "The Essence of Good Design," "DRY," *Pragmatic Programmer* pp. 28-39

By Next Monday: Complete homework 3

Week 5: Tabular Data
(Monday, September 30, 2024 - Friday, October 04, 2024)

Learning Outcomes: Use pandas to filter, subset, aggregate, arrange, and mutate rectangular data

On Wednesday: Posters 5-6

By Friday: Read "Orthogonality," "Reversibility," *Pragmatic Programmer* pp. 39-50

By Next Monday: Complete homework 4

Week 6: Tabular Data
(Monday, October 07, 2024 - Friday, October 11, 2024)

Learning Outcomes: Use pandas to filter, subset, aggregate, arrange, and mutate rectangular data

On Wednesday: Posters 7-8

On Friday: Quiz 2

Week 7: Tabular Data
(Monday, October 14, 2024 - Friday, October 18, 2024)

Learning Outcomes: More advanced pandas operations

By Wednesday: Read "Decoupling," *Pragmatic Programmer* pp. 130-137

On Wednesday: Posters 9-10

By Next Monday: Complete project 1

Reminder: No class Friday - Fall Break

Week 8: Review and Synthesis
(Monday, October 21, 2024 - Friday, October 25, 2024)

Learning Outcomes: Prepare for and complete midterm exam

On Wednesday: Posters 11-12

On Friday: Midterm exam

Week 9: Relational Data
(Monday, October 28, 2024 - Friday, November 01, 2024)

Learning Outcomes: Understand theory of relational data, entities, relationships, data types, basics of SQL syntax

On Wednesday: Posters 13-14 (if needed)

By Friday: Read "Refactoring," "Test to Code," *Pragmatic Programmer* pp. 209-224

By Next Monday: Complete homework 5

Week 10: Relational Data
(Monday, November 04, 2024 - Friday, November 08, 2024)

Learning Outcomes: Practice using SQLite/Python to perform single-table data analysis operations

On Friday: Quiz 3

Week 11: Relational Data
(Monday, November 11, 2024 - Friday, November 15, 2024)

Learning Outcomes: Practice using SQLite/Python to perform multi-table data analysis operations

By Friday: Read "Naming Things," "Requirements Pit," *Pragmatic Programmer* pp. 238-252

By Next Monday: Complete project 2

Week 12: Hierarchical Data
(Monday, November 18, 2024 - Friday, November 22, 2024)

Learning Outcomes: Introduce hierarchical data concepts through web scraping

On Friday: Quiz 4

Week 13: Thanksgiving Break
(Monday, November 25, 2024 - Friday, November 29, 2024)

Reminder: No class

Week 14: Hierarchical Data
(Monday, December 02, 2024 - Friday, December 06, 2024)

Learning Outcomes: Practice using JSON/Python to traverse JSON files (local storage, APIs)

On Friday: Quiz 5

Week 15: Data Systems Synthesis
(Monday, December 09, 2024 - Friday, December 13, 2024)

Learning Outcomes: Understand paradigmatic use cases for tabular, relational and hierarchical data

By Friday at Midnight: Complete project 3

Week 16: Exam Week
(Monday, December 16, 2024 - Friday, December 20, 2024)

Reminder: Final exam, in person, Wednesday, Dec. 18, 2:00-4:00 p.m.