Research and Teaching Notebook


Humanities Stats Books, Quantitative Methods, and Statistical Intuition


If you're an aspiring or practicing scholar of digital humanities or cultural analytics (or anything in between), you may find yourself wondering from time to time how you might advance your understanding of statistical concepts and/or statistical applications. Recently, Artjoms Šeļa asked a version of this question on Twitter. His version specifies a topical focus on literary studies and a desire to learn on one's own, but I think a broader version of the question is implied.

How to learn more about statistics as a humanist is a great question, and an intriguing one, but it's also complicated. As a book historian who became a DH teacher/scholar/practitioner and then moved into a humanities-focused role in Denison University's data analytics program, I've put a lot of thought into this topic. I've had little exposure to math and stats by way of my formal education, but I find myself teaching students quantitative analysis methods almost every day. I've learned most of what I know from my relationships with friends and colleagues, as well as from reading and studying on my own, and I believe that my idiosyncratic background has been a strength in several ways.

With all this in mind, I'm going to start writing some notes about humanities and statistics, and I will do my best to update these notes from time to time.

To start, I don't think there's such a thing as the humanities stats book. Some books are better, more accessible, or more relevant than others, but there's such tremendous variety in DH that narrowing this subject to a single book (or even a single book per type of scholar) seems unlikely to work. That said, I do think there are several "must-read" books for specific areas of thought/study, but some of them have barriers to entry that will require you to know certain things before you will be able to understand them. Relatedly, I have found that humanists often conflate what I view as sometimes overlapping but ultimately distinct competencies, especially the following:

  1. Learning to Code
  2. Learning Statistical Concepts
  3. Bringing Statistical Knowledge to Bear on Humanities Inquiry

Item 3 can be further broken down into the skills of:

  1. General Wayfinding, and Evaluating the Relevance of a Statistical Concept to Your Problem
  2. Designing Statistically Rigorous Humanities Research
  3. Selecting Measures, Tests, and Models for Analysis
  4. Running Your Analysis with Software or a Programming Language
  5. Understanding and Interpreting Results
  6. Avoiding Well-Known Mistakes or Poor Practices

Like many humanists, I think it's crucial to put pressure on these skill areas by asking questions about how standards came to be, and who created them. The ubiquity of the p-value in statistical analysis, for example, has a specific history that needs to be discussed when we talk about what a p-value is, and why it's important.

That said, as humanists, we can also understand that over the last 150 years or so, statisticians have done some work that it behooves us to try and understand. This is a much deeper topic than I have the time or patience to take on here, but I want to signal this premise, since it's a factor in everything else I've begun to assemble below.

Learning to Code

I'm not going to try and reinvent the wheel on this topic, but I think it's crucial to remind yourself that code isn't math and math isn't code, though they are deeply intertwined. In all cases that I can think of, when you run code, some kind of math is happening, but lots of what you do when you learn to code has little or nothing to do with math. At the same time, learning the ideas behind mathematical operations won't teach you how to run an analogous method in a programming language and, in turn, it's surprisingly easy to learn to do certain tests in R or Python without understanding what they do, or what the results might mean.
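
To make that gap concrete, here is a minimal Python sketch using only the standard library. The word counts are invented for illustration, and the helper name sample_stdev_by_hand is hypothetical. The one-line library call and the spelled-out version should agree, but only the second shows what is actually being computed.

```python
import math
import statistics

# Hypothetical data: word counts for six short poems (invented for illustration).
word_counts = [112, 98, 130, 105, 121, 94]

# One line of code: easy to run without knowing what it does.
quick_answer = statistics.stdev(word_counts)


def sample_stdev_by_hand(values):
    """Sample standard deviation, spelled out step by step."""
    n = len(values)
    mean = sum(values) / n
    # How far each value sits from the mean, squared so negatives don't cancel.
    squared_deviations = [(v - mean) ** 2 for v in values]
    # Dividing by n - 1 (not n) is what makes this the *sample* version.
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)


slow_answer = sample_stdev_by_hand(word_counts)
print(round(quick_answer, 3), round(slow_answer, 3))
```

Neither version tells you whether standard deviation is the right measure for your question, which is a separate skill from both the coding and the arithmetic.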

Learning Statistical Concepts

"... many of us ... went into the humanities in part because we found algebra uncongenial and calculus unkind." Michael J. Suarez, on how many book historians react to collation formulae. 

One problem here is that many humanists dislike mathematics, or get frustrated because there seems to be some tacit knowledge needed to understand a book or article, and we can't even figure out what to search for online. This has been my experience on many occasions. One way of working around this problem is to start at the most introductory level imaginable, but the downside is that you will (mostly) encounter concepts you already know about, such as the difference between mean and median, or the importance of the normal distribution.
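
For readers who want a quick refresher on the mean/median distinction, here is a small Python sketch using the standard library. The page counts are invented for illustration; the point is only that one unusually long book moves the mean a great deal while leaving the median where it was.

```python
import statistics

# Hypothetical data: page counts for seven books, one of them an outlier.
page_counts = [180, 200, 210, 220, 240, 250, 1500]

# The mean adds everything up and divides, so the outlier pulls it upward.
mean_pages = statistics.mean(page_counts)

# The median is the middle value once the list is sorted, so it ignores
# how extreme the largest value is.
median_pages = statistics.median(page_counts)

print("mean:", mean_pages)
print("median:", median_pages)
```

With these made-up numbers the mean is 400 pages and the median is 220, even though six of the seven books are 250 pages or shorter.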

Bringing Statistical Knowledge to Bear on Humanities Inquiry

Related to the frustration of feeling like a fish out of water is the problem of relevance, broadly conceived.

A Few Books (by No Means Comprehensive)

Bruce, Bruce, and Gedeck, Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, 2nd Edition, O'Reilly, 2020

Just got this one and haven't read it yet, but it looks very promising.

Grus, Data Science from Scratch: First Principles with Python, 2nd edition, O'Reilly, 2019

Just got this one, too!

James, Witten, Hastie, and Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2014

A colleague of mine recommended this textbook, but I haven't read it yet. The PDF is available for download online, free of charge.

Jarausch and Hardy, Quantitative Methods for Historians, UNC Press, 1991

A bit out of date, particularly in terms of the database tools it mentions, but there are excellent chapters here on sampling, hypothesis testing, single-variable methods (descriptive & inferential), two-variable methods, and multi-variable methods. It is organized around, and explains, the definitions of nominal, ordinal, continuous, and categorical variables. It also includes discussion of chi-squared tests, regression models, z-scores and z-tests, etc. Addresses humanities relevance directly but is focused on historical analysis.

Hothorn and Everitt, A Handbook of Statistical Analyses Using R, Taylor & Francis, 2014

I'm currently evaluating this for use in a future class.

Ignatow and Mihalcea, An Introduction to Text Mining: Research Design, Data Collection, and Analysis, Sage, 2018

This book is specific to text analysis, and it has a strong social sciences bent, but I find it very relevant to how I think about text analysis in DH. I suspect that the chapters on things like research design would be beneficial to a broad range of scholars, not just those who do text analysis.

Reinhart, Statistics Done Wrong, No Starch Press, 2015

Covers topics such as statistical significance, statistical power, p-values, p-hacking & data dredging, replicability, and transparency. Accessible and full of important information, but will raise questions of relevance to the humanities.

Wickham and Grolemund, R for Data Science, O'Reilly, 2017

This is our introductory textbook for DA 101, and I find it quite accessible and effective. It emphasizes the tidyverse in R for data visualization, data wrangling/tidying, modeling, etc. and has comparatively little on the deeper mathematics behind key concepts. Focuses on exploratory data analysis, with rectangular data.

Verzani, Using R for Introductory Statistics, 2nd edition, Taylor & Francis, 2014

I'm currently evaluating this for use in a future class.

Last Updated:

September 28, 2020