As a literary scholar, I tried to teach myself statistics several times and tbh didn't have the best time of my life. Lots of textbooks are good, but irrelevant, that don't even scratch the problems we're dealing with
— Artjoms Šeļa (@artjomshl) September 27, 2020
So what is *the* humanities stats book for you? Is there one?
If you're an aspiring or practicing scholar of digital humanities or cultural analytics (or anything in between), you may find yourself wondering from time to time how you might advance your understanding of statistical concepts and/or statistical applications. Recently, Artjoms Šeļa asked the above question on Twitter. This version of the question specifies a topical focus of literary studies, and a desire to learn on one's own, but I think a broader version of this question is implied.
How to learn more about statistics as a humanist is a great question, and an intriguing one, but it's also complicated. As a book historian who became a DH teacher/scholar/practitioner and then moved into a humanities-focused role in Denison University's data analytics program, I've put a lot of thought into this topic. I've had little exposure to math and stats by way of my formal education, but I find myself teaching students quantitative analysis methods almost every day. I've learned most of what I know from my relationships with friends and colleagues, as well as from reading and studying autonomously, and I believe that my idiosyncratic background has been a strength in several ways.
With all this in mind, I'm going to start writing some notes about humanities and statistics, and I will do my best to update these notes from time to time.
To start, I don't think there's such a thing as the humanities stats book. Some books are better, more accessible, or more relevant than others, but there's such tremendous variety in DH that narrowing this subject to a single book (or even a single book per type of scholar) seems unlikely. That said, I do think there are several "must-read" books for specific areas of thought/study, but some of them have barriers to entry that will require you to know certain things before you will be able to understand them. Related, I have found that humanists predictably conflate what I view as sometimes overlapping but ultimately distinct competencies, especially the following:
Item 3 can be further broken down into the skills of:
Like many humanists, I think it's crucial to put pressure on these skill areas by asking questions about how standards came to be, and who created them. The ubiquity of the p-value in statistical analysis, for example, has a specific history that needs to be discussed when we talk about what a p-value is, and why it's important.
We should interrogate concepts like "rigorous science" & "rigorous research" for their gender, class & racial bias...who and what defines rigor?
— Kim Gallon (@BlackDigitalHum) September 27, 2020
That said, as humanists, we can also understand that over the last 150 years or so, statisticians have done some work that it behooves us to try and understand. This is a much deeper topic than I have the time or patience to take on here, but I want to signal this premise, since it's a factor in everything else I've begun to assemble below.
I'm not going to try and reinvent the wheel on this topic, but I think it's crucial to remind yourself that code isn't math and math isn't code, though they are deeply intertwined. In all cases that I can think of, when you run code, some kind of math is happening, but lots of what you do when you learn to code has little or nothing to do with math. At the same time, learning the ideas behind mathematical operations won't teach you how to run an analogous method in a programming language and, in turn, it's surprisingly easy to learn to do certain tests in R or Python without understanding what they do, or what the results might mean.
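To make that gap a little more concrete, here is a rough sketch in Python that spells out the arithmetic behind one common test, Pearson's chi-squared test of independence. The counts are invented for illustration, and `chi_squared_statistic` is a name I've made up for this example. In practice, a single library call (for instance, `scipy.stats.chi2_contingency`) would hand back a comparable statistic, possibly with a continuity correction applied, without showing any of these steps.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: each row is an invented corpus of 100 texts; the columns
# are the number of texts that do / do not contain some feature of interest.
observed_counts = [[30, 70], [50, 50]]
print(round(chi_squared_statistic(observed_counts), 3))
```

Writing out the loop doesn't tell you what the resulting number means or when the test is appropriate, which is exactly the point: the code, the math, and the interpretation are three separate things to learn.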
"... many of us ... went into the humanities in part because we found algebra uncongenial and calculus unkind." Michael J. Suarez, on how many book historians react to collation formulae.
One problem here is that humanists often dislike mathematics, or get frustrated because there seems to be some tacit knowledge needed to understand a book or article, and we can't even figure out what to search for online. This has been my experience on many occasions. One way of working around this problem is to start at the most introductory level imaginable, but the downside of this is that you will (mostly) encounter concepts you already know about, such as the difference between mean and median, or the importance of the normal distribution.
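For anyone who wants a quick refresher on the first of those examples, here is a minimal sketch using Python's built-in `statistics` module. The page counts are made up; the only point is that one unusually long book pulls the mean upward while leaving the median alone.

```python
from statistics import mean, median

# Hypothetical page counts for seven books; the last one is an unusually long outlier.
page_counts = [180, 200, 210, 220, 240, 250, 1500]

print(mean(page_counts))    # pulled upward by the single very long book
print(median(page_counts))  # the middle value, unaffected by the outlier
```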
One of the things I struggled with with "Six Septembers" was connecting the concepts back to how they might be applicable to DH methods. I know being applied was NOT a goal of the book, but I lacked the background to even imagine what kinds of methods some things related to.
— Quinn Dombrowski (@quinnanya) September 27, 2020
Related to the frustration of feeling like a fish out of water is the problem of relevance, broadly conceived.
Just got this one and haven't read it yet, but it looks very promising.
Just got this one, too!
A colleague of mine recommended this textbook, but I haven't read it yet. The PDF is available for download online, free of charge.
A bit out of date, particularly in terms of the database tools it mentions, but there are excellent chapters here on sampling, hypothesis testing, single-variable methods (descriptive & inferential), two-variable methods, and multi-variable methods. It is organized around, and explains, the definitions of nominal, ordinal, continuous, and categorical variables. Also includes discussion of chi-squared tests, regression models, z-scores and z-tests, etc. Addresses humanities relevance directly but is focused on historical analysis.
I'm currently evaluating this for use in a future class.
This book is specific to text analysis, and it has a strong social sciences bent, but I find it very relevant to how I think about text analysis in DH. I suspect that the chapters on things like research design would be beneficial to a broad range of scholars, not just those who do text analysis.
Covers topics such as statistical significance, statistical power, p-values, p-hacking & data dredging, replicability, and transparency. Accessible and full of important information, but will raise questions of relevance to the humanities.
This is our introductory textbook for DA 101, and I find it quite accessible and effective. It emphasizes the tidyverse in R for data visualization, data wrangling/tidying, modeling, etc. and has comparatively little on the deeper mathematics behind key concepts. Focuses on exploratory data analysis, with rectangular data.
I'm currently evaluating this for use in a future class.