The Beginning - A Very Good Place To Start

My Major Research Project seems both too big and too small. The idea is to present a holistic view of the Public History field over time, tracking the topics that participating academics write on, in order to understand the philosophical direction the field has moved in. Despite such a lofty goal, the medium is as small as it gets.

Words.

Single, individual words. Every article, book, and book chapter written in the field is composed of them, and though so many Public History works rely on other mediums, there is no escaping the written word's ability to convey meaning. I plan on looking at all of them.

It's conceptually easy to do so. Digital tools, like the digital workbench and the pilot project of JSTOR Data For Research, allow you to download thousands upon thousands of articles published online, wholesale. You can search by keyword, publishing journal, topic, or year and download the results. Scrambled, of course, but within a couple of hours of work thousands of articles, the result of even more thousands of hours of research and writing over decades in this field, are at my fingertips.

The next step is where things get a little messy.

I have all the words, but now what to do with them? Luckily, I am not the first to do this.

Literature review is not a new practice, though its utility in the digital age has skyrocketed. Academics such as Ramsay [1], Goldstone and Underwood [2], and Heuser and Le-Khac [3] have done this too, with varying topics, goals, and results. It is most commonly done in the field of English and literature studies. The whole corpus of an author or a collection can be viewed in tandem to identify patterns of word usage, to track trends over time, and to look at thematic elements from as broad a perspective as possible. They came away with different opinions on the matter.

Ramsay and Goldstone/Underwood champion the practice as a new and exciting frontier of digital humanities and the field of literature studies. The ability to look at so many works all at once is enticing. It reveals broad strokes using the smallest of brushes. Liu [4], though, has doubts. Specifically, his doubts concern the vehicle of analysis rather than the theoretical approach alone.

The question, posed in the subtitle of Ramsay's work, is this: what do you do with a million books? It's a startling question. You can collect all the works in the world, based on any topic or corpus of work or collection, and I have, but what now? And why?

The answer to the first is topic modelling; the second, I hope, will come up much, much later.

Topic modelling is a digital humanities technique in which a collection of words is analyzed to detect word frequencies and even relationships between words, depending on how the data is presented to the model. From there, you can code into the model which words to exclude, or how many topics you want generated. You set the rules, it does the math. It could be a way to objectively view topic trends in a given work, and from there you can identify underlying focuses and biases that might be hidden. That's at least part of what I plan to do. But, to Liu, by setting the rules for the code to do the math, we're still choosing the results. There's something to this. If I steer the machines with my own human expectations and biases, the results are much less objective.
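To make the "you set the rules" step concrete, here is a minimal sketch in Python of the simplest rule-setting that happens before any topic model runs: counting word frequencies while excluding a chosen list of words. This is not a topic model itself. The sample sentences, the `STOP_WORDS` set, and the function name `word_frequencies` are all invented for illustration, not taken from my actual corpus or tools.

```python
import re
from collections import Counter

# Hypothetical exclusion list; choosing it is one of the "rules" the analyst sets.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}

def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count how often each word appears across texts, skipping excluded words."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

# Invented sample sentences standing in for downloaded articles.
sample = [
    "The museum and the archive shape public memory.",
    "Public memory is contested in the museum.",
]

freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Passing a different `stop_words` set changes which words rise to the top, which is Liu's point in miniature: the exclusion list is a human choice made before the math begins.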

But there's something to that as well. Maybe I choose a bunch of different rules and run the topic modelling scheme as many times as I can think of variations, to see what comes out. I average them against each other, or maybe just view them side by side to see what differences emerge. Maybe the trends and underpinning academic focuses are not just in the topics generated, but in the differences between them.
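One possible way to put runs side by side, sketched here in Python under heavy assumptions: treat each topic as its set of top words and measure the overlap between topics from two runs with Jaccard similarity (shared words divided by all distinct words). The two runs below are made-up word lists, not output from any real model, and `jaccard` and `best_matches` are hypothetical helper names.

```python
def jaccard(a, b):
    """Overlap between two word sets: shared words / all distinct words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty topics are treated as identical
    return len(a & b) / len(a | b)

def best_matches(run_a, run_b):
    """For each topic in run_a, find the most similar topic in run_b."""
    matches = []
    for i, topic_a in enumerate(run_a):
        scores = [jaccard(topic_a, topic_b) for topic_b in run_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches

# Made-up top words from two runs with different settings (illustration only).
first_run = [
    ["museum", "exhibit", "visitor", "collection"],
    ["memory", "monument", "commemoration", "war"],
]
second_run = [
    ["memory", "monument", "war", "nation"],
    ["museum", "exhibit", "curator", "collection"],
]

for i, j, score in best_matches(first_run, second_run):
    print(f"first run topic {i} ~ second run topic {j} (overlap {score:.2f})")
```

Topics that keep a high overlap across runs are probably stable features of the corpus. Topics whose best match scores low are the ones that shift with the rules, and those shifts may be worth reading as closely as the topics themselves.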

The why is a different beast. What is this for? Why is this interesting? What makes this worthwhile? I have my thoughts, and I hope to gain new ones as I put fingers to keys to code. I think this field has been growing and changing for decades now, and though not everything about it is contained in published books and journals, a snapshot of the whole field is. I think it will be worthwhile to take a step back, to paint a picture of the field in the broadest strokes but with the smallest brush possible. Or maybe in the narrowest strokes with the biggest brush I can find. Probably both.

[1] Ramsay, Stephen. “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-120. Ann Arbor: University of Michigan Press, 2014.

[2] Goldstone, Andrew, and Ted Underwood. “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us.” New Literary History 45, no. 3 (Summer 2014): 359-384.

[3] Heuser, Ryan, and Long Le-Khac. “Learning to Read Data: Bringing Out the Humanistic in the Digital Humanities.” Victorian Studies 54, no. 1 (Fall 2011): 79-86.

[4] Liu, Alan. “The Meaning of the Digital Humanities.” PMLA 128, no. 2 (March 2013): 409-423.
