Topic Modelling

Distant readings are already a valuable part of the historical tradition, though less so in the historiographical tradition, as discussed in Guldi and Armitage’s “The History Manifesto” [1]. Distant readings allow broad-stroke impressions to be informed by a wealth of relevant data points and so produce unique analyses. Digital humanities as its own field has an ongoing tradition of producing distant readings of vast academic corpora, particularly within literary studies, and includes early work such as Milligan’s history of the web drawing on digitally published web corpora [2]. Digital humanities scholars have used topic modelling to read distantly, as it allows them to break down and analyze thousands of works: texts are sorted into topics based on word frequency, and visualization models are coded and published online to surface patterns for analysis.

Academics such as Ramsay [3], Goldstone and Underwood [4], and Heuser and Le-Khac [5] have produced works exemplary of this topic modelling process, using it to study vast corpora of writing. In the last case, the idea was to produce a “tabula rasa” analysis so that any patterns that emerged would be connected solely to the data. Some academics disparage this goal of finding objective meaning. Liu argues that this approach necessarily relies on human labour and human data, and that its claims to a “tabula rasa” approach are therefore flawed. Moreover, Liu takes issue with the idea that topic modelling can ever produce meaning. This stems both from overarching issues with the field’s relationship to the concept of meaning, and from the fact that the false objectivity the machine produces must be supplemented with human interpretation.

Goldstone and Underwood, for example, reinterpret this ideal of the tabula rasa of machine-based topic modelling to contend with Liu’s arguments. They argue that topic models produced by machines are not supposed to produce blank-slate objective results. They also reject Liu’s claim that the necessary, symbiotic collaboration of human and machine is a weakness; human intervention and interpretation are instead strengthened by the broader vision and capabilities of machines. This relates to the value of distant reading championed by Guldi and Armitage for history.

In the examples discussed, the corpora analyzed with topic modelling tools were drawn primarily from the field of literature studies. A public history approach studying historiography will present different challenges and produce different results. The current conversation in the academic field of public history will need to be taken into account, as will its counterparts in history, which provide a counterpoint.

[1] Guldi, Jo, and David Armitage. The History Manifesto. Cambridge: Cambridge University Press, 2014.

[2] Milligan, Ian. History in the Age of Abundance: How the Web is Transforming Historical Research. Montreal: McGill-Queen’s University Press, 2019.

[3] Ramsay, Stephen. “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-120. Ann Arbor: University of Michigan Press, 2014.

[4] Goldstone, Andrew, and Ted Underwood. “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us.” New Literary History 45, no. 3 (Summer 2014): 359-384.

[5] Heuser, Ryan, and Long Le-Khac. “Learning to Read Data: Bringing Out the Humanistic in the Digital Humanities.” Victorian Studies 54, no. 1 (Fall 2011): 79-86.
