Introduction
The Public Historian Corpus Dataset
Last updated
If you'd like to play along at home, the dataset I built can be accessed here.
Just under 10,000 articles were collected and pooled for analysis, about 25 million words in total. The Constellate dataset builder also has a few example visualization tools built in, which provided some immediately interesting results. You can enter a term into the search bar and it will generate a chart of that term's frequency over time; if you enter several terms, it plots them together for comparison.
For example, the term "archive" ticks up after 1960 and hovers at around 3% until 2000, at which point it jumps to 11% and continues rising to 28% in 2020.
A few starting disclaimers: there are no data points earlier than 1960, and many public history articles only begin appearing after 1970, which lines up with the development of the field. Digital terms do not come into frequent use before 1990, and mostly only after 2000. I can measure frequency either as the percentage of documents that contain a term or as the raw number of documents that contain it. I will generally work in percentages, and will say so explicitly where I do otherwise.
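To make the difference between the two measures concrete, here is a minimal Python sketch. It is not how Constellate computes its charts; the function name `document_frequency_by_year`, the `year` and `text` fields, and the four sample documents are all hypothetical stand-ins for illustration.

```python
from collections import defaultdict


def document_frequency_by_year(docs, term):
    """Return {year: (count, percent)} for documents whose text contains `term`.

    Assumes each document is a dict with hypothetical "year" and "text" keys.
    Matching is a simple case-insensitive substring test, so "archive" also
    matches "archives" or "archived".
    """
    totals = defaultdict(int)  # documents published in each year
    hits = defaultdict(int)    # documents in each year that contain the term
    needle = term.lower()
    for doc in docs:
        year = doc["year"]
        totals[year] += 1
        if needle in doc["text"].lower():
            hits[year] += 1
    return {
        year: (hits[year], 100.0 * hits[year] / totals[year])
        for year in sorted(totals)
    }


# Invented sample data, not drawn from the real corpus.
sample_docs = [
    {"year": 2000, "text": "The archive holds oral histories."},
    {"year": 2000, "text": "A museum exhibit on local labor."},
    {"year": 2020, "text": "Digital archive projects are growing."},
    {"year": 2020, "text": "Archive access and community memory."},
]

result = document_frequency_by_year(sample_docs, "archive")
for year, (count, percent) in result.items():
    print(year, count, f"{percent:.0f}%")
```

The raw count depends on how many articles were published in a given year, while the percentage does not, which is why percentages are the safer default when the number of articles per year grows over time.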
A few interesting trend relationships appear immediately. The term "exhibit" slowly and steadily ticks up from 1960, rising to an all-time high of 47% around 2000 and stabilizing in the 40s. The term "digital" first appears at 1.9% in 1990, and starts rising quickly after 2000 to reach 42% in 2020. But "digital exhibit" only appears on the scene in 2010, and has only risen to 7% in 2020. Obviously, not all articles discussing the digital are discussing exhibits as well, and not all articles discussing exhibits will bother to mention the digital. Still, it is interesting that both terms have been around and frequently discussed for at least 20 years, while the combined phrase has only very recently, and barely, become a point of discussion. Perhaps the field has not taken into account the full potential of the digital space for curating exhibits, and has only recently started to examine it.
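One way to probe the gap between the two single terms and the combined phrase is to count, per document, whether each term appears alone, whether both appear, and whether they appear as the exact phrase. The sketch below is only an illustration under stated assumptions: `term_overlap`, the `text` field, and the sample sentences are hypothetical and are not part of the Constellate tooling.

```python
def term_overlap(docs, term_a, term_b):
    """Count documents containing only one term, both terms, or the exact phrase.

    Assumes each document is a dict with a hypothetical "text" key.
    "phrase" is a subset of "both": it counts documents where the two terms
    occur side by side as "<term_a> <term_b>".
    """
    a, b = term_a.lower(), term_b.lower()
    phrase = f"{a} {b}"
    counts = {"a_only": 0, "b_only": 0, "both": 0, "phrase": 0}
    for doc in docs:
        text = doc["text"].lower()
        has_a, has_b = a in text, b in text
        if has_a and has_b:
            counts["both"] += 1
            if phrase in text:
                counts["phrase"] += 1
        elif has_a:
            counts["a_only"] += 1
        elif has_b:
            counts["b_only"] += 1
    return counts


# Invented sample data, not drawn from the real corpus.
overlap_docs = [
    {"text": "The digital exhibit opened online."},
    {"text": "Digital tools helped plan the museum exhibit."},
    {"text": "A traveling exhibit on labor history."},
    {"text": "Digital archives and access."},
]

overlap = term_overlap(overlap_docs, "digital", "exhibit")
print(overlap)
```

Separating "both" from "phrase" shows whether articles mention the two ideas in the same piece without ever treating the digital exhibit as a topic in its own right.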
A similar pattern appears in a different, perhaps more socially impactful, set of words. The term "memory" appears from 1960 and rises quickly; it accelerates in 1990, climbing from 19% to 78% in 2020. The term "colonial" follows a similar trend, rising from 1960 and picking up in 1990, but only goes from 16% to 35% in 2020. Still, that is a dramatic and connected rise. Interestingly, "trauma" remains disconnected from this trend, peaking at an all-time high of just 7% in 2020. Clearly, the field remains uninterested, or only recently interested, in the effects of colonialism and trauma compared to memory, although colonial studies have been on the rise over the last decade.