The Trees through the Forest

Working with the data becomes more complicated the more data I use

Currently, I have six data sets downloaded. All are archived journal articles published through the Public Historian Journal. The first contains every article published from 1970 to 2020. The subsequent five each cover one decade of that span. This way, smaller-scale trends over time can be analyzed and compared.

For example, in 2001, and especially in 2002, articles discussing "colonialism" and "trauma" increased from previous years. In the decade analysis of articles published from 1990-2000, trauma was negligibly discussed, and colonialism peaked at 22% in 1997 before edging back downwards. In 2001, the frequency of the term "trauma" spiked to 10%, going up to 14% in 2002, and maintaining those numbers until 2004, when it dropped back down to negligible. "Colonialism" saw a similar rise from 17% in 2000, all the way up to 40% and 37% in 2001 and 2002 respectively. "Memory" as a term started fairly high, but also doubled from 2000 to 2001.

My point in raising these figures is, like much in this project, both philosophical and methodological.

There is something incredibly interesting about taking these small points, looking at when word frequency, from the typical to the incredibly niche, skyrockets, and thinking about why. What in the culture facilitated this? Some examples are easier to explain than others. Articles about trauma, memory, and colonialism spiked in 2001 after 9/11 because North America was grappling with the military and political consequences of that event, and those consequences reverberated into academia. The culture as a whole was looking outwards and inwards to ask itself questions about the role that collective trauma played in society, and the role that colonialism had played in the past and was playing in the present. Scholars were looking to historical instances of memory and trauma, and were writing about the way the two were interacting now. Articles were written about the role of that event in the public consciousness, and the way its narrative in history was being written in the present.

It's an interesting bunch of trees, but the methodological point comes in and cuts to the quick here.

Even though the charts depicting the frequency of these terms at these dates are fascinating, and paint an interesting, albeit expected, picture of the academic culture of public history at that time, the ratios themselves need context. Here is what the number of published articles by year looks like in 1990-2000:

The number of published articles by year in 2000-2010 reveals something startling:

Despite the fact that the number of published articles increased decade to decade, in the years 2001 and 2002 incredibly low numbers of articles were published, only 155 in 2001 and 1005 in 2002, compared to 397 in 2004. So when the terms "colonial" and "memory" and "trauma" were referenced with higher frequency in 2001 and 2002, this has to be understood in the greater context of fewer articles being published those years.

When these term frequencies are examined as raw document counts rather than as percentages, more instances are recorded in 2004 in each case. Although those topics carried more weight in the academic consciousness of 2001-2002, more was written about them by volume later on.
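As a rough illustration of the difference between percentages and counts, here is a minimal Python sketch. It assumes that each percentage is the share of that year's articles mentioning the term, which may not be how the charting tool actually computes it, and the function names are hypothetical. It uses only figures quoted above: 155 articles in 2001, 397 in 2004, and a 40% share for "colonialism" in 2001.

```python
def documents_mentioning(share_percent, total_articles):
    """Convert a percentage share into an approximate document count.

    Assumption: the percentage is the share of that year's articles
    that mention the term.
    """
    return share_percent / 100 * total_articles


def break_even_share(document_count, total_articles):
    """Percentage of a year's articles needed to reach a given document count."""
    return document_count / total_articles * 100


# Figures quoted in the text above.
articles_2001 = 155
articles_2004 = 397
colonialism_share_2001 = 40  # percent

# Approximate number of 2001 articles behind the 40% peak.
count_2001 = documents_mentioning(colonialism_share_2001, articles_2001)

# Share of 2004 articles that would represent the same number of documents.
share_needed_2004 = break_even_share(count_2001, articles_2004)

print(f"2001: about {count_2001:.0f} articles mention the term")
print(f"2004: a share of about {share_needed_2004:.1f}% equals the same count")
```

Under that assumption, a much smaller percentage in 2004 is enough to represent as many documents as the 2001 peak, simply because more articles were published that year.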

When I compared these two versions of the data points, what struck me most clearly was where the analysis in version one would lead when applied to version two. Academics were writing about trauma and colonialism because they were grappling with this historic and impactful event that dealt with those topics. But at the same time, comparatively, during this historic and impactful time, academics simply were not writing. Or perhaps were not being published.

The forest paints a different picture.

