This week in Digital History, we are examining information in the Digital Age and, in particular, how it will affect the historical record.
With all of the information that is being generated, stored and preserved digitally, historians of the future will no doubt be faced with more material than ever before (although, as Stuart Fox argues, the historical record will never be complete, because digital material will likely always be subject to a number of legal restrictions).
In order to deal with this massive amount of information, historians are starting to utilize new tools and methods to help them sift through the material. For example, with data mining, researchers can extract text from digital sources and quickly analyze the data to look for patterns, trends or changes. After looking at a few examples of data mining, through William Turkel’s examination of the Dictionary of Canadian Biography and Dan Cohen’s analysis of language in Victorian literature, we were encouraged to try a few data mining tools out ourselves.
The first such tool I tried was TAPoR. Unfortunately, due to my inexperience with data mining or the tool’s unaddressed bugs, or some combination of the two, I was unable to produce any useful results. I could do a few basic things, like calculate how many times a word occurred in the text, but any time I tried a more complex query, I was faced with an error. It seemed as though I was not quite ready for TAPoR, so I moved on to the next option.
The next recommended tool was the Time Magazine Corpus. This tool allows you to run a variety of text analyses on Time magazine issues from 1923 to 2010. After my failed attempt at TAPoR, I decided to start small. I entered the word “war” into the chart option and was given a fairly predictable graph. As one would expect, occurrences of the word “war” increased throughout the 1920s and 1930s and peaked in the 1940s, after which they declined sharply into the 1950s and didn’t increase again until the 1990s and 2000s.
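For anyone curious what a chart like this involves, here is a minimal Python sketch of the general idea: count how often a word appears in each decade and scale by the total number of words, so that decades with more text are not over-represented. This is only my own illustration and not how the Time Magazine Corpus works internally; the function names and the tiny sample data are made up, and I am assuming the texts are available as simple (year, text) pairs.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_decade(docs, word):
    """Return occurrences of `word` per million words, keyed by decade.

    `docs` is an iterable of (year, text) pairs (a hypothetical input format).
    """
    hits = Counter()    # occurrences of the word in each decade
    totals = Counter()  # total number of tokens in each decade
    for year, text in docs:
        decade = (year // 10) * 10
        tokens = tokenize(text)
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(word)
    return {
        decade: hits[decade] * 1_000_000 / totals[decade]
        for decade in sorted(totals)
        if totals[decade]
    }


# Invented sample sentences, purely for illustration.
sample_docs = [
    (1942, "The war goes on and the war effort grows"),
    (1955, "Peace and travel by rail"),
]
print(frequency_by_decade(sample_docs, "war"))
```

Normalizing per million words matters because a raw count would rise simply because a decade contains more text.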
For a slightly more complex, although equally predictable, query, I examined the collocates (the words that appear near a search term) of the word “travel” in the magazines from the 1930s and the 1960s. I was presented with two charts showing the words most closely associated with “travel” in the respective decades:
In the first chart, from the 1930s, you see that such words as “railroad,” “long,” and “costs” were closely associated with the word “travel.” By the 1960s, of course, travel is much different. Here, you see words such as “jet,” “interstate,” and even “space” associated with the word “travel.” Although this isn’t a ground-breaking revelation, it is very exciting to be able to run queries on the text of all of the Time magazines since the 1920s.
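A collocate query can be sketched in a few lines of Python as well. The version below simply counts the words that fall within a few positions of the search term. It is a rough illustration under my own assumptions (the function name, the window size and the sample sentence are all invented), and real corpus tools typically add refinements such as statistical association scores that this sketch leaves out.

```python
import re
from collections import Counter


def collocates(text, target, window=4, stopwords=frozenset()):
    """Count words appearing within `window` tokens of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        # Words before the target plus words after it, excluding the target itself.
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(
            w for w in neighbours if w not in stopwords and w != target
        )
    return counts


# Invented sample text, purely for illustration.
sample = "Railroad travel costs rose. Long travel by railroad was common."
print(collocates(sample, "travel", window=2, stopwords={"by"}).most_common(3))
```

Running the same function over text from two different decades and comparing the top results is, in spirit, what the two charts do.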
Of course, it should not be the historian’s goal to use such methods to test or prove existing scholarship on a matter (and to do so would be to privilege scientific or empirical methods over qualitative ones). As many digital historians or digital humanists will tell you, scholarly research is changing and it’s not enough for us to try to make new tools fit into our existing methodologies. We now have a brand new type of historical evidence and we need to find new questions and new methods to go along with it.