Historical Data in the Digital Age

This week in Digital History, we are examining information in the Digital Age and, in particular, how it will affect the historical record.

With all of the information that is being generated, stored and preserved digitally, historians of the future will no doubt be faced with more material than ever before (although, as Stuart Fox argues, the historical record will never be complete, because digital material will likely always be subject to a number of legal restrictions).

To deal with this massive amount of information, historians are starting to use new tools and methods to help them sift through the material. For example, with data mining, researchers can extract text from digital sources and quickly analyze the data to look for patterns, trends or changes. After looking at a few examples of data mining, through William Turkel’s examination of the Dictionary of Canadian Biography and Dan Cohen’s analysis of language in Victorian literature, we were encouraged to try out a few data mining tools ourselves.

The first such tool I tried was TAPoR.  Unfortunately, due to my inexperience with data mining or the tool’s unaddressed bugs, or some combination of the two, I was unable to get any valuable results.  I could do a few basic things, like calculate how many times a word occurred in the text, but any time I tried a more complex query, I would be faced with an error.  It seemed as though I was not quite ready for TAPoR, so I moved on to the next option.
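
For anyone curious what a basic word count like this involves, here is a minimal Python sketch. It is not how TAPoR works internally; the function name `count_words` and the sample sentence are my own inventions for illustration, and the definition of a "word" (runs of letters and apostrophes) is a simplifying assumption.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter mapping each lowercased word to how often it occurs."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text, not taken from any real source.
sample = "The war ended. After the war, travel resumed."
counts = count_words(sample)
print(counts["war"])
```

Lowercasing before counting means “The” and “the” are treated as the same word, which is usually what you want for a simple frequency query.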

The next recommended tool was the Time Magazine Corpus. This tool allows you to run a variety of text analyses on issues of Time magazine from 1923 to 2010.  After my failed attempt at TAPoR, I decided to start small.  I plugged the word “war” into the chart option and was given a fairly predictable graph.  As one would suspect, the occurrences of the word “war” increased throughout the 1920s and 1930s and peaked in the 1940s, after which they declined sharply into the 1950s and didn’t increase again until the 1990s and 2000s.

For a slightly more complex, although equally predictable, query, I examined the collocates of the word “travel” in the magazines from the 1930s and the 1960s.  I was presented with two charts that essentially showed the kinds of words that appeared alongside “travel” in the respective decades.

In the first chart, from the 1930s, you see that such words as “railroad,” “long,” and “costs” were closely associated with the word “travel.”  By the 1960s, of course, travel is much different.  Here, you see words such as “jet,” “interstate,” and even “space” associated with the word “travel.” Although this isn’t a ground-breaking revelation, it is very exciting to be able to run queries on the text of all of the Time magazines since the 1920s.
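
To make the idea of collocates a little more concrete, here is a rough Python sketch of one simple way to find them: count the words that fall within a few words of the target. This is only an illustration under my own assumptions; I don't know how the Time Magazine Corpus actually calculates its collocates, and the function name `collocates`, the `window` parameter and the sample sentences are all made up.

```python
import re
from collections import Counter

def collocates(text, target, window=4):
    """Count words appearing within `window` words of each occurrence of `target`."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    nearby = Counter()
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - window)
            # Words before and after the target, excluding the target itself.
            neighbours = words[start:i] + words[i + 1:i + 1 + window]
            nearby.update(neighbours)
    return nearby

# Invented sentences for illustration only, not real Time magazine text.
sample = "Railroad travel costs rose. Long railroad travel was common."
result = collocates(sample, "travel", window=2)
print(result.most_common(3))
```

Running the same function over text from two different decades and comparing the most common neighbours would give a crude version of the comparison described above.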

Of course, it should not be the historian’s goal to use such methods to test or prove existing scholarship on a matter (and to do so would be to privilege scientific or empirical methods over qualitative ones).  As many digital historians or digital humanists will tell you, scholarly research is changing and it’s not enough for us to try to make new tools fit into our existing methodologies.  We now have a brand new type of historical evidence and we need to find new questions and new methods to go along with it.


3 thoughts on “Historical Data in the Digital Age”

  1. I find it very interesting that instances for “war” were not higher throughout the middle of the last century considering Americans were at war in Korea and Vietnam between the 1950s and 1970s.

    Great blog! : )

    • Good point, Kira. Maybe that’s an example of the new types of questions we will be faced with when working with this type of analysis. How did American rhetoric surrounding war change over time? What, if any, other words were being used in place of the word “war”?

  2. Pingback: The Importance of Self-Reflection « Virtual Voice
