Gateway Drugs, Statistical Analysis, and Text Mining

Doing the readings on data mining for this week, I got a little sidetracked thinking about Professor Cohen’s analysis that “Digital Humanities needs gateway drugs. Kudos to the pushers on the Google books team.” Personally, I could not agree more. I don’t know if this has happened to any of my classmates, but the advanced search on Google Books has saved me countless days of waiting for the interlibrary loan of a particularly hard-to-find book. I could come up with many more examples, but that is probably my best example of the “gateway drug.” If only I had gotten hooked sooner, this class would probably not give me so many sleepless nights. Where I slightly disagree with Dan is that I don’t think the conversion process will be as simple as handing out addictive gateway drugs, though admittedly they will make the process much easier, especially in the field of Civil War Era Studies, where something always seems to be one of the first test runs for an advancement in the digital humanities. Nevertheless, I wonder if any “gateway drug,” no matter how useful or idiot-proof it may be, will be strong enough to overcome what is traditionally a stubborn discipline. In the interest of full disclosure, this particular post was influenced by a conversation I had a short time back at the Lincoln Cottage. A fairly well-known Lincoln scholar was present to do a book signing and lecture, and I remarked to him that the expense and time consumed by the research for his book must have been made considerably less by the Chronicling America website. Apparently, this remark was a mistake.
It produced a several-minute response wherein the scholar detailed, among other things, that looking at primary sources online “did not constitute real historical scholarship,” and that many of his colleagues “felt exactly the same way.” Against my better judgement (which was screaming “do not bite the hand that does the book signing”), I responded that nothing was a substitute for archival research, but asked whether there was really that great a distance between viewing a series of newspaper articles on the LOC website and looking at the same articles on microfilm. Without getting too graphic, let’s just say that this line of argumentation did not win over this particular historian. I know that we can’t use one cantankerous historian as our entire sample, but I feel like this interaction was emblematic of why the problem with the perception of digital history from within the discipline is too large to be solved by gateway drugs. Some of the readings for last week outlined this as well: the problem is that a significant portion of the discipline needs a change in mindset, and while useful tools will always help the fight, I’m not sure they will carry the day entirely.

Now that my ruminations on the mindset of the history discipline have entirely taken over my blog post, allow me to return to the actual readings for this week, specifically Franco Moretti’s Graphs, Maps, Trees: Abstract Models for a Literary History. Moretti, if I have at least a part of the meaning of his work right (and please feel free to disagree), proposes to change the blueprint of how we read. For Moretti, when novels are represented using quantitative means and methods, it becomes possible to chart their broader impact in a new and different way. For example, treating Mary Mitford’s Our Village (volume 1) as a map allows for a powerful visual representation of how “Mitford reverses the direction of history, making her urban readers (Our Village was published by Whitaker, Ave-Maria-Lane, London), look at the world according to the older ‘centered’ viewpoint of an unenclosed village. And the key to this perceptual shift lies in Mitford’s most typical episode: the country walk. In story after story, the young narrator leaves the village, each time in a different direction, reaches the destinations charted, then turns around and goes home” (Moretti, 39). Here Moretti is undoubtedly correct: making this point visually on a map is much more effective than using words. However, the limited transparency of his methodology raises questions about his work. For example, in his “Graphs” chapter Moretti lists all the secondary sources he scoured to provide the data for his graphs, but we have no way of knowing whether the figures in those works contain any important caveats that might make his graphs less compelling or definitive than they seem. Additionally, we have no way of checking whether Moretti’s statistical analysis (or his simple math) is accurate, an important consideration given all the problems historians have had with their own statistical samples.
Perhaps most importantly, Moretti proceeds from the false premise that “quantitative data are useful because they are independent of interpretation” (Moretti, 30). Behind the independence of those numbers are the questions we ask to get the numbers, the numbers we choose to present, and how we choose to present those numbers. Thus, quantitative data are not as “independent of interpretation” as Moretti seems to believe, and we can question his results based on flaws in that premise.



3 responses to “Gateway Drugs, Statistical Analysis, and Text Mining”

  2. I am so glad you pointed out the “gateway drug” comment from Dan Cohen’s article and shared your thoughts on it. Your anecdote perfectly highlighted why it’s going to take more than some addictive interfaces to change thinking about an entrenched methodology.

    Maybe part of the problem is that beyond a gateway drug there is not necessarily anything hard enough to get us truly and dependently hooked. What is our digital heroin?

    • As much as I love the phrase “digital heroin,” I am having trouble imagining a drug that will get the more cranky members of our discipline hooked. My best guess is that it will have to come from someplace like the LOC (their digitized newspapers were/are a gateway drug for me), not because of any sort of superior technological expertise on their part, but because it has to be a place that has the most established archival chops in the discipline in order to chip away at the condescension from some of our more stubborn senior colleagues…

