Tag Archives: Voyant Text Analysis

Voyant Analysis

Having only previously read one of Doyle’s Sherlock Holmes works, the novel The Hound of the Baskervilles, I felt compelled to read the first novel, A Study in Scarlet. I initially began reading within Voyant, but found the tool too clumsy and annoying for “conventional” reading. Unless you’ve scrolled through the entire novel to cache it, the text won’t load fluidly as you read. The tool waits until you scroll, and then pauses for an annoying moment to load a new section, then jerkily jumps to a seemingly arbitrary spot, forcing you to find where you left off reading. And any inadvertent click on a word momentarily freezes the tool as it highlights that word throughout the entire text – and then you have to go and enter nothing into the search bar to clear the highlight. Maybe I’m inept and impatient, but I switched to a Project Gutenberg version about 1/3 into the novel. Voyant is clearly not meant to be used as a sophisticated text reader – it’s only for statistical text analysis and the like. (It also doesn’t appear to preserve text formatting like italics, though that might have just been a problem with the source text.)

My Voyant attempts at “surfing and stumbling” through A Study in Scarlet weren’t much more successful. I don’t think there’s much value in trying to figure out the contents of a single short novel, one that can be read in a few hours, by monitoring fluctuating word occurrences, though the graphs are fun to look at, I suppose. The best part about Voyant is the word frequency count that appears when you hover the cursor over words. It’s neat to instantly see how often different words are used – and which words are used only once. I noticed “brownish” appears three times, something you wouldn’t otherwise notice. I’m sure it’s relevant somehow.

The possibilities for comparing a larger body of works are much broader. Unfortunately, the Voyant link to the “entire Sherlock Holmes corpus” contained only 36 short stories – well under the 56 stories that are undisputedly canon – and was missing all 4 novels (even A Study in Scarlet).

 

Assignment 4

Voyant is a text analysis tool that is used to analyze the word composition of a text. It uses a variety of information visualizations to show quantitative and qualitative data about each unique word within the text. I used it to run an analysis on five different Sherlock Holmes stories, all of which are eighteen pages long. They are The Adventure of the Noble Bachelor, The Adventure of the Beryl Coronet, The Man with the Twisted Lip, The Boscombe Valley Mystery, and The Red-Headed League.

The initial results of the analysis show that all five stories have around two thousand unique words. In addition, the words “the”, “and”, “to”, “I”, “a”, and “of” are the most frequent words in each individual story. The frequency counts of those words are approximately the same across the stories. All of these words are among the six most used English words, with the exception of the word “I” (http://www.duboislc.org/EducationWatch/First100Words.html). This may be a result of the author’s inspiration for writing the Sherlock Holmes stories. The author once said that the character of Sherlock Holmes was inspired by Dr. Joseph Bell, for whom he had worked as a clerk at the Edinburgh Royal Infirmary. The author tells the stories from the perspective of Holmes’s assistant, Dr. Watson. The frequent usage of the word “I” may therefore be the author’s personal projection of his real-life experience observing how Dr. Joseph Bell worked while serving as his assistant.
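
For anyone who wants to check counts like these outside of Voyant, here is a minimal Python sketch of the same kind of tally: the number of unique words and the most frequent ones. The function name `word_stats` and the sample sentence are my own, made up for illustration, and the tokenizing rule (lowercase letters, with an optional straight apostrophe inside a word) is an assumption; Voyant's own tokenization may differ, so the numbers will not necessarily match its output exactly.

```python
import re
from collections import Counter


def word_stats(text, top_n=6):
    """Return (number of unique words, list of the top_n most frequent words).

    Hypothetical helper for illustration; not part of Voyant.
    """
    # Lowercase first so "The" and "the" are counted together
    # (this also means "I" is reported as "i").
    # Assumed tokenizing rule: runs of letters, optionally with one
    # straight apostrophe inside the word (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    return len(counts), counts.most_common(top_n)


# Made-up sample sentence; replace with the full text of a story.
sample = "I saw the man and the dog, and I knew the man."
unique_count, top_words = word_stats(sample, top_n=3)
print(unique_count, top_words)
```

To run it on a real story, read a plain-text file (for example, one saved from Project Gutenberg) into a string and pass it to `word_stats` in place of `sample`.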