Category Archives: Sherlock Holmes Text Analysis

Sherlock Holmes – Voyant

After utilizing Stephen Ramsay’s methodology of “screwing around”, which is something that I do most of the time when searching the internet or using internet tools, I discovered that Voyant can be exceptionally useful in analyzing text. Through Ramsay’s “screwing around” method, I was able to fully grasp this analytical tool and understand how useful it can be. The screenshot that I posted above shows the screen after I typed “Watson” into the Voyant search bar. The graph in the upper right-hand corner tracks how often a particular word is used in the text, chapter by chapter, and I find it the most interesting and useful aspect of the tool. It shows how often the name “Watson” appears in the text, and the panel below the graph shows the sentences in which he is mentioned. I also really enjoyed using the “unique words” tab. It is no surprise that “the”, “and”, “of”, and “was” were used often in this text, or in any text for that matter. However, Voyant’s ability to differentiate between common words and unique words, and to show you those unique words, is exceptionally helpful to the reader. A good example of a need for this tool would be the character analysis of Septimus Smith that my group did a few weeks ago. Had we known about this tool (and if Mrs. Dalloway were already loaded into Voyant), our search for Septimus Smith in the book would have been considerably easier. This tool gives the user every instance in which a certain word is used in the text. With a few simple clicks my group would have been able to see every sentence in which Septimus Smith is mentioned, and the chapters in which he shows up most often. This tool is immensely useful.

 

Sherlock Holmes in “The Mysterious Voyant”

When I first used Voyant I felt very much like I do when I walk into a record shop: excited, completely overwhelmed by the amount of content, and uncertain of where to begin. To make things a little easier on myself I limited my exploration to “A Study in Scarlet” rather than the entirety of the Sherlock Holmes corpus. I began by editing the word cloud and common word tools so that they would not show common words such as “the”, “a”, and “and”. Then I looked at the results provided by the tools and realized that they did little to increase my understanding of the text beyond pointing out the obvious, such as the frequent presence of the word “said” in a story that involves dialog and narration. Next I attempted to create a custom collection of tools to analyze the text. The keyword here is “attempted”, because my effort proved nigh-fruitless. I decided to keep the corpus reader and the word trends tools, then I added several other information visualizers, such as the bubblelines and knots. Unfortunately I could not discern how to work these visualizers. After failing to display anything for several minutes, and after attempting to access the help page and finding it empty, I decided to focus on the word trends tool. Initially I was uncertain of what to look for, then I thought, “How does one go about understanding a situation or doing detective work?” The answer that I decided upon was through the use of one’s senses and thought processes. I set up a word trend graph showing results for “saw”, “heard”, “felt”, “tasted”, “smelled”, and “thought”, then proceeded to examine the graph in an attempt to see which senses Sherlock seemed to use most often and whether there was any correlation between any subsets of the terms. “Tasted” and “smelled” both returned no results, so I ignored them. On average, “saw” and “thought” returned the most results, and there seemed to be somewhat similar and overlapping changes in the frequency and locations of the appearances of “saw” and “felt”.
Almost every line approached zero near the center of the graph. I proceeded to add “sherlock” to the graph, and the decrease in the frequency of the keywords corresponded with a lengthy disappearance of the word “sherlock” from the text.
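
For readers curious what a word trends graph is counting, here is a minimal Python sketch. It is not Voyant's actual implementation: the function name, the equal-sized slicing, and the sample sentences are all assumptions made for illustration. It splits a text into slices and counts the chosen terms in each one.

```python
import re
from collections import Counter


def segment_counts(text, terms, segments=10):
    """Count each term in up to `segments` roughly equal slices of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division so that every word falls into some slice.
    size = max(1, -(-len(words) // segments))
    rows = []
    for start in range(0, len(words), size):
        counts = Counter(words[start:start + size])
        rows.append({term: counts[term] for term in terms})
    return rows


# Made-up sample text, not a quotation from the novel.
sample = (
    "I saw the mark and thought it odd. Sherlock saw it too. "
    "He heard a step and felt the wall. I thought of Sherlock."
)
trend = segment_counts(sample, ["saw", "thought", "sherlock"], segments=2)
for number, row in enumerate(trend, start=1):
    print(number, row)
```

Plotting one line per term, with the slice number on the horizontal axis, would give something like the graph described above. A term that vanishes from the middle slices would show up as a dip toward zero.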

Sherlock Holmes Text Analysis

While “screwing around” on Voyant I found that it provides a clear insight into the number of times a term occurs in a document. When I initially accessed Voyant through the class website, the screen showed several terms in the 36 Sherlock Holmes documents. On the bottom left side of the screen Sinclair and Rockwell provide a summary of all the unique terms in the document ranked in order from most frequently occurring to least frequently occurring. However, while this feature seems very neat and convenient, it did not provide me with much information beyond the surface. I also could not figure out how to get the corpus to display only the unique words found in the summary portion. Voyant does a great job of coupling the word frequencies with a word count and a graph of the trends for each term.

I mainly found myself playing around with the Cirrus tool located in the upper left hand portion of the page. I found ways to edit the word list and locate words that I felt were important in the Sherlock Holmes stories. I also took the time to look at some of the other visualization tools but the Cirrus tool ultimately proved itself to be the most useful. By using this tool I was able to see how specific terms corresponded with one another. Doing this allowed me to see how we can construct new meaning in literature through the use of these types of tools.

Overall, the Voyant interface is very clear and concise. All of my visuals loaded very quickly. However, I struggled to get the hang of all the features available within this web-based tool. I also feel like the layout is a little misleading for first-time users. While the visuals are clearly laid out, some of the key features are difficult to access. For example, extracting the list of frequently occurring terms from the visuals was very difficult without manually stopping each word. At the end of the day Voyant seems to be one of the best text analysis tools available because it presents basic information about word trends in a clear and concise way. While I hit some bumps in the road, with frequent use it would become much easier to use.

Sherlock Holmes Voyant– Cirrus View

When playing with Sinclair and Rockwell’s Voyant, I decided to take a look at the entire Sherlock Holmes corpus in order to view trends that spanned the collection. I tampered with the different formats of data visualization (the most interesting of which was the bubbles, which made a shrill, high-pitched noise as it went through the entire text word by word), but the most useful one from my perspective was the default Cirrus (word cloud) format.

With Cirrus, you can see the giant word cloud of the most frequently used words, with the size corresponding to the number of occurrences in the Sherlock Holmes corpus. After filtering out the common and irrelevant words (by selecting the Taporware option in the “Stop Words List”), the user gets left with a more relevant view of the text. One of the first things I noticed was the relatively large word “man.” Subconsciously going back to our class discussion regarding Arthur Conan Doyle’s perspective of women (a la “Scandal in Bohemia”), I found it interesting that the word “man” was so much more prevalent than any word pertaining to a female. The largest I could find was “lady,” which at 176 instances was dwarfed by both “sir” (at 323 instances) and “man” (weighing in at a whopping 902 instances).
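
The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop list here is a tiny hand-made one (not the Taporware list), and the sample sentence is invented.

```python
import re
from collections import Counter

# A tiny illustrative stop list; the lists offered in Voyant are far longer.
STOP_WORDS = {"the", "a", "and", "of", "to", "was", "in", "that", "he", "she", "it"}


def top_terms(text, stop_words=STOP_WORDS, limit=5):
    """Return the most frequent words that are not in the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [word for word in words if word not in stop_words]
    return Counter(kept).most_common(limit)


# Invented sample sentence, used only to show the counting.
sample = "The man saw the lady, and the man spoke to the man in the hall."
top = top_terms(sample, limit=10)
print(top)
```

A word cloud is then just a drawing of these counts, with each word's size scaled to its number of occurrences.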

The Cirrus view also maintains the Words in Documents window, which allows the user to see exactly how many times the word occurred in each document. I found it interesting that, out of 36 stories, the word “lady” did not come up in 12. I feel like this reveals a little bit more about Arthur Conan Doyle’s perspective on women: although he may respect them (by making one a character who outsmarts Holmes in “Scandal in Bohemia”), he still grew up in a patriarchal society that was mainly concerned with reading about men instead of women.

There was one thing I would have liked to see, though, which is having more than one word on the Word Trends graph. I’m sure Voyant has this capability, and I’m relatively sure I’ve gotten it to work before, but for some reason this time tampering with it did not yield the same results as the last time.

Sherlock Holmes in Voyant

After exploring many different tools used in the Digital Humanities such as VisualEyes and Voyant, I have come to realize that the use of these tools is mainly, as said in our assignment, “surfing and stumbling”, which is creative and challenging. Other than a few difficulties in using the program, such as not being exactly sure how to export the Holmes stories, I thought the exploration of this program was pretty interesting. While using Voyant and trying out the specific tools, certain questions and ideas also came to mind.

I can see that a lot of people using this program would be frustrated by trying to identify a need for such online tools, but sometimes screwing around with a text through a program can simply be a way to get a general picture of the text or data, or it can serve a specific need.

When I explored Voyant with A Study In Scarlet, I really enjoyed trying the different tools and seeing what they would do when I selected them. The tools screen-shot above ^^^ are just a few examples of the flexibility of the program. Voyant gave the user so many different options such as showing where certain words were and graphing them, to having a tool with bubbles that brought up different words as the tool ran through the entire story. Voyant Tools, with a motto of “Reveal Your Text”, does just that! Voyant reveals text depending on how you ask it, and what you ask it, to reveal.

After exploring with the program, I did have a few questions though, that I thought could allow the user to get more information from the text. The main question and possibility I thought would be beneficial was, “Is there a way to extract only dialogue or quotes from the text?” With that in mind, I thought the ability to collect only the dialogue would be an extremely useful tool.
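
Outside of Voyant, a rough answer to the dialogue question can be sketched with a regular expression. The sketch below assumes the dialogue is wrapped in double quotation marks (straight or curly); it would miss editions that use single quotes, and it does not handle quotations that run across several paragraphs. The function name and the arrangement of the sample passage are made up for illustration.

```python
import re

# Matches text between straight or curly double quotation marks.
QUOTE_PATTERN = re.compile(r'["“]([^"“”]+)["”]')


def extract_dialogue(text):
    """Return the quoted passages found in the text, in order."""
    return [match.strip() for match in QUOTE_PATTERN.findall(text)]


# Sample passage arranged for illustration.
sample = (
    'Holmes looked up. "You have been in Afghanistan, I perceive." '
    "I stared. “How on earth did you know that?”"
)
quotes = extract_dialogue(sample)
for quote in quotes:
    print(quote)
```

The extracted quotations could then be saved as a separate text file and loaded into a tool like Voyant as a dialogue-only corpus.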

So, after messing around, or should I say surfing and stumbling my way through this online tool, I found many things extremely creative, useful, and amenable.

Assignment 4: Voyant

I’m having a lot of trouble getting beyond a superficial understanding of the collection of stories. If you use the summary tool, you can get an idea of what each story is about. For instance, the top words from “A Scandal in Bohemia” are photograph, king, majesty, Adler, and Irene. If you click on a story, you can perhaps get an even better idea by using the Words in Documents, Keywords in Context, and Word Trends tools together.

So, I’m definitely understanding these tools as ways to practice computer assisted reading—if I’m understanding it correctly as a way to more efficiently scan a document, but focusing on its keywords.

But, I have not been successful in making any headway in this “screwing around” methodology. There are perhaps several reasons. First and foremost, I don’t think I understand the tools and how to use them effectively. Mostly what keeps coming to my mind is, I don’t know what I’m looking for… which I realize is exactly the point of Ramsay’s essay, so I guess I don’t fully buy into his idea.

I suppose, to some degree I’m “looking for” an interesting image that makes me ask a significant question, or leads me to some kind of conclusion that brings some kind of significance to what I’m doing. But I haven’t seen anything yet, except the summarization quality of the tool, which allows me to get an idea of what the stories are about.

I thought that, perhaps, one of the more visual tools would be more engaging, so I tried both the knots tool and the collocate clusters tool. I found both these tools to be less than helpful in my understanding of the texts. When I tried to click on the various segments of the knots, the screen would reload, and the lines would draw themselves over and over again. With the collocate clusters, I liked that I could visually see what was apparent from the summary tool, but it didn’t deepen my understanding any.

In summary, I will be the first to admit that I have been hindered by a lack of interfacing skill in my ability to arrive at any conclusions from screwing around on Voyant; however, as I was screwing around, I couldn’t help but think that if I had something specific for which I was looking for support, I could definitely use these tools to help me find it.

Assignment 4

Voyant is another interesting text analysis tool that brings literature and technology together. The tool seemed useful in breaking down longer novels into smaller bits. Like tools we used in class in the past, Voyant transforms text into simpler visualizations such as graphs and clouds. The first visualization is the word cloud, which highlights the most common words such as “was”, “the”, and “and”. The word cloud therefore seemed irrelevant when analyzing a story. The second box below, the summary and words in the entire corpus, followed the same route as the word cloud and focused mostly on the frequency of words in each story. The word trends tool, being a dynamic query, seemed quite useful, as I could scroll through the story and select any word, which would then bring up relative frequencies for that word. Keywords in context seemed most helpful when trying to understand the story because it gathered every occurrence of the selected word and included the context around each one.

In the Sherlock Holmes stories, death seemed to be a popular theme and topic, which led me to choose it as one of the keywords for Voyant to analyze. This allowed me to view when the word “death” occurred in each of the stories. I could easily navigate to and view the paragraph where the word was located, and what was most impressive was that it updated the corpus reader to the spot of each occurrence of the word. Another search I ran was for the word “mystery”. This led me to contexts where Sherlock was in the midst of solving a mystery or had already solved one.
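
The keywords in context idea can be sketched in Python as follows. This is a simplified illustration rather than Voyant's own method: the function name, the fixed window of words on each side, and the sample sentences are assumptions.

```python
import re


def keyword_in_context(text, keyword, width=4):
    """List each occurrence of `keyword` with `width` words on either side."""
    words = re.findall(r"[\w'-]+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


# Invented sample sentences, not quotations from the stories.
sample = (
    "The mystery deepened overnight. "
    "By morning the mystery was solved, and death had been avoided."
)
hits = keyword_in_context(sample, "mystery", width=2)
for line in hits:
    print(line)
```

Each printed line shows one occurrence with its neighbours, which is what makes this view helpful for reading a word in its setting rather than just counting it.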

Voyant is a useful tool for searching through the story. Compared with an actual hard copy of a book, it is much easier to navigate and manipulate. However, since the tool is primarily focused on search queries, the user needs to know what to search for. So it seems more helpful for those who have already read the story and know what to search for when going back to it.

Voyant Analysis

Having only previously read one of Doyle’s Sherlock Holmes works, the novel The Hound of the Baskervilles, I felt compelled to read the first novel, A Study in Scarlet. I initially began reading within Voyant, but found the tool too clumsy and annoying for “conventional” reading. Unless you’ve scrolled through the entire novel to cache it, the text won’t load fluidly as you read. The tool waits until you scroll, and then pauses for an annoying moment to load a new section, then jerkily jumps to a seemingly arbitrary spot, forcing you to find where you left off reading. And any inadvertent click on a word momentarily freezes the tool as it highlights that word throughout the entire text – and then you have to go and enter nothing into the search bar to clear the highlight. Maybe I’m inept and impatient, but I switched to a Project Gutenberg version about 1/3 into the novel. Voyant is clearly not meant to be used as a sophisticated text reader – it’s only for statistical text analysis and the like. (It also doesn’t appear to preserve text formatting like italics, though that might have just been a problem with the source text.)

My Voyant attempts at “surfing and stumbling” through A Study in Scarlet weren’t much more successful. I don’t think there’s much value in trying to figure out the contents of a single short novel that can be read in a few hours by monitoring fluctuating word occurrences, though the graphs are fun to look at, I suppose. The best part about Voyant is the word frequency count that appears when you hover the cursor over words. It’s neat to instantly see how often different words are used, and which words are only used once. I noticed “brownish” appears three times, something you wouldn’t otherwise notice. I’m sure it’s relevant somehow.

The possibilities for comparing a larger body of works are much broader. Unfortunately I was disappointed to see that the Voyant link to the “entire Sherlock Holmes corpus” contained only 36 short stories – well under the 56 stories that are undisputedly canon, as well as missing the 4 novels (even A Study in Scarlet.)

 

Sherlock Holmes Analysis

I have read Sherlock Holmes only once. Even while reading the novel, I had some difficulty getting into the story, despite being very familiar with Holmes and Watson. Even then I never noticed which words were used the majority of the time. I guess when reading, you focus more on the story of the book than on a comprehensive view of its words. Using the “Voyant Tool” website helped me really see which words you would find regularly in the text. For example, the words “the”, “of”, “to”, “that”, “and”, and “a” were the most commonly used. When we first started this exercise, I was clueless, but after doing it together in class, I felt more comfortable. Reading the text online and using this website allows anyone to type in a specific word and see whether it is used less or more. I chose to type the words “Sherlock” and “Watson”. Not surprisingly, a good number of results came up, since they are the text’s main protagonists. (Sorry, I had difficulty copying the graphs to show. I am not a computer whiz, so this is all I could do.)

Type: Sherlock

Document                       Count   Relative
20) greek_interpreter             19      27.16
10) scandal_in_bohemia            11      12.92
2) twisted_lip                    10      10.87
13) red-headed_league             10      10.98
23) five_orange_pips              10      13.67
32) boscombe_valley_mystery       10      10.40
33) blue_carbuncle                10      12.80
5) speckled_band                   9       9.18
7) six_napoleans                   9      10.83
15) norwood_builder                8       8.69
14) priory_school                  7       6.12
16) noble_bachelor                 7       8.64
31) case_of_identity               7      10.04

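
The second number in each entry looks like a relative frequency. A minimal sketch of how such a figure could be computed follows, assuming it is the raw count scaled to occurrences per 10,000 words. The post does not state the unit, and the document length used here is a hypothetical round number, not the real word count of any story.

```python
def relative_frequency(count, total_words, per=10_000):
    """Scale a raw count to occurrences per `per` words (assumed unit)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * per


# 7,000 is a hypothetical document length chosen for illustration.
value = relative_frequency(19, 7_000)
print(round(value, 2))
```

Scaling by document length is what lets a short story and a long one be compared fairly, since the same raw count means more in a shorter text.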
It turns out that “Sherlock” is used quite a lot when you type his name in. I found it remarkable how many times his name is used. But when I typed “Holmes” in, I think “Holmes” showed up a little more than “Sherlock”. Basically, throughout the stories, Watson and the other characters refer to him as “Holmes” instead of “Sherlock.”
(Bear with me. It’s not the graph, but it’s the only thing I could copy over to show.)
[Graph: relative frequencies of “Watson” across the 36 documents in the corpus]
What I found for Watson was interesting as well. Both are partners in solving mysteries and fighting crime, so why wouldn’t their names show up a lot? This was a great exercise and website for truly understanding DH better. I am not going to lie: I understand movies a lot better than books, because it usually takes a while for me to get into a story when reading, and I’m very visual. So when we watched the TV series Sherlock, it really did help me understand the story and the characters a lot better, even though it wasn’t based directly on the novel but was a more updated version of Sherlock, which I liked. So exploring the Voyant tool website helped me see the words differently because I am a visual person.

Sherlock Name Analysis

It took me a while to settle on what I wanted my searches to be focused on. For about 30 minutes I was just typing in arbitrary words like “clue”, “fight”, “gun”, etc. to try to find the faster-paced parts of the story. I will note that the word trend function makes it easier to find a specific part of a book, like a deduction scene, chase scene, romantic scene, etc. I finally tried typing in the names of each of the characters to see how often they came up in conversation. Interestingly enough, the word “Holmes” showed up more than the word “Sherlock”. Also, the marked incline of “Holmes” after the 4th section might be symbolic of Sherlock Holmes getting closer to solving the mystery. On the bottom is Watson’s name… he doesn’t come up nearly as often as the main character (naturally), and is absent during a majority of the 2nd half of the section.

I then switched over to the whole corpus view and did the same name search but limited it to only “Sherlock”, “Watson”, and a character from A Study in Scarlet named “Gregson”. This way, I would be able to see which of the stories mentioned any particular character a lot and which mentioned them only a few times. As you can see from the “Sherlock” and “Watson” graphs below, each character is mentioned a varying number of times; there are clear upper and lower boundaries. I included the last “Gregson” example as a means of displaying how this application can help you find even the most specific characters in a story. I ran across “Gregson” in the first story I read and then wanted to see if he showed up in any other stories in the corpus. The last pane shows you that Gregson only shows up in A Study in Scarlet.