Digital text-mining tools can help researchers understand document collections that are too large for close reading. Voyant Tools is a free, open-source, web-based application for textual analysis. It highlights trends in a chosen text, displays word and phrase frequencies and, in some cases, brings to light correlations between frequently occurring words. It remains one of my favourite text analysis tools because it is fast and easy to use, even for people who have no background in text analytics.
I wanted to explore this platform further, since it is now widely used in the Digital Humanities, and to that end I chose Shakespeare’s famous 1596 play, ‘A Midsummer Night’s Dream’, as the corpus for my analysis. I read this play as part of my high school curriculum several years ago, and was curious to find out whether the visual displays generated by Voyant Tools did a good job of summarising the play and emphasising its themes as I remembered them.
To perform the analysis, I downloaded the XML file containing the play’s dialogue and loaded it into Voyant. Voyant’s newest version reads HTML and XML files efficiently, so only minimal polishing and filtering of the dataset was required before analysis. The Voyant interface, as it appeared after processing the XML text, is shown above. Most of the visualisations on the embedded platform are interactive and can be explored further by moving the cursor over them. The interface is discussed in greater detail in the ‘Presentation’ section below.
Source: A Midsummer Night’s Dream by William Shakespeare (1596)
Processes: The XML file containing the play’s text was downloaded from the shared Google Drive accessible to all students taking Hacking the Humanities in the Winter of 2021. The file was then filtered and polished using OpenRefine to enable efficient processing, and finally passed into Voyant Tools for visualisation and analysis. Certain stop-words were removed to make the resulting visualisations more interesting and meaningful: a word cloud that was just a collection of ‘and’s and ‘the’s would not be of much use to digital humanists or literary scholars, which is where Voyant’s stop-word functionality comes in handy.
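To make the stop-word step concrete, here is a minimal Python sketch of the same idea: count word frequencies, then drop the function words before ranking. It is not how Voyant works internally. The tiny stop-word list, the `tokenize` and `top_terms` helpers, and the sample lines are all assumptions made for illustration; Voyant’s own stop-word list is far longer.

```python
import re
from collections import Counter

# Illustrative stop-word list only (an assumption, not Voyant's list).
STOP_WORDS = {"and", "the", "a", "of", "to", "in", "is", "not", "with", "but"}

def tokenize(text):
    """Lowercase the text and pull out word tokens (keeps forms like "night's")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def top_terms(text, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent words that are not stop-words."""
    counts = Counter(tok for tok in tokenize(text) if tok not in stop_words)
    return counts.most_common(n)

# Two well-known lines from the play, used here as a toy corpus.
sample = ("The course of true love never did run smooth. "
          "Love looks not with the eyes, but with the mind.")

print(top_terms(sample, n=1))  # [('love', 2)]
```

Without the filter, ‘the’ would top this list with three occurrences, which is exactly the problem the stop-word removal is meant to avoid.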
Presentation: The interface of Voyant Tools is grouped into blocks, each of which presents some meaningful aspect of the data in a different way. In the embedded display above, the top left corner shows a multicoloured word cloud, in which the most frequently occurring words in the play appear in the largest font sizes. The colours and the spatial arrangement of the words make the cloud more visually appealing to users of the platform. Several other options are available, including font size modifications, alternative colour palettes and different word orientations, depending on how the user wants the cloud to be formatted. To the right of the cloud is a TermsBerry visualisation, which is an alternative way to display the most commonly occurring words. In the top right corner of the interface is the trends chart, which divides the document into 10 segments and plots the frequencies of selected words across these segments. A brief summary block on the bottom left of the page reports the vocabulary density and the average number of words per sentence in the play. Finally, at the bottom right, one can find features such as the bubble lines for the 5 most common words, correlations between words, and the surrounding context for the most frequent words in the text.
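The numbers behind the trends chart and the summary block can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not Voyant’s implementation: it assumes the segments are equal slices of the token stream, that vocabulary density is the ratio of unique words to total words, and that sentences end at `.`, `!` or `?`. The function names and the sample text are invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def segment_counts(tokens, term, segments=10):
    """Count how often `term` occurs in each of `segments` equal slices."""
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map token position i to a segment index in 0..segments-1.
            counts[i * segments // len(tokens)] += 1
    return counts

def vocabulary_density(tokens):
    """Ratio of unique words to total words (assumed definition)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def average_words_per_sentence(text):
    """Mean number of word tokens per sentence, splitting on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

# Invented sample text, not a quotation from the play.
sample = "Love is blind. Night falls fast. Love wins here."
tokens = tokenize(sample)

print(segment_counts(tokens, "love", segments=3))  # [1, 0, 1]
print(round(vocabulary_density(tokens), 3))        # 0.889
print(average_words_per_sentence(sample))          # 3.0
```

Run over the full play with `segments=10`, `segment_counts` would give the kind of per-segment series that the trends chart plots, although the exact values depend on how the text is tokenised.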
Significance: Voyant’s tagline is “see through your text.” As several visual elements of the Voyant platform show, love is one of the most prominent themes of this particular Shakespearean play. The word appears a total of 109 times in the dataset, with the most occurrences in the 6th segment of the play. This reflects the play’s central idea of ‘unrequited love’, built on a web of complex relationships among the principal characters (Hermia, Lysander, Demetrius and Helena), whose names are also in the list of the most frequently occurring words in the play. Similarly, other highlighted words such as fairies, night and dream point to the theme of magic and a false realism. Voyant does a good job of depicting the prominent themes in a text, but I’d still be reluctant to say that it helps with summarising the play, because it would be difficult to construct an entire story, or a hypothesis of what the play is about, based solely on this collection of words.
Voyant is a fun and interactive way to delve deeper into an article or document that one wants to analyse. Even though it doesn’t give as complete a picture as a close reading of the text would, it certainly pinpoints the major themes and storylines underlying the document and helps one develop a better understanding of the ideas the text revolves around.