New Approaches in Digital Humanities – Digital Visualization Tools for Textual Analysis of Fan Fiction


Digital tools can be and are useful to scholars in the humanities, like myself, who work closely with various forms of text. I suggest that digital approaches to the field of literature can enhance our current practices greatly. I do understand the reluctance (to rely on data, to let computers “do the work for us”), but I also believe that digital tools do not necessarily replace our traditional analysis; rather, they add another layer or dimension to the close-reading we have been taught and use. Through this project, I examine two different digital tools while working with texts that have never existed in book form or any other corporeal shape. Fan fiction consists of texts born and disseminated in the digital realm. In this sense, it seems a logical step to use digital tools to approach these texts. Matthew Jockers suggests that “though not ‘everything’ has been digitized, we have reached a tipping point, an event horizon where enough text and literature have been encoded to both allow and, indeed, force us to ask an entirely new set of questions about literature and the literary record” (4). In the same vein, I also attempt to ask new questions of fan fiction, while at the same time asserting its place as a valuable literary genre. Just as Franco Moretti proposes to focus on works of world literature that the canon has neglected, I suggest that fan fiction works, which are still regarded skeptically by literary scholars, might benefit from the methodology of distant reading. Fan fics as such are not commercially produced texts, and while some resemble novels in length, they are often posted in installments, more akin to serial publication. Depending on the status of the work (either complete or in progress), readers will encounter it differently throughout its online life. Similarly, this project is also a WIP (work in progress) and will receive periodic updates as I begin to incorporate these approaches into my research.

The evidence gathered in my analysis covers only a fraction of the fan fiction that exists on the web in countless fandoms. Across different media, from books to television, cinema, and comics, there are so many texts that it would be all but impossible to analyze even the smallest percentage of such a large corpus, or even to attempt to establish a comprehensible overview. I do not necessarily locate fan fiction stories in their larger historical context, but rather attempt a more statistical analysis that will reveal more information than a close-reading alone could. To situate fan fiction, certain genres, stories or pairings in a larger historical or literary framework would simply exceed the margins of this project. Rather, I will choose the pieces of fiction that I analyze through a combination of statistical inquiry and personal experience. That is not to say that a larger project of collecting and using “big data” may not be a fruitful endeavor. I believe that undertaking such a macroanalysis, as Jockers imagines it, will yield new and interesting results for scholars in fandom studies. Fans have shown time and again that they are willing to go beyond a superficial engagement with their objects of affection, so perhaps this is not as utopian a fantasy as it might seem.

The digital visualization tools I introduce here make use of how fan fiction is gathered in archives, or digital libraries, such as ArchiveOfOurOwn (AO3). There, it is possible to scrape metadata while at the same time working closely with the text and copying its contents into digital visualization tools. The tools I am working with are easily accessible for scholars like myself who are just starting to take small steps toward the field we know as digital humanities. I am not using any programming language, as I am not yet fluent in one, and for these first steps I consciously limit myself to tools that are easily accessible and understandable, even for ‘newbs’ like me. I am partially guided by the concept of distant reading, first introduced by Moretti as a way to look beyond the canon. Distant reading means not reading texts as we have been taught to do, but rather taking a step, or several steps, back to gain a more comprehensive view of a corpus of works. The tools I will be using in this particular instance are RAW (DensityDesign) and Voyant, a tool for digital text mining. On its website, Voyant is advertised as a web-based reading and analysis environment for digital texts, or in other words, a tool that can be used to explore (new) distant reading techniques. To broaden my approach, I also attempt to visualize data in RAW that has already been collected by a popular fandom statistician – DestinationToast – to see what I could do with the data and how some visualizations work better than others.

“A macroanalytic approach helps us not only to see and understand the operations of a larger “literary economy,” but, by means of scale, to better see and understand the degree to which literature and the individual authors who manufacture that literature respond to or react against literary and cultural trends” (Jockers, 28). Indeed, such an analysis might lead to an understanding of how fan fiction authors react to certain developments or trends in the texts they engage with. There is hardly a more palpable way of exploring how fans negotiate their relationship to fictional texts than through fan fiction. It is an unprompted, often unfiltered, response that incorporates personal preferences. Fans generally write fan fiction because they are fascinated with what they see, read, or experience, while they are also frustrated by the texts they encounter. This combination of frustration and fascination is what constitutes the corpus of my analysis, and perhaps through a macroanalysis/distant reading approach these emotions might take on a more corporeal shape through the texts. I will not, however, engage with the economy of the texts, even though it might also become interesting in the future to research connections between the original texts, say books or television scripts, and compare them with their fan fiction adaptations. Perhaps such a macroanalysis will reveal where the frustrations and pleasures lie and if there is something particularly fascinating about world-building that invites more fan fiction in certain fandoms than in others. But I am getting ahead of myself. For now, I mostly focus on the texts as I selected them, which includes data and metadata, but not to a degree that would be needed in a macroanalysis.

Additionally, the connection between fan fiction and digital mining becomes apparent in the relationship between accessibility and denial of the same. Jockers suggests that digital text mining presupposes access to digital(ized) texts and that it is exactly this access that leads miners down a forked dark road of copyright law and ideas of originality (173). Fan fiction as a genre is by nature ensnared in these discussions and perhaps leads us to a different understanding of what we mean by originality and accessibility. Fan scholars always caution against using fan works as objects of study, as they were not produced with this particular thought in mind. However, if we do use them as objects of our analysis, as works to be probed with digital tools, we may even open up new ways of thinking about ‘classical’ literature and the way it is approached by traditional literary scholars. There are not only technical but also legal hurdles that need to be overcome for both canonized literature and fan fiction.



For the first part of my project, I used the digital visualization tool RAW (DensityDesign), an open web app that creates custom vector-based visualizations. I chose to use RAW because it has a rather uncomplicated interface, say in comparison to Tableau, and because it offers user-friendly instructions. I would even suggest that RAW is a good model for beginners who want to play around with digital visualizations before going into the more complex structures that a program such as Tableau offers. In order to create these visualizations, I used metadata compiled by DestinationToast on tumblr to shorten the time it would have taken me to figure out how to gather all of the information myself. DestinationToast is kind enough to make Google Spreadsheets available to anyone who is interested in fandom statistics, so I warmly suggest checking out their blog. On a technical note, I was not able to simply download the visual files from RAW. So instead, I took a small detour and grabbed the scalable vector graphics file and plugged it into Adobe Illustrator to clean up the visualization (for example making sure that the names of the fandoms did not overlap and become illegible) to export it as a web-usable PNG file. The extra effort was worth it, looking at the illustrations now.

Figure 1 – Top Fandoms on AO3

The “circle packing” of Figure 1 is used to display hierarchies and compare values.  This visualization is particularly effective to show the proportion between elements through their areas and their position inside a hierarchical structure. I attempted to locate the most popular fandoms on AO3 and their respective top pairings (relationship constellations) from which I have chosen the fan fics for my analysis. Therefore, the larger circles contain the fandoms with most fan fiction stories, Supernatural clearly taking the lead. Network analysis is an excellent way to highlight connections, or in this case to visualize the size and popularity of certain fandoms. Data is presented as nodes in a two-dimensional space, making the connections apparent and providing an overview. Nodes of higher degree, aka more fan works in the fandom, appear larger in connection to other fandoms. The questions that follow might be answered through further research or in this case close-reading. “The combined methods of network analysis based in collaborative markup and automated text mining can aid in prediction but do not remove the need for human reading of the material. … Machines are not replacing the human factor in meaning making, but they can help us to look wider and delve deeper” (Jockers, 130). In this instance, RAW can be utilized to visualize how the corpus of works can be approached and narrowed down to a point where it becomes possible to begin a textual analysis. However, this general overview of the largest fandoms still leaves many questions, regarding different fan fiction genres, unanswered.

Thus, I decided to divide the corpus by its most general definitions: slash, femslash and het. These three categories are used to label fan fic, as it is focused on either male/male (slash), male/female (het), or female/female (femslash) pairings. To stay within the proposed framework of this project, namely exploring digital visualization tools and distant reading techniques, I decided on a case study oriented approach, rather than attempting to provide a larger overview of fan fiction as a genre and its historical and economic background. Therefore, the selection for the case study remains linked to the three aforementioned fan fiction categories.

Figure 2 – Top Slash Pairings
Figure 3 – Top Het Pairings


Figure 4 – Top Femslash Pairings

In Figures 2, 3 and 4, metadata scraped from AO3 was once again used to create visualizations which highlight the most popular pairings in each respective category. Not surprisingly, it becomes clear that slash (male/male) fan fiction is the most popular category, with the television program Supernatural as its flagship. Slash as a fan fic genre has received the most scholarly attention in the past, and perhaps this distant reading approach can later be used to explain why. In the femslash category (female/female), Once Upon a Time is at the forefront of the overview. I was able to anticipate these results, but came up rather empty when trying to predict what would be the most popular fandom and pairing in the het category (male/female).

In the first two visualizations, an alluvial diagram is used to depict which fandoms have the most popular pairings, judged by the number of stories having been written and posted. Not surprisingly Supernatural is once again ahead of the curve. The alluvial diagram in Figure 3 is especially helpful to show how pairings can belong to the same fandom but are constituted through different characters. In Figure 4, the most popular femslash pairings are visualized with a cluster dendrogram and contain the respective numbers of fics posted for each string. Through these analyses, the results of the selection are as follows: Dean Winchester/Castiel of Supernatural for the slash category, Clint Barton/Natasha Romanov of Avengers for het, and Regina Mills/Emma Swan of Once Upon a Time in femslash. From this point onwards, I shifted my focus from the attempt to define the objects of inquiry to a more in-depth usage of digital text mining tools and distant reading approaches.



Jockers points out that literary and scientific analyses differ insofar as one employs “close-reading”, a sort of concentrated effort of reading a text, whereas science uses experimentation to arrive at a conclusion (6). By using Voyant, I aim to combine experimentation and close-reading to look at three select fan fiction texts of more than 371k words as my main corpus. Each text belongs to one of the three categories outlined earlier, and I attempted to investigate how I could approach these fan fics separately and together with specific questions and inquiries. I have chosen these three fan fics because of their popularity on AO3 (hit count), and the first immediate result is that longer fics with more chapters receive decidedly more hits, kudos and comments than shorter stories. But where to go from there? Because the aim of this project is to begin using digital tools and to explore how they might be employed, I decided to start small (and branch out to larger questions in the future).

I have purposefully chosen fan fics that I have either read, not read, or only partially read, both to see if I can glean results without knowing some of the stories and their content and because I have personal preferences as to which fandoms or pairings I engage with. Because of the nature of fan fiction as a fan-produced text that is readily available online, but still remains a rather private matter in the sense that it is archived on specific websites, I have decided not to provide hyperlinks to these particular stories – in accordance with the suggested guidelines of the Organization for Transformative Works and Cultures. Rather, those really interested in reading more can find the stories through a quick search.

The first fic in the het category (with the Clint/Natasha pairing) is called “The man on the bridge” by boopboop. I have found that this particular relationship (Clint/Natasha) oftentimes appears in connection to other pairings, aka in a multi-ship environment, as is the case with “The man on the bridge.” Because of the large cast of Avengers and the relative flexibility of using comic substitutes, this might not be terribly surprising. However, it appears as if the most popular pairing in this fandom is Steve Rogers/Bucky Barnes, a slash couple that gained ground in the wake of the second Captain America film. I still decided to use the multi-ship story in addition to a fic that only contained the Natasha/Clint pairing. After plugging this story into Voyant, it became immediately obvious that Natasha and Clint do not appear nearly as often as Tony, Steve and Bucky.

The “knots” in Voyant represent selected terms across the corpus, and the more they intersect, the stronger the linkage or correspondence between the terms. The three main characters, all of whom are male, seem to appear as the ones with the most dialogue (indicated by the term ‘says’). This is actually not unlike the Marvel movies, where female characters are not only scarce but also very limited in their screen time. They would also clearly fail the Bechdel test, but that is neither here nor there. Perhaps, though, it is not surprising that the fic follows the same structure. In comparison to another fic called “we were emergencies” by gyzym (the one with the highest number of hits that only references Natasha/Clint as the main pairing), the knot visualization appears quite different. That fic is also much shorter, but mentions Natasha more times than the multi-ship fic.


For those interested in learning more about this particular pairing and the texts I have mentioned, simply follow the link provided to the website that allows others to engage with the same set up as me.

I employed the same selective strategy for the Dean/Castiel (Supernatural) slash fic and landed on “Twist and Shout” by Gabriel/standbyme, which blows the other fics out of the water in terms of hits, kudos and comments. After comparing this fic, according to metadata the most popular one, in Voyant to the other two stories, I found that it has the lowest vocabulary density and lexical richness (unique word forms in slash: 6,393 versus het: 8,290 and femslash: 8,352). That does not mean, of course, that this story is any less well-written or engaging than any other; it simply indicates that the popularity of a fic is not necessarily related to a particular writing style. The Avengers fic has the highest vocabulary density, but it is noteworthy that it is also partly a slash fic. The most frequently used word across all three fics is “Cas”, clearly favoring the appearance of the Supernatural character Castiel above any other. It is difficult to make larger assumptions for different fandoms despite the massive size of these samples. Nonetheless, a further analysis of fan fiction’s lexical richness across time and fandoms might reveal how fan fiction has changed from the first generation to write fan fiction online, maybe even starting from fanzines, to how authors are lexically equipped today, and how these changes may have affected the genre and vice versa. For now I am content to leave it at a superficial observation with an eye on future possibilities.
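
For readers who would like to see what sits behind such numbers, here is a minimal sketch in Python of how unique word forms and vocabulary density could be computed. It assumes that vocabulary density means the number of unique word forms divided by the total number of words, and it uses a very simple tokenizer of my own; Voyant's tokenization may differ, so the figures will not necessarily match. The sample sentence is made up for illustration and is not taken from any of the fics.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_density(text):
    """Return (unique word forms, total words, unique/total ratio)."""
    tokens = tokenize(text)
    types = set(tokens)  # each distinct word form counted once
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ratio

# Made-up sample sentence, not a quotation from the corpus.
sample = "Cas looks at Dean. Dean looks at Cas."
unique_forms, total_words, density = vocabulary_density(sample)
print(unique_forms, total_words, round(density, 3))
```

Because this ratio tends to fall as a text grows longer, comparing fics of very different lengths with it calls for some caution.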

The prominent femslash pairing is Regina/Emma from Once Upon a Time and it is interesting to note that the top two fics on AO3 are AU (alternate universe) stories. Because the television program focuses on fairy tales and already promotes the idea of parallel realms, I would suggest that this is an indication of how the primary text influences fan fic production. The same is true when comparing how often and in which context the word “love” is used in all three stories.

While the Cirrus word cloud in Voyant visualizes word frequency, it is also easy to search for specific terms and their contexts. “Somewhere, someone must know the ending” by maleficently contains the word “love” or its derivatives most often (more than twice as often as “sex*”). Because fan fics are mainly focused on pairings, aka partnerships and relationships that might or might not appear in the primary text, this particular word could be of importance. In Once Upon a Time especially, love is a main story element, as fairy tales often end with “true love’s kiss” that breaks every curse. In the fan fic, this relation also appears: the word “love” is frequently used in the context of true love. The slash fic, in comparison, references love for music or food and uses the word more freely to express feelings rather than to explain its meaning, and therefore situates it in different contexts.
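
The kind of search described above can be approximated outside Voyant as well. The following Python sketch counts every word that begins with a given stem (similar to a search for “love*”) and lists each hit with a few words of surrounding context. The function names and the sample text are my own illustrations, not part of Voyant and not quotations from the fics.

```python
import re

def count_prefix(text, prefix):
    """Count words that begin with `prefix`, like a `love*` wildcard search."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    return len(pattern.findall(text))

def keyword_in_context(text, prefix, width=3):
    """Return each matching word with up to `width` words on either side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Ignore surrounding punctuation when checking the start of the word.
        if word.lower().strip(".,;:!?\"'()").startswith(prefix.lower()):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

# Made-up sample text for illustration only.
sample = "She loved the music. True love breaks every curse, and lovers know it."
print(count_prefix(sample, "love"))
for left, word, right in keyword_in_context(sample, "love"):
    print(f"{left} [{word}] {right}")
```

Reading the context lines side by side is what makes it possible to see whether “love” points to true love, to music, or to food in a given story.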

In general, the names of the respective main characters appear most frequently throughout all texts, but frequencies keep changing across text segments as the focus seems to shift from one character to another (especially visible in the Avengers fic). Such observations are easily discernible while using Voyant and seem to be a good step toward further investigation. I have found that some of the Voyant tools are dependent on the number of works in the corpus. It also appears to be of utmost importance to approach a corpus of works with a very specific set of questions in order to use Voyant quickly and effectively and to support a larger analysis. For example, examining word frequencies in Voyant might suggest places where close-reading could be fruitful. As I have just begun using Voyant as a digital text mining tool, my analysis mainly contains superficial observations, rather than in-depth inquiries. Because it appears that fan fiction has yet to be approached in this way on a larger scale, I am merely taking small steps forward without an overarching concept in mind. To conclude this section of my project, I will briefly discuss how Voyant might enhance a textual analysis, how fandom studies have already begun using digital tools for other projects, and where potential future projects might lead.



Bhattacharyya et al. discuss the fragmented nature of digitized texts in their essay “A Fragmentizing Interface to a Large Corpus of Digitized Text: (Post)Humanism and Non-Consumptive Reading Via Features,” as those texts are not always available in complete form. The same is true for many fan fiction stories, as they remain incomplete or simply disappear from their original posts. I do not want to discuss the ephemerality of the internet, but rather point out that stories which are fragmented in one way or another still allow for a distant reading of the text, to later focus or zoom in on a particular feature across a large corpus of works. Not surprisingly, such technological changes (and the discursive practices associated with them) that seem to favor a fragmentation of the reading experience at a sociotechnical level are interpreted as a threat to ‘philosophic or literary’ inquiry (Bhattacharyya et al., 68). In this instance, the reading practice is fragmented both by the digital and serial nature of the text and by the digital analytical tools which are employed. I suggest that it is important to highlight this idea of fragmentation with regard to text mining tools such as Voyant. Voyant, especially, separates and breaks down the text into a series of numbers and visualizations that strip it of a reading experience that traditional scholars might deem irreplaceable. As an example of how digital text mining tools, such as Voyant, might be used as an addition to a fan fiction analysis, I took a closer look at Veerle Van Steenhuyse’s essay “The Writing and Reading of Fan Fiction and Transformation Theory”, in which she provides a close-reading of fan fiction, 32 texts in total. It becomes apparent that some of the information which might have enhanced her analysis is missing. How long are the texts? How many comments have they received? Where are the differences between the texts on a non-textual level (metadata)?

Now that I have engaged more closely with digital visualization tools and the idea of distant reading, I do believe that such an analysis would greatly benefit from a few select additions. “This fluid concept of genre (an understanding constituted by texts fans consume) makes it possible to explain how generic themes can appear in fan fiction texts while their writers are unfamiliar with the genre texts they “belong” to” (Van Steenhuyse, 5). I suggest that such a statement could have been underlined with an attempt at topic modeling, or at the very least with an investigation of how themes in fan fiction correlate with, say, a romance novel. Van Steenhuyse successfully argues that immersion into a fan-produced world might be facilitated by the fact that fans engage with characters and a fictional world they have learned to appreciate over long stretches of time (for example a television program), so I think that a digitally-enhanced comparison between fan fiction and the texts from which it is derived would yield interesting results.

In relation to this, De Kosnik et al. have attempted to bridge the gap between close-reading and distant reading approaches with the “fan data project”, which they outline in “Watching, Creating, and Archiving: Observations on the Quantity and Temporality of Fannish Productivity in Online Fan Fiction Archives.” By scraping the two largest fan fiction archives, FanFiction.net and AO3, for metadata, the fan data project has compiled a large databank of numbers to be interpreted. The aim of this project is to investigate how and why certain media products, television programs, movies, or books, hold fans’ interests longer than others. Fan fiction serves as an approachable measure simply because it consists of and exists in relation to metadata that can be effectively evaluated by data analysis. Such an analysis might also be possible on a smaller scale, perhaps by tracing the emergence and growth of a single media fandom for a certain period of time. The outcomes can then once again be presented through digital visualization tools, as I have done earlier on a very small scale. De Kosnik et al. argue that it must be acknowledged that fan production, as a realm of online activity, is large and increasing and will therefore be a rich site of data analysis for years to come (160). Perhaps it is even feasible to conduct interviews with authors to add to data analysis. But that is another thought for a possible project in the future. There certainly are different avenues of using digital tools, either to scrape a website or to get a closer look at the corpus of works, which allow for varying levels of analysis. I introduced this fan data project to show that fan communities and fan scholars are indeed interested in numbers and statistics, and that digital text mining tools might be another step toward a digital humanities approach.

Ultimately, I will continue using Voyant and other visualization tools to see how my own research might benefit from distant reading approaches. The more I engage with these particular tools, the easier it will become to obtain results and perhaps make interesting new discoveries through the texts I examine. This project was only the first step of a much longer journey, a work in progress that will never quite end.


Baker-Whitelaw, Gavin. “Unpacking the unofficial fanfiction census.” Daily Dot. Web. July 15 2013. URL:

Bhattacharyya, Sayan, Peter Organisciak, and J. Stephen Downie. “A Fragmentizing Interface to a Large Corpus of Digitized Text: (Post)Humanism and Non-Consumptive Reading Via Features.” Interdisciplinary Science Reviews 40.1 (2015): 61-77. Web.

Burke, Aisling. “Metadata, Searching and Retrieval on: FanFiction.Net, LiveJournal and Archive of Our Own.” Web. N.d. URL:

De Kosnik, Abigail, et al. “Watching, Creating, and Archiving: Observations on the Quantity and Temporality of Fannish Productivity in Online Fan Fiction Archives.” Convergence 21.1 (2015): 145-64. Web.

DestinationToast. “ToastyStats: Fandom statistical analyses”. Web. 04 March 2016. URL:

Drouin, Jeffrey. “Close- and Distant-Reading Modernism: Network Analysis, Text Mining, and Teaching the Little Review.” The Journal of Modern Periodical Studies 5.1 (2014): 110-35. Web.

Jockers, Matthew Lee. Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press, 2013.

Khadem, Amir. “Annexing the Unread: A Close Reading of “distant Reading”.” Neohelicon 39.2 (2012): 409-21. Web.

Rubin, Victoria and Vanessa Girouard. “Comparative Stylistic Fanfiction Analysis: Popular and Unpopular Fics across Eleven Fandoms.” Research Gate. Web. 09 July 2014. URL:

Van Steenhuyse, Veerle. “The Writing and Reading of Fan Fiction and Transformation Theory.” CLCWeb: Comparative Literature and Culture 13.4 (2011). Web.
