Michał Choiński Talks about Stylometry

Michał Choiński’s latest book, Southern Hyperboles: Metafigurative Strategies of Narration, melds an innovative use of rhetorical theory with fine-grained analysis of literary texts. Here, he talks about stylometry, a form of quantitative analysis applied to literary works that helped prove the authorship of Harper Lee’s To Kill a Mockingbird and Go Set a Watchman.

Photo by Aaron Hare

In July 2015, the publication of Go Set a Watchman rekindled global interest in Harper Lee. As her only longer publication apart from To Kill a Mockingbird, the novel made headlines all over the world and sparked controversy as Alabama police investigated charges of elder abuse toward Lee (eighty-eight years old at the time of publication). Leaving aside the artistic merits of the book, which have been questioned by many, it is fascinating to see how the story surrounding Lee’s writing of the novels and the validation of their authorship took on a life of its own. Two years ago, I was given the opportunity to be part of a research team conducting a literary investigation of the authorship of the two books.

Harper Lee’s To Kill a Mockingbird and Go Set a Watchman are an odd couple. The latter book is the first version of the former, but it reads like a sequel in terms of plot timeline. After Lee submitted her manuscript to J. B. Lippincott in 1957, Tay Hohoff, a cat-loving, chain-smoking editor, famously said that the novel looked more like a “series of anecdotes than a fully conceived novel.” After a painful process of revision, the manuscript became Go Set a Watchman, which subsequently metamorphosed into the To Kill a Mockingbird we all know today. Yet, when the book became a blockbuster, rumors surfaced that Lee had relied heavily on the guidance of her editor, as well as her childhood friend Truman Capote—by that time already an experienced writer and a character in the final version of the novel. Even now, as argued by Charles Shields, “fifty years after To Kill a Mockingbird appeared, the rumor persists that Nelle Harper Lee didn’t write the novel herself. Truman Capote, so goes the whisper campaign, wrote large portions—or maybe all of it.” Capote never openly denied these speculations; interestingly, he and Lee parted ways when he failed to properly acknowledge her contributions to the manuscript of In Cold Blood. The recent publication of Go Set a Watchman has created the opportunity to put said speculation to rest with stylometric authorship attribution.

Stylometry, the computer-assisted quantitative analysis of linguistic features, is now routinely applied to problems of plagiarism and uncertain authorship. Like many other stylometric tools, the Stylo package, developed by Maciej Eder (Polish Academy of Sciences), Mike Kestemont (University of Antwerp, Belgium), and Jan Rybicki (Jagiellonian University, Kraków), made it possible to establish the “authorial fingerprints” of the possible candidates and to trace them in the text of To Kill a Mockingbird. The package simply counts the frequencies of the most frequently used words; this was the approach Frederick Mosteller and David Wallace used successfully in 1964 to determine the authorship of some of the Federalist Papers. The same approach has also been used in the latest attributions of Shakespearean texts, and to discover the identity of the mysterious Italian author who writes under the pen name of Elena Ferrante.
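For readers curious about what "counting the most frequent words" looks like in practice, here is a minimal Python sketch. It is not the Stylo package (which is written in R) and not the code used in our study. It assumes one common stylometric measure, Burrows's Delta, as the way to compare texts, and every function name and parameter in it is hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_frequent_words(texts, n_words):
    """Return the n_words most frequent words across the whole corpus."""
    counts = Counter()
    for text in texts.values():
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n_words)]


def delta_distances(texts, n_words=100):
    """Burrows's Delta for every pair of texts (illustrative sketch).

    texts maps a label to the full text. Each word's relative frequency
    is z-scored across the corpus; Delta is the mean absolute difference
    of the z-scores of two texts. Smaller values mean more similar style.
    """
    vocabulary = most_frequent_words(texts, n_words)
    names = list(texts)

    # Relative frequency of each vocabulary word in each text.
    freqs = {}
    for name in names:
        tokens = tokenize(texts[name])
        counts = Counter(tokens)
        freqs[name] = [counts[word] / len(tokens) for word in vocabulary]

    # Standardize each word's frequency across the corpus (z-scores).
    zscores = {name: [] for name in names}
    for i in range(len(vocabulary)):
        column = [freqs[name][i] for name in names]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for name in names:
            zscores[name].append((freqs[name][i] - mean) / sd if sd else 0.0)

    # Mean absolute difference of z-scores for each pair of texts.
    return {
        (a, b): sum(abs(x - y) for x, y in zip(zscores[a], zscores[b]))
        / len(vocabulary)
        for a, b in combinations(names, 2)
    }
```

Given a dictionary of labeled texts, `delta_distances` returns one number per pair of texts. Function words such as "the", "and", and "of" dominate the vocabulary, which is the point: authors use them habitually and without much conscious control.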

In our study (an early version was done by Eder and Rybicki for the Wall Street Journal), we visualized the patterns of similarity and difference among the more or less plausible rivals to Harper Lee in the authorship of her bestselling novel (and several other authors of the American South for background). The diagram below, called a network analysis, shows the strength of similarity between texts by those writers. The closer two texts are, and the thicker the line that joins them, the more resemblance there is between them.

Figure: network analysis of stylometric similarity between the texts. Reproduced by permission of the Mississippi Quarterly.

Given the strength of the connection between Lee’s Watchman and Mockingbird (the thick light blue line), we have absolutely no doubt that the two texts were authored by the same person. Neither Tay Hohoff, with her Cats and Other People and A Ministry to Man, nor Truman Capote, with his various works, comes close to Lee. This visualization also shows a very frequent result in this type of “reading by counting”—quite often, an author’s stated preference for another is reflected in their relative similarity. This is the case with Lee and one of her favorite writers, Eudora Welty.
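How pairwise distances turn into a network of thicker and thinner lines can be sketched roughly as follows. This is a simplified illustration in Python, not the procedure behind the published diagram: it assumes that each text is linked to its few nearest neighbors, with the closest neighbor receiving the heaviest link, and that links pointing both ways add up. The function name and its parameters are hypothetical.

```python
from collections import defaultdict


def neighbor_network(distances, n_neighbors=3):
    """Build weighted, undirected edges from pairwise distances.

    distances maps (text_a, text_b) pairs to a dissimilarity score.
    Each text links to its n_neighbors closest texts; the nearest gets
    weight n_neighbors, the next one less, and so on. Weights from both
    directions are summed, so mutual nearest neighbors get thick edges.
    """
    names = sorted({name for pair in distances for name in pair})

    def dist(a, b):
        # Pairs are stored once, in either order.
        return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

    edges = defaultdict(int)
    for name in names:
        others = sorted(
            (other for other in names if other != name),
            key=lambda other: dist(name, other),
        )
        for rank, other in enumerate(others[:n_neighbors]):
            edges[tuple(sorted((name, other)))] += n_neighbors - rank
    return dict(edges)
```

The resulting edge weights could then be handed to any graph-drawing tool, which would place strongly linked texts close together and draw their connections thicker.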

Stylometry also makes it possible to look for authorial traces within a single work authored or edited by more than one hand. Eder’s “rolling classify” technique tests the authorial signal of consecutive samples of a novel against that of individually written texts by the candidate authors.
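The rolling idea can be illustrated with a short Python sketch. It is a deliberate simplification and not Eder's implementation: it assumes a plain nearest-profile comparison (the sum of absolute differences in word frequencies) in place of the classifiers a real study would use, and the function names, window size, and step size are all hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(tokens, vocabulary):
    """Relative frequency of each vocabulary word in a list of tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def rolling_classify(target, references, vocabulary, window=5000, step=500):
    """Label consecutive, overlapping samples of target by nearest author.

    references maps each candidate author to a text they wrote alone.
    Returns a list of (start_token_index, author) pairs, one per window.
    """
    ref_profiles = {
        author: profile(tokenize(text), vocabulary)
        for author, text in references.items()
    }
    tokens = tokenize(target)
    results = []
    for start in range(0, max(len(tokens) - window, 0) + 1, step):
        sample = profile(tokens[start:start + window], vocabulary)
        # Assumed simplification: pick the author whose word-frequency
        # profile has the smallest total absolute difference.
        best = min(
            ref_profiles,
            key=lambda author: sum(
                abs(x - y) for x, y in zip(sample, ref_profiles[author])
            ),
        )
        results.append((start, best))
    return results
```

Plotting the author label of each window against its position in the text gives a picture of where in a book each candidate's signal is strongest.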

Figure: rolling classification of To Kill a Mockingbird. Reproduced by permission of the Mississippi Quarterly.

In the diagram above, three authorial signals are juxtaposed against one another on the axis of story development for To Kill a Mockingbird. The horizontal line represents the unfolding text and consecutive chapters. While—as can be seen from the previous analysis—the overall authorship of both books leaves no doubt, the authorial fingerprints in particular sections of the book seem to belong either to Harper Lee (in green), dominating most of the text, or, in some fragments, to Tay Hohoff (in blue). While this mixture is quite a natural effect of the women’s two years of collaboration, the presence of Truman Capote’s signal (in red) in the opening chapters of To Kill a Mockingbird is somewhat surprising. Could this residue be the outcome of their shared narrative of their upbringing in Monroeville, Alabama?

Did Capote indeed help his younger colleague a bit with the very opening sections of the book? This stylometric study encourages us to give more thought to Capote’s role in the early stages of Lee’s writing endeavor. Of course, the three writers concerned are no longer among us, so we will never know for sure what transpired. With stylometric analysis, some speculations are put to rest, while others are born.

Read more about the stylometric study of Harper Lee in Michał Choiński, Maciej Eder, and Jan Rybicki’s article, “Harper Lee and Other People: A Stylometric Diagnosis,” Mississippi Quarterly 70 (3/2017), 355–74.


Available Now

In Southern Hyperboles: Metafigurative Strategies of Narration, Michał Choiński confronts the often paradoxical and excessive elements of southern literature, focusing on dominant narrative modes and representation strategies in works produced from the early 1930s to the late 1950s. With renewed attention to renderings of the gothic and grotesque, Choiński argues that modernist literature from the U.S. South often deploys the trope of hyperbole, which escalates contrasts and disrupts the sense of the normal.

Michał Choiński is assistant professor in the Institute of English Studies at Jagiellonian University in Kraków, Poland. His previous books include The Rhetoric of the Revival: The Language of the Great Awakening Preachers. In 2021, he will work as a senior Fulbright Fellow at Yale University.

Follow LSU Press on Twitter and Facebook.