How should symbolic data like text and images be analyzed? This question is now front and center, instigated by the appearance of Biernacki’s Reinventing Evidence. It is an important question because symbolic data continue to proliferate in advanced capitalism, and how to capitalize on this proliferation is a concern of both contemporary social science and private business.
Biernacki’s book strives to be an empirical critique of an empirical approach to handling symbolic data, namely, the coding of textual data. It is an empirical work because it examines actual events — the publication of findings the author seeks to show are flawed because they rest on the coding of textual data — and uses this “data” to make its argument.
Researchers, many of whom want to take advantage of the proliferation of symbolic data, are faced with a question of method. What does my research question call for? Should I turn symbolic data like text and images into numerical representations, allowing me to study oodles of data? Or should I keep symbolic data “whole,” deducing meanings one by one, or rather, case by case?
So, these are the two options: the numerical and the wholistic approach.
Biernacki’s book argues in favor of the wholistic approach. Actually, it tries to demonstrate its superiority, by showing that particular empirical studies using the numerical approach have produced results that the wholistic approach fails to confirm and even rejects. The central accusation, then, is on the level of validity: the numerical approach, in key instances, lacks it.
The author is criticizing an approach that, if not widely used, is widely believed to be legitimate in sociological research. Among sociologists who spend time on the internet, this legitimacy is perhaps even more ingrained.* So, structurally, a broad audience for the book would (or will) require engaging those with already formed beliefs and incentives to disagree. This book is meant for debate, and it will be engaged by an audience tilted against its argument from the start.
Right now on the internet there are two main critics.
One is Fabio Rojas, the more moderate of the two. At Orgtheory, he writes:
Richard Biernacki claims that coding textual materials (books, speech, etc) is tantamount to committing gross logical errors that mislead social scientists. Overall, I think this point is wrong but I think that Reinventing Evidence does a great service to qualitative research by showing how coding of texts might be critiqued and evaluated.
Rojas then proceeds to defend coding by pointing to the replicability of the data that coding produces: other researchers can put under scrutiny the very same data. That is an important point, except that the argument Biernacki presents in the book is on the level of “validity” — that these very replicable data are producing wrong or flawed insight.
So, Rojas, fairly I think, spends his next paragraph on validity. And, even though he says he disagrees with the thesis of the book, when it comes to “validity,” Rojas concedes a lot of ground.
Assuming that Biernacki reports his results correctly, he’s persuaded me that we need better standards for coding text. For example, he finds that Bearman and Stovel use an abbreviated version of the memoir – not the whole thing. Big problem. Another issue is how the network of text is interpreted. In traditional social network analysis, centrality is often thought to be a good measure of importance. Biernacki makes the reasonable argument that this assumption is flawed for texts. Very important ideas can become “background,” which means they are coded in a way that results in a low centrality score. This leads to substantive problems. For example, the Nazi mentions anti-semitism briefly, but in important ways. Qualitatively we know it is important, but the coding misses this issue.
I would characterize Rojas’s position to be that previous attempts to code symbolic data — to turn text and images into numerical representations — have indeed been flawed, but that this is a problem of poor execution, not necessarily of irredeemable methods or methodological theorizing.
The second critique in one sense appears to give less ground, because it is more dismissive and antagonistic in spirit. Andrew Perrin, at Scatterplot, concluded rather harshly (especially compared to Rojas) that:
In short, while there are some apt points in the book, in general it is pompous in style, muddled in evidence, vastly overstated in scope, mean-spirited in approach, and epistemologically indefensible.
He also points to some academic politics behind the book, or as he calls it, the “controversy surrounding its publication.” I cannot speak to any of that. But on the central question of validity, Perrin concedes ground similar to Rojas. On Biernacki’s argument that “each of the three studies made problematic interpretive choices,” he writes:
Within each replication, there are numerous examples of problems Biernacki locates with the sampling or analytic decisions. One example among many: “For all we know, had Bearman and Stovel calculated the role of anti-Semitism in comparison to variables defined more concretely, anti-Semitism might pop out as ‘high’ in power centrality” (44). There is an extended discussion of selectivity in quote selection by Griswold, who “expounded on how several reviews treated the Trumper ‘scene’ in a novel, whereas most of her preferred examples barely alluded to it” (128).
Biernacki’s criticisms here seem generally believable, but given the overall hostile tone of the book and the fact that there is no response from the authors, I’d tend to withhold judgment on the specifics. Evans has provided a comprehensive, and IMHO convincing, response in the prior book.
[bold is mine]
Even taking into consideration its many qualifiers, that last paragraph makes it apparent that Perrin thinks Biernacki made a solid argument on validity — and that the argument deserves further response.
I hope this further response takes place on the web.
In future posts I will discuss other engagements with Biernacki’s argument, including the comments underneath the blog posts, and, if there is anything left to add after that, I will review the book in light of my own theoretical dispositions.
*In terms of my own dispositions, and in full disclosure, I tend to see coding as obviously valuable, but dependent on the research question. The value depends on the unit of analysis. Coding oodles of data makes sense when trying to generalize to populations, which encompass and cut across multiple contexts. If one is studying a particular case, however, keeping symbolic data “whole” and deducing meaning straight from the source, in my experience, leads to the best — i.e., most valid — insight.