analyzing symbolic data: a sociology web debate

How should symbolic data like text and images be analyzed? This question is now front and center, brought there by the publication of Biernacki’s Reinventing Evidence. It matters because symbolic data continue to proliferate in advanced capitalism, and how to capitalize on this proliferation is a concern of both contemporary social science and private business.

Biernacki’s book strives to be an empirical critique of an empirical approach to handling symbolic data, namely, the coding of textual data. It is empirical because it examines actual events (published studies built on coded textual data, whose findings the author seeks to show are flawed) and uses these as its “data” to make the argument.

Researchers, many of whom want to take advantage of the proliferation of symbolic data, are faced with a question of method. What does my research question call for? Should I turn symbolic data like text and images into numerical representations, allowing me to study oodles of data? Or should I keep symbolic data “whole,” deducing meanings one by one, or rather, case by case?

So these are the two options: the numerical approach and the wholistic approach.

Biernacki’s book argues in favor of the wholistic approach. More precisely, it tries to demonstrate the wholistic approach’s superiority by showing that particular empirical studies using the numerical approach produced results that the wholistic approach fails to confirm or even rejects. The central accusation, then, concerns validity: in key instances, the numerical approach lacks it.

The author is criticizing an approach that, if not widely used, is widely believed to be legitimate in sociological research. Among sociologists who spend time on the internet, perhaps this legitimacy is even more ingrained.* So, structurally, a broad audience for the book would (or will) require the engagement of those with already formed beliefs and incentives to disagree. This book is meant for debate, and it will be engaged by an audience tilted against its argument from the start.

Right now on the internet there are two main critics.

One is Fabio Rojas. His is the more moderate of the two. At Orgtheory, he writes:

Richard Biernacki claims that coding textual materials (books, speech, etc) is tantamount to committing gross logical errors that mislead social scientists. Overall, I think this point is wrong but I think that Reinventing Evidence does a great service to qualitative research by showing how coding of texts might be critiqued and evaluated.

Rojas then defends coding by pointing to the replicability of the data it produces: other researchers can scrutinize the very same data. This is an important point, except that the argument Biernacki presents in the book concerns “validity,” namely that these very replicable data are producing wrong or flawed insight.

So, Rojas, fairly I think, spends his next paragraph on validity. And, even though he says he disagrees with the thesis of the book, when it comes to “validity,” Rojas concedes a lot of ground.

Assuming that Biernacki reports his results correctly, he’s persuaded me that we need better standards for coding text. For example, he finds that Bearman and Stovel use an abbreviated version of the memoir – not the whole thing. Big problem. Another issue is how the network of text is interpreted. In traditional social network analysis, centrality is often thought to be a good measure of importance. Biernacki makes the reasonable argument that this assumption is flawed for texts. Very important ideas can become “background,” which means they are coded in a way that results in a low centrality score. This leads to substantive problems. For example, the Nazi mentions anti-semitism briefly, but in important ways. Qualitatively we know it is important, but the coding misses this issue.

I would characterize Rojas’s position as follows: previous attempts to code symbolic data (that is, to turn text and images into numerical representations) have indeed been flawed, but this is a problem of poor execution, not necessarily of an irredeemable method or flawed methodological theorizing.

The second critique in one sense appears to give less ground, because it is more dismissive and antagonistic in spirit. Andrew Perrin, at Scatterplot, concluded rather harshly (especially compared to Rojas) that:

In short, while there are some apt points in the book, in general it is pompous in style, muddled in evidence, vastly overstated in scope, mean-spirited in approach, and epistemologically indefensible.

He also points to some academic politics behind the book, or as he calls it, the “controversy surrounding its publication.” I cannot speak to any of that. But on the central question of validity, Perrin concedes much the same ground as Rojas. On Biernacki’s argument that “each of the three studies made problematic interpretive choices,” he writes:

Within each replication, there are numerous examples of problems Biernacki locates with the sampling or analytic decisions. One example among many: “For all we know, had Bearman and Stovel calculated the role of anti-Semitism in comparison to variables defined more concretely, anti-Semitism might pop out as ‘high’ in power centrality” (44). There is an extended discussion of selectivity in quote selection by Griswold, who “expounded on how several reviews treated the Trumper ‘scene’ in a novel, whereas most of her preferred examples barely alluded to it” (128).

Biernacki’s criticisms here seem generally believable, but given the overall hostile tone of the book and the fact that there is no response from the authors, I’d tend to withhold judgment on the specifics. Evans has provided a comprehensive, and IMHO convincing, response in the prior book.

[bold is mine]

Even taking its many qualifiers into consideration, that last paragraph makes it apparent that Perrin thinks Biernacki made a solid argument on validity, and that the argument deserves further response.

I hope this further response takes place on the web.

In future posts I will discuss other engagements with Biernacki’s argument, including the comments underneath the blog posts. If there is anything to add after that, I will review the book in light of my own theoretical dispositions.

*In terms of my own dispositions, and in full disclosure, I tend to see coding as obviously valuable, but its value depends on the research question, and specifically on the unit of analysis. Coding oodles of data makes sense when trying to generalize to populations, which encompass and cross multiple contexts. If one is studying a particular case, however, keeping symbolic data “whole” and deducing meaning straight from the source leads, in my experience, to the best (i.e., most valid) insight.

This entry was posted in book reviews, contextualized vs aggregative data, hard data, Media and knowledge, sociology, Symbolic data, symbolic vs hard data.

6 Responses to analyzing symbolic data: a sociology web debate

  1. andrewperrin says:

    Thanks for your thoughtful discussion. However I think you’ve missed a central point in Biernacki’s book, and therefore in my review. Specifically, Biernacki abstracts from the specific failures of the three studies to a generic, epistemological critique of coding. Essentially: because the three studies considered showed flaws in their use of coding, coding itself must be flawed, not just in its implementation but in its very theory. This claim is certainly not defensible based on the book, and it appears from your review that you, too, reject it, in which case you must reject the claims made in the book.

  2. Thomas says:

    @Andrew: I think Biernacki’s point is a bit stronger than you suggest. He’s not saying that three arbitrary failures show that there is something wrong with coding as a method. He’s arguing that three *celebrated* cases of coding, which reveal obvious flaws when scrutinized, show that there’s something amiss in the coding community. That’s why he talks about coding as a “ritual”. It’s not just that the three studies “showed flaws” when he happened to look at them; it’s that these flaws had not been noticed by peers.

    • markaustenwhipple says:

      I think the distinction Thomas makes is important. Andrew Perrin appears to suggest that the three works aren’t a large enough sample to generalize to all coding efforts. (For what it is worth, I agree.) But qualitative researchers become skilled at picking out informants. For example, if I can locate the right participant, I can get the equivalent of five or more in-depth interviews by doing just one: people who ask questions themselves, who have varied contacts, who are themselves curious, etc. Such “informants” are an underrated way of increasing efficiency without surrendering any validity.

      My point is these three works generally come off to me (though I am not the best judge) as especially important, or “celebrated,” as in Thomas’s formulation. In other words, these books potentially possess qualities that are “informant”-like: they represent significant parts of the whole. When multiple informants are telling me similar things is when I begin to get confident in the validity of the hypothesis.

  3. markaustenwhipple says:

    I think you are right. I do reject the most literal reading of Biernacki’s argument.

  4. andrewperrin says:

    The issue, in my mind, is about the extrapolation from three exemplary cases (leaving aside, for the moment, whether these three really are so exemplary as Biernacki claims). Biernacki aims to make a theoretical point about the potential and pitfalls of coding in general. Even if we grant his wholesale dismissal of these three studies, we would still need to accept his theoretical claim that chunks of text in general cannot be interpreted outside their specific textual and cultural contexts. That claim does not follow from the dismissal of the three studies, and IMHO is utterly indefensible. It is certainly not demonstrated in Reinventing, and to the extent that the specific criticisms of each study are correct, these criticisms undermine the claim that coding itself should be rejected on theoretical grounds.

  5. Pingback: A second post on the comments around Biernacki’s Reinventing Evidence | price of data
