Empirical Sociology (parts one, two, and three)

(These are my working thoughts on empirical sociology. They should not be taken as generally representative of how sociologists think about their own field. Nor should they be seen as anywhere near fully worked out.)


In my view there are two dominant methods of ‘being empirical’ in sociological analysis, and they are distinguished most fundamentally by the kind of data they study: either variables or symbols. Variables are things like race, class, gender, age, and education, studied on the idea that with highly developed skills you can control for them and isolate the relative significance of each variable you are able to measure. Variable data are meant to generalize to populations from randomly selected samples of the population: to see, over time and across differences, how a set of measurable factors tends to create a general reality of human action.

In contrast, symbolic data provide relatively less ability to generalize to large populations and across differences. Symbolic data are meant to uncover specific and local contexts creating a reality of human action under certain cases. Symbolic data are things like text, images, pictures, graphs, code — summed up by the word ‘knowledge.’

Both kinds of data are accessible to social scientific analysis and can justifiably be called, at the moment, a basis of empiricism.

Just for fun, I will point out one advantage, unique to symbolic data, in the quest to be empirical: they allow for the immediate observation of measurable instances of actual activity; a symbol is a direct representation of a human behavior. The proliferation of symbolic data in advanced societies provides a remarkable infrastructure for an empirical behaviorism, whether in the vein of the American pragmatists Dewey, Mead, and Mills, or in the contemporary multidimensionality of the social theory heavyweights Alexander, Habermas, Bourdieu, and Giddens. Even Foucauldian categories like power/knowledge, biopower, and governmentality can today be used to analyze data, though I continue to read Foucault’s categories as philosophical fictions, and Bourdieu’s habitus, symbolic power, and cultural capital as the analytical categories most readily applicable to the advanced society of the US.


Let me acknowledge that these ideas are not empirical sociology as empirical sociology is predominantly understood. The prevailing paradigmatic view understands empiricism as the study of random samples generalizable to the larger population, using data that are replicable and static across time. This view of ‘being empirical’ as dependent upon being “generalizable” is represented in the prolific sociology blogger Fabio Rojas’s recent comments on how to make ethnographic data more empirical:

“Ethnography is generalizable – just not within a single study…. The solution? Increase the number of field sites. Of course, this can’t be done by one person. However, there can be teams. Maybe they aren’t officially related, but each ethnographer could contribute to the field of ethnography by randomly selecting their field site, or choosing a field site that hasn’t been covered yet.

Thus, over the years, each ethnographer would contribute to the validity of the entire enterprise”

This is not the only way to be empirical in sociological analysis. But I’ll get to that in a moment. First, Rojas misuses the word “validity” in the last sentence. His hypothetical has nothing to do with validity. He is wondering aloud how to bring random sampling into ethnographic analysis. It is a question of how to use ethnography to study populations. His last sentence should read, “over the years, each ethnographer would contribute to the generalizability to populations of the entire enterprise.” I don’t think pointing out the misuse of the word “validity” here is a mere nitpick.

Regardless, using random samples and high numbers of cases to generalize to populations is not the only way to be empirical. In my case, being empirical means understanding that:

1. The most important data are often unreplicable. Data are meanings (of objects) that shift and hide and reappear and allow for the maintenance of some level of an identifiable portfolio (i.e., databases).

2. At best, data produce hypotheses. Data cannot, however, definitively answer hypotheses. They can provide hypothetical, or theoretical, answers to hypotheses.

3. Theories are analytical categories that (a) come before data, shaping the research question and methods, and (b) come after data, as applications to data to make sense of them. The researcher must be reflective upon the distinction between these two aspects of theory — theory both biases the research and allows truth to emerge from data.

4. Most data are text and/or symbols: the transcripts of in-depth interviews (and quick comments); media; code; pictures, images, graphs.

None of the above is meant as criticism, for standpoints can change and are contingent upon circumstances. There is no doubt in my mind that Rojas’s statement reflects a version of legitimate science. My reason for presenting an alternative version is to explain my assertion that what I do is ‘empirical.’ I think all researchers need to be continually aware of their own claims to objectivity, the basis of which changes over time. All the above comes from a sociology of knowledge standpoint in which culture, politics, and economics represent fields of textual and symbolic performance in interaction with mediated audiences.


Social science depends on the theory of data underlying the enterprise. The types of data that prevail, and get deemed by collectors and clients as valuable and insightful, correspond to the social forces that, at least according to our best accounting, move events, shape action, and create reality.

The types of data that, at this socio-historical moment, prevail and proliferate include social-psychological, knowledge-based symbols, text, and images: qualitative data. The valuable insight is of the object’s image and context, not of the essence of the object itself. If I could know anything about a social science research question, I would need to know the images of objects and text in relation to the general beliefs of people. The types of data that lose value and insight in the moment are the kind of static, generalizable data that lead to a replicable version of causation, rather than a valid one. Causation can never be valid, only theoretically imagined. Even in his best accounting, the causation-positivist makes a leap when he makes a statement of explanation or draws any kind of conclusion. Far better than causation is a picture of a context, an empirical drawing. Such a picture can not only explain the past but also help you anticipate the future, by showing the range of your interactions with institutions, social structures, and other actors.

How to ‘be empirical’ is a key topic; more parts to this inquiry are to come.

Parts one, two, and three were originally published here, here, and here.
