Is Big Data a Bubble?

The Case for Little Data

Data-based business models are here to stay, I really believe that, so please do not take what I am about to say as something other than constructive criticism: Big Data is at risk of suffering over-valuation, that is, of becoming a bubble. The main problem is that Big Data are too abstract for most economic actors to extract maximum value from them. Most economic actors (say, all the managers of all the particular Target stores, rather than the one CEO of the whole Target Corporation) operate in non-aggregative contexts: little neighborhoods within big data streams. One Target store is not the same as any other. Pinpointing the parameters of these localized neighborhoods, pinpointing the relevant data, ignoring the irrelevant data, and seeing patterns in small numbers of cases: this is the future of data analysis. Moving between big and little data is probably the better way to put it.

Two examples in other fields: First, I’m preparing to do some analysis of the NBA playoffs, which start Saturday. I’ll blog about this analysis here. Anyway, as I research, I am struck by the correlative power of two stats: offensive and defensive efficiency. The top ten teams ranked by offensive efficiency (points scored per 100 possessions) are in the playoffs. Defensive efficiency is even more correlative: the top thirteen teams in def eff are in the playoffs. So far, so good. The problem is, the stat aggregates too much data: all 66 games played in the regular season. I want more localized data: I want the measure of efficiency ONLY against other playoff teams. I want to make sure a team like the 76ers (third in def eff) is not padding this stat against non-playoff teams. In this case, I want less data, not more.
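To make the "less data" idea concrete, here is a minimal sketch in Python of the comparison I have in mind. Everything in it is an assumption for illustration: the `Game` record, the `efficiency` function, the team labels, and the numbers are made up, and I am treating possessions as equal for both teams within a game.

```python
from dataclasses import dataclass


@dataclass
class Game:
    opponent: str          # hypothetical team label
    points_for: int
    points_against: int
    possessions: float     # assumed equal for both teams in a game


def efficiency(games, opponents=None):
    """Return (offensive, defensive) efficiency in points per 100 possessions.

    If `opponents` is given, only games against those teams are counted.
    """
    if opponents is not None:
        games = [g for g in games if g.opponent in opponents]
    possessions = sum(g.possessions for g in games)
    if possessions == 0:
        raise ValueError("no possessions in the selected games")
    offensive = 100 * sum(g.points_for for g in games) / possessions
    defensive = 100 * sum(g.points_against for g in games) / possessions
    return offensive, defensive


# Made-up games for one team; "B" stands in for a playoff opponent.
games = [
    Game("A", 100, 90, 100),
    Game("B", 95, 105, 100),
    Game("C", 110, 80, 100),
]
playoff_teams = {"B"}

season = efficiency(games)                     # the aggregate stat
vs_playoff = efficiency(games, playoff_teams)  # the localized stat
```

With these invented numbers the team looks strong on defense over the whole season but gives up more than it scores against the one playoff opponent, which is exactly the kind of padding I want to check for.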

Second example: A debt collector I recently interviewed for a research project described preparing to talk to a debtor in two parts. First, you need a statistical sense of how to approach the call: where does the debtor live (a thousand-dollar bill is one thing in NYC and another thing in Alabama); what is the debtor’s credit rating; is the debtor employed; etc. You reflect upon these data very quickly GOING INTO the call so that you establish for yourself a baseline strategy. However, the collector reported that these pre-conceived strategies based on variable data almost invariably spring leaks as soon as the call begins. All debtors are different, all phone conversations are different, and so on. So, second, according to the collector, you have to set aside the baseline strategy and respond to the immediate data of the call: what the debtor is saying, and how the debtor is saying it, tells you how to approach the call, whether or not it squares with the pre-conceived notions you had, based on data, going into the call. Use the big data as a starting point, but, for a successful collection call, operate according to the little data provided by the n of 1.

Similarly, the Target managers need to see their stores as both part of the aggregate, and as an n of 1.

Bottom line: data-based, pre-conceived notions matter. Big data matter. But so does the process of contextualizing these notions as you move through the operation collecting little data.

Debt collection, basketball analysis, and retail are all both abstract and contextual. Each requires skill at using big and little data, and at moving between the two.

