Not denying the data

A friend of mine and I recently discussed a comparison between how researchers research and how programmers program. The focus was on “finding effects” and “debugging.” I excerpt his comment here:

Most programmers are poor debuggers who guess at fully-formed theories about why their programs are misbehaving (e.g., “could it be that, in module A, X is happening when Y?”) and then try to prove them right, learning very little–or often nothing, because they forget the random little facts they have learned–each time they guess wrong. Truly good debuggers try to prioritize the experiments that can most easily yield the most information (e.g., “is there a quick way to establish whether the problem is in module A or module B?”). They are relatively rare.

I really like this point. Many social scientists have a theory and try to figure out why they’re not finding the results that they “know” they should find.

This isn’t necessarily a bad thing. Eric Leifer has a classic piece about Denying the Data (Gated, sigh), one that presages Judea Pearl’s recent and very good The Book of Why. Their point is that data can tell you very little without a theory. (Pearl is excellent on how the correlational structure of observed data can never tell you about counterfactual outcomes, which are the building blocks of virtually all causal theories.) The irony, though, is that even as many researchers espouse a sometimes-misplaced belief in listening to their data, they don’t listen to it. People often skip any theoretical integration of the empirical “anomalies” they find in their data. Yet these are important! Indeed, these are what Thomas Kuhn pointed to as the fuel of new theories–of paradigm shifts, in his now-overused parlance.

I obsess over this point because one of my “tricks” or heuristics when doing research is to pay attention to how the limitations of data or empirical methods can subtly bias our theorizing. There’s a good enough encapsulation of this idea in Dana Mackenzie’s The Universe in Zero Words, which I re-read yesterday, that it’s worth quoting at length. Mackenzie is discussing one of the roots of chaos theory, Edward Lorenz’s attempted simulation of weather using a system of non-linear differential equations.

To recap for those just joining us: Lorenz ran his simulation one day in 1961, then had cause to run it again later on. When he ran it the second time, the (deterministic!) simulation produced wildly different results. Lorenz eventually realized that, when he programmed the second run of the simulation, he’d truncated the starting values for his parameters. The truncations were tiny–like, the fourth through sixth decimal places–and prevailing thinking held that they just didn’t matter for the long-run behavior of the system. Yet his results said otherwise. Here was one of the first hallmarks of what is now often called chaos theory: extreme sensitivity to initial conditions. It is this idea, combined with fundamental limits on the precision of our measurement, that rules out things like long-run weather prediction. Now I’ll hand the mic to Mackenzie:

The importance of Lorenz’s paper was not immediately apparent; it was buried in a specialist journal, read only by meteorologists. However, the same process repeated in other disciplines. Michel Hénon, an astronomer, discovered chaos in the equations governing stellar orbits around a galaxy’s center. David Ruelle, a physicist, along with mathematician Floris Takens, discovered strange attractors in turbulent fluid flow. Robert May, a biologist, discovered chaos in the simplest system of all: a one-variable equation that had been used for years to model populations competing for scarce resources.

Each of these pioneers was isolated at first, and they all faced disbelief from other scientists. A colleague of Lorenz, Willem Malkus, recalled in James Gleick’s bestselling book Chaos: Making a New Science what he told Lorenz about his equations: “Ed, we know–we know very well–that fluid convection doesn’t do that at all.”

This incredulity is perhaps a typical reaction to any paradigm-altering discovery. In the case of chaos there were specific reasons why mathematicians and other scientists had been so blind for so long. When mathematicians teach their students differential equations, they concentrate on the simplest, most understandable cases. First, they teach them to solve linear equations. Next, they might teach them about some simple two-variable systems, and show how the behavior of solutions near a fixed point can be described by linearizing. No matter what the number of variables, they will always concentrate on equations that can be solved explicitly: x(t) is given by an exact formula involving the time t.

All of these simplifying assumptions are perfectly understandable, especially the last one. Solving equations is what mathematicians do…or did, in the years BC (before chaos). And yet these assumptions are collectively a recipe for blindness. Chaos does not occur in a continuous-time system with less than three variables; and it does not occur in any system where you can write a formula for the solution.

It is as if mathematicians erected a “Danger! Keep out!” sign at all of the gates leading to chaos. Scientists from other disciplines–biologists, physicists, meteorologists–never went past the “Keep out!” signs, and so when they encountered chaos it was something utterly unfamiliar.

The bolded emphasis is mine. Notice how empirical simplifications become implicit and lead to false certainty. This happens all the time, and it is rarely appreciated. It is a process that blocks off research on all sorts of questions; yet it is a pernicious process precisely because it is so implicit. It is pernicious because what amounts to a barricade at the frontier of knowledge is not recognized as a barricade. Willem Malkus did not tell Edward Lorenz that no one knew how fluid convection worked; he told him that they knew how it worked, and it didn’t work the way Lorenz’s model suggested. This, as Mackenzie writes, is a recipe for blindness.
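To make “sensitivity to initial conditions” concrete, here is a minimal sketch. It does not reproduce Lorenz’s weather model; it uses the one-variable population equation that Mackenzie credits to Robert May, in its usual form x → r·x·(1 − x). The growth rate r = 4 and the two starting values (one kept to six decimals, one cut to three) are my own illustrative choices.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Two runs of the same deterministic rule; only the starting precision differs.
full = logistic_orbit(0.506127, 50)    # starting value kept to six decimals
truncated = logistic_orbit(0.506, 50)  # same value cut to three decimals

# Absolute gap between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(full, truncated)]

for step in (0, 1, 5, 10, 15, 20):
    print(f"step {step:2d}: gap = {gaps[step]:.6f}")
print("largest gap over 50 steps:", round(max(gaps), 3))
```

The two runs start about a ten-thousandth apart and, within a few dozen iterations, differ by a large fraction of the whole range the variable can take. Nothing in the arithmetic is exotic; the surprise lives entirely in the assumption that small errors stay small.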

I have encountered instances of this several times over the years.

Employer illegality and union elections

When I was in graduate school, I started looking into the effect of employers’ intimidating or firing workers for union activity on the success of union organizing. The prevailing wisdom, based on two decades’ analysis of union-election records, was that this had almost no effect: elections where there’d been an unfair labor practice (ULP) charge against the employer were won by unions at about the same rate as elections where there hadn’t been. This had led to an entire body of thinking that assumed workers’ preferences were basically fixed, that employers couldn’t affect them much, and therefore declines in success rates had mostly to be explained in terms of why workers didn’t want to join unions as much anymore.

The problem with this thinking was that researchers had been using election files that were recorded and published by the National Labor Relations Board, but these files only recorded the elections that had actually taken place! Petitioners can withdraw their requests to hold elections. Petitioners are usually unions, and they often withdraw when they think they are going to lose. Once you recognize this, then before you even do the analysis you can understand why people had gotten the results they did: they were comparing elections where there’d been no employer shenanigans to elections where there had been, but where the union decided to go ahead with the election regardless. If union organizers are OK at gauging worker sentiment, these two groups should have similar success rates–but that doesn’t tell you whether the employer’s actions raised the withdrawal rate. Lo and behold, it did, massively. And once you allow for employer actions to affect organizing drives before the elections, the whole theory of fixed worker preferences falls apart.
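The logic of that paragraph can be sketched as a toy simulation. This is not the analysis I ran, and every number in it is hypothetical: worker support is drawn uniformly between 30 and 70 percent, employer misconduct knocks 10 points off support, and organizers withdraw whenever their (slightly noisy) read on support falls below 50 percent. The function and parameter names are made up for illustration.

```python
import random


def simulate_drives(n_drives, employer_penalty, rng, noise=0.02):
    """Toy model of organizing drives; every number here is hypothetical."""
    held = wins = 0
    for _ in range(n_drives):
        # True share of workers who would vote for the union.
        support = rng.uniform(0.3, 0.7) - employer_penalty
        # The organizer gets a noisy read and withdraws if it looks like a loss.
        if support + rng.gauss(0.0, noise) < 0.5:
            continue
        held += 1
        # Election day adds its own small, independent noise.
        if support + rng.gauss(0.0, noise) > 0.5:
            wins += 1
    return {
        "withdrawal_rate": 1 - held / n_drives,
        "win_rate_if_held": wins / held,
        "win_rate_all_drives": wins / n_drives,
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
no_ulp = simulate_drives(20_000, employer_penalty=0.0, rng=rng)
with_ulp = simulate_drives(20_000, employer_penalty=0.1, rng=rng)

for label, result in (("no ULP", no_ulp), ("ULP", with_ulp)):
    print(label, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions the win rates among elections that were actually held come out within several points of each other, while the withdrawal rates, and therefore the win rates across all drives, differ substantially. A data file that contains only held elections sees the first comparison and none of the second.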

Recombination and innovation

There’s a theory in innovation research that inventions that combine more disparate ideas are more impactful, in terms of future inventions that build on them. This sort of “recombination” has been demonstrated with patents and their citation patterns to prior work, and it is widely celebrated. Yet there’s another body of organizational research that suggests that diversified product offerings are harder for evaluators to make sense of and to legitimate.

This is what I call the Black Baseball Player Effect. If you were to compare major-league ball players in the 1950s, you’d find that black players out-performed white ones on almost every dimension. Does this mean that being black makes you a better ball player? Not necessarily. Black players had to clear a higher threshold to appear in the major leagues. You have to account for that selection bias. Indeed, as Chris Rock has joked, you can tell that baseball truly integrated by the late 1960s/early 1970s, because that’s when we started to see mediocre black players!

I thought something similar might happen with inventions: “recombinant” inventions might be more impactful, but they might also be harder to get patents for, in which case the two effects would be conflated. If you jointly modeled application and citation, and adjusted for this differential selection, perhaps you’d find that the innovative impact of recombination was overstated. I discussed the idea with a colleague of mine who worked with patents. He told me that it was a neat idea but a non-starter–because “Virtually every patent application is approved.” That is, there wouldn’t be enough variation to do a meaningful analysis.

I took this to heart–he was the expert!–and put the idea aside for more than a year. Later I had reason to start working with the European Patent Office’s data with my colleague Gianluca Carnabuci. We quickly discovered that, while almost no patent applications were explicitly rejected, only about 70 percent were approved. The remainder were mostly patent applications that were “deemed withdrawn.” These are applications where the patent examiner came back with a search report that (usually) suggested the applicants had ignored significant prior work, and that the patent either would not be granted or would have a much narrower scope (and thus be worth less). At this point, the applicants abandoned their application, and the application was “deemed withdrawn.”

This sort of thing happens all the time. There are far fewer convictions for crimes than there are crimes committed, or even than there are arrests made. Most arrests and investigations end in pleas or settlements. But if what you care about is whether an invention is granted a patent, surely these applications that are deemed withdrawn are relevant. Why had researchers ignored them?

For many years, the USPTO, whose data most researchers used, did not systematically record their failed applications. Thus the USPTO data file only had successful patents in it. People eventually worried about selection bias, but when they asked about the rejection rate, they found it was extremely low. Thus it seemed not to be a problem, and soon enough the belief took hold that “every application is approved.”

For what it’s worth, yes, differential selection does bias citation rates upward for recombinant inventions.
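Here is the bare mechanism as a toy simulation, not the model Gianluca and I estimated. It assumes, purely for illustration, that conventional and recombinant inventions have the same distribution of true impact, that an application is granted only if its impact clears a bar, and that the bar is higher for recombinant inventions. The bars (0 and 1 on a standard-normal scale) and all names are hypothetical.

```python
import random
from statistics import mean


def observed_vs_true(n_applications, bar, rng):
    """Draw applications of equal expected impact; grant only those above `bar`."""
    impacts = [rng.gauss(0.0, 1.0) for _ in range(n_applications)]
    granted = [impact for impact in impacts if impact > bar]
    return {
        "mean_impact_all_applications": mean(impacts),
        "mean_impact_granted_only": mean(granted),
        "grant_rate": len(granted) / n_applications,
    }


rng = random.Random(1)  # fixed seed so the run is repeatable
conventional = observed_vs_true(20_000, bar=0.0, rng=rng)
recombinant = observed_vs_true(20_000, bar=1.0, rng=rng)  # assumed higher bar

for label, result in (("conventional", conventional), ("recombinant", recombinant)):
    print(label, {key: round(value, 3) for key, value in result.items()})
```

Across all applications the two groups look the same, by construction. Among granted patents, the group that faced the higher bar looks more impactful. A data file that holds only granted patents cannot tell these two stories apart.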

Racial segregation between workplaces

There’s a classic argument, coming from Gary Becker, that more intense competition between firms should reduce employment segregation. You can think of segregation as a taste employers have, one that leads them to hire, not the best workers for positions, but the best workers of their preferred groups. This produces a less efficient organization, one that should be driven out of business when competition is stiff. The late and lamented Devah Pager had recently published a study that seemed to support this idea: employers who discriminated more in her audit studies were less likely to survive.

A couple years ago, Rem Koning and I thought we could look at this using the EEOC’s employment-composition surveys. In a population of firms, there are two main ways you can integrate. First, individual firms can integrate over time. Second, less-integrated firms can be replaced by more-integrated ones. We knew that racial segregation between workplaces had declined over time; if the Becker hypothesis were true, then the within-firm component should have accounted for more of the decline in less-competitive industries.

After some searching for the appropriate metric, we settled on Theil’s information statistic, and wrote Stata code to calculate between-firm racial segregation. We ran a scatter plot for yearly averages, and decided we’d written our code wrong–because it was going up.

Much investigation followed, but eventually we concluded that our code was right: racial segregation between workplaces was higher than it had been a generation earlier. How had no one noticed this?

Here again, the issue seemed to be with how data had been gathered. Most work on employment segregation leveraged individual surveys, like the GSS, CPS, ACS, and so on. In a survey, you cannot ask, “Alice, what is the racial breakdown of your employer’s workforce?” and expect a good answer. But you can ask, “Alice, what do you do for a living?” Thus almost all large-scale research studied occupational segregation. And this has declined over time–we could reproduce everyone else’s results with our data. But occupational segregation asks how predictive race is of what you do, conditional on where you work. It cannot tell you how predictive race is of where you work.

Now, the workplace of the typical white worker (including the typical white social scientist!) has diversified over the last generation. But the workforce has diversified much faster; hence the rising metrics. If you put an assumption of declining establishment-level segregation together with empirical data on declining occupational segregation, you have one of Mackenzie’s recipes for blindness.
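A small worked example shows how both things can be true at once. This is a Python sketch, not our Stata code; it uses the standard entropy-based form of Theil’s information statistic (0 when every workplace mirrors the overall workforce, 1 when workplaces are fully separated), and the two-firm head counts are invented for illustration.

```python
from math import log


def entropy(counts):
    """Entropy of a group composition given as raw head counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def theil_h(firms):
    """Theil's H across firms; each firm is a tuple of head counts by group."""
    n_groups = len(firms[0])
    totals = [sum(firm[g] for firm in firms) for g in range(n_groups)]
    workforce = sum(totals)
    overall = entropy(totals)
    if overall == 0:
        return 0.0  # only one group present, so nothing to segregate
    weighted_gap = sum(
        sum(firm) / workforce * (overall - entropy(firm)) for firm in firms
    )
    return weighted_gap / overall


def white_exposure(firms):
    """Average minority share in a white worker's workplace (group 0 = white)."""
    whites = sum(firm[0] for firm in firms)
    return sum(firm[0] * firm[1] / sum(firm) for firm in firms) / whites


# Hypothetical head counts as (white, minority) for two firms in each period.
then = [(45, 5), (45, 5)]    # workforce 10% minority, spread evenly
now = [(50, 10), (10, 30)]   # workforce 40% minority, concentrated in one firm

for label, firms in (("then", then), ("now", now)):
    print(label, "H =", round(theil_h(firms), 3),
          "white exposure =", round(white_exposure(firms), 3))
```

In this invented case the typical white worker’s workplace goes from 10 percent to roughly 26 percent minority, yet H rises from zero to a clearly positive value, because the workforce as a whole diversified faster than the workplaces did.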

I hasten to add that we were blind, too–recall that we “knew” that racial segregation between workplaces had declined over time at the start of the project! The trick or heuristic, here as in the other cases, was not privileged knowledge on our part. It was a willingness to pay attention to how empirical anomalies in our data might alter our theories.

That was a lot of text

I don’t have a whizz-bang conclusion to this post. There’s a lot of argument from example here. Still, I think this point is under-appreciated. Often, I am described as an empiricist, but I don’t think that’s quite right. I care a lot about theory, but theories are ultimately generated from empirics. It is in the frequently staccato interaction of the two that I find the most interesting work can be done.