This has to do with a paper that I published in 2015 but that I started working on in late 2012 and substantively wrote in 2013: “The Control of Managerial Discretion: Evidence from Unionization’s Impact on Employment Segregation.” In many ways, that paper riffs off of work that had been published a few years earlier, work that investigated the effect of firms’ diversity policies on their actual workforce composition. That work is both very good and limited: very good because it looked at longitudinal, within-firm changes rather than just changes across a population of firms; limited because the authors could not model and adjust for the self-selection of firms into adopting such policies. (All good research is limited in some way, but that’s a topic for another post.) My idea was to look at union-representation elections, because unions impose many of the same constraints on arbitrary management that diversity policies do, and yet unionization isn’t self-selected by the employer. It is self-selected by the employees, but because they do so through elections with recorded vote tallies, I could focus on very close elections and use a regression-discontinuity design to identify the treatment effect net of self-selection.
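The intuition behind that design can be sketched with a toy simulation (this is purely illustrative — invented data and variable names, not the paper’s actual data or code): when treatment is assigned by whether a vote share crosses 50%, workplaces just above and just below the cutoff are nearly identical, so comparing them isolates the treatment effect from selection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: each "election" has a union vote share, and
# unionization is assigned by whether that share crosses 50%.
n = 10_000
vote_share = rng.uniform(0, 1, n)
unionized = (vote_share >= 0.5).astype(float)

# Simulate an outcome that trends smoothly in vote share (selection) but
# has NO jump at the cutoff -- i.e., a true null treatment effect.
outcome = 2.0 * vote_share + rng.normal(0, 1, n)

# A naive comparison of all unionized vs. non-unionized workplaces
# conflates treatment with selection (the smooth trend in vote share).
naive = outcome[unionized == 1].mean() - outcome[unionized == 0].mean()

# A local comparison within a narrow bandwidth around the cutoff
# approximates random assignment and recovers the (null) effect.
bw = 0.02
near = np.abs(vote_share - 0.5) < bw
rd = (outcome[near & (unionized == 1)].mean()
      - outcome[near & (unionized == 0)].mean())

print(f"naive estimate: {naive:.2f}")  # large, driven by selection
print(f"RD estimate:    {rd:.2f}")     # close to zero
```

In practice one would fit local linear regressions on each side of the cutoff rather than take raw means, but the comparison of the two estimates is the point: the "effect" in the full sample can vanish once identification comes only from the discontinuity.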
Let me pause and emphasize that I was expecting to find effects like that earlier work had found. I figured I’d show similar trends, but support them with cleaner causal identification. That’d be a real contribution!
I was by myself on Christmas Eve 2012, and I spent most of the day confirming I’d built the dataset correctly and then running the analysis. Indeed, when I looked at the full dataset, I found results that looked like those earlier studies!
…And the moment I started zooming in on the closer elections, all those results went away. When you adjusted for self-selection, there appeared to be no effects at all.
I spent much of that Christmas Day in something like panic. I had this design for a paper, I had this thing I was going to show, and the results were actually the opposite. I felt like I’d just wasted several weeks–or months, if you count the time of getting the data in the first place. Then my father and step-mother came into town, and I put the paper aside for a week while they visited.
Shortly after the new year, back at Stanford, I opened up the paper and looked at what I’d written. I’d written most of the front end of the paper already. That front end wasn’t about why it was theoretically important that the control of managerial discretion improved workforce diversity. Rather, it discussed this theory, but then explained how the evidence for it had this weakness around self-selection. Then I explained my own research design, and how it would help with that problem.
At some point, reading this, the lightbulb went off. I’d gotten the opposite results from what I’d expected, but I didn’t really have to change the front of the paper at all!
That this was a revelation to me, 3.5 years into my assistant professorship, says several things about how we were implicitly taught to do research.
First, we were taught that good papers made a theoretical contribution. But a theoretical contribution was almost never couched as a contribution to a theory, such as better evidence. Rather, it was couched as a new theory. It might build theoretically on existing work, but if it only built empirically on what was out there, it wasn’t interesting.
Second, we were taught that replication studies were boring and uncreative. These were the mark of a workmanlike but probably uninspired student who couldn’t come up with their own ideas. (This probably isn’t obvious today, as the social sciences are roiled by the replication crisis, but when I was starting graduate school sixteen years ago, it was assumed that replication studies would replicate.)
Third, we were taught that “You can’t learn anything from a null result,” full stop.
Today I disagree with all of these points, which I’ll detail in a moment; but what’s really striking here is that I’d learned about causal identification and research design since my first classes in graduate school. In my first research design class, we’d devoted a ton of time to the Fundamental Problem of Causal Inference. In labor economics, Caroline Hoxby had drilled us on the Program Evaluation Problem and the Heckman/LaLonde debate. Chris Winship taught a whole class on “The New Causal Analysis,” preparatory to overhauling the core sociology methods class. (Yes, I got my PhD at MIT, but I did a lot of coursework at Harvard Economics and Sociology.) We read Rubin on the potential-outcomes model, Pearl on directed acyclic graphs, Angrist on instrumental variables, van der Klaauw on regression discontinuity…hell, it was a paper with a regression-discontinuity design that had prompted this freak-out!
This all points to something I almost never see talked about. Within organizational research, and in business schools more generally, we absorbed many of the arguments for and techniques of causal identification without necessarily updating our assumptions about how knowledge generation works. We had been educated in a framework that presumed routine theory generation and predictable empirical support for those theories. Causal identification was imported as a way to strengthen that support, and maybe to raise the minimum bar for what would be considered support. But little else changed. And in most places, I think it still hasn’t.
Consider those three points:
First, today I think that there are many types of contributions that research can make. New evidence in support of or against an existing theory should be considered a theoretical contribution. After all, our faith in theories is not binary. It is, basically, Bayesian. Or at least it should be.
Second, replication studies are neither boring nor uncreative. Indeed, one reason they are not boring is that earlier studies often cannot be replicated. But the term “replication study” is itself over-applied. Trying to test an existing theory with a new method, with better data, with cleaner identification–none of these things is rote replication. Such studies often involve considerable creativity of their own.
Third, it is correct that you cannot learn anything from a null result in an unidentified study with observational data. But of course you can learn from a null result in a well-designed experiment. Even a quasi- or natural experiment’s null results can tell you something. The experimentum crucis, since its coining by Hooke and Newton, has been a core piece of the scientific method. The logic as it applies here is simple: if we have a theory that makes predictions, and if we agree in advance that a study design is adequate for testing those predictions, then a null result in that study should reduce our prior confidence in that theory. (I think it’s canonical to reference the Michelson-Morley Experiments here.)
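That Bayesian logic can be made concrete with a toy calculation (every number here is invented purely for illustration): suppose we start fairly confident in a theory, and then a study we agreed in advance was adequate returns a null.

```python
# Toy Bayesian update (all probabilities invented for illustration):
# how much should an agreed-adequate study's null result reduce our
# confidence in a theory?

prior = 0.70           # prior probability the theory is true
p_null_if_true = 0.20  # chance the study misses a real effect anyway
p_null_if_false = 0.90 # chance of a null when there is truly no effect

# Bayes' rule: P(theory | null result)
posterior = (p_null_if_true * prior) / (
    p_null_if_true * prior + p_null_if_false * (1 - prior)
)
print(f"{posterior:.2f}")  # 0.34
```

On these made-up numbers, confidence drops from 70% to about 34% — not to zero, but a substantial revision. The null is informative exactly to the degree that the design was agreed to be adequate beforehand (i.e., `p_null_if_true` is low).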
Hence me, about a week after killing the results in that paper, coming to the realization that I had learned something. I’d reproduced the earlier results when I didn’t control for self-selection, then killed them off when I did control for it. This shouldn’t make me despair over the study; it should reduce my confidence in the theory.
This was the most liberating moment of my early career. I’d been socialized to have one of two responses at this point. Having assembled my data and found null results, I was either supposed to abandon the project (maybe put it in a file drawer) or continue my “exploratory analysis” until I found out why there wasn’t an effect–in the process finding an effect elsewhere–and “reframe” the paper around that. (At this point someone may say that that wasn’t what I was “supposed” to learn from my training. Maybe. But I’m a reasonably intelligent man and a good student, and this all seemed pretty unequivocally communicated to me. More, I’ve talked to enough of my colleagues to know that I wasn’t the only one who imbibed these beliefs.) But, I now realized, I didn’t have to do either of those things. I’d found a null result, one that contradicted earlier research, but I thought it was right. That null finding was a contribution in its own right. Yes, the paper would be harder to publish, probably, but that didn’t matter. I’d found something I thought was real and should stand by it.
Which brings us to the vow. That day, I vowed never again to start a project unless I thought that its question was interesting however the answer shook out.
Perhaps this sounds banal. This after all is how science is “supposed” to work. But my experience is that it still hasn’t really sunk in in my field. When I present null-results papers, for example, I still get suggestions of different ways to slice the data such that I’d be more likely to find an effect, which (it usually follows) will make the paper easier to publish. But we’re not in this business just to publish papers, or for that matter to find effects. We’re in this business to answer questions.