Saturday, November 28, 2009

Peer review, Climategate, qualitative research

At Smart Mobs, Judy Breck points to ClimateGate's implications for peer review. "Regardless of one’s opinion on global warming, the leaked emails now making headlines are interesting for another issue. They are shining light into a growing challenge to scientific publication peer review."

Right: Like her, I don't intend to engage the question of global warming itself, but rather to ask what the incident means for peer review in a networked world.

On the one hand, peer review allows people who specialize in a particular branch of study to bring that specialty to bear. A narrower conversation means a greater chance that everyone shares the same unspoken warrants, knows the same controversies, and works from the same set of theoretical frameworks. Researchers still have disagreements on these, but the disagreements are manageable and allow the researchers to build on previous work rather than rehashing basic issues over and over. When I use activity theory to analyze my workplace studies, for instance, I don't have to write 120 pages describing the basics of that complicated theory or the basics of collecting observational notes; I can just throw in a few cites and know that my readers - and peer reviewers - will get the gist. Without that narrower conversation, it's very difficult to make much progress; you spend all of your time explaining and/or defending base assumptions.

But on the other hand, the problem with a narrower conversation is that those base assumptions don't get challenged much. The narrow conversation becomes too narrow, the assumptions become too assumed, and consequently the conversation becomes vulnerable to outside rebuttals. Notice that I'm not saying the conversation necessarily goes wrong, but that its interlocutors forget how to, or neglect to, consider and preemptively rebut arguments that don't share those warrants. It's a problem with the design of the argument.

One side effect is that data are locked down rather than being made transparent. And that's turned into a major issue with ClimateGate, with allegations that original data haven't been shared with other researchers - and that original analyses can't be reproduced from them. The result, of course, is uncertainty and distrust. (As I've discussed elsewhere, one key element of netwar is introducing uncertainty into the declarations of your enemies.)

The antidote, one would think, is to expose one's original dataset. In the past, this was difficult to achieve: the dataset had to be published and distributed, and that sometimes incurred significant expense. Now, however, it's easy to post datasets. The genie is out of the bottle; there's no solid excuse not to publish datasets online and show one's work, preferably by publishing open source code so that others can reproduce your results. It's a basic confidence-building move - especially when your work can substantially drive public policy and expenditures.
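To make the idea concrete, here's a minimal sketch of what "showing one's work" can look like: the published dataset travels with a short open script that recomputes the reported figure. The file layout, column names, and numbers below are invented for illustration - this is not real climate data, just the shape of the practice.

```python
import csv, io, statistics

# Hypothetical published dataset: in practice this would be a CSV posted
# alongside the paper. The values here are made up for illustration.
PUBLISHED_CSV = """year,anomaly_c
2005,0.61
2006,0.57
2007,0.59
2008,0.47
2009,0.60
"""

# The analysis script is the "shown work": anyone can rerun it against the
# posted file and check that the reported figure falls out of the data.
rows = list(csv.DictReader(io.StringIO(PUBLISHED_CSV)))
anomalies = [float(r["anomaly_c"]) for r in rows]
print(f"mean anomaly {min(r['year'] for r in rows)}-{max(r['year'] for r in rows)}: "
      f"{statistics.mean(anomalies):.2f} C")
# mean anomaly 2005-2009: 0.57 C
```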

But as I discuss in a recent chapter, field researchers are in a bind. On one hand, we can't share our raw data. Unlike temperature data from climate stations, our data are tied directly to human subjects and we can't expose data that would identify them or endanger them (even if the danger involved making them vulnerable to, say, mild reprimands in their jobs). On the other hand, the human subjects are free to comment on our work. In the past, they didn't have an avenue to make such comments unless they were willing to buy their own printing presses; now they can post on blogs or in comments. Like the climate change scientists, field researchers could make a better case by exposing their datasets - but unlike them, we are forbidden to do that by our institutional review boards. Two sets of ethics collide.

So what can we do, as field researchers, to avoid such collisions?

Expose what we can. I am personally very interested in the notion of pushing the bounds of what we can expose. How far can we go in publishing our raw field notes, interviews, and artifacts, and in showing how our coding schemes have been applied to them? Can we set up protocols that anonymize data sufficiently that we can publish lightly modified raw data? I think we need to be prepared to do so, if IRBs approve.
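As a very rough sketch of the kind of protocol I have in mind, consider a small script that swaps direct identifiers for consistent pseudonyms before a field note is released. The participant, employer, and city here are invented for illustration; a real protocol would need IRB approval and much more care with indirect identifiers (job titles, dates, locations).

```python
import re

# Hypothetical pseudonym map built during data collection. Real protocols
# must also handle indirect identifiers, not just proper names.
PSEUDONYMS = {
    "Maria Alvarez": "Writer 1",
    "Acme Insurance": "Company A",
    "Des Moines": "Midwestern City",
}

def anonymize(text, pseudonyms):
    """Replace each identifying string with its pseudonym, longest match first."""
    for name in sorted(pseudonyms, key=len, reverse=True):
        text = re.sub(re.escape(name), pseudonyms[name], text)
    return text

field_note = ("10:42 Maria Alvarez opens the claims database at Acme Insurance's "
              "Des Moines office and begins triaging tickets.")
print(anonymize(field_note, PSEUDONYMS))
# 10:42 Writer 1 opens the claims database at Company A's Midwestern City
# office and begins triaging tickets.
```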

Triangulate with public data. Another alternative is to triangulate field research with large public corpora. For instance, this summer I became very interested in the health care town hall protests, which had been characterized by various sides as (a) centrally coordinated astroturfing and (b) pure grassroots outrage. I blogged a theoretically informed interpretation, but I would love to see investigations that combine field research (observations of town halls, interviews with those who attended) with principled examinations of the public communications that coordinated the protests (websites, Facebook groups, Twitter hashtags). The field data may not be publishable in raw form, but they can be triangulated with the public communications, which researchers can summarize and code publicly. If both sets point to the same conclusions, readers may have more confidence in the closed field data.
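Here's a sketch of what that triangulation step might look like, assuming a toy two-code scheme with keyword indicators invented for this example. The public corpus can be coded and published in full; the closed field notes are coded the same way, but only the counts are reported.

```python
from collections import Counter

# Hypothetical coding scheme: each code is keyed by a few indicator phrases.
# A real study would use human coders and report inter-rater reliability;
# this sketch only illustrates the comparison step.
CODES = {
    "central_coordination": ["talking points", "bus schedule", "script"],
    "grassroots_outrage": ["my own words", "first protest", "nobody sent me"],
}

def code_document(text):
    """Return the set of codes whose indicator phrases appear in the text."""
    text = text.lower()
    return {code for code, phrases in CODES.items()
            if any(p in text for p in phrases)}

def code_frequencies(documents):
    """Tally how many documents received each code."""
    counts = Counter()
    for doc in documents:
        counts.update(code_document(doc))
    return counts

# Public corpus (e.g., posts from protest-organizing sites) is coded openly;
# closed field notes would go through the same function, counts only.
public_posts = [
    "Here are the talking points and the bus schedule for Saturday.",
    "I went to my first protest today - nobody sent me.",
]
print(code_frequencies(public_posts))
# e.g. Counter({'central_coordination': 1, 'grassroots_outrage': 1})
```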

These are just two starter ideas for opening research data and avoiding the insularity that too often comes with specialized research communities. If you have further suggestions, please leave them in the comments.