Saturday, November 28, 2009

Peer review, Climategate, qualitative research

At Smart Mobs, Judy Breck points to Climategate's implications for peer review. "Regardless of one’s opinion on global warming, the leaked emails now making headlines are interesting for another issue. They are shining light into a growing challenge to scientific publication peer review."

Right: Like her, I don't intend to engage in the question of global warming, but rather in what the incident means for peer review in a networked world.

On the one hand, peer review allows people who specialize in a particular branch of study to bring that specialty to bear. A narrower conversation means a greater chance that everyone shares the same unspoken warrants, knows the same controversies, and works from the same set of theoretical frameworks. Researchers still have disagreements on these, but the disagreements are manageable and allow the researchers to build on previous work rather than rehashing basic issues over and over. When I use activity theory to analyze my workplace studies, for instance, I don't have to write 120 pages describing the basics of that complicated theory or the basics of collecting observational notes; I can just throw in a few cites and know that my readers - and peer reviewers - will get the gist. Without that narrower conversation, it's very difficult to make much progress; you spend all of your time explaining and/or defending base assumptions.

But on the other hand, the problem with a narrower conversation is that those base assumptions don't get challenged much. The narrow conversation becomes too narrow, the assumptions become too assumed, and consequently the conversation becomes vulnerable to outside rebuttals. Notice that I'm not saying the conversation necessarily goes wrong, but that its interlocutors forget how to, or neglect to, consider and preemptively rebut arguments that don't share those warrants. It's a problem with the design of the argument.

One side effect is that data are locked down rather than being made transparent. And that's turned into a major issue with Climategate, with allegations that original data haven't been shared with other researchers - and that original analyses can't be reproduced from them. The result, of course, is uncertainty and distrust. (As I've discussed elsewhere, one key element of netwar is introducing uncertainty into the declarations of your enemies.)

The antidote, one would think, is exposure of one's original dataset. In the past, this was difficult to achieve, since the dataset had to be published and distributed, and that incurred expenses that were sometimes significant. Now, however, it's easy to post datasets. The genie is out of the bottle; there's no solid excuse not to publish datasets online and show one's work, preferably by publishing open source code so that others can reproduce your results. It's a basic confidence-building move - especially when your work can substantially drive public policy and expenditures.
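To make that concrete: "showing one's work" can be as simple as publishing, alongside the dataset, a short script that regenerates the reported numbers. Here's a minimal sketch in Python; the file name stations.csv and its station/year/temperature layout are hypothetical stand-ins, not any particular group's format.

```python
# reproduce_trend.py - minimal sketch of a "show your work" script.
# Assumes a hypothetical published dataset "stations.csv" with columns:
# station, year, temperature (annual mean, degrees C).
import csv
from collections import defaultdict

def yearly_means(path):
    """Average all station readings for each year."""
    readings = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            readings[int(row["year"])].append(float(row["temperature"]))
    return {year: sum(vals) / len(vals) for year, vals in sorted(readings.items())}

if __name__ == "__main__":
    for year, mean in yearly_means("stations.csv").items():
        print(f"{year}\t{mean:.2f}")
```

Anyone who downloads the dataset and runs the script can check the published figures themselves - which is precisely the confidence-building move I have in mind.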

But as I discuss in a recent chapter, field researchers are in a bind. On one hand, we can't share our raw data. Unlike temperature data from climate stations, our data are tied directly to human subjects and we can't expose data that would identify them or endanger them (even if the danger involved making them vulnerable to, say, mild reprimands in their jobs). On the other hand, the human subjects are free to comment on our work. In the past, they didn't have an avenue to make such comments unless they were willing to buy their own printing presses; now they can post on blogs or in comments. Like the climate change scientists, field researchers could make a better case by exposing their datasets - but unlike them, we are forbidden to do that by our institutional review boards. Two sets of ethics collide.

So what can we do, as field researchers, to avoid such collisions?

Expose what we can. I am personally very interested in the notion of pushing the bounds of what we can expose. How far can we go in publishing our raw field notes, interviews, and artifacts, and in showing how our coding schemes have been applied to them? Can we set up protocols that anonymize data sufficiently that we can publish lightly modified raw data? I think we need to be prepared to do so, if IRBs approve.
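To give a sense of what such a protocol might look like, here's a minimal sketch in Python of a pseudonymization pass over field notes. The file names and the name-to-pseudonym mapping are hypothetical, and any real protocol would need IRB approval plus much more careful handling of indirect identifiers (roles, locations, dates).

```python
# anonymize_notes.py - minimal sketch of a pseudonymization pass over field notes.
# Hypothetical inputs: "fieldnotes.txt" (raw notes) and a researcher-maintained
# mapping of real identifiers to pseudonyms, kept private and never published.
import re

PSEUDONYMS = {
    "Alice Johnson": "Participant A",   # hypothetical participant
    "Acme Corp": "Company X",           # hypothetical field site
}

def pseudonymize(text, mapping):
    """Replace each known identifier with its pseudonym, longest identifiers first."""
    for real, fake in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        text = re.sub(re.escape(real), fake, text)
    return text

if __name__ == "__main__":
    with open("fieldnotes.txt") as f:
        raw = f.read()
    with open("fieldnotes_anonymized.txt", "w") as f:
        f.write(pseudonymize(raw, PSEUDONYMS))
```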

Triangulate with public data. Another alternative would be to triangulate field research with vast public corpora. For instance, this summer I became very interested in the health care town hall protests, which had been characterized by various sides as (a) centrally coordinated astroturfing and as (b) pure, grassroots outrage. I blogged a theoretically informed interpretation, but I would love to see investigations that combine field research (observations of town halls, interviews with those who attended) with principled examinations of the public communications that coordinated the protests (websites, Facebook groups, Twitter hashtags). The field data may not be publishable in raw form, but they could be triangulated with the public communications, which researchers could summarize and code publicly. If both sets point to the same conclusions, readers may have more confidence in the closed field data.
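As a rough illustration of what coding public communications "publicly" could look like, here's a minimal sketch in Python that applies a simple keyword-based coding scheme to a few public posts and tallies the codes. The posts and the scheme are hypothetical placeholders; a real study would use a validated codebook and report intercoder agreement.

```python
# code_public_posts.py - minimal sketch of applying a published coding scheme
# to public communications (posts from open websites, groups, or hashtags).
from collections import Counter

# Hypothetical keyword-based scheme; a real codebook would be far richer.
CODING_SCHEME = {
    "central_coordination": ["bus", "talking points", "organized by"],
    "grassroots_outrage": ["on my own", "fed up", "neighbors"],
}

def code_post(post, scheme):
    """Return the set of codes whose keywords appear in the post."""
    text = post.lower()
    return {code for code, keywords in scheme.items()
            if any(keyword in text for keyword in keywords)}

def tally(posts, scheme):
    """Count how many posts receive each code."""
    counts = Counter()
    for post in posts:
        counts.update(code_post(post, scheme))
    return counts

if __name__ == "__main__":
    sample_posts = [
        "We were organized by the local chapter and given talking points.",
        "I'm fed up and came on my own; so did my neighbors.",
    ]
    print(tally(sample_posts, CODING_SCHEME))
```

Because both the scheme and the public posts can be published, other researchers can rerun or challenge this half of the analysis even while the field data stay closed.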

These are just two starter ideas for opening research data and avoiding the insularity that too often comes with specialized research communities. If you have further suggestions, please leave them in the comments.

4 comments:

Aimee Roundtree said...

Thought-provoking post, as usual. Sorry I missed your presentation, "Visualizing Patterns of Group Communication in Digital Workspaces." Looking forward to proceedings or transcripts, if available.

Researchers in healthcare-related qualitative analysis also regularly practice the methods that you recommend for dataset transparency (including triangulation, reflexivity, respondent validation, attention to negative cases, etc.). Still, I'm not sure to what extent increasing transparency resolves some of the peer review issues that you mentioned. Provide the entire dataset to a peer from the same "narrow conversation," and they still might be inclined to agree or strongly sympathize with the findings. Journals often assign reviewers from the same "narrow conversation."

I'd add suggestions to your list of correctives:

(1) We should encourage co-authorship and teams of researchers who are expert in different "narrow conversations." When such diverse teams collaborate, the collaboration only refines your research as an argument - even in medias res of the research project.

(2) We should extend the peer review process, using social networking and other technology at our disposal to solicit opinions from others outside the "narrow conversation," beyond reviewers and letters to the editor.

Clay Spinuzzi said...

All good points, Aimee. One thing I really appreciate about interdisciplinary research is that we're compelled to enter less narrow conversations!

Alan Rudy said...

Ahhh, peer review... well, as you surely know, it not only fosters shared research programs but also reinforces existing, and generally quite conservative, power dynamics within such terrain. Peer reviewers are not always assigned from within the author's conversation, for example. This is not something to be addressed by making data transparent but by... admitting to and addressing issues of power.

Unfortunately, at least at many, many schools and in many traditional fields, one solution - the proliferation of niche and interdisciplinary journals - runs afoul both of the lower mainstream (read: citation) measures increasingly used to legitimate junior faculty research AND of administrators looking for large external grants to provide overhead rather than low-budget historical and qualitative research in the social sciences.

Furthermore, as you implicitly warn might be the case, the IRB at the Research I university where I once worked mandated the physical and electronic destruction of all data after the publication of our qualitative research.

Moreover, the weakness I see both in some of what you suggest and in the argument made by those complaining about climate data availability (a HUGE portion of which - including most of what skeptics have asked for - is in fact available) is that, under these circumstances, availability does little or nothing to resolve controversies; it simply displaces the controversy into questions of the operationalization of variables, the selection of one among a range of viable statistical measures, and/or the number and patterns of coded terms, phrases, and readings - to say nothing of the interpretation of their relationships.

These kinds of staggeringly technical debates then generate a further de-democratization of exchanges over technoscientific questions... and can only really be joined by those with the status (high or low), time (personal or that of their minions/graduate students), and/or funds (where would this come from, especially in a large qualitative study?) to regularly re-assess - in microscopic detail - the work of other scientists.

I like the idea of data transparency, and I really like Aimee's encouragement of co- and team-authorship - as well as social networking our research - but how many departments and universities are anywhere near ready for that... particularly as more and more "public" universities all but intentionally rely less and less on funds from their state legislatures?

Thanks for making me think, as always.

Clay Spinuzzi said...

Great points as usual, Alan. When writing this post, I didn't really feel that I had resolved it, and you put your finger on why. In particular, exposing data is a confidence-building move in that it allows scientists to "show their work," but as you say, it also displaces the controversy to operational measures that nonspecialists may not be able to evaluate. Unfortunately, the CRU has veered toward Scylla for so long that they now must overcorrect and sail closer to Charybdis.

It's a question of black boxing: the CRU has elected to keep the black box closed tightly, which avoids second-guessing over the messy uncertainties of analysis and the warfare over analytical tools - but also allows skeptics to suggest that the box is empty. And what is so devastating about the CRU emails, programs, and more recent disclosures is that, when opened, the box really does appear to be empty - although, yes, the data should be reconstructable from other sources and, yes, surely the CRU's data aren't the sum total of the proof for anthropogenic global warming. It still looks terrible; the public expected CRU scientists to have their act together, yet their data management is worse than the average American's management of their home finances.

The black box worked only because of the public's trust, and that trust is so much weaker now that the CRU must veer toward greater transparency. Rhetorically, they will have to begin defending their specialized choices in the terms of nonspecialists, and they will have to take a lot of time to address criticisms from nonspecialists, some of whom will never be satisfied. This is a turning point in the debate over anthropogenic global warming - not in the sense that the debate will be resolved either way, but in the sense that the CRU's arguments will need to change shape to address this shift toward transparency and the displacement of controversy from questions of access to questions of analytical method.

That is, I think public trust is a component of power in the Latourian sense, and the CRU has lost a lot of that power. Where does the power go? Get ready for crowdsourcing to provide a sporadic, unpredictable, and treacherous alternative to peer review.