Ethics or Economics

Having not blogged for a while I've missed a couple of juicy controversies that others have cheerfully piled into. So I'm left playing catch-up, and feeling like I'm not quite as cutting edge as I used to be. But sometimes even half-warmed leftovers can prove surprisingly tasty, so I decided that it was worth revisiting both stories in a pair of posts.

Controversy #1 relates to a paper published in Science by Ben Minteer and colleagues back in April (see what I mean about being late to the party?) which said, and I paraphrase, that there really was no reason for biologists to kill things anymore because you can get all the data you need to describe a new species from DNA samples and digital sound and image records. If you want to read the paper - and frankly there are better uses of your time - you can find it here: 10.1126/science.1250953

There were only two surprising things about the paper, namely that Minteer et al bothered to write it and that Science thought it worth publishing. This particular issue has been round the block so many times, particularly in the ornithological community, that it has no tread left on its tires. I've even covered it myself in an earlier post. There really isn't any case to answer as far as the impact of scientific collecting on endangered species is concerned and those involved should have known better.

Nonetheless, an impressive array of worthies from the biocollections community formed a line to beat the living daylights out of Minteer's thesis in various blogs, interviews, etc., including nearly a hundred people who signed a riposte to the original paper that was also published in Science. The mainstream media, who like nothing better than the sight of two groups of egg-head scientists pulling what's left of each other's hair, dutifully took notice and the whole mess was extensively reported in a wide range of venues including NPR, Slate, and the CSM.

As it happened, this turned out to be a good opportunity to emphasize the vital importance of collections to our understanding of the natural world, and many people from the collections community did so, eloquently and effectively, so I'm not going to rehash the arguments again. But it did make me think about a related issue, which is our oft-repeated mantra that much of the value of natural history specimens lies in their associated data.

I'm currently talking to some colleagues about a potential project on the economics of museum collections as large-scale distributed research facilities (yeah, yeah, I know… it doesn't sound all that interesting. You'll just have to take my word for it that it is). Anyway, it's made me think a lot about cost/benefit calculations.

Suppose that what we say is true, and that most of the value of natural history specimens is in their data.   Now consider the fraction of curation costs per specimen that is devoted to data storage and distribution versus physical storage and specimen access. My guess - and it is only a guess, I haven't quantified it (yet) - is that the cost of storing and serving data is significantly less than the cost of housing and maintaining a physical collection. So if you say that most of the value lies in the data… do I have to draw you a picture of where this is leading?

Now obviously, I'm not the first person to have thought of this. In fact, since we began the major effort to digitize the nation's biocollections, there has been a small but persistent niggle of concern about what the long-term implications will be for the collections we curate. It's usually expressed in terms of diverting some grant funds away from physical collections care and towards data capture. Since most of us are already doing some form of data capture, what we're talking about here is a relatively short-term injection of funds to accelerate the process and deal with the (admittedly gigantic) backlog. But I don't think that, as a community, we've really got to grips with what the much longer-term implications of mass digitization might be. Are we making physical collections redundant?

Clearly there's a strong counterargument, in that specimen data, in isolation, are actually not that valuable. The value is contextual - it's linked to the specimen. The specimen without data is much less valuable than the specimen with data, but the reverse is also true. Having data allows you to better interpret the specimen. It also improves your ability to study the specimen and generate more data. To some extent, data-minus-specimen is a bit of a dead end.

This is particularly true when we enter the bright and shiny new world of "Big Data." As I'm sure you all know by now, Big Data is all about correlation, not causality. It reveals patterns, but it doesn't provide explanations for why those patterns came about, or even if they are "real" patterns, as opposed to statistical artifacts. To answer those sorts of questions, you have to go back and reexamine the sources of the data, which in our case are the specimens.

But what exactly are those specimens, or rather, what should they be? Traditionally they might have been a skin and a skull, a whole animal in fluid, leaves and flowers on a herbarium sheet, a pinned insect, a microscope slide, or something else depending on the discipline concerned. Now these "traditional" preparations are likely to be supplemented by tissue samples, digital imagery, sound and video recordings, etc. And, of course, data - because the data are an integral part of the specimen.

There's a cost/benefit curve to every specimen. The cost is what it takes to collect, prepare, house, maintain, and provide access. The benefit is what you get out of it in terms of research, education, entertainment, etc. The calculations are complex and "value" may be positively or negatively impacted by a number of factors: for example, the number of other specimens in existence, changing research priorities, the invention of new analytical techniques. But just because it's hard to do this doesn't mean that it shouldn't be done.

What we haven't really grasped about digitization, IMO, is that the curve is changing. We're still stuck with a paradigm that assumes that most users will want/need to physically access a traditional prep type housed in a museum. That might be true, but we need to quantify and justify it if we're to continue to argue for resources. At the moment, our most sophisticated argument seems to be that we can't predict how collections will be used in the future, so we'd better not change anything now. If pressure continues to build on funding, as it likely will do, then we need a more nuanced and better supported position.

Minteer et al used an ethical argument to challenge our traditional methods of collecting, but perhaps they'd have been more successful deploying an economic one.  As a community it behoves us to think about these issues before someone else does it for us….

