
DOIs and their discontents

Why Ars has standardized providing reference information using DOIs in its …

Those of you who are regular readers of Ars' science content are probably aware of our use of Digital Object Identifiers, or DOIs, which act as online reference information, taking readers directly to the papers being discussed. Readers almost never comment about that feature, except when it fails, in which case we invariably hear about it, and it fails at least once a month. We've tried explaining in the forums both our reasons for using DOIs and the reasons they break, and we recently linked to Ed Yong's excellent discussion of the system and its problems. Within a week, we were dealing with complaints due to a broken DOI. So, this is an attempt to provide a comprehensive description of the DOI system, why we use it, and why it doesn't always work smoothly.

Referencing, effort, and reward

For most of our readership, reading an Ars science article is the beginning and end of their exposure to a topic. But we also have a notable population of scientists who read, and they may find themselves interested in reading the academic paper that led to our coverage. There are any number of good reasons for doing that: the paper may be relevant to their work, they may want details we did not provide in our coverage, they suspect we might have gotten something wrong and want to correct us, etc. As a result, some form of reference to the paper is a definite good—it's a benefit for some of our readership, and may help correct errors that are read by the rest of our audience.

Traditionally, academic references have been handled with text that identifies (at a minimum) the journal, authors, and time of publication. This approach has several problems. For starters (as anyone who can remember the pre-PubMed days knows), it's error prone. Since it's a lot of work to get right, it would add significantly to the workload of our authors, who already go well beyond the call of duty when it comes to effort. It also leaves readers with the work of navigating to the appropriate issue of the journal and finding the paper, so it doesn't serve our readership.

Finally, this approach is simply becoming obsolete. Many online-only journals, including a number from the Public Library of Science, have given up on the idea of publishing in volumes and issues; when a paper is ready, they simply publish it. A DOI, by contrast, is future-proof: it should still function if scientific publishing switches to video or some format you can just download directly to your brain.

The DOI system as a solution

For precisely this reason, PLoS handles publishing documents using DOIs. Each manuscript, prior to publication, is given a string that acts as a unique identifier and helps you locate the document. In short, it is a reference, updated for the digital age. If you have a DOI, you can visit dx.doi.org, plug it in, and you will be taken to the paper. This will happen even if the journal that originally published it has changed names, changed owners, or moved its servers to a different country. Better yet, you can simply append the DOI to dx.doi.org/ in your browser's address bar, and you'll be redirected to the document.
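
As a rough illustration of that last step, here is a minimal Python sketch that turns a DOI into a resolver link. The function name doi_to_url, the https scheme, and the sample DOI are assumptions made for illustration; only the dx.doi.org host comes from the description above.

```python
from urllib.parse import quote

# Resolver host named in the article; the https scheme is an assumption.
RESOLVER = "https://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver link by appending a DOI to the resolver address."""
    doi = doi.strip()
    # Tolerate the common "doi:" label that often precedes the identifier.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Percent-encode characters that are unsafe in a URL path, but keep
    # the slash that separates the two parts of the DOI.
    return RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, used only to show the shape of the result.
print(doi_to_url("doi:10.1234/example.5678"))
# prints https://dx.doi.org/10.1234/example.5678
```

The sketch only builds the link. Whether the link actually leads anywhere depends on the publisher having registered the DOI and put the paper online.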

This works because the first portion of a DOI identifies the organization that owns the document, and the DOI record for that organization specifies the relevant URL. So, if a publisher decides to move or rename their server, they can simply update that record, pointing it to the new location. On the receiving end, the second part of the DOI comes into play: the publisher's server must implement a document resolver that takes the unique identifier and directs a browser to the appropriate document.
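
To make the two-part structure concrete, the sketch below splits a DOI at its first slash into the portion that identifies the organization and the portion that identifies the document. The names split_doi and DoiParts and the sample DOI are hypothetical; the only facts relied on are that DOI prefixes begin with "10." and that the first slash separates the two parts.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    prefix: str  # identifies the organization that registered the DOI
    suffix: str  # identifies the document within that organization


def split_doi(doi: str) -> DoiParts:
    """Split a DOI into its organization prefix and document suffix."""
    # Only the first slash is a separator; the suffix may contain more.
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return DoiParts(prefix, suffix)


# Hypothetical DOI, used only to show the split.
parts = split_doi("10.1234/journal.example.0001")
print(parts.prefix)  # prints 10.1234
print(parts.suffix)  # prints journal.example.0001
```

Everything after the first slash is assigned by the publisher, which is why the publisher's own server has to know how to turn it into a document.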

If everything's working properly, the DOI system should be the perfect balance of convenience (it's relatively easy to create a DOI link for an article) and utility (the link is more convenient than a traditional reference, and more likely to persist than a standard URL to a publisher's website).

Why DOIs fail

If it's the perfect system, why the regular complaints that a DOI isn't working? To begin with, a DOI doesn't eliminate the possibility of error, either on our end or at the journal's. We simply copy and paste the DOI from the original document, however, so the errors on our end have been rare.

Instead, problems typically arise because, as press, we're given full access to both papers and their DOIs well before they appear online. Most journals provide this access under an embargo: we agree not to release our articles until the time that the journals specify. Unfortunately, even for the most fast-moving journals, that time is typically several hours before the actual articles appear on the journal's website. So, anyone who's reading quickly is likely to find that the DOI fails.

But that rule only applies to the fast-moving journals, like Nature and Science. Many other journals can take a few days between when they allow journalists to write about a paper and when it becomes available to the scientific community—PNAS, which is a major source of material for us, falls in that category. Ed Yong noted one case where it took months.

Ethically, it's debatable whether it makes sense to give journalists access to papers ahead of the community that those papers are intended to inform. But it's definitely not appropriate to allow extensive public commentary on a research publication without allowing any working scientists the opportunity to vet the accuracy of that commentary. When the embargo lifts, the paper should be available.

If it's not, your best course of action isn't a complaint to our writers or in our forums. It's to contact the journal involved, and let them know that their system is creating problems. Unless they hear from the community that they are ostensibly serving (scientists), they'll have little impetus to fix the system.
