Friday, November 21, 2008

Open Access: The question of quality

Does Open Access (OA) publishing mean having to accept lower-quality peer-reviewed journals, as some claim, or can we expect OA to improve quality? How good are the current tools used to measure the quality of research papers in any case, and could OA help develop new ones?

I started puzzling over the question of quality after a professor of chemistry at the University of Houston, Eric Bittner, posted a comment on Open & Shut in October. Responding to an interview I had done with DESY's Annette Holtkamp, Bittner raised a number of issues, but his main point seemed to be that OA journals are inevitably of lower quality than traditional subscription journals.

With OA advocates themselves a little concerned about the activities of some of the new publishers — and the quality of their journals — we perhaps need to ask: could Bittner be right?

"The problem with many open-access journals is a lack of quality control and general noise," Bittner wrote. "With so many journals in a given field, each competing for articles — most of which are of poor quality — it's nearly impossible to keep up with what's important and sort the good from the bad."

He added, "I try to only publish in journals with high impact factors. For grant renewals, promotion and annual merit raises, an article in PRL or Science counts a lot more than 10 articles in a no-named journal."

The impact factor

Like most researchers, Bittner appears to believe that the best tool for measuring the quality of published research is the so-called journal impact factor (IF, or JIF). So apparently does his department. Explained Bittner:

"[O]ur department scales the number of articles I publish by the impact factor of the journal. So, there is little incentive for me to publish in the latest 'Open Access' journal announced by some small publishing house."

What Bittner didn't add, of course, is that some OA journals have an IF equal to, or better than, many prestigious subscription journals. The OA journal PLoS Medicine, for instance, has an impact factor of 12.6, which is higher than the 9.7 score of the highly regarded British Medical Journal (BMJ).

PLoS Biology, meanwhile, has an impact factor of 13.5.

Another point to bear in mind is that many OA journals are relatively new, so they may not yet have had sufficient time to acquire the prestige they deserve, or an IF that accurately reflects their quality — not least because there is an inevitable time lag between the launch of a new journal and the point at which it can expect to receive an impact factor score at all.

As OA advocate Peter Suber put it in the September 2008 issue of the SPARC Open Access Newsletter (SOAN), "If most OA journals are lower in prestige than most TA [Toll Access, or subscription] journals, it's not because they are OA. A large part of the explanation is that they are newer and younger. And conversely: if most TA journals are higher in prestige than most OA journals, it's not because they are TA."

In short, if some OA journals appear to be of lower quality than their TA counterparts, this may simply be a function of their youth, and say very little about their intrinsic value.

Blunt instrument

In order to properly assess Bittner's claim we also need to ask how accurate impact factors are, and what they tell us about the quality of a journal.

Devised by Eugene Garfield over fifty years ago, a journal impact factor is calculated by dividing the number of citations received in a given year by the articles the journal published in the two previous years by the total number of articles it published in those two years.
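As a quick illustration of the arithmetic (the numbers below are made up, not real journal data):

```python
def impact_factor(citations_this_year, articles_prev_two_years):
    """E.g. a 2008 IF: citations received in 2008 to articles the journal
    published in 2006-2007, divided by the number of citable articles it
    published in 2006-2007."""
    return citations_this_year / articles_prev_two_years

# A hypothetical journal that published 200 articles over the two
# preceding years, which attracted 2,520 citations this year:
print(impact_factor(2520, 200))  # 12.6
```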

Journal impact factors are published each year in the Journal Citation Reports — produced by ISI, the company founded by Garfield in 1960 (and now owned by the multinational media company Thomson Reuters). The scores are then pored over by journal publishers and researchers.

How much does an IF tell us about the quality of a journal? Not a lot, say critics — who believe that it is too blunt an instrument to be very useful. Moreover, they argue, it is open to misuse. As the Wikipedia entry puts it, "Numerous criticisms have been made of the use of an impact factor. Besides the more general debate on the usefulness of citation metrics, criticisms mainly concern the validity of the impact factor, how easily manipulated it is and its misuse."

We also need to be aware that ISI's Master Journal List consists of 15,500 journals, which is only a subset of the circa 25,000 peer-reviewed journals published today.

Importantly, the impact factor was designed to measure the quality of a journal, not the individual papers it publishes, and not the authors of those papers (even though the IF is calculated from citations to individual papers). This means that researchers and their papers are judged by the company they keep, not on their own merits.

Since the papers in a journal tend to attract citations in a very uneven fashion, the IF is even less satisfactory than it might at first appear to be — certainly in terms of measuring the contribution an individual researcher has made to his subject, or his value to an institution. As Per O Seglen pointed out in the BMJ in 1997, "Use of journal impact factors conceals the difference in article citation rates (articles in the most cited half of articles in a journal are cited 10 times as often as the least cited half)."

Elsewhere (in the Journal of the American Society for Information Science) Seglen concluded that this "skewness of science" means that awarding the same value to all articles in a journal would "tend to conceal rather than to bring out differences between the contributing authors."

In other words, given the significant mismatch between the quality of any one paper and the other papers published alongside it, a journal impact factor says little about particular authors or their papers.
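Seglen's point can be made concrete with a toy example. The citation counts below are hypothetical, chosen only so that the most cited half of a journal's articles collects roughly ten times as many citations as the least cited half, as in his BMJ figure:

```python
# Hypothetical citation counts for the ten articles a small journal
# published in one year; a few papers dominate the total.
citations = [45, 30, 20, 14, 11, 4, 3, 2, 2, 1]

mean = sum(citations) / len(citations)  # the journal-level average an IF reflects

ranked = sorted(citations, reverse=True)
top_half = sum(ranked[:5])      # most cited half of the articles
bottom_half = sum(ranked[5:])   # least cited half

print(mean)                    # 13.2
print(top_half / bottom_half)  # 10.0
```

The journal-level average of 13.2 describes almost none of the individual papers: most sit far below it, while two sit far above.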

Citation impact

This means that when Bittner's department scales his articles against the IF of the journals in which he has published, it is conflating his personal contribution to science with the aggregate contribution made by him and all the authors published alongside him.

In reality, therefore, Bittner is being rewarded for having his papers published in prestigious journals, not for convincing fellow researchers that his work is sufficiently important that they should cite it. Of course, it is possible that his papers have attracted more citations than the authors he has been published alongside. It is equally possible, however, that he has received fewer citations, or even no citations at all. Either way, it seems, Bittner's reward is the same.

It is also possible to count an author's personal citations, and calculate his or her own personal "citation impact" (along perhaps with something like an h-index), but Bittner's post did not say that this is something his department does.
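For readers unfamiliar with it, the h-index is simple to compute: an author has index h if h of their papers have each been cited at least h times. A minimal sketch, using hypothetical citation counts:

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers cited >= h times each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# A hypothetical author with six papers:
print(h_index([25, 8, 5, 4, 3, 1]))  # 4
```

Unlike a journal IF, this number is driven entirely by citations to the author's own papers, whichever journals they appeared in.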

In any case, as the Wikipedia entry indicates, traditional citation counting is controversial in itself. Critics — including Garfield himself — have pointed to a number of problems, not least the noise generated by negative citations (where papers are cited not in order to recommend them, but to draw attention to their flaws) and self-citation. We also know that researchers routinely cite close colleagues, either in a "You scratch my back, I'll scratch yours" fashion, or perhaps in the hope that if they flatter their seniors they might win powerful new allies.

Suber sums it up in this way: "IFs measure journal citation impact, not article impact, not author impact, not journal quality, not article quality, and not author quality, but they seemed to provide a reasonable surrogate for a quality measurement in a world desperate for a reasonable surrogate."

Or at least they did, he adds, "until we realised that they can be distorted by self-citation and reciprocal citation, that some editors pressure authors to cite the journal, that review articles can boost IF without boosting research impact, that articles can be cited for their weaknesses as well as their strengths, that a given article is as likely to bring a journal's IF down as up, that IFs are only computed for a minority of journals, favouring those from North America and Europe, and that they are only computed for journals at least two years old, discriminating against new journals."

In the circumstances, it is surprising that researchers and their institutions still place so much stress on the IF. They do so, suggests Suber, because it makes their jobs so much easier. "If you've ever had to consider a candidate for hiring, promotion, or tenure, you know that it's much easier to tell whether she has published in high-impact or high-prestige journals than to tell whether her articles are actually good."

For OA journals this is bad news, since it leaves them vulnerable to the kind of criticism levelled at them by Bittner.

However, the good news is that, in the age of the Web, new tools for measuring research quality can be developed. These are mainly article-based rather than journal-based, and they will provide a far more accurate assessment of the contribution an individual researcher is making to his subject, and to his institution.

The Web, says OA advocate Stevan Harnad, will allow a whole new science of "Open Access Scientometrics" to develop. "In the Open Access era," he explains, "metrics are becoming far richer, more diverse, more transparent and more answerable than just the ISI JIF: author/article citations, author/article downloads, book citations, growth/decay metrics, co-citation metrics, hub/authority metrics, endogamy/exogamy metrics, semiometrics and much more. The days of the univariate JIF are already over."

In order to exploit these tools effectively, however, the research corpus will first need to be freely available on the Web (i.e. Open Access), not locked behind subscription firewalls. Consequently, the scholarly community at large will need to embrace OA before it can hope to benefit greatly from them.

Once that happens, however, OA promises not only to make all research freely available, but also to make it much easier to evaluate and judge the quality of published research, along with the authors of that research.

The main challenge, of course, is to persuade researchers to make their papers OA in the first place!

Citation advantage

For those still in doubt there are two other factors to consider. First, it is not necessary to wait until suitable OA journals emerge in your area before embracing OA. It is possible to publish a paper in a TA journal and then self-archive it in a subject-based or institutional repository (a practice referred to as "Green OA"). This allows you to embrace OA immediately, and without having to forego a desire to publish in a high-impact journal. Since most TA publishers now permit self-archiving this means that researchers can usually have their cake and eat it.

Second, whether they choose to self-archive or to publish in an OA journal ("Gold OA"), researchers can expect to benefit from the so-called "citation advantage". This refers to the phenomenon in which papers made OA are cited more frequently than those hidden behind a subscription paywall.

In a paper published in the BMJ in 2004, for instance, Thomas V Perneger reported, "Papers that attracted the most hits on the BMJ website in the first week after publication were subsequently cited more often than less frequently accessed papers. Thus early hit counts capture at least to some extent the qualities that eventually lead to citation in the scientific literature."

This suggests that free early availability of a paper leads to greater recognition in the long run. While the citation advantage is not (yet at least) a precise science, Suber reports that OA articles are cited "40-250% more often than TA articles, at least after the first year."

In confirmation of this phenomenon publishing consultant Alma Swan reported recently that after chemist Ray Frost deposited around 300 of his papers in the Queensland University of Technology institutional repository they were downloaded 165,000 times, and the number of citations to them grew from around 300 to 1,200 a year. "[U]nless Ray’s work suddenly became super-important in 2004, the extra impact is a direct result of Open Access," concluded Swan.

Growing improvement

Open Access scientometrics also raise the intriguing possibility that if research becomes widely available on the Web the quality of papers published in OA journals may start to overtake, not lag, the quality of papers published in TA journals.

Why? Because if these tools were widely adopted the most important factor would no longer be which journal you managed to get your paper published in, but how other researchers assessed the value of your work — measured by a wide range of different indicators, including for instance when and how they downloaded it, how they cited it, and the different ways in which they used it.

Given that this would provide a much more accurate assessment of quality, scientists could be expected to spend more time perfecting their research, and writing up the results as accurately as possible, and less time trying to second-guess what the gatekeepers of a few select journals deemed suitable for publication. In short, we could expect to see a growing improvement in the quality of published papers.

Moreover, since these new tools would require that research was freely available on the Web, papers published in TA journals would not benefit from them.

Indeed, if research began to be judged by the value of the cargo (the research paper) not the perceived value of the vehicle used to distribute it (the journal), scholars might even prefer to publish in what Bittner dismisses as "the latest 'Open Access' journal announced by some small publishing house". After all, trying to get a paper published in a prestigious journal is a difficult process, and one that frequently comes with the indignity of being judged by establishment figures resistant to new ideas.

Certainly many would now agree that traditional peer review is a far from flawless process, and one that often leads to good papers being rejected. As The Scientist pointed out in 2006, reviewers are known to often "sabotage papers that compete with their own ... [send strong papers] ... to sister journals to boost their profiles, and editors at commercial journals are too young and invariably make mistakes about which papers to reject or accept."

True, researchers might still feel the need to continue publishing in prestigious journals in order to benefit from the greater visibility that they provide. But as Web 2.0 features like tagging and folksonomies become more prevalent, and as the number of institutional repositories grows, it will be possible to obtain visibility by other means.

Many a slip

That's the theory. But there is many a slip twixt the cup and the lip. Leaving aside the need for OA to prevail first (and there remains no shortage of opponents to OA), the above scenario could only be realised if research institutions and funders embraced the new evaluation tools.

For the moment, as Bittner's experience demonstrates, university promotion and tenure (P&T) committees remain addicted to the journal impact factor, as do research funders. And as Suber points out, so long as funding agencies and P&T committees continue to reward researchers who have a record of publishing in high-prestige journals, "they help create, and then entrench, the incentive to do so."

For the foreseeable future, therefore, sceptical voices will surely continue to argue that OA journals lack quality control, and so are best avoided.

Fortunately, this does not prevent individual researchers from self-archiving. And by doing so they will not only make their research free to all but, like Ray Frost, start to enjoy the benefits of the citation advantage.

Of more immediate concern, however, is the danger that the actions of a few OA publishers might yet demonstrate that OA journals do indeed publish lower quality research than TA journals. And unless the OA movement addresses this issue quickly it could find that the sceptical voices begin to grow in both volume and number. That is a topic I hope to examine at a later date.

In my next post, however, I want to look more closely at peer review.


Stevan Harnad said...

Peer Review Selectivity Determines Quality, Not OA vs. TA

Richard writes:

"Open Access scientometrics... raise the intriguing possibility that if research becomes widely available on the Web the quality of papers published in OA journals may start to overtake, not lag, the quality of papers published in TA journals... Why? Because if these tools were widely adopted the most important factor would no longer be which journal you managed to get your paper published in, but how other researchers assessed the value of your work — measured by a wide range of different indicators, including for instance when and how they downloaded it, how they cited it, and the different ways in which they used it."

All true, but how does it follow from this that OA journals will overtake TA journals? As Richard himself states, publishing in an OA journal ("Gold OA") is not the only way to make one's article OA: One can publish in a TA journal and self-archive ("Green OA"). OA scientometrics apply to all OA articles, Green and Gold; so does the OA citation advantage.

Is Richard perhaps conflating TA journals in general with top TA journals (which may indeed lose some of their metric edge, because OA scientometrics are, as Richard notes, calculated at the article rather than the journal level)? The only overtaking I see here is OA overtaking TA, not OA journals overtaking TA journals. (Besides, there are top OA journals too, as Richard notes, and bottom-rung TA ones.)

It should also be pointed out that the top journals differ from the rest not just in their impact factor (which, as Richard points out, is a blunt instrument, being based on the journal average rather than individual article citation counts) but in their degree of selectivity (peer review standards). If I am selecting members for a basketball team, and I only accept the tallest 5%, I am likely to have a taller team than a team that is less selective on height. Selectivity is correlated with impact factor, but it is also correlated with quality itself. The Seglen effect (that about 80% of citations go to the top 20% of articles) is not just a within-journal effect: it holds across all articles across all journals. There is no doubt variation within the top journals, but not only are their articles cited more on average, they are also of better quality on average (because of their greater selectivity). And the within-journal variation around the mean is likely to be tighter in the more selective journals than in the less selective ones.

OA will give richer and more diverse metrics; it will help the cream (quality) to rise to the top (citations) unconstrained by whether the journal happens to be TA or OA. But it is still the rigor and selectivity of peer review that does the quality triage in the quality hierarchy among the c. 25,000 peer reviewed journals, not OA.

(And performance evaluation committees are probably right to place higher weight on more selective journals.)

Anonymous said...

"... editors at commercial journals are too young and invariably make mistakes about which papers to reject or accept."

The Scientist is 100% right: JHEP (which I don't consider commercial) is entirely run by the scientific community without any interference from internal editorial staff. And JHEP has a high rejection rate, a good reputation and a higher IF than its competitors. Some 20% of the papers we publish are OA; the rest are easily accessible through the arXiv.