The
introduction I wrote for the recent Q&A with Clifford Lynch has attracted some
commentary from the institutional repository (IR) and open access
(OA) communities. I
thank those who took the time to respond. After reading the comments the
following questions occurred to me.
(A print version of this text is available here)
1.     Is the institutional repository dead or dying?
Judging
by the Mark Twain quote with which COAR’s Kathleen Shearer
headed her response (“The reports of
our death have been greatly exaggerated”), and judging by CORE’s Nancy Pontika insisting
in her comment that we should
not give up on the IR (“It is my strong belief that we don’t need to abandon
repositories”), people might conclude that I had said the IR is dead.
Indeed,
by the time Shearer’s comments were republished on the OpenAIRE blog (under the title
“COAR counters reports of repositories’ demise”) the wording had strengthened –
Shearer was now saying that I had made a number of “somewhat questionable
assertions, in particular that institutional repositories (IRs) have failed.”
That
is not exactly what I said, although I did quote a blog post by Eric Van de
Velde (here) in which he
declared the IR obsolete. As he put it, “Its flawed foundation cannot be
repaired. The IR must be phased out and replaced with viable alternatives.”
What
I said (and about this Clifford Lynch
seemed to agree, as do a growing number of others) is that it is time for the
research community to take stock, and rethink what it hopes to achieve with the
IR. 
It
is however correct to say I argued that green OA has “failed as a strategy”. And
I do believe this. I gave some of the reasons why I do in my introduction, the
most obvious of which is that green OA advocates assumed that once IRs were
created they would quickly be filled by researchers self-archiving their work.
Yet seventeen years after the Santa Fe meeting, and 22 years
after Stevan Harnad began his long campaign to persuade
researchers to self-archive, it is clear there remains little or no appetite
for doing so, even though researchers are more than happy to post their papers
on commercial sites like Academia.edu and ResearchGate.
However,
I then went on to say that I saw two possible future scenarios for the IR. The
first would see the research community “finally come together, agree on the
appropriate role and purpose of the IR, and then implement a strategic plan
that will see repositories filled with the target content (whatever it is
deemed to be).”
The
second scenario I envisaged was that the IR would be “captured by commercial
publishers, much as open access itself is being captured by means of
pay-to-publish gold OA.”
Neither
of these scenarios assumes the IR will die, although they do envisage somewhat
different futures for it. That said, what they could share in common is a
propensity for the link between the IR and open access to weaken. Already we
are seeing a growing number of papers in IRs being hidden behind login walls –
either as a result of publisher embargoes or because many institutions have
come to view the IR less as a way of making research freely available, more as a
primary source of raw material for researcher evaluation and/or other internal
processes. As IRs merge with Research Information Management (RIM) tools and Current
Research Information Systems (CRIS) this darkening of
the content in IRs could intensify.  
What
makes this darkening likely is that the internal processes that IRs are
starting to be used for generally only require the deposit of the metadata
(bibliographic details) of papers, not the full-text. As such, the underlying documents
may not just be inaccessible, but entirely absent.
This
outcome seems even more likely in my second scenario. Here the IR is (so far as
research articles are concerned) downgraded to the task of linking users to
content hosted on publishers’ sites. Again, to fulfil such a role the IR need host
only metadata.
2.     So what is the role of an institutional repository?
What should be deposited in it, and for what purpose?
As
I pointed out in my introduction, there is today no consensus on the role and
purpose of the IR. Some see it as a platform for green OA, some view it as a
journal publication platform, some as a metadata repository, some as a digital
archive, some as a research data repository (I could go on).
It
is worth noting here a comment posted on my blog
by David Lowe. The reason why the IR will persist, he said, “is not related to
OA publishing as such, but instead to ETDs.” Presumably this means that Lowe
expects the primary role of the IR to become that of facilitating ETD
workflows. 
It
turns out that ETDs are frequently locked behind login walls, as Joachim Schöpfel and Hélène Prost pointed
out in a
2014 paper called Back to Grey: Disclosure and Concealment of
Electronic Theses and Dissertations. “Our paper,” they wrote, “describes a
new and unexpected effect of the development of digital libraries and open
access, as a paradoxical practice of hiding information from the scientific
community and society, while partly sharing it with a restricted population
(campus).”
And
they concluded that the Internet “is not synonymous with openness, and the
creation of institutional repositories and ETD workflows does not make all
items more accessible and available. Sometimes, the new infrastructure even
appears to increase barriers.”
In
short, the roles that IRs are expected to play are now manifold and
sometimes they are in conflict with one another. One consequence of this is
that the link between the repository and open access could become more and more
tenuous. Indeed, it is not beyond the bounds of possibility that the link could
break altogether.
3.     To what extent can we say that the IR movement – and the
OAI-PMH standard on which it was based – has proved successful, both in terms
of interoperability and deposit levels?
As
I said in my introduction, thousands of IRs have been created since 1999. That
is undoubtedly an achievement. On the other hand, many of these repositories remain
half empty, and for the reasons stated above we could see them increasingly being populated
with metadata alone.
Both
Shearer and Pontika agree that more could have been achieved with the IR. With
regard to OAI-PMH Pontika says that while it has its disadvantages, “it has
served the field well for quite some time now.”
But
what does serving the field well mean in this context? Let’s recall that the
main reason for holding the Santa Fe meeting, and for developing OAI-PMH, was
to make IRs interoperable. And yet interoperability remains more
aspiration than reality today. Perhaps for this reason most research papers are
now located by means of commercial search engines and Google Scholar, not
OAI-PMH harvesters – a point Shearer conceded when I
interviewed her in 2014. 
Of
course, if running an IR becomes less about providing open access and more
about enabling internal processes, or linking to papers hosted elsewhere, interoperability
begins to seem unnecessary. 
4.     Do IR advocates now accept that there is a need to re-think
the institutional repository, and is the IR movement about to experience a
great leap forward as a result? 
Most
IR advocates do appear to agree that it is time to review the current status of
the institutional repository, and to rethink its role and purpose. And it is
the Confederation of Open Access Repositories (COAR) that is leading on this. 
“The
calls for a fundamental rethink of repositories is already being answered!” Tony
Ross-Hellauer – scientific manager at OpenAIRE (a member of COAR) – commented on my blog. “See the ongoing work of the COAR next-generation
repositories working group.”
Shearer,
who is the executive director of COAR (and so presumably responsible for the working
group), explains in her response that the group has set itself the task of
identifying “the core functionalities for the next generation of repositories,
as well as the architectures and technologies required to implement them.” 
As
a result, Shearer says, the IR community is “now well positioned to offer a
viable alternative for an open and community led scholarly communication
system.”
So
all is well? Not everyone thinks so. As an anonymous commenter pointed
out
on my blog: “All this is not really offering a new way and more like reacting
to the flow. Maybe that has to do with the kind of people working on it, the IR
crowd is usually coming from the library field and their job is not to be
inventive but to archive and keep stuff save.”
Archiving
and keeping stuff save are very worthy missions, but it is to for-profit publishers that
people tend to turn when they are looking for inventive solutions, and we can see that legacy publishers are
now keen to move into the IR space. This suggests that if the goal is to create a community-led
scholarly communications system COAR’s initiative could turn out to be a case
of shutting the stable door after the horse has bolted.
5.     What is the most important task when seeking to
engineer radical change in scholarly communication: articulating a vision,
providing enabling technology, or getting community buy-in?
“Ultimately,
what we are promoting is a conceptual model, not a technology,” says Shearer.
“Technologies will and must change over time, including repository
technologies. We are calling for the scholarly community to take back control
of the knowledge production process via a distributed network based at
scholarly institutions around the world.”
Shearer
adds that the following vision underlies COAR’s work:
“To position distributed repositories as
the foundation of a globally networked infrastructure for scholarly
communication that is collectively managed by the scholarly community. The
resulting global repository network should have the potential to help transform
the scholarly communication system by emphasizing the benefits of collective,
open and distributed management, open content, uniform behaviors, real-time
dissemination, and collective innovation.”
As
such, I take it that COAR is seeking to facilitate the first scenario I
outlined. But were not the above objectives those of the attendees of the 1999
Santa Fe meeting? Yet seventeen years later we are still waiting for them to be
realised. Why might it be different this time around, especially now that legacy
publishers are entering the market for IR services, and some universities seem minded to outsource the hosting of research papers to commercial
organisations, rather than work with colleagues in the research community to create an interoperable network of distributed repositories?
What
has also become apparent over the past 17 years is that open movements and initiatives
focused on radical reform of scholarly communication tend to be long on
impassioned calls, petitions and visions, short on collective action. 
As
NYU librarian April Hathcock put it when reporting on
a Force11 Scholarly
Commons Working Group
she attended recently: “As several of my fellow librarian colleagues pointed
out at the meeting, we tend to participate in conversations like this all the
time and always with very similar results. The principles are fine, but to me,
they’re nothing new or radical. They’re the same things we’ve been talking about
for ages.” 
Without
doubt, articulating a vision is a good and necessary thing to do. But it can
only take you so far. You also need enabling technology. And here we have
learned that “there is many a slip ’twixt the cup and the lip.” OAI-PMH has not
delivered on its promise, as even Herbert Van de Sompel, one of the architects
of the protocol, appears to have concluded. (Although this tweet suggests that he too
does not agree with the way I characterised the current state of the IR
movement). 
Shearer
is of course right to say that technologies have to change over time. However, choosing
the wrong one can derail, or significantly slow down, the objective you are working towards.
But
even if you have articulated a clear and desirable vision, and you have put the
right technology in place, in the generally chaotic and anarchic world of
scholarly communication you can only hope to achieve your objectives if you get
community buy-in. That is what the IR and self-archiving movements have surely
demonstrated. 
6.     To what extent are commercial organisations colonising
the IR landscape?
In
my introduction I said that commercial publishers are now actively seeking to colonise
and control the repository (a strategy supported by their parallel activities aimed
at co-opting gold open access). As such, I said, the challenge the IR community
faces is now much greater than in 1999.
In
her response, Shearer says that I mischaracterise the situation. “[T]here are
numerous examples of not-for-profit aggregators including BASE, CORE, SemanticScholar, CiteSeerX, OpenAIRE, LA Referencia and SHARE (I could go on),” she said. “These
services index and provide access to a large set of articles, while also, in
some cases, keeping a copy of the content.”
In
fact, I did discuss non-profit services like BASE and OpenAIRE, as well as PubMed
Central, HAL and SciELO. In doing so I pointed out that a high percentage of the
large set of articles that Shearer refers to are not actually full-text
documents, but metadata records. And of the full-text documents that are deposited, many are locked behind login walls. In the
case of BASE, therefore, only around 60% of the records it
indexes provide access to the full-text. 
In
addition, many consist of non-peer-reviewed and non-target content such as blog
posts. That’s fine, but this is not the target content that OA advocates say they want to see made open access. Indeed, in some cases
a record may consist of no more than a link to a link (e.g. see the first item
listed here).
So
the claims that these services make about indexing and providing access to a large
set of articles need to be taken with a pinch of salt.
It
is also important to note that publishers are at a significant advantage here,
since they host and control access to the full-text of everything they publish.
Moreover, they can provide access to the version of record (VoR) of articles.
This is invariably the version that researchers want to read. 
It
also means that publishers can offer access both to OA papers and to paywalled
papers, all through the same interface. And since they have the necessary funds
to perfect the technology, publishers can offer more and better functionality,
and a more user-friendly interface. For this reason, I suggested, they will
soon be (and indeed some already are)
charging for services that index open content, as I assume Elsevier plans to do
with the DataSearch service it is
developing. This seems to me to be a new form of enclosure of the commons.
Shearer
also took me to task for attaching too much significance to the partnership between Elsevier
and the University of Florida – in which the University has agreed to outsource
access to papers indexed in its repository to Elsevier. I suggested that by signing
up to deals like this, universities will allow commercial publishers to increasingly
control and marginalise IRs. This is an exaggeration, says Shearer: “[O]ne repository
does not make a trend.”
I
agree that one swallow does not a summer make. However, summer does eventually arrive,
and I anticipate that the agreement with the University of Florida will prove the
first swallow of a hot summer. Other swallows will surely follow. 
Consider,
for instance, that the University of Florida has also signed a Letter of
Agreement
with CHORUS in a pilot initiative intended to scale up the Elsevier project “to
a multilateral, industry effort.” 
In
addition to Elsevier, publishers involved in the pilot include the American Chemical Society, the American
Physical Society,
The Rockefeller University Press and Wiley. Other publishers
will surely follow.
And
just last week it was announced that Qatar University
Library
has signed a deal with Elsevier that apes the one signed by the University of
Florida. I think we can see a trend in the making here. 
As
things stand, therefore, it is not clear to me how initiatives like COAR and SHARE can hope to match the collective power of
legacy publishers working through CHORUS. 
Let’s
recall that OA advocates long argued that legacy publishers would never be able
to replicate in an OA environment the dominance they have long enjoyed in the
subscription world. As a result, it was said, as open access commodifies the services
they provide, publishers will experience a downward pressure on prices. In
response, they will either have to downsize their operations, or get out of
the publishing business altogether. Today we can see that legacy publishers are
not only prospering in the OA environment, but getting ever richer as their
profits rise – all at the expense of the taxpayer.
But
let me be clear: while I fear that legacy publishers are going to co-opt both
OA and IRs, I would much prefer they did not. Far better that the research
community – with the help of non-profit concerns – succeeded in developing COAR’s
“viable alternative for an open and community led scholarly communication
system.” 
So
I applaud COAR’s initiative and absolutely sign up to its vision. My concern is
that, as things stand, that vision is unlikely to be realised. For it to happen
I believe more dramatic changes would be needed than the OA and IR movements appear
to assume, or are working towards.
7.     Will the IR movement, as with all such attempts by the
research community to take back control of scholarly communication, inevitably
fall victim to a collective action dilemma?
Let
me here quote Van de Sompel, one of the key architects of OAI-PMH. Van de
Sompel, I would add, has subsequently worked on OAI-ORE (which Lynch mentions in the
Q&A) and on ResourceSync (which Shearer
mentions in her critique).
In
a retrospective on repository
interoperability efforts published last year Van de Sompel concluded, “Over the
years, we have learned that no one is ‘King of Scholarly Communication’ and
that no progress regarding interoperability can be accomplished without active
involvement and buy-in from the stakeholder communities. However, it is a
significant challenge to determine what exactly the stakeholder communities
are, and who can act as their representatives, when the target environment is
as broad as all nodes involved in web-based scholarship. To put this
differently, it is hard to know how to exactly start an effort to work towards
increased interoperability.”
The
larger problem here, of course, lies in the difficulties inherent in trying to get
the research community to co-operate. 
This
is the problem that afflicts all attempts by the research community to, in
Shearer’s words, “take back control of the knowledge production process.” What inevitably
happens is that they bump up against what John Wenzler, Dean of Libraries at California
State University, has described as a “collective
action dilemma”. 
But
what is the solution? Wenzler suggests the research community should focus on trying
to control the costs of scholarly communication. Possible ways of doing this, he
says, could include requiring pricing transparency and lobbying for government
intervention and regulation. (“[T]he government can try to limit a natural
monopoly’s ability to exploit its customers by regulating its prices instead.”)
He
concedes however: “Currently, the dominant political ideology in Western
capitalist countries, especially in the United States, is hostile to regulation,
and it would be difficult to convince politicians to impose prices on an
industry that hasn’t been regulated in the past.” 
He
adds: “Moreover, even if some kind of International Publishing Committee were
created to establish price rates, there is a chance that regulators would be
captured by publisher interests.”
It
is worth recalling that while OA advocates have successfully persuaded many
governments to introduce open access/public access policies, this has not put
control of the knowledge production process back into the hands of the research
community, or reduced prices. Quite the reverse: it is (ironically) increasing the power and dominance
of legacy publishers.  
In
short, as things stand, if you want to make a lot of money from the taxpayer you
could do no better than become a scholarly publisher!
I
don’t like being the eternal pessimist. I am convinced there must be a way of
achieving the objectives of the open access and IR movements, and I believe it
would be a good thing for that to happen. Before it can, however, these
movements really need to acknowledge the degree to which their objectives are being
undermined and waylaid by publishers. And rather than just repeating the same old
mantras, and recycling the same visions, they need to come up with new and more
compelling strategies for achieving their objectives. I don’t claim to know
what the answer is, but I do know that time is not on the side of the research
community here.
 

Repositories vs. Quasitories, or Much Ado About Next To Nothing: I
“I have a feeling that when Posterity looks back at the last decade of the 2nd A.D. millennium of scholarly and scientific research on our planet, it may chuckle at us…. I don't think there is any doubt in anyone's mind as to what the optimal and inevitable outcome of all this will be: The Give-Away literature will be free at last online, in one global, interlinked virtual library.. and its [peer review] expenses will be paid for up-front, out of the [subscription cancelation] savings. The only question is: When? This piece is written in the hope of wiping the potential smirk off Posterity's face by persuading the academic cavalry, now that they have been led to the waters of self-archiving, that they should just go ahead and drink!” (Harnad, 20th century)
Richard Poynder notes that 17 years on, Institutional Repositories (IRs) are still half-empty of their target content: peer-reviewed research journal articles.
He is right. Most researchers are still not doing the requisite keystrokes to deposit their peer-reviewed papers (and their frantic librarians' efforts are no substitute).
The reason is that researchers' institutions and funders still have not got their heads around the right deposit mandates.
They will, but they will not get historic credit for having done it as soon as they could have.
Richard also says authors are more willing to deposit in Academia.edu and ResearchGate.
Not true. In percentage terms those central Quasitories are doing just as badly as IRs. But their visible recruiting efforts (software that keeps reminding and cajoling authors) are clever, and something along the same lines should be adopted as part of funder and especially institutional deposit mandates. (Keystrokes are keystrokes, whether done for one's own institutional repository or a third party Quasitory.)
The biggest Quasitory of all is the Virtual Quasitory called Google Scholar (GS). GS has mooted most of the fuss about interoperability because it full-text-inverts all content. It's a nuclear weapon, but it is in no hurry. Unlike institutions and funders, GS is under no financial pressure. And unlike publishers, it does not have the ambition or the need to capture and preserve publishers' obsolete, parasitic functions (even though, unlike publishers, GS is in an incomparably better position to maximise functionality on the web). GS is waiting patiently for the research community to get its act together.
Institutions and funders are not just sluggish in adopting and optimizing their deposit mandates but they are making Faustian Little Deals with their parasites, prolonging their longstanding dysfunctional bondage.
Repositories vs. Quasitories, or Much Ado About Next To Nothing: II
Can't blame publishers for striving at all costs to keep making a buck, even if they no longer really have any essential service or expertise to offer (other than managing peer review). Publishers' last resort for clinging to their empty empire is the OA embargo -- for which the antidote -- the eprint-request button (the IR's functional equivalent of Academia.edu and ResearchGate) -- is already known; it's just waiting to be used, along with effective deposit mandates.
As to why it's all taking so excruciatingly long: I'm no good at sussing that out, and besides, Alma Swan has forbidden me even to give voice to my suspicion, beyond perhaps the first of its nine letters: S.
Vincent-Lamarre, P, Boivin, J, Gargouri, Y, Larivière, V & Harnad, S (2016) Estimating Open Access Mandate Effectiveness: The MELIBEA Score. Journal of the Association for Information Science and Technology (JASIST) 67 (in press)
Swan, A; Gargouri, Y; Hunt, M; & Harnad, S (2015) Open Access Policy: Numbers, Analysis, Effectiveness. Pasteur4OA Workpackage 3 Report.
Harnad, S (2015) Open Access: What, Where, When, How and Why. In: Ethics, Science, Technology, and Engineering: An International Resource. eds. J. Britt Holbrook & Carl Mitcham, (2nd edition of Encyclopedia of Science, Technology, and Ethics, Farmington Hills MI: MacMillan Reference)
Harnad, S (2015) Optimizing Open Access Policy. The Serials Librarian, 69(2), 133-141
Sale, A., Couture, M., Rodrigues, E., Carr, L. and Harnad, S. (2014) Open Access Mandates and the "Fair Dealing" Button. In: Dynamic Fair Dealing: Creating Canadian Culture Online (Rosemary J. Coombe & Darren Wershler, Eds.)
Harnad, S (2014) The only way to make inflated journal subscriptions unsustainable: Mandate Green Open Access. LSE Impact of Social Sciences Blog 4/28
I just wanted to add that CORE, listed among the OAI-PMH harvesters, is a free not-for-profit service, which indexes and keeps a cached copy of research papers harvested from repositories and journals via OAI-PMH (and other protocols). It carries out no pulling of full-texts at the time of access, unless specifically requested by the user, and the user is also not going to hit a paywall. CORE has close to 4.5 million full-texts available and about 37 million metadata records. The access to the full-text content is, as opposed to other commercial services, provided both via a user interface as well as an API or data dumps.
My argument here is that I don't think that it can be said that OAI-PMH has failed and does not enable interoperability or the development of aggregations. Clearly this is possible. However, I have highlighted before certain issues with OAI-PMH that make interoperability more difficult to achieve, see (Knoth, 2013). I believe the way forward is to constructively address these issues through the development of common practice or better open protocols. This approach is completely different from the strategy of most existing commercial services that create solutions on top of which it is almost impossible to develop anything new. For example, in the domain of text mining research papers, there is no commercial service providing an acceptable solution to the provision of research papers. The role of repositories in enabling this (and other) important use cases should not be underestimated. In fact, achieving interoperability across the content from publishers is an order of magnitude more complicated than across repositories, see (Knoth & Pontika, 2016).
Knoth, P. (2013) From Open Access Metadata to Open Access Content: Two Principles for Increased Visibility of Open Access Content, Open Repositories 2013, Charlottetown, Prince Edward Island, Canada
Knoth, P. and Pontika, N. (2016) Aggregating Research Papers from Publishers' Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not?, Workshop: INTEROP2016 at 10th Language Resources and Evaluation Conference