Sunday, May 04, 2014

Interview with Kathleen Shearer, Executive Director of the Confederation of Open Access Repositories

In October 1999 a group of people met in New Mexico to discuss ways in which the growing number of “eprint archives” could co-operate.
Dubbed the Santa Fe Convention, the meeting was a response to a new trend: researchers had begun to create subject-based electronic archives so that they could share their research papers with one another over the Internet. Early examples were arXiv, CogPrints and RePEc.

The thinking behind the meeting was that if these distributed archives were made interoperable they would not only be more useful to the communities that created them, but they could “contribute to the creation of a more effective scholarly communication mechanism.”

With this end in mind it was decided to launch the Open Archives Initiative (OAI) and to develop a new machine-based protocol for sharing metadata. This would enable third-party providers to harvest the metadata in scholarly archives and build new services on top of them. Critically, by aggregating the metadata these services would be able to provide a single search interface enabling scholars to interrogate the complete universe of eprint archives as if it were a single archive. Thus was born the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). An early example of a metadata harvester was OAIster.

Explaining the logic of what they were doing in D-Lib Magazine in 2000, Santa Fe meeting organisers Herbert Van de Sompel and Carl Lagoze wrote, “The reason for launching the Open Archives initiative is the belief that interoperability among archives is key to increasing their impact and establishing them as viable alternatives to the existing scholarly communication model.”

As an example of the kind of alternative model they had in mind Van de Sompel and Lagoze cited a recent proposal that had been made by three Caltech researchers.

Today eprint archives are more commonly known as open access repositories, and while OAI-PMH remains the standard for exposing repository metadata, the nature, scope and function of scholarly archives have broadened somewhat. As well as subject repositories like arXiv and PubMed Central, for instance, there are now thousands of institutional repositories. Importantly, these repositories have become the primary mechanism for providing green open access (i.e. making publicly funded research papers freely available on the Internet). OpenDOAR currently lists over 3,600 OA repositories.

Work in progress

Fifteen years later, however, the task embarked upon at Santa Fe still remains a work in progress. Not only has it proved hugely difficult to persuade many researchers to make use of repositories, but the full potential of networking them has yet to be realised, not least because many repositories do not attach complete and consistent metadata to the items posted in them, or provide only the metadata for a document, not the document itself. As a consequence, locating and accessing content in OA repositories remains a hit-and-miss affair, and while many researchers now turn to Google and Google Scholar when looking for research papers, Google Scholar has not been as receptive to indexing repository collections as OA advocates had hoped.

For scholars, the difficulties associated with accessing papers in repositories are a continuing source of frustration. Meanwhile, critics of green OA argue that the severe shortage of content in repositories means that any hope of building an effective network of them is a lost cause anyway.

For their part, conscious that green OA poses a potential threat to their profits, publishers have responded to the growing calls for open access by offering pay-to-publish gold OA journals as an alternative.

It was against this background that in 2012 the Finch Committee concluded that in order for the UK to make an effective transition to OA “a clear policy direction should be set towards support for publication in open access or hybrid journals, funded by APCs, as the main vehicle for the publication of research.”

Explaining the decision to prioritise gold OA, Finch argued that repositories had failed to deliver on their promise. “Despite the best efforts of repository managers and librarians … rates of deposit and usage of published materials remain fairly low; and a number of issues will need to be addressed if institutional repositories are to fulfil a bigger and more effective role in the research communications landscape.”

For that reason, Finch added, repositories should in future be viewed as being merely “complementary to formal publishing, particularly in providing access to research data and to grey literature, and in digital preservation”.

The Finch Report proved highly controversial, particularly when Research Councils UK (RCUK) responded by introducing a new gold-preferred OA Policy conforming to its recommendations. Many OA advocates in particular felt betrayed.

But we need to ask: did Finch have a point?

We should not doubt that huge challenges remain in getting content into repositories. However, the whys and wherefores of this have been well rehearsed elsewhere, so we won’t dwell on them here.

Instead, let’s consider the current state of the repository infrastructure, particularly with regard to interoperability and discoverability. Why, for instance, do many repositories not expose adequate metadata? Why do they sometimes provide just the metadata and not the full text? When will the sophisticated search functionality that researchers need become standard in repositories? Will it? And what new developments might help here? More generally, what does the future hold for the OA repository?

Investing for the long term

Who better to put these questions to than Kathleen Shearer, Executive Director of the Confederation of Open Access Repositories (COAR)? Launched in October 2009, COAR’s mission is to “enhance the visibility and application of research outputs through a global network of open access digital repositories” and its membership currently includes over 100 institutions from around the world.

Reading Shearer’s replies below, one has to conclude that there is much still to be done. Scholars and scientists will therefore clearly need to be patient. And while new repositories are constantly being created, and existing ones improved (as are cross-repository search services like BASE), the truth is that if the vision articulated in New Mexico fifteen years ago is to be fully realised, the research community is going to have to invest a great deal more time, effort and money in developing its repositories.

But should it? Now that most, if not all, scholarly publishers offer gold OA, is further investment in repositories justified?

Shearer believes it is — for two reasons. First, she says, wide-scale take up of green OA would contain publishers’ prices; second, the time has in any case come for the research community to take back control of the scholarly communication system, and repositories will be vital in doing that.

As Shearer puts it, “[T]he Green Road is key. We must collectively build and maintain a global system of repositories. It introduces competition into the system and will act as an important deterrent to arbitrary price increases by publishers.”

She adds, “It will also demonstrate the important role that institutions play in the stewardship of research outputs. To that end, institutions should devote more resources to their repository operations in order to improve repository services and increase the size of their collections.”

As I read it, the promise is that any investment made in OA repositories today will more than pay for itself in the long term.

The interview begins

RP:  Can you say who you are, where you are based and what role you play within COAR?

KS: I am the Executive Director of COAR and I am based in Montreal, Canada, although the COAR office is located in Göttingen, Germany. I have been working in the area of open access and digital repositories for about a dozen years now, mainly in the Canadian context as a consultant and a research associate with the Canadian Association of Research Libraries. In June 2013, I became the Executive Director of COAR.

RP: Briefly, what is COAR, how is it funded, and what is its purpose?

KS: COAR, the Confederation of Open Access Repositories, is an association of repository initiatives with an international membership.

We have over 100 members in 35 countries around the world. Our members come from a variety of communities including universities/libraries, research institutions, funding agencies, intergovernmental organizations and government departments — any organization that may have an interest in repository development and wants to be connected with the international community.

COAR’s mission is to raise the visibility of research outputs through a global network of repositories. We are active on two levels: (1) At the practical level, we support communities of practice around areas of importance for our members mainly in terms of best practices, interoperability and monitoring trends in the repository landscape and (2) At the strategic level, we aim to facilitate greater alignment of regional and national repository networks around the globe.

COAR is funded mainly through membership fees, although we receive in-kind support for our office space from the University of Göttingen and some partnership funding as well.

We are quite a light-weight organization with about 1.5 full time positions in total and an Executive Board chaired by Norbert Lossau, Vice-President of the University of Göttingen. Most of our activities are undertaken by the active participation of our members. 

RP: The mission of COAR, you said, is to “raise the visibility of research outputs through a global network of repositories”. I think it might help if we tried to clarify what this means in practice. In other words, what do we mean by repository here, and what role exactly do we expect that repository to play? Are we talking about a global network of institutional repositories, or does repository here encompass more than that (i.e. central subject-based repositories like PubMed Central and arXiv too, and perhaps other content management systems and databases?)

Likewise, should we assume the role of the repository remains as it was originally conceived — a tool to support green OA by providing a place where papers published in subscription journals can be self-archived in order to ensure that free copies are always available outside the subscription paywall?

Or do we assume that the repository can now also act as a publishing platform on which institutions can publish their own journals — as currently planned, for instance, by University College London?

Alternatively, perhaps the assumption is that today the repository should be viewed as little more than what the Finch Report assumed it to be: something “complementary to formal publishing, particularly in providing access to research data and to grey literature, and in digital preservation” (a model that assumes open access is provided by means of gold rather than green OA)?

KS: Repositories are evolving and play a number of roles. At their core, a ‘repository’ could be theoretically defined as a set of services that provide open access to research outputs (along the lines of Cliff Lynch’s original definition in 2003). However, in practice, repository services and infrastructures are diverse and there is a lot of overlap with other systems. Perhaps most significantly, practices and technologies are changing quickly, making it a challenge to concretely define their services. My feeling is that we need to be flexible in the way we conceptualize repositories.

In terms of COAR, we are a community brought together by a set of shared principles and common practices rather than by a narrowly delineated concept of repository. So yes, we would include disciplinary repositories and content management systems (if they provide open access to full text) in our global network.

In terms of a complement to formal publishing, I expect that traditional publishing will soon be going through some pretty big transitions, likely some very disruptive changes. I agree with Dominique Babini, Jean-Claude Guédon and others that we should aim for a basic, open, and interoperable system that is free to both access and contribute to. Value-added services by publishers and others can be built on top of this content.

One way of thinking about repositories is that they represent an institutional commitment to the stewardship of research outputs. In this sense, they address two important problems in the current system: sustainability and stewardship.

I believe institutions should assume greater responsibility for managing, providing access and preserving the content created through research. It will alleviate some of the inflationary aspects of scholarly publishing and enable us to have more influence on future directions. This was the traditional mission of libraries in the print world, which has been somewhat lost in the transition to digital content. How this plays out in terms of models will likely vary according to content type, discipline, and region.


RP: I would like to focus on the issue of interoperability. I am aware of a number of current initiatives devoted to getting institutional repositories to interact/interoperate, including DRIVER, DRIVER II, euroCRIS, OpenAIRE and no doubt there are others too. How do these various initiatives fit together (do they?), and why are there so many initiatives that — to the layperson at least — might seem to be duplicating effort?

KS: There are several initiatives that have evolved from different requirements, regions, and with differing aims.

DRIVER and DRIVER II were European Commission-funded projects to support the implementation of repositories in EU countries. The aim was to have repositories adopt common guidelines for organizing their content so they could be harvested and searched through the DRIVER search service. 

OpenAIRE has built upon the work of DRIVER to implement further standards that enable the European Commission to track the open access research output it funds. Each of these three projects required some level of interoperability between participating repositories.

There are similar initiatives in other regions, such as La Referencia in Latin America and SHARE in the US that will also require some level of interoperability across those repository networks.

COAR is a forum in which all of these regional initiatives can work together to identify common issues and, where appropriate, agree on standardized practices. COAR will be intensifying efforts in this area and has just launched an initiative to address some of the differences that are emerging between these evolving repository networks.

euroCRIS is a European association that is looking at interoperability between research administration systems (current research information systems, or CRISs). The objective of these systems is to manage and report on research activities. Unlike repositories, CRISs do not usually manage full-text content.

In the last few years we have seen some merging of CRISs and repositories, with some repositories being integrated into CRISs or, at least, made interoperable with them.

COAR has also been working with euroCRIS to identify strategies for greater interoperability between research administration systems and repositories.

RP: The concept of networking repositories dates back at least to 1999, and the Santa Fe Convention. I believe it was in the wake of the Santa Fe meeting that the OAI-PMH protocol was developed. However, I assume that both the thinking and the technology have developed somewhat since then.

As I understand it, for instance, OAI-PMH was based on the principle that services would be developed to harvest metadata from repositories in order to aggregate their holdings and provide a centralised discovery service. I guess this assumed that records in repositories would consist of metadata but not the full text (so the goal presumably was to signal where papers were held, not to provide direct access to them).

I would think that the emphasis today is more on providing direct access to full-text documents, not just their metadata. Briefly, therefore, can you say how thinking has developed since 1999, and how the technologies and protocols have changed to reflect this?

KS: OAI-PMH was developed on the principle that a service would harvest the metadata record that would then point the user back to the full text content in the repository. So in that sense it does facilitate access to the full text, but without having to aggregate the content into a central archive.

OAI-PMH is still the common denominator for metadata exposure in repositories and it remains standard practice for cross-repository search services to harvest metadata and then point the user back to the repository to access the full text. Full text harvesting is much more demanding, requiring large storage space to house the content in a central location and there are other technical challenges attached to full text harvesting.
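To make the harvesting model Shearer describes a little more concrete, here is a minimal Python sketch of what an aggregator does with one page of an OAI-PMH ListRecords response in simple Dublin Core (oai_dc) format: it reads each record's title and the dc:identifier URL that points the user back to the full text in the repository, and picks up the resumptionToken used to request the next page. The repository address and sample record are hypothetical, and a real harvester would fetch the XML over HTTP and handle errors and deleted records; this is an illustration, not how any particular service (OAIster, BASE, OpenAIRE) actually works.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed ListRecords page from an imaginary repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>https://repo.example.org/123/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # absent for deleted records
        title = dc.findtext("dc:title", default="", namespaces=NS) if dc is not None else ""
        # dc:identifier usually carries the link back to the item in the repository.
        links = [e.text for e in dc.findall("dc:identifier", NS)] if dc is not None else []
        records.append({"oai_id": oai_id, "title": title, "links": links})
    token = root.findtext("oai:ListRecords/oai:resumptionToken", default=None, namespaces=NS)
    return records, token


def next_request_url(base_url, token):
    # In OAI-PMH, a follow-up request carries only the verb and the token.
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


records, token = parse_list_records(SAMPLE_RESPONSE)
print(records[0]["title"], records[0]["links"])
print(next_request_url("https://repo.example.org/oai", token))
```

Note that the harvester keeps only the metadata and the link; the full text stays in the repository, which is why the quality of the metadata matters so much for cross-repository discovery.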

The disadvantage of metadata harvesting is that the search services are based on the metadata supplied by the repositories, which isn't always comprehensive, complete or consistent. COAR aims to improve the current situation by identifying and encouraging the adoption of common standards and metadata globally. However, for better discoverability, and especially for other services such as text mining, using full text search is highly desirable.

In terms of discovery, repository managers have found that most users find the content in repositories through search engines such as Google and Google Scholar, not from metadata harvesting services or by directly searching the repository. Therefore, the repository community has put significant efforts into exposing their content to commercial search engines through various optimization techniques. 

Beyond discoverability, there are other areas of repository networking and interoperability, like content transfer, usage data, etc. where new technologies and standards/protocols have been created. COAR is a forum whereby interoperable practices can be agreed upon globally.

Full text

RP: You say that it remains standard practice for cross-repository search services to harvest metadata and then point back to the full text in the repository, and you said that COAR assumes OA repositories will “provide open access to full text”. This would seem to imply that an OA repository always now includes the full-text as well as the metadata (and indeed most people would presumably expect that of an OA repository).

However, not all records in OA repositories do provide access to the full-text, and many seem to offer little more than the bibliographic details. Even a poster child of the OA movement — Harvard’s DASH repository — has been criticised for not providing the full text (e.g. here). These criticisms were made a few years ago, but DASH does still today contain records without any full-text attached. Moreover, some do not even provide a link to the full-text (and DASH does not seem to have a RequestCopy Button). When I looked in DASH the other day, for instance, I found (at random) five examples of this (one, two, three, four, five).

I think this cannot be a consequence of publisher embargoes since the articles concerned date back as far as 1993, with the two most recent published five years ago (and in any case the Harvard OA Policies claim to moot publisher embargoes). Moreover, where in a couple of cases the DASH records do point to the full-text this is a link to the publisher’s version, where the user is asked to pay for access ($35 in one case). This cannot be described as OA.

You may not want to comment specifically on DASH, but do you think it is problematic when records in OA repositories do not always provide access to the full text, and perhaps do not even link to a free copy of it? If so, what is COAR doing, or what could it do, to address the situation, in concrete terms?

KS: Ideally, all records in the repository will have the full text attached. However, as you point out, this isn’t always the case. I’m not sure about the specific case of DASH, but this really speaks to the collection policy of the individual repository.

As I said earlier, more and more repositories are now being used to track research output. In that case the objective may be to collect information about all of the publications at the institution, regardless of whether they are open access or not. Still other repositories may be inputting metadata records without the full text as a strategy to encourage authors to upload their documents.

If we look at the OpenAIRE portal as an example, they are currently harvesting 8.4 million records from over 400 sources (mostly repositories, but also open access journal articles). Over 8.2 million of those records are open access. So, I believe that the vast majority of content in repositories is open access, with a small percentage of metadata-only records. The portion of open access, of course, will vary depending on the repository.

In my opinion, the most effective way to improve the proportion of full text in repositories is to continue to advocate for open access policies at funding agencies and institutions around the world. These are the levers that will have a real influence on the policies and practices of the individual repositories. More staffing and resources directed towards repository operations would also help.

RP: You said that rather than searching directly in repositories, or exploiting metadata harvesting services (like OAIster perhaps?), researchers tend to rely on search services like Google and Google Scholar for the discovery of scholarly content in repositories.

Does this mean that the repository community tends today to assume that the research community should rely on mainstream search services, rather than trying to build sophisticated repository search services itself?

If so, I am conscious that OA advocates frequently complain that Google is not supportive enough of their needs, and not as keen to index repository collections as they would like. Would you agree? What is the current situation with regard to mainstream search services like Google, Bing and Yahoo in terms of indexing repositories, and what future developments do you envisage that might improve the situation so far as searching repositories is concerned?

KS: It’s not really about what the repository community believes is the best solution, but rather a practical response to user behaviour.

It would be erroneous to assume all information seekers are the same. However, we do know that even for well-developed disciplinary services, such as PubMed Central and Medline, the majority of users access articles directly from commercial search engines like Google and Google Scholar.

According to my COAR colleague Eloy Rodrigues, Director of the University of Minho Documentation Services, most well-developed institutional repositories have about 3/4 of their traffic coming from Google and other generic search engines. Repository managers take that as a very positive sign of the visibility and accessibility of the content in the repository.

In terms of mainstream search engines and Google Scholar there has been ongoing discussion about their efficacy in retrieving scholarly content. It really depends on whether you are looking for something you know exists (i.e. you search the title or author’s name) or you are searching using key words.

As reported in an article published in the Online Journal of Public Health Information (Giustini and Boulos, 2013), “Google Scholar’s constantly-changing content, algorithms and database structure make it a poor choice for systematic reviews.”

If you are looking for a specific document in a repository and you know the title, the search engine will likely point to it. However, when searching by key words, repository content does not always rank highly.

The problem of visibility is likely even more acute for repositories with non-English content as there does seem to be a bias towards English language content in these search engines. 

This will remain an ongoing challenge for repositories as technology continues to change rapidly.

Inherent tension

RP: Certainly there seems to be some disappointment amongst researchers that 15 years after the Santa Fe meeting they still find it extremely difficult, if not impossible, to search effectively in and across OA repositories. I saw this view expressed most recently by Cambridge University chemist Peter Murray-Rust who tweeted, “IF libraries provide modern search I'd change my mind; but articles in repos are difficult to discover”. His conversation can be viewed here.

Does Murray-Rust have a point? What can you say to convince him that his needs will be met soon? Can you? If so, how will they be met?

KS: There is an inherent tension that exists in the repository community. On the one hand, we aim to make the deposit process as easy as possible so that creators will contribute (or repository staff costs are manageable); on the other hand, we want to assign good quality metadata (which takes time and effort) because we know it will enable greater interoperability and improve discoverability of content. So far, the former has been a greater priority.

There is some truth to Peter Murray-Rust’s comments in that complex search services, such as those developed for some discipline-based repositories, require quite a high level of curation, especially for non-textual material. Datasets, for example, need to be accompanied by fairly comprehensive metadata describing them and those metadata elements need to be standardized across each item.

It is a far greater challenge to develop complex searching across numerous repositories containing different disciplines, languages and formats. To facilitate advanced searching in this context, there needs to be interoperability across repositories. COAR has been working on this and this is one of our top priorities; but it takes time to realize this across a very diverse repository landscape.

That being said, there are already a number of cross-repository search services, for example BASE, CORE, and OpenAIRE, which are working to improve the retrieval of content in repositories. They have advanced search options that allow you, for example, to limit your search to publication type, geographic location, publication year and so on. You can’t do all of these things in Google Scholar.
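By way of illustration, the kind of advanced filtering Shearer mentions can be sketched in a few lines of Python once harvested records have been aggregated. The field names, type labels and sample records below are hypothetical, and real services index far richer metadata; the point is simply that each filter works only if every contributing repository fills in the corresponding field consistently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Record:
    # Hypothetical fields of an aggregated, harvested metadata record.
    title: str
    pub_type: str   # e.g. "journal article", "thesis"
    country: str    # country of the contributing repository
    year: int


def advanced_search(records: Iterable[Record], keyword: str,
                    pub_type: Optional[str] = None,
                    country: Optional[str] = None,
                    year_from: Optional[int] = None,
                    year_to: Optional[int] = None) -> List[Record]:
    """Keyword match on titles, narrowed by optional metadata filters."""
    kw = keyword.lower()
    hits = []
    for r in records:
        if kw not in r.title.lower():
            continue
        if pub_type is not None and r.pub_type != pub_type:
            continue
        if country is not None and r.country != country:
            continue
        if year_from is not None and r.year < year_from:
            continue
        if year_to is not None and r.year > year_to:
            continue
        hits.append(r)
    return hits


records = [
    Record("Open access and repositories", "journal article", "PT", 2012),
    Record("Repositories in Latin America", "thesis", "AR", 2010),
    Record("Open access publishing costs", "journal article", "GB", 2013),
]

hits = advanced_search(records, "open access", pub_type="journal article", year_from=2013)
print([r.title for r in hits])  # ['Open access publishing costs']
```

A record whose repository left the type or year blank, or used a different label, would silently drop out of such a filtered search, which is one reason metadata consistency keeps coming up in this interview.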

OpenAIRE enables users to identify the publications arising from specific funded projects. These services (and others) will continue to develop and will incorporate more sophisticated tools to improve discovery in the future.

Personally, I can envision a time not too far in the future when more complex search services are built on top of repository networks. What individual repositories should focus on, in my opinion, is ensuring that their content is open, can be indexed, and has the metadata needed to facilitate the development of these services.

RP: From what you have said would it be accurate for me to conclude the following: Users tend to prefer using commercial search engines and Google Scholar for discovering research papers in repositories. However, this is not always the best approach.

We don’t yet know exactly what the role of the OA repository will be, nor what form it might eventually take (indeed, repositories will likely take a number of different forms, and play a variety of different roles).

For these reasons it is important that repository managers ensure their content is open, that it has appropriate metadata attached, and that it can be indexed. Doing this will provide sufficient flexibility for future developments.

Finally, we are still some years out from the point where researchers with sophisticated search needs can expect the level of discoverability that they want/need?

Have I understood correctly?

KS: Yes, you are for the most part correct in summarizing my opinion.

A couple of small clarifications: We know from repository managers that the majority of users are coming to repositories from commercial search engines and not through harvesting services or the search facility built into the repository; and we know from user studies that the starting point to find information for many researchers is through Google or Google Scholar.

Currently, as things stand, the content in repositories is not highly ranked in Google Scholar, and in terms of Google, repositories are indexed alongside billions of other pages. So, no, this is not ideal for the discoverability of repository content, particularly for key word or topic-based searching.

I note that in the early days of Google Scholar, the open access community advocated for the search results to be tagged as open access (or not). Obviously we were not successful, but this would have enabled users to limit results to open access content and would certainly have boosted the visibility of repository content in this context.

I do believe the discoverability of repository content will improve greatly in the coming years. Refining the cross-repository search services, those that are based on harvested metadata, will depend on improving the standardization and comprehensiveness of metadata records. Technology will help with this. There are new, automated methods for assigning metadata, and repository software platforms can build in standard vocabularies and metadata elements.
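As a rough illustration of what building standard vocabularies into repository software could mean, the Python sketch below maps the free-text type labels that depositors or different repositories might use onto a single controlled term before records are exposed for harvesting. The vocabulary and mapping are invented for this example and are not taken from any existing standard or platform; real systems would also need to handle subjects, languages and author names.

```python
# Hypothetical controlled vocabulary: free-text labels -> standard term.
TYPE_VOCABULARY = {
    "article": "journal article",
    "journal article": "journal article",
    "journal paper": "journal article",
    "artículo": "journal article",
    "thesis": "thesis",
    "phd thesis": "thesis",
    "dissertation": "thesis",
    "conference paper": "conference paper",
    "proceedings paper": "conference paper",
}


def normalize_type(raw: str) -> str:
    """Map a free-text type label onto the controlled vocabulary.

    Case and whitespace are normalized first; anything not in the mapping
    falls back to "other" so it can be flagged for manual review.
    """
    key = " ".join(raw.lower().split())
    return TYPE_VOCABULARY.get(key, "other")


labels = ["Journal Paper", "  PhD   Thesis", "Artículo", "Poster"]
print([normalize_type(label) for label in labels])
# ['journal article', 'thesis', 'journal article', 'other']
```

The harder part, as Shearer goes on to say, is not the code but agreeing on the shared list of terms in the first place.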

The greater challenge is coming to an agreement about common terminologies and approaches across the entire repository community. COAR will play an important role by acting as a forum in which the repository community can make these kinds of collective decisions.

There will also likely be a number of services developed in the coming years to facilitate full-text searching through harvesting the content. According to Petr Knoth (Knowledge Media Institute, The Open University, UK) who has been doing research in this area through the CORE initiative referenced earlier, there still are a number of technical and legal barriers to full text harvesting from repositories.

However, in the coming years, I expect that the repository community will begin to address these barriers, especially the technical ones.

Again, I hope that COAR can play a role in developing solutions and disseminating best practices.


RP: You said (or at least implied) that repositories should be viewed as tools to enable the research community to “assume greater responsibility for managing, providing access and preserving the content created through research”. And you cited SHARE as an example of an initiative focussed on providing interoperability between repositories.

It is worth noting that SHARE is a response by librarians to the OSTP Memorandum, which directs US Federal agencies to develop plans to ensure that the published results of research they have funded are made OA. As such, SHARE could be viewed as a good example of how research institutions can try to take greater responsibility for scholarly communication, since it would put librarians in charge of managing access to papers released as a result of the OSTP Memorandum.

However, you will know that publishers have proposed an alternative model based on CHORUS. The aim of CHORUS is to ensure that it is publishers rather than librarians who manage access to these papers, and it demonstrates their wish to remain firmly in control of scholarly communication, even after research papers have been made OA.

How would you respond to someone who argued the following: Since the research community is finding it difficult to fill repositories (a point frequently made, not least by the Finch Report), and both difficult and time-consuming to create the necessary infrastructure to ensure repository content is optimally discoverable, might it not make more sense to outsource the task to publishers via initiatives like CHORUS? After all, CHORUS will deliver OA, and since publishers have greater resources they might be expected to undertake the task more effectively, and more quickly. Moreover, since it is they who publish the papers in the first place, they already have all the content in place.

KS: My major concern about CHORUS is that the publishing community would have too much control of the scholarly communication system. A number of large publishers have already demonstrated that they don’t support the principle of open access (remember PRISM).

Frankly, the interests of publishers often lie elsewhere, and they may be motivated by things such as profit margin rather than the public good.

On the other hand, at the core of the mission of the university and the library is the advancement and dissemination of knowledge. It seems to me that the world’s collective knowledge created through research should rest in the hands of long-term actors whose raison d’être is to ensure that it is preserved and remains accessible to all.

CHORUS may seem like an appealing option for the US agencies at the moment, but the long-term implications are that the research community will have little control or ability to influence the future directions of scholarly communication if we take that route.

I’m also very concerned about the costs of such a system. Article processing fees are already way too high for many researchers, especially in developing countries. The recent study of APCs undertaken by the Wellcome Trust and others found that the average per-article APC is US$1,418 for open access publishers. I don’t believe this can scale globally, and it will ultimately disadvantage a large number of researchers who can’t afford to pay.

RP: You are right that speed and effectiveness is one thing, cost and ownership something else. And as you suggested earlier, if the research community were to take greater responsibility for managing access to research it could hope to “alleviate some of the inflationary aspects of scholarly publishing and enable us to have more influence on future directions.”

This reminds me of what your colleague Eloy Rodrigues said to me last year. The future of scholarly communication, and its cost to the research community, he suggested, will depend on whether there is a “research-driven” transition to open access or a “publishing-driven” transition (in other words, whether the transition prioritises the needs of the research community or the needs of publishers). I would think that the competing SHARE and CHORUS initiatives are representative of these two approaches, and this suggests to me that in the coming years we will see publishers and librarians jostling for control of the scholarly communication system. And if that is right, the institutional repository will surely become a key battleground in the struggle.

Would you agree? And if it wants to ensure a “research-driven” transition to OA what should the wider research community be doing in your view?

KS: The choices that institutions make now about how they are going to invest in scholarly communications are absolutely critical.

First of all, I think the Green Road is key. We must collectively build and maintain a global system of repositories. It introduces competition into the system and will act as an important deterrent to arbitrary price increases by publishers.

It will also demonstrate the important role that institutions play in the stewardship of research outputs. To that end, institutions should devote more resources to their repository operations in order to improve repository services and increase the size of their collections.

Secondly, we should encourage and sponsor the development of new publishing models and value-added services that conform to our vision.

In terms of repositories, this would include better cross-repository discovery services, text mining capabilities, disciplinary views, and the development of overlay journals. Leslie Chan, for example, makes the case that the distinctions between “journal” and “repository” are increasingly blurred and that “mega-journals” are essentially repositories with overlay services.

We should be participating in projects that demonstrate the added value of repositories and repository networks across the research life cycle. Of course, this will require that we take some risks, which is a difficult case to make to (often) risk-averse organizations in hard economic times.

Global discussion

RP: You said that the way in which scholarly communication develops will vary “according to content type, discipline, and region.” Certainly, as OA develops we do appear to be seeing distinctive regional differences emerging. For instance, while the pay-to-publish gold OA model is being pushed heavily by the UK and the Netherlands, there is still more of a focus on green OA in North America. Meanwhile, in Africa and Latin America a repository-based publishing model currently appears to dominate.

As things stand I would expect to see the Global North increasingly move to a pay-to-publish gold OA model and the Global South to a free-to-publish/free-to-read repository-based publishing model similar to that pioneered by SciELO and AJOL. If that proves the case, however, will it be the best outcome in a global research environment?

When I spoke to Dominique Babini last year she said “[W]e owe ourselves a global discussion about the future of scholarly communication”. And she added, “Now that OA is here to stay we really need to sit down and think carefully about what kind of international system we want to create for communicating research, and what kind of evaluation systems we need, and we need to establish how we are going to share the costs of building these systems.”

This would seem to imply a more global approach than we are currently seeing develop. Would you agree with Babini? If so, who should organise the global discussion she has called for, and who should take part in it?

KS: Yes, I agree, and I would add that we should consider carefully the unintended consequences of adopting the various models.

“What kind of system do we want to create for communicating research?” I would propose that we want one that all researchers can access and contribute to, regardless of geographic location or discipline, and where the knowledge created is assessed on its real value, rather than on the region from which it emerges or the so-called “impact” of the journal in which it is published.

A dual system such as you describe above is not ideal, and I believe it will create inherent inequalities across the regions, especially if we continue to rely on impact measures that do not reflect the quality of the research, but rather serve to prop up the traditional publishing system.

I believe there is a general lack of awareness in the “north” about the “southern” perspective and that we do need to ensure that the voices from the south are heard.

In terms of the global discussion, we already have a number of international forums for exchange: the funding agencies have the Global Research Council; libraries have organizations such as the SPARCs and IFLA; the repository community has COAR; and, publishers have their own venues.

UNESCO, and the governments represented there, have also become interested in open access. We could begin the global discussion by facilitating greater dialogue across these different stakeholder organizations.

One missing but very important link is the research community. It’s clear that many researchers have not engaged sufficiently with the issues of open access to understand the nuances. For example, many researchers still equate open access with open access journals. So we need a mechanism for bringing those communities into the discussion as well.

It is illuminating to note that a parallel global discussion is currently occurring in the area of research data through the Research Data Alliance (RDA). It has been comparatively easy in the context of research data to bring together the key stakeholders — researchers, data repositories, institutions, and funding agencies — to adopt a common vision and agree on practical strategies for moving forward.

Why haven’t we been able to do that for publications? The essential difference is that for publications, there are some parties that have a significant financial interest in maintaining control of the system. This makes the global discussion far more challenging.

RP: Thank you very much for taking the time to speak with me.


Stevan Harnad said...


Kathleen Shearer is right that the Green Road is the key -- but effective Green OA mandates are the motor.

Repositories are near empty. Repository functionality can always be improved, but no improvement of repository functionality will provide their missing content. That content will only be provided (by the researchers who produce the research) if the researchers' institutions and funders require (mandate) that they provide it, immediately upon acceptance for publication, as a prerequisite for research performance evaluation and funding.

There are currently well over 3000 repositories worldwide but fewer than 300 Green OA mandates worldwide, and many of them are weak, ineffective mandates (compare ROAR and ROARMAP).

What needs to be done now is (1) for the institutions and funders that have already adopted Green OA mandates to upgrade to what has proved to be the strongest and most effective mandate model (Liège/HEFCE) and (2) for the many remaining institutions and funders that have not yet mandated Green OA self-archiving to likewise adopt the Liège/HEFCE model.

Until then, COAR’s mission to “enhance the visibility and application of research outputs through a global network of open access digital repositories” will remain unfulfilled and unfulfillable.


The Liège ORBi model: Mandatory policy without rights retention but linked to assessment processes.

HEFCE/REF Adopts Optimal Complement to RCUK OA Mandate

The only way to make inflated journal subscriptions unsustainable: Mandate Green Open Access

Stuart Shieber said...

Richard Poynder raises with Kathleen Shearer the issue of “dark” deposits in Harvard’s DASH repository. He implies that the presence of a subset of articles in which the deposited article is not made available is a grave failing.

Ms. Shearer’s response is exactly right: “I’m not sure about the specific case of DASH, but this really speaks to the collection policy of the individual repository.” I’ve explained DASH’s collection policy with respect to dark deposits in some detail in my 2011 post “The importance of dark deposit”. In a nutshell, part of the role of the repository is an archival one – to collect the research output of the institution as broadly as possible. We therefore don’t turn articles away. But we also don’t distribute articles from DASH when we don’t hold rights to do so or when authors for whatever reason request us not to. (The particularly unrepresentative case of Professor Knoll’s large number of dark deposits is an instance of the latter. We do not, as a matter of principle and policy, unilaterally override the wishes of authors.)

I believe our collection policy – to deposit articles into DASH even if we cannot (yet) distribute them by right or author preference – is reasonable, and in fact preferable to policies that disallow dark deposits. I won’t rehearse the seven reasons why, though I especially commend Reason 5 to the interested. The best evidence that we are doing something right is that the over 17,000 articles in DASH have been downloaded almost 3.2 million times, and at an increasing pace. Fixation on the subset that we avoid distributing in deference to legal or moral rights seems to miss the point.

Richard Poynder said...

@Stuart: I appreciate your taking the time to comment. I did not intend to imply that Harvard is guilty of a grave failing, and I do not believe I am fixated.

My objective in the Q&As I undertake is to draw out some of the many issues that surround OA. In the case of the comments you refer to, my aim was to air a topic concerning OA repositories that many puzzle over, and that seems to me to deserve discussion. As I say, thank you for responding.

Harvard describes DASH as a “central, open-access repository of research by members of the Harvard community”.

In that context, I made the following points:

1. The DASH repository is widely viewed as (and promoted by Harvard as) a poster child of the OA movement.

2. From what Kathleen Shearer said I inferred she believes OA repositories should always provide access to the full text (as well as the metadata) of papers they showcase.

3. In any case, I think most people expect the full text of papers deposited in an OA repository to be both present and freely available to all.

4. Certainly DASH has been criticised for not providing free access to the full text of all the papers it contains (and I linked to one such criticism).

5. While the criticism I pointed to dates from several years ago, DASH does today still contain details of papers for which it does not provide access to the full text (and I linked to five examples that I found at random).

6. Some of these records do not provide a link to the full text; others provide a link to the publisher’s site, where the reader is asked to pay up to $35 to view them. I suggested that this cannot be described as OA.

I understand your point about dark deposits. I believe the standard practice for dealing with such deposits is to provide a Request Copy Button in the repository so that researchers can automatically request that the author send them a copy. As I indicated, I could not find a Request Copy Button in DASH. Perhaps I missed it?

Congratulations on the number of downloads.

Stuart Shieber said...

I apologize for my overstrong language (“grave”, “fixating”). It’s so hard to get tone right in comment threads.

We do refer to DASH as a “central, open-access repository of research by members of the Harvard community”, and I think it is just that. Peter Suber’s take on our use of the phrase “open-access repository” is trenchant, I think:

“We call something a ‘bookstore’ even if it also sells magazines and greeting cards. We call something a ‘grocery store’ even if it also sells spatulas and pot holders. We call something a ‘drama’ even if it includes some comedy, and vice versa.

“An ‘OA repository’ may have some dark content without contradiction. The ‘OA’ in the name designates the primary purpose of the repository, not the exclusive purpose, just as with ‘book’ in ‘bookstore’ and so on.

“If a fuller description of a bookstore were ‘store for books, magazines, greeting cards, mugs, and pens’, then a fuller description for DASH would be ‘repository for open access and preservation’. It’s fair and commonplace to abbreviate these long descriptions into short names that leave out much of the descriptive nuance. If it’s fair to say ‘bookstore’, then it’s fair to say ‘OA repository’.”

By the way, the proportion of dark material in DASH is relatively small, about 10%, and we’re looking into what portion of that might be “brightened”.

Richard Poynder said...

Thank you for this further response.

You do not say why the DASH repository does not have a Request Copy Button.