Open and Shut?: September 2008

Richard Poynder talks to Annette Holtkamp, an information professional at Germany's largest particle physics research centre, Deutsches Elektronen-Synchrotron (DESY). Holtkamp is a member of the Helmholtz Society Open Access working group, and the German contact for SCOAP³. More recently she has been heavily involved in the development of a new repository for the particle physics community called INSPIRE.

Annette Holtkamp, Information Professional, DESY

Physicists are widely credited with having begun the Open Access (OA) movement, on the basis that in 1991 theoretical physicist Paul Ginsparg created the physics preprint repository arXiv — an Internet-based service that allows physicists to electronically share their preprints with one another prior to publication in a scholarly journal.

In reality, however, the seeds of the OA movement predate arXiv by at least thirty years, when scientists working in the specific domain of particle physics — otherwise known as high energy physics, or HEP — began to routinely share their preprints via the postal system.

Of course scientists have always exchanged papers with one another by mailing copies of their papers to friends and colleagues. What was new, however, was that during the 1960s librarians at major HEP research institutions like DESY, the US-based Stanford Linear Accelerator Center (SLAC), and the world's largest particle physics laboratory, CERN, began developing bibliographic tools and services to help scientists share their preprints in a formalised fashion, thereby allowing researchers to exchange papers in a new and more effective way.

The service that came to dominate was SPIRES, which began life as a preprints collection and an associated card catalogue system created in the SLAC library in 1962. In 1974 this was computerised and combined with work being done at DESY, broadening the service to include bibliographic data on journal articles, conference proceedings, theses and books. Importantly, a number of selective dissemination of information (SDI) services were also introduced. These allowed HEP scientists to request that they be alerted when new papers in their field became available. Since the alerts included author contact details, scientists were able to communicate directly with one another in order to exchange papers, regardless of whether they personally knew each other.

When the Internet became widely available new features were added to SPIRES so that researchers could interrogate the database by email, allowing them to retrieve the information electronically. And in 1991, SPIRES became the first web-based database.

SPIRES has always been a bibliographic database. What was innovative about arXiv, therefore, was that it was the first freely available electronic repository containing the full text of research papers. However, since SPIRES offered powerful search features and citation analysis tools, and was actively managed by librarians (who started inserting links into its records to allow users to go directly to the full text on arXiv), researchers began to use SPIRES as a search engine for arXiv. As such, a symbiotic relationship developed between the two services.

Today SPIRES has around 770,000 records and is growing at around 4,000 new records a month. ArXiv, meanwhile, has 492,000 e-prints, with roughly five thousand new ones added each month.

But SPIRES is now over thirty years old, and has begun to creak. Consequently, in May a new service called INSPIRE was launched. INSPIRE is a joint initiative of SLAC, DESY, the US-based Fermi National Accelerator Laboratory (FNAL) and CERN.

The new database is based on a state-of-the-art Open Source software platform called Invenio, initially developed for CERN's institutional repository — known as the CERN Document Server, or CDS. It is planned to populate INSPIRE with content from both SPIRES and CDS — which has around one million records, half of which are full text — promising to make INSPIRE a significant new service for the HEP community.

What is the long-term objective for INSPIRE, and what does its development portend? An information professional at DESY for 25 years, Annette Holtkamp is ideally placed to answer these questions. Holtkamp has long been responsible for inputting journal and conference proceedings into SPIRES, and is directly involved in the development of INSPIRE. "Our ambition is to build a comprehensive HEP information platform hosting the entire body of HEP metadata and the full text of all OA publications", she says.

As such, INSPIRE will eventually become a full-text service like arXiv — something that in a recent survey HEP scientists said they wanted. Moreover, it will build on the strengths of SPIRES. To this end it was decided to use very sophisticated repository software which will, amongst other things, include a new search engine, a metrics system for measuring the impact of articles, and a number of specialist data management tools. It will also boast Web 2.0 functionality and, as with SPIRES, the data will be managed and curated by professional librarians.

What implications might this have for arXiv? Can we expect particle physicists to discontinue putting their papers in arXiv, for instance, and opt for INSPIRE instead? It would, after all, seem a more natural location for HEP researchers to deposit their preprints. Not necessarily, says Holtkamp. "HEP scientists have already a full commitment to self-archive their preprints in arXiv and there is no reason to change this. What INSPIRE will offer is the possibility to deposit material that users want to be preserved in addition to today's possibilities, such as theses or old preprints which were not submitted to the arXiv originally: indeed, authors are usually happy to make them available but arXiv is seldom used as such a forum."

INSPIRE also needs to be seen in the context of the wider OA movement. Unsurprisingly, Holtkamp is a passionate advocate for Open Access. She is a member of the Open Access working group of the Helmholtz Society, and was on the working party that designed SCOAP³. And when Germany joined the SCOAP³ consortium she became the German contact for SCOAP³.

SCOAP³ is an ambitious project that hopes to "flip" the entire particle physics literature from today's primarily subscription-based model — in which researchers (or their institutions) pay to access published research — to an Open Access model, in which the HEP research community would instead pay to publish its research. (Not on an author-pays model, but through a single consortium that will facilitate the re-direction of subscription funds). In return, publishers would commit to making HEP papers freely available on the Web.

The development of INSPIRE, however, opens up the possibility that SCOAP³ could prove to be a transitory phase. If successful, SCOAP³ would of course make all HEP research OA. But as a result of the so-called serials crisis (and growing disillusionment with the role that publishers play in scholarly communication), some believe that it is time for the research community to, as they like to express it, "take back ownership" of its research.

In other words, the argument goes, it is time not only for researchers to stop signing over copyright in their papers (something traditional publishers generally insist on as a condition of publication) but for the research community to begin to take responsibility for distributing its own research too. Holtkamp is keen to stress, however, that this is a minority view. "The bulk of the community wants to preserve the journal system," she insists. "And this is definitely not on the SCOAP³ agenda."

Nevertheless, we should note that one consequence of the rapid rise in both subject-based and institutional repositories is that scholarly communication is moving from a journal model to a database model. This could see a second kind of flip take place: where today researchers pass over their papers to publishers, who then become the official source and distributors of published research, in the future repositories could become the primary location for scholarly papers. They might also become publishing platforms, with publishers relegated to outsourced service providers paid simply to organise the peer review of the papers that have been deposited in repositories. This, for example, seems to be the model that the University of California is moving towards.

As such, SCOAP³ and INSPIRE are perhaps just the first signs of a far more radical revolution poised to sweep through scholarly communication; one that was never envisaged when HEP librarians began developing new tools to help particle physicists share their preprints with one another.

One related new development, for instance, is the growing desire, nay necessity, for what is called "Open Data". In contrast to research papers, the issue here isn't just that of ensuring that the scholarly community retains ownership of its research, nor just that of ensuring that the data generated by experiments is preserved in a machine-readable format (bearing in mind, for instance, that with data formats being constantly updated and superseded we currently face the threat of a digital dark age).

As important as these issues are, the main concern for particle physicists today is that, given the increasing complexity of the experiments they conduct, it isn't enough simply to keep data in its raw form (even if it is constantly migrated to new machine-readable formats); it is also necessary to retain with it the knowledge needed for anyone who did not take part in the original experiment to reuse and reinterpret it. As Holtkamp puts it, "We will have to develop what we call a parallel format, that is, a format that not only preserves the data itself, but also the necessary knowledge to be able to interpret it."

What is the nub of the issue here? Simply that in a world of Big Science, where conducting experiments requires building massive particle accelerators like CERN's new Large Hadron Collider (LHC), it would be a huge waste of money if the data produced could not be reused. It cost 6 billion euros to build the LHC, for instance. If the HEP community allowed the 15 petabytes of data the LHC will generate each year to wither on the vine, and rapidly become obsolete, it would be sheer profligacy.

As CERN director general elect Rolf-Dieter Heuer put it to me in a recent interview for Computer Weekly, "Ten or 20 years ago we might have been able to repeat an experiment. They were simpler, cheaper and on a smaller scale. Today that is not the case. So if we need to re-evaluate the data we collect to test a new theory, or adjust it to a new development, we are going to have to be able to reuse it. That means we are going to need to save it as open data."

For the moment, however, the particle physics community has no way of doing this. "This is a task for the experimental physicists, working with the IT people, not for library-based information professionals like me," says Holtkamp. "And right now they are just at the beginning of the process. The people conducting the experiments are going to have to sit down with the IT people and work out how to do it."

She adds, "I am pretty confident that Open Access will be the standard of the future for scientific papers, although it remains unclear when Open Data will become the norm."

What is striking is that even though the seeds of the OA movement date back over forty years, the full implications of the changes set in train by HEP scientists are only just becoming apparent. As they do, we can surely expect a much broader and far more profound revolution in scholarly communication than anyone could have predicted back then, and in all disciplines, not just physics.

In addition to Open Access and Open Data, for example, we are seeing the development of a growing number of open lab notebook initiatives; projects like OpenWetWare, where scientists are able to post and share innovations in lab techniques; initiatives like the Journal of Visualized Experiments (JoVE), an OA site that hosts videos showing how research teams do their work; GenBank, an online searchable database of DNA sequences; the Science Commons, a non-profit project focused on making research more efficient via the Web (e.g. by enabling easy online ordering of lab materials referenced in journal articles); scientist-friendly social networking sites like Laboratree and Ologeez; and a growing emphasis on the use of Open Source software by scientists.

Collectively this broader and rapidly-developing phenomenon has been dubbed Open Science. As Live Science senior editor Robin Lloyd put it recently, "Open science is a shorthand for technological tools, many of which are Web-based, that help scientists communicate about their findings. At its most radical, the ethos could be described as 'no insider information.' Information available to researchers, as far as possible, is made available to absolutely everyone."

The "Principles for open science" published on the Science Commons web site identify four main pillars of open science: Open Access to research literature, Open Data, Open Access to publicly-funded research materials (cell lines, DNA tools, reagents etc.), and an open cyberinfrastructure.

So how might the research process look in this new world? We don't yet know. Holtkamp, however, has some ideas. "I can see a piece of research starting with a researcher simply putting an idea on a wiki, where it would be time-stamped in order to establish precedence," she says. "Others could then elaborate on the idea, or write a program or do some calculations to test it, or maybe visualise the idea in some way. Publication would then consist of aggregating all the pieces of the puzzle — all of which would be independently citable."

This surely assumes a future research environment in which there will be few if any scientific secrets and the traditional model of the scholarly journal will be increasingly marginalised, and perhaps entirely superseded. Above all, it implies that the research community will increasingly expect to retain ownership of its research, and it will take far greater responsibility for communicating and sharing its findings.

Of course it is the increasing complexity of science, and above all the rise of the Internet, that have made these developments inevitable. Certainly they would not have been possible without the Internet. One is nevertheless bound to wonder why the seeds of this revolution first sprouted in the garden of the HEP community, and why so early on? What is it about particle physicists that made them OA pioneers?

With a degree in sociology (as well as a PhD in physics), Holtkamp is better qualified than most to suggest an explanation. That physicists were so early in the game, she says, reveals something about their mentality. "If they encounter a problem they immediately want a solution. If nothing ready-made is available, a very common situation, their strong self-confidence and pronounced playfulness lead them to sit down and try to work it out themselves, and often with success."

It is a mindset that always saw physicists loath to have to wait months (or longer) before reading about a new piece of research. They have always wanted immediate, free access to the latest findings — something that the long lead time and subscription barriers inherent in the traditional journal publishing model could never properly satisfy.

From this perspective, one might want to suggest that it was not that the Internet provided the stimulus for scientists to change the way they did things, but that with its arrival physicists were finally able to share their research in the way they needed to. The Internet, in other words, simply allowed them to satisfy a long-standing, pent-up need.

If you would like to learn more about how physicists sparked the revolution now sweeping through scholarly communication, please download and read the interview below with Annette Holtkamp.

####

If you wish to read the interview with Annette Holtkamp please click on the link below. The PDF file that will download includes both the interview and this introduction.

I am publishing the interview under a Creative Commons licence, so you are free to copy and distribute it as you wish, so long as you credit me as the author, do not alter or transform the text, and do not use it for any commercial purpose.

If after reading it you feel it is well done you might like to consider making a small contribution to my PayPal account. I have in mind a figure of $8, but whatever anyone felt inspired to contribute would be fine by me.

Payment can be made quite simply by quoting the e-mail account: richard.poynder@btinternet.com. It is not necessary to have a PayPal account to make a payment.

What I would ask is that if you point anyone else to the article then you consider directing them to this post, rather than directly to the PDF file itself.

If you would like to republish the article on a commercial basis, or have any comments on it, please email me at richard.poynder@btinternet.com.

To read the interview (as a PDF file) click here.

Friday, September 12, 2008

The Open Access Interviews: Annette Holtkamp