In a recent investment report, analyst Claudio Aspesi
concluded that a new front had opened up in the Open Access (OA) debate. Writing in April, Aspesi noted that academics
are “increasingly protesting the limitations to the usage of the information
and data contained in the articles published through subscription models, and —
in particular — to the practice of text mining articles.” Aspesi is right, and
a central figure in this battleground is University of Cambridge chemist Peter
Murray-Rust. A long-time
advocate for open data, Murray-Rust is now spearheading an initiative to draft a
“Content Mining Declaration”. What is the background to this?
What Murray-Rust wanted to do, he explained, was to capture the “embedded data”
contained in the tables, charts, and images published in science papers, along
with the “supplemental information” that often accompanies papers. To do this,
he had developed a variety of software tools to mine large quantities of
digital text. Having extracted the data he then wanted to aggregate them, compare
them, input them into programs, use them to create predictive models, and reuse
them in a variety of other ways.
However, he was having huge problems achieving this, not because of any technical issue,
but because of uncertainty over copyright and publishers’ insistence that a
licence to read journals does not encompass the right to mine them with
software.
To add to Murray-Rust’s frustration, many of his colleagues were either unsympathetic
or uncomprehending. Even more galling, the Open Access movement, which should
have been a natural ally, was more interested in making papers freely available
to eyeballs than to software. Even papers published in OA journals, he noted, are often released under licences that do not come with reuse rights.
In pursuit of his dream, Murray-Rust became a formative voice in the creation of the
open data movement. Open data, Murray-Rust explained to me in 2008, is data “free of any restraint on
access and on reuse.” Recently, however, governments have tended to lead the way in calling for open data, spawning a generation of data wranglers; open scientific information has often lagged behind, but is now beginning to be seen as a central issue.
Four years later Murray-Rust is still frustrated. He is not, however, a
man to give up, and he continues his advocacy today under the rubric of “open
content mining”. Essentially, this is text mining plus. As Murray-Rust explains
today, he views the mining of scholarly journals as a hierarchical activity,
with content mining encompassing not just the mining of text and data, but other types of content too, including images, tables, graphs, audio, and video.
Simply using the term “text mining”, he adds, “might imply that anything other than text
should be protected by the ‘content provider’. However, I and others can
extract factual information from a wide range of material.”
The good news is that the research community is finally beginning to understand
what Murray-Rust has been “banging on about” for all these years, as are
research funders and governments, and Murray-Rust believes the door to what he
wants is finally beginning to open.
However, he says, it is imperative that text mining advocates push hard at that open door
if they want to achieve their objectives. To this end, Murray-Rust recently convened
an ad hoc group of interested parties to draft what he calls a “Content Mining
Declaration” (disclosure: I am a member of the group).
####
If you wish to read the rest of the article, and a short Q&A with Murray-Rust, please click on the link below.
I am publishing the interview under a Creative Commons licence, so you
are free to copy and distribute it as you wish, so long as you credit
me as the author, do not alter or transform the text, and do not use
it for any commercial purpose.
To read the rest of the text (as a PDF file) click HERE.