Open and Shut?: October 2005

Wednesday, October 19, 2005

Comment on "Time to take the red pill"

Richard,

I was going to add a comment to your Open and Shut? blog article, "Time to take the red pill", but I don't have a blogger account and I don't want to sign up just to comment. So I'm sending my comment to you.

You said: "Clearly there is a valuable potential role here for information professionals, should they choose to seize the opportunity. After all, what better way for disenchanted librarians to make themselves indispensable in a new and relevant way - not by playing their traditional role as gateways to information (putting themselves between the information and the user), but as facilitators able to help researchers and other data creators collaborate and share information. If this means abandoning some of their traditional skills for new ones then so be it. Now there's a topic for discussion at Internet Librarian International 2006!"

Yes, and if librarians want to learn how to do this sooner than late 2006, and in the context of (EPrints-based) institutional repositories, they can sign up to the new series of EPrints Training Courses.

The next one in the UK will be announced soon and will be around the turn of the year. Watch this space.

Or contact Alma Swan or Siobhain Dales for more information.

I wouldn't describe these courses as a 'red pill', but you can if you wish.

Steve Hitchcock

Monday, October 17, 2005

Time to take the red pill

Listening to presentations, and talking to delegates, at Internet Librarian International 2005 (ILI) last week, I was reminded of the film The Matrix. In the movie, the main character is offered an opportunity and a choice: he can take the red pill and see the truth; or he can take the blue pill and return, comfortably unaware, to the illusion that is the world of the Matrix, and life will simply carry on as before.

With the Internet continuing to challenge their traditional skills and roles, information professionals face a not dissimilar choice: embrace the reality of the new world they inhabit, or seek to deny it, clinging to a now outdated illusion of reality.

Disconcerting

For while information professionals initially welcomed the arrival of the Internet, many have become increasingly concerned that it poses a significant threat to their settled world.

This concern was all too evident at ILI, with both delegates and presenters clearly of the view that many traditional notions of information science are under attack from the Web. Long-standing classification systems, for instance, are threatened by newer notions of categorisation; hierarchical indexing is having to give way to the flat indexing of the Web; and taxonomies face growing pressure from new-fangled concepts like folksonomies.

For information professionals — who pride themselves on the many skills and techniques that they have developed over the years — this is both disorientating and distressful. If that were not enough, the Web challenges the very notion that information intermediaries have a role to play any more in a networked world.

None of these anxieties are new, of course, but the depth and intensity of the pain information professionals are experiencing was all too palpable at the London event. Certainly there was a desperate need to appear relevant. As one librarian plaintively put it, "We need to find ways to put ourselves back between the information and the user."

That said, some information professionals — generally the younger ones — are embracing the new world. Michael Stephens, a special projects librarian at St. Joseph County Public Library in Indiana, for instance, gave a presentation in which he talked with great enthusiasm about how libraries can exploit wikis, instant messaging, and podcasts to enhance the services they provide for patrons.

Stephens also bravely volunteered to defend folksonomies from the caustic tongue of UKOLN's Brian Kelly who, amongst other things, publicly critiqued Stephens's "inadequate" use of tags when labelling photographs of his dog Jake on the social networking site Flickr. Kelly's aim was to demonstrate that folksonomies are a pale shadow of traditional classification, even in the hands of a trained librarian.

Grumpy old men

All in all, it felt at times as if ILI was awash with grumpy old men muttering bad-temperedly about the good old days, and the shocking ignorance of the young.

This attitude was best exemplified in the keynote given by information industry personality Stephen Arnold. In a paper entitled Relevance and the Future of Search, Arnold complained that the traditional view of relevance in online searching was under siege on the Web.

Specifically, information science's notion of precision and recall (where precision measures how well retrieved documents meet the needs of the user, and recall measures how many of the relevant documents were actually retrieved) was being destroyed by the practices of web search engines, particularly Google.
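For readers who have not met the two measures, here is a minimal sketch of how they are usually computed for a single query. It assumes the standard set-based definitions (precision as the share of retrieved documents that are relevant, recall as the share of relevant documents that were retrieved); the function name and the document ids are invented for illustration and are not taken from Arnold's paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the search engine returned
    relevant:  ids of the documents judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were found

    # Guard against empty sets so the sketch never divides by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 5 documents are actually relevant,
# and 2 of the returned results are among the relevant ones.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d2", "d4", "d5", "d6", "d7"],
)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

On this reading, a paid link that pushes an irrelevant page into the results lowers precision, while a relevant page buried too deep to be seen is, in practice, a loss of recall.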

This state of affairs, he argued, is being driven by the desire to monetise the Web, not least through Google's pioneering of advertising-driven search models. When a user does a search on Google, for instance, the resulting pages of "organic results" (i.e. the product of Google's search algorithm) are placed alongside links paid for by advertisers. Unfortunately, said Arnold, over 90% of users do not differentiate between the paid listings and organic results.

Entirely alien

The situation is aggravated, he added, because people don’t generally click through many pages of search results. This encourages owners of web sites to exploit Google's search algorithms in order to push links to their sites higher up Google's search page. Indeed, said Arnold, a large and powerful Search Engine Optimisation (SEO) industry has been created precisely in order to sell services aimed at "fixing" search results on Google and the other main search engines. As a consequence, he complained, relevance on the Web is now a concept entirely alien to anything understood by information professionals.

As the market leader, and primary innovator, it was Google that attracted the full force of Arnold's ire. "Indexing is not what you learned in library school," he said. "It’s what Google wants. Effectively, SEO is the new indexing model."

In other words, the notions of comprehensiveness and objectivity long promulgated by information professionals as central to online searching have given way to a process whose raison d’être is to falsify search outcomes to satisfy commercial interests. "The SEO market has grown up to take advantage of this new idea of relevance," said Arnold.

To underline the extent to which traditional notions of relevance have been undermined, Arnold cited research done by the UK-based Internet magazine .net, which found only a 3% overlap in search results listed on Google, Yahoo and AskJeeves when the same search term was input. "When is a hit relevant?" Arnold asked rhetorically. "Where is the boundary between SEO and 'real indexing'?"

Worse, added Arnold, Google's dominance is growing all the time. Whereas in the previous quarter it had a 51% share of weblog referrals in the US, for instance, this figure is now 62%. (Referral logs collect information on who visits a website and how they arrived there.)

Intellectual dishonesty

After his presentation I asked Arnold why he objected to these developments. "It's intellectually dishonest," he replied. "These shortcuts trivialise indexing." Moreover, he added, it is dangerous. "If a medical term is misused, it could affect a person's life if the appropriate article is not found. Likewise, if a company doesn’t find the right patent document it could cost that company a lot of money. So I really disapprove."

But is it really likely that a corporate lawyer or a doctor would rely on Google for an exhaustive patent or medical search? And are information consumers really as naïve or stupid as Arnold implies?

As Arnold himself acknowledged, most users probably don’t care if their search results are paid-for ad links, or the product of Google's algorithm. If someone is looking for a restaurant, for instance, what they want to find is a good-enough restaurant, not a long list of every possible eating house available, categorised by thirty different criteria, and listed by the number of available tables! After all, most of the sponsored links turn up on pages where users are looking for products or services. In this case Google is simply acting like a yellow pages directory.

Moreover, even if it is true that web users don’t always understand the way search engines work, they are learning all the time. In fact, as a general rule, users know as much as they need to know, and this is usually more than information professionals give them credit for knowing!

All in all, it was hard not to conclude that Arnold reflects the grumpy old man school of information science. As he himself admitted: "I'm old. I'm dying out."

For all that, while deprecating SEO techniques, Arnold was happy enough to offer the audience five "cheats" they could use in order to ensure their web sites received higher rankings on Google.

He also included in his presentation what amounted to a sponsored link. After explaining his five cheats, he told the audience they could find another five in his eBook on Google (The Google Legacy, How Google's Internet Search is Transforming Application Software), and invited them to buy it ($180 to you, Madam!).

Essentially, Arnold's view seemed to be that much is awry on the Web, but there is little to be done but accept it.

They're watching us!

But Arnold had a second point to make. While many still view Google as a search company, he argued, it was now far more than that. Currently offering 56 different services, he explained, Google is in the process of creating a completely new operating system, one moreover up to 40 times faster than anything that IBM or HP could offer, and based on between 155,000 and 165,000 servers.

This too Arnold clearly deprecated, explaining that this "Googleplex" (a term he has appropriated from the name of Google's Mountain View headquarters) now encircles the world like the carapace of a tortoise — making Google the new AT&T; an AT&T, moreover, not subject to any regulation. Clearly in likening the Googleplex to a new operating system Arnold was also portraying Google as the new Microsoft.

At this stage Arnold's presentation began to sound more like a conspiracy theory than factual exposition. Confiding to the audience that Google founders Larry Page and Sergey Brin had refused to speak to him once they realised his was a critical rather than adulatory voice, and referring to a series of patent thickets that Google has built around its technology (patents which his lawyer had, for some inexplicable reason, advised Arnold not to put up on the Web), he went on to complain that he had never provided his address to Google, yet the company nevertheless knew it. "Google knows where I live," he said dramatically. "I didn’t tell them. They are watching me!"

And for those librarians still harbouring any illusion that by scanning books and making them available on the Web Google represents a force for good, Arnold depicted Google Print as a smokescreen. "The scanning of books is a red herring," he said, adding that Google was like a magician into whose hand a quarter suddenly appears as if from nowhere. "Everyone looks at the quarter, not the magician."

Fortunately, Arnold's presentational mode appeared to owe more to his predilection for drama — and a canny sense of how to market a new book — than to paranoia. It also had moments of humour. Fifteen minutes into his presentation, we were all evacuated after the hotel fire alarm was set off, giving Arnold the opportunity to yell: "You see — I'm so hot! This is what I use in bars to get women."

Later, when we were allowed to re-enter the hotel to hear the rest of Arnold's presentation, the conference organiser announced that the alarm had been triggered by an old man smoking a cigar in his bed. "And that old man," promptly quipped Arnold, "is none other than Gregorovich Brin, Sergey's uncle."

Not only is Google watching Arnold, it seems, but its founders have deployed their extended family to silence him!

Real or perceived threat?

But how seriously should we take Arnold's prognostications? He is, after all, not the only commentator to depict Google as the new Microsoft, or AT&T, and thus a significant monopoly threat.

Interestingly, most now view Microsoft as somewhat grey at the temples. This more relaxed view, moreover, is a consequence not of the antitrust case against the company (after all, Judge Jackson's order to break up Microsoft was subsequently overturned by a federal court) but of the growth of new competitors like Google, and the rise of the open source software movement.

That said, Arnold is right to deprecate the growing commercialisation of the Internet, and now that Google is a public company we can surely expect its "don't be evil" ethos to come under increasing pressure from shareholders keen to see the return on their investment maximised.

But leaving aside Arnold's dire predictions of an all-seeing, all-powerful Googleplex encircling the world and pulling everyone into its monopolistic grasp, it is certainly worth asking how much of a monopoly threat Google represents to web searching. The answer seems to be: "Not as much of a threat as Arnold implies". Many, for instance, believe that large generic search engines are set to see their dominance diminish rather than increase.

Commenting in an EcommerceTimes article earlier this year, the associate editor of SearchEngineWatch.com Chris Sherman argued that the bigger the Web grows, the less useful generic search engines become. As a consequence, he said, "We're seeing a real rise in vertical search engines, which are subject-specific or task-specific — shopping, travel and so on." He added: "We're going to see more of that going forward as people become more sophisticated and as these specialised search engines become better at what they do."

Neither is Sherman a lone voice. Commenting in the same article Gartner Group's Rita Knox said: "People still need information on the Internet, but a more generic search capability like Google is going to be less useful."

Self-fulfilling prophecy

Time will tell. But the fundamental problem with Arnold's dark view of the future is that conspiracy theories tend to have a debilitating effect on our ability to act. We become less inclined to ward off the object of our fear if we believe it to be inevitable, creating a kind of self-fulfilling prophecy.

Arnold is not the only one to be disenchanted with the growing commercialisation of the Web. Nor is he the only one to deplore Google's role in this. In a recent paper called The Commercial Search Engine Industry and Alternatives to the Oligopoly, for instance, Bettina Fabos, from the Media Research Center at Budapest University of Technology and Economics, makes very similar points. Her conclusion, however, is very different.

Rather than portraying the situation as inevitable, and advising us to get over it, she concludes: "[T]o realize the web’s educational and non-commercial potential, educators and librarians need to move away from promoting individual skills (advanced searching techniques, web page evaluation skills) as a way to cope with excessive commercialism" and instead "address the increasing difficulties to locate content that is not commercial, and the misleading motives of the commercial, publicly-traded internet navigation tools, and the constant efforts among for-profit enterprise to bend the internet toward their ends."

In other words, rather than rushing around like Private Frazer in the BBC sitcom Dad's Army shouting "We're all doomed", information professionals should adopt a more positive approach. Why not take the initiative and turn the technology in a more desirable direction? Why not fill the web with non-commercial content, and then build non-commercial tools to help users locate that content?

Indeed, says Fabos, some are already at work doing just this. She commends, for instance, the activities of initiatives like the Internet Scout Project, which enables organisations to share knowledge and resources via the Web by putting their collections online; she commends Merlot, the free and open resource providing links to online learning materials; and she commends tools like iVia and Data Fountains, designed to allow web users to discover and describe Internet resources about a particular topic.

Open Access

As it turns out, one of the more organised and advanced initiatives with the potential to help create a non-commercial web is the open access (OA) movement — a movement, in fact, in which librarians have always played a very active role.

For while the movement's original impetus was solely to liberate scholarly peer-reviewed articles from behind the subscription firewalls imposed by commercial publishers, there are grounds for suggesting it could develop into something grander, in both scope and scale. How come?

As scholarly publishers have consistently and obdurately refused to cooperate with the OA movement in its attempts to make scientific papers freely available on the Web, the emphasis of the movement has over time shifted from trying to persuade publishers to remove the toll barriers, to encouraging researchers to do it themselves by self-archiving their published papers, either in institutional repositories (IRs), or in subject-specific archives like the arXiv preprints repository and PubMed Central, the US National Institutes of Health free digital archive of biomedical and life sciences papers.

And to help researchers do this, the OA movement has created an impressive collection of self-archiving tools, including archival software like Southampton University's EPrints, and MIT's DSpace; a standardised protocol to enable repositories to interoperate (the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH); and OAI-compliant search engines like the University of Michigan's OAIster, which harvest records from multiple OAI-compliant archives to create a single virtual archive. In this way hundreds of different repositories can be cross-searched using a single search interface, much as Google searches the Web. Essentially a vertical search engine, OAIster currently aggregates records from over 500 institutions.
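To give a flavour of what harvesting involves, the sketch below builds an OAI-PMH ListRecords request and pulls identifiers and titles out of a response. The verb, the oai_dc metadata prefix, the resumptionToken mechanism and the XML namespaces come from the OAI-PMH 2.0 specification; the repository address, the sample response and the function names are invented for illustration, and nothing here is meant to describe how OAIster itself is built. The sketch parses a canned response rather than calling a live repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # When continuing a harvest, the token replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI_NS}record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.findtext(f".//{DC_NS}title")
        records.append((identifier, title))
    # An absent or empty token means there are no more pages to fetch.
    token = root.findtext(f".//{OAI_NS}resumptionToken")
    return records, (token or None)


# Hypothetical response from an imaginary repository, trimmed to essentials.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:101</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical self-archived paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repo.example.org/oai"))
records, token = parse_records(SAMPLE_RESPONSE)
print(records, token)
```

A harvester of the OAIster kind would, in outline, repeat this request for each repository it knows about, following resumption tokens until none is returned, and then index the combined records behind one search box.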

But while the initial purpose of the Open Archives Initiative (OAI) was limited to scholarly papers, it has become apparent that its aims and its technology could have wider potential. As the OAI FAQ puts it, OA advocates came to realise that "the concepts in the OAI interoperability framework — exposing multiple forms of metadata through a harvesting protocol — had applications beyond the E-Print community." For this reason, the FAQ adds "the OAI has adopted a mission statement with broader application: opening up access to a range of digital materials."

How might this work? Two years ago Clifford Lynch published a paper in which he argued that there is no reason why an institutional repository could not contain "the intellectual works of faculty and students — both research and teaching materials — along with documentation of the activities of the institution". It could also contain, he said: "experimental and observational data captured by members of the institution that support their scholarly activities."

Indeed, Lynch added, repositories in higher educational establishments could also link with other organisations in order to extend and broaden what they offer. "[U]niversity institutional repositories have some very interesting and unexplored extensions to what we might think of as community or public repositories; this may in fact be another case of a concept developed within higher education moving more broadly into our society. Public libraries might join forces with local government, local historical societies, local museums and archives, and members of their local communities to establish community repositories. Public broadcasting might also have a role here."

Need not end there

And it need not end there. Why not use the OAI technology as the framework for an alternative non-commercial web; one encompassing as much as is deemed sufficiently valuable that it could benefit from being accessible outside the confines, constraints and biases of the commercial web? If users wanted to find a restaurant they could go to Google; but if they wanted to do a medical search then the non-commercial web would be a better choice. Data searchable within this alternative web would no doubt need to meet certain standards in terms, for instance, of provenance, and of the depth and range of metadata.

Self-archiving purists discourage such talk, fearful that it may distract the movement from the priority of "freeing the refereed literature". But the reality is that as research funders like the Wellcome Trust and Research Councils UK begin to mandate researchers to self-archive their research papers, so the number of institutional repositories is growing. And once a university or research organisation has an institutional repository there is an inescapable logic for that repository to develop in the kind of directions proposed by Lynch.

It may be, of course, that in the end OAI technology is not appropriate for this job. It may also be wise not to distract the OA movement from its primary aim. But it is perhaps now only a matter of time before some such phenomenon develops. Initiatives like Google Print and Google Scholar have served to highlight growing concerns at the way commercial organisations are now calling all the shots in the development of the Web. And it is these concerns that are encouraging more and more people to think in terms of non-commercial alternatives.

What we are beginning to see, says Fabos, is a "small but growing countervailing force to the commercialisation of 'the universe of knowledge.'" What will drive these efforts, she adds "is the understanding that, in our commercial system, educators, librarians and citizens interested in nurturing a public sphere must work together to control the destiny of the internet — or somebody else will."

Clearly there is a valuable potential role here for information professionals, should they choose to seize the opportunity. After all, what better way for disenchanted librarians to make themselves indispensable in a new and relevant way — not by playing their traditional role as gateways to information (putting themselves between the information and the user), but as facilitators able to help researchers and other data creators collaborate and share information. If this means abandoning some of their traditional skills for new ones then so be it. Now there's a topic for discussion at Internet Librarian International 2006!

The fact is, it's time for information professionals to stop bemoaning the loss of some perceived golden age, and take control of the Web. In short, it's time to reach for the red pill!

A comment has been made on this article.

Tuesday, October 04, 2005

China Mulls Open Access

As Research Councils UK (RCUK) continues to deliberate over its policy on public access to scholarly papers (a final announcement has been delayed until November), the Chinese Academy of Sciences (CAS) has also begun mulling over the question of open access (OA).

"According to my contact in China," says Jan Velterop, director of open access at the STM journal publisher Springer, "the Chinese Academy of Sciences is now in the process of organising a group of prominent scientists to issue an open call to Chinese funding agencies, and research and educational institutes, to promote open access."

To this end, adds Velterop, the Academy is currently working on a draft document for scientists to review. It is also in the early stages of developing institutional operating policy guidelines for CAS to enable it to support open access.

These developments come in the wake of an international meeting held at the Beijing-based CAS in June. Since then, staff at the Academy's Library — led by Dr Xiaolin Zhang — have been considering various ways of ensuring that Chinese researchers deposit copies of their research reports and journal articles in academic repositories.

For the moment, says Wu Yishan, of the Institute of Scientific & Technical Information of China (ISTIC), there is nothing specific on the table. "So far I haven't seen any concrete measure, let alone mandate, to promote open access in China," he says. "Only a lot of calling."

Many, however, are convinced that it is only a matter of time now. One of those attending the Beijing meeting was Frederick Friend, a consultant who works with the UK's Joint Information Systems Committee (JISC) and the Open Society Institute (OSI). It was quite clear, he says, that there was already a strong commitment to open access in China prior to the meeting. "Particularly striking were the words of Professor Qiheng Hu [Vice Chair of the Chinese Association for Science and Technology], who in her keynote address referred to open access as 'a necessity to promote capacity building in science and technology'."

Nevertheless, says open-access advocate Stevan Harnad, a lot of lobbying remains to be done, "both of the high-level administrators and of the researchers." It also remains unclear what kind of policy might be implemented.

The hope, adds Harnad, is that China will opt for a similar (but improved) model to that currently being considered by RCUK, rather than emulate the policy introduced in May by the US National Institutes of Health. "If China copies the flawed embargoed-access 'recommendation' along the lines of the NIH, then it will be setting open access back instead of moving it forward," says Harnad.

The NIH policy was watered down following aggressive lobbying by STM publishers. Thus where the initial proposal had been to mandate NIH-funded researchers to make their papers available six months after publication, the final wording only "strongly encourages" grantees to authorise public release of their papers "as soon as possible" after publication, and at least within 12 months of publication.

What open-access advocates would like to see, therefore, is for Chinese researchers to be mandated to self-archive their papers immediately on publication, with no embargoed period. Open-access publishers like BioMed Central and the Public Library of Science would also clearly like Chinese scientists to be told to prefer open-access journals over traditional subscription journals when they publish their papers.

Since China is watching the UK closely, says Harnad, much will depend on the final RCUK policy. "If it retains the [currently proposed] two fatal opt-out loopholes ('self-archiving is mandated only if/when your publisher allows and only if your institution already has a repository') then it too is yet another opportunity lost."

He adds, however: "I am still optimistic that the Brits will manage to sort it out — and that China will emulate a loophole-free RCUK policy. Then the other nations can follow the green lamps of Britain and China."

If China does embrace open access, says Friend, the potential benefits to research are considerable. "Open access to the vast quantity of research undertaken in China will benefit not only the people of China in their economic and social development, but also communities across the world, particularly if open access to research reports from other countries continues to grow."

Regardless of any official policy, says BioMed Central's Matthew Cockerill, Chinese researchers are already embracing OA. "Until now, China's best researchers have tended to publish in foreign subscription-only journals, which are often inaccessible to Chinese researchers. Open access has the potential to rectify this situation, and Chinese researchers are recognising this. BioMed Central has seen a rapidly increasing number of submissions from China this year, and also recently signed up its first independent open-access journals based in China."

This is not surprising, says Key Perspectives' Alma Swan, who gave a presentation at the Beijing meeting. "The amount of Chinese science being published is growing rapidly but much of it remains largely invisible to the rest of the world. Although some of the best is published in 'western' journals (there has been a 1500% increase in the number of Chinese articles indexed by ISI over the last 20 years), an enormous amount of Chinese research is tucked away in Chinese journals that are hard to get at. There are 2000 Chinese university journals, for example, and the vast majority of those are not indexed by any of the major indexing services. Chinese science is hiding its light under the proverbial bushel."

In fact, the potential benefits of OA for China could be greater than might at first seem. In a self-archived preprint due to be published in the journal Research Policy, Ping Zhou and Loet Leydesdorff argue that while China is the fifth leading nation in terms of its share of the world's scientific publications, its total citation rate is still low compared to other nations. This suggests that if, as is frequently maintained, open access increases citation levels, then in embracing OA China could increase not only the visibility of its research, but the impact of that research too.

Some traditional publishers have also welcomed developments in China. "The Chinese call to funding agencies, scientists and institutions to promote open access is a most encouraging development," says Velterop. "Springer has very good contacts with the Chinese scientific community and looks forward to serving the community by offering the option of publishing with full open access in our journals." (Although it remains a traditional subscription-based publisher, in July 2004 Springer launched an "Open Choice" option for researchers wishing to embrace open access. And in August this year it recruited Velterop from BioMed Central to head up the company's open access initiative).

Others, however, will be less enthusiastic, not least, perhaps, the Association of Learned and Professional Society Publishers (ALPSP), which has been actively lobbying against the proposed RCUK policy, on the grounds that it would "inevitably lead to the destruction of journals"; a claim that has been refuted by open-access advocates.

Indeed, it is hard not to conclude that the ALPSP is behaving somewhat irrationally over OA. When, for instance, the Wellcome Trust, the UK's biggest non-governmental funder of biomedical research, posted details of its own mandate on the Liblicense mailing list yesterday, the chief executive of ALPSP, Sally Morris, immediately responded: "I'd like to ask how the Wellcome Trust feels about the fact that it appears to be inciting (nay, forcing) its researchers to breach the terms of the contracts some of them they may have signed with publishers."

The Wellcome Trust mandate requires that, from 1st October, all papers emanating from grants it has awarded will have to be posted on PubMed Central (PMC), the free-to-access life sciences archive developed by the National Institutes of Health. The papers will also have to be made freely accessible within 6 months of publication.

It wasn’t immediately clear what Morris meant in accusing the Wellcome Trust of forcing researchers to breach the terms of their contracts with publishers, but what is surely clear is that the ALPSP's increasingly aggressive resistance to OA threatens to alienate it from the research community.

As one of the delegates who attended the Beijing meeting — speaking on condition of anonymity — put it: "Whatever the outcome of the current initiatives in the UK and China, those commercial publishers and learned societies who continue to resist open access are holding their fingers in a dyke that will, sooner or later, inevitably burst. The only issue for them now, therefore, is whether they learn to swim in the open waters, or choose to drown."