The Long-Term Preservation of Open Access Journals

The long-term preservation of open access journals is one of the 7 criteria for the DOAJ Seal because DOAJ believes that it is an extremely important business process to which a publisher of academic content should commit. Nowhere is this more applicable than in the Global South, where financial support and rigorous standards around journal publishing aren’t always available and, sadly, journals tend to disappear from the Internet without warning. This is a huge problem for the academic footprint of the Global South, not to mention the hundreds of authors whose published papers are no longer online and can no longer be retrieved or cited.

When DOAJ established its criteria for the Seal in 2014, we were conscious that anything with a cost attached effectively put up a barrier that would keep low-income or financially unstable journals from getting the Seal. DOAJ is committed to smoothing that path as much as possible. In 2013, DOAJ announced a working agreement with CLOCKSS, one of the archives included on our application form, to seek funding for a joint project which would get as many as possible of DOAJ’s long tail of single journals archived and preserved. Unfortunately, those plans didn’t come to fruition and, since then, the archiving and digital preservation landscape has changed somewhat.

What remains to be done is clear, however: we must help ALL journals get into an archiving and digital preservation program. I am therefore delighted to welcome this guest post by Craig Van Dyck, the Executive Director of the CLOCKSS Archive.

Thanks for reading.

Dom, DOAJ Operations Manager


Users of scholarly content rely upon long-term access to that content. Scholarly research is long-lived, and users need to be able to re-access content repeatedly.

One concern about digital scholarly journals is that they could disappear from the web, which would undermine scholars’ ability to access the materials that they need.

In response to this concern, several preservation services are available. These services work somewhat differently, but they all aim to ensure the long-term availability of scholarly content on behalf of end-users. Prominent services are CLOCKSS and Portico in the US, Scholars Portal and the Public Knowledge Project Preservation Network (PKP PN) in Canada, and CINES in Europe. Publishers are welcome to participate in any or all of these services. Some national libraries also have archival collections.

In today’s environment, it is considered best practice for a scholarly publisher to include its content in a preservation service. To receive the DOAJ Seal, journals must be included in a preservation system.

In this post, we will focus on CLOCKSS, with some reference to PKP PN, because both services use the LOCKSS technology, which is arguably at the high end of the spectrum of preservation solutions.

LOCKSS Technology

LOCKSS stands for Lots of Copies Keep Stuff Safe. The technology was invented at the Stanford University Library about 20 years ago. It relies upon multiple copies of the digital content being hosted at geographically distributed nodes. The software (which is open source) includes a unique polling-and-repair mechanism. The nodes constantly exchange information about the content that they hold. If one node reports a difference from the other nodes, that node is out-voted, and its variant piece of content is replaced by the correct content from one of the other nodes. The archive is “dark”, meaning that end-users do not access the content, but the polling-and-repair mechanism ensures that the data stays in good repair.
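The out-voting idea can be illustrated with a short Python sketch. This is not the LOCKSS software or its actual protocol, which is considerably more involved; it is a minimal illustration that assumes each node holds one copy of the same item and that a simple majority of matching SHA-256 digests decides which copy is correct. The function and node names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one node's copy."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(nodes: dict[str, bytes]) -> list[str]:
    """Out-vote variant copies and overwrite them with the majority copy.

    `nodes` maps a (hypothetical) node name to the bytes it holds.
    Returns the names of the nodes that were repaired.
    """
    # Poll: every node "votes" with the digest of its own copy.
    votes = Counter(digest(content) for content in nodes.values())
    winning_digest, count = votes.most_common(1)[0]

    # Assumption for this sketch: a strict majority is required to repair.
    if count <= len(nodes) // 2:
        raise ValueError("no majority agreement; cannot decide which copy is correct")

    # Pick any copy that matches the winning digest as the repair source.
    good_copy = next(c for c in nodes.values() if digest(c) == winning_digest)

    # Repair: replace each out-voted copy with the agreed content.
    repaired = []
    for name, content in list(nodes.items()):
        if digest(content) != winning_digest:
            nodes[name] = good_copy
            repaired.append(name)
    return repaired


if __name__ == "__main__":
    copies = {
        "node_a": b"article text",
        "node_b": b"article text",
        "node_c": b"article t3xt",  # a damaged copy
    }
    print(poll_and_repair(copies))
```

In this example, node_c disagrees with the other two nodes, is out-voted, and has its copy replaced; running the poll a second time finds nothing left to repair.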

The CLOCKSS Archive

  • The C in CLOCKSS stands for Controlled. This is because CLOCKSS uses twelve servers located at blue-chip libraries around the world, all with first-rate infrastructure and security. CLOCKSS is a free-standing 501(c)(3) charitable non-profit organization, using the LOCKSS technology and working with the LOCKSS technical and operational teams at Stanford, to preserve scholarly content for the long term. CLOCKSS is certified as a Trusted Digital Repository. In its Trustworthy Repositories Audit & Certification report by the Center for Research Libraries, CLOCKSS received the only perfect score for technology.
  • CLOCKSS includes many Open Access publishers. For example, 24 publishers using the open source OJS publishing system are preserved in CLOCKSS. In total, CLOCKSS is preserving over 20,000 journal titles, with over 30 million journal articles and 75,000 books, growing rapidly each year.
  • One unique aspect of CLOCKSS is that when content is “triggered” for access, CLOCKSS makes the content freely available to all, under a Creative Commons license, which is a sign of the commitment to the concept of Open Access. A “trigger” occurs if a journal has disappeared, or will soon disappear, from the web. To date CLOCKSS has triggered 53 journals.
  • CLOCKSS can access publishers’ journals in two different ways: by harvesting the content from the publishing platform; or by the publisher providing the content to CLOCKSS by FTP.
  • Another unique aspect of CLOCKSS is the governance structure. The Board of Directors is composed half of libraries and half of publishers. The scholarly community itself is thus responsible for the policies and practices of CLOCKSS.
  • Publishers sign an Agreement with CLOCKSS, which governs rights and responsibilities. There is a small annual cost for participating in CLOCKSS. CLOCKSS is financially sustainable, which is an important element for a long-term preservation archive. 350 libraries around the world, as well as 250 publishers, contribute to CLOCKSS’s sustainability.

Public Knowledge Project Preservation Network (PKP PN)

  • The Public Knowledge Project is a multi-university initiative developing (free) open source software and conducting research to improve the quality and reach of scholarly publishing. PKP is based at Simon Fraser University in Canada, which is where the Open Journal Systems (OJS) software was originally developed.
  • The PKP Preservation Network is an additional capability that enables easy long-term preservation of journals using OJS version 2.4.8 or higher. PKP PN uses the LOCKSS preservation software.
  • There are currently 800 journals preserving their content in PKP PN.
  • There are no fees for participating in the PKP PN. A journal manager must agree to Terms of Use.

Conclusion

It is strongly recommended that scholarly journals and books be preserved for the long term in a preservation system. Content that is not preserved is at risk of being lost, and publishers who do not contribute their content to a preservation system are at risk of not being considered serious publishers. The value of long-term preservation is well worth a small cost.

Craig Van Dyck
Executive Director, CLOCKSS Archive
cvandyck@clockss.org

Applications: a note about Archiving and Preservation

One of the questions in our Application Form asks: ‘What digital archiving policy does the journal use?’ (Question 25). The words “archive” and “archiving” are used frequently in academic publishing and, more often than not, refer to very different things, so I want to add some clarity about what DOAJ is referring to with this question.

It is a sad truth that some online-only, open access journals have disappeared from the web without a trace, taking published articles with them. When those articles have no permanent article identifiers and have not been archived with an archival organisation, they are potentially lost forever.

Long-term deep archiving and digital preservation

Archiving and preservation play an important role for all journals, particularly if those archives are ‘dark’ archives that intend to preserve materials for a very long time. They may have the ability to start serving content when the normal content source stops working. They may apply formal methods of preserving content to ensure minimal or no digital deterioration.

The 3 deep archiving schemes that we list in Question 25 (LOCKSS, CLOCKSS, and Portico) are all recognised archiving agencies and are listed as such at The Keepers Registry (KR). More on the KR in a future post. LOCKSS (or Global LOCKSS Network) is essentially like a digital bookshelf where libraries have perpetual access to content to which they are entitled. CLOCKSS is a not-for-profit dark archive that preserves digital scholarly materials for the very long term, through a global and geopolitically distributed network of archive nodes. Portico is a company offering comprehensive archiving and preservation techniques.

We also recognise ‘PMC/Europe PMC/PMC Canada’ (PubMed Central) as a valid archiving option. They have a remit to preserve copies of research content that has been funded by public money. Unlike the previous 3 options, they convert the content they receive into their own format, archive copies and distribute copies to their own local repositories.

The final option in Question 25 is ‘a national library’ and we add this option because many (although not all) national libraries have a mandate to receive, via legal deposit, and preserve a copy of anything published in their countries. Although it doesn’t cover all countries, Wikipedia has a good list of such libraries.

What is not a deep archive

Let me also quickly cover what doesn’t count as a valid archiving option:

  • an online hosting platform (e.g. OJS)
  • a 3rd party aggregator (e.g. EBSCO) that you have licensed to reuse or distribute your content
  • a journal’s back issues or older articles made available on its own site (often, confusingly, referred to as the journal’s archives)
  • an institutional repository which often has author preprints and not the final article.

Hopefully this post has added some clarity to our archiving question but, as always, get in touch if you have any questions.