Frédéric Villiéras, vice-provost for research at Université de Lorraine, says: “We are delighted to strengthen our support for DOAJ. This choice is in line with other financial support for open platforms decided earlier this year. We are fully committed to supporting open science infrastructures such as the DOAJ and we hope that other French research institutions and libraries will follow.”
You can’t run a service like DOAJ on a shoestring, or fund it on a hand-to-mouth basis, which is how DOAJ was funded until last year, so large amounts of sustained funding are crucial to the database’s security, stability, scalability and, ultimately, survival. Last month, DOAJ came under attack, and the whole process, from diagnosing the problem to resolving it, revealed just how fragile the DOAJ ecosystem can be under those circumstances.
Below is a guest blog post by Steven Eardley, the DevOps Engineer at our technology partner, Cottage Labs. You may remember that the SCOSS initiative was launched last year to provide DOAJ and SHERPA/RoMEO with levels of sustainable funding that would ensure the survival of two services which had been earmarked as vital to the open access movement. I asked Steven to write this post because, after DOAJ took a battering, I thought it would be useful for our community to understand exactly what happens when DOAJ becomes unresponsive, what it takes to keep DOAJ up and running, and what a difference sustainable funding can make for a service like DOAJ.
Thanks for reading!
The Directory of Open Access Journals experienced sustained and repeated downtime events from mid-July to mid-August, which required significant intervention from Cottage Labs. This post describes what happened, explains how we mitigated the issue, and illustrates some of the inherent vulnerabilities which need to be addressed.
As well as the search interface on the website, searches to our database come from the OAI-PMH interface, our Atom feed, and ‘widgets’ embedded on users’ sites. The DOAJ also has an API for programmatic interactions with our content, allowing, for example, publishers to upload their content. To keep the ElasticSearch index performing correctly for the rest of the site, we tend to tweak the rate limit for the API during periods of high load.
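For readers curious what that rate limiting amounts to, here is a minimal sketch of a per-client request cap in a Flask application (DOAJ is built on Flask). The limits, window and `/api/` path prefix are illustrative assumptions, not our production configuration.

```python
# Minimal sketch of per-client rate limiting for API routes.
# RATE_LIMIT, WINDOW_SECONDS and the "/api/" prefix are illustrative only.
import time
from collections import defaultdict, deque
from flask import Flask, request, jsonify

app = Flask(__name__)

RATE_LIMIT = 60          # max requests per client...
WINDOW_SECONDS = 60      # ...per rolling window
_request_log = defaultdict(deque)

@app.before_request
def throttle_api():
    if not request.path.startswith("/api/"):
        return None                      # only throttle API routes
    now = time.time()
    history = _request_log[request.remote_addr]
    # Drop timestamps that have fallen outside the window
    while history and now - history[0] > WINDOW_SECONDS:
        history.popleft()
    if len(history) >= RATE_LIMIT:
        return jsonify(error="rate limit exceeded, please slow down"), 429
    history.append(now)
    return None
```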
During the period mentioned, we saw an elevated request rate via the query endpoint, which resulted in all of the site’s functions slowing down. Eventually the index was overloaded and the DOAJ website stopped responding. Recovery tended to be fairly quick and site performance would be restored for a number of hours. However, the downtime was persistent and recurring, sometimes multiple times in a day.
While we could see that the quantity of queries to our index had at least doubled from its normal rate, we couldn’t identify the source of the queries causing the trouble and couldn’t look for long-term trends. We will find the resources to improve this in the future, but monitoring of this kind is costly and is currently out of budget. Therefore, we mainly had to diagnose via inference. Our immediate action was to disable the application components one by one, endeavouring to pinpoint the source of the excess load. We turned off the API, the Atom feed, the back-end editorial admin pages, and finally the OAI-PMH interface. The latter had a small impact on index stability – we saw reduced memory use in the index and it was coping somewhat better with the load. This pointed us towards the source of the problem: deep paging.
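In practice, switching components off one at a time is just a matter of flipping configuration flags and watching the index. A rough sketch, with hypothetical flag names that are not DOAJ’s real configuration keys:

```python
# Hypothetical feature flags for switching components off under load.
# Flag names are illustrative, not DOAJ's actual configuration keys.
FEATURES = {
    "api": False,         # disabled first
    "atom_feed": False,
    "admin_pages": False,
    "oai_pmh": False,     # disabling this one reduced index memory use
}

def feature_enabled(name: str) -> bool:
    """Return True if the named component should serve requests."""
    return FEATURES.get(name, True)
```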
Deep Paging – ElasticSearch’s kryptonite
Deep paging essentially means that someone or something is scrolling through a large number of objects sequentially, leading to high memory commitment; the server must hold the entire context in memory to get deeper and deeper into the results. We determined that someone was sending thousands of requests per hour directly via the query endpoint, instead of through the API, and was essentially trawling through all results over a long period of time. That is to say: they weren’t using a provided feature, but rather bypassing the features and using a hidden route.
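To make the cost concrete, this is roughly what a deep-paging trawl looks like against an ElasticSearch `_search` endpoint: for every page the cluster must collect and sort `from + size` documents before returning the final slice, so memory and CPU cost climb the deeper the trawl goes. The host and index name below are placeholders, not DOAJ’s real addresses.

```python
# Illustration of deep paging against an ElasticSearch _search endpoint.
# Host, index name and page count are placeholders.
import requests

ES = "http://localhost:9200/journals/_search"   # placeholder
PAGE_SIZE = 100

for page in range(1000):
    body = {
        "query": {"match_all": {}},
        "from": page * PAGE_SIZE,   # grows without bound as the trawl deepens
        "size": PAGE_SIZE,
    }
    # ElasticSearch must gather and sort from+size hits for every request,
    # which is why sustained deep paging can starve the whole index.
    resp = requests.post(ES, json=body).json()
    if not resp["hits"]["hits"]:
        break
```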
Each time the system went down, the traffic took a while to resume, presumably because the trawl had to be started again from the beginning.
Our next task was therefore to block these external sources of deep paging. First, we changed the permitted referral parameter (e.g. `ref=ui` in the request URL) and the traffic dropped off. Success? Not quite. These parameters are easy to spoof, so it was no surprise that, not long after, the site went down once again amid further high traffic. We tried the same configuration change again, this time changing the referral parameter to `please-use-our-api` – the ‘begging’ approach to reducing system load! Unfortunately that bought us even less time than before.
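The check itself amounts to little more than comparing a request parameter against an expected value, which is exactly why it is so easy to spoof: the client controls everything it sends. A hedged sketch (the parameter name follows the `ref=ui` example above; the rest is illustrative):

```python
# Sketch of the referral-parameter check. Trivially spoofed, because the
# client chooses the value it sends.
from flask import request, abort

EXPECTED_REF = "please-use-our-api"   # the 'begging' value mentioned above

def check_referral():
    if request.args.get("ref") != EXPECTED_REF:
        abort(403)   # anyone who inspects the search page can copy the value
```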
At this point we categorised this as something like a denial-of-service attack – although the chances are it wasn’t intentionally malicious, circumventing our countermeasures and being a direct source of instability for thousands of other users is at least inconsiderate!
User agent blocking was the next measure we attempted in order to reduce the load. Since the query route is really just for our own use, we decided it was reasonable to block some programmatic user agents, such as the Node.js http module, or the wget and curl utilities. Again we saw a temporary alleviation of the high load.
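A minimal sketch of what that filtering might look like, assuming a Flask before-request hook; the blocked substrings and the `/query/` path prefix are illustrative assumptions:

```python
# Sketch of blocking obviously programmatic user agents on the query route.
# The substrings and path prefix are examples; clients can change their UA.
from flask import request, abort

BLOCKED_AGENTS = ("curl", "wget", "node", "python-requests")

def block_programmatic_agents():
    if not request.path.startswith("/query/"):
        return
    ua = (request.headers.get("User-Agent") or "").lower()
    if any(token in ua for token in BLOCKED_AGENTS):
        abort(403)
```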
Of course, user agent strings are sent by the client with each request, so they can be altered at source. By now we’d identified the user’s IP address, and watched a few intermittent requests come in as the user iterated and figured out what we’d changed. Within a couple of hours, the torrent of requests resumed and our infrastructure once again creaked under the strain.
Since we’re not keen on IP banning and other heavy-handed measures, we decided to address the problem within our application code and give the query endpoint some rules to filter out the sorts of traffic that give the index trouble: we now cap results to a reasonable number expected to be viewed from the search page; we enforce paging limits to disallow queries that page a long way into our data set; and the query endpoint will reject a request that doesn’t look like it came from a user.
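In outline, those guards look something like the sketch below; the exact limits and the browser heuristic are illustrative, not the values now running in production.

```python
# Sketch of the query-endpoint guards: cap result size, cap paging depth,
# and reject requests that don't look like they came from the search UI.
# Limits are illustrative, not DOAJ's production values.
from flask import request, abort

MAX_SIZE = 100       # no more results per page than the search UI ever asks for
MAX_FROM = 10_000    # refuse queries that page deep into the data set

def validate_query(es_query: dict):
    if es_query.get("size", 10) > MAX_SIZE:
        abort(400, "page size too large")
    if es_query.get("from", 0) > MAX_FROM:
        abort(400, "paging too deep; please use the API or the data dump")
    ua = (request.headers.get("User-Agent") or "").lower()
    if not any(b in ua for b in ("mozilla", "chrome", "safari", "firefox")):
        abort(403, "this endpoint serves the search interface only")
```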
We were a little complacent about our query endpoint’s obscurity. For a long time we’d planned to tighten up the rules on the search interface but, with DOAJ’s long list of other developments, it wasn’t a priority. Our endpoint was open and able to function like a little bonus undocumented API, and this was stretched to the limit when it was used to harvest all of our data rather than just facilitate search functions. This is a lesson for the design stage of the software process: keep the roles of system components as distinct and single-purpose as feasible.
In addition, this persistent grab of our data constituted a feature request – it should be easier to download the entire DOAJ dataset, and we will be implementing that over the coming weeks.
We have a handful of improvements coming soon in response to these lessons:
- We’ll be re-writing our OAI-PMH interface to mitigate deep paging and high memory use.
- The Search API will also see some more restrictions to paging depth and number of results – this will more closely reflect its role as a discovery interface and not a harvest endpoint.
- We’ll create a dump of our entire dataset on a regular basis, so that the entirety of the DOAJ’s data is more easily accessible to the public without straining our infrastructure (a rough sketch of this follows below).
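As a minimal sketch of what that periodic dump could look like, the ElasticSearch scroll API streams the whole index in batches instead of deep paging. The host, index name and output path are placeholders, not the final implementation.

```python
# Sketch of a periodic full-data dump using ElasticSearch's scroll API,
# which streams the index in batches rather than deep paging.
# Host, index name and output path are placeholders.
import json
import requests

ES = "http://localhost:9200"     # placeholder
INDEX = "journals"               # placeholder

def dump_index(outfile="doaj_dump.jsonl"):
    resp = requests.post(
        f"{ES}/{INDEX}/_search",
        params={"scroll": "2m"},
        json={"size": 1000, "query": {"match_all": {}}},
    ).json()
    with open(outfile, "w") as out:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                out.write(json.dumps(hit["_source"]) + "\n")
            # Fetch the next batch using the scroll cursor
            resp = requests.post(
                f"{ES}/_search/scroll",
                json={"scroll": "2m", "scroll_id": resp["_scroll_id"]},
            ).json()
```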
Watch out here and on our social media channels for further announcements regarding these new developments.
Thanks to dozens of quick-acting universities and institutions in Australia, Europe and North America, a new effort to secure Open Science infrastructure is off to a strong start. More than 680,000 Euros have already been pledged to support DOAJ and SHERPA/RoMEO.
“This being a new concept, we are very encouraged by the response of the community at this point. We’re taking this as an early indication that we will, in time, reach our full three-year funding goals for both the DOAJ and SHERPA/RoMEO, two truly vital services. But for this to happen, we will need to continue to see growth in support; far more institutions committing to funding.”
Lars Bjørnshauge, Managing Director and Founder of DOAJ, said: “We are very pleased to see that many of the long-standing members of DOAJ have decided to increase their financial support, based on the fees recommended by SCOSS, for the next three years. We are looking forward to welcoming even more members and support shortly. We will do our very best to live up to the ever-changing expectations from the community.”
And “the ever-changing expectations from the community” are, in a nutshell, why SCOSS and sustainable funding models are so important to DOAJ, SHERPA/RoMEO and open access in general. Open access is still a relatively young publishing model and is growing rapidly. New markets are opening up to open access publishing, each bringing new challenges, and technology is creating new opportunities and functionality in publishing. DOAJ must remain at the forefront of these developments, and that means having a stable financial foundation upon which work can continue.
If you’d like to know more about SCOSS please go to http://scoss.org/ and if you would like to make a financial contribution using the SCOSS model, or indeed, any amount at all, please contact Lars: email@example.com.
Lotte Faurbæk and Hanne-Louise Kirkegaard from the Danish Agency for Science and Higher Education (Styrelsen for Forskning og Uddannelse) answer our questions.
– Your organisation has been supporting DOAJ for some years now. Why is it important for the Danish Agency for Science and Higher Education to support DOAJ?
We regard DOAJ as an authoritative data source on Open Access journals. We use DOAJ in the Danish Research Indicator to verify the data quality of the journals in our database, which consists of over 300,000 journals (both Open Access and toll access). Additionally, whenever we get a suggestion to accept a new journal onto our list of publication channels that should generate points in the indicator, we check its status in DOAJ to make sure it lives up to the criteria for acceptance. DOAJ is also an important part of the project called “Nordic lists”, a project supported by NordForsk in which the Nordic countries with research indicators collaborate to enhance the data quality of their national lists of publication channels.
– What is the Danish Agency for Science and Higher Education doing to support that development? Do you have any exciting projects underway?
In 2014, the Danish Ministry of Higher Education and Science adopted a national strategy for Open Access to research articles from publicly funded institutions. The strategy has an ambitious goal, stating that by 2022, 100% of the articles should be freely available via the Internet. However, the Danish Open Access Indicator showed that only 36 percent of scientific publications produced at Danish universities were Open Access in April 2018. So we are far from reaching the ambitious target, and a revision of the strategy – including scaling down the Open Access targets – is under way.
– What are your personal views on the future of Open Access publishing?
I think it is an irreversible trend, though the transition towards 100% Open Access will happen at a slower pace than aimed for in the EU. In the EU Council Conclusions on Open Science, the OA target is 100% OA in 2020. Full OA will probably not happen at the speed desired for a number of reasons, among them: different Open Access approaches in EU member states and third countries – green, hybrid, gold, etc.; the current lack of merit attached to OA compliance; and the reluctance among publishers towards green Open Access, including big publishers imposing extraordinarily long embargoes on scientific articles – 24 months or more.
– What do you think that the scholarly community could do to better support the continued development of the Open Access movement in the near future?
- The rationale behind Open Science/Open Access must be communicated better to the public, and OA should be a political priority – both at national and institutional level.
- National and institutional policies on Open Access must be adopted, implemented, monitored and enforced.
- A change of culture among researchers towards openness is needed and could be supported by a change in the current merit system.
- Research funders must mandate and monitor OA.
- Universities must unite and collectively negotiate economically sustainable subscription deals – including OA – with the publishers (bargaining power).
– Much has been said recently about whether open access is succeeding or failing, particularly in terms of the original vision laid out by the Budapest Open Access Initiative in 2002. Do you think that open access has fallen short of this vision, or has it surpassed expectations?
I think we are under way, but not as fast as one could hope. More needs to be done, as we said in our answer to the previous question.