Today we have released the third in a series of enhancements to our article metadata.
If you are familiar with DOAJ metadata then you probably already know that you can download a CSV file which contains the journal-level metadata for all the journals in DOAJ. It is updated every 30 minutes and is probably one of the most popular metadata services we have. It is certainly the best way to get an overview of all the journals in DOAJ.
Today, we’ve released a new version of that file which adds two new columns to it: the number of article records added to a journal in DOAJ and the date that the last article was added. (Columns BF and BG respectively)
The columns were added for two reasons:
To give greater transparency to the information we display on our homepage, where we state that 77% of the journals in DOAJ have article content loaded. This figure is slightly misleading because a journal which uploaded only 5 articles to us in 2013 is counted in that 77%. It is more interesting to know how recent the articles from a particular title are and how much content has been uploaded to us. While this information has always been available by selecting a journal ISSN and using the ‘articles’ filter in Search, we’ve never been able to show it all in one place. I think that this development will be welcomed by all our users, especially publishers, librarians and those doing research on open access publishing developments.
To allow us to review those journals which have been awarded the DOAJ Seal and remove that Seal from those which are not supplying article metadata to us. Supplying article metadata is one of the 7 Seal criteria and we haven’t yet been able to check, in an efficient manner, which journals are sticking to their promise. [In the application form, we ask if journals “intend” to supply metadata to us.] It’s going to take deeper analysis to get the final figure but I can see very quickly that 25 journals are going to lose their Seal.
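Once you have downloaded the CSV, the two new columns make checks like the Seal review above straightforward to run locally. Here is a minimal sketch in Python; the column names used below are illustrative only (check the header row of the real file, columns BF and BG, for the exact labels):

```python
import csv
import io

# Illustrative sample in the shape of the journal CSV; the real file
# has many more columns and the two new ones sit at positions BF and BG.
SAMPLE = """Journal title,Number of Article Records,Most Recent Article Added
Journal A,120,2019-03-01
Journal B,0,
Journal C,5,2013-06-15
"""

def journals_without_recent_articles(csv_text, cutoff="2018-01-01"):
    """Return titles of journals with no articles, or none added since `cutoff`.

    ISO-format dates compare correctly as strings, so a plain string
    comparison against `cutoff` is sufficient here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    stale = []
    for row in reader:
        count = int(row["Number of Article Records"] or 0)
        last_added = row["Most Recent Article Added"]
        if count == 0 or not last_added or last_added < cutoff:
            stale.append(row["Journal title"])
    return stale

print(journals_without_recent_articles(SAMPLE))  # ['Journal B', 'Journal C']
```

The same pattern works for any cutoff date or minimum article count you care about.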
I’d love to know what you think about this development so, as usual, do leave a comment or a question, or email me directly: firstname.lastname@example.org.
P.S. The other development which I will post about soon is the removal of duplicated articles from the DOAJ database. More on that when I have it.
When we reject an application, the rejection email contains details about why the application was not successful and usually tells applicants that they must wait 6 months before submitting another application for the same journal to us. Why do we do this?
One reason is to discourage hasty repeat applications, of which we receive many. Repeat and duplicate applications clog up the system and take our dedicated volunteers and team away from applications which need time spent on them. (In 10 months alone, DOAJ received 221 duplicate applications!)
The other reason is that many of the recommendations we make in our rejection emails, made to help journals meet our criteria, take time to implement. Adding words to a website isn’t enough. Changes need to be implemented properly, communicated to stakeholders, tested, and managed. Some changes will require other parties to make changes too. This all takes time. After the 6 months have passed, we welcome a new application, but we ask that the journal website demonstrates very clearly that our recommendations have been put into practice. Our editorial team will check very carefully that all of them have been implemented.
“We’re going to a hackathon and would love to work with DOAJ’s data! Do you have a dump of all DOAJ data?”
“Er, no. Sorry. I mean, we could get it for you but it will take a while. You could probably get it yourself if you know how to extract the JSON… When do you need it by?”
We’ve had this conversation a few times, or had requests from eager individuals and organisations who want to use the rich offerings of the DOAJ metadata. They’ve told us of the wonderful things they could do with the data (slicing, reporting, analysing, apps, databases, software…) and we’ve never been able to help them in good time. But now, we can.
So what? Well, let me suggest to you why these are a good thing:
For the journal metadata, the CSV has so far been the only easy-to-use format; the journal data dump now provides another.
The data dumps are updated weekly, so can keep you up-to-date on a reasonably short delay. (There is no change feed, just a full dump.)
When you want all of the DOAJ data for any reason, you can just take it!
Deep paging on the search API is no longer permitted – search is for search, not harvesting. The data dump allows you to harvest.
Whenever you want a subset of the DOAJ data, you can download the data dump and filter it locally for your needs. For example, if you are a publisher and you want to see all of your metadata in DOAJ, it is all in this data dump, and you can filter it by ISSN.
You can use it to enhance any local data in your own system or database: you may have basic article metadata in your system, and you want to extend it with DOAJ metadata.
If you want to aggregate publications data from multiple sources, this is one way of quickly getting that information from DOAJ (versus using OAI-PMH).
These data dumps are richer in metadata than what OAI-PMH provides.
You may want to use the data for analysis or data mining or other forms of research, or hackathons.
The data dumps are also useful as a test dataset.
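The local-filtering use case above can be sketched in a few lines of Python. The record structure shown is illustrative only; inspect the real dump files for the actual field names:

```python
# A sketch of filtering a downloaded data dump locally by ISSN.
# Field names ("title", "issns") are assumptions for illustration,
# not the real DOAJ record schema.

def filter_by_issn(records, issns):
    """Keep only records whose ISSN list intersects `issns`."""
    wanted = set(issns)
    return [r for r in records if wanted & set(r.get("issns", []))]

# Pretend this list was loaded from the dump with json.load().
dump = [
    {"title": "Article one", "issns": ["1234-5678"]},
    {"title": "Article two", "issns": ["9999-0000"]},
]

mine = filter_by_issn(dump, ["1234-5678"])
print([r["title"] for r in mine])  # ['Article one']
```

Because the dump is a full snapshot rather than a change feed, you would simply re-download and re-filter each week to stay current.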
So there you have it. Despite a rather awkward name, data dumps are A Good Thing.
We’d love to know what you think, so do please leave a comment here or send us feedback: email@example.com
Dr Xenia van Edig, Business Development, answers our questions.
-Your organisation has been supporting DOAJ for a few years now. Why is it important for Copernicus Publications to support DOAJ?
As an information hub for all those interested in high-quality peer-reviewed open-access journals, the DOAJ is an extremely important platform. It is independent and committed to high-quality and peer-reviewed open access in all fields of STEM and HSS. With the re-vetting of all its content in 2016 and with the introduction of the DOAJ seal, its mission to increase the visibility, accessibility, reputation, usage, and impact of open-access journals has become even more evident. For us as an exclusively open-access publisher, it is therefore only logical that we support DOAJ.
-What benefits does being indexed in DOAJ bring to your journals?
Indexing in DOAJ increases the visibility of our journals and demonstrates that our journals adhere to best practices in open-access publishing. Furthermore, many libraries and institutions understandably only provide financial support for article processing charges (APCs) for journals which are indexed in DOAJ and therefore receive an external quality seal.
-Do you think that the DOAJ has been and/or still is important for the development of Open Access publishing?
-What is Copernicus doing to support that development? Do you have any exciting projects underway?
Copernicus Publications has been an open-access publisher since 2001. In the past 18 years, we have helped many learned societies and academic institutions launch new open-access journals or transform their existing journals into open-access journals. In addition, we have been promoting open access in the peer-review process since 2001 by implementing the Interactive Public Peer Review, which is now applied by 20 of the 42 journals we publish. The current rise of preprint servers and the formation of initiatives promoting open peer review prove that this peer review model is still innovative.
These past years have focussed on making content accessible. The next ongoing challenge is overcoming the barriers around APC payments. We recently launched a national licence in Germany, with many universities and research centres participating. Together with our partners in libraries and funding bodies, we strive towards a seamless open-access experience in which authors do not have to worry about APC payments.
-What are your personal views on the future of Open Access publishing?
I hope that further progress will be made in accelerating the transition towards a world where research outputs are publicly available and reusable. However, I fear that current major initiatives are focussing too much on the big legacy publishers – leaving out smaller publishers and those who are purely open access. While “read and publish” deals might be a step in transforming the publishing ecosystem, funders, consortia, and institutions should not forget about those who stood up for open access when the topic was not on “everyone’s lips”. Furthermore, even though many journals published by Copernicus are financed via article processing charges, APCs are not the only business model for open access.
-What do you think that the scholarly community could do to better support the continued development of the Open Access movement in the near future?
I think the current evaluation system for grants, tenure, etc., which still heavily relies on the journal impact factor, favours established journals and puts newer publication venues and innovative outlets at an unfair disadvantage. Of course there are many open-access journals with high impact factors, but there is a structural disadvantage since many open-access journals are newer.
In addition, faculty and students need to be more educated about open access. For many academics, their academic freedom to freely choose a journal for their articles seems to hinge on the fact that they do not want to deal with access and reuse rights. Many academics seem to think that everything is fine because they have access to the literature through the subscriptions of their institutions’ libraries. Furthermore, they do not have to deal with APCs when publishing in subscription journals. This means a lot of advocacy for open access is still needed.
-Much has been said recently about whether open access is succeeding or failing, particularly in terms of the original vision laid out by the Budapest Open Access Initiative in 2002. Do you think that open access has fallen short of this vision, or has it surpassed expectations?
Whether something is a good idea or not cannot be measured in the number of articles or successful journal transformations. I think that most people involved in the open-access movement had hoped for a quicker transition. However, even though it has been slower than envisioned, the vision of the BOAI – the public good of “the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it” – is still the goal to achieve. Around 17 years ago, open access was not on the political agenda like it is today (e.g. Plan S). Therefore, I would say the movement has been successful.
A recently published article comparing blacklists and whitelists draws the conclusion that “In the DOAJ, more criteria relate to transparency of business and publishing practices rather than to the quality of peer review. This indicates a risk of falsely endorsing the legitimacy of a journal based on its transparent nature, while at the same time ignoring journals’ lack of best practices in peer review”.
Perhaps we could have done a better job of explaining how DOAJ assesses the quality of journals. When the DOAJ list started in 2003, peer review was one of four criteria used in the evaluation.
After the upgrade in 2014, which expanded the evaluation to more than 40 criteria, it is certainly true that most of them pertain to the transparency of publishing and business practices.
At the same time, however, peer review has remained a key criterion for judging the quality of journals that apply for inclusion in the DOAJ index. So the quality of peer review weighs heavily in the assessment of journals.
In contrast to what the authors of the article state about peer review procedures, DOAJ requires peer review by at least two independent reviewers. From page 13: “Both blacklists and whitelists include criteria stating that a journal needs to have a ‘rigorous’ peer review system in place (see list of criteria in supplementary file 2). Both whitelists do not define ‘rigorous’; however, Cabell’s whitelist implies that peer review should be anonymous and conducted by at least two reviewers.”
As stated in the article, peer review is one of the intermediate verifiable criteria. That means that when a publisher states on its website that it has, for instance, double-blind peer review, our editors usually check this by verifying an accompanying description of the peer review process. However, in case of any doubt concerning a journal’s quality, a special editorial team will do a more detailed analysis of quality criteria, including peer review practices, editorial board competence, content comparison of published articles, plagiarism checking and other factors. It is safe to say that our users will have a hard time finding journals in DOAJ with no, or inadequate, peer review procedures in place.
Because peer review has until now been the holy grail of scientific quality control, it is understandable that people link the quality of, or even the mere presence of, peer review with the quality of a given journal. Unfortunately, the relationship is not as clear-cut as many want to believe. Peer review by good connections, friends of friends, or even colleagues is often seen. In addition, independent peer review panels of experts can come to very different conclusions about one and the same scholarly work. Because of this, the entire peer review procedure is in a state of rapid change. Indexing services like DOAJ have to be aware of the shortcomings of the current system and therefore avoid overrating peer review as THE criterion for assessing quality. We think that good publishing practices other than peer review, and good-quality editorial boards, are at least as important and easier to verify than the details of peer review practices.
I want to end with a short word on blacklists. We note that blacklists depend in large part on criteria that are difficult to verify and on subjective judgment, while DOAJ depends largely (77%) on easily verifiable criteria related to transparency and business practices. Blacklists also tend to leave a lasting sting on the reputation of journals. More often than not, the procedures for removing a journal from a blacklist after it has improved are inadequate and non-transparent. For this reason, blacklists are often inaccurate and out of date. This risk is even more prominent for one of the lists in the PeerJ study, Beall’s list, which has officially ceased to exist but has been resurrected by some people with very unclear policies regarding updates, inclusion and removal of journals from the revived list. In addition, blacklists can never be comprehensive, while whitelists are inclusive (i.e. most journals in a whitelist will be of good quality, while many blacklisted journals will not be predatory at all).
It will not come as a surprise that we strongly recommend that users consult whitelists, not blacklists, to check the quality of journals. Let it also be clear that we do not believe the two list types are complementary. There is another important difference between the (DOAJ) whitelist and blacklists: in contrast to blacklists, DOAJ is not in the business of stigmatising publishers; rather, we spend substantial resources helping journals to improve.
At DOAJ, we work hard to maintain a high level of recency and accuracy in our metadata. All of our metadata is freely available, in various formats, to those who want it. This means that any errors in it get distributed freely around the web. To reduce these and negate the knock-on effect, DOAJ works with its technical partners, Cottage Labs, to clean the metadata.
On 21st February, we will be releasing two small but fairly important enhancements to our article upload function. The two changes are as follows:
Spaces will be stripped from DOIs and full text URLs upon ingest. This is to improve matching in our database on DOIs and URLs. We use DOIs and full text URLs to version articles, thereby allowing corrections or enhancements to article metadata to be uploaded without the existing version being deleted first.
We regularly receive metadata with badly formatted URLs or DOIs: preceding spaces, trailing spaces or spaces right in the middle of a DOI or URL. When this happens, matching doesn’t occur, we end up with multiple versions of the same article in the database, and the number of duplicates increases.
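The stripping behaviour described above amounts to removing all whitespace from an identifier before matching. A minimal sketch of that normalisation (the function name is ours, not DOAJ’s actual implementation):

```python
def normalise_identifier(value):
    """Strip all whitespace from a DOI or full-text URL before matching.

    str.split() with no arguments splits on any run of whitespace
    (spaces, tabs, newlines), so joining the pieces removes leading,
    trailing, and embedded spaces in one pass.
    """
    return "".join(value.split())

print(normalise_identifier("  10.1234/ abc.5678 "))  # 10.1234/abc.5678
print(normalise_identifier("https://example.org/ article/1"))  # https://example.org/article/1
```

With identifiers normalised this way, two uploads of the same article will match and version correctly instead of creating a duplicate record.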
Duplicates in the same file will be prevented upon upload. To upload article metadata to us, a file of article metadata must be sent to our article ingester. We will introduce an enhancement here which will prevent a file from uploading if duplicates within it are detected. We carry out no such checks at the moment.
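The within-file check can be pictured as a single pass over the batch, flagging any DOI or full-text URL seen more than once. This is an illustrative sketch only; the real ingester may match on other fields too:

```python
def find_duplicates(articles):
    """Return DOIs or full-text URLs that appear more than once in a batch.

    The field names "doi" and "fulltext_url" are assumptions for
    illustration, not the actual upload schema.
    """
    seen, dupes = set(), set()
    for article in articles:
        for key in ("doi", "fulltext_url"):
            ident = article.get(key)
            if ident:
                if ident in seen:
                    dupes.add(ident)
                seen.add(ident)
    return dupes

batch = [
    {"doi": "10.1234/a1", "fulltext_url": "https://example.org/a1"},
    {"doi": "10.1234/a2", "fulltext_url": "https://example.org/a2"},
    {"doi": "10.1234/a1", "fulltext_url": "https://example.org/a1-copy"},
]
print(find_duplicates(batch))  # {'10.1234/a1'}
```

An upload would be rejected whenever this returns a non-empty set, so publishers can run the same kind of check on their own files before submitting.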
Both enhancements will be implemented in all 3 article ingest front ends: the XML uploader, the manual article uploader, and the API.
These changes are two small steps toward a larger project of eradicating all duplicated content in the database. We don’t know yet how much article content is duplicated, but it will be enough to cause a noticeable reduction in the current number of articles in DOAJ (3,767,076 at the time of writing).
If you have any questions about these enhancements or are wondering if you have duplicates in your own article metadata, do please leave a comment here.