"A rethinking in the minds of researchers is necessary"

Guest comments: Scientists provide insights into the "reproducibility of research results"

The reproducibility of research results is one of the fundamental quality criteria in science. The demand for transparency in the scientific knowledge process aims to ensure that scientific studies and experiments can be repeated. The project "Opening Reproducible Research" (o2r) of the Institute for Geoinformatics of the University of Münster and the University and State Library of Münster deals with this topic. In February, the project team organized a workshop on the reproducibility of analyses based on software and data. At the workshop, the participants discussed the challenges of reproducibility in digital research as well as the current state of the art and existing infrastructures. In four guest comments, scientists describe how the reproducibility of research results is handled in their fields.

Staying power for more reproducibility in spatial sciences
By Daniel Nüst, Institute for Geoinformatics

Daniel Nüst (© Sergey Mukhametov)

In the project o2r (https://o2r.info), we develop new methods that help scientists achieve a higher reproducibility of their work. Researchers can then publish more than an unchangeable document with limited interaction in a printed journal: they can share a holistic contribution comprising all data, the software, whether used consciously or hidden in the background, and the documentation, for example the actual article. Besides technical solutions, a rethinking in the minds of researchers is necessary, both in how they evaluate the scientific products of others and in their own daily work habits. Achieving this cultural change is a major challenge for many scientific fields, because it requires endurance, effective communication and persuasion. The work of Dr. Markus Konkol, who investigated the status of reproducibility in the geosciences, is an important building block for this persuasion. He identified articles in which data and software were described, but the procedure and analyses were not sufficiently explained. As a result, the analyses of many of the papers could not be repeated, and the conclusions remain partly unconfirmed.

In addition to this assessment of the state of reproducibility in the geosciences, I was able, together with international colleagues, to initiate a cultural change in the geoinformatics community. At the international conference series AGILE, organized by the "Association of Geographic Information Laboratories in Europe", new guidelines for reproducible papers are recommended for the first time this year (https://doi.org/10.17605/OSF.IO/CB7Z8). The workshops not only helped geoinformaticians to improve their skills in handling data and software, but also provided the basis for developing practical and effective guidelines for scientists working with spatio-temporal data. We are very satisfied with our efforts so far. The approach taken provides a blueprint for other scientific disciplines: the guidelines comprise several stages and are deliberately designed to highlight commendable efforts without excluding any methodology. In this way, established working procedures can be developed further towards greater transparency and reproducibility, and leading scientists can be rewarded with attention. All materials for reproducible articles are available under open licences, and their use will hopefully increase the quality and reusability of scientific work.

When is research data safe?
By Dr. Nils Schuhmacher, research group Developmental Psychology

Dr. Nils Schuhmacher (© private)

It is important for researchers that their research data is secure. This includes protecting it from unauthorized access. Data is particularly sensitive and worthy of protection if, for example, study participants can be recognized in videos or if they provide very personal information. This applies, among other things, to the field of infant research in which I work. However, researchers are often uncertain about the question: Is my research data stored securely enough? This feeling of uncertainty can take on questionable forms. In one case, research data was stored exclusively on hard drives in massive vaults. In another case, an employee who was the only person with access to the research data of a project died. This put the results of several years of research work at risk.

In many studies, however, the research data is collected anonymously or is subsequently anonymised, i.e. the data is altered in such a way that it is no longer possible to identify participants. This makes the data less "sensitive". But here, too, there are currently uncertainties, and many researchers ask themselves: When is my data sufficiently anonymised? Because of this uncertainty, researchers often prefer to keep their anonymised research data to themselves rather than share it with others. This behaviour conflicts with current standards for good research, such as transparency and reusability of data, which can save future research projects a lot of time and money. My wish, therefore, would be a technical system that supports me in the research process and helps me with the automatic storage and anonymisation of sensitive data. This could be a useful platform for many different disciplines and give researchers at the University of Münster a pioneering role.

From a Crisis of Confidence to Open Science
By Dr. Lisanne Pauw, research group Couple & Family Psychology

Dr. Lisanne Pauw (© private)

Whether triggered by the academic fraud case of Diederik Stapel in 2011, by Daryl Bem’s article on extrasensory perception, or by a problem that had perhaps lain dormant all along, the field of psychology has faced a crisis of confidence for almost a decade. This crisis made the field realize that many of its research practices were problematic. For example, experiments often relied on too few participants to draw reliable conclusions, researchers reported unexpected findings as if they had been predicted from the beginning, and journals fell prey to a publication bias favouring papers that find support for their hypotheses. But even with good intentions, scientists often have to make decisions (e.g., about how to analyse their data) that can fundamentally affect their results.

While the crisis itself is problematic, the upside of this increased awareness is that it has encouraged the whole field to develop and implement methods to tackle these problems. In the past decade, many attempts have been made to enhance transparency, reproducibility and replicability. The open sharing of data, code and analysis plans is strongly encouraged by universities and journals. Many platforms have been established that allow people to preregister their studies and analysis plans (e.g., Open Science Framework, https://osf.io/). Journals try to fight publication bias with “registered reports”, a new article format in which the research proposal is peer-reviewed before the study is conducted, emphasizing research quality over outcomes. Replication studies are encouraged by new research grants and dedicated sections in journals.

Having recently started working at the University of Münster, I was happy to learn that the Psychology department has also founded an Open Science Initiative (https://osf.io/x3s5c/). I did my PhD at the University of Amsterdam, where the Psychology department actively promotes open science among employees and teaches students about good (and bad) research practices, and I look forward to bringing this experience with me to foster open, transparent and reliable research here at the University of Münster.

Software development to facilitate reproducible research
By Dr. Ben Stöver, research group Evolution and Biodiversity of Plants (Prof. Dr. Kai Müller)

Dr. Ben Stöver (© private)

Annotating scientific data with metadata that documents how raw data was generated and which analysis steps led to derived data offers an efficient way to improve the reproducibility of scientific studies. Phylogenetic trees (representing evolutionary relationships, e.g., between species) are one example: they are often inferred from DNA sequences of different species, which in turn were sequenced from tissue samples taken from specimens (e.g., collected plants). Ideally, a published phylogenetic tree would contain metadata that links each species to a specific archived specimen and the sequence generated from it, as well as metadata documenting the analysis steps and the software used to reconstruct the tree. This principle of annotation can also be applied to other data types and can therefore be used in many areas of science.
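The annotation principle described above can be illustrated with a minimal Python sketch. It is not the data model of any particular file format or library: the class names, field names and all identifiers (such as "HERBARIUM-0001" or "example-inference-tool") are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Leaf:
    """One tip of the tree, annotated with the provenance of its data."""
    species: str
    specimen_id: str   # hypothetical identifier of the archived specimen
    sequence_id: str   # hypothetical identifier of the sequence derived from it


@dataclass
class AnnotatedTree:
    """A tree topology plus metadata describing how it was reconstructed."""
    topology: str                     # tree structure in Newick notation
    leaves: List[Leaf]
    software: str                     # program used to reconstruct the tree
    analysis_steps: List[str] = field(default_factory=list)

    def provenance(self, species: str) -> Optional[Dict[str, str]]:
        """Return the specimen and sequence identifiers for a species, if any."""
        for leaf in self.leaves:
            if leaf.species == species:
                return {"specimen": leaf.specimen_id, "sequence": leaf.sequence_id}
        return None


# All values below are invented placeholders.
tree = AnnotatedTree(
    topology="(Species_A,Species_B);",
    leaves=[
        Leaf("Species_A", "HERBARIUM-0001", "SEQ-0001"),
        Leaf("Species_B", "HERBARIUM-0002", "SEQ-0002"),
    ],
    software="example-inference-tool 1.0",
    analysis_steps=["align sequences", "infer tree"],
)
```

With such a structure, a reader of the published tree could call `tree.provenance("Species_A")` to trace a tip back to the specimen and sequence it is based on, and inspect `tree.analysis_steps` to see how the tree was derived.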

Although file formats (e.g., NeXML) that enable appropriate annotation of trees and other phylogenetic data were developed years ago, they are still used relatively little compared to older formats that do not allow this. To change this, we in the research group Evolution and Biodiversity of Plants (Prof. Dr. Kai Müller) are developing a number of software components that make it as easy as possible for scientists to use the new file formats and the necessary annotation. At the same time, interoperability with existing analysis software is ensured, even if that software does not yet support the corresponding formats itself. Specifically, we develop graphical editors that allow biologists to easily process and annotate the main data types of phylogenetics, as well as software libraries for easy reuse in other bioinformatics software. Our software is freely available at http://bioinfweb.info/.
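One way to picture such interoperability is sketched below in Python. This is an assumption made for illustration, not a description of how the software at bioinfweb.info works: an annotated tree is exported as a plain Newick string, which older tools can read, while the annotations are kept in a separate mapping so they are not lost. The nested-list tree representation and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed representation: a leaf is a (name, metadata) tuple,
# an inner node is a list of child nodes.


def to_newick(node) -> str:
    """Serialize the topology only; Newick has no place for the annotations."""
    if isinstance(node, tuple):
        name, _metadata = node
        return name
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def collect_metadata(node, out: Dict[str, Dict[str, str]]) -> None:
    """Gather the per-leaf annotations that the Newick string cannot carry."""
    if isinstance(node, tuple):
        name, metadata = node
        out[name] = metadata
    else:
        for child in node:
            collect_metadata(child, out)


def export_for_legacy_tool(root) -> Tuple[str, Dict[str, Dict[str, str]]]:
    """Return a Newick string plus a separate mapping holding the annotations."""
    sidecar: Dict[str, Dict[str, str]] = {}
    collect_metadata(root, sidecar)
    return to_newick(root) + ";", sidecar


# Invented example data.
root = [
    ("A", {"specimen": "SPEC-1"}),
    [("B", {"specimen": "SPEC-2"}), ("C", {"specimen": "SPEC-3"})],
]
newick, sidecar = export_for_legacy_tool(root)
```

Here `newick` holds only the topology, while `sidecar` preserves the link from each leaf name to its annotations, so the metadata can be reattached after the legacy tool has finished.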