Open data in the institutions' body of publications are identified semi-automatically using an algorithm developed by QUEST (ODDPub Tool, OSF project page). Publications flagged as relevant are then reviewed manually. The process is described in detail in a research article (Iarkaeva et al., 2024) and in a protocol. The publications assessed for the open data LOM/IOM 2024 are confirmed publications by lead and last authors who have LOM eligibility (Charité) or IOM eligibility (BIH) (cut-off date June 30th, 2023). In addition, the full version of a publication must be available via institutional access. The criteria for qualifying as open data listed below are valid for the LOM/IOM allocation in 2024 (publication period 2020–2022). They are described in detail, with rationales and examples, in Bobrov et al. (2024). However, please use the criteria below for reference, as they are the most recent.
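As a rough illustration of the two-stage process described above (automated screening followed by manual review), the selection of publications could be sketched as follows. This is a minimal sketch, not the actual QUEST workflow: it does not call ODDPub, and the `Publication` class, its field names, and the order in which the filters are applied are assumptions made for this example.

```python
from dataclasses import dataclass

# Publication period stated for the 2024 allocation: 2020-2022 inclusive.
PUBLICATION_YEARS = range(2020, 2023)


@dataclass
class Publication:
    # All field names are hypothetical and chosen for illustration only.
    title: str
    year: int
    author_eligible: bool       # lead or last author with LOM/IOM eligibility
    full_text_accessible: bool  # full version available via institutional access
    flagged_by_screening: bool  # outcome of the automated screening step


def in_scope(pub: Publication) -> bool:
    """Return True if the publication belongs to the assessed body of publications."""
    return (
        pub.year in PUBLICATION_YEARS
        and pub.author_eligible
        and pub.full_text_accessible
    )


def manual_review_queue(pubs: list[Publication]) -> list[Publication]:
    """Keep only in-scope publications that the automated screening flagged."""
    return [p for p in pubs if in_scope(p) and p.flagged_by_screening]
```

Only the publications returned by `manual_review_queue` would then be checked by hand against the criteria; a publication the screening misses never reaches that stage.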
The criteria for the open data incentive as of 2024 are as follows:
Research data have been made either freely available or available with restrictions by researchers of the Charité/BIH, and meet the following requirements:
1. For both open and restricted-access datasets:
- The shared data are the basis for results presented in an article publication; thus, stand-alone datasets which are not the basis for results published in peer-reviewed journals are not considered.
- The publication contains an explicit reference to the dataset(s); a reference to supplementary materials without stating that data is available is thus not sufficient; we recommend referring to the specific data available rather than using generic statements such as "all data are available in the article and its supplements"; it is also not sufficient to reference a database without naming the specific dataset, accession code, or exact search settings.
- The data can be raw, primary, or secondary data (e.g. from analyses of freely available datasets, meta-analyses, or health technology assessments); the data would thus allow the analytical replication (retracing of analysis steps) for at least a part of the study’s results; reporting of statistical values (means, standard deviations, p-values etc.) is not sufficient.
- Data can be found independently of the publication; thus, supplementary materials are only permissible if they are stored in a repository (archive) and can also be found via this repository.
2. Additionally for open datasets only:
- Data are stored in an external repository (or archive, database, registry).
- Data have been shared in a machine-readable format; for tables e.g. CSV, Excel or Word files, but not PDFs or image formats.
- The data are actually available and can be accessed at the time of checking (for data under embargo, the embargo must expire no later than July 31st), without overriding browser safety warnings.
- The data have been shared persistently, i.e., there is either a persistent identifier or an accession code; thus, datasets available only under URLs, e.g. on GitHub, are not eligible; if the repository allows the creation of persistent identifiers (e.g., DOIs), as is the case for the Open Science Framework (OSF), creating a persistent identifier is recommended and will become a requirement for the IOM from 2025.
3. Additionally for restricted-access datasets only:
- A standardized access route is named, i.e., the access requirements, the procedure for a request, and the responsible persons or offices are described.
- Access is possible for all academic researchers, at least those from the European Economic Area.
- Co-authorship of articles is not a condition for the provision of the data.
- Access to the data is free of charge or requires at most the reimbursement of expenses.
- The data were deposited by academic researchers, not by pharma sponsors or other commercial entities (e.g. consumer genetics companies); deposit by a pharma company is assumed if the study is pharma-sponsored and the repository is predominantly used by pharma companies (e.g. Vivli).
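The requirements above can be read as a checklist with one shared part and one part per access type. The following is a minimal sketch of that structure, not the procedure used in the actual assessment, which relies on manual review. The `Dataset` class, its field names, the criterion labels, and the list of file suffixes treated as not machine-readable are all assumptions made for this example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Illustrative, non-exhaustive set: the criteria exclude PDFs and image
# formats; the concrete suffixes listed here are an assumption.
NOT_MACHINE_READABLE = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


@dataclass
class Dataset:
    access: str  # "open" or "restricted"
    # 1. Both access types
    underlies_published_results: bool = False
    explicitly_referenced: bool = False
    allows_analytical_replication: bool = False
    findable_independently: bool = False
    # 2. Open datasets only
    in_external_repository: bool = False
    filename: str = ""
    accessible_at_check: bool = False
    has_persistent_id_or_accession: bool = False
    # 3. Restricted-access datasets only
    access_route_described: bool = False
    open_to_academic_researchers: bool = False
    no_coauthorship_condition: bool = False
    free_or_expenses_only: bool = False
    deposited_by_academics: bool = False


def is_machine_readable(filename: str) -> bool:
    """Reject files without a suffix and files with an excluded suffix."""
    suffix = PurePosixPath(filename).suffix.lower()
    return suffix != "" and suffix not in NOT_MACHINE_READABLE


def unmet_criteria(ds: Dataset) -> list[str]:
    """Return the labels of all criteria the dataset does not meet."""
    if ds.access not in ("open", "restricted"):
        raise ValueError("access must be 'open' or 'restricted'")
    checks = {
        "basis for published results": ds.underlies_published_results,
        "explicit reference in publication": ds.explicitly_referenced,
        "allows analytical replication": ds.allows_analytical_replication,
        "findable independently of publication": ds.findable_independently,
    }
    if ds.access == "open":
        checks.update({
            "external repository": ds.in_external_repository,
            "machine-readable format": is_machine_readable(ds.filename),
            "accessible at time of checking": ds.accessible_at_check,
            "persistent identifier or accession code": ds.has_persistent_id_or_accession,
        })
    else:
        checks.update({
            "standardized access route": ds.access_route_described,
            "open to academic researchers": ds.open_to_academic_researchers,
            "no co-authorship condition": ds.no_coauthorship_condition,
            "free or expenses only": ds.free_or_expenses_only,
            "deposited by academic researchers": ds.deposited_by_academics,
        })
    return [label for label, met in checks.items() if not met]


# An open dataset that meets every requirement except the file format.
example = Dataset(
    access="open",
    underlies_published_results=True,
    explicitly_referenced=True,
    allows_analytical_replication=True,
    findable_independently=True,
    in_external_repository=True,
    filename="measurements.pdf",
    accessible_at_check=True,
    has_persistent_id_or_accession=True,
)
print(unmet_criteria(example))  # ['machine-readable format']
```

Returning the list of unmet criteria, rather than a single yes/no, makes it easier to see why a dataset would not qualify. The sketch does not cover the exclusions listed further down or borderline cases.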
The open data definition applied does not include:
- Analysis scripts, computer programs, workflows, models, and other methods, materials, and protocols, even if their development was the goal of the research project and/or their presentation was the focus of the publication; data that have been collected and shared for development or validation, including training and simulated data, can, however, fall under the open data definition.
- Data contained within the article text itself, unless these are embedded tables that can also be accessed as digital objects in their own right.
- Image, audiovisual, and other data which primarily serve illustrative purposes.
- Data supporting case reports, unless these were shared in repositories (archives) of the respective discipline.
- For systematic reviews and meta-analyses: lists of sources or other general information on the studies, such as survey method, number of participants, or risk of bias assessments (eligible for LOM/IOM, however, are datasets newly compiled from the original literature that ensure traceability of the analysis, such as extracted text passages or meta-analyzed statistical values).
- Data which is only accessible upon request, without a defined and standardized access route (see also above).
- Data from data collections of consortia ("data pools"), if it is unclear whether the authors themselves have contributed to the pool.
- Data for which only a "private link" is shared, so that it cannot be found in the repository, but is only accessible via the publication.
- Data which constitutes information but not a "dataset", e.g. entries in dictionaries or individual gene variants.
To avoid misunderstandings, we also ask you not to confuse open data with open access (i.e., the free availability of article publications).
Applying the aforementioned criteria always yields some borderline cases. If you believe that your or your department's publication has mistakenly not been classified as an open data publication, please send a short explanatory note to quest@bih-charite.de, and we will check this and contact you. In addition, the semi-automated search for open data only covers the English-language body of literature. If you have shared data supporting articles in other languages, please let us know.
The criteria require continuous adjustment and will be developed further in the coming years. Further criteria regarding the findability or re-usability of data could be applied, thus aligning the criteria more closely with the FAIR principles (Findable, Accessible, Interoperable, Re-usable). The current criteria are inclusive with regard to data from investigator-initiated clinical trials, clinical cohorts, observational studies, and other types of research involving personal data, which are not easily shared and for which appropriate platforms are only just emerging. In such cases, the persistence criterion is relaxed, and it is considered sufficient that a website exists which describes the data and how to access them.