Health Data

The Health Data Hub focuses on processes in medical informatics and bioinformatics, with the aim of automating them and bringing new research results into healthcare faster. A fundamental challenge of a digital healthcare environment is the growing amount of sensitive patient data, e.g. created by smart wearables or whole-genome analyses. Handling and processing such huge data sets in daily routine diagnostics requires new infrastructures and IT solutions. Large-scale IT and cloud solutions are therefore also a responsibility of the Health Data Hub.
Research Topics
One Touch Pipeline
One Touch Pipeline (OTP) is an automation platform for processing next-generation sequencing (NGS) data. The application supports all steps of this process, including data registration, storage handling, quality monitoring, alignment of reads to reference genomes and all crucial types of variant calling. OTP has been under development since 2013 and is used for all genomic projects of HiDiH in Berlin and Heidelberg. The data collection in Heidelberg comprises about 50,000 samples and a data storage volume of more than 10 petabytes. The OTP roll-out for Berlin is in the ramp-up phase.
From a software architecture perspective, OTP is both an information center and a workflow system. Metadata files describing sequences are uploaded into the database, which involves curation steps and manual intervention in case of ambiguities. Once the information is complete and consistent, data management and bioinformatics workflows are executed, with every step book-kept in a database. CPU- and I/O-intensive jobs are sent to our high-performance computing clusters.
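The flow described above (check metadata, route ambiguous entries to manual curation, then run workflow steps while recording each one) can be illustrated with a minimal Python sketch. It is not OTP code: the column names, the step names, the status strings and the in-memory `Ledger` standing in for the database are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical metadata columns; the real OTP metadata schema is not shown here.
REQUIRED_COLUMNS = ("sample_id", "fastq_file", "sequencing_type")


@dataclass
class StepRecord:
    sample_id: str
    step: str
    status: str


@dataclass
class Ledger:
    """In-memory stand-in for the database that book-keeps every step."""
    records: list = field(default_factory=list)

    def log(self, sample_id, step, status):
        self.records.append(StepRecord(sample_id, step, status))


def validate(rows):
    """Split metadata rows into consistent ones and ones needing manual curation."""
    accepted, needs_curation = [], []
    seen_files = set()
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if not row.get(c)]
        duplicate = row.get("fastq_file") in seen_files
        if missing or duplicate:
            needs_curation.append(row)
        else:
            seen_files.add(row["fastq_file"])
            accepted.append(row)
    return accepted, needs_curation


def run_workflow(rows, ledger, steps=("alignment", "variant_calling")):
    """Register rows and record one ledger entry per executed step."""
    accepted, needs_curation = validate(rows)
    for row in needs_curation:
        ledger.log(row.get("sample_id") or "unknown", "registration", "needs_curation")
    for row in accepted:
        ledger.log(row["sample_id"], "registration", "done")
        for step in steps:
            # A real system would submit a cluster job at this point.
            ledger.log(row["sample_id"], step, "submitted")
    return ledger


if __name__ == "__main__":
    rows = [
        {"sample_id": "S1", "fastq_file": "a.fastq.gz", "sequencing_type": "WGS"},
        {"sample_id": "S2", "fastq_file": "a.fastq.gz", "sequencing_type": "WGS"},
    ]
    for record in run_workflow(rows, Ledger()).records:
        print(record)
```

In this sketch the second row reuses the file name of the first and is therefore flagged for curation instead of being processed, which mirrors the manual intervention mentioned above.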
The application provides three major benefits to stakeholders. First, automation reduces the manpower required for data management. Second, all operations are executed faster and more reliably, which shortens the time until the sequences can be analysed by bioinformatics groups. Third, all information is located in one system with secure web access and search capabilities.
The application is implemented with the Grails framework in the Groovy programming language, with a web layer in HTML5. Grails provides dependency injection through the Spring framework and object-relational mapping through Hibernate. Authorisation is implemented with Spring Security annotations, and authentication is based on an LDAP system. The application is deployed on the Apache Tomcat web container and uses PostgreSQL as its database engine. It manages data and processes in two separate computing realms, each with a petabyte-scale file system and a computing cluster operated by a job scheduler such as SLURM or PBS.
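To illustrate what serving two realms with different job schedulers involves, the following Python sketch builds the submission command for either SLURM or PBS from one job description. It is an assumption-laden illustration, not OTP's implementation (which is written in Groovy): the realm names, the job parameters and the function `build_submit_command` are hypothetical, and the PBS resource string uses Torque-style syntax, which differs between PBS variants.

```python
# Hypothetical mapping of computing realms to their schedulers.
REALM_SCHEDULER = {"realm_a": "pbs", "realm_b": "slurm"}


def build_submit_command(scheduler, job_name, script, cores, memory_gb, walltime):
    """Return the argument list for submitting `script` to the given scheduler."""
    if scheduler == "slurm":
        return [
            "sbatch",
            f"--job-name={job_name}",
            f"--cpus-per-task={cores}",
            f"--mem={memory_gb}G",
            f"--time={walltime}",
            script,
        ]
    if scheduler == "pbs":
        # Torque-style resource list; PBS Pro sites typically use select=... instead.
        resources = f"nodes=1:ppn={cores},mem={memory_gb}gb,walltime={walltime}"
        return ["qsub", "-N", job_name, "-l", resources, script]
    raise ValueError(f"unknown scheduler: {scheduler}")


def submit_command_for_realm(realm, **job):
    """Look up the realm's scheduler and build the matching command."""
    return build_submit_command(REALM_SCHEDULER[realm], **job)


if __name__ == "__main__":
    job = dict(job_name="align_S1", script="align.sh", cores=8,
               memory_gb=32, walltime="12:00:00")
    for realm in REALM_SCHEDULER:
        print(realm, " ".join(submit_command_for_realm(realm, **job)))
```

Keeping the scheduler-specific syntax in one function means the workflow logic can describe a job once and stay unaware of which realm it runs in.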
External German projects are supported through the Heidelberg Center for Human Bioinformatics, which, as part of the German Network for Bioinformatics Infrastructure (de.NBI), supports OTP for de.NBI users. Among others, these are the Heidelberg Center for Personalized Oncology (DKFZ-HIPO), the German Consortium for Translational Cancer Research (DKTK) and the Berlin program Precisioned Digital Oncology (PeDiOn).
Medical Informatics Initiative - HiGHmed
The Health Data Hub supports the oncology and cardiology use cases within the HiGHmed consortium. The oncology use case deals with the challenge of integrating enormous amounts of data from genome sequencing and radiology into clinical practice. In virtual molecular tumor boards, the potential treatment of cancer patients is discussed, and the expertise of medical doctors, researchers and patients from different institutions is combined. The goals are to recognize similar cancer cases more reliably and to make individual, patient-oriented treatment possible. For this purpose, the Health Data Hub is building a data integration center for OMICS data (omicsDIC), which automates the processing and analysis of sequencing data using the One Touch Pipeline (OTP) software.
Publications
*these authors contributed equally
§corresponding author