Variation in our genomes influences how susceptible we are to disease and how we respond to medical treatment. Hence, modern medicine requires comprehensive reference sets of genome variation to interpret the specific genetics of a patient. The research of the JRG Genome Informatics aims at improving such reference sets using genome sequencing data. In particular, the JRG develops methods to better detect structural variation, large sequence differences affecting at least 50 bases of the genome. Structural variation ranges from insertions, deletions and duplications (copy number variants) to inversions, translocations and even more complex variation. It can affect gene organization, gene expression and genome architecture.
Detecting structural variation is challenging because variants of this type span long DNA segments, whereas established sequencing technologies produce only short sequence reads, typically 150 bases in length. Nevertheless, short read data contains information on structural variation that computational strategies can uncover. One approach pursued by the JRG Genome Informatics is to analyze data from many genomes jointly, comparing the data of a single genome not only to a human reference genome but also to the data of many other human genomes.
In addition, the JRG Genome Informatics explores data from the most recent sequencing technologies, for example linked read and long read technologies. Linked read sequencing incorporates long-range information into short read data by adding so-called barcodes. In contrast, long read technologies produce sequences many thousands of bases in length, albeit at a very high error rate. Both barcodes and high sequence error rates add new layers of complexity to data analysis. The JRG Genome Informatics addresses these layers with novel computational approaches.
Through collaboration with partners from biology and medicine, the JRG Genome Informatics contributes to research on a variety of diseases. Birte Kehr has previously described variants involved in different cancers as well as kidney and heart disease. Ongoing collaborations at BIH and Charité further investigate genome variation in cancer and rare disease. With this combination of biomedical and computational research, the JRG Genome Informatics strives to deepen our understanding of genetic variation and its benefits for precision medicine.
- Joint calling of structural variation
- Structural variation calling from long read data
- Analysis methods for linked read data
Birte Kehr, Group Leader, email@example.com
Brian Caffrey, Postdoc, firstname.lastname@example.org
Sebastian Roskosch, PhD student, email@example.com
Thomas Krannich, PhD student, firstname.lastname@example.org
Prior to joining the BIH, Birte Kehr worked as a Research Scientist at deCODE genetics in Iceland. There she gained experience in working with large-scale genomic data and developed a particular interest in structural variation discovery for understanding human disease. She received her PhD from the Freie Universität Berlin within the International Max Planck Research School for Computational Biology and Scientific Computing in 2014. Her thesis addressed algorithms and data structures for multiple whole-genome alignment.
Jónsson H, Sulem P, Kehr B, Kristmundsdottir S, Zink F, Hjartarson E, Hardarson MT, Hjorleifsson KE, Eggertsson HP, Gudjonsson SA, Ward LD, Arnadottir GA, Helgason EA, Helgason H, Gylfason A, Jonasdottir Ad, Jonasdottir As, Rafnar Th, Frigge M, Stacey SN, Magnusson OTh, Thorsteinsdottir U, Masson G, Kong A, Halldorsson BV, Helgason A, Gudbjartsson DF, Stefansson K (2017). Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature, 549:519-522.
Kehr B, Helgadottir A, Melsted P, Jonsson H, Helgason H, Jonasdottir Ad, Jonasdottir As, Sigurdsson A, Gylfason A, Halldorsson GH, Kristmundsdottir S, Thorgeirsson G, Olafsson I, Holm H, Thorsteinsdottir U, Sulem P, Helgason A, Gudbjartsson DF, Halldorsson BV, Stefansson K (2017). Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet, 49(4):588-593.
Eggertsson HP, Jonsson H, Kristmundsdottir S, Hjartarson E, Kehr B, Masson G, Zink F, Hjorleifsson KE, Jonasdottir As, Jonasdottir Ad, Jonsdottir I, Gudbjartsson DF, Melsted P, Stefansson K, Halldorsson BV (2017). Graphtyper enables population-scale genotyping using pangenome graphs. Nat Genet, 49(11):1654-1660.
Kehr B, Melsted P, Halldorsson BV (2016). PopIns: population-scale detection of novel sequence insertions. Bioinformatics, 32(7):961-967.