Comparative Genomics Group
Institute of Bioinformatics
Tel.: +49 251/83-53007
Over the last decades, extensive research and third-generation sequencing technologies have led to an exponential increase in biological data. Storing sequences and their meta-information and keeping them accessible are key challenges in bioinformatics today. As a consequence, two types of data storage systems have been developed: primary databases store the raw experimental data, whereas secondary databases focus on subsets of primary data, often curated by human experts. The latter are thus designed to connect information across experiments in order to answer specific questions in a given field of research.
In my project, I will focus on building secondary databases to aid researchers in areas such as upstream open reading frame (uORF) and transposable element (TE) research. Both databases will shed light on healthy and cancer cells and will thus benefit medical and biological research.
uORFs are open reading frames in the 5'-UTR of genes. They often suppress the translation of the subsequent coding sequence. Mutations in uORFs can disrupt the healthy equilibrium of translational regulation and lead to serious diseases, such as cancer. My collaborators have already designed a comprehensive literature database on uORFs (http://www.compgen.uni-muenster.de/tools/uorfdb). I will use that database and connect it to experimental data on critical uORF features, such as Kozak sequence context, location in the transcript, and mutations in different cancers. This will help to understand the mechanism of translational regulation by uORFs.
Although transposable elements are among the well-studied molecular phenomena, there are only a few secondary databases for mobile element insertion analysis in the human population. These, however, tend to use outdated annotations of the human genome or cannot be readily accessed. To meet the need for an up-to-date, freely accessible TE resource, I will create a specialized database for mobile element insertions in the human population, using the GRCh38 human genome assembly as a reference. By storing data from both the healthy population and cancer patients, I will extend the scope of common cancer databases to mobile element insertions.
Manske F, Ogoniak L, Jürgens L, Grundmann N, Makałowski W, Wethmar K (2022) The new uORFdb: integrating literature, sequence, and variation data in a central hub for uORF research. Nucleic Acids Research, gkac899. doi:10.1093/nar/gkac899