Keeping up with epigenomic analysis: theory and practice
Tuesday, 1st September
13:30 to 16:30 (CEST)
Instructors and helpers
- Marcel Schulz | Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt, Germany
- Ivan G. Costa | Institute for Computational Genomics, RWTH Aachen, Germany
- Sivarajan Karunanithi | Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt, Germany
- Nina Baumgarten | Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt, Germany
- Dennis Hecker | Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt, Germany
- Zhijian Li | Institute for Computational Genomics, RWTH Aachen, Germany
Decades of research have improved our understanding of gene regulation. An open challenge in epigenomics is to unravel the role of non-coding regions in the transcriptional regulation of target genes that may lie far away. Genome-wide association studies show that a large part of genomic variation is found in these non-coding regulatory elements, but the mechanisms by which they regulate genes are often unknown.
Because whole-genome assays that measure different parts of the epigenome, in many cells or even in single cells, are under constant development, computational method development is a moving target. It therefore remains difficult to perform epigenome analysis and to integrate the results with other types of information, such as enhancer-gene interactions and gene expression data. In this tutorial, we will review modern technologies in epigenomics and discuss state-of-the-art methods for the analysis of the resulting data. We will concentrate on the analysis of ATAC-seq to define regulatory elements (REMs). We will discuss standard and advanced computational tasks, including quality analysis, peak calling, footprinting, and motif analysis. The attendees will analyze real datasets using workflows that were set up for the tutorial.
Beyond the challenges of REM annotation, linking a REM to the gene it regulates is an even more difficult task. Possible approaches are linking a REM to its nearest gene, assigning to a gene all REMs that are located within a defined window around it, or determining interactions based on associations between epigenomics and expression data. Each method has its advantages and drawbacks. For instance, nearest-gene approaches cannot capture REMs that were shown to target far-away genes. On top of these challenges, every method performs differently depending on the data at hand and the characteristics of the region of interest.
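The three linking strategies named above can be illustrated with a minimal Python sketch. It is not the workflow used in the tutorial: the class names, gene and REM identifiers, coordinates, window size, and signal values are all hypothetical toy data, and distances are measured from the REM midpoint to the transcription start site (TSS), which is only one of several possible conventions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (toy coordinate)


@dataclass(frozen=True)
class Rem:
    name: str
    chrom: str
    start: int
    end: int

    @property
    def midpoint(self) -> int:
        return (self.start + self.end) // 2


def nearest_gene(rem, genes):
    """Nearest-gene approach: pick the gene whose TSS is closest to the REM."""
    candidates = [g for g in genes if g.chrom == rem.chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.tss - rem.midpoint))


def rems_in_window(gene, rems, window):
    """Window approach: all REMs within `window` bp of the gene's TSS."""
    return [
        r for r in rems
        if r.chrom == gene.chrom and abs(r.midpoint - gene.tss) <= window
    ]


def pearson(x, y):
    """Association approach: Pearson correlation between REM accessibility
    and gene expression across samples (0.0 if either signal is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical toy annotation.
genes = [Gene("GENE_A", "chr1", 10_000), Gene("GENE_B", "chr1", 60_000)]
rems = [
    Rem("REM_1", "chr1", 9_000, 9_400),
    Rem("REM_2", "chr1", 40_000, 40_600),
    Rem("REM_3", "chr2", 5_000, 5_200),
]

# Hypothetical per-sample signals for four samples.
rem2_accessibility = [1.0, 2.0, 3.0, 4.0]
gene_a_expression = [2.0, 4.0, 6.0, 8.0]
gene_b_expression = [5.0, 5.0, 5.0, 5.0]

nearest_for_rem2 = nearest_gene(rems[1], genes)
window_for_gene_a = rems_in_window(genes[0], rems, window=50_000)
corr_a = pearson(rem2_accessibility, gene_a_expression)
corr_b = pearson(rem2_accessibility, gene_b_expression)

print(nearest_for_rem2.name, [r.name for r in window_for_gene_a], corr_a, corr_b)
```

In this toy setup the nearest-gene rule assigns REM_2 to GENE_B, whereas the window rule also links it to GENE_A, and the correlation points to GENE_A. The example is constructed to show how the approaches can disagree; it says nothing about which one is right for real data.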
As epigenomics is a rapidly evolving field with many emerging techniques, it is hard to keep track of all available data sources and analysis tools while staying aware of their potential flaws. Without a proper overview, researchers might choose a tool that does not fit their data or might not realize how a certain tool can affect the interpretability of their results. Moreover, given the many different annotations of regulatory elements, it is important to raise awareness of the underlying methods used. This matters especially when using a novel database that catalogues experimentally measured or computationally inferred REM-gene associations: some tools simply select the nearest gene as the REM target, although it has been experimentally confirmed that a REM can be located several kilobases away from its associated gene. We want to provide a guide for navigating the complexity of epigenomics data, help attendees obtain a deeper understanding of the available tools, and offer practice in performing appropriate analyses.
This intermediate-level tutorial is designed for bioinformaticians who are interested in studying regulatory regions of the genome and who want to gain insight into the current status of the field and practice possible workflows. First, we will provide an overview of the current status of epigenomics, the state-of-the-art techniques, and the respective data types. Subsequently, we will show how to analyze epigenomics data with a focus on open chromatin sequencing (ATAC-seq). The attendees will learn how to predict and compare transcription factor (TF) binding in regions that were defined from ATAC-seq data.
The next section will discuss the possible approaches to identify target genes of REMs. A concluding hands-on session will give the opportunity to try out different approaches for determining REM-gene interactions and to get to know their characteristics and potential drawbacks.
Due to the limited time available, all hands-on sessions will cover rather simple analyses. They serve as a basis for more advanced downstream analyses, for which we will provide step-by-step guides on a GitHub page. We will also host all the content of the tutorial there. Any open questions can be forwarded to us, and we will be ready to help users with any issues that arise.
This tutorial is open to at most 50 attendees.
Participants should bring their own wifi-enabled laptop or computer to be able to practice the presented workflows. All the software used in the tutorial will come as packaged workflows in a containerized format (e.g. Docker) to ease installation and use during the hands-on sessions. All the necessary information will be hosted on a GitHub website. The site will also contain all slides of the presentations, the use cases and materials for the hands-on sessions, and optional further downstream analyses, and it can be circulated to registered participants for preparation.
|Time (CEST)|Session|
|---|---|
|13:30 - 13:45|Introduction: Scope of the tutorial|
|13:45 - 14:15|Methods and algorithms for epigenome analysis: peak calling, footprinting, and transcription factor motif analysis|
|14:15 - 15:00|Hands-on: Analysis of ATAC-seq data|
|15:00 - 15:30|Approaches for identifying target genes of regulatory elements|
|15:30 - 16:15|Hands-on: Integrative analysis of regulatory regions and gene expression|
|16:15 - 16:30|Wrap-up and discussion|