
Machine Learning and Omics data: Opportunities for advancing biomedical data analysis in Galaxy

Date:

Monday, 31st August

Time:

13:30 to 16:30 (CEST)

Instructors and helpers 

  • Anup Kumar | de.NBI, ELIXIR-DE, Germany
  • Alireza Khanteymoori | de.NBI, ELIXIR-DE, Germany
  • Björn Grüning | de.NBI, ELIXIR-DE, Germany
  • Fotis Psomopoulos | INAB | CERTH, ELIXIR-GR, Greece

Summary

Machine learning (ML) has emerged as a discipline that enables computers to assist humans in making sense of large and complex data sets. With the drop in the cost of sequencing technologies, large amounts of omics data are being generated and made accessible to researchers. Analyzing these large, complex data sets is not trivial, and classical tools cannot exploit their full potential. Machine learning algorithms can thus be very useful in mining large omics datasets to uncover new insights that can advance the field of medicine and improve health care. There is increasing interest in the potential of ML to create predictive models and to identify complex patterns in omics datasets. The aim of this tutorial is to introduce participants to machine learning, the taxonomy of machine learning algorithms, and commonly used algorithms, through the Galaxy-ML environment. Galaxy-ML extends Galaxy, a user-friendly, web-based computational workbench used by tens of thousands of scientists across the world, with a machine learning tool suite that supports end-to-end analysis. Galaxy-ML uses the Galaxy framework to make machine learning tools and pipelines widely accessible. As Galaxy records all parameters and tools used, all analyses, including those for machine learning, are completely reproducible.

The tutorial will cover the methods used to analyze different omics data sets, providing a practical context through the use of basic but widely used modules in Galaxy. It will comprise a number of hands-on exercises and challenges, through which participants will acquire a first understanding of the theory behind ML algorithms as well as practical skills in applying them to familiar problems and publicly available real-world data sets.

 

Learning Objectives for Tutorial:

By the end of the workshop, attendees will be able to:

  • Explain the basic concepts of ML and the taxonomy of ML algorithms
  • Distinguish between supervised and unsupervised ML algorithms and identify the kinds of problems each can be applied to
  • Recognize the machine learning algorithms typically used for analyzing “omics” data
  • Describe applications of ML in different -omics studies
  • Tune the parameters of ML algorithms
  • Use Galaxy’s machine learning tools
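
To make the distinction between supervised and unsupervised learning concrete, here is a minimal sketch in plain Python. It is not Galaxy-ML code (in the tutorial the tools are run through the Galaxy web interface), and everything in it is an assumption made for illustration: the function names, the two-feature "expression" values, and the "control"/"treated" labels are hypothetical toy data. The supervised part learns one centroid per known label and predicts the label of a new sample; the unsupervised part (a basic k-means) groups the same samples without ever seeing the labels.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def fit_nearest_centroid(samples, labels):
    """Supervised: compute one centroid per known class label."""
    classes = sorted(set(labels))
    cents = [
        centroid([s for s, l in zip(samples, labels) if l == c]) for c in classes
    ]
    return classes, cents


def predict(model, point):
    """Return the label of the class whose centroid is closest to the point."""
    classes, cents = model
    return classes[nearest(point, cents)]


def kmeans(samples, k, iterations=20, seed=0):
    """Unsupervised: group samples into k clusters without using labels."""
    rng = random.Random(seed)
    cents = rng.sample(samples, k)  # start from k randomly chosen samples
    for _ in range(iterations):
        assign = [nearest(s, cents) for s in samples]
        updated = []
        for i in range(k):
            members = [s for s, a in zip(samples, assign) if a == i]
            # keep the old centroid if a cluster happens to be empty
            updated.append(centroid(members) if members else cents[i])
        cents = updated
    return [nearest(s, cents) for s in samples]


# Hypothetical toy data: two made-up expression features for six samples.
samples = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [5.0, 5.2], [4.8, 5.1], [5.2, 4.9]]
labels = ["control", "control", "control", "treated", "treated", "treated"]

# Supervised: labels are used for training, then a new sample is classified.
model = fit_nearest_centroid(samples, labels)
prediction = predict(model, [4.9, 5.0])

# Unsupervised: only the samples are given; cluster ids are arbitrary integers.
clusters = kmeans(samples, k=2)
print(prediction, clusters)
```

In the supervised case the output is a meaningful label because labels were provided during training. In the unsupervised case the cluster ids carry no meaning by themselves; only the grouping does, and interpreting the groups is left to the analyst.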

Target audience

This tutorial is aimed at scientists active in the Life Sciences (graduate students and researchers) who are familiar with omics data analysis and are interested in applying machine learning to such data using Galaxy.

Maximum participants

This tutorial is open to at most 50 attendees.

Requirements

Previous experience in bioinformatics analysis will be useful.

Material

The material addressed in the tutorial will be adapted from the Galaxy Training Network lesson on Statistical Analyses for omics data and machine learning using Galaxy tools, as well as from the ELIXIR, GOBLET and CODATA-RDA Advanced Bioinformatics Workshop (corresponding material available here). Participants will use their personal laptops; no additional software needs to be installed beyond a web browser (an account at https://usegalaxy.eu/ will be useful).

Schedule

Time (CEST)      Details
13:30 - 13:45    Tutorial introduction, get to know each other, setup
13:45 - 14:15    Basics of machine learning using Galaxy
                 Machine Learning basic concepts / Taxonomy of ML algorithms / ML approaches in Bioinformatics
14:15 - 14:50    Unsupervised Learning: discovering hidden structures in unlabeled data
                 How could unsupervised learning be used to analyse omics data?
14:50 - 15:10    Break
15:10 - 15:45    Supervised Learning: classification and regression
                 How can classification and regression be used to analyze omics data? What are the challenges?
15:10 - 15:45    An ML Application: Age Prediction
                 How to use machine learning to create predictive models from biological datasets (RNA-seq and DNA methylation)?
15:10 - 15:45    Closing, discussion and resource sharing