Biomedical Data and Text Processing using Shell Scripting
Friday, 4th September
13:30 to 16:30 (CEST)
- Francisco M. Couto | LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
Having remained almost unchanged for more than four decades, and being available on most computers, shell scripting (the command line) is still one of the most important tools for solving many of the data and text processing challenges that computational biologists and bioinformaticians face in their work.
This tutorial is a hands-on session on how simple command-line tools can be used and combined to retrieve, extract, and filter data and text from web resources using open standard file formats, such as TSV, CSV, and XML, that can be opened by any text editor or spreadsheet application.
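As a small illustration of combining tools on a TSV file, the sketch below filters rows with grep and extracts one column with awk. The identifiers, names, and file contents are hypothetical sample data, not taken from ChEBI or UniProt, and plain awk is used here as an assumption in place of gawk for portability.

```shell
# Hypothetical sample data: identifier, name, organism (tab-separated)
tsv_file=$(mktemp)
printf 'ID001\tExample kinase\tHomo sapiens\n' >  "$tsv_file"
printf 'ID002\tExample ligase\tMus musculus\n' >> "$tsv_file"
printf 'ID003\tExample oxidase\tHomo sapiens\n' >> "$tsv_file"

# Filter: keep the human rows; extract: print only the identifier column
human_ids=$(grep 'Homo sapiens' "$tsv_file" | awk -F'\t' '{ print $1 }')
echo "$human_ids"

# Clean up the temporary file
rm -f "$tsv_file"
```

The pipe passes the output of grep directly to awk, so no intermediate file is needed between the filtering and extraction steps.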
Because this is a 3-hour virtual tutorial, it covers only some introductory topics, namely Sections 3.5 to 3.8 of the open access book entitled Data and Text Processing for Health and Life Sciences. Participants need to read and test the examples in the preceding sections beforehand. More teaching material is freely available at: http://labs.rd.ciencias.ulisboa.pt/book/.
This tutorial is particularly relevant to health and life sciences specialists or students who want an example-based introduction to shell scripting, so that they can easily automate some common data and text processing tasks without needing to acquire advanced computer science skills.
This tutorial is open to at most 20 attendees.
No programming skills are required, but participants need:
- A computer with internet access, a text editor, and a terminal application.
- To check that all the necessary command-line tools are installed.
- To execute and understand all examples up to Section 3.4 of the book.
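One possible way to check the tools is sketched below. The function name `check_tools` is hypothetical, and the list of tools is taken from the tutorial schedule; the book may describe a different procedure.

```shell
# Report whether each named command can be found on the PATH.
# check_tools is a hypothetical helper name used only for this sketch.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Tools named in the tutorial schedule
check_tools curl grep gawk xargs xmllint xpath
```

Any tool reported as missing would need to be installed with the package manager of the operating system before the session.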
| Time | Topic | Task |
|---|---|---|
| 13:30-14:00 | Data Retrieval (cURL) | Download proteins associated with a compound from ChEBI |
| 14:00-14:30 | Data Extraction (grep and gawk) | Select the relevant proteins and their identifiers |
| 14:30-15:00 | Task Repetition (xargs) | Download information of multiple proteins from UniProt |
| 15:00-16:00 | XML Processing (xmllint and xpath) | Identify the UniProt entries that represent a Homo sapiens (Human) protein |
| 16:00-16:30 | Text Retrieval (cURL, grep, gawk, xargs, xmllint, xpath) | Download the text (titles and abstracts) of the publications associated with a list of proteins |
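
The task-repetition step in the schedule can be sketched offline as a dry run, assuming a list of identifiers with one per line. The identifiers and the `https://example.org/` address are placeholders, not real UniProt accessions or service URLs.

```shell
# Hypothetical identifiers, one per line, as an extraction step might produce
ids=$(printf 'ID001\nID002\nID003\n')

# Dry run: xargs substitutes each identifier into the command template.
# https://example.org/ is a placeholder, not a real data service.
urls=$(echo "$ids" | xargs -I {} echo "https://example.org/entries/{}.xml")
echo "$urls"

# Replacing echo with curl would download each entry instead, for example:
# echo "$ids" | xargs -I {} curl -s -o "{}.xml" "https://example.org/entries/{}.xml"
```

Running the echo version first makes it possible to inspect the generated addresses before any request is sent.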