Python for downstream data analysis

programming
basic bioinformatics
ELIXIR
live training

Target Audience:
All scientists
Location:
Online course

General context

This course runs over two full days. We start with a brief recap of Python basics, then explore libraries for data manipulation and visualization (pandas and seaborn, respectively). Through plenty of hands-on exercises, you will learn to fetch biological data and sequencing files from online databases, and to parse and analyze them. There will also be time to address specific topics requested by participants.

The two days are taught separately, so it is possible to attend only one of them.

Objectives
  • Use libraries for advanced data manipulation and visualization (day 1)
  • Work with biological data using Biopython (day 2)
  • Write scripts and functions from scratch for specific bioinformatics problems

Required skills

Participants are expected to have attended an introductory Python course and/or have acquired some working knowledge of Python.

Program

Introduction & Python recap quiz
-
Data manipulation with pandas
-
  • Exploring the pandas library
  • Processing tabular data (csv-files)
  • Preparing an RNA-seq dataset of differentially expressed genes for a heatmap
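
As a rough illustration of the kind of preparation step listed above, the following minimal sketch filters a small table of differential-expression results and reshapes it into a gene-by-sample matrix suitable for a heatmap. It is not taken from the course material: the gene names, sample names, values, column names, and the 0.05 threshold are all made up for the example.

```python
import io

import pandas as pd

# Hypothetical differential-expression results in long format.
# All names and numbers here are invented for illustration.
CSV_TEXT = """gene,sample,log2fc,padj
geneA,treated_1,2.1,0.001
geneA,treated_2,1.8,0.004
geneB,treated_1,-1.5,0.020
geneB,treated_2,-1.2,0.030
geneC,treated_1,0.1,0.800
geneC,treated_2,0.2,0.750
"""

# Read the CSV text as if it were a file on disk.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Keep only rows below an assumed adjusted p-value threshold.
significant = df[df["padj"] < 0.05]

# Reshape from long to wide: one row per gene, one column per sample.
heatmap_matrix = significant.pivot(index="gene", columns="sample", values="log2fc")

print(heatmap_matrix)
```

The resulting wide table has the shape that heatmap functions such as seaborn's `heatmap` generally expect: rows and columns as labels, cells as numeric values.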
Visualization with seaborn
-
  • Exploring the seaborn library
  • Basic plotting (lineplots, barplots, etc.) & multiplot grids
  • Creating a scatter plot & heatmap
Biopython
-
  • Introduction & fetching data
  • Parsing and analyzing biological data & file formats
  • BLASTing via Biopython

After a brief introduction to each Biopython module, participants will have enough information to tackle the exercises. Together we will explore different strategies for solving these bioinformatics-related problems.
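
To give a flavour of the parsing exercises, here is a minimal sketch that reads FASTA records with Biopython's `SeqIO.parse` and computes the GC fraction of each sequence. The two records and the helper name `gc_fraction_by_id` are hypothetical and only serve the example; the actual course exercises may look quite different.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTA records, used instead of a real downloaded file.
FASTA_TEXT = """>seq1 hypothetical example record
ATGCGCGCTA
>seq2 hypothetical example record
ATATATATGC
"""


def gc_fraction_by_id(handle):
    """Return a dict mapping each record ID to its GC fraction (0 to 1)."""
    result = {}
    for record in SeqIO.parse(handle, "fasta"):
        seq = record.seq.upper()
        gc_count = seq.count("G") + seq.count("C")
        result[record.id] = gc_count / len(seq)
    return result


gc_by_id = gc_fraction_by_id(StringIO(FASTA_TEXT))

for seq_id, gc in gc_by_id.items():
    print(f"{seq_id}: GC fraction = {gc:.2f}")
```

The same function should also work with a handle opened on a FASTA file on disk, since `SeqIO.parse` accepts any text handle.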

Selected topics from participants
-
  • Answering remaining questions from participants