Prerequisites

Scope of the course

This course aims to provide students with a practical approach to the analysis of biological data with R, building on the concepts acquired in the course “Probabilities and statistics for modelling 1”. The associated mathematical foundations will be developed in the course “Advanced statistics”.

The following topics will be covered:

  • Sampling and estimation (moments, robust estimators, confidence intervals)
  • Fitting
  • Additional distributions
  • Hypothesis testing (mean comparison, goodness of fit,…)
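A minimal sketch of the fitting and goodness-of-fit topics listed above, assuming simulated Poisson counts rather than real k-mer data: λ is estimated by the method of moments (for a Poisson, the mean equals λ), and the observed class frequencies are compared with the fitted expectation using chisq.test(). The tail cut-off max.k = 7 is an arbitrary choice for illustration.

```r
## Fitting a Poisson distribution by the method of moments, then testing
## goodness of fit. Counts are simulated (assumption), not course data.
set.seed(42)
counts <- rpois(n = 1000, lambda = 3)

## Method of moments: E[X] = lambda for a Poisson, so use the sample mean
lambda.hat <- mean(counts)

## Observed frequencies of 0, 1, ..., max.k, where the last class
## groups all values >= max.k (keeps expected counts large enough)
max.k <- 7
obs <- table(factor(pmin(counts, max.k), levels = 0:max.k))

## Expected probabilities under the fitted Poisson; last class is P(X >= max.k)
p.exp <- c(dpois(0:(max.k - 1), lambda = lambda.hat),
           ppois(max.k - 1, lambda = lambda.hat, lower.tail = FALSE))

## Chi-squared goodness-of-fit test; chisq.test() uses df = classes - 1
gof <- chisq.test(x = as.vector(obs), p = p.exp)

## One more degree of freedom is lost because lambda was estimated
df.adj <- length(p.exp) - 2
p.value.adj <- pchisq(gof$statistic, df = df.adj, lower.tail = FALSE)

print(gof)
print(p.value.adj)
```

Since λ is estimated from the same data, chisq.test() overstates the degrees of freedom by one; the p-value is therefore recomputed with pchisq() using classes - 2.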

Study cases

  1. Distribution of k-mers (oligonucleotides) in DNA sequences

  2. Analysis of omics data:

    • Den Boer ML et al. (2009). A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: a genome-wide classification study. Lancet Oncol. 10:125-34. doi: 10.1016/S1470-2045(08)70339-5, PMID 19138562. Data available from the Gene Expression Omnibus (GEO), series GSE13425.
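The first study case starts from k-mer counts. A minimal base-R sketch of how such counts can be obtained, assuming a made-up toy sequence (dna is a hypothetical name) and overlapping windows read on one strand only:

```r
## Counting overlapping k-mers in a DNA sequence with base R.
## The sequence is a made-up toy example (assumption); one strand only.
dna <- "ATGCGATACGCTTGAGGCTAAATGCGA"
k <- 3

## Start positions of all overlapping windows of length k
starts <- 1:(nchar(dna) - k + 1)
kmers <- substring(dna, starts, starts + k - 1)

## Number of occurrences of each distinct k-mer, most frequent first
kmer.counts <- table(kmers)
print(sort(kmer.counts, decreasing = TRUE))
```

A sequence of length L contains L - k + 1 overlapping k-mers, so this 27-letter toy sequence yields 25 trinucleotides.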

Course content and materials

Day        | Contents                                          | Type                       | Materials
2020-02-17 | Fitting theoretical distributions on k-mer counts | Practical                  | [html] [pdf] [Rmd]
           | Live demo starting script                         |                            | [R]
2020-02-20 | Discrete distributions                            | Lecture                    | [html] [pdf] [Rmd]
           | K-mer count distributions in promoters            | Solutions of the practical | [html] [pdf] [Rmd]
           | Multivariate analysis                             |                            |
2020-03-05 | Study case: the Den Boer 2009 dataset             | Slides                     | [html] [pdf] [Rmd]
           | Den Boer (2009): data loading and exploring       | Practical                  | [html] [pdf] [Rmd]
           | Den Boer (2009): data loading and exploring       | Solutions                  | [html] [pdf] [Rmd] [R]
2020-03-19 | Supervised classification                         | Lecture                    | [pdf]
           | Den Boer (2009): supervised classification        | Practical                  | [html] [pdf] [Rmd]

Skills expected to be acquired at the end of this course

Practical experience with statistical concepts

  • Fitting
  • Convergence between distributions
  • Impact of sample size
  • Estimators: moments versus robust estimators
  • Multiple testing
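To illustrate “moments versus robust estimators”, a sketch on simulated data (an assumption: 100 normal values plus 5 artificial outliers) comparing the mean and standard deviation with the median and an IQR-based scale estimate:

```r
## Moment-based vs robust estimators on a sample with a few outliers.
## Data are simulated (assumption): 100 normal values plus 5 outliers.
set.seed(1)
x <- rnorm(n = 100, mean = 10, sd = 2)
x.out <- c(x, rep(100, 5))   # the same sample with 5 gross outliers added

## IQR / 1.349 estimates the standard deviation of a normal distribution,
## since the IQR of N(0, 1) is about 1.349
estimates <- data.frame(
  estimator    = c("mean", "median", "sd", "IQR / 1.349"),
  clean        = c(mean(x), median(x), sd(x), IQR(x) / 1.349),
  contaminated = c(mean(x.out), median(x.out), sd(x.out), IQR(x.out) / 1.349)
)
print(estimates, digits = 3)
```

Five outliers out of 105 values shift the mean by several units and inflate the standard deviation many-fold, whereas the median and the IQR-based estimate barely move.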

R programming

  • Data structures

    • vector
    • factor
    • matrix
    • data.frame
    • list
  • Functions related to the main data structures

    • class()
    • is.numeric(), is.integer(), is.na(), is.xxxx()
    • c()
    • vector(), matrix(), data.frame()
    • as.vector(), as.matrix(), as.data.frame()
    • unlist()
    • nrow(), ncol(), dim()
    • append(), sort(), unique()
  • Handling strings

    • sub()
    • strsplit()
    • grep()
  • Getting help

    • help()
    • ?
    • ??
  • Handling files and directories

    • getwd()
    • setwd()
    • dir.create()
    • list.files()
  • Handling distributions of probabilities

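    • The four functions for a distribution X: dX (density or probability mass), pX (cumulative distribution function), qX (quantile function), rX (random generation)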
    • Binomial
    • Poisson
    • Hypergeometric
    • Normal
    • Student
    • Chi2
  • Descriptive statistics

    • mean(), median(), mode() (in R, mode() returns the storage mode of an object, not the statistical mode)
    • sd(), var(), cor()
    • quantile(), IQR()
  • Drawing graphs

    • plot()
    • barplot()
    • boxplot()
    • hist()
  • Installing and loading R packages

    • CRAN
    • Bioconductor
    • GitHub
  • Implementing a function
  • Documenting a function
  • Using apply() and related functions
  • Using R classes and objects
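The dX/pX/qX/rX naming scheme listed under “Handling distributions of probabilities” can be sketched with the Poisson (suffix pois) and the normal (suffix norm); the parameter values below are arbitrary choices for illustration.

```r
## The four functions R provides for each distribution, shown for the
## Poisson (suffix "pois"); lambda = 2 is an arbitrary example value.
lambda <- 2

d3 <- dpois(3, lambda = lambda)      # P(X = 3): probability mass
p3 <- ppois(3, lambda = lambda)      # P(X <= 3): cumulative distribution
med <- qpois(0.5, lambda = lambda)   # smallest x such that P(X <= x) >= 0.5
draws <- rpois(5, lambda = lambda)   # 5 random draws

## The cumulative probability is the sum of the point probabilities
all.equal(p3, sum(dpois(0:3, lambda = lambda)))

## Same scheme for the normal distribution: qnorm() and pnorm() are inverses
z <- qnorm(0.975)   # about 1.96, the usual 95% two-sided critical value
pnorm(z)            # returns 0.975
```

Replacing the suffix (binom, hyper, norm, t, chisq) gives the corresponding functions for the other distributions in the list.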