Scanning non-coding sequences with a TFBM

LCG_BEII 2020

Jacques van Helden

2020-02-04

Introduction

The goal of this practical is to compare the performance of two modes of representation of transcription factor binding motifs (TFBMs), a matrix and the degenerate consensus derived from it, for predicting transcription factor binding sites (TFBSs).

| Parameter | Value |
|---|---|
| Reference genome | Escherichia_coli_GCF_001308065.1_ASM130806v1 |

Collective table for the 2020 practical

Students will record their results in a shared spreadsheet. Comparing the results obtained with different transcription factors will give a broader picture than any single analysis.

On your computer, create a folder to store the results of this practical, for example $HOME/LCG_BEII_practicals/ (you can change the path and name to match your own folder organisation).
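
As a minimal sketch, the results folder can also be created from Python. The function name `make_results_dir` is hypothetical, and the folder name is simply the example given above.

```python
from pathlib import Path


def make_results_dir(base=None, name="LCG_BEII_practicals"):
    """Create the results folder under `base` (default: the home directory).

    `name` defaults to the example folder name; change it to suit
    your own organisation of folders.
    """
    base_dir = Path(base) if base is not None else Path.home()
    results_dir = base_dir / name
    # parents=True creates missing parent folders;
    # exist_ok=True makes the call safe to repeat.
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir
```

Calling `make_results_dir()` without arguments creates the folder in your home directory and returns its path.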

Choosing a TF on RegulonDB

Computing the degenerate consensus from the reference matrix
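
To make the idea concrete, here is an illustrative sketch of how a degenerate consensus can be derived from a count matrix with IUPAC ambiguity codes. The selection rule (keep every residue whose relative frequency in the column reaches `min_fraction`) is an assumption chosen for simplicity, not necessarily the rule applied by the tool used in this practical, and the toy matrix is invented for the example.

```python
# IUPAC code for each set of nucleotides (letters in alphabetical order).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}


def degenerate_consensus(counts, min_fraction=0.25):
    """Return a degenerate consensus for a count matrix.

    counts: dict mapping "A", "C", "G", "T" to a list of counts,
            one value per column of the motif.
    min_fraction: assumed threshold; residues with a relative
            frequency at or above it are kept in the column.
    """
    width = len(counts["A"])
    consensus = []
    for i in range(width):
        column = {base: counts[base][i] for base in "ACGT"}
        total = sum(column.values())
        kept = "".join(
            base for base in "ACGT"
            if total > 0 and column[base] / total >= min_fraction
        )
        # Columns where no residue passes the threshold are reported as N.
        consensus.append(IUPAC.get(kept, "N"))
    return "".join(consensus)


# Toy matrix with 4 columns and 10 sites (invented values).
toy_matrix = {
    "A": [8, 0, 5, 2],
    "C": [0, 0, 0, 3],
    "G": [2, 0, 5, 2],
    "T": [0, 10, 0, 3],
}
print(degenerate_consensus(toy_matrix))  # prints ATRY
```

The consensus keeps only the residues that pass the threshold in each column, so it discards the quantitative information carried by the matrix.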

Getting all upstream (“promoter”) sequences of E. coli

Coverage of the annotated binding sites by the reference motif

Binding site prediction in all promoters

Negative control 1: scan artificial sequences with your motif

Negative control 2: permute the columns of the matrix
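
Permuting the columns keeps the nucleotide composition of each column, and therefore the overall information content of the matrix, but changes the order of the positions, so the permuted matrix is not expected to match the binding sites of the factor. The sketch below illustrates the operation on a count matrix stored as a dictionary. The function name `permute_columns` and the toy matrix are hypothetical, and this is not the implementation used by the tools of the practical.

```python
import random


def permute_columns(counts, seed=None):
    """Return a copy of the count matrix with its columns shuffled.

    counts: dict mapping "A", "C", "G", "T" to a list of counts,
            one value per column of the motif.
    seed: optional seed, to make the permutation reproducible.
    """
    width = len(counts["A"])
    order = list(range(width))
    random.Random(seed).shuffle(order)
    # Apply the same column order to every row, so that each column
    # is moved as a whole and keeps its composition.
    return {base: [row[i] for i in order] for base, row in counts.items()}


# Toy matrix with 4 columns and 10 sites (invented values).
toy_matrix = {
    "A": [8, 0, 5, 2],
    "C": [0, 0, 0, 3],
    "G": [2, 0, 5, 2],
    "T": [0, 10, 0, 3],
}
permuted = permute_columns(toy_matrix, seed=1)
for base in "ACGT":
    print(base, permuted[base])
```

With a short matrix, a random permutation can occasionally return the original column order, so it is worth checking that the permuted matrix differs from the reference before using it as a control.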