Introduction

In this tutorial, we will attempt to predict cis-regulatory motifs for a set of genes of interest, based on the discovery of conserved motifs in the promoters of their orthologs (phylogenetic footprints). This aproach was proposed in the end of the 1990’s, when a few genomes only were available. With the ever-increasing speed and ever-decreasing costs of sequencing projects, the number of completely sequenced genomes follows an exponential growth, and the power of comparative genomics methods increases accordingly.

The anlaysis of phylogenetic footprints is particularly fruitful in small genomes (Bacteria, Fungi), where cis-regulatory elements are located in delimited regions circumscribed to a few hundred base pairs upstream of the transcription start site.

Prerequisites

This tutorial assumes that you already learned the theory in the following chapters.

Chapter Slides
Phylogenetic footprints - introduction 03.4.2.comparative_genomics.pdf
Discovering phylogenetic footprints in bacterial genomes 03.4.3.footprint_discovery.pdf

Tutorial

Getting the list of target genes for your transcription factor of interest

  1. Open a connection to the RegulonDB
  2. Enter the name of your transcription factor of interest in the Search box. Below the Regulon title, click on the link Transcription factor id.
  3. Copy the list of regulated genes, and paste them in a text file. Replace the commas by newline character, in order to obtain a file with one row per target gene for your transcription factor.

Getting putative orthologs (BBH) for your transcription factor of interest

In a first time, we will identify the genomes in which your transcription factor of interest has a putative ortholog. We use a somewhat rudimentary (but pragmatic) approach to identify putative ortholog, by selecting the bidirectional best hits (BBH).

  1. Open a connexion to the RSAT prokaryote server.
  2. In the Comparative genomics tools, open the get-orthologs form.
  3. Type the name of your transcription factor in the Query genes box, and select a Taxon. In a first time we recommend to select the class (Enterobacteriales), but this can then be modified to compare the results obtained at different taxonomic levels.

Questions

  1. How many genomes does the server support for the selected taxon?
  2. I how many of these genomes did you find a BBH for your taxon of interest?

Discovering conserved elements in the promoters of orthologs

  1. Open a connexion to the RSAT prokaryote server (alternatively, you can choose the teaching server http://teaching.rsat.eu/ or the Fungi server http://fungi.rsat.eu).
  2. Under the menu Comparative genomics, open the footprint-discovery form.
  3. Type the target genes of your factor of interest in the Query genes box.
  4. Select the option Unique organism per species.
  5. Leave all other parameters unchangedn enter your email address and click *GO**.
  6. After a few seconds, a message appears with the URL of the result page. Click on it to open the result page in a separate window.

Note that the result page is currently resricted to a header, and the result table is empty. The process may take some time (3-4 minutes per gene), but you will already be able to consult the results after a few minutes, and then periodically refresh the result page to check the progress of your query.

While the first analysis is running, we can start the same analysis with one modified parameter:

  1. Come back to the previous form (or enter the same parameters in a separate window), but check the option predict operon leader genes.
  2. Open the link to the result page in a separate window.

The analysis with operon inference is somewhat slower, but the results are generally more relevant. You can now let run the analyses on the server, and periodically check the results of the two processes.

Interpreting the results

The interpretation of the results will be discussed during the course

Further reading

TO BE DONE