Clustering motifs with RSAT

Chapter	Slides
Clustering of Motifs	CisReg_motif_clustering.pdf

Tool	Usage
RSAT matrix-clustering	Separate a collection of input PSSMs in separated clusters and display these clusters in different representations (Tree + Heatmap).
RSAT compare-matrices	Compare a query collection of motifs versus a target collection of motifs, using several dis(similarity) metrics.

Introduction

To study the TF biology, we can use experimental or computational methods. Usually the in silico motif analysis goes from motif discovery –> motif comparison (identification) –> motif scanning. One common issue in this procedure is the common redundancy of motifs after using several motif discovery algorithms (overrepresented words or positional bias) or different tools (RSAT, MEME).

We developped a tool that cluster similar PSSMs and allow to create a non-redundant collection of motifs, simplifying in this ways the motif analysis procedure.

In this tutorial we show how to deal with two cases where the cluster of motifs can be used.

1.- Reducing motif redundancy from a collection of motifs.

2.- Grouping motifs in order to identify motifs sharing similar DBD (DNA-binding Domain).

Exercise 1 - Using matrix-clustering to group motifs sharing a common DBD domain.

We have one collection of unkown motifs. We want to kwnow to which TFs correspond these motifs and if they share a DBD. We will cluster these motifs, reduce the redundancy and compare the non-redundant set in order to discover the DBD domains of the motifs.

Initially we will use the most permisive threshold in order to group all the motifs in a single cluster, which can be visualized as a hierarchical tree.

Increasing the threshold value should separate the motifs into different clusters.

Objectives

Set different threshold values in order to split one collection of motifs in separated clusters.
Identify the DBD of the identified clusters.

Aligning all the unknown motifs

Downlwad the Unknown motifs.
Open a connection to the Regulatory Sequence Analysis Tools teaching server (http://teaching.rsat.eu/).
In the left-side menu, open the tool set Matrix tools, and click on matrix-clustering.
Analysis Title: Write the name of the analysis in the Title box. E.g. ‘Unknown motif analysis’.
Matrices: Paste the matrices in the Matrix box, and check that the input format is set to transfac.
Collection Name: (Optional) Give a collection name to all the motifs of this collection. E.g. ‘Unknown motifs’.
Threshold (low threshold): Click on the section Thresholds to define the cluster and set the next values in the lower threshold column. Ncor : -1; cor : -1 ; w : 0.
Metric to build the tree: Select one metric that will be used to build the hierarchical tree. Select –> Ncor.
Linkage method: Select one linkage method to group the motifs. Select –> Average.
Output file options: Select –> Heatmap. Let empty the other boxes.
Labels displayed in logo tree: Select –> Names. Let empty the other boxes.
Click on GO (This analysis takes around 1 minute).
Repeat the previous steps but with different Threshold (high threshold): Ncor : 0.4; cor : 0.6 ; w : 5.

matrix-clustering report

Before start the analysis take a look to the matrix-clustering report which is separated in several sections.

The matrix-clustering report is divided in several section that you can hide/show.
Results Summary: a small table showing the input parameters and the number of input motifs and clusters found.
Clusters Summary: a table showing the Familal Binding Profiles FBP of each cluster and the name and the number of the motifs grouped for each cluster.
Logo Forest: all the cluster are represented as a forest (set of trees). Each tree is dynamic (Click on the nodes to collapse/expand them).
Individual Motif View: a table showing the description of each input motif.
Individual Cluster View: analyze each cluster individually. Click on the number at each tree branch to select it and display a table with its corresponding FBP.
Heatmap View: A heatmap showing the (dis)similarities among all the motifs. Each cluster is indicates with a color bar (which is the same color used for the trees in the Logo Forest).
Additional Files: A list of complementary files.

Analysis

For both analysis compare in how many clusters the motifs were separated, (see Clusters summary section). The logo in this section represent the root motif (also called Familal Binding Profile FBP) which is the most external node in the hierarchical tree and summarize all the motifs in the cluster.
Open the Logo forest section. You can collapse/expand dynamically the trees. Find the groups of motifs collapsing the tree.
Another way to visualize all the motifs is using the heatmap view. You can see clearly that there are four clusters of motifs in our collection of unknown motifs.
For the both analysis (using low/high thresholds) you can go to the Additional Files menu and download the Root motifs file. (One motif for each cluster)
Open a conexion to footprintDB website, go the search section and click on sequences.
Click on the DNA sites or motifs circle, paste the motifs in the query data box and click on search button. (Do this step for both analysis).

7 . Identify the DBD of you clusters in both analysis.

Excercise 2 - Reduce motif redundancy after motif discovery in ChIP-seq peaks

One collection of motifs discovered in one set of ChIP-seq peaks is grouped and the clusters reveals different binding conformations of the immunoprecipitated TF.

Objectives

Identify different binding conformations of Oct4 in a set ChIP-seq peaks.
This highly redundant set of motifs can be reduced to a handle of non-redundant motifs.

matrix-clustering

Download a set of matrices that were found with peak-motifs in a set of ChIP-seq peaks for the TF Oct4.
Open a connection to the Regulatory Sequence Analysis Tools teaching server (http://teaching.rsat.eu/).
In the left-side menu, open the tool set Matrix tools, and click on matrix-clustering.
Analysis Title: Write the name of the analysis in the Title box. E.g. ‘Oct4 motifs from ChIP-seq peaks’.
Matrices: Paste the matrices in the Matrix box, and check that the input format is set to transfac.
Collection Name: (Optional) Give a collection name to all the motifs of this collection. E.g. ‘Oct4 ChIPseq’.
Threshold: Click on the section Thresholds to define the cluster and set the next values in the lower threshold column. Ncor : 0.6 ; cor : 0.4 ; w : 5.
Metric to build the tree: Select one metric that will be used to build the hierarchical tree. Select –> Ncor.
Linkage method: Select one linkage method to group the motifs. Select –> Average.
Output file options: Select –> Heatmap. Let empty the other boxes.
Labels displayed in logo tree: Select –> Names. Let empty the other boxes.
Click on GO (This analysis takes around 1 minute).

Analysis

The selection of non-redundant motifs will be done during the session.

Clustering motifs with RSAT

Jaime Castro

2015-10-23

Prerequisites

Tools used in this tutorial

Introduction

Excercise 2 - Reduce motif redundancy after motif discovery in ChIP-seq peaks

Objectives

matrix-clustering

Analysis