ChIP-Seq dataset description

For this tutorial we will use ChIP-Seq datasets produced by Theodorou et al. The authors used ChIP-Seq to systematically identify ESR1 binding regions across the human genome. Importantly, they demonstrated that siRNA-mediated knock-down of GATA3 greatly affects ESR1 binding. The abstract of the article is provided below.


Estrogen receptor (ESR1) drives growth in the majority of human breast cancers by binding to regulatory elements and inducing transcription events that promote tumor growth. Differences in enhancer occupancy by ESR1 contribute to the diverse expression profiles and clinical outcome observed in breast cancer patients. GATA3 is an ESR1-cooperating transcription factor mutated in breast tumors; however, its genomic properties are not fully defined.

In order to investigate the composition of enhancers involved in estrogen-induced transcription and the potential role of GATA3, we performed extensive ChIP-sequencing in unstimulated breast cancer cells and following estrogen treatment. We find that GATA3 is pivotal in mediating enhancer accessibility at regulatory regions involved in ESR1-mediated transcription. GATA3 silencing resulted in a global redistribution of cofactors and active histone marks prior to estrogen stimulation. These global genomic changes altered the ESR1-binding profile that subsequently occurred following estrogen, with events exhibiting both loss and gain in binding affinity, implying a GATA3-mediated redistribution of ESR1 binding. The GATA3-mediated redistributed ESR1 profile correlated with changes in gene expression, suggestive of its functionality. Chromatin loops at the TFF locus involving ESR1-bound enhancers occurred independently of ESR1 when GATA3 was silenced, indicating that GATA3, when present on the chromatin, may serve as a licensing factor for estrogen-ESR1-mediated interactions between cis-regulatory elements. Together, these experiments suggest that GATA3 directly impacts ESR1 enhancer accessibility, and may potentially explain the contribution of mutant-GATA3 in the heterogeneity of ESR1+ breast cancer.

Getting information about the experiment using the GEO and SRA websites

Gene Expression Omnibus (GEO) is a public repository that provides tools to submit, access and mine functional genomics data. Data may come from array- or sequence-based technologies. For high-throughput sequencing (HTS) data, GEO provides both processed data (such as *.bam, *.bed, *.wig files) and links to raw data. Raw data are available from the Sequence Read Archive (SRA) database, which stores reads from several platforms (including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics). Both web sites provide search engines to query their databases.

  • Go to the GEO web site.
  • Choose “Search” and paste GSE40129 (GSE stands for GEO Series Experiment). Click “GO” to get information about this experiment.
  • In the “sample section” (middle of the page), click on More to display all sample names.
  • Click on the GSM986059 hyperlink (GSM stands for GEO SaMple) to get information about this sample.
  • In the “relations” section, select the SRX176856 hyperlink to open the SRA page corresponding to this sample.
  • Click on the SRR link (bottom right) to access the record of the run.
  • On the new page, click on the Reads tab to view the read sequences (you can display the quality by clicking on Customize).
  • From there, you could also download the dataset as a .sra file, but we will not do so in this practical (beware: this would take time and disk space, since SRA files typically weigh several hundred MB).
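
The accessions used in the steps above can also be turned into links programmatically. The following is a minimal sketch, not part of the original practical: the helper names are hypothetical, and the two URL patterns (the GEO `acc.cgi` query and the SRA `term` search) are assumptions about how the NCBI web sites address their records.

```python
from urllib.parse import urlencode

# Accession prefixes that appear in the steps above.
PREFIXES = {
    "GSE": "GEO series",
    "GSM": "GEO sample",
    "SRX": "SRA experiment",
    "SRR": "SRA run",
}


def describe_accession(accession):
    """Return the record type of an accession, based on its 3-letter prefix."""
    return PREFIXES.get(accession[:3].upper(), "unknown")


def record_url(accession):
    """Build a browser URL for a GEO or SRA accession (assumed URL patterns)."""
    kind = describe_accession(accession)
    if kind.startswith("GEO"):
        base = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
        return base + "?" + urlencode({"acc": accession})
    if kind.startswith("SRA"):
        base = "https://www.ncbi.nlm.nih.gov/sra/"
        return base + "?" + urlencode({"term": accession})
    raise ValueError(f"Unrecognized accession: {accession}")


if __name__ == "__main__":
    for acc in ("GSE40129", "GSM986059", "SRX176856", "SRR540192"):
        print(acc, describe_accession(acc), record_url(acc), sep="\t")
```

Running the script prints one line per accession with its record type and a link that can be pasted into a browser.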

NB: SRA files are not always convenient, as they need to be “dumped” into FASTQ format. Sequencing data can also be downloaded from ENA (European Nucleotide Archive) directly in FASTQ format.
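
As an illustration of the ENA route, the sketch below builds a query for the FASTQ locations of a run and parses a tab-separated answer. It is a hedged example rather than the procedure of this practical: it assumes the ENA Portal API `filereport` endpoint with the `read_run` result type and the `fastq_ftp` field, and the function names are hypothetical. No network request is made here; the URL can be opened in a browser or fetched with `urllib.request.urlopen`.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report service.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(run_accession):
    """Build a query asking ENA for the FASTQ locations of a run."""
    params = {
        "accession": run_accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_fastq_links(tsv_text):
    """Map each run accession to its list of FASTQ locations.

    Several files for one run (e.g. paired-end mates) are expected
    to be separated by ';' in the fastq_ftp column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["run_accession"]: [path for path in row["fastq_ftp"].split(";") if path]
        for row in reader
    }


if __name__ == "__main__":
    print(ena_filereport_url("SRR540192"))
```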

  • Which HTS platform was used to sequence this sample?
  • Is this experiment single-end or paired-end sequencing?
  • How many runs (i.e. lanes) are associated with this sample?
  • How many reads were produced (# of Spots)?
  • Select the SRR540192 hyperlink. What is the sequence of the first read?
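
Once a run is available in FASTQ format, the first read and its qualities can also be inspected with a few lines of code. The sketch below assumes the common four-line FASTQ layout (header, sequence, separator, quality) and a Phred+33 quality encoding; the function names are hypothetical and the two records in the example are made up, they are not reads from SRR540192.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (identifier, sequence, quality) for each 4-line FASTQ record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record or not record[0]:
            return  # end of file (or trailing blank line)
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = io.StringIO("@read1\nACGTN\n+\nIIII#\n@read2\nTTGCA\n+\nIIIII\n")
    read_id, sequence, quality = next(read_fastq(example))
    print(read_id, sequence, phred_scores(quality))
```

For a real file, the same generator can be fed an open file handle (or `gzip.open(path, "rt")` for a compressed file) in place of the in-memory example.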

Connecting to the Galaxy server

  • Open a connection to the pedagogix Galaxy server.
  • Log in (Login command in the User menu at the top of the Galaxy window). If this is your first connection, use the Register command instead.

Quality control of sequencing data