Whitehead zebrafish promoter chipsets
Design principles
The Whitehead zebrafish promoter chipsets were designed for large-scale chromatin immunoprecipitation using microarrays (ChIP-chip). The chipsets represent the promoters of about 11,000 transcriptional units derived from transcript sequences from RefSeq, Ensembl, Vega, ZGC, and the Zon Lab. The 9-chipset represents extended promoters (-9 kb to +3 kb relative to the transcription start site) using 60-mer oligonucleotides on nine 44,000-feature microarrays, whereas the 2-chipset represents proximal promoters (-1500 nt to +500 nt) on two microarrays of the same density. Probes are spaced an average of 250 nt apart.
Design details
Transcripts from RefSeq, Ensembl, Vega, ZGC, and the Zon Lab were mapped to the Zv4 zebrafish genome as packaged by UCSC Bioinformatics into their June 2004 assembly. Transcripts with transcription start sites (TSSs) within 500 nt of each other were clustered into a "transcriptional unit" (TU), and the promoter was defined relative to the most upstream TSS in the unit. These TUs were divided into two groups based on confidence in the TSS. A TU was described as "high confidence" if it contained a transcript from Vega, ZGC, or the Zon Lab, or transcripts from both RefSeq and Ensembl.
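The clustering and confidence rules above can be sketched in a few lines of Python. This is only an illustration, not the software used for the design: the Transcript record, the source label "ZonLab", the function names, and the choice of chaining neighbouring TSSs on the same chromosome and strand (with a distance of at most 500 nt) are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record; field names are illustrative, not from the original design.
Transcript = namedtuple("Transcript", "name source chrom strand tss")


def cluster_into_tus(transcripts, max_gap=500):
    """Chain transcripts on the same chromosome and strand whose sorted TSSs
    lie within max_gap nt of the previous TSS (an assumed reading of the rule)."""
    tus = []
    for t in sorted(transcripts, key=lambda t: (t.chrom, t.strand, t.tss)):
        last = tus[-1][-1] if tus else None
        if (last is not None
                and (last.chrom, last.strand) == (t.chrom, t.strand)
                and t.tss - last.tss <= max_gap):
            tus[-1].append(t)
        else:
            tus.append([t])
    return tus


def reference_tss(tu):
    """Most upstream TSS: smallest coordinate on '+', largest on '-'."""
    positions = [t.tss for t in tu]
    return min(positions) if tu[0].strand == "+" else max(positions)


def is_high_confidence(tu):
    """High confidence: any Vega, ZGC or Zon Lab transcript,
    or both a RefSeq and an Ensembl transcript."""
    sources = {t.source for t in tu}
    return bool(sources & {"Vega", "ZGC", "ZonLab"}) or {"RefSeq", "Ensembl"} <= sources
```

Chaining means that a TU can span more than 500 nt when several TSSs lie close together; a stricter rule (for example, every TSS within 500 nt of the most upstream one) would be an equally plausible reading of the description.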
For each high confidence TU (and several low confidence TUs requested by researchers in the zebrafish community), an extended promoter was extracted from the Zv4 genome, previously processed by RepeatMasker to mask repetitive DNA. The set of 13,400 promoters was partially redundant; if a TU mapped reliably to more than one location in the genome, a promoter was extracted for each location. If the promoter contained a gap (generally of unknown length), the sequence upstream of the gap was also removed.
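As a rough illustration of the extraction step, the sketch below computes a strand-aware promoter window around a TSS and trims it at an assembly gap. The function names, the coordinate convention (simple offsets from the TSS, with gaps given as start/end pairs on the same chromosome), and the restriction to gaps lying upstream of the TSS are assumptions for the example; the original text does not specify these details.

```python
def promoter_window(tss, strand, upstream=9000, downstream=3000):
    """Return (start, end) genomic coordinates of the promoter window.
    On the '-' strand, 'upstream' means larger coordinates."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        start, end = tss - downstream, tss + upstream
    return max(start, 0), end  # do not run off the start of the sequence


def trim_at_gaps(start, end, tss, strand, gaps):
    """Drop each upstream gap and everything upstream of it.
    gaps: list of (gap_start, gap_end) on the same chromosome.
    Gaps downstream of the TSS are ignored in this sketch."""
    for gap_start, gap_end in gaps:
        if gap_end <= start or gap_start >= end:
            continue  # gap does not overlap the window
        if strand == "+" and gap_end <= tss:
            start = max(start, gap_end)
        elif strand == "-" and gap_start >= tss:
            end = min(end, gap_start)
    return start, end


# Extended promoter (defaults) and proximal promoter (-1500 nt to +500 nt)
extended = promoter_window(10000, "+")
proximal = promoter_window(10000, "+", upstream=1500, downstream=500)
```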
For each promoter, we scored each potential 60-mer using a locally customized version of ArrayOligoSelecter, which assesses oligo properties such as uniqueness, complexity, GC content, and lack of secondary structure. All potential oligos were then mined (using software developed at Whitehead) to select the most representative set, optimizing oligo quality and spacing, aiming for oligos about every 250 nt. We also removed any isolated oligos, since previous array designs revealed the importance of binding evidence from at least two adjacent oligos. The set of oligos representing the extended promoters was then mined for those overlapping the corresponding set of proximal promoters. Some promoters could not be effectively represented, primarily due to extended regions of repetitive DNA and gaps in the Zv4 assembly. The final oligo sets represent about 11,000 promoters for each chipset. Oligos were designed to alternate between positive and negative strands to eliminate any bias.
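The selection logic described above might look something like the following sketch. It is not the Whitehead software, whose algorithm is not described here: the greedy strategy, the tolerance of 75 nt around each target position, and the 500 nt threshold used to decide that an oligo is "isolated" are all hypothetical choices made for illustration. Candidates are (position, score) pairs, with higher scores taken to mean better oligos.

```python
def pick_spaced(candidates, spacing=250, tolerance=75):
    """Greedy pass: near each target position, keep the best-scoring candidate,
    then move the target one spacing beyond the chosen oligo."""
    if not candidates:
        return []
    cands = sorted(candidates)
    chosen = []
    target = cands[0][0]
    last_pos = cands[-1][0]
    while target <= last_pos + tolerance:
        window = [c for c in cands if abs(c[0] - target) <= tolerance]
        if window:
            best = max(window, key=lambda c: c[1])
            chosen.append(best)
            target = best[0] + spacing
        else:
            target += spacing  # nothing usable here (e.g. masked repeat or gap)
    return chosen


def drop_isolated(positions, max_neighbor_dist=500):
    """Keep only oligos with at least one neighbour within max_neighbor_dist."""
    pos = sorted(positions)
    kept = []
    for i, p in enumerate(pos):
        has_left = i > 0 and p - pos[i - 1] <= max_neighbor_dist
        has_right = i < len(pos) - 1 and pos[i + 1] - p <= max_neighbor_dist
        if has_left or has_right:
            kept.append(p)
    return kept


def assign_alternating_strands(positions):
    """Alternate '+' and '-' along the promoter."""
    return [(p, "+" if i % 2 == 0 else "-") for i, p in enumerate(sorted(positions))]
```

A typical use would chain the three steps: pick spaced candidates, drop the isolated ones, then assign strands to what remains.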
Controls of several types were added, including positive controls (labelled with the gene to which they were designed), intensity controls (chosen to give similar intensities across many cells and conditions), and negative controls described as representing "gene deserts". Gene deserts were defined as those regions of the genome most distant from any annotated genes. A manufacturer may add other sets of their own controls (which are not shown in the oligo files for download).
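To make the "gene desert" definition concrete, the sketch below finds the largest intergenic gaps on one chromosome and the midpoint of a gap, which is the point farthest from the flanking genes. This is only one simple reading of "most distant from any annotated genes"; the function names and the interval representation are assumptions, and the actual procedure used to choose the negative controls is not described here.

```python
def intergenic_gaps(genes):
    """genes: list of (start, end) intervals on one chromosome.
    Returns the gaps between (possibly overlapping) genes, largest first."""
    gaps = []
    furthest_end = None
    for start, end in sorted(genes):
        if furthest_end is not None and start > furthest_end:
            gaps.append((furthest_end, start))
        furthest_end = end if furthest_end is None else max(furthest_end, end)
    return sorted(gaps, key=lambda g: g[1] - g[0], reverse=True)


def desert_midpoint(gap):
    """Point in the gap that is farthest from both flanking genes."""
    return (gap[0] + gap[1]) // 2
```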
To confirm the location of every oligo in the genome, we used BLAT to identify all perfect matches. These mappings allowed us to link, in an unbiased manner, each oligo to a promoter as defined by the design of the corresponding chipset. Some oligos unexpectedly matched many sites in the genome and may be uninformative. We also expect some misannotation of these sets of oligos due to errors in the genome assembly and errors in transcript sequences, especially those that are missing their 5' end. Further analysis of specific oligos can be performed in the context of a genome browser.
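For readers who want to repeat this kind of check, the sketch below filters BLAT output in PSL format for perfect full-length matches and lists oligos that match more than one site. It assumes standard tab-separated PSL columns (matches, misMatches, repMatches, nCount, qNumInsert, qBaseInsert, tNumInsert, tBaseInsert, strand, qName, qSize, ...); the function names are hypothetical, and this is not the pipeline used for the original design.

```python
from collections import defaultdict


def is_perfect_hit(fields):
    """True if the whole query aligns with no mismatches and no gaps."""
    matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
    q_inserts, t_inserts = int(fields[4]), int(fields[6])
    q_size = int(fields[10])
    return (mismatches == 0 and q_inserts == 0 and t_inserts == 0
            and matches + rep_matches == q_size)


def perfect_hits_by_oligo(psl_lines):
    """Map oligo name -> list of (chrom, start, end, strand) perfect matches."""
    hits = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip PSL header lines and blank lines
        if is_perfect_hit(fields):
            hits[fields[9]].append(
                (fields[13], int(fields[15]), int(fields[16]), fields[8]))
    return dict(hits)


def multi_mapping(hits):
    """Oligos with more than one perfect match; these may be uninformative."""
    return sorted(name for name, sites in hits.items() if len(sites) > 1)
```

Each perfect match can then be compared against the promoter coordinates of the relevant chipset to link the oligo to a promoter.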
Please contact George Bell with any questions.
Downloads
Proximal promoter set of 2 chips
- Promoters/transcripts represented
- Oligo sequences and annotations: [zipped text] [description of data fields]
- Promoter sequences
- Promoters/transcripts represented
- Oligo sequences and annotations: [zipped text] [description of data fields]
Obtaining the chipsets
These microarrays were designed as a resource for the zebrafish community, and the oligo sequences are freely available. The arrays are currently manufactured by Agilent under Agilent Microarray Design Identification (AMADID) numbers 013834 and 013835 (the 2-chipset) and 013824 through 013832 (the 9-chipset).
Credits
The Whitehead zebrafish promoter chipsets were originally designed by George Bell and Bingbing Yuan of Whitehead Bioinformatics and Research Computing for the laboratory of Hazel Sive, in collaboration with the laboratories of Richard Young (Whitehead) and Jim Smith (Gurdon Institute). Whether you purchase the arrays from Agilent or manufacture them yourself, the Bioinformatics and Research Computing group at the Whitehead Institute for Biomedical Research would appreciate an acknowledgment of the development of this resource in any resulting publication.
Bioinformatics and Research Computing, Whitehead Institute for Biomedical Research
Last updated: October 21 2009 11:17:32 am
