Hotspot and the SPOT data quality metric
Hotspot is a program for identifying regions of local enrichment of
short-read sequence tags mapped to the genome using a binomial
distribution model. Regions flagged by the algorithm are called
"hotspots." The algorithm utilizes a local background model that
automatically normalizes for large regions of elevated tag levels due
to, for example, copy number effects. Hotspot is otherwise able to
detect regions of enrichment of highly variable size, making it
applicable to both broad and highly punctate signals. We have
applied it extensively to DNase-seq and ChIP-seq data, including
transcription factor (CTCF) and histone modification (H3K4me3,
H3K36me3, H3K27me3) data.
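The idea of scoring local enrichment against a local background with a binomial model can be sketched as follows. This is only an illustration under stated assumptions, not the code shipped in the distribution: the function names, the window sizes (250 bp and 50 kb), and the z-score form are choices made for this example. Tags in the small window are compared with the count expected if the tags in the surrounding background window were spread uniformly.

```python
import math
from bisect import bisect_left


def count_in(sorted_positions, start, end):
    """Count tags with start <= position < end (positions must be sorted)."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def binomial_z(sorted_positions, center, small=250, background=50_000):
    """Binomial z-score for a small window centered on `center`.

    Assumption for illustration: each of the N tags in the background
    window lands in the small window with probability p = small / background.
    """
    half_small, half_bg = small // 2, background // 2
    n = count_in(sorted_positions, center - half_small, center + half_small)
    big_n = count_in(sorted_positions, center - half_bg, center + half_bg)
    if big_n == 0:
        return 0.0
    p = small / background
    expected = big_n * p                      # binomial mean
    sd = math.sqrt(big_n * p * (1.0 - p))     # binomial standard deviation
    return (n - expected) / sd
```

Because the expected count is taken from the surrounding window rather than from the whole genome, a region with uniformly elevated tag density (for example, from a copy number gain) raises both the observed and the expected counts, which is the normalizing effect described above.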
Hotspot was originally conceived and implemented by Mike
Hawrylycz. Additional contributors and developers include Bob Thurman,
Eric Haugen, and Scott Kuehn.
This distribution also includes scripts for computing SPOT (Signal
Portion of Tags), a quality measure for short-read sequence
experiments. SPOT is simply the percentage of all tags that fall in
hotspots.
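As a minimal sketch of the SPOT definition (the percentage of tags that fall in hotspots), the function below counts tags inside a set of hotspot intervals. The function name and the input layout (tags as `(chrom, position)` pairs, hotspots as half-open, non-overlapping `(chrom, start, end)` intervals) are assumptions for this example, not the interface of the scripts in the distribution.

```python
from bisect import bisect_right
from collections import defaultdict


def spot_score(tags, hotspots):
    """Return the percentage of tags that fall inside any hotspot.

    tags:     iterable of (chrom, position)
    hotspots: iterable of (chrom, start, end), half-open and
              non-overlapping within each chromosome (assumed)
    """
    intervals = defaultdict(list)
    for chrom, start, end in hotspots:
        intervals[chrom].append((start, end))

    starts, ends = {}, {}
    for chrom, ivs in intervals.items():
        ivs.sort()
        starts[chrom] = [s for s, _ in ivs]
        ends[chrom] = [e for _, e in ivs]

    total = 0
    in_hotspots = 0
    for chrom, pos in tags:
        total += 1
        chrom_starts = starts.get(chrom)
        if chrom_starts is None:
            continue
        # Rightmost hotspot whose start is <= pos, if any.
        i = bisect_right(chrom_starts, pos) - 1
        if i >= 0 and pos < ends[chrom][i]:
            in_hotspots += 1

    return 100.0 * in_hotspots / total if total else 0.0
```

For example, with four tags of which two lie inside the single hotspot `("chr1", 90, 160)`, the function returns 50.0.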
Documentation for hotspot (Word and PowerPoint documents are slightly out of date)
- Sam John, Peter J Sabo, Robert E Thurman, Myong-Hee Sung, Simon C Biddie, Thomas A Johnson, Gordon L Hager, John A Stamatoyannopoulos, Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nature Genetics, 43, 264-268 (2011).
- Brief methods-type
- PowerPoint presentation
Hotspot is currently hosted on GitHub
See the hotspot GitHub repository for the current
version of hotspot.
5 Jul 2013
Hotspot-SPOT distribution (v4)
- This version has been superseded by the current version hosted on
GitHub. See above.
- Gzipped tar-ball (254Mb). After
unpacking, start with the top-level README file.
- The changes between v3 and v4 are detailed in the CHANGES file in
the top level of the distribution. Some highlights are as follows.
- For ChIP-seq data, an input tags file is now accommodated. This
will trigger subtracting input tags from the ChIP tags in hotspots
in the final scoring of hotspots.
- A final cleanup script is now provided which creates a simplified
output directory with more intuitive file-naming conventions, and
includes a "clean" option that removes all intermediate directories.
- Auxiliary scripts are now provided for generating mappability
files, so that you can do this yourself, provided you have access to
the bowtie aligner (Langmead B, Trapnell C, Pop M, Salzberg
SL. Ultrafast and memory-efficient alignment of short DNA sequences
to the human genome. Genome Biology 10:R25,
http://bowtie-bio.sourceforge.net). This should remove the need for
updating the "Mappability files" section, below (v3), but we will
continue to entertain requests to generate new files as well.
25 Jan 2013
Hotspot-SPOT distribution (v3)
- This version is now superseded, although I will continue to try to
answer questions about it. It includes code for
computing hotspots (including SPOT scores) and peaks, and performing
FDR thresholding. (Version 2, below, does not include peak-finding
or FDR thresholding capabilities.) The code has only been tested
- Gzipped tar-ball (304Mb). After
unpacking, start with the top-level README file.
Mappability files
Below find files containing coordinates of uniquely-mappable regions of
the genome for various read-lengths and genomes. These files would be
used for the _MAPPABLE_FILE_ variable defined in runall.tokens.txt.
NOTE: the .starch files are bed files compressed using the starch
tool, which is part of
the BEDOPS suite. The
file used in _MAPPABLE_FILE_ must be uncompressed (you can use
unstarch from BEDOPS for this purpose). If you have need for a
particular combination not available below, feel free to contact
- hg38 (_CHROM_FILE_)
- hg19 (_CHROM_FILE_)
- 20bp reads (bed file, 787 Mb, starch file, 48 Mb)
- 22bp reads (bed file, 484 Mb, starch file, 32 Mb)
- 26bp reads (bed file, 334 Mb, starch file, 24 Mb)
- 27bp reads (bed file, 326 Mb, starch file, 23 Mb)
- 32bp reads (bed file, 236 Mb, starch file, 17 Mb)
- 36bp reads (bed file, 182 Mb, starch file, 14 Mb)
- 40bp reads (bed file, 182 Mb, starch file, 14 Mb)
- 42bp reads (bed file, 119 Mb, starch file, 10 Mb)
- 50bp reads (bed file, 72 Mb, starch file, 6.0 Mb)
- 58bp reads (bed file, 42 Mb, starch file, 3.9 Mb)
- 72bp reads (bed file, 20 Mb, starch file, 2.0 Mb)
- 76bp reads (bed file, 17 Mb, starch file, 1.7 Mb)
- 100bp reads (bed file, 9 Mb, starch file, 900 kb)
- mm9 (_CHROM_FILE_)
- 36bp reads (bed file, 117 Mb, starch file, 9.7 Mb)
- 40bp reads (bed file, 95 Mb, starch file, 8.1 Mb)
- 48bp reads (bed file, 66 Mb, starch file, 5.9 Mb)
- 51bp reads (bed file, 58 Mb, starch file, 5.3 Mb)
- 100bp reads (bed file, 15 Mb, starch file, 1.5 Mb)
- mm10 (_CHROM_FILE_)
- dm3 (_CHROM_FILE_)
- sacCer2 (_CHROM_FILE_)
- sacCer3 (_CHROM_FILE_)
- ce4 (_CHROM_FILE_)
- ce10 (_CHROM_FILE_)
- rn5 (_CHROM_FILE_)
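Since the file named in _MAPPABLE_FILE_ must be uncompressed, a downloaded .starch file has to be expanded before use. The sketch below wraps the BEDOPS `unstarch` tool, which writes the uncompressed bed data to standard output. The function names and any file names passed to them are hypothetical, and the wrapper assumes `unstarch` is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def unstarch_command(starch_path):
    """Build the command line: unstarch prints bed data to stdout."""
    return ["unstarch", str(starch_path)]


def decompress_starch(starch_path, bed_path):
    """Write the uncompressed contents of a .starch file to bed_path."""
    if shutil.which("unstarch") is None:
        raise RuntimeError("unstarch (part of BEDOPS) was not found on PATH")
    with open(bed_path, "wb") as out:
        # check=True raises CalledProcessError if unstarch exits non-zero.
        subprocess.run(unstarch_command(starch_path), stdout=out, check=True)
    return Path(bed_path)
```

The resulting bed file is the one to reference in _MAPPABLE_FILE_ in runall.tokens.txt.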
11 June 2010
Hotspot-SPOT distribution (v2)
- Gzipped tar-ball (176Mb). After
unpacking, start with the top-level README file. The
documentation linked above is in the doc directory.
- UW utils (gzipped tar-ball, 102 kb): a
selection of bed file-oriented utilities for computing various set
operations, sorting, etc., required by many of the scripts
included in the SPOT distribution.
- Mappability files for hg19 (.tgz, 47Mb).
The above distribution is set up for hg18 coordinates. To work with
hg19 coordinates, use the files here, substituting them for the token
variables _CHROM_FILE_, _MAPPABLE_10KB_FILE_, and _MAPPABLE_FILE_.