Q: What does the database contain?
A: TELiS contains integer numbers indicating the frequency with which
each specific TFBM is detected in each promoter - a P x T
matrix of P promoters scanned for T TFBMs. TELiS is
actually a family of such matrices, with each matrix containing data from
one scan conducted at a specific stringency (MatInspector
mat_sim values of .80, .90, or .95) over a specified promoter size
(300 or 600 bases upstream of the TSS, or a region from -1000 to +200 bases).
These scans are conducted by the Java application PromoterScan
using NCBI RefSeq nucleotide sequences obtained during the fourth quarter
of 2003. TFBM definitions come from anonymous FTP releases of TRANSFAC v3.2 and JASPAR 2.
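As a rough illustration of the "family of P x T matrices" described above, here is a minimal Python sketch. The gene symbols, motif names, counts, and the motif_count helper are all invented for the example; the real data live in the TELiS MySQL database and its layout is not shown here.

```python
# Hypothetical illustration of one TELiS prevalence matrix.
# Gene symbols, motif names, and counts are invented for this example.
promoters = ["GENE_A", "GENE_B", "GENE_C"]   # P = 3 promoters (rows)
motifs = ["MOTIF_1", "MOTIF_2"]              # T = 2 TFBMs (columns)

# One matrix per scan setting: (promoter size, stringency) -> P x T counts.
scans = {
    (600, 0.90): [
        [2, 0],   # GENE_A
        [0, 1],   # GENE_B
        [3, 1],   # GENE_C
    ],
}


def motif_count(scans, size, stringency, gene, motif):
    """Return how often `motif` was detected in the promoter of `gene`."""
    matrix = scans[(size, stringency)]
    return matrix[promoters.index(gene)][motifs.index(motif)]
```

Each additional scan (for example 300 bases at .95) would simply add another key to the scans dictionary with its own P x T matrix.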
Q: How are genes with
alternative transcription start sites (TSS) handled?
A: Multiple results for a
single gene are averaged and rounded to the nearest integer to produce a
single record.
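The averaging step can be sketched in Python as follows. The function name is hypothetical, and the handling of exact .5 ties (rounded up here) is an assumption, since the text above only says "rounded to the nearest integer".

```python
import math


def merge_tss_counts(counts_per_tss):
    """Average motif counts across alternative TSS records for one gene.

    counts_per_tss: list of equal-length lists, one list of TFBM counts
    per transcription start site.
    Assumption: exact .5 ties are rounded up; the source does not say.
    """
    n = len(counts_per_tss)
    # zip(*...) walks the lists column by column, i.e. one TFBM at a time.
    return [math.floor(sum(column) / n + 0.5) for column in zip(*counts_per_tss)]
```

For example, merge_tss_counts([[2, 0, 1], [3, 0, 4]]) averages two TSS records for the same gene into a single record of three counts.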
Q: How do I search for a gene?
A: Genes are identified by
HGNC Gene Symbols,
with lists of Gene Symbols separated by tabs, spaces, or line-breaks (carriage returns).
Q: Why can't I find a
gene in TELiS?
A:
Usually because of a formatting problem or a failure
to use
HGNC Gene Symbols.
Capitalization is ignored, and names should be on separate lines or separated by
spaces or tabs. Do NOT include dividers such as slashes (/), commas
(,), or colons (:). Be sure to select the appropriate
species for your gene names - do not submit mouse gene names while
indicating a human microarray. It is also possible that the
NCBI RefSeq database
did not contain your gene in Fall 2003, or that its Gene Symbol has
since changed status from "-pending".
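To check a gene list before submitting it, something like the following Python sketch can help. It only mirrors the formatting rules stated above (whitespace separators, no slashes, commas, or colons, capitalization ignored); the function name and the choice to normalize symbols to upper case are assumptions, not a description of the TELiS parser.

```python
FORBIDDEN_DIVIDERS = set("/,:")


def parse_gene_list(text):
    """Split a pasted gene list on spaces, tabs, or line breaks.

    Raises ValueError if a token contains a divider that the FAQ warns
    against. Symbols are upper-cased only to make comparisons
    case-insensitive (an assumption for this sketch).
    """
    symbols = []
    for token in text.split():          # split() handles spaces, tabs, newlines
        if FORBIDDEN_DIVIDERS & set(token):
            raise ValueError(f"forbidden divider in {token!r}")
        symbols.append(token.upper())
    return symbols
```

A list pasted as "IL6,TNF" would be rejected, while "il6 TNF" on one line or on separate lines would be accepted.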
Q: What is a "TELiS differential expression
analysis?"
A:
Differential expression analysis
seeks to identify the transcription factors driving observed
changes in gene expression. Given a set of
differentially expressed genes (defined by microarray, SAGE, etc.), the
PromoterStats
statistical tool determines which TFBMs are
over-represented in the promoters of those genes. This provides inferences about which
transcription factors are active. If the upstream signaling pathways that control transcription
factor activity are known, differential expression analyses can also be used to indirectly monitor
signal transduction dynamics and extracellular stimuli.
Some examples.
To identify the effect of an experimental manipulation, consider a
Transcriptional Shift Analysis.
Q: What is a "transcriptional shift
analysis?"
A:
A transcriptional shift analysis is a
2-Group Differential Expression Analysis
used to identify the effect of an experimental manipulation
while controlling for background influences
(e.g., cell type-specific biases in gene expression).
Comparison of differentially expressed genes with the entire genome (or all genes on a microarray)
picks up both the effects of experimental differences and biases due to the specific cell type
studied (e.g., transcription factors that determine cell fate/differentiation).
To isolate the effects of experimental conditions, compare a list of genes up-regulated in one
cell type with a list of genes down-regulated in the same cell type.
This holds constant the cell type, and focuses the statistical analysis on
promoter characteristics that
show a shifting prevalence as a function of the experimental manipulation.
2-group differential expression analyses are available for
TRANSFAC and
JASPAR databases.
Q: What stringency and promoter size should I
use?
A: Development studies show
that default parameters (600 bases/.90 stringency) work well under a wide
variety of circumstances.
The signal-to-noise ratio can be improved by reducing the promoter size to 300 bases
or increasing scan stringency to
.95. Both decrease spurious background detections.
HOWEVER, high stringency analyses may fail to
detect TFBMs that are actually present.
If you increase scan stringency to .95, consider also
increasing promoter size to 1200 to reduce the likelihood of null results.
A good general strategy is to start with 600/.90,
which provides a good balance between sensitivity and specificity. To increase
sensitivity, first try decreasing promoter size (300/.90). Then try
increasing stringency and promoter size (1200/.95). For maximal sensitivity, decrease
promoter size again (to 600, then 300) while maintaining high stringency. Remember that
many valid results will disappear as stringency increases.
Low stringency analyses (.80) are useful
because some TFBM matrices are already excessively stringent or poorly defined.
Q: How is statistical significance assessed?
A: Two
ways.
1.) Frequency analyses compare the average number of TFBMs detected
in promoters of differentially expressed genes with the average number in
all assayed genes (the "sampling frame"), or with the average number in a second
gene list (in a
2-Group Differential Expression Analysis). These comparisons are
carried out using a z-test (comparing a gene list to
a sampling frame) or a 2-sample t-test (comparing 2 different gene lists).
2.) Incidence analyses determine
whether a TFBM is present in a greater fraction of differentially
expressed genes than in the sampling frame as a whole (or another gene list). This is a
binary analysis (TFBM is present vs. not in each promoter), executed
as an exact binomial test (comparing a gene list to a sampling
frame) or a 2-sample z-test (comparing 2 different gene lists).
With > 1000 genes, incidence analyses switch to the normal approximation
to the binomial.
TFBMs are ranked according to their p-value
in frequency analyses.
BLUE motifs are significantly over-represented, RED
motifs are significantly under-represented, and
GRAY motifs do not differ significantly from population norms.
The top 100 results are listed.
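To make the frequency analysis concrete, here is a minimal Python sketch of a one-sample z-test comparing a gene list to a sampling frame. The exact formula used by PromoterStats is not given here, so the standard-error form (frame SD divided by the square root of the list size), the two-sided p-value, and the function name are all assumptions.

```python
import math


def frequency_z_test(sample_counts, frame_mean, frame_sd):
    """One-sample z-test for one TFBM.

    sample_counts: motif counts in the promoters of the submitted genes.
    frame_mean, frame_sd: mean and standard deviation of the same motif's
    counts across the sampling frame (frame_sd must be > 0).
    Returns (z, two-sided p-value). Assumed formula, not taken from TELiS.
    """
    n = len(sample_counts)
    sample_mean = sum(sample_counts) / n
    z = (sample_mean - frame_mean) / (frame_sd / math.sqrt(n))
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return z, p
```

A positive z would correspond to an over-represented (BLUE) motif and a negative z to an under-represented (RED) one, provided the p-value is small enough.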
Q: Why 2 statistical tests?
A: They measure different things and have different strengths and vulnerabilities.
Incidence analyses are somewhat more conservative, especially in small
sample sizes. Be most confident when both tests are statistically
significant.
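The difference between the two tests can be seen in a small Python sketch. Incidence reduces each promoter to present vs. absent, so a single promoter with many copies of a motif raises the mean frequency but not the incidence. The function names are hypothetical, and the one-sided (upper-tail) form of the exact binomial test is an assumption.

```python
import math


def mean_frequency(counts):
    """Average number of motif detections per promoter."""
    return sum(counts) / len(counts)


def incidence(counts):
    """Number of promoters in which the motif appears at least once."""
    return sum(1 for c in counts if c > 0)


def incidence_binomial_test(n_genes, n_with_motif, frame_fraction):
    """Upper-tail exact binomial p-value, P(X >= n_with_motif).

    frame_fraction: fraction of sampling-frame promoters containing the motif.
    Assumed one-sided form; the direction used by TELiS is not stated.
    """
    return sum(
        math.comb(n_genes, k)
        * frame_fraction ** k
        * (1 - frame_fraction) ** (n_genes - k)
        for k in range(n_with_motif, n_genes + 1)
    )
```

With counts such as [5, 0, 0, 0], mean_frequency is driven by one promoter while incidence counts it only once, which is one reason the two analyses can disagree.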
Q: What does it mean for a TFBM to be
under-represented?
A:
RED motifs are significantly less prevalent in promoters of the analyzed genes than in
the sampling frame as a whole. It is not clear what this means
biologically, but it could reflect an inhibitory effect - genes can
change expression only if this transcription factor has no opportunity to
"veto" the change.
Q: How
many genes should I submit?
A: 100 or more
is best. Analytic sensitivity drops significantly for samples of fewer than
20 genes, and Frequency analysis p-values lose precision.
Incidence analysis p-values remain accurate for any sample size.
Q: Which genes are driving my differential expression results?
A:
Use TELiS Data Retrieval
to download raw data for your gene list and load it into a spreadsheet such as Excel.
Examine the column containing data for the differentially represented TFBM to determine which
genes contain that motif.
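The same check can be scripted instead of done in a spreadsheet. The Python sketch below assumes the downloaded data are tab-delimited with a header row, a gene symbol column named "Gene", and one column of integer counts per TFBM. That layout, the column name, and the example data are assumptions for illustration, so adjust them to match the actual file.

```python
import csv
import io


def genes_with_motif(tsv_text, motif, gene_column="Gene"):
    """Return the genes whose promoter contains `motif` at least once.

    tsv_text: contents of a tab-delimited file with a header row
    (assumed layout; the real download format may differ).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[gene_column] for row in reader if int(row[motif]) > 0]


# Invented example data in the assumed layout.
example = (
    "Gene\tMOTIF_1\tMOTIF_2\n"
    "GENE_A\t2\t0\n"
    "GENE_B\t0\t1\n"
    "GENE_C\t3\t1\n"
)
```

Calling genes_with_motif(example, "MOTIF_1") lists the genes that carry at least one copy of that motif.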
Q: Why not analyze differential representation
relative to the entire genome?
A: Genes found in
microarrays and other sampling frames are not representative of
the entire genome. The prevalence of a TFBM in microarray-assessed genes can
differ by 2-fold or more from its genome-wide prevalence.
The sampling frame defines the set of transcripts that could
possibly be observed to change in an experiment, so it represents the
appropriate reference population.
It could be argued that the most appropriate sampling frame is the subset of genes
found to be present in a particular experiment (rather than the entire set assayed by a
microarray). A
Custom Sampling Frame Analysis
allows you to paste a list of "present" genes as the sampling frame and test whether a
subset of differentially expressed genes is representative of that population.
Be cautious of custom sampling frames, though, because development studies have shown that
biases in microarray "present" calls can reduce the signal-to-noise ratio for detecting
transcription factor activity.
Q: What if my sampling frame is not
available from TELiS?
A: Run your analysis
relative to the entire genome, but treat the results as provisional until
an appropriate sampling frame is defined. To generate a new
sampling frame for human, mouse, or rat genes, email to
coles@ucla.edu 1.) a list of all genes
in your sampling frame (as HGNC Gene Symbols) and 2.) a brief title for
your sampling frame (< 20 characters).
Q: In frequency analyses, why not use a
Poisson test?
A: TFBM frequency data
do not follow a Poisson distribution, so that test would produce
inaccurate p-values. The variance of TFBM frequency data
often exceeds the mean frequency by 2-fold or more, whereas the
Poisson distribution assumes the mean and variance
are equal. We recommend using the default z-test instead, but a
Poisson-based analysis is available.
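The mean-versus-variance argument is easy to check on any column of TFBM counts. The Python sketch below computes the variance-to-mean ratio, which is about 1 for Poisson data and above 1 for overdispersed data. The function name and the use of the sample variance are choices made for this illustration.

```python
import statistics


def dispersion_index(counts):
    """Variance-to-mean ratio of a list of motif counts.

    Roughly 1 for Poisson-distributed data; values of 2 or more indicate
    the kind of overdispersion described above. Requires at least two
    counts and a non-zero mean.
    """
    return statistics.variance(counts) / statistics.mean(counts)
```

A ratio well above 1 for a given motif suggests that Poisson-based p-values for that motif would be too optimistic.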
Q: What is the risk of a false positive result?
A:
The p-value for each statistical test gives the risk of a false positive error
for that particular TFBM (e.g., p < .01).
TELiS differential expression analyses survey hundreds of individual TFBMs,
so the probability of at least one false positive error in the entire set
of results is greater than the p-value for any single test.
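The arithmetic behind this can be sketched in Python. It assumes the tests are independent, which is a simplification (TFBM matrices often overlap), and the figure of 200 tests in the usage note is only a placeholder for "hundreds".

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha.
    Independence is an assumption made for this sketch."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if no motif is truly different."""
    return alpha * n_tests
```

For example, familywise_error(0.01, 200) shows how quickly the chance of at least one spurious result grows when hundreds of TFBMs are tested at p < .01.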
False positive risks are often
analyzed in terms of a "false discovery rate" (FDR) -- the fraction of
significant results that are likely due to chance alone.
FDRs depend upon several factors, including the number of genes analyzed,
the number of TFBMs surveyed, the characteristics of the promoter scan
(stringency and promoter size), the stringency of the statistical analysis
(p < .01 vs. p < .00001), and the number of
truly significant results present in the data.
TELiS differential expression analyses provide two FDR estimates.
At the top of the output, a Multiple testing note gives the estimated FDR
when statistical results are declared significant at p < .01.
At the end of the output is an FDR threshold table which provides specific
significance levels (p-values) that control the FDR at 10%, 20%, 30%, or 40%.
Thresholds are derived from the change in FDR over a range of p-values between
.03 and .0001. The FDR is calculated for each p-value by comparing the frequency
of significant results observed in your data with the incidence of significant results
in 10,000 randomly sampled gene lists of similar size and scan characteristics.
The p-value generating a specified FDR is then estimated by regression.
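The comparison against random gene lists can be sketched in Python as follows. This is only an illustration of the general idea (chance-expected significant results divided by observed significant results); the function name, the data layout, and the cap at 1.0 are assumptions, and the regression step is not reproduced because its form is not described above.

```python
def estimate_fdr(observed_pvalues, null_pvalue_sets, threshold):
    """Estimate the FDR at a given p-value threshold.

    observed_pvalues: one p-value per TFBM from the real gene list.
    null_pvalue_sets: one list of p-values per randomly sampled gene list
    (TELiS uses 10,000 such lists; any number works in this sketch).
    Returns None when nothing in the real data is significant.
    """
    observed_hits = sum(p < threshold for p in observed_pvalues)
    if observed_hits == 0:
        return None
    # Average number of "significant" TFBMs produced by chance alone.
    null_hits = [sum(p < threshold for p in pvals) for pvals in null_pvalue_sets]
    expected_by_chance = sum(null_hits) / len(null_hits)
    return min(1.0, expected_by_chance / observed_hits)
```

Evaluating estimate_fdr over a range of thresholds between .03 and .0001 gives the kind of FDR-versus-p-value curve from which the threshold table could be derived.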
Q: Do TELiS differential expression analyses
actually work?
A:
See some examples.
Q: Who made TELiS?
A: Weihong Yan (wyan@chem.ucla.edu)
generated the promoter bank, and Steve Cole (coles@ucla.edu) did scans and
statistics.
Q: How is it implemented?
A: The TELiS database is
powered by MySQL. Data are generated by the Java application
PromoterScan (defining prevalence matrices) and analyzed by the Java
servlet PromoterStats (detecting differential representation) running
under Apache Tomcat.
Q: Why can't I save my results?
A:
Most likely because you are using the Mozilla Firefox browser. Other browsers such as Explorer, Navigator, and Safari
let you save the current browser content using the "File / Save as..." feature.
For some reason, the Firefox engineers decided NOT to allow saving of current content. Instead, their
"Save page as..." function tries to read the content from the website again (they save a link instead of content).
Unfortunately, there is no persistent link to your TELiS results for security reasons. Firefox loyalists can use
this work-around: From the "View" menu, select "Page source".
When the HTML text pops up, use "Edit / Select all..." to copy the content and paste it into a new empty text file
(e.g., using Windows Notepad). Save the text file with ".html" extension (NOT ".txt")
and you will have a permanent copy of the TELiS results page that can be reopened by Firefox.
Or use another web browser for TELiS analyses.