Statistics
Statistics for the Example Data

Purpose

The distributions show the performance of the ranking and may also be used to estimate the quality of the seed.

 

How it was calculated

Ten percent of the seed is repeatedly taken out, and the position of this left-out group in the ranking is determined. A well-performing ranking shows a clear tendency towards high frequencies for top positions (left side of the histogram). A random seed would result in a uniform distribution (flat histogram).
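
The leave-out procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the site's actual implementation: `rank_genes` is a hypothetical callback that returns all candidate genes ordered from best to worst for a given (reduced) seed, and the number of rounds and bins are arbitrary choices.

```python
import random

def recovery_positions(seed, rank_genes, n_rounds=100, fraction=0.1, rng=None):
    """Collect the rank positions of repeatedly left-out seed genes.

    rank_genes is a hypothetical callback (an assumption of this sketch):
    it takes the reduced seed and returns every candidate gene, best first.
    """
    rng = rng or random.Random(0)
    n_out = max(1, round(len(seed) * fraction))  # size of the left-out group
    positions = []
    for _ in range(n_rounds):
        left_out = set(rng.sample(sorted(seed), n_out))
        reduced = [g for g in seed if g not in left_out]
        ranking = rank_genes(reduced)
        index = {gene: pos for pos, gene in enumerate(ranking)}
        # Record where each left-out gene ended up (0 = top of the ranking).
        positions.extend(index[g] for g in left_out if g in index)
    return positions

def histogram(positions, n_ranked, n_bins=10):
    """Count positions in equal-width bins; bin 0 is the leftmost (top ranks)."""
    counts = [0] * n_bins
    for pos in positions:
        counts[min(pos * n_bins // n_ranked, n_bins - 1)] += 1
    return counts
```

With a good seed most counts fall into the leftmost bins; with a random seed the counts are expected to be roughly equal across bins.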

 

Definition of the recovery test statistic and the p-value

The null hypothesis for this test is that the relative probability of falling in the leftmost bin is not larger than the relative probability in the rest of the histogram. The p-value is obtained using the cumulative binomial distribution.
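
As a minimal sketch of how such a p-value could be computed, the snippet below evaluates the upper tail of the binomial distribution for the count in the leftmost bin. It assumes equal-width bins, so that under the null hypothesis each left-out gene lands in the leftmost bin with probability 1/(number of bins); this null probability and the function names are assumptions of the sketch, not details confirmed by the page.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def recovery_p_value(counts):
    """One-sided p-value for an over-represented leftmost bin.

    counts: histogram counts, with counts[0] being the leftmost bin.
    Assumption: equal-width bins, so the null probability is 1 / len(counts).
    """
    n = sum(counts)
    p0 = 1 / len(counts)
    return binomial_upper_tail(counts[0], n, p0)
```

A small p-value indicates that the left-out seed genes land in the top positions more often than a flat histogram would predict.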

 

Definition of the enrichment test statistic for a user-specified transcription factor and the associated p-value

If the user chooses to filter the ranked table with binding data, an additional statistical test for the enrichment of the selected binding factor is calculated.

The null hypothesis for this second test is that the genes top-ranked by expression similarity to the seed are not enriched in binding sites of the selected transcription factor. We use TRAP affinity predictions to model transcription factor binding to the promoters of all genes. The user may select which fraction of top-ranking affinities is taken for the analysis (default: 0.05). We intersect the corresponding genes with the top-ranked genes obtained from the expression similarity to the seed (by default, the top 500 genes are taken). The expected size of the overlap is derived from the hypergeometric distribution, and the p-value is obtained with Fisher's exact test. The example p-values in the last column of the following table are calculated using the default cutoff values (affinity rank: 0.05, number of filtered genes: 500).
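
The overlap test can be illustrated with the sketch below. It assumes a one-sided test for over-representation, in which case Fisher's exact test reduces to the upper tail of the hypergeometric distribution. The function names and the gene counts in the usage note are hypothetical and chosen only for illustration.

```python
from math import comb

def expected_overlap(N, K, n):
    """Mean of the hypergeometric distribution: expected overlap by chance."""
    return n * K / N

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for the overlap X of two gene sets.

    N: number of genes in total
    K: genes among the top-ranking affinities (e.g. 5% of N)
    n: top-ranked genes by expression similarity (e.g. 500)
    k: observed overlap between the two sets
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

For example, with a hypothetical 20000 genes, the default affinity cutoff of 0.05 gives K = 1000, and with n = 500 the expected overlap by chance is `expected_overlap(20000, 1000, 500)`, that is, 25 genes; `hypergeom_upper_tail(k, 20000, 1000, 500)` then gives the probability of observing an overlap of at least k genes.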

 

Actual p-values for the different example seed sets

Seed set   p-value for recovery   ID of selected factor   p-value for factor enrichment
c-myc      1.05e-11               MYCMAX_01               0.022
NFkB       3.63e-11               NFKB_Q6_01              0.000456
ETS1       0.00013                ETS_Q6                  0.000348
HIF-1a     2.87e-07               HIF1_Q5                 0.5280
E2F        2.50e-14               E2F_Q6                  0.0000000253
HNF4       0.00052                HNF4_Q6                 0.000010
random     0.66                   e.g. HNF4_Q6            1.0

 

Histograms of the recovery tests for the example data:

[Figures: recovery-test histograms for c-myc, NFkB, ETS1, HIF-1a, HNF4, and a random seed]

 


Copyright © 2012 targetfinder.org. All Rights Reserved.