|
Purpose
The distributions show the performance of the ranking and may also be used to estimate the quality of the seed.

How it was calculated
Ten percent of the seed is repeatedly taken out, and the position of this left-out group in the ranking is determined. Good performance shows up as a clear tendency toward high frequencies at the top positions (left side of the histogram). A random seed would result in a uniform distribution (flat histogram).

Definition of the recovery test statistic and the p-value
The null hypothesis for this test is that the relative probability of falling into the leftmost bin is not larger than the relative probability in the rest of the histogram. The p-value is obtained using the cumulative binomial distribution.

Definition of the enrichment test statistic for a user-specified transcription factor and the associated p-value
If the user chooses to filter the ranked table with binding data, an additional statistical test for the enrichment of the selected binding factor is calculated. The null hypothesis for this second statistic is that the genes top-ranked by expression similarity to the seed are not enriched in binding sites of the selected transcription factor. We use TRAP affinity predictions to model transcription factor binding to the promoters of all genes. The user may select which fraction of top-ranking affinities is taken for the analysis (default: 0.05). We overlap the respective genes with the top-ranked genes obtained from the expression similarity to the seed (by default, the top 500 genes are taken). The p-value is derived using the hypergeometric distribution for the expected number and Fisher's exact test. The example p-values in the last column of the following table are calculated using the default cutoff values (affinity rank: 0.05, number of filtered genes: 500).

Actual p-values for the different example seed sets

| Seed set | p-value for recovery | ID of selected factor | p-value for factor enrichment |
|---|---|---|---|
| c-myc | 1.05e-11 | MYCMAX_01 | 0.022 |
| NFkB | 3.63e-11 | NFKB_Q6_01 | 0.000456 |
| ETS1 | 0.00013 | ETS_Q6 | 0.000348 |
| HIF-1a | 2.87e-07 | HIF1_Q5 | 0.5280 |
| E2F | 2.50e-14 | E2F_Q6 | 0.0000000253 |
| HNF4 | 0.00052 | HNF4_Q6 | 0.000010 |
| random | 0.66 | e.g. HNF4_Q6 | 1.0 |
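The enrichment test described above can be illustrated with a minimal Python sketch. It is not the tool's own implementation. It assumes the p-value is the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test on the corresponding 2x2 table. The function names, the genome size of 20000 genes, and the observed overlap of 40 are hypothetical values chosen for illustration only; just the defaults (affinity fraction 0.05, top 500 genes) come from the text.

```python
from math import comb


def expected_overlap(n_genes, n_bound, n_top):
    """Expected number of bound genes among the top-ranked genes under the null."""
    return n_top * n_bound / n_genes


def enrichment_p_value(n_genes, n_bound, n_top, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    n_genes : total number of genes considered
    n_bound : genes in the top affinity fraction for the selected factor
    n_top   : genes top-ranked by expression similarity to the seed
    overlap : observed number of genes present in both sets
    """
    total = comb(n_genes, n_top)
    upper = min(n_bound, n_top)
    # Sum the exact integer counts first, then divide once at the end.
    tail = sum(
        comb(n_bound, i) * comb(n_genes - n_bound, n_top - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20000 genes, default cutoffs from the text.
n_genes = 20000                   # assumed genome size, not from the source
n_bound = round(0.05 * n_genes)   # top 5% of affinities
n_top = 500                       # default number of top-ranked genes
observed = 40                     # assumed observed overlap

print("expected overlap:", expected_overlap(n_genes, n_bound, n_top))
print("enrichment p-value:", enrichment_p_value(n_genes, n_bound, n_top, observed))
```

Exact integer arithmetic from the standard library keeps the sketch free of external dependencies. A statistics package with a hypergeometric survival function would give the same tail probability.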
Histograms of the recovery tests for the example data.
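As a rough illustration of how such a recovery histogram and its p-value could be computed, the following Python sketch bins the rank positions of left-out seed genes and applies a cumulative binomial test to the leftmost bin. It rests on an assumption not spelled out in the text: under the null hypothesis, each left-out gene falls into the leftmost bin with probability 1 / (number of bins). The function names and the example positions are hypothetical.

```python
from fractions import Fraction
from math import comb


def recovery_histogram(positions, n_ranked, n_bins):
    """Bin 0-based rank positions of left-out seed genes into n_bins equal-width bins."""
    counts = [0] * n_bins
    for pos in positions:
        counts[min(pos * n_bins // n_ranked, n_bins - 1)] += 1
    return counts


def recovery_p_value(counts):
    """P(X >= counts[0]) for X ~ Binomial(n, 1 / n_bins).

    Assumption: a random seed spreads left-out genes uniformly over the bins.
    """
    n = sum(counts)
    k = counts[0]
    p0 = Fraction(1, len(counts))
    # Exact rational arithmetic avoids overflow and rounding for large counts.
    tail = sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))
    return float(tail)


# Hypothetical positions of left-out genes in a ranking of 1000 genes.
positions = [3, 12, 25, 40, 77, 90, 150, 310, 620, 880]
counts = recovery_histogram(positions, n_ranked=1000, n_bins=10)
print("histogram:", counts)
print("recovery p-value:", recovery_p_value(counts))
```

A histogram with most of its mass in the first bin yields a small p-value, while a flat histogram, as expected for a random seed, does not.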
|