1 Introduction ................................................. 1
1.1 Integrative biology ..................................... 1
1.2 Embryonic stem cells .................................... 2
1.3 Large-scale and integrative studies in ES cell
research ................................................ 7
1.3.1 RNAi screening in ES cells ....................... 7
1.3.2 TF binding profiling in ES cells ................. 8
1.3.3 Integrative analysis of ES cell-related
datasets ........................................ 10
2 Aims of the study ........................................... 13
3 Identification of protein complexes maintaining self-
renewal of ES cells ......................................... 15
3.1 Introduction ........................................... 15
3.1.1 RNA interference ................................ 15
3.1.2 RNAi in molecular biology ....................... 17
3.1.3 RNAi screening .................................. 17
3.1.4 Integrative analysis of RNAi screens ............ 20
3.2 Results ................................................ 21
3.2.1 Normalization of genome-wide RNAi screens ....... 22
3.2.2 Low overlap between genome-wide RNAi screens .... 26
3.2.3 Combined analysis of genome-wide RNAi screens
allows better coverage of known protein
complexes ....................................... 27
3.2.4 Extensive overlap between subsets of CORUM
complexes ....................................... 28
3.2.5 Enrichment tests accounting for the overlap
between complexes ............................... 28
3.2.6 Overlap-adjusted enrichment tests reduce
redundancy among top findings ................... 32
3.2.7 Complex enrichment analysis enhances
consistency between screens ..................... 34
3.2.8 Complex enrichment analysis enhances the
recovery of known pluripotency related genes .... 35
3.2.9 Complex enrichment analysis enhances the
recovery of genes downregulated upon
differentiation ................................. 36
3.2.10 Complex enrichment analysis enhances the
recovery of genes upregulated upon
reprogramming ................................... 37
3.2.11 Statistical significance of the complex
enrichment analysis ............................. 39
3.2.12 Prioritization of complexes for follow-up
analysis ........................................ 42
3.2.13 Selected complexes identified by complex
enrichment analysis ............................. 43
3.2.14 Identification and analysis of protein
subcomplexes .................................... 59
3.3 Discussion ............................................. 60
3.3.1 Achievements .................................... 60
3.3.2 Consistency between genome-wide RNAi screens .... 61
3.3.3 Complex enrichment analysis ..................... 63
3.3.4 Protein complexes maintaining ES cell self-
renewal and pluripotency ........................ 65
3.3.5 Limitations of current approach and future
work ............................................ 67
3.4 Methods ................................................ 68
3.4.1 Preprocessing and normalization of genome-
wide RNAi screens ............................... 68
3.4.2 Preprocessing of other RNAi screens used for
validation ...................................... 70
3.4.3 Preprocessing of expression data used for
validation ...................................... 70
3.4.4 Protein complexes ............................... 71
3.4.5 Correlation between CORUM complexes ............. 73
3.4.6 Complex enrichment analysis ..................... 73
3.4.7 Multiple testing correction ..................... 76
3.4.8 Evaluation tests for enrichment methods ......... 78
3.5 Contributions .......................................... 79
4 Transcription factor target gene identification based on
ChIP-seq data ............................................... 81
4.1 Introduction ........................................... 81
4.1.1 ChIP-seq experiment ............................. 81
4.1.2 Inferring TF targets from ChIP-seq data ......... 83
4.2 Results ................................................ 84
4.2.1 TF target prediction methods .................... 84
4.2.2 Evaluation of TF-target prediction methods ...... 88
4.2.3 Ranking of differentially expressed genes is
biased by non-changing genes .................... 93
4.2.4 Inclusion of additional genomic data ............ 95
4.2.5 ClosestGene minimizes bias in gene-rich
regions ......................................... 97
4.2.6 Q-values better allow comparison between
ChIP-seq experiments than p-values .............. 98
4.2.7 Comparing TF target profiles .................... 99
4.2.8 Delayed response of strong TF targets to TF
perturbation ................................... 101
4.2.9 Regulatory program of gene clusters ............ 103
4.3 Discussion ............................................ 106
4.3.1 Achievements ................................... 106
4.3.2 Peak-to-gene assignment is crucial for
successful target gene identification .......... 106
4.3.3 Common shortcomings of ChIP-seq-based
scorings ....................................... 107
4.3.4 Unregulated genes may bias correlation
between expression and ChIP-seq data ........... 108
4.3.5 Inclusion of additional genomic data ........... 109
4.3.6 Delayed response of strong targets ............. 109
4.3.7 Chromosomal clustering of target genes ......... 110
4.3.8 Future work .................................... 111
4.4 Methods ............................................... 112
4.4.1 ChIP-seq data .................................. 112
4.4.2 Expression datasets used for validation ........ 112
4.4.3 Gene positions ................................. 113
4.4.4 Window-based methods for distance-based TF-
target prediction .............................. 114
4.4.5 Distribution-based methods for distance-based
TF target assignment ........................... 114
4.4.6 Evaluation of the TF-target prediction
methods ........................................ 115
4.4.7 Incorporation of additional information ........ 116
4.4.8 Q-value calculation ............................ 116
4.4.9 Time of response of TF targets upon TF
depletion ...................................... 117
4.4.10 Clustering of TFs based on binding sites and
target genes ................................... 117
4.4.11 R package ...................................... 117
4.5 Contributions ......................................... 118
5 Exploring network-based analysis of functional screens ..... 119
5.1 Network-based analysis of functional screens .......... 119
5.2 Application of the eQED algorithm to the analysis of
RNAi screens .......................................... 120
6 Outlook .................................................... 129
List of Figures ............................................... 131
List of Tables ................................................ 133
Abbreviations ................................................. 135
Appendices .................................................... 137
A Manually added complexes ................................. 137
B impFisher and impGSEA pseudocode ......................... 139
C Complexes identified in mouse screens .................... 141
D Complexes identified in human screen ..................... 145
E Transcription factors and data included in the study ..... 149
References .................................................... 155