1. Bleker C, Grady SK, Langston MA. A Comparative Study of Gene Co-Expression Thresholding Algorithms. J Comput Biol 2024; 31:539-548. [PMID: 38781420; DOI: 10.1089/cmb.2024.0509]
Abstract
The thresholding problem is studied in the context of graph theoretical analysis of gene co-expression data. A number of thresholding methodologies are described, implemented, and tested over a large collection of graphs derived from real high-throughput biological data. Comparative results are presented and discussed.
Affiliation(s)
- Carissa Bleker
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Stephen K Grady
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA
- Michael A Langston
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA
2. Garren JM, Kim J. Bootstrapping Time-Course Gene Expression Data for Gene Networks: Application to Gene Relevance Networks. J Comput Biol 2018; 25:1374-1384. [PMID: 30133320; DOI: 10.1089/cmb.2018.0029]
Abstract
Identification of gene regulatory networks (GRNs) is a fundamental step toward understanding the molecular role of each gene, and it supports the development of treatments and cures for disease. To identify GRNs, time-course gene expression data are widely used. However, identification is hampered by intrinsic attributes of the data, such as small sample size, a large number of variables, and complex error structures with high variation. Under these conditions, most GRN inference methods rely on point estimators or make numerous assumptions that are often incompatible with the experimental data. Moreover, different inference methods often give inconsistent results. The bootstrap is one way to alleviate this problem, because it provides more reliable outcomes by integrating results from multiple bootstrap samples without any distributional assumptions. In this study, we propose a bootstrap method for dependent time-course gene expression data, focusing mainly on its application to gene relevance networks. The proposed method is applied to gene networks for the zebrafish retina.
Affiliation(s)
- Jaejik Kim
  - Department of Statistics, Sungkyunkwan University, Seoul, Korea
3. Sakata K, Saito T, Ohyanagi H, Okumura J, Ishige K, Suzuki H, Nakamura T, Komatsu S. Loss of variation of state detected in soybean metabolic and human myelomonocytic leukaemia cell transcriptional networks under external stimuli. Sci Rep 2016; 6:35946. [PMID: 27775018; PMCID: PMC5075873; DOI: 10.1038/srep35946] [Received: 06/22/2016] [Accepted: 10/07/2016]
Abstract
Soybean (Glycine max) is sensitive to flooding stress, and flood damage at the seedling stage is a barrier to growth. We constructed two mathematical models of the soybean metabolic network, a control model and a flooded model, from metabolic profiles in soybean plants. We simulated the metabolic profiles with perturbations before and after the flooding stimulus using the two models. We measured the variation of state that the system could maintain from a state-space description of the simulated profiles. The results showed a loss of variation of state during the flooding response in the soybean plants. Loss of variation of state was also observed in a human myelomonocytic leukaemia cell transcriptional network in response to a phorbol-ester stimulus. Thus, we detected a loss of variation of state under external stimuli in two biological systems, regardless of the regulation and stimulus types. Our results suggest that a loss of robustness may occur concurrently with the loss of variation of state in biological systems. We describe the possible applications of the quantity of variation of state in plant genetic engineering and cell biology. Finally, we present a hypothetical "external stimulus-induced information loss" model of biological systems.
Affiliation(s)
- Katsumi Sakata
  - Maebashi Institute of Technology, Maebashi 371-0816, Japan
  - National Institute of Radiological Sciences, Chiba 263-8555, Japan
- Toshiyuki Saito
  - National Institute of Radiological Sciences, Chiba 263-8555, Japan
- Hajime Ohyanagi
  - National Institute of Radiological Sciences, Chiba 263-8555, Japan
  - King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
- Jun Okumura
  - Maebashi Institute of Technology, Maebashi 371-0816, Japan
- Kentaro Ishige
  - Maebashi Institute of Technology, Maebashi 371-0816, Japan
- Harukazu Suzuki
  - RIKEN Centre for Life Science Technologies, Yokohama 230-0045, Japan
- Takuji Nakamura
  - National Agriculture and Food Research Organisation (NARO) Hokkaido Agricultural Research Centre, Sapporo 062-8555, Japan
4. Crespo I, Doucey MA, Xenarios I. Social networks help to infer causality in the tumor microenvironment. BMC Res Notes 2016; 9:168. [PMID: 26979239; PMCID: PMC4793762; DOI: 10.1186/s13104-016-1976-8] [Received: 02/02/2016] [Accepted: 03/03/2016]
Abstract
Background: Networks have become a popular way to conceptualize a system of interacting elements, such as electronic circuits, social communication, metabolism or gene regulation. Network inference, analysis, and modeling techniques have been developed in different areas of science and technology, such as computer science, mathematics, physics, and biology, with an active interdisciplinary exchange of concepts and approaches. However, some concepts seem to belong to a specific field without a clear transferability to other domains. At the same time, it is increasingly recognized that within some biological systems, such as the tumor microenvironment, where different types of resident and infiltrating cells interact to carry out their functions, the complexity of the system demands a theoretical framework, such as statistical inference, graph analysis and dynamical models, in order to assess and study the information derived from high-throughput experimental technologies. Results: In this article we propose to adopt and adapt the concepts of influence and investment from the world of social network analysis to biological problems, and in particular to apply this approach to infer causality in the tumor microenvironment. We showed that constructing a bidirectional network of influence between cells and cell communication molecules allowed us to determine the direction of inferred regulations at the expression level and correctly recapitulate cause-effect relationships described in the literature. Conclusions: This work constitutes an example of a transfer of knowledge and concepts from the world of social network analysis to biomedical research, in particular to infer causality in biological networks. This elucidation of causality is essential to model the homeostatic response of biological systems to internal and external factors, such as environmental conditions, pathogens or treatments.
Electronic supplementary material: The online version of this article (doi:10.1186/s13104-016-1976-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Isaac Crespo
  - Vital-IT, SIB (Swiss Institute of Bioinformatics), University of Lausanne, Lausanne, Switzerland
- Marie-Agnès Doucey
  - Ludwig Center for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Ioannis Xenarios
  - Vital-IT, SIB (Swiss Institute of Bioinformatics), University of Lausanne, Lausanne, Switzerland
5. System-wide analysis of the transcriptional network of human myelomonocytic leukemia cells predicts attractor structure and phorbol-ester-induced differentiation and dedifferentiation transitions. Sci Rep 2015; 5:8283. [PMID: 25655563; PMCID: PMC4319166; DOI: 10.1038/srep08283] [Received: 11/10/2014] [Accepted: 01/09/2015]
Abstract
We present a system-wide transcriptional network structure that controls cell types in the context of expression pattern transitions that correspond to cell type transitions. Co-expression based analyses uncovered a system-wide, ladder-like transcription factor cluster structure composed of nearly 1,600 transcription factors in a human transcriptional network. Computer simulations based on a transcriptional regulatory model deduced from the system-wide, ladder-like transcription factor cluster structure reproduced expression pattern transitions when human THP-1 myelomonocytic leukaemia cells cease proliferation and differentiate under phorbol myristate acetate stimulation. The behaviour of MYC, a reprogramming Yamanaka factor that was suggested to be essential for induced pluripotent stem cells during dedifferentiation, could be interpreted based on the transcriptional regulation predicted by the system-wide, ladder-like transcription factor cluster structure. This study introduces a novel system-wide structure to transcriptional networks that provides new insights into network topology.
6. Novel approach for coexpression analysis of E2F1-3 and MYC target genes in chronic myelogenous leukemia. Biomed Res Int 2014; 2014:439840. [PMID: 25180182; PMCID: PMC4142389; DOI: 10.1155/2014/439840] [Received: 06/26/2014] [Accepted: 07/23/2014]
Abstract
BACKGROUND: Chronic myelogenous leukemia (CML) is characterized by a tremendous number of immature myeloid cells in the blood circulation. E2F1-3 and MYC are important transcription factors that form positive feedback loops by reciprocally regulating their own transcription. Since genes regulated by E2F1-3 or MYC are related to cell proliferation and apoptosis, we asked whether the coexpression patterns of genes regulated concurrently by E2F1-3 and MYC differ between the normal and the CML states. RESULTS: We proposed a method to explore the difference in the coexpression patterns of those candidate target genes between the normal and the CML groups. A disease-specific cutoff point for coexpression levels, which classified the coexpressed gene pairs into strong and weak coexpression classes, was identified. Our method effectively identified the coexpression pattern differences from the overall structure. Moreover, we found that genes related to cell adhesion and angiogenesis were more likely to be coexpressed in the normal group than in the CML group. CONCLUSION: Our findings may be helpful in exploring the underlying mechanisms of CML and provide useful information for cancer treatment.
7. Joosen RVL, Ligterink W, Hilhorst HWM, Keurentjes JJB. Advances in genetical genomics of plants. Curr Genomics 2009; 10:540-9. [PMID: 20514216; PMCID: PMC2817885; DOI: 10.2174/138920209789503914] [Received: 07/04/2009] [Revised: 07/24/2009] [Accepted: 07/29/2009]
Abstract
Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. In quantitative trait locus (QTL) analyses this variation, captured in segregating mapping populations, is used to identify the genomic regions affecting these traits. Identifying the causal genes underlying QTLs is a major challenge, for which the detection of gene expression differences is of major importance. By combining genetics with large-scale expression profiling (i.e. genetical genomics), resulting in expression QTLs (eQTLs), great progress can be made in connecting phenotypic variation to genotypic diversity. In this review we discuss examples from human, mouse, Drosophila, yeast and plant research to illustrate the advances in genetical genomics, with a focus on understanding the regulatory mechanisms underlying natural variation. With their tolerance to inbreeding, short generation time and the ease of generating large families, plants are ideal subjects to test new concepts in genetics. The comprehensive resources available for Arabidopsis make it a favorite model plant, but genetical genomics has also found its way into important crop species like rice, barley and wheat. We discuss eQTL profiling with respect to cis and trans regulation and show how combined studies with other ‘omics’ technologies, such as metabolomics and proteomics, may further augment current information on transcriptional, translational and metabolomic signaling pathways and enable reconstruction of detailed regulatory networks. The rapid developments in the ‘omics’ area will offer great potential for genetical genomics to elucidate genotype-phenotype relationships for both fundamental and applied research.
Affiliation(s)
- R V L Joosen
  - Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, NL-6708 PB Wageningen, The Netherlands
8. Prado-Prado FJ, Ubeira FM, Borges F, González-Díaz H. Unified QSAR & network-based computational chemistry approach to antimicrobials. II. Multiple distance and triadic census analysis of antiparasitic drugs complex networks. J Comput Chem 2010; 31:164-73. [DOI: 10.1002/jcc.21292]
9. Sima C, Hua J, Jung S. Inference of gene regulatory networks using time-series data: a survey. Curr Genomics 2009; 10:416-29. [PMID: 20190956; PMCID: PMC2766792; DOI: 10.2174/138920209789177610] [Received: 12/04/2008] [Revised: 02/28/2009] [Accepted: 03/02/2009]
Abstract
The advent of high-throughput technologies like microarrays has provided a platform for studying how different cellular components work together, thus creating enormous interest in mathematically modeling biological networks, particularly gene regulatory networks (GRNs). Of particular interest is modeling and inference on time-series data, which capture a more thorough picture of the system than non-temporal data do. We have given an extensive review of methodologies that have been used on time-series data. Recognizing that validation is an integral part of the inference paradigm, we have also presented a discussion of the principles and challenges in evaluating the performance of different methods. This survey gives a panoramic view of these topics, in the hope that readers will be inspired to improve and/or expand the repository of GRN inference and validation tools.
Affiliation(s)
- Chao Sima
  - Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA (corresponding author; Tel: 1(602)343-8485; Fax: 1(602)343-8740)
10. Ogata Y, Sakurai N, Aoki K, Suzuki H, Okazaki K, Saito K, Shibata D. KAGIANA: an excel-based tool for retrieving summary information on Arabidopsis genes. Plant Cell Physiol 2009; 50:173-7. [PMID: 19043069; PMCID: PMC2638708; DOI: 10.1093/pcp/pcn179] [Received: 10/22/2008] [Accepted: 11/17/2008]
Abstract
Various public databases provide Arabidopsis gene information via the internet. It is useful to abstract information obtained from such databases. We have developed the KAGIANA tool, which allows a user to retrieve summary information obtained from selective databases and to access pages for a gene of interest in those databases. The tool is based on Microsoft Excel and provides several macro programs for gene expression analyses. It can assist plant biologists in accessing omics information for plant biology. The KAGIANA tool is freely available at http://pmnedo.kazusa.or.jp/kagiana/.
Affiliation(s)
- Yoshiyuki Ogata
  - Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Nozomu Sakurai
  - Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Koh Aoki
  - Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Hideyuki Suzuki
  - Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Koei Okazaki
  - Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Kazuki Saito
  - Graduate School of Pharmaceutical Science, Chiba University, Yayoi-cho 1-33, Inage-ku, Chiba, 263-8522 Japan
- Daisuke Shibata
  - Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
  - Corresponding author; Fax: +81-438-52-3948
11. Prado-Prado FJ, Martinez de la Vega O, Uriarte E, Ubeira FM, Chou KC, González-Díaz H. Unified QSAR approach to antimicrobials. 4. Multi-target QSAR modeling and comparative multi-distance study of the giant components of antiviral drug-drug complex networks. Bioorg Med Chem 2008; 17:569-75. [PMID: 19112024; DOI: 10.1016/j.bmc.2008.11.075] [Received: 06/27/2008] [Revised: 11/24/2008] [Accepted: 11/28/2008]
Abstract
One limitation of almost all antiviral Quantitative Structure-Activity Relationship (QSAR) models is that they predict the biological activity of drugs against only one species of virus. Consequently, the development of multi-tasking QSAR models (mt-QSAR) that predict drug activity against different species of virus is vitally important. These mt-QSARs also offer a good opportunity to construct drug-drug Complex Networks (CNs) that can be used to explore large and complex drug-viral species databases. It is known that in very large CNs we can use the Giant Component (GC) as a representative subset of nodes (drugs), but the drug-drug similarity function selected may strongly determine the final network obtained. In the three previous works of the present series we reported mt-QSAR models to predict the antimicrobial activity against different fungi [Gonzalez-Diaz, H.; Prado-Prado, F. J.; Santana, L.; Uriarte, E. Bioorg. Med. Chem. 2006, 14, 5973], bacteria [Prado-Prado, F. J.; Gonzalez-Diaz, H.; Santana, L.; Uriarte, E. Bioorg. Med. Chem. 2007, 15, 897] or parasite species [Prado-Prado, F. J.; González-Díaz, H.; Martinez de la Vega, O.; Ubeira, F. M.; Chou, K. C. Bioorg. Med. Chem. 2008, 16, 5871]. However, including these works, we did not find any report of mt-QSAR models for antiviral drugs, or a comparative study of the different GCs extracted from drug-drug CNs based on different similarity functions. In this work, we used Linear Discriminant Analysis (LDA) to fit an mt-QSAR model that classifies 600 drugs as active or non-active against the 41 different tested species of virus. The model correctly classifies 143 of 169 active compounds (sensitivity = 84.62%) and 119 of 139 non-active compounds (specificity = 85.61%) and presents an overall training accuracy of 85.1% (262 of 308 cases). Validation of the model was carried out by means of an external prediction series, in which the model correctly classified 466 of 514 compounds (90.7%).
In order to illustrate the performance of the model in practice, we carried out a virtual screening in which the model recognized 102 of 110 antiviral compounds (92.7%) as active. These compounds were never used in the training or prediction series. Next, we obtained and compared the topology of the CNs and their respective GCs based on Euclidean, Manhattan, Chebyshev, Pearson and other similarity measures. The GC of the Manhattan network showed the most interesting features for drug-drug similarity search. We also give the procedure for constructing Back-Projection Maps of the contribution of each drug substructure to the antiviral activity against different species.
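The classification figures quoted above can be rechecked directly from the counts. Under the usual convention, with active compounds as the positive class, sensitivity is the fraction of actives classified as active and specificity the fraction of non-actives classified as non-active; the helper `rate` is hypothetical, and the counts are the ones reported in the abstract.

```python
def rate(correct, total):
    """Percentage of cases classified correctly, rounded to two decimals."""
    return round(100 * correct / total, 2)


# Training-series counts reported in the abstract.
active_hits, actives = 143, 169  # actives predicted active
inactive_hits, inactives = 119, 139  # non-actives predicted non-active

sensitivity = rate(active_hits, actives)  # true-positive rate
specificity = rate(inactive_hits, inactives)  # true-negative rate
accuracy = rate(active_hits + inactive_hits, actives + inactives)

print(sensitivity, specificity, accuracy)  # 84.62 85.61 85.06
print(rate(466, 514), rate(102, 110))  # external series, virtual screening
```

Rounded to one decimal, the overall, external-series and screening rates give the 85.1%, 90.7% and 92.7% figures quoted in the abstract.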
Affiliation(s)
- Francisco J Prado-Prado
  - Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
12. Prado-Prado FJ, González-Díaz H, de la Vega OM, Ubeira FM, Chou KC. Unified QSAR approach to antimicrobials. Part 3: first multi-tasking QSAR model for input-coded prediction, structural back-projection, and complex networks clustering of antiprotozoal compounds. Bioorg Med Chem 2008; 16:5871-80. [PMID: 18485714; DOI: 10.1016/j.bmc.2008.04.068] [Received: 03/12/2008] [Revised: 04/22/2008] [Accepted: 04/25/2008]
Abstract
Several pathogenic parasite species show different susceptibilities to different antiparasitic drugs. Unfortunately, almost all structure-based methods are one-task or one-target Quantitative Structure-Activity Relationships (ot-QSAR) that predict the biological activity of drugs against only one parasite species. Consequently, multi-tasking learning to predict drug activity against different species with a single model (mt-QSAR) is vitally important. In the two previous works of the present series we reported two single mt-QSAR models to predict the antimicrobial activity against different fungal (Bioorg. Med. Chem. 2006, 14, 5973-5980) or bacterial species (Bioorg. Med. Chem. 2007, 15, 897-902). These mt-QSARs offer a good opportunity (impractical with ot-QSAR) to construct drug-drug similarity Complex Networks and to map the contribution of substructures to function for multiple species. These possibilities were not addressed in our previous works. In the present work, we continue this series in another important direction of chemotherapy (antiparasitic drugs) with the development of an mt-QSAR for more than 500 drugs tested in the literature against different parasites. The data were processed by Linear Discriminant Analysis (LDA), classifying drugs as active or non-active against the different tested parasite species. The model correctly classifies 212 out of 244 (87.0%) cases in the training series and 207 out of 243 compounds (85.4%) in the external validation series. In order to illustrate the performance of the QSAR for the selection of active drugs, we carried out an additional virtual screening of antiparasitic compounds not used in the training or prediction series; the model recognized 97 out of 114 (85.1%) of them. We also give the procedures to construct back-projection maps and to calculate substructure contributions to the biological activity.
Finally, we used the outputs of the QSAR to construct, for the first time, a multi-species Complex Network of antiparasitic drugs. The predicted network has 380 nodes (compounds) and 634 edges (pairs of compounds with similar activity). This network allows us to cluster different compounds and to identify, on average, three known compounds similar to a new query compound according to their profile of biological activity. This is the first attempt to calculate probabilities of antiparasitic action of drugs against different parasites.
13. González-Díaz H, González-Díaz Y, Santana L, Ubeira FM, Uriarte E. Proteomics, networks and connectivity indices. Proteomics 2008; 8:750-78. [DOI: 10.1002/pmic.200700638]
14. González-Díaz H, Prado-Prado FJ. Unified QSAR and network-based computational chemistry approach to antimicrobials, part 1: Multispecies activity models for antifungals. J Comput Chem 2007; 29:656-67. [DOI: 10.1002/jcc.20826]
15. Elo LL, Järvenpää H, Oresic M, Lahesmaa R, Aittokallio T. Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. Bioinformatics 2007; 23:2096-103. [PMID: 17553854; DOI: 10.1093/bioinformatics/btm309]
Abstract
MOTIVATION: Coexpression networks have recently emerged as a novel holistic approach to microarray data analysis and interpretation. Choosing an appropriate cutoff threshold, above which a gene-gene interaction is considered relevant, is a critical task in most network-centric applications, especially when two or more networks are being compared. RESULTS: We demonstrate that the performance of traditional approaches, which are based on a predefined cutoff or significance level, can vary drastically depending on the type of data and application. Therefore, we introduce a systematic procedure for estimating a cutoff threshold of coexpression networks directly from their topological properties. Both synthetic and real datasets show clear benefits of our data-driven approach under various practical circumstances. In particular, the procedure provides a robust estimate of individual degree distributions, even from multiple microarray studies performed with different array platforms or experimental designs, which can be used to discriminate the corresponding phenotypes. Application to the human T helper cell differentiation process provides useful insights into the components and interactions controlling this process, many of which would have remained unidentified on the basis of expression change alone. Moreover, several human-mouse orthologs showed conserved topological changes in both systems, suggesting their potential importance in the differentiation process. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Laura L Elo
  - Department of Mathematics, FI-20014 University of Turku, Finland