Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wang Z, Zhang XC, Le MH, Xu D, Stacey G, Cheng J. A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny. PLoS One 2011;6:e17906. [PMID: 21455299 PMCID: PMC3063783 DOI: 10.1371/journal.pone.0017906] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 10/23/2010] [Accepted: 02/16/2011] [Indexed: 11/18/2022]

Cited by Other Article(s)

Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023;91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]

Mining semantic information of co-word network to improve link prediction performance. Scientometrics 2022. [DOI: 10.1007/s11192-021-04247-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Sikander R, Wang Y, Ghulam A, Wu X. Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network. Front Genet 2021;12:759384. [PMID: 34917128 PMCID: PMC8670239 DOI: 10.3389/fgene.2021.759384] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2021] [Accepted: 10/25/2021] [Indexed: 11/21/2022]

Abstract

Predicting the protein sequence information of enzymes and non-enzymes is an important but very challenging task. Existing methods use only protein geometric structures or only protein sequences to predict enzymatic functions, so their prediction results are unsatisfactory. In this paper, we propose a novel approach for predicting the amino acid sequences of enzymes and non-enzymes via a Convolutional Neural Network (CNN). In the CNN, the roles of enzymes are predicted from multiple sources of biological information, including information on sequences and structures. We propose the use of two-dimensional data via 2DCNN to predict enzyme and non-enzyme proteins, using the same fivefold cross-validation function. We also use an independent dataset to test the performance of our model, and the results demonstrate that we are able to solve the overfitting problem. We used the CNN model proposed herein to demonstrate the superiority of our model across an entire set of filter counts (32, 64, and 128), with the fivefold validation test set as the independent classification. Via the Dipeptide Deviation from Expected Mean (DDE) matrix, mutation information is extracted from amino acid sequences, and structural information with the distance and angle of amino acids is conveyed. The derived feature maps are then encoded using DDE. Results on the independent datasets are then compared with those of two other methods, namely GRU and XGBoost. All analyses were conducted using 32, 64 and 128 filters on our proposed CNN method. The cross-validation datasets achieved an accuracy of 0.8762, whereas the accuracy on the independent datasets was 0.7621. With fivefold cross-validation, the ROC AUC was 0.95. The performance of our model and that of other models was compared in terms of sensitivity (0.9028) and specificity (0.8497). The overall accuracy of our model was 0.9133, compared with 0.8310 for the other model.
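The abstract names the DDE encoding without defining it. The sketch below assumes the commonly cited definition of Dipeptide Deviation from Expected Mean: for each of the 400 dipeptides, the observed frequency minus a codon-based expected frequency, divided by the theoretical standard deviation. The function name `dde_features` is hypothetical, and this is not the authors' implementation.

```python
import math
from itertools import product

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons excluded).
CODONS = {"A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
          "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
          "T": 4, "W": 1, "Y": 2, "V": 4}
TOTAL_CODONS = sum(CODONS.values())  # 61


def dde_features(sequence):
    """Return a dict mapping each of the 400 dipeptides to its DDE value.

    Assumes `sequence` uses the 20 standard one-letter codes; dipeptides
    containing other symbols are counted in the length but not reported.
    """
    n = len(sequence) - 1  # number of dipeptide positions
    if n < 1:
        raise ValueError("sequence must contain at least two residues")

    # Observed dipeptide counts along the sequence.
    counts = {}
    for i in range(n):
        pair = sequence[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1

    features = {}
    for a, b in product(CODONS, repeat=2):
        dc = counts.get(a + b, 0) / n                 # observed composition
        tm = (CODONS[a] / TOTAL_CODONS) * (CODONS[b] / TOTAL_CODONS)  # expected mean
        tv = tm * (1 - tm) / n                        # theoretical variance
        features[a + b] = (dc - tm) / math.sqrt(tv)   # standardized deviation
    return features
```

A 400-value vector of this kind can be reshaped into a 20 x 20 matrix, which is one plausible way to obtain the two-dimensional input the abstract mentions; that reshaping step is an assumption, not something the abstract states.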

YAMATO K, KATO H, KATSURAGI T, TAKAHASHI Y. The Multiple Representation of Protein Sequence Motifs Using Sequence Binary Decision Diagrams. JOURNAL OF COMPUTER CHEMISTRY-JAPAN 2020. [DOI: 10.2477/jccj.2019-0028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]

Gao R, Wang M, Zhou J, Fu Y, Liang M, Guo D, Nie J. Prediction of Enzyme Function Based on Three Parallel Deep CNN and Amino Acid Mutation. Int J Mol Sci 2019;20:E2845. [PMID: 31212665 PMCID: PMC6600291 DOI: 10.3390/ijms20112845] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/05/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 01/28/2023]

Liu T, Wang Z. Reconstructing high-resolution chromosome three-dimensional structures by Hi-C complex networks. BMC Bioinformatics 2018;19:496. [PMID: 30591009 PMCID: PMC6309071 DOI: 10.1186/s12859-018-2464-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/15/2023]

Abstract

BACKGROUND

Hi-C data have been widely used to reconstruct chromosomal three-dimensional (3D) structures. One of the key limitations of Hi-C is the unclear relationship between spatial distance and the number of Hi-C contacts. Many methods used a fixed parameter when converting the number of Hi-C contacts to wish distances. However, a single parameter cannot properly explain the relationship between wish distances and genomic distances or the locations of topologically associating domains (TADs).

RESULTS

We have addressed one of the key issues of using Hi-C data, that is, the unclear relationship between spatial distances and the number of Hi-C contacts, which is crucial to understand significant biological functions, such as the enhancer-promoter interactions. Specifically, we developed a new method to infer this converting parameter and pairwise Euclidean distances based on the topology of the Hi-C complex network (HiCNet). The inferred distances were modeled by clustering coefficient and multiple other types of constraints. We found that our inferred distances between bead-pairs within the same TAD were apparently smaller than those distances between bead-pairs from different TADs. Our inferred distances had a higher correlation with fluorescence in situ hybridization (FISH) data, fitted the localization patterns of Xist transcripts on DNA, and better matched 156 pairs of protein-enabled long-range chromatin interactions detected by ChIA-PET. Using the inferred distances and another round of optimization, we further reconstructed 40 kb high-resolution 3D chromosomal structures of mouse male ES cells. The high-resolution structures successfully illustrate TADs and DNA loops (peaks in Hi-C contact heatmaps) that usually indicate enhancer-promoter interactions.

CONCLUSIONS

We developed a novel method to infer the wish distances between DNA bead-pairs from Hi-C contacts. High-resolution 3D structures of chromosomes were built based on the newly-inferred wish distances. This whole process has been implemented as a tool named HiCNet, which is publicly available at http://dna.cs.miami.edu/HiCNet/ .

Co-Occurrence Network of High-Frequency Words in the Bioinformatics Literature: Structural Characteristics and Evolution. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8101994] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]

Keel BN, Deng B, Moriyama EN. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks. Bioinformatics 2018;34:1270-1277. [PMID: 29186344 DOI: 10.1093/bioinformatics/btx755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2016] [Accepted: 11/23/2017] [Indexed: 11/14/2022]

Hansen BO, Meyer EH, Ferrari C, Vaid N, Movahedi S, Vandepoele K, Nikoloski Z, Mutwil M. Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana. THE NEW PHYTOLOGIST 2018;217:1521-1534. [PMID: 29205376 DOI: 10.1111/nph.14921] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/08/2017] [Accepted: 10/24/2017] [Indexed: 05/25/2023]

Wang Z, Zhao C, Wang Y, Sun Z, Wang N. PANDA: Protein function prediction using domain architecture and affinity propagation. Sci Rep 2018;8:3484. [PMID: 29472600 PMCID: PMC5823857 DOI: 10.1038/s41598-018-21849-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/16/2017] [Accepted: 02/09/2018] [Indexed: 12/23/2022]

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules 2017;22:1732. [PMID: 29039790 PMCID: PMC6151571 DOI: 10.3390/molecules22101732] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Received: 08/30/2017] [Revised: 10/11/2017] [Accepted: 10/11/2017] [Indexed: 11/25/2022]

Cao R, Cheng J. Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. Methods 2016;93:84-91. [PMID: 26370280 PMCID: PMC4894840 DOI: 10.1016/j.ymeth.2015.09.011] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 03/31/2015] [Revised: 09/03/2015] [Accepted: 09/10/2015] [Indexed: 11/30/2022]

Exploring soybean metabolic pathways based on probabilistic graphical model and knowledge-based methods. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2015;2015:5. [PMID: 28194174 PMCID: PMC5270328 DOI: 10.1186/s13637-015-0026-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/16/2015] [Accepted: 06/09/2015] [Indexed: 12/02/2022]

Li J, Hou J, Sun L, Wilkins JM, Lu Y, Niederhuth CE, Merideth BR, Mawhinney TP, Mossine VV, Greenlief CM, Walker JC, Folk WR, Hannink M, Lubahn DB, Birchler JA, Cheng J. From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data. PLoS One 2015;10:e0125000. [PMID: 25902288 PMCID: PMC4406561 DOI: 10.1371/journal.pone.0125000] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/28/2014] [Accepted: 03/19/2015] [Indexed: 01/31/2023]

Affiliation(s)

Jilong Li: Computer Science Department, University of Missouri, Columbia, Missouri, United States of America; MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America
Jie Hou: Computer Science Department, University of Missouri, Columbia, Missouri, United States of America
Lin Sun: Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America
Jordan Maximillian Wilkins: Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
Yuan Lu: MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America; Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
Chad E. Niederhuth: Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America
Benjamin Ryan Merideth: Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
Thomas P. Mawhinney: Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
Valeri V. Mossine: MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America; Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
C. Michael Greenlief: MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America; Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America
John C. Walker: Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America
William R. Folk: MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America; Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
Mark Hannink: MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America; Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
Dennis B. Lubahn: MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America; Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
James A. Birchler: Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America
Jianlin Cheng: Computer Science Department, University of Missouri, Columbia, Missouri, United States of America; MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America; Informatics Institute, University of Missouri, Columbia, Missouri, United States of America; C. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America

Gong P, Madak-Erdogan Z, Li J, Cheng J, Greenlief CM, Helferich W, Katzenellenbogen JA, Katzenellenbogen BS. Transcriptomic analysis identifies gene networks regulated by estrogen receptor α (ERα) and ERβ that control distinct effects of different botanical estrogens. NUCLEAR RECEPTOR SIGNALING 2014;12:e001. [PMID: 25363786 PMCID: PMC4193135 DOI: 10.1621/nrs.12001] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 04/03/2014] [Revised: 04/28/2014] [Accepted: 05/13/2014] [Indexed: 12/31/2022]

Abstract

The estrogen receptors (ERs) ERα and ERβ mediate the actions of endogenous estrogens as well as those of botanical estrogens (BEs) present in plants. BEs are ingested in the diet and also widely consumed by postmenopausal women as dietary supplements, often as a substitute for the loss of endogenous estrogens at menopause. However, their activities and efficacies, and similarities and differences in gene expression programs with respect to endogenous estrogens such as estradiol (E2) are not fully understood. Because gene expression patterns underlie and control the broad physiological effects of estrogens, we have investigated and compared the gene networks that are regulated by different BEs and by E2. Our aim was to determine if the soy and licorice BEs control similar or different gene expression programs and to compare their gene regulations with that of E2. Gene expression was examined by RNA-Seq in human breast cancer (MCF7) cells treated with control vehicle, BE or E2. These cells contained three different complements of ERs, ERα only, ERα+ERβ, or ERβ only, reflecting the different ratios of these two receptors in different human breast cancers and in different estrogen target cells. Using principal component, hierarchical clustering, and gene ontology and interactome analyses, we found that BEs regulated many of the same genes as did E2. The genes regulated by each BE, however, were somewhat different from one another, with some genes being regulated uniquely by each compound. The overlap with E2 in regulated genes was greatest for the soy isoflavones genistein and S-equol, while the greatest difference from E2 in gene expression pattern was observed for the licorice root BE liquiritigenin. The gene expression pattern of each ligand depended greatly on the cell background of ERs present. 
Despite similarities in gene expression pattern with E2, the BEs were generally less stimulatory of genes promoting proliferation and were more pro-apoptotic in their gene regulations than E2. The distinctive patterns of gene regulation by the individual BEs and E2 may underlie differences in the activities of these soy and licorice-derived BEs in estrogen target cells containing different levels of the two ERs.

Proteome-wide remodeling of protein location and function by stress. Proc Natl Acad Sci U S A 2014;111:E3157-66. [PMID: 25028499 DOI: 10.1073/pnas.1318881111] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]

Qu Z, Meng F, Zhou H, Li J, Wang Q, Wei F, Cheng J, Greenlief CM, Lubahn DB, Sun GY, Liu S, Gu Z. NitroDIGE analysis reveals inhibition of protein S-nitrosylation by epigallocatechin gallates in lipopolysaccharide-stimulated microglial cells. J Neuroinflammation 2014;11:17. [PMID: 24472655 PMCID: PMC3922161 DOI: 10.1186/1742-2094-11-17] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/07/2013] [Accepted: 01/20/2014] [Indexed: 12/28/2022]

Abstract

Background

Nitric oxide (NO) is a signaling molecule regulating numerous cellular functions in development and disease. In the brain, neuronal injury or neuroinflammation can lead to microglial activation, which induces NO production. NO can react with critical cysteine thiols of target proteins forming S-nitroso-proteins. This modification, known as S-nitrosylation, is an evolutionarily conserved redox-based post-translational modification (PTM) of specific proteins analogous to phosphorylation. In this study, we describe a protocol for analyzing S-nitrosylation of proteins using a gel-based proteomic approach and use it to investigate the modes of action of a botanical compound found in green tea, epigallocatechin-3-gallate (EGCG), on protein S-nitrosylation after microglial activation.

Methods/Results

To globally and quantitatively analyze NO-induced protein S-nitrosylation, the sensitive gel-based proteomic method, termed NitroDIGE, was developed by combining two-dimensional differential in-gel electrophoresis (2-D DIGE) with the modified biotin switch technique (BST) using fluorescence-tagged CyDye™ thiol reactive agents to label S-nitrosothiols. The NitroDIGE method showed high specificity and sensitivity in detecting S-nitrosylated proteins (SNO-proteins). Using this approach, we identified a subset of SNO-proteins ex vivo by exposing immortalized murine BV-2 microglial cells to a physiological NO donor, or in vivo by exposing BV-2 cells to endotoxin lipopolysaccharides (LPS) to induce a proinflammatory response. Moreover, EGCG was shown to attenuate S-nitrosylation of proteins after LPS-induced activation of microglial cells primarily by modulation of the nuclear factor erythroid 2-related factor 2 (Nrf2)-mediated oxidative stress response.

Conclusions

These results demonstrate that NitroDIGE is an effective proteomic strategy for “top-down” quantitative analysis of protein S-nitrosylation in multi-group samples in response to nitrosative stress due to excessive generation of NO in cells. Using this approach, we have revealed the ability of EGCG to down-regulate protein S-nitrosylation in LPS-stimulated BV-2 microglial cells, consistent with its known antioxidant effects.

Zhu M, Dahmen JL, Stacey G, Cheng J. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data. BMC Bioinformatics 2013;14:278. [PMID: 24053776 PMCID: PMC3854569 DOI: 10.1186/1471-2105-14-278] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 12/20/2012] [Accepted: 09/03/2013] [Indexed: 12/29/2022]

Abstract

BACKGROUND

High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed.

RESULTS

We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature.

CONCLUSIONS

We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.

A novel function prediction approach using protein overlap networks. BMC SYSTEMS BIOLOGY 2013;7:61. [PMID: 23866986 PMCID: PMC3720179 DOI: 10.1186/1752-0509-7-61] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/08/2013] [Accepted: 07/12/2013] [Indexed: 11/10/2022]

The properties of genome conformation and spatial gene interaction and regulation networks of normal and malignant human cell types. PLoS One 2013;8:e58793. [PMID: 23536826 PMCID: PMC3594155 DOI: 10.1371/journal.pone.0058793] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 10/13/2012] [Accepted: 02/06/2013] [Indexed: 01/01/2023]

Fang H, Gough J. A disease-drug-phenotype matrix inferred by walking on a functional domain network. MOLECULAR BIOSYSTEMS 2013;9:1686-96. [PMID: 23462907 DOI: 10.1039/c3mb25495j] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022]

Abstract

Protein domains are classified as units of structure, evolution and function, and thus form the molecular backbone of biosphere. Although functional networks at the protein level have been reported to be of value in predicting diseases (phenotypes or drugs), they have not previously been applied at the sub-protein resolution (protein domain in this case). We herein introduce a domain network with a functional perspective. This network has nodes consisting of protein domains (at the superfamily/evolutionary level), with edges weighted by the semantic similarity according to domain-centric Gene Ontology (dcGO) annotations, which henceforth we call "dcGOnet". By globally exploring this network via a random walk, we demonstrate its predictive value on disease, drug, or phenotype-related ontologies. On cross-validation recovering ontology labels for domains, we achieve an overall area under the ROC curve of 89.0% for drugs, 87.3% for diseases, 87.6% for human phenotypes and 88.2% for mouse phenotypes. We show that the performance using global information from this network is significantly better than using local information, and also illustrate that the better performance is not sensitive to network size, or the choice of algorithm parameters, and is universal to different ontologies. Based on the dcGOnet and its global properties, we further develop an approach to build a disease-drug-phenotype matrix. The predicted interconnections are statistically supported using a novel randomization procedure, and are also empirically supported by inspection for biological relevance. Most of the high-ranking predictions recover connections that are well known, but others uncover connections that have only suggestive or obscure support in the literature; we show that these are missed by simpler methods, in particular for drug-disease connections. 
The value of this work is threefold: we describe a general methodology and make the software available, we provide the functional domain network itself, and the ranked drug-disease-phenotype matrix provides rich targets for investigation. All three can be found at .
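The abstract describes scoring domains by globally exploring a weighted network with a random walk, but does not say which variant is used. As an illustration only, the sketch below assumes a random walk with restart on an undirected weighted graph; the function name, the `restart` value, and the toy network in the usage note are all hypothetical and are not taken from the dcGOnet software.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`.

    `weights` is a dict of dicts: weights[u][v] is the edge weight between
    u and v (for example, a semantic similarity). Every node must appear
    as a key, and edges should be listed in both directions.
    """
    nodes = list(weights)
    strength = {u: sum(weights[u].values()) for u in nodes}
    # Restart distribution: uniform over the seed nodes.
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Isolated node: return its walking mass to the seeds.
                for s in nodes:
                    nxt[s] += (1 - restart) * p[u] * p0[s]
                continue
            for v, w in weights[u].items():
                # Move from u to v in proportion to the edge weight.
                nxt[v] += (1 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

For a toy path network `a-b-c-d` with unit weights and seed `a`, calling `random_walk_with_restart({"a": {"b": 1}, "b": {"a": 1, "c": 1}, "c": {"b": 1, "d": 1}, "d": {"c": 1}}, {"a"})` ranks the nodes by proximity to the seed, which is the sense in which global network information can prioritize unlabeled nodes.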

Wang Z, Cao R, Cheng J. Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks. BMC Bioinformatics 2013;14 Suppl 3:S3. [PMID: 23514381 PMCID: PMC3584933 DOI: 10.1186/1471-2105-14-s3-s3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]

Abstract

Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation, as a vast number of protein sequences with unknown function are routinely being generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progress in the last several years, the accuracy of protein function prediction still needs to be improved in order to be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrated profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that our three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. In particular, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations, handle multi-domain proteins naturally, and make ab initio function predictions when no homology exists. These results show that our approach can combine the complementary strengths of the most widely used BLAST-based function prediction methods, the more sensitive but rarely used profile-profile comparison-based homology detection methods, and non-homology-based domain co-occurrence networks, to effectively extend the power of function prediction from high homology, to low homology, to no homology (ab initio cases).

Mangiola S, Young ND, Korhonen P, Mondal A, Scheerlinck JP, Sternberg PW, Cantacessi C, Hall RS, Jex AR, Gasser RB. Getting the most out of parasitic helminth transcriptomes using HelmDB: implications for biology and biotechnology. Biotechnol Adv 2012;31:1109-19. [PMID: 23266393 DOI: 10.1016/j.biotechadv.2012.12.004] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 10/11/2012] [Revised: 12/08/2012] [Accepted: 12/13/2012] [Indexed: 12/17/2022]

Abstract

Compounded by a massive global food shortage, many parasitic diseases have a devastating, long-term impact on animal and human health and welfare worldwide. Parasitic helminths (worms) affect the health of billions of animals. Unlocking the systems biology of these neglected pathogens will underpin the design of new and improved interventions against them. Currently, the functional annotation of genomic and transcriptomic sequence data for socio-economically important parasitic worms relies almost exclusively on comparative bioinformatic analyses using model organism- and other databases. However, many genes and gene products of parasitic helminths (often >50%) cannot be annotated using this approach, because they are specific to parasites and/or do not have identifiable homologs in other organisms for which sequence data are available. This inability to fully annotate transcriptomes and predicted proteomes is a major challenge and constrains our understanding of the biology of parasites, interactions with their hosts and of parasitism and the pathogenesis of disease on a molecular level. In the present article, we compiled transcriptomic data sets of key, socioeconomically important parasitic helminths, and constructed and validated a curated database, called HelmDB (www.helmdb.org). We demonstrate how this database can be used effectively for the improvement of functional annotation by employing data integration and clustering. Importantly, HelmDB provides a practical and user-friendly toolkit for sequence browsing and comparative analyses among divergent helminth groups (including nematodes and trematodes), and should be readily adaptable and applicable to a wide range of other organisms. This web-based, integrative database should assist 'systems biology' studies of parasitic helminths, and the discovery and prioritization of novel drug and vaccine targets. 
This focus provides a pathway toward developing new and improved approaches for the treatment and control of parasitic diseases, with the potential for important biotechnological outcomes.

Zhu M, Deng X, Joshi T, Xu D, Stacey G, Cheng J. Reconstructing differentially co-expressed gene modules and regulatory networks of soybean cells. BMC Genomics 2012;13:437. [PMID: 22938179 PMCID: PMC3563468 DOI: 10.1186/1471-2164-13-437] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 03/15/2012] [Accepted: 08/22/2012] [Indexed: 11/23/2022] Open

Abstract

BACKGROUND

Current experimental evidence indicates that functionally related genes show coordinated expression in order to perform their cellular functions. In this way, the cell transcriptional machinery can respond optimally to internal or external stimuli. This provides a research opportunity to identify and study co-expressed gene modules whose transcription is controlled by shared gene regulatory networks.

RESULTS

We developed and integrated a set of computational methods for differential gene expression analysis, gene clustering, gene network inference, gene function prediction, and DNA motif identification to automatically identify differentially co-expressed gene modules, reconstruct their regulatory networks, and validate their correctness. We tested the methods using microarray data derived from soybean cells grown under various stress conditions. Our methods were able to identify 42 coherent gene modules, within which average gene expression correlation coefficients are greater than 0.8, and to reconstruct their putative regulatory networks. A total of 32 modules and their regulatory networks were further validated by the coherence of predicted gene functions and the consistency of putative transcription factor binding motifs. Approximately half of the 32 modules were partially supported by the literature, which demonstrates that the bioinformatic methods used can help elucidate the molecular responses of soybean cells to various environmental stresses.
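The coherence criterion mentioned above (average within-module expression correlation greater than 0.8) can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so Pearson correlation is an assumption here, and the gene names, expression values, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (norm_x * norm_y)


def mean_pairwise_correlation(profiles):
    """Average correlation over all gene pairs in one candidate module."""
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("a module needs at least two genes")
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def is_coherent(profiles, threshold=0.8):
    """Apply the 0.8 cutoff quoted in the abstract (Pearson is assumed)."""
    return mean_pairwise_correlation(profiles) > threshold


# Hypothetical module: three genes measured under four stress conditions.
module = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [0.5, 1.1, 1.4, 2.1],
}
print(is_coherent(module))
```

A real pipeline would apply such a check to clusters produced by the gene clustering step rather than to hand-written profiles.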

CONCLUSIONS

The bioinformatics methods and genome-wide data sources for gene expression, clustering, regulation, and function analysis were integrated seamlessly into one modular protocol to systematically analyze and infer modules and networks from only differentially expressed genes in soybean cells grown under stress conditions. Our approach appears to effectively reduce the complexity of the problem, and is sufficiently robust and accurate to generate a rather complete and detailed view of putative soybean gene transcription logic potentially underlying the responses to the various environmental challenges. The same automated method can also be applied to reconstruct differentially co-expressed gene modules and their regulatory networks from gene expression data of any other transcriptome.

Zhang XC, Wang Z, Zhang X, Le MH, Sun J, Xu D, Cheng J, Stacey G. Evolutionary dynamics of protein domain architecture in plants. BMC Evol Biol 2012;12:6. [PMID: 22252370 PMCID: PMC3310802 DOI: 10.1186/1471-2148-12-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 06/23/2011] [Accepted: 01/17/2012] [Indexed: 12/17/2022] Open

Abstract

Background

Protein domains are the structural, functional and evolutionary units of the protein. Protein domain architectures are the linear arrangements of domain(s) in individual proteins. Although the evolutionary history of protein domain architecture has been extensively studied in microorganisms, the evolutionary dynamics of domain architecture in the plant kingdom remains largely undefined. To address this question, we analyzed the lineage-based protein domain architecture content in 14 completed green plant genomes.

Results

Our analyses show that all 14 plant genomes maintain similar distributions of species-specific, single-domain, and multi-domain architectures. Approximately 65% of plant domain architectures are universally present in all plant lineages, while the remaining architectures are lineage-specific. Clear examples are seen of both the loss and gain of specific protein architectures in higher plants. There has been a dynamic, lineage-wise expansion of domain architectures during plant evolution. The data suggest that this expansion can be largely explained by changes in nuclear ploidy resulting from rounds of whole genome duplications. Indeed, the number of unique domain architectures decreased when the genomes were normalized to a presumed ancestral genome that had not undergone whole genome duplications.
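The split between universal and lineage-specific architectures described above can be sketched as simple set operations. This is an illustrative simplification, not the authors' method: each genome is treated as its own lineage, an architecture is represented as an ordered tuple of domain names, and the genome labels, domain names, and function names are hypothetical.

```python
def classify_architectures(genomes):
    """Split architectures into those present in every genome and the rest.

    `genomes` maps a genome label to a set of architectures, where each
    architecture is a tuple of domain names in N- to C-terminal order.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    architecture_sets = list(genomes.values())
    all_architectures = set().union(*architecture_sets)
    universal = set.intersection(*architecture_sets)
    lineage_specific = all_architectures - universal
    return universal, lineage_specific


def universal_fraction(genomes):
    """Fraction of all distinct architectures found in every genome."""
    universal, lineage_specific = classify_architectures(genomes)
    total = len(universal) + len(lineage_specific)
    return len(universal) / total if total else 0.0


# Hypothetical toy input: two genomes with three distinct architectures.
genomes = {
    "genome_1": {("Pkinase",), ("LRR", "Pkinase")},
    "genome_2": {("Pkinase",), ("NB-ARC", "LRR")},
}
universal, lineage_specific = classify_architectures(genomes)
print(sorted(universal), sorted(lineage_specific))
```

Grouping genomes into lineages before intersecting, as the abstract implies, would require a lineage assignment that this sketch does not model.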

Conclusions

Our data show the conservation of universal domain architectures in all available plant genomes, indicating the presence of an evolutionarily conserved, core set of protein components. However, the occurrence of lineage-specific domain architectures indicates that domain architecture diversity has been maintained beyond these core components in plant genomes. Although several features of genome-wide domain architecture content are conserved in plants, the data clearly demonstrate lineage-wise, progressive changes and expansions of individual protein domain architectures, reinforcing the notion that plant genomes have undergone dynamic evolution.
