1
Kutchy NA, Morenikeji OB, Memili A, Ugur MR. Deciphering sperm functions using biological networks. Biotechnol Genet Eng Rev 2023:1-25. [PMID: 36722689 DOI: 10.1080/02648725.2023.2168912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2022] [Indexed: 02/02/2023]
Abstract
The global human population is exponentially increasing, which requires the production of quality food through efficient reproduction as well as sustainable production of livestock. Lack of knowledge and technology for assessing semen quality and predicting bull fertility is hindering advances in animal science and food animal production and causing millions of dollars of economic losses annually. The intent of this systemic review is to summarize methods from computational biology for analysis of gene, metabolite, and protein networks to identify potential markers that can be applied to improve livestock reproduction, with a focus on bull fertility. We provide examples of available gene, metabolic, and protein networks and computational biology methods to show how the interactions between genes, proteins, and metabolites together drive the complex process of spermatogenesis and regulate fertility in animals. We demonstrate the use of the National Center for Biotechnology Information (NCBI) and Ensembl for finding gene sequences, and then use them to create and understand gene, protein and metabolite networks for sperm associated factors to elucidate global cellular processes in sperm. This study highlights the value of mapping complex biological pathways among livestock and potential for conducting studies on promoting livestock improvement for global food security.
Affiliation(s)
- Naseer A Kutchy
- Department of Anatomy, Physiology and Pharmacology, School of Veterinary Medicine, St. George's University, St. George's, Grenada
- Department of Animal Sciences, School of Environmental and Biological Sciences Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
- Olanrewaju B Morenikeji
- Division of Biological and Health Sciences, University of Pittsburgh at Bradford, Bradford, PA, USA
- Aylin Memili
- Department of Nutrition, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
2
Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022; 11:cells11172648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022] Open
Abstract
Proteins are vital for the significant cellular activities of living organisms. However, not all of them are essential. Identifying essential proteins through different biological experiments is relatively more laborious and time-consuming than the computational approaches used in recent times. However, practical implementation of conventional scientific methods sometimes becomes challenging due to poor performance impact in specific scenarios. Thus, more developed and efficient computational prediction models are required for essential protein identification. An effective methodology is proposed in this research, capable of predicting essential proteins in a refined yeast protein–protein interaction network (PPIN). The rule-based refinement is done using protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identification and pruning of non-essential proteins are equally crucial here. In the initial phase, careful assessment is performed by applying node and edge weights to identify and discard the non-essential proteins from the interaction network. Three cut-off levels are considered for each node and edge weight for pruning the non-essential proteins. Once the PPIN has been filtered out, the second phase starts with two centralities-based approaches: (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC), which are successively implemented to identify the essential proteins in the yeast PPIN. Our proposed methodology achieves better performance in comparison to the existing state-of-the-art techniques.
3
Zheng C, Xu R. Large-scale mining disease comorbidity relationships from post-market drug adverse events surveillance data. BMC Bioinformatics 2018; 19:500. [PMID: 30591027 PMCID: PMC6309066 DOI: 10.1186/s12859-018-2468-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022] Open
Abstract
Background: Systems approaches to studying disease relationships have wide applications in biomedical discovery, such as understanding disease mechanisms and drug discovery. The FDA Adverse Event Reporting System (FAERS) contains rich information about patient diseases, medications, drug adverse events and demographics from 17 million case reports. Here, we explored this data resource to mine disease comorbidity relationships using an association rule mining algorithm and constructed a disease comorbidity network. Results: We constructed a disease comorbidity network with 1059 disease nodes and 12,608 edges using association rule mining of FAERS (14,157 rules). We evaluated the performance of comorbidity mining from FAERS using known disease comorbidities of multiple sclerosis (MS), psoriasis and obesity, which represent rare, moderate and common diseases respectively. Comorbidities of MS, obesity and psoriasis obtained from our network achieved precisions of 58.6%, 73.7% and 56.2% and recalls of 87.5%, 69.2% and 72.7%, respectively. We performed a comparative analysis of the disease comorbidity network against a disease semantic network, a disease genetic network and a disease treatment network. We showed that (1) disease comorbidity clusters exhibit significantly higher semantic similarity than a random network (0.18 vs 0.10); (2) disease comorbidity clusters share significantly more genes (0.46 vs 0.06); and (3) disease comorbidity clusters share significantly more drugs (0.64 vs 0.17). Finally, we demonstrated that the disease comorbidity network has potential for uncovering novel disease relationships, using asthma as a case study. Conclusions: Our study presented the first comprehensive attempt to build a disease comorbidity network from the FDA Adverse Event Reporting System. This network correlates well with disease semantic similarity, disease genetics and disease treatment, and has great potential in disease genetics prediction and drug discovery.
Affiliation(s)
- Chunlei Zheng
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, 2103 Cornell Road, Cleveland, 44106, OH, USA
- Rong Xu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, 2103 Cornell Road, Cleveland, 44106, OH, USA.
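The association rule mining step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `mine_pair_rules`, the thresholds, and the toy case reports are all hypothetical, and only pairwise rules (one disease implies another) are considered. Each retained rule contributes one undirected edge to a toy comorbidity network.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(reports, min_support=0.3, min_confidence=0.5):
    """Return pairwise rules (lhs, rhs, support, confidence, lift).

    `reports` is a list of sets of disease names, one set per case report.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(reports)
    item_counts = Counter()
    pair_counts = Counter()
    for diseases in reports:
        unique = sorted(set(diseases))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of reports containing both diseases
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. independence
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Hypothetical toy case reports (not FAERS data).
reports = [
    {"asthma", "obesity"},
    {"asthma", "obesity", "psoriasis"},
    {"obesity"},
    {"asthma"},
    {"psoriasis", "obesity"},
]

rules = mine_pair_rules(reports)
# Each rule becomes an undirected edge in the toy comorbidity network.
edges = {frozenset((lhs, rhs)) for lhs, rhs, *_ in rules}
```

On real data the rule set would typically be mined with a dedicated frequent-itemset algorithm; the brute-force pair counting here is only meant to make support, confidence and lift concrete.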
4
Zheng C, Xu R. The Alzheimer's comorbidity phenome: mining from a large patient database and phenome-driven genetics prediction. JAMIA Open 2018; 2:131-138. [PMID: 30944912 PMCID: PMC6434979 DOI: 10.1093/jamiaopen/ooy050] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/24/2018] [Revised: 10/23/2018] [Accepted: 12/05/2018] [Indexed: 01/08/2023] Open
Abstract
Objective: Alzheimer’s disease (AD) is a severe neurodegenerative disorder and has become a global public health problem. Despite intensive research, the pathophysiology of AD is still not elucidated. Disease comorbidity often associates diseases with overlapping patterns of genetic markers. This may inform a common etiology and suggest essential protein targets. The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) collects large-scale postmarketing surveillance data that provide a unique opportunity to investigate disease co-occurrence patterns. We aim to construct a heterogeneous network that integrates a disease comorbidity network (DCN) from FAERS with protein–protein interactions (PPI) to prioritize AD risk genes using a network-based ranking algorithm. Materials and Methods: We built a DCN based on indication data from FAERS using association rule mining. The DCN was further integrated with a PPI network. We used a random walk with restart ranking algorithm to prioritize AD risk genes. Results: We evaluated the performance of our approach using AD risk genes curated from genetic association studies. Our approach achieved an area under the receiver operating characteristic curve of 0.770. The top 500 ranked genes achieved 5.53-fold enrichment for known AD risk genes as compared to random expectation. Pathway enrichment analysis using top-ranked genes revealed that two novel pathways, the ERBB and coagulation pathways, might be involved in AD pathogenesis. Conclusion: We innovatively leveraged FAERS, a comprehensive data resource for FDA postmarket drug safety surveillance, for large-scale AD comorbidity mining. This exploratory study demonstrated the potential of disease-comorbidity mining from FAERS in AD genetics discovery.
Affiliation(s)
- Chunlei Zheng
- Department of Population and Quantitative Health Sciences, Institute of Computational Biology, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Rong Xu
- Department of Population and Quantitative Health Sciences, Institute of Computational Biology, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
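The random walk with restart ranking mentioned in the abstract above can be sketched as follows. This is a generic textbook formulation on an unweighted, undirected graph, not the authors' implementation: the restart probability, the toy network, and the node names (`AD`, `G1` to `G4`) are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    `adjacency` maps each node to the set of its neighbours (undirected,
    every node assumed to have at least one neighbour). `seeds` are the
    nodes the walker restarts from. Parameter values are illustrative.
    """
    nodes = sorted(adjacency)
    degree = {u: len(adjacency[u]) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if degree[u] == 0:
                continue
            # Spread the walker's mass at u evenly over its neighbours.
            share = (1.0 - restart) * p[u] / degree[u]
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical heterogeneous toy network: one disease node linked to genes.
adjacency = {
    "AD": {"G1", "G2"},
    "G1": {"AD", "G3"},
    "G2": {"AD", "G3"},
    "G3": {"G1", "G2", "G4"},
    "G4": {"G3"},
}

scores = random_walk_with_restart(adjacency, seeds={"AD"})
# Rank gene nodes by steady-state probability, highest first.
ranked_genes = sorted((n for n in scores if n != "AD"), key=lambda n: -scores[n])
```

Genes closer to the seed disease node receive higher steady-state probability, which is the property a prioritization of this kind relies on.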
5
Rouillard AD, Hurle MR, Agarwal P. Systematic interrogation of diverse Omic data reveals interpretable, robust, and generalizable transcriptomic features of clinically successful therapeutic targets. PLoS Comput Biol 2018; 14:e1006142. [PMID: 29782487 PMCID: PMC5983857 DOI: 10.1371/journal.pcbi.1006142] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/21/2017] [Revised: 06/01/2018] [Accepted: 04/13/2018] [Indexed: 11/19/2022] Open
Abstract
Target selection is the first and pivotal step in drug discovery. An incorrect choice may not manifest itself for many years after hundreds of millions of research dollars have been spent. We collected a set of 332 targets that succeeded or failed in phase III clinical trials, and explored whether Omic features describing the target genes could predict clinical success. We obtained features from the recently published comprehensive resource: Harmonizome. Nineteen features appeared to be significantly correlated with phase III clinical trial outcomes, but only 4 passed validation schemes that used bootstrapping or modified permutation tests to assess feature robustness and generalizability while accounting for target class selection bias. We also used classifiers to perform multivariate feature selection and found that classifiers with a single feature performed as well in cross-validation as classifiers with more features (AUROC = 0.57 and AUPR = 0.81). The two predominantly selected features were mean mRNA expression across tissues and standard deviation of expression across tissues, where successful targets tended to have lower mean expression and higher expression variance than failed targets. This finding supports the conventional wisdom that it is favorable for a target to be present in the tissue(s) affected by a disease and absent from other tissues. Overall, our results suggest that it is feasible to construct a model integrating interpretable target features to inform target selection. We anticipate deeper insights and better models in the future, as researchers can reuse the data we have provided to improve methods for handling sample biases and learn more informative features. Code, documentation, and data for this study have been deposited on GitHub at https://github.com/arouillard/omic-features-successful-targets.
Affiliation(s)
- Mark R. Hurle
- Computational Biology, GSK, Collegeville, PA, United States of America
- Pankaj Agarwal
- Computational Biology, GSK, Collegeville, PA, United States of America
6
Guala D, Bernhem K, Blal HA, Jans D, Lundberg E, Brismar H, Sonnhammer ELL. Experimental validation of predicted cancer genes using FRET. Methods Appl Fluoresc 2018; 6:035007. [PMID: 29570091 DOI: 10.1088/2050-6120/aab932] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Huge amounts of data are generated in genome wide experiments, designed to investigate diseases with complex genetic causes. Follow up of all potential leads produced by such experiments is currently cost prohibitive and time consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large scale in silico benchmark. An experimental validation of predictions made by MaxLink has however been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.
Affiliation(s)
- Dimitri Guala
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
7
Sandor C, Beer NL, Webber C. Diverse type 2 diabetes genetic risk factors functionally converge in a phenotype-focused gene network. PLoS Comput Biol 2017; 13:e1005816. [PMID: 29059180 PMCID: PMC5667928 DOI: 10.1371/journal.pcbi.1005816] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/25/2017] [Revised: 11/02/2017] [Accepted: 10/11/2017] [Indexed: 12/14/2022] Open
Abstract
Type 2 Diabetes (T2D) constitutes a global health burden. Efforts to uncover predisposing genetic variation have been considerable, yet detailed knowledge of the underlying pathogenesis remains poor. Here, we constructed a T2D phenotypic-linkage network (T2D-PLN), by integrating diverse gene functional information that highlight genes, which when disrupted in mice, elicit similar T2D-relevant phenotypes. Sensitising the network to T2D-relevant phenotypes enabled significant functional convergence to be detected between genes implicated in monogenic or syndromic diabetes and genes lying within genomic regions associated with T2D common risk. We extended these analyses to a recent multiethnic T2D case-control exome of 12,940 individuals that found no evidence of T2D risk association for rare frequency variants outside of previously known T2D risk loci. Examining associations involving protein-truncating variants (PTV), most at low population frequencies, the T2D-PLN was able to identify a convergent set of biological pathways that were perturbed within four of five independent T2D case/control ethnic sets of 2000 to 5000 exomes each. These same pathways were found to be over-represented among both known monogenic or syndromic diabetes genes and genes within T2D-associated common risk loci. Our study demonstrates convergent biology amongst variants representing different classes of T2D genetic risk. Although convergence was observed at the pathway level, few of the contributing genes were found in common between different cohorts or variant classes, most notably between the exome variant sets which suggests that future rare variant studies may be better focusing their power onto a single population of recent common ancestry.
Affiliation(s)
- Cynthia Sandor
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Nicola L. Beer
- Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom
- Caleb Webber
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
8
Identification of oral cancer related candidate genes by integrating protein-protein interactions, gene ontology, pathway analysis and immunohistochemistry. Sci Rep 2017; 7:2472. [PMID: 28559546 PMCID: PMC5449392 DOI: 10.1038/s41598-017-02522-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 10/12/2016] [Accepted: 04/10/2017] [Indexed: 12/12/2022] Open
Abstract
In recent years, bioinformatics methods have been reported to identify candidate genes with a high degree of success. In this context, we used an integrated bioinformatics approach assimilating information from gene ontologies (GO), protein–protein interaction (PPI) and network analysis to predict candidate genes related to oral squamous cell carcinoma (OSCC). A total of 40973 PPIs were considered for 4704 cancer-related genes to construct a human cancer gene network (HCGN). The importance of each node in the HCGN was measured by ten different centrality measures. We have shown that the top-ranking genes are related to a significantly higher number of diseases than other genes in the HCGN. A total of 39 candidate oral cancer target genes were predicted by combining the top-ranked genes and the genes corresponding to significantly enriched oral cancer related GO terms. Initial verification using literature and available experimental data indicated that 29 genes were related to OSCC. A detailed pathway analysis led us to propose a role for the selected candidate genes in invasion and metastasis in OSCC. We further validated our predictions using immunohistochemistry (IHC) and found that the gene FLNA was upregulated while the genes ARRB1 and HTT were downregulated in the OSCC tissue samples.
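Ranking nodes by centrality, as described in the abstract above, can be sketched with two standard measures. The abstract does not list which ten measures were used, so degree and closeness centrality are shown here purely as assumed examples; the toy network and the node names are hypothetical and do not represent real interactions.

```python
from collections import deque


def build_adjacency(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path distances."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted shortest paths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical toy network: a hub (geneA) with a short tail (geneD-geneE).
adj = build_adjacency([
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneD", "geneE"),
])

degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
top_by_degree = max(degree, key=degree.get)
top_by_closeness = max(closeness, key=closeness.get)
```

In this toy graph both measures single out the hub; on a real network different measures often disagree, which is one reason to combine several of them.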
9
The complex genetics of hypoplastic left heart syndrome. Nat Genet 2017; 49:1152-1159. [PMID: 28530678 DOI: 10.1038/ng.3870] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Received: 01/24/2017] [Accepted: 04/24/2017] [Indexed: 12/11/2022]
Abstract
Congenital heart disease (CHD) affects up to 1% of live births. Although a genetic etiology is indicated by an increased recurrence risk, sporadic occurrence suggests that CHD genetics is complex. Here, we show that hypoplastic left heart syndrome (HLHS), a severe CHD, is multigenic and genetically heterogeneous. Using mouse forward genetics, we report what is, to our knowledge, the first isolation of HLHS mutant mice and identification of genes causing HLHS. Mutations from seven HLHS mouse lines showed multigenic enrichment in ten human chromosome regions linked to HLHS. Mutations in Sap130 and Pcdha9, genes not previously associated with CHD, were validated by CRISPR-Cas9 genome editing in mice as being digenic causes of HLHS. We also identified one subject with HLHS with SAP130 and PCDHA13 mutations. Mouse and zebrafish modeling showed that Sap130 mediates left ventricular hypoplasia, whereas Pcdha9 increases penetrance of aortic valve abnormalities, both signature HLHS defects. These findings show that HLHS can arise genetically in a combinatorial fashion, thus providing a new paradigm for the complex genetics of CHD.
10
Guala D, Sonnhammer ELL. A large-scale benchmark of gene prioritization methods. Sci Rep 2017; 7:46598. [PMID: 28429739 PMCID: PMC5399445 DOI: 10.1038/srep46598] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 06/09/2016] [Accepted: 03/22/2017] [Indexed: 11/16/2022] Open
Abstract
In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
Affiliation(s)
- Dimitri Guala
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
11
Abstract
Intellectual disability is the most common developmental disorder, characterized by a congenital limitation in intellectual functioning and adaptive behavior. It often co-occurs with other mental conditions such as attention deficit/hyperactivity disorder and autism spectrum disorder, and can be part of a malformation syndrome that affects other organs. Considering the heterogeneity of its causes (environmental and genetic), its frequency worldwide varies greatly. This review focuses on known genes underlying (syndromic and non-syndromic) intellectual disability, provides a succinct analysis of their Gene Ontology, and suggests the use of transcriptional profiling for the prioritization of candidate genes.
Affiliation(s)
- Pietro Chiurazzi
- Institute of Genomic Medicine, Catholic University School of Medicine, Rome, Italy
- Filomena Pirozzi
- Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
12
Luo J, Qi Y. Identification of Essential Proteins Based on a New Combination of Local Interaction Density and Protein Complexes. PLoS One 2015; 10:e0131418. [PMID: 26125187 PMCID: PMC4488326 DOI: 10.1371/journal.pone.0131418] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 01/20/2015] [Accepted: 06/02/2015] [Indexed: 11/18/2022] Open
Abstract
Background: Computational approaches have been used to predict essential proteins and are faster than expensive, time-consuming, laborious experimental approaches. However, the performance of such approaches is still poor, making practical applications of computational approaches difficult in some fields. Hence, the development of more suitable and efficient computing methods is necessary for identification of essential proteins. Method: In this paper, we propose a new method for predicting essential proteins in a protein interaction network, local interaction density combined with protein complexes (LIDC), based on statistical analyses of essential proteins and protein complexes. First, we introduce a new local topological centrality, local interaction density (LID), of the yeast PPI network; second, we discuss a new integration strategy for multiple bioinformatics. The LIDC method was then developed through a combination of LID and protein complex information based on our new integration strategy. The purpose of LIDC is discovery of important features of essential proteins with their neighbors in real protein complexes, thereby improving the efficiency of identification. Results: Experimental results based on three different protein-protein interaction (PPI) networks of Saccharomyces cerevisiae and Escherichia coli showed that LIDC outperformed classical topological centrality measures and some recent combinational methods. Moreover, when predicting MIPS datasets, LIDC improved performance over all nine reference methods (i.e., DC, BC, NC, LID, PeC, CoEWC, WDC, ION, and UC). Conclusions: LIDC is more effective for the prediction of essential proteins than other recently developed methods.
Affiliation(s)
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yi Qi
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
13
Abstract
The challenging task of studying and modeling the complex dynamics of biological systems in order to describe various human diseases has gathered great interest in recent years. Major biological processes are mediated through protein interactions, hence there is a need to understand the chaotic network that forms these processes in order to understand human diseases. Applying protein interaction networks to disease datasets allows the identification of genes and proteins associated with diseases, the study of network properties, identification of subnetworks, and network-based disease gene classification. Although various protein interaction network analysis strategies have been employed, major challenges remain. Global understanding of protein interaction networks via integration of high-throughput functional genomics data from different levels will allow researchers to examine disease pathways and identify strategies to control them. As a result, it seems likely that more personalized, more accurate and more rapid disease gene diagnostic techniques will be devised in the future, along with novel, more personalized strategies. This mini-review summarizes the current practice of protein interaction networks in medical research as well as challenges to be overcome.
Affiliation(s)
- Tuba Sevimoglu
- Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey
- Kazim Yalcin Arga
- Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey