1. Jahangir M, Nazari M, Babakhanzadeh E, Manshadi SD. Where do obesity and male infertility collide? BMC Med Genomics 2024; 17:128. [PMID: 38730451 PMCID: PMC11088066 DOI: 10.1186/s12920-024-01897-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 04/30/2024] [Indexed: 05/12/2024] Open
Abstract
The parallel rise in obesity and male infertility in modern societies necessitates the identification of susceptibility genes underlying these interconnected health issues. In our study, we conducted a comprehensive search in the OMIM database to identify genes commonly associated with male infertility and obesity. Subsequently, we performed an in silico analysis using the REVEL algorithm to detect pathogenic single nucleotide polymorphisms (SNPs) in the coding region of these candidate genes. To validate our findings in vivo, we conducted a comprehensive analysis of SNPs and gene expression of candidate genes in 200 obese infertile subjects and 240 obese fertile individuals using ARMS-PCR. Additionally, we analyzed 20 fertile and 22 infertile obese individuals using real-time qPCR. After removing duplicated queries, we obtained 197 obesity-related genes and 102 male infertility-related genes from the OMIM database. Interestingly, the APOB gene was found in common between the two datasets. REVEL identified the rs13306194 variant as potentially pathogenic with a calculated score of 0.524. The study identified a significant association between the AA genotype (P value = 0.001) and A allele (P value = 0.003) of the APOB rs13306194 variant and infertility in obese men. APOB expression levels were significantly lower in obese infertile men compared to obese fertile controls (p < 0.01). Moreover, the AA genotype of APOB rs13306194 was associated with a significant decrease in APOB gene expression in obese infertile men (p = 0.05). Waist-to-hip ratio (WHR) and LH were significantly associated with infertility in the obese infertile group. These results are likely to contribute to a better understanding of the causes of male infertility and its association with obesity.
Affiliation(s)
- Melika Jahangir
  - Department of Pharmacy, Tehran University of Medical Sciences, P.O. Box: 64155-65117, Tehran, Iran
- Majid Nazari
  - Department of Medical Genetics, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Emad Babakhanzadeh
  - Department of Medical Genetics, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
2. Karaca C, Demir Karaman E, Leblebici A, Kurter H, Ellidokuz H, Koc A, Ellidokuz EB, Isik Z, Basbinar Y. New treatment alternatives for primary and metastatic colorectal cancer by an integrated transcriptome and network analyses. Sci Rep 2024; 14:8762. [PMID: 38627442 PMCID: PMC11021540 DOI: 10.1038/s41598-024-59101-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 04/08/2024] [Indexed: 04/19/2024] Open
Abstract
Metastatic colorectal cancer (CRC) is still in need of effective treatments. This study applies a holistic approach to propose new targets for treatment of primary and liver metastatic CRC and investigates their therapeutic potential in vitro. An integrative analysis of primary and metastatic CRC samples was implemented for alternative target and treatment proposals. Integrated microarray samples were grouped based on a co-expression network analysis. Significant gene modules correlated with primary CRC and metastatic phenotypes were identified. Network clustering and pathway enrichments were applied to gene modules to prioritize potential targets, which were shortlisted by independent validation. Finally, drug-target interaction search led to three agents for primary and liver metastatic CRC phenotypes. Hesperadin and BAY-1217389 suppress colony formation over a 14-day period, with Hesperadin showing additional efficacy in reducing cell viability within 48 h. As both candidates target the G2/M phase proteins NEK2 or TTK, we confirmed their anti-proliferative properties by Ki-67 staining. Hesperadin in particular arrested the cell cycle at the G2/M phase. IL-29A treatment reduced migration and invasion capacities of TGF-β induced metastatic cell lines. In addition, this anti-metastatic treatment attenuated TGF-β dependent mesenchymal transition. Network analysis suggests IL-29A induces the JAK/STAT pathway in a preventive manner.
Affiliation(s)
- Caner Karaca
  - Department of Translational Oncology, Institute of Health Sciences, Dokuz Eylul University, Izmir, Turkey
- Ezgi Demir Karaman
  - Department of Computer Engineering, Faculty of Engineering, Dokuz Eylul University, Izmir, Turkey
- Asim Leblebici
  - Department of Translational Oncology, Institute of Health Sciences, Dokuz Eylul University, Izmir, Turkey
- Hasan Kurter
  - Department of Translational Oncology, Institute of Health Sciences, Dokuz Eylul University, Izmir, Turkey
- Hulya Ellidokuz
  - Department of Preventive Oncology, Institute of Oncology, Dokuz Eylul University, Izmir, Turkey
- Altug Koc
  - Department of Translational Oncology, Institute of Health Sciences, Dokuz Eylul University, Izmir, Turkey
- Ender Berat Ellidokuz
  - Department of Gastroenterology, Faculty of Medicine, Dokuz Eylul University, Izmir, Turkey
- Zerrin Isik
  - Department of Computer Engineering, Faculty of Engineering, Dokuz Eylul University, Izmir, Turkey
- Yasemin Basbinar
  - Department of Translational Oncology, Institute of Oncology, Dokuz Eylul University, Izmir, Turkey
3. Molotkov I, Artomov M. Detecting biased validation of predictive models in the positive-unlabeled setting: disease gene prioritization case study. Bioinformatics Advances 2023; 3:vbad128. [PMID: 37745001 PMCID: PMC10517638 DOI: 10.1093/bioadv/vbad128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/13/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023]
Abstract
Motivation: Positive-unlabeled data consists of points with either positive or unknown labels. It is widespread in medical, genetic, and biological settings, creating a high demand for predictive positive-unlabeled models. The performance of such models is usually estimated using validation sets, assumed to be selected completely at random (SCAR) from known positive examples. For certain metrics, this assumption enables unbiased performance estimation when treating positive-unlabeled data as positive/negative. However, the SCAR assumption is often adopted without proper justifications, simply for the sake of convenience.
Results: We provide an algorithm that under the weak assumptions of a lower bound on the number of positive examples can test for the violation of the SCAR assumption. Applying it to the problem of gene prioritization for complex genetic traits, we illustrate that the SCAR assumption is often violated there, causing the inflation of performance estimates, which we refer to as validation bias. We estimate the potential impact of validation bias on performance estimation. Our analysis reveals that validation bias is widespread in gene prioritization data and can significantly overestimate the performance of models. This finding elucidates the discrepancy between the reported good performance of models and their limited practical applications.
Availability and implementation: Python code with examples of application of the validation bias detection algorithm is available at github.com/ArtomovLab/ValidationBias.
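The validation bias described in this abstract can be illustrated with a toy sketch. This is not the authors' detection algorithm (that code lives in their repository); it only shows how a validation set that is not selected completely at random can inflate a recall estimate. The scores, the threshold, and the `recall` helper are hypothetical.

```python
def recall(scores, threshold):
    """Fraction of positive examples scored at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

# Hypothetical model scores for every true positive gene (unknown in practice).
all_positive_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# A validation set that violates SCAR: only high-scoring positives were labeled.
biased_validation_scores = [0.9, 0.8, 0.7, 0.6]

threshold = 0.5
true_recall = recall(all_positive_scores, threshold)         # 4 of 8 recovered
biased_recall = recall(biased_validation_scores, threshold)  # 4 of 4 recovered

print(f"recall on all positives:     {true_recall:.2f}")
print(f"recall on biased validation: {biased_recall:.2f}")
```

In this toy case the biased validation set reports perfect recall while only half of the true positives are recovered, which is the kind of gap the abstract calls validation bias.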
Affiliation(s)
- Ivan Molotkov
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, United States
  - Department of Pediatrics, The Ohio State University, Columbus, OH, United States
  - ITMO University, Saint Petersburg, Russia
- Mykyta Artomov
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, United States
  - Department of Pediatrics, The Ohio State University, Columbus, OH, United States
4. Cüvitoğlu A, Isik Z. Network neighborhood operates as a drug repositioning method for cancer treatment. PeerJ 2023; 11:e15624. [PMID: 37456868 PMCID: PMC10340098 DOI: 10.7717/peerj.15624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 06/01/2023] [Indexed: 07/18/2023] Open
Abstract
Computational drug repositioning approaches are important, as they cost less compared to the traditional drug development processes. This study proposes a novel network-based drug repositioning approach, which computes similarities between disease-causing genes and drug-affected genes in a network topology to suggest candidate drugs with the highest similarity scores. This new method aims to identify better treatment options by integrating systems biology approaches. It uses a protein-protein interaction network that is the main topology to compute a similarity score between candidate drugs and disease-causing genes. The disease-causing genes were mapped on this network structure. Transcriptome profiles of drug candidates were taken from the LINCS project and mapped individually on the network structure. The similarity of these two networks was calculated by different network neighborhood metrics, including Adamic-Adar, PageRank and neighborhood scoring. The proposed approach identifies the best candidates by choosing the drugs with significant similarity scores. The method was tested on melanoma, colorectal, and prostate cancers. Several candidate drugs were predicted by applying AUC values of 0.6 or higher. Some of the predictions were supported by clinical phase trials or other in vivo studies found in the literature. The proposed drug repositioning approach would suggest better treatment options with integration of functional information between genes and transcriptome level effects of drug perturbations and diseases.
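Of the neighborhood metrics named in this abstract, the Adamic-Adar index is the simplest to show. The sketch below computes the standard index between two nodes of a small undirected graph. The toy edge list and node names are hypothetical, and how the paper aggregates such scores across disease and drug gene sets is not stated in the abstract.

```python
import math

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def adamic_adar(adj, u, v):
    """Adamic-Adar index for distinct nodes u and v.

    Sums 1 / log(degree) over the common neighbours of u and v. A common
    neighbour of two distinct nodes has degree >= 2, so the log is positive.
    """
    common = adj[u] & adj[v]
    return sum(1.0 / math.log(len(adj[w])) for w in common)

# Hypothetical toy interaction network: G1 is a disease gene, G2 a drug-affected gene.
edges = [("G1", "P1"), ("G2", "P1"), ("G1", "P2"), ("G2", "P2"), ("P2", "P3")]
adj = build_adjacency(edges)

# P1 (degree 2) and P2 (degree 3) are the shared neighbours.
score = adamic_adar(adj, "G1", "G2")
print(f"Adamic-Adar(G1, G2) = {score:.4f}")
```

Shared neighbours with few connections contribute more to the score than hubs, which is why the index is often preferred over a plain common-neighbour count in dense interaction networks.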
Affiliation(s)
- Ali Cüvitoğlu
  - The Graduate School of Natural and Applied Sciences, Dokuz Eylül University, Izmir, Turkiye
- Zerrin Isik
  - Computer Engineering Department, Engineering Faculty, Dokuz Eylül University, Izmir, Turkiye
5. Ünsal Ü, Cüvitoğlu A, Turhan K, Işık Z. NMSDR: Drug repurposing approach based on transcriptome data and network module similarity. Mol Inform 2023; 42:e2200077. [PMID: 36411244 DOI: 10.1002/minf.202200077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 09/19/2022] [Accepted: 11/21/2022] [Indexed: 11/23/2022]
Abstract
Computational drug repurposing aims to discover new treatment regimens by analyzing approved drugs on the market. This study proposes previously approved compounds that can change the expression profile of disease-causing proteins by developing a network theory-based drug repurposing approach. The novelty of the proposed approach is an exploration of module similarity between a disease-causing network and a compound-specific interaction network; thus, such an association leads to more realistic modeling of molecular cell responses at a system biology level. The overlap of the disease network and each compound-specific network is calculated based on a shortest-path similarity of networks by accounting for all protein pairs between networks. A higher similarity score indicates a significant potential of a compound. The approach was validated for breast and lung cancers. When all compounds are sorted by their normalized-similarity scores, 36 and 16 drugs are proposed as new candidates for breast and lung cancer treatment, respectively. A literature survey on candidate compounds revealed that some of our predictions have been clinically investigated in phase II/III trials for the treatment of two cancer types. As a summary, the proposed approach has provided promising initial results by modeling biochemical cell responses in a network-level data representation.
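A shortest-path similarity over all protein pairs between two networks could be sketched as follows. The abstract does not give the scoring formula, so the `1 / (1 + distance)` weighting, the averaging, and the toy graph are all assumptions made for illustration and should not be read as the NMSDR definition.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def module_similarity(adj, disease_nodes, compound_nodes):
    """Mean of 1 / (1 + d) over all disease-compound protein pairs (assumed formula).

    Unreachable pairs contribute 0, so disconnected modules score low.
    """
    total, pairs = 0.0, 0
    for d in disease_nodes:
        dist = bfs_distances(adj, d)
        for c in compound_nodes:
            pairs += 1
            if c in dist:
                total += 1.0 / (1 + dist[c])
    return total / pairs if pairs else 0.0

# Hypothetical path-shaped interaction network: A - B - C - D
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

# Distances from A: B is 1 step away, D is 3 steps away.
similarity = module_similarity(adj, ["A"], ["B", "D"])
print(f"similarity = {similarity:.3f}")
```

Under this assumed scoring, compounds whose target proteins sit close to the disease module in the network receive a higher value, matching the abstract's statement that a higher similarity score indicates greater potential.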
Affiliation(s)
- Ülkü Ünsal
  - Department of Biostatistics and Medical Informatics, Karadeniz Technical University, 61080, Trabzon, Türkiye
  - Department of Health Management, Karadeniz Technical University, 61080, Trabzon, Türkiye
- Ali Cüvitoğlu
  - Department of Computer Engineering, Dokuz Eylul University, 35390, İzmir, Türkiye
- Kemal Turhan
  - Department of Biostatistics and Medical Informatics, Karadeniz Technical University, 61080, Trabzon, Türkiye
- Zerrin Işık
  - Department of Computer Engineering, Dokuz Eylul University, 35390, İzmir, Türkiye
6. Wu BS, Zhang YR, Yang L, Zhang W, Deng YT, Chen SD, Feng JF, Cheng W, Yu JT. Polygenic Liability to Alzheimer's Disease Is Associated with a Wide Range of Chronic Diseases: A Cohort Study of 312,305 Participants. J Alzheimers Dis 2023; 91:437-447. [PMID: 36442194 DOI: 10.3233/jad-220740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Alzheimer's disease (AD) patients rank among the highest levels of comorbidities compared to persons with other diseases. However, it is unclear whether the conditions are caused by shared pathophysiology due to the genetic pleiotropy for AD risk genes.
OBJECTIVE: To figure out the genetic pleiotropy for AD risk genes in a wide range of diseases.
METHODS: We estimated the polygenic risk score (PRS) for AD and tested the association between PRS and 16 ICD10 main chapters, 136 ICD10 level-1 chapters, and 377 diseases with more than 1,000 cases in 312,305 individuals without AD diagnosis from the UK Biobank.
RESULTS: After correction for multiple testing, AD PRS was associated with two main ICD10 chapters: Chapter IV (endocrine, nutritional and metabolic diseases) and Chapter VII (eye and adnexa disorders). When narrowing the definition of the phenotypes, positive associations were observed between AD PRS and other types of dementia (OR = 1.39, 95% CI [1.34, 1.45], p = 1.96E-59) and other degenerative diseases of the nervous system (OR = 1.18, 95% CI [1.13, 1.24], p = 7.74E-10). In contrast, we detected negative associations between AD PRS and diabetes mellitus, obesity, chronic bronchitis, other retinal disorders, pancreas diseases, and cholecystitis without cholelithiasis (ORs range from 0.94 to 0.97, FDR < 0.05).
CONCLUSION: Our study confirms several associations reported previously and finds some novel results, which extends the knowledge of genetic pleiotropy for AD in a range of diseases. Further mechanistic studies are necessary to illustrate the molecular mechanisms behind these associations.
Affiliation(s)
- Bang-Sheng Wu
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Ya-Ru Zhang
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Liu Yang
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Wei Zhang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Yue-Ting Deng
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Shi-Dong Chen
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
- Jian-Feng Feng
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Department of Computer Science, University of Warwick, Coventry, UK
- Wei Cheng
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Ministry of Education, Shanghai, China
  - Fudan ISTBI-ZJNU Algorithm Centre for Brain-Inspired Intelligence, Zhejiang Normal University, Jinhua, China
- Jin-Tai Yu
  - Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China
7. Functional regulations between genetic alteration-driven genes and drug target genes acting as prognostic biomarkers in breast cancer. Sci Rep 2022; 12:10641. [PMID: 35739271 PMCID: PMC9226112 DOI: 10.1038/s41598-022-13835-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/26/2022] [Accepted: 05/30/2022] [Indexed: 12/19/2022] Open
Abstract
Differences in genetic molecular features including mutation, copy number alterations and DNA methylation, can explain interindividual variability in response to anti-cancer drugs in cancer patients. However, identifying genetic alteration-driven genes and characterizing their functional mechanisms in different cancer types are still major challenges for cancer studies. Here, we systematically identified functional regulations between genetic alteration-driven genes and drug target genes and their potential prognostic roles in breast cancer. We identified two mutation and copy number-driven gene pairs (PARP1-ACSL1 and PARP1-SRD5A3), three DNA methylation-driven gene pairs (PRLR-CDKN1C, PRLR-PODXL2 and PRLR-SRD5A3), six gene pairs between mutation-driven genes and drug target genes (SLC19A1-SLC47A2, SLC19A1-SRD5A3, AKR1C3-SLC19A1, ABCB1-SRD5A3, NR3C2-SRD5A3 and AKR1C3-SRD5A3), and four copy number-driven gene pairs (ADIPOR2-SRD5A3, CASP12-SRD5A3, SLC39A11-SRD5A3 and GALNT2-SRD5A3) that all served as prognostic biomarkers of breast cancer. In particular, PARP1 was found to be upregulated by simultaneous copy number amplification and gene mutation. Copy number deletion and downregulated expression of ACSL1 and upregulation of SRD5A3 were both observed in breast cancers. Moreover, copy number deletion of ACSL1 was associated with increased resistance to PARP inhibitors. The PARP1-ACSL1 pair significantly correlated with poor overall survival in breast cancer owing to the suppression of the MAPK, mTOR and NF-kB signaling pathways, which induces apoptosis, autophagy and prevents inflammatory processes. Loss of SRD5A3 expression was also associated with increased sensitivity to PARP inhibitors. The PARP1-SRD5A3 pair significantly correlated with poor overall survival in breast cancer through regulating androgen receptors to induce cell proliferation.
These results demonstrate that genetic alteration-driven gene pairs might serve as potential biomarkers for the prognosis of breast cancer and facilitate the identification of combination therapeutic targets for breast cancers.
8. Frederiksen SD. Prioritizing Suggestive Candidate Genes in Migraine: An Opinion. Front Neurol 2022; 13:910366. [PMID: 35785356 PMCID: PMC9240222 DOI: 10.3389/fneur.2022.910366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 05/19/2022] [Indexed: 11/13/2022] Open
9. Isik Z, Leblebici A, Demir Karaman E, Karaca C, Ellidokuz H, Koc A, Ellidokuz EB, Basbinar Y. In silico identification of novel biomarkers for key players in transition from normal colon tissue to adenomatous polyps. PLoS One 2022; 17:e0267973. [PMID: 35486660 PMCID: PMC9053805 DOI: 10.1371/journal.pone.0267973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2021] [Accepted: 04/19/2022] [Indexed: 11/18/2022] Open
Abstract
Adenomatous polyps of the colon are the most common neoplastic polyps. Although most adenomatous polyps do not show malignant transformation, the majority of colorectal carcinomas originate from neoplastic polyps. Therefore, understanding this transformation process would help in both preventive therapies and evaluation of malignancy risks. This study uncovers alterations in gene expression as potential biomarkers that are revealed by integration of several network-based approaches. In silico analysis was performed on a unified microarray cohort covering 150 normal colon and adenomatous polyp samples. Significant gene modules were obtained by a weighted gene co-expression network analysis. Gene modules with similar profiles were mapped to a colon tissue specific functional interaction network. Several clustering algorithms were run on the colon-specific network and the most significant sub-modules between the clusters were identified. The biomarkers were selected by filtering differentially expressed genes that are also involved in significant biological processes and pathways. Biomarkers were also validated on two independent datasets based on their differential gene expression. To the best of our knowledge, such a cascaded network analysis pipeline was implemented for the first time on a large collection of normal colon and polyp samples. We identified significant increases in TLR4 and MSX1 expression as well as a decrease in chemokine profiles with mostly pro-tumoral activities. These biomarkers might appear as both preventive targets and biomarkers for risk evaluation. As a result, this research proposes novel molecular markers that might be an alternative to endoscopic approaches for diagnosis of adenomatous polyps.
Affiliation(s)
- Zerrin Isik
  - Faculty of Engineering, Department of Computer Engineering, Dokuz Eylul University, Izmir, Turkey
- Asım Leblebici
  - Department of Translational Oncology, Institute of Health Sciences, Dokuz Eylul University, Izmir, Turkey
- Ezgi Demir Karaman
  - Department of Computer Engineering, Institute of Natural and Applied Sciences, Dokuz Eylul University, Izmir, Turkey
- Caner Karaca
  - Department of Translational Oncology, Institute of Health Sciences, Dokuz Eylul University, Izmir, Turkey
- Hulya Ellidokuz
  - Department of Preventive Oncology, Institute of Oncology, Dokuz Eylul University, Izmir, Turkey
- Altug Koc
  - Gentan Genetic Medical Genetics Diagnosis Center, Izmir, Turkey
- Ender Berat Ellidokuz
  - Faculty of Medicine, Department of Gastroenterology, Dokuz Eylul University, Izmir, Turkey
- Yasemin Basbinar
  - Department of Translational Oncology, Institute of Oncology, Dokuz Eylul University, Izmir, Turkey
10. Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
11. López-Sánchez M, Loucera C, Peña-Chilet M, Dopazo J. Discovering potential interactions between rare diseases and COVID-19 by combining mechanistic models of viral infection with statistical modeling. Hum Mol Genet 2022; 31:2078-2089. [PMID: 35022696 PMCID: PMC9239744 DOI: 10.1093/hmg/ddac007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Revised: 12/30/2021] [Accepted: 01/10/2022] [Indexed: 11/28/2022] Open
Abstract
Recent studies have demonstrated a relevant role of the host genetics in the coronavirus disease 2019 (COVID-19) prognosis. Most of the 7000 rare diseases described to date have a genetic component, typically highly penetrant. However, this vast spectrum of genetic variability remains yet unexplored with respect to possible interactions with COVID-19. Here, a mathematical mechanistic model of the COVID-19 molecular disease mechanism has been used to detect potential interactions between rare disease genes and the COVID-19 infection process and downstream consequences. Out of the 2518 disease genes analyzed, causative of 3854 rare diseases, a total of 254 genes have a direct effect on the COVID-19 molecular disease mechanism and 207 have an indirect effect revealed by a significant strong correlation. This remarkable potential of interaction occurs for >300 rare diseases. Mechanistic modeling of COVID-19 disease map has allowed a holistic systematic analysis of the potential interactions between the loss of function in known rare disease genes and the pathological consequences of COVID-19 infection. The results identify links between disease genes and COVID-19 hallmarks and demonstrate the usefulness of the proposed approach for future preventive measures in some rare diseases.
Affiliation(s)
- Macarena López-Sánchez
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio. 41013. Sevilla. Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio. 41013. Sevilla. Spain
- Carlos Loucera
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio. 41013. Sevilla. Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio. 41013. Sevilla. Spain
- María Peña-Chilet
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio. 41013. Sevilla. Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio. 41013. Sevilla. Spain
  - Bioinformatics in Rare Diseases (BiER). Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío. 41013. Sevilla, Spain
- Joaquín Dopazo
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio. 41013. Sevilla. Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio. 41013. Sevilla. Spain
  - Bioinformatics in Rare Diseases (BiER). Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío. 41013. Sevilla, Spain
  - FPS/ELIXIR-es, Hospital Virgen del Rocío, Sevilla, 42013, Spain
12. Su Y, Chen X, Zhou H, Shaw S, Chen J, Isales CM, Zhao J, Shi X. Expression of long noncoding RNA Xist is induced by glucocorticoids. Front Endocrinol (Lausanne) 2022; 13:1005944. [PMID: 36187119 PMCID: PMC9516292 DOI: 10.3389/fendo.2022.1005944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 08/29/2022] [Indexed: 11/19/2022] Open
Abstract
Glucocorticoids (GCs) are potent anti-inflammatory and immunosuppressive agents. However, their clinical usage is limited by severe multisystemic side effects. Glucocorticoid induced osteoporosis results in significant morbidity and mortality but the cellular and molecular mechanisms underlying GC-induced bone loss are not clear. GC use results in decreased osteoblast differentiation with increased marrow adiposity through effects on bone marrow stem cells. GC effects are transduced through its receptor (GR). To identify novel GR regulated genes, we performed RNA sequencing (RNA-Seq) analysis comparing conditional GR knockout mouse made by crossing the floxed GR animal with the Col I promoter-Cre, versus normal floxed GR without Cre, and that testing was specific for Col I promoter active cells, such as bone marrow mesenchymal stem/osteoprogenitor cells (MSCs) and osteoblasts. Results showed 15 upregulated genes (3- to 10-fold) and 70 downregulated genes (-2.7- to -10-fold), with the long noncoding RNA X-inactive specific transcript (Xist) downregulated the most. The differential expression of genes measured by RNA-Seq was validated by qRT-PCR analysis of selected genes and the GC/GR signaling-dependent expression of Xist was further demonstrated by GC (dexamethasone) treatment of GR-deficient MSCs in vitro and by GC injection of C57BL/6 mice (wild-type males and females) in vivo. Our data revealed that the long noncoding RNA Xist is a GR regulated gene and its expression is induced by GC both in vitro and in vivo. To our knowledge, this is the first evidence showing that Xist is transcriptionally regulated by GC/GR signaling.
Affiliation(s)
- Yun Su
  - Department of Neuroscience & Regenerative Medicine, Augusta University, Augusta, GA, United States
- Xing Chen
  - Department of Mathematics, Logistical Engineering University, Chongqing, China
- Hongyan Zhou
  - Department of Neuroscience & Regenerative Medicine, Augusta University, Augusta, GA, United States
  - Department of Pathology and Pathophysiology, School of Medicine, Jianghan University, Wuhan, China
- Sean Shaw
  - Department of Neuroscience & Regenerative Medicine, Augusta University, Augusta, GA, United States
- Jie Chen
  - Division of Biostatistics and Data Science, Department of Population Health Sciences, Augusta University, Augusta, GA, United States
- Carlos M. Isales
  - Department of Neuroscience & Regenerative Medicine, Augusta University, Augusta, GA, United States
  - Department of Orthopaedic Surgery, Augusta University, Augusta, GA, United States
- Jing Zhao
  - Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Xingming Shi
  - Department of Neuroscience & Regenerative Medicine, Augusta University, Augusta, GA, United States
  - Department of Orthopaedic Surgery, Augusta University, Augusta, GA, United States
  - *Correspondence: Xingming Shi,
|
13
|
Shu J, Li Y, Wang S, Xi B, Ma J. Disease gene prediction with privileged information and heteroscedastic dropout. Bioinformatics 2021; 37:i410-i417. [PMID: 34252957 PMCID: PMC8275341 DOI: 10.1093/bioinformatics/btab310] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 04/24/2021] [Indexed: 11/19/2022] Open
Abstract
Motivation: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models. Results: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at the training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model could still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared it with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model could improve the prediction accuracy compared to other methods when all the features are available. More importantly, our model could make very accurate predictions when >90% of the features are missing at the test stage. Availability and implementation: Our method is realized with Python 3.7 and Pytorch 1.5.0, and the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
Affiliation(s)
- Juan Shu
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bowei Xi
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Jianzhu Ma
  - Institute for Artificial Intelligence, Peking University, Beijing 100871, China
|
14
|
Paliwal S, de Giorgio A, Neil D, Michel JB, Lacoste AM. Preclinical validation of therapeutic targets predicted by tensor factorization on heterogeneous graphs. Sci Rep 2020; 10:18250. [PMID: 33106501 PMCID: PMC7589557 DOI: 10.1038/s41598-020-74922-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/22/2020] [Accepted: 09/30/2020] [Indexed: 12/04/2022] Open
Abstract
Incorrect drug target identification is a major obstacle in drug discovery. Only 15% of drugs advance from Phase II to approval, with ineffective targets accounting for over 50% of these failures [1-3]. Advances in data fusion and computational modeling have independently progressed towards addressing this issue. Here, we capitalize on both these approaches with Rosalind, a comprehensive gene prioritization method that combines heterogeneous knowledge graph construction with relational inference via tensor factorization to accurately predict disease-gene links. Rosalind demonstrates an increase in performance of 18%-50% over five comparable state-of-the-art algorithms. On historical data, Rosalind prospectively identifies 1 in 4 therapeutic relationships eventually proven true. Beyond efficacy, Rosalind is able to accurately predict clinical trial successes (75% recall at rank 200) and distinguish likely failures (74% recall at rank 200). Lastly, Rosalind predictions were experimentally tested in a patient-derived in vitro assay for rheumatoid arthritis (RA), which yielded 5 promising genes, one of which is unexplored in RA.
Affiliation(s)
- Saee Paliwal
  - BenevolentAI, 1 Dock72 Way, 7th Floor, Brooklyn, NY, 11205, USA
- Alex de Giorgio
  - BenevolentAI, 4-6 Maple Street, Bloomsbury, London, W1T5HD, UK
- Daniel Neil
  - BenevolentAI, 1 Dock72 Way, 7th Floor, Brooklyn, NY, 11205, USA
- Alix Mb Lacoste
  - BenevolentAI, 1 Dock72 Way, 7th Floor, Brooklyn, NY, 11205, USA
|
15
|
Peng C, Zheng Y, Huang DS. Capsule Network Based Modeling of Multi-omics Data for Discovery of Breast Cancer-Related Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1605-1612. [PMID: 30969931 DOI: 10.1109/tcbb.2019.2909905] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 05/23/2023]
Abstract
Breast cancer is one of the most common cancers worldwide, causing more than 450,000 deaths each year. Although this malignancy has been extensively studied by a large number of researchers, its prognosis is still poor. Since therapeutic advances can be obtained based on gene signatures, there is an urgent need to discover genes related to breast cancer that may help uncover the mechanisms of cancer progression. We propose a deep learning method for the discovery of breast cancer-related genes by using Capsule Network based Modeling of Multi-omics Data (CapsNetMMD). In CapsNetMMD, we make use of known breast cancer-related genes to transform the issue of gene identification into the issue of supervised classification. The features of genes are generated through comprehensive integration of multi-omics data, e.g., mRNA expression, z scores for mRNA expression, DNA methylation, and two forms of DNA copy-number alterations (CNAs). By modeling features based on the capsule network, we identify breast cancer-related genes with significantly better performance than other existing machine learning methods. The predicted genes with prognostic value potentially play important roles in breast cancer and may serve as candidates for biologists and medical scientists in future studies of biomarkers.
|
16
|
Yang K, Wang R, Liu G, Shu Z, Wang N, Zhang R, Yu J, Chen J, Li X, Zhou X. HerGePred: Heterogeneous Network Embedding Representation for Disease Gene Prediction. IEEE J Biomed Health Inform 2020; 23:1805-1815. [PMID: 31283472 DOI: 10.1109/jbhi.2018.2870728] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 11/06/2022]
Abstract
The discovery of disease-causing genes is a critical step towards understanding the nature of a disease and determining a possible cure for it. In recent years, many computational methods to identify disease genes have been proposed. However, making full use of disease-related (e.g., symptoms) and gene-related (e.g., gene ontology and protein-protein interactions) information to improve the performance of disease gene prediction is still an issue. Here, we develop a heterogeneous disease-gene-related network (HDGN) embedding representation framework for disease gene prediction (called HerGePred). Based on this framework, a low-dimensional vector representation (LVR) of the nodes in the HDGN can be obtained. Then, we propose two specific algorithms, namely, an LVR-based similarity prediction and a random walk with restart on a reconstructed heterogeneous disease-gene network (RW-RDGN), to predict disease genes with high performance. First, to validate the rationality of the framework, we analyze the similarity-based overlap distribution of disease pairs and design an experiment for disease-gene association recovery, the results of which revealed that the LVR of nodes performs well at preserving the local and global network structure of the HDGN. Then, we apply tenfold cross validation and external validation to compare our methods with other well-known disease gene prediction algorithms. The experimental results show that the RW-RDGN performs better than the state-of-the-art algorithm. The prediction results of disease candidate genes are essential for molecular mechanism investigation and experimental validation. The source codes of HerGePred and experimental data are available at https://github.com/yangkuoone/HerGePred.
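As a rough illustration of the random walk with restart idea named in this abstract, the sketch below ranks nodes of a small graph by their proximity to a seed node. It is a generic textbook formulation, not the HerGePred or RW-RDGN implementation: the toy graph, the node names (`D1`, `G1` to `G4`), the function name, and the restart probability of 0.3 are all assumptions made for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on an undirected graph.

    adj     -- dict mapping each node to a list of its neighbours
    seeds   -- set of nodes the walker jumps back to on restart
    restart -- probability of restarting at each step (hypothetical default)
    Returns a dict node -> steady-state visiting probability.
    Assumes every node has at least one neighbour.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Start from the restart mass, then add the mass walked along edges.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph: one disease node linked to a chain of gene nodes.
toy_graph = {
    "D1": ["G1"],
    "G1": ["D1", "G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
}
scores = random_walk_with_restart(toy_graph, seeds={"D1"})
ranked_genes = sorted((n for n in scores if n.startswith("G")),
                      key=scores.get, reverse=True)
```

In this toy setting, gene nodes closer to the seed disease receive higher scores, which is the intuition behind using such scores to rank candidate genes.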
|
17
|
Yao Y, Ramsey SA. CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs. Pacific Symposium on Biocomputing 2020; 25:535-546. [PMID: 31797625 PMCID: PMC6897322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
Affiliation(s)
- Yao Yao
  - School of Electrical Engineering and Computer Science, Oregon State University
- Stephen A. Ramsey
  - School of Electrical Engineering and Computer Science, Oregon State University
  - Department of Biomedical Sciences, Oregon State University, Corvallis, OR 97330, USA
|
18
|
Iourov IY, Vorsanova SG, Yurov YB. The variome concept: focus on CNVariome. Mol Cytogenet 2019; 12:52. [PMID: 31890032 PMCID: PMC6924070 DOI: 10.1186/s13039-019-0467-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/20/2019] [Accepted: 12/13/2019] [Indexed: 02/07/2023] Open
Abstract
Background: The term variome may be used to designate the complex system of interplay between genomic variations specific to an individual or a disease. Despite the recognized complexity of the genomic basis of phenotypic traits and diseases, studies of the genetic causes of a disease are usually dedicated to the identification of single causative genomic changes (mutations). When such an artificially simplified model is employed, the genomic basis of phenotypic outcomes remains elusive in the overwhelming majority of human diseases. Moreover, it has been repeatedly demonstrated that multiple genomic changes within an individual genome are likely to underlie the phenome. Probably the best example of the cumulative effect of the variome on the phenotype is CNV (copy number variation) burden. Accordingly, we have proposed a variome concept based on CNV studies providing evidence for the existence of a CNVariome (the set of CNVs affecting an individual genome), a target for genomic analyses useful for unraveling the genetic mechanisms of diseases and phenotypic traits. Conclusion: The variome (CNVariome) concept suggests that a genomic milieu is determined by the whole set of genomic variations (CNVs) within an individual genome. The genomic milieu is likely to result from interplay between these variations. Furthermore, such a variome may be either individual- or disease-specific. Additionally, such a variome may be pathway-specific. The latter is able to affect molecular/cellular pathways of genome stability maintenance, leading to genomic/chromosome instability and/or somatic mosaicism and resulting in a somatic variome. This variome type seems to be important for unraveling disease mechanisms as well. Finally, it appears that bioinformatic analysis of both individual and somatic variomes in the context of disease- and pathway-specific variomes is the most promising way to determine the genomic basis of the phenome and to unravel disease mechanisms for the management and treatment of currently incurable diseases.
Affiliation(s)
- Ivan Y Iourov
  - Yurov's Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia
  - Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
- Svetlana G Vorsanova
  - Yurov's Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia
  - Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
- Yuri B Yurov
  - Yurov's Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia
  - Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
|
19
|
Zhang HL, Long JW, Han W, Wang J, Song W, Lin GN, Yin DM. Comparative analysis of cellular expression pattern of schizophrenia risk genes in human versus mouse cortex. Cell Biosci 2019; 9:89. [PMID: 31700606 PMCID: PMC6829839 DOI: 10.1186/s13578-019-0352-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/30/2019] [Accepted: 10/23/2019] [Indexed: 01/07/2023] Open
Abstract
Background: Schizophrenia is a common psychiatric disease with high heritability. The identification of schizophrenia risk genes (SRG) has shed light on its pathophysiological mechanisms. Mouse genetic models have been widely used to study the function of SRG in the brain in a cell type-specific fashion. However, whether the cellular expression pattern of SRG is conserved between human and mouse brain has not been thoroughly studied. Results: We analyzed the single-cell transcription of 180 SRG from human and mouse primary visual cortex (V1). We compared the percentage of glutamatergic, GABAergic and non-neuronal cells that express each SRG between mouse and human V1 cortex. Thirty percent (54/180) of SRG had a significantly different expression rate in glutamatergic neurons between mouse and human V1 cortex. By contrast, only 5.6% (10/180) of SRG showed significantly different expression in GABAergic neurons, which is similar to the ratio of SRG (15/180) with a species difference in total cell populations. Strikingly, the percentages of non-neuronal cells expressing each SRG are indistinguishable between human and mouse V1 cortex. We further analyzed the biological significance of differentially expressed SRG by gene ontology. The species-different SRG in glutamatergic neurons are highly expressed in dendrites and axons. They are enriched in the biological process of response to stimulus. However, the differentially expressed SRG in GABAergic neurons are enriched in the regulation of organelle organization. Conclusion: GABAergic neurons are more conserved in the expression of SRG than glutamatergic neurons, while non-neuronal cells show species conservation for the expression of all SRG. Caution is warranted when using mouse models to study those SRG that show different cellular expression patterns between human and mouse cortex.
Affiliation(s)
- Hai-Long Zhang
  - Key Laboratory of Brain Functional Genomics, Ministry of Education and Shanghai, School of Life Science, East China Normal University, Shanghai, 200062 China
- Jia-Wen Long
  - Key Laboratory of Brain Functional Genomics, Ministry of Education and Shanghai, School of Life Science, East China Normal University, Shanghai, 200062 China
- Wei Han
  - Key Laboratory of Brain Functional Genomics, Ministry of Education and Shanghai, School of Life Science, East China Normal University, Shanghai, 200062 China
- Jiuzhou Wang
  - Department of Mathematics, Southern University of Science and Technology, Shenzhen, China
- Weichen Song
  - Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Guan Ning Lin
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Dong-Min Yin
  - Key Laboratory of Brain Functional Genomics, Ministry of Education and Shanghai, School of Life Science, East China Normal University, Shanghai, 200062 China
|
20
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. An International Journal on Information Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang
  - Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg
  - Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
|
21
|
Chagoyen M, Ranea JAG, Pazos F. Applications of molecular networks in biomedicine. Biol Methods Protoc 2019; 4:bpz012. [PMID: 32395629 PMCID: PMC7200821 DOI: 10.1093/biomethods/bpz012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/30/2019] [Revised: 08/20/2019] [Accepted: 08/28/2019] [Indexed: 12/12/2022] Open
Abstract
Due to the large interdependence between the molecular components of living systems, many phenomena, including those related to pathologies, cannot be explained in terms of a single gene or a small number of genes. Molecular networks, representing different types of relationships between molecular entities, embody these large sets of interdependences in a framework that allows them to be mined for information from a systemic point of view. These networks, often generated from high-throughput omics datasets, are used to study the complex phenomena of human pathologies from a systemic point of view. Complementing the reductionist approach of molecular biology, based on the detailed study of a small number of genes, systemic approaches to human diseases consider that these are better reflected in large and intricate networks of relationships between genes. These networks, and not the single genes, provide both better markers for diagnosing diseases and targets for treating them. Network approaches are being used to gain insight into the molecular basis of complex diseases and interpret the large datasets associated with them, such as genomic variants. Network formalism is also suitable for integrating large, heterogeneous and multilevel datasets associated with diseases from the molecular level to organismal and epidemiological scales. Many of these approaches are available to nonexpert users through standard software packages.
Affiliation(s)
- Monica Chagoyen
  - Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
- Juan A G Ranea
  - Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
  - CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Florencio Pazos
  - Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
|
22
|
Wani N, Raza K. Integrative approaches to reconstruct regulatory networks from multi-omics data: A review of state-of-the-art methods. Comput Biol Chem 2019; 83:107120. [PMID: 31499298 DOI: 10.1016/j.compbiolchem.2019.107120] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/03/2018] [Revised: 02/22/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Data generation using high-throughput technologies has led to the accumulation of diverse types of molecular data. These data have different types (discrete, real, string, etc.) and occur in various formats and sizes. Datasets including gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-ChIP), mutation data (copy number variation, single nucleotide polymorphisms), annotations, interactions, and association data are some of the commonly used biological datasets to study various cellular mechanisms of living organisms. Each of them provides a unique, complementary and partly independent view of the genome and hence embeds essential information about the regulatory mechanisms of genes and their products. Therefore, integrating these data and inferring regulatory interactions from them offers a system-level biological insight for predicting gene functions and their phenotypic outcomes. To study genome functionality through regulatory networks, different methods have been proposed for collective mining of information from an integrated dataset. We survey here integration methods that reconstruct regulatory networks using state-of-the-art techniques to handle multi-omics (i.e., genomic, transcriptomic, proteomic) and other biological datasets.
Affiliation(s)
- Nisar Wani
  - Govt. Degree College Baramulla, J & K, India
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
|
23
|
Zhou X, Dai E, Song Q, Ma X, Meng Q, Jiang Y, Jiang W. In silico drug repositioning based on drug-miRNA associations. Brief Bioinform 2019; 21:498-510. [DOI: 10.1093/bib/bbz012] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/02/2018] [Revised: 12/14/2018] [Accepted: 01/11/2019] [Indexed: 02/06/2023] Open
Abstract
Drug repositioning has become a prevailing tactic as this strategy is efficient, economical and low risk for drug discovery. Meanwhile, recent studies have confirmed that small-molecule drugs can modulate the expression of disease-related miRNAs, which indicates that miRNAs are promising therapeutic targets for complex diseases. In this study, we put forward and verified the hypothesis that drugs with similar miRNA profiles may share similar therapeutic properties. Furthermore, a comprehensive drug–drug interaction network was constructed based on curated drug-miRNA associations. Through random network comparison, topological structure analysis and network module extraction, we found that the closely linked drugs in the network tend to treat the same diseases. Additionally, the curated drug–disease relationships (from the CTD) and random walk with restarts algorithm were utilized on the drug–drug interaction network to identify the potential drugs for a given disease. Both internal validation (leave-one-out cross-validation) and external validation (independent drug–disease data set from the ChEMBL) demonstrated the effectiveness of the proposed approach. Finally, by integrating drug-miRNA and miRNA-disease information, we also explain the modes of action of drugs in the view of miRNA regulation. In summary, our work could determine novel and credible drug indications and offer novel insights and valuable perspectives for drug repositioning.
Affiliation(s)
- Xu Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, P. R. China
- Enyu Dai
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, P. R. China
- Qian Song
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, P. R. China
- Xueyan Ma
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, P. R. China
- Qianqian Meng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, P. R. China
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, P. R. China
- Wei Jiang
  - Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, P. R. China
|
24
|
Kveler K, Starosvetsky E, Ziv-Kenet A, Kalugny Y, Gorelik Y, Shalev-Malul G, Aizenbud-Reshef N, Dubovik T, Briller M, Campbell J, Rieckmann JC, Asbeh N, Rimar D, Meissner F, Wiser J, Shen-Orr SS. Immune-centric network of cytokines and cells in disease context identified by computational mining of PubMed. Nat Biotechnol 2018; 36:651-659. [PMID: 29912209 PMCID: PMC6035104 DOI: 10.1038/nbt.4152] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/16/2017] [Accepted: 04/05/2018] [Indexed: 02/07/2023]
Abstract
Cytokines are signaling molecules secreted and sensed by immune and other cell types, enabling dynamic intercellular communication. Although a vast amount of data on these interactions exists, this information is not compiled, integrated or easily searchable. Here we report immuneXpresso, a text-mining engine that structures and standardizes knowledge of immune intercellular communication. We applied immuneXpresso to PubMed to identify relationships between 340 cell types and 140 cytokines across thousands of diseases. The method is able to distinguish between incoming and outgoing interactions, and it includes the effect of the interaction and the cellular function involved. These factors are assigned a confidence score and linked to the disease. By leveraging the breadth of this network, we predicted and experimentally verified previously unappreciated cell-cytokine interactions. We also built a global immune-centric view of diseases and used it to predict cytokine-disease associations. This standardized knowledgebase (http://www.immunexpresso.org) opens up new directions for interpretation of immune data and model-driven systems immunology.
Collapse
Affiliation(s)
- Ksenya Kveler
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Elina Starosvetsky
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Amit Ziv-Kenet
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Yuval Kalugny
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
- CytoReason, Tel-Aviv, 67012, Israel
| | - Yuri Gorelik
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Gali Shalev-Malul
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Netta Aizenbud-Reshef
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Tania Dubovik
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Mayan Briller
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - John Campbell
- Northrop Grumman IT Health Solutions, Rockville, MD 20850, USA
| | - Jan C. Rieckmann
- Experimental Systems Immunology, Max Planck Institute of Biochemistry, Bayern, 82152, Germany
| | - Nuaman Asbeh
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
| | - Doron Rimar
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
- Rheumatology Unit, Bnai Zion Medical Center, Haifa 31048, Israel
| | - Felix Meissner
- Experimental Systems Immunology, Max Planck Institute of Biochemistry, Bayern, 82152, Germany
| | - Jeff Wiser
- Northrop Grumman IT Health Solutions, Rockville, MD 20850, USA
| | - Shai S. Shen-Orr
- Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3525433, Israel
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 3200003, Israel
| |
|
25
|
Shalev SA. Characteristics of genetic diseases in consanguineous populations in the genomic era: Lessons from Arab communities in North Israel. Clin Genet 2018; 95:3-9. [PMID: 29427439 DOI: 10.1111/cge.13231] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/17/2017] [Revised: 02/01/2018] [Accepted: 02/06/2018] [Indexed: 01/15/2023]
Abstract
The health outcome of consanguineous/endogamous unions is an increased risk of autosomal recessive disorders in their progeny. This manuscript is focused on consanguineous/endogamous populations living in North Israel. Molecular tools show that spouses' relatedness, and hence the risk of congenital diseases among their offspring, is often greater than the risk calculated on the basis of reported pedigrees. Revealing founder mutations allows for effective genetic counseling, but also prompts genetic screening of the whole community when the mutations are found to be frequent. More complex genetic mechanisms, such as co-inheritance of more than one condition, allelic heterogeneity and even locus heterogeneity, have been identified. These mechanisms make genetic counseling more challenging, but with the advancement of molecular techniques, diseases can be better deciphered. Yet the presence of multiple mutations responsible for genetic diseases in isolated populations, and occasionally locus heterogeneity of diseases, is an unexpected phenomenon that still needs mechanistic clarification. It seems probable that the challenges of genetic counseling and of estimating the risk of genetic morbidity in consanguineous/endogamous couples will be addressed by introducing high-throughput genetic technologies into daily practice. The genomic era has dramatically expanded the translation of research products into genetic counseling tools, and this trend is expected to have a stronger impact in the near future.
Affiliation(s)
- S A Shalev
- The Genetic Institute Emek Medical Center, Afula, Israel.,Rappaport Faculty of Medicine, Technion, Haifa, Israel
| |
|
26
|
Li Y, Sahni N, Yi S. Comparative analysis of protein interactome networks prioritizes candidate genes with cancer signatures. Oncotarget 2018; 7:78841-78849. [PMID: 27791983 PMCID: PMC5346681 DOI: 10.18632/oncotarget.12879] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/04/2016] [Accepted: 10/14/2016] [Indexed: 12/12/2022] Open
Abstract
Comprehensive understanding of human cancer mechanisms requires the identification of a thorough list of cancer-associated genes, which could serve as biomarkers for diagnoses and therapies in various types of cancer. Although substantial progress has been made in functional studies to uncover genes involved in cancer, these efforts are often time-consuming and costly. Therefore, it remains challenging to comprehensively identify cancer candidate genes. Network-based methods have accelerated this process through the analysis of complex molecular interactions in the cell. However, the extent to which various interactome networks can contribute to prediction of candidate genes responsible for cancer is still enigmatic. In this study, we evaluated different human protein-protein interactome networks and compared their application to cancer gene prioritization. Our results indicate that network analyses can increase the power to identify novel cancer genes. In particular, such predictive power can be enhanced with the use of unbiased systematic protein interaction maps for cancer gene prioritization. Functional analysis reveals that the top ranked genes from network predictions co-occur often with cancer-related terms in literature, and further, these candidate genes are indeed frequently mutated across cancers. Finally, our study suggests that integrating interactome networks with other omics datasets could provide novel insights into cancer-associated genes and underlying molecular mechanisms.
Affiliation(s)
- Yongsheng Li
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Nidhi Sahni
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.,Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Song Yi
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| |
|
27
|
Gao L, Uzun Y, Gao P, He B, Ma X, Wang J, Han S, Tan K. Identifying noncoding risk variants using disease-relevant gene regulatory networks. Nat Commun 2018; 9:702. [PMID: 29453388 PMCID: PMC5816022 DOI: 10.1038/s41467-018-03133-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 08/22/2017] [Accepted: 01/22/2018] [Indexed: 02/01/2023] Open
Abstract
Identifying noncoding risk variants remains a challenging task. Because noncoding variants exert their effects in the context of a gene regulatory network (GRN), we hypothesize that explicit use of disease-relevant GRNs can significantly improve the inference accuracy of noncoding risk variants. We describe Annotation of Regulatory Variants using Integrated Networks (ARVIN), a general computational framework for predicting causal noncoding variants. It employs a set of novel regulatory network-based features, combined with sequence-based features to infer noncoding risk variants. Using known causal variants in gene promoters and enhancers in a number of diseases, we show ARVIN outperforms state-of-the-art methods that use sequence-based features alone. Additional experimental validation using reporter assay further demonstrates the accuracy of ARVIN. Application of ARVIN to seven autoimmune diseases provides a holistic view of the gene subnetwork perturbed by the combinatorial action of the entire set of risk noncoding mutations.
Affiliation(s)
- Long Gao
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Yasin Uzun
- Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Peng Gao
- Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Bing He
- Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, 710126, Shaanxi, China
| | - Jiahui Wang
- The Jackson Laboratory, Farmington, CT, 06032, USA
| | - Shizhong Han
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
| | - Kai Tan
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Department of Cell & Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
28
|
Jalilvand A, Akbari B, Zare Mirakabad F. S-FLN: A sequence-based hierarchical approach for functional linkage network construction. J Theor Biol 2018; 437:149-162. [PMID: 29080781 DOI: 10.1016/j.jtbi.2017.10.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2016] [Revised: 07/27/2017] [Accepted: 10/18/2017] [Indexed: 11/24/2022]
Abstract
The construction of a functional linkage network (FLN) is a primary and important step in drug discovery and disease gene prioritization methods. Several methods have been introduced to construct FLNs by integrating various biological data. Although there are impressive ideas behind these methods, they suffer from the low quality of the biological data. In this paper, a hierarchical sequence-based approach is proposed to construct FLNs. The proposed approach, denoted S-FLN (Sequence-based Functional Linkage Network), uses protein sequences as the primary data in three main steps. First, the physicochemical properties of amino acids are employed to describe the functionality of proteins. Because protein sequences are more comprehensive and accurate primary data, more reliable relations are obtained. Second, seven different descriptor methods are used to extract feature vectors from the protein sequences. Using different descriptor methods yields diverse ensemble learners in the next step. Finally, a two-layer ensemble learning structure is proposed to calculate the score of protein pairs. The proposed approach was evaluated on two biological datasets, S. cerevisiae and H. pylori, and achieved precision rates of 93.9% and 91.15%, respectively. The results of various experiments indicate the efficiency and validity of the proposed approach.
Affiliation(s)
- A Jalilvand
- Department of Electronic and computer engineering,Tarbiat Modares University, Tehran, Iran
| | - B Akbari
- Department of Electronic and computer engineering,Tarbiat Modares University, Tehran, Iran.
| | - F Zare Mirakabad
- Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
| |
|
29
|
Chen Y, Kho AN, Liebovitz D, Ivory C, Osmundson S, Bian J, Malin BA. Learning bundled care opportunities from electronic medical records. J Biomed Inform 2018; 77:1-10. [PMID: 29174994 PMCID: PMC5771885 DOI: 10.1016/j.jbi.2017.11.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/15/2017] [Revised: 10/30/2017] [Accepted: 11/21/2017] [Indexed: 01/29/2023]
Abstract
OBJECTIVE: The traditional fee-for-service approach to healthcare can lead to the management of a patient's conditions in a siloed manner, inducing various negative consequences. It has been recognized that a bundled approach to healthcare - one that manages a collection of health conditions together - may enable greater efficacy and cost savings. However, it is not always evident which sets of conditions should be managed in a bundled manner. In this study, we investigate whether a data-driven approach can automatically learn potential bundles. METHODS: We designed a framework to infer health condition collections (HCCs) based on the similarity of their clinical workflows, according to electronic medical record (EMR) utilization. We evaluated the framework with data from over 16,500 inpatient stays from Northwestern Memorial Hospital in Chicago, Illinois. The plausibility of the inferred HCCs for bundled care was assessed through an online survey of a panel of five experts, whose responses were analyzed via an analysis of variance (ANOVA) at a 95% confidence level. We further assessed the face validity of the HCCs using evidence in the published literature. RESULTS: The framework inferred four HCCs, indicative of (1) fetal abnormalities, (2) late pregnancies, (3) prostate problems, and (4) chronic diseases, with congestive heart failure featuring prominently. Each HCC was substantiated with evidence in the literature and was deemed plausible for bundled care by the experts at a statistically significant level. CONCLUSIONS: The findings suggest that an automated, EMR data-driven framework can provide a basis for discovering bundled care opportunities. Still, translating such findings into actual care management will require further refinement, implementation, and evaluation.
Affiliation(s)
- You Chen
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.
| | - Abel N Kho
- Institute for Public Health and Medicine, Northwestern University, Chicago, IL, USA
| | | | - Catherine Ivory
- School of Nursing, Vanderbilt University, Nashville, TN, USA
| | - Sarah Osmundson
- Dept. of Obstetrics and Gynecology, School of Medicine, Vanderbilt University, Nashville, TN, USA
| | - Jiang Bian
- Dept. of Health Outcomes and Policy, University of Florida, Gainesville, FL, USA
| | - Bradley A Malin
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA; Dept. of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN, USA; Dept. of Electrical Engineering & Computer Science, School of Engineering, Vanderbilt University, Nashville, TN, USA
| |
|
30
|
|
31
|
Lin JR, Zhang Q, Cai Y, Morrow BE, Zhang ZD. Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies. PLoS Genet 2017; 13:e1007142. [PMID: 29281626 PMCID: PMC5760082 DOI: 10.1371/journal.pgen.1007142] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/06/2017] [Revised: 01/09/2018] [Accepted: 12/01/2017] [Indexed: 12/17/2022] Open
Abstract
Rare variants of major effect play an important role in human complex diseases and can be discovered by sequencing-based genome-wide association studies. Here, we introduce an integrated approach that combines the rare variant association test with gene network and phenotype information to identify risk genes implicated by rare variants for human complex diseases. Our data integration method follows a 'discovery-driven' strategy without relying on prior knowledge about the disease and thus maintains the unbiased character of genome-wide association studies. Simulations reveal that our method can outperform a widely-used rare variant association test method by 2 to 3 times. In a case study of a small disease cohort, we uncovered putative risk genes and the corresponding rare variants that may act as genetic modifiers of congenital heart disease in 22q11.2 deletion syndrome patients. These variants were missed by a conventional approach that relied on the rare variant association test alone. Case-control sequencing studies are a promising design to uncover risk genes of human complex diseases implicated by rare variants. The recent development of different types of rare variant association tests has improved the statistical power to identify disease genes that harbor risk rare variants. However, none of the recent sequencing-based genome-wide association studies identified robust disease association of rare variants or genes based on them. Due to limited sample sizes that can be feasibly achieved in real applications, current rare variant association tests can only generate marginal association signals for most risk genes. Here we proposed an integrated method that combined association signals with orthogonal biological evidence to uncover risk genes in sequencing studies. Designed to address the lack-of-power issue, our method was shown to effectively uncover risk genes with marginal association signals in data simulation. 
Indeed, in a real application demonstrated in our case study our method disclosed important risk genes of congenital heart disease in 22q11.2 deletion syndrome that were missed by the previous study.
Affiliation(s)
- Jhih-Rong Lin
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Quanwei Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Ying Cai
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Bernice E Morrow
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Zhengdong D Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
| |
|
32
|
Yang F, Wu D, Lin L, Yang J, Yang T, Zhao J. The integration of weighted gene association networks based on information entropy. PLoS One 2017; 12:e0190029. [PMID: 29272314 PMCID: PMC5741255 DOI: 10.1371/journal.pone.0190029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/27/2017] [Accepted: 12/06/2017] [Indexed: 01/18/2023] Open
Abstract
Constructing genome-scale weighted gene association networks (WGANs) from multiple data sources is one of the research hot spots in systems biology. In this paper, we employ information entropy to describe the degree of uncertainty of gene-gene links and propose a strategy for the data integration of weighted networks. We use this method to integrate four existing human weighted gene association networks and construct a much larger WGAN, which includes richer biological information while still keeping high functional relevance between linked gene pairs. The new WGAN shows satisfactory performance in disease gene prediction, which suggests the reliability of our integration strategy. Compared with existing integration methods, our method takes advantage of the inherent characteristics of the component networks and pays less attention to the biological background of the data. It can make full use of existing biological networks with low computational effort.
Affiliation(s)
- Fan Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Duzhi Wu
- Rongzhi College of Chongqing Technology and Business, Chongqing, China
| | - Limei Lin
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jian Yang
- School of Pharmacy, Second Military Medical University, Shanghai, China
| | - Tinghong Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jing Zhao
- Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| |
|
33
|
Lin L, Yang T, Fang L, Yang J, Yang F, Zhao J. Gene gravity-like algorithm for disease gene prediction based on phenotype-specific network. BMC SYSTEMS BIOLOGY 2017; 11:121. [PMID: 29212543 PMCID: PMC5718078 DOI: 10.1186/s12918-017-0519-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/10/2017] [Accepted: 11/24/2017] [Indexed: 01/24/2023]
Abstract
Background: Polygenic diseases are usually caused by the dysfunction of multiple genes. Unravelling such disease genes is crucial to fully understanding the genetic landscape of diseases at the molecular level. With the advent of the 'omic' data era, network-based methods have markedly boosted disease gene discovery. However, how to make better use of different types of data for the prediction of disease genes remains a challenge. Results: In this study, we improved the performance of disease gene prediction by integrating the similarity of disease phenotype, biological function and network topology. First, for each phenotype, a phenotype-specific network was constructed by mapping the phenotype similarity information of the given phenotype onto the protein-protein interaction (PPI) network. Then, we developed a gene gravity-like algorithm to score candidate genes based on not only topological similarity but also functional similarity. We tested the proposed network and algorithm by conducting leave-one-out and leave-10%-out cross-validation and compared them with state-of-the-art algorithms. The results favored the phenotype-specific network as well as the gene gravity-like algorithm. Finally, we tested the predictive capacity of the proposed algorithms on a test gene set derived from the DisGeNET database. In addition, potential disease genes of three polygenic diseases, obesity, prostate cancer and lung cancer, were predicted by the proposed methods. We found that the predicted disease genes are highly consistent with literature and database evidence. Conclusions: The good performance of phenotype-specific networks indicates that phenotype similarity information has a positive effect on the prediction of disease genes. The proposed gene gravity-like algorithm outperforms the Random Walk with Restart (RWR) algorithm, indicating the predictive capacity gained by combining topological similarity with functional similarity. Our work offers insight into the discovery of disease genes by fusing multiple similarities of genes and diseases. Electronic supplementary material: The online version of this article (10.1186/s12918-017-0519-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Limei Lin
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Tinghong Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Ling Fang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jian Yang
- School of Pharmacy, Second Military Medical University, Shanghai, China
| | - Fan Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jing Zhao
- Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
| |
|
34
|
Petegrosso R, Park S, Hwang TH, Kuang R. Transfer learning across ontologies for phenome-genome association prediction. Bioinformatics 2017; 33:529-536. [PMID: 27797759 DOI: 10.1093/bioinformatics/btw649] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/03/2016] [Accepted: 10/11/2016] [Indexed: 12/15/2022] Open
Abstract
Motivation: To better predict and analyze gene associations with the collection of phenotypes organized in a phenotype ontology, it is crucial to effectively model the hierarchical structure among the phenotypes in the ontology and leverage the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP) to impose consistent associations with the entire phenotype paths in predicting phenotype-gene associations in Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) to incorporate functional annotations in Gene Ontology (GO). By simultaneously reconstructing GO term-gene associations and HPO phenotype-gene associations for all the genes in a protein-protein interaction network, tlDLP benefits from the enriched training associations indirectly through relation with GO terms. Results: In the experiments to predict the associations between human genes and phenotypes in HPO based on human protein-protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO in cross-validation and the prediction of the most recent associations added after the snapshot of the training data. Moreover, the transfer learning through GO term-gene associations significantly improved association predictions for the phenotypes with no more specific known associations by a large margin. Examples are also shown to demonstrate how phenotype paths in phenotype ontology and transfer learning with gene ontology can improve the predictions. Availability and Implementation: Source code is available at http://compbio.cs.umn.edu/ontophenome. Contact: kuang@cs.umn.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Raphael Petegrosso
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| | - Sunho Park
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Tae Hyun Hwang
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| |
|
35
|
Peng C, Li A, Wang M. Discovery of Bladder Cancer-related Genes Using Integrative Heterogeneous Network Modeling of Multi-omics Data. Sci Rep 2017; 7:15639. [PMID: 29142286 PMCID: PMC5688092 DOI: 10.1038/s41598-017-15890-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/09/2017] [Accepted: 11/02/2017] [Indexed: 02/06/2023] Open
Abstract
In human health, a fundamental challenge is the identification of disease-related genes. Bladder cancer (BC) is a malignant tumor found worldwide, which resulted in 170,000 deaths in 2010, up from 114,000 in 1990. Moreover, with the emergence of multi-omics data, more comprehensive analysis of human diseases has become possible. In this study, we propose a multi-step approach for the identification of BC-related genes by using integrative Heterogeneous Network Modeling of Multi-Omics data (iHNMMO). The heterogeneous network model properly and comprehensively reflects the multiple kinds of relationships between genes in the multi-omics data of BC, including general relationships, unique relationships under the BC condition, correlational relationships within each omics layer and regulatory relationships between different omics layers. In addition, a network-based propagation algorithm with resistance is used to quantify the relationships between genes and BC precisely. The results of a comprehensive performance evaluation suggest that iHNMMO significantly outperforms other approaches. Moreover, further analysis suggests that the top-ranked genes may be functionally implicated in BC, which also supports the superiority of iHNMMO. In summary, this study shows that disease-related genes can be better identified through reasonable integration of multi-omics data.
Affiliation(s)
- Chen Peng
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China.
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230037, China.
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230037, China
| |
|
36
|
Bourdakou MM, Spyrou GM. Informed walks: whispering hints to gene hunters inside networks' jungle. BMC SYSTEMS BIOLOGY 2017; 11:97. [PMID: 29020948 PMCID: PMC5637247 DOI: 10.1186/s12918-017-0473-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/17/2016] [Accepted: 10/03/2017] [Indexed: 12/24/2022]
Abstract
Background: Systemic approaches offer a different point of view on the analysis of several types of molecular associations as well as on the identification of specific gene communities in several cancer types. However, due to the lack of sufficient data needed to construct networks based on experimental evidence, statistical gene co-expression networks are widely used instead. Many efforts have been made to exploit the information hidden in these networks. However, these approaches still need to capitalize comprehensively on the prior knowledge encoded in molecular pathway associations and to improve their efficiency in discovering both exclusive subnetworks as candidate biomarkers and conserved subnetworks that may uncover common origins of several cancer types. Methods: In this study we present the development of the Informed Walks model, based on random walks that incorporate information from molecular pathways to mine candidate genes and gene-gene links. The proposed model has been applied to TCGA (The Cancer Genome Atlas) datasets from seven different cancer types, exploring the reconstructed co-expression networks of the whole set of genes and leading to highlighted subnetworks for each cancer type. Subsequently, we elucidated the impact of each subnetwork on the indication of underlying exclusive and common molecular mechanisms as well as on the short-listing of drugs that have the potential to suppress the corresponding cancer type through a drug-repurposing pipeline. Conclusions: We have developed a method of gene subnetwork highlighting based on prior knowledge, capable of giving fruitful insights into the underlying molecular mechanisms and valuable input to drug-repurposing pipelines for a variety of cancer types. Electronic supplementary material: The online version of this article (10.1186/s12918-017-0473-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Marilena M Bourdakou
- Bioinformatics ERA Chair, The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, Ayios Dometios, 2370, Nicosia, Cyprus; Center of Systems Biology, Biomedical Research Foundation, Academy of Athens, Soranou Ephessiou 4, 115 27, Athens, Greece
- George M Spyrou
- Bioinformatics ERA Chair, The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, Ayios Dometios, 2370, Nicosia, Cyprus.
|
37
|
Systems Study on the Antirheumatic Mechanism of Tibetan Medicated-Bath Therapy Using Wuwei-Ganlu-Yaoyu-Keli. BioMed Research International 2017; 2017:2320932. [PMID: 29090217 PMCID: PMC5635470 DOI: 10.1155/2017/2320932] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/09/2016] [Revised: 01/22/2017] [Accepted: 07/30/2017] [Indexed: 12/29/2022]
Abstract
In clinical practice in the Tibetan areas of China, the Traditional Tibetan Medicine formula Wuwei-Ganlu-Yaoyu-Keli (WGYK) is commonly added to the warm water used in bath therapy to treat rheumatoid arthritis (RA). However, its mechanism of action is not yet well understood. In this paper, we first verify WGYK's anti-RA effect in an animal experiment. Then, based on gene expression data from microarray experiments, we apply network pharmacology approaches to further reveal the mechanism by which WGYK treats RA, analyzing protein-protein interactions and pathways. This study may facilitate our understanding of the anti-RA effect of WGYK from the perspective of network pharmacology.
|
38
|
Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:905-915. [PMID: 27076459 DOI: 10.1109/tcbb.2016.2550432] [Citation(s) in RCA: 209] [Impact Index Per Article: 29.9] [Indexed: 05/21/2023]
Abstract
Since the discovery of the regulatory function of microRNA (miRNA), increased attention has focused on identifying the relationship between miRNA and disease. It has been suggested that computational methods are an efficient way to identify potential disease-related miRNAs for further confirmation using biological experiments. In this paper, we first highlighted three limitations commonly associated with previous computational methods. To resolve these limitations, we established a disease similarity subnetwork and a miRNA similarity subnetwork by integrating multiple data sources, where the disease similarity is composed of disease semantic similarity and disease functional similarity, and the miRNA similarity is calculated using the miRNA-target gene and miRNA-lncRNA (long non-coding RNA) associations. Then, a heterogeneous network was constructed by connecting the disease similarity subnetwork and the miRNA similarity subnetwork using the known miRNA-disease associations. We extended random walk with restart to predict miRNA-disease associations in the heterogeneous network. The leave-one-out cross-validation achieved an average area under the curve (AUC) of 0.8049 across 341 diseases and 476 miRNAs. For five-fold cross-validation, our method achieved an AUC from 0.7970 to 0.9249 for 15 human diseases. Case studies further demonstrated the feasibility of our method to discover potential miRNA-disease associations. An online service for prediction is freely available at http://ifmda.aliapp.com.
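For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of plain random walk with restart on a small weighted graph. It is not the authors' extended method: the toy network, the node names, the `restart` value and the function name are all hypothetical, and the paper's similarity-based edge weights are replaced by unit weights.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adj   : dict mapping node -> {neighbour: edge weight} (hypothetical toy input)
    seeds : set of nodes the walker restarts from (e.g. the query disease)
    """
    nodes = sorted(adj)
    # Total outgoing weight per node, used to normalise transition probabilities.
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    # The restart distribution is uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: nothing to propagate
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical heterogeneous toy network: two miRNA nodes, two disease nodes.
toy = {
    "miR-a": {"miR-b": 1.0, "disease-1": 1.0},
    "miR-b": {"miR-a": 1.0, "disease-2": 1.0},
    "disease-1": {"miR-a": 1.0, "disease-2": 1.0},
    "disease-2": {"miR-b": 1.0, "disease-1": 1.0},
}
scores = random_walk_with_restart(toy, seeds={"disease-1"})
# Rank miRNA nodes by their steady-state score for the query disease.
ranked_mirnas = sorted(
    (n for n in scores if n.startswith("miR-")), key=scores.get, reverse=True
)
```

In this toy case the miRNA directly linked to the seed disease ranks first. A real application would replace the unit weights with the similarity and association scores described in the abstract.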
|
39
|
Jo M, Chung AY, Yachie N, Seo M, Jeon H, Nam Y, Seo Y, Kim E, Zhong Q, Vidal M, Park HC, Roth FP, Suk K. Yeast genetic interaction screen of human genes associated with amyotrophic lateral sclerosis: identification of MAP2K5 kinase as a potential drug target. Genome Res 2017; 27:1487-1500. [PMID: 28596290 PMCID: PMC5580709 DOI: 10.1101/gr.211649.116] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/29/2016] [Accepted: 06/06/2017] [Indexed: 12/13/2022]
Abstract
To understand disease mechanisms, a large-scale analysis of human–yeast genetic interactions was performed. Of 1305 human disease genes assayed, 20 genes exhibited strong toxicity in yeast. Human–yeast genetic interactions were identified by en masse transformation of the human disease genes into a pool of 4653 homozygous diploid yeast deletion mutants with unique barcode sequences, followed by multiplexed barcode sequencing to identify yeast toxicity modifiers. Subsequent network analyses focusing on amyotrophic lateral sclerosis (ALS)-associated genes, such as optineurin (OPTN) and angiogenin (ANG), showed that the human orthologs of the yeast toxicity modifiers of these ALS genes are enriched for several biological processes, such as cell death, lipid metabolism, and molecular transport. When yeast genetic interaction partners held in common between human OPTN and ANG were validated in mammalian cells and zebrafish, MAP2K5 kinase emerged as a potential drug target for ALS therapy. The toxicity modifiers identified in this study may deepen our understanding of the pathogenic mechanisms of ALS and other devastating diseases.
Affiliation(s)
- Myungjin Jo
- Department of Pharmacology, Brain Science and Engineering Institute, and Department of Biomedical Sciences, BK21 Plus KNU Biomedical Convergence Program, Kyungpook National University School of Medicine, Daegu, 41944, Korea
- Ah Young Chung
- Department of Biomedical Sciences, Korea University Ansan Hospital, Ansan-si, Gyeonggi-do, 425-707, Korea
- Nozomu Yachie
- Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto and Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario M5G 1X5, Canada
- Minchul Seo
- Department of Pharmacology, Brain Science and Engineering Institute, and Department of Biomedical Sciences, BK21 Plus KNU Biomedical Convergence Program, Kyungpook National University School of Medicine, Daegu, 41944, Korea
- Hyejin Jeon
- Department of Pharmacology, Brain Science and Engineering Institute, and Department of Biomedical Sciences, BK21 Plus KNU Biomedical Convergence Program, Kyungpook National University School of Medicine, Daegu, 41944, Korea
- Youngpyo Nam
- Department of Pharmacology, Brain Science and Engineering Institute, and Department of Biomedical Sciences, BK21 Plus KNU Biomedical Convergence Program, Kyungpook National University School of Medicine, Daegu, 41944, Korea
- Yeojin Seo
- Department of Pharmacology, Brain Science and Engineering Institute, and Department of Biomedical Sciences, BK21 Plus KNU Biomedical Convergence Program, Kyungpook National University School of Medicine, Daegu, 41944, Korea
- Eunmi Kim
- Department of Biomedical Sciences, Korea University Ansan Hospital, Ansan-si, Gyeonggi-do, 425-707, Korea
- Quan Zhong
- Department of Biological Sciences, Wright State University, Dayton, Ohio 45435, USA
- Marc Vidal
- Department of Biological Sciences, Wright State University, Dayton, Ohio 45435, USA
- Hae Chul Park
- Department of Biomedical Sciences, Korea University Ansan Hospital, Ansan-si, Gyeonggi-do, 425-707, Korea
- Frederick P Roth
- Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto and Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario M5G 1X5, Canada; Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada
- Kyoungho Suk
- Department of Pharmacology, Brain Science and Engineering Institute, and Department of Biomedical Sciences, BK21 Plus KNU Biomedical Convergence Program, Kyungpook National University School of Medicine, Daegu, 41944, Korea
|
40
|
A novel network regularized matrix decomposition method to detect mutated cancer genes in tumour samples with inter-patient heterogeneity. Sci Rep 2017; 7:2855. [PMID: 28588243 PMCID: PMC5460199 DOI: 10.1038/s41598-017-03141-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/27/2017] [Accepted: 04/20/2017] [Indexed: 01/01/2023] Open
Abstract
Inter-patient heterogeneity is a major challenge for the detection of mutated cancer genes, which is crucial to advancing cancer diagnostics and therapeutics. To detect mutated cancer genes in heterogeneous tumour samples, a prominent strategy is to determine whether the genes are recurrently mutated in their interaction network context. However, recent studies show that some cancer genes in different perturbed pathways are mutated in different subsets of samples. Consequently, these genes may not display significant mutational recurrence and thus remain undiscovered even when network information is taken into account. We develop a novel method called mCGfinder to efficiently detect mutated cancer genes in tumour samples with inter-patient heterogeneity. Based on a matrix decomposition framework that incorporates gene interaction network information, mCGfinder can successfully measure the significance of mutational recurrence of genes in a subset of samples. When applying mCGfinder to TCGA somatic mutation datasets of five types of cancers, we find that the genes detected by mCGfinder are significantly enriched for known cancer genes, and yield substantially smaller p-values than other existing methods. All the results demonstrate that mCGfinder is an efficient method for detecting mutated cancer genes.
|
41
|
Santiago JA, Bottero V, Potashkin JA. Dissecting the Molecular Mechanisms of Neurodegenerative Diseases through Network Biology. Front Aging Neurosci 2017; 9:166. [PMID: 28611656 PMCID: PMC5446999 DOI: 10.3389/fnagi.2017.00166] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 10/18/2016] [Accepted: 05/12/2017] [Indexed: 12/27/2022] Open
Abstract
Neurodegenerative diseases are rarely caused by a mutation in a single gene but rather influenced by a combination of genetic, epigenetic and environmental factors. Emerging high-throughput technologies such as RNA sequencing have been instrumental in deciphering the molecular landscape of neurodegenerative diseases; however, the interpretation of such large amounts of data remains a challenge. Network biology has become a powerful platform to integrate multiple omics data to comprehensively explore the molecular networks in the context of health and disease. In this review article, we highlight recent advances in network biology approaches, with an emphasis on brain networks, that have provided insights into the molecular mechanisms leading to the most prevalent neurodegenerative diseases including Alzheimer’s (AD), Parkinson’s (PD) and Huntington’s diseases (HD). We discuss how integrative approaches using multi-omics data from different tissues have been valuable for identifying biomarkers and therapeutic targets. In addition, we discuss the challenges the field of network medicine faces toward the translation of network-based findings into clinically actionable tools for personalized medicine applications.
Affiliation(s)
- Jose A Santiago
- Department of Cellular and Molecular Pharmacology, The Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, IL, United States
- Virginie Bottero
- Department of Cellular and Molecular Pharmacology, The Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, IL, United States
- Judith A Potashkin
- Department of Cellular and Molecular Pharmacology, The Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, IL, United States
|
42
|
Peng C, Li A. A Heterogeneous Network Based Method for Identifying GBM-Related Genes by Integrating Multi-Dimensional Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:713-720. [PMID: 28113912 DOI: 10.1109/tcbb.2016.2555314] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
The emergence of multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of human diseases and therefore for improving diagnosis, treatment, and prevention. In this study, we proposed a heterogeneous network-based method that integrates multi-dimensional data (HNMD) to identify GBM-related genes. The novelty of the method lies in combining the multi-dimensional GBM data from the TCGA dataset, which provide comprehensive information on genes, with protein-protein interactions to construct a weighted heterogeneous network that reflects both the general and disease-specific relationships between genes. In addition, a propagation algorithm with resistance is introduced to precisely score and rank GBM-related genes. The results of a comprehensive performance evaluation show that the proposed method significantly outperforms network-based methods that use single-dimensional data as well as other existing approaches. Subsequent analysis of the top-ranked genes suggests they may be functionally implicated in GBM, which further corroborates the superiority of the proposed method. The source code and the results of HNMD can be downloaded from the following URL: http://bioinformatics.ustc.edu.cn/hnmd/ .
|
43
|
Yang J, Yang T, Wu D, Lin L, Yang F, Zhao J. The integration of weighted human gene association networks based on link prediction. BMC Systems Biology 2017; 11:12. [PMID: 28137253 PMCID: PMC5282786 DOI: 10.1186/s12918-017-0398-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/15/2016] [Accepted: 01/25/2017] [Indexed: 12/27/2022]
Abstract
Background Physical and functional interplays between genes or proteins have important biological meaning for cellular functions. Some efforts have been made to construct weighted gene association meta-networks by integrating multiple biological resources, where the weight indicates the confidence of the interaction. However, these existing human gene association networks have been found to share only a limited number of overlapping interactions, suggesting that they are incomplete and noisy. Results Here we proposed a workflow to construct a weighted human gene association network using information from six existing networks, including two weighted specific PPI networks and four gene association meta-networks. We applied a link prediction algorithm to predict possible missing links in the networks, used a cross-validation approach to refine each network, and finally integrated the refined networks to obtain the final integrated network. Conclusions The common information among the refined networks increases notably, suggesting their higher reliability. Our final integrated network has many more links than most of the original networks, while its links retain high functional relevance. When used as the background network in a case study of disease gene prediction, the final integrated network performs well, implying its reliability and application significance. Our workflow could be insightful for integrating and refining existing gene association data. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0398-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jian Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Tinghong Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Duzhi Wu
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Limei Lin
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Fan Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Jing Zhao
- Department of Mathematics, Logistical Engineering University, Chongqing, China; Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
|
44
|
Pouladi N, Achour I, Li H, Berghout J, Kenost C, Gonzalez-Garay ML, Lussier YA. Biomechanisms of Comorbidity: Reviewing Integrative Analyses of Multi-omics Datasets and Electronic Health Records. Yearb Med Inform 2016; 25:194-206. [PMID: 27830251 PMCID: PMC5171562 DOI: 10.15265/iy-2016-040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/21/2022] Open
Abstract
OBJECTIVES Disease comorbidity is a pervasive phenomenon impacting patients' health outcomes, disease management, and clinical decisions. This review presents past, current and future research directions leveraging both phenotypic and molecular information to uncover disease similarity underpinning the biology and etiology of disease comorbidity. METHODS We retrieved ~130 publications and retained 59, ranging from 2006 to 2015, that comprise a minimum of five diseases and at least one type of biomolecule. We surveyed their methods, disease similarity metrics, and calculation of comorbidities in the electronic health records, if present. RESULTS Among the surveyed studies, 44% generated or validated disease similarity metrics in the context of comorbidity, with 60% being published in the last two years. As inputs, 87% of studies utilized intragenic loci and proteins while 13% employed RNA (mRNA, lncRNA or miRNA). Network modeling was predominantly used (35%), followed by statistics (28%), to impute similarity between these biomolecules and diseases. Studies with large numbers of biomolecules and diseases used network models or naïve overlap of disease-molecule associations, while machine learning, statistics, and information retrieval were utilized in small and moderate-sized studies. Multiscale computations comprising shared function, network topology, and phenotypes were performed exclusively on proteins. CONCLUSION This review highlighted the growing number of methods for identifying the molecular mechanisms underpinning comorbidities that leverage multiscale molecular information and patterns from electronic health records. The survey revealed that intergenic polymorphisms have been overlooked for similarity imputation compared to their intragenic counterparts, offering new opportunities to bridge the mechanistic and similarity gaps of comorbidity.
Affiliation(s)
- Y A Lussier
- Dr. Yves A. Lussier, The University of Arizona, Bio5 Building, 1657 East Helen Street, Tucson, AZ 85721, USA, Fax: +1 520 626 4824, E-Mail:
|
45
|
Abstract
In recent years several methods have been proposed to assign pairwise mechanism-based similarity scores to human diseases. Despite their differences in approach and performance, these methods work in a somewhat similar manner: first a set of biomolecules (genes, proteins, chemicals, etc.) is associated with each disease, and then a measure is defined to calculate the similarity between the sets assigned to a pair of diseases. Since the similarity score between two diseases is defined based on the underlying molecular processes, a high score may hint at a shared cause, and therefore a similar treatment, for both diseases. This is of great practical importance especially when a rare or newly discovered disease, for which limited information is available, is found to be related to a disease with a known treatment. Thus, in this mini-review we briefly discuss the recently developed methods for computing mechanism-based disease-disease similarities.
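The two-step recipe described in this abstract (assign a biomolecule set to each disease, then score pairs of sets) can be illustrated with one of the simplest possible set measures, the Jaccard index. This is only a sketch: the abstract does not single out any particular measure, and the disease names and gene identifiers below are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Step 1: associate a set of biomolecules with each disease (hypothetical data).
disease_genes = {
    "disease_X": {"G1", "G2", "G3", "G4"},
    "disease_Y": {"G3", "G4", "G5"},
    "disease_Z": {"G6"},
}

# Step 2: score every pair of diseases by the similarity of their sets.
similarity = {
    (d1, d2): jaccard(disease_genes[d1], disease_genes[d2])
    for d1, d2 in combinations(sorted(disease_genes), 2)
}

for pair, score in sorted(similarity.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 3))
```

Plain overlap ignores how the biomolecules relate to one another, so a pair of diseases with no shared genes always scores zero even if their genes act in the same pathway.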
Affiliation(s)
- Mehdi B Hamaneh
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
46
|
Integrated Post-GWAS Analysis Sheds New Light on the Disease Mechanisms of Schizophrenia. Genetics 2016; 204:1587-1600. [PMID: 27754856 DOI: 10.1534/genetics.116.187195] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 01/15/2016] [Accepted: 09/30/2016] [Indexed: 11/18/2022] Open
Abstract
Schizophrenia is a severe mental disorder with a large genetic component. Recent genome-wide association studies (GWAS) have identified many schizophrenia-associated common variants. For most of the reported associations, however, the underlying biological mechanisms are not clear. The critical first step for their elucidation is to identify the most likely disease genes as the source of the association signals. Here, we describe a general computational framework of post-GWAS analysis for complex disease gene prioritization. We identify 132 putative schizophrenia risk genes in 76 risk regions spanning 120 schizophrenia-associated common variants, 78 of which have not been recognized as schizophrenia disease genes by previous GWAS. Even more significantly, 29 of them are outside the risk regions, likely under regulation of transcriptional regulatory elements contained therein. These putative schizophrenia risk genes are transcriptionally active in both brain and the immune system, and highly enriched among cellular pathways, consistent with leading pathophysiological hypotheses about the pathogenesis of schizophrenia. With their involvement in distinct biological processes, these putative schizophrenia risk genes, with different association strengths, show distinctive temporal expression patterns, and play specific biological roles during brain development.
|
47
|
OAHG: an integrated resource for annotating human genes with multi-level ontologies. Sci Rep 2016; 6:34820. [PMID: 27703231 PMCID: PMC5050487 DOI: 10.1038/srep34820] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 08/19/2016] [Accepted: 09/20/2016] [Indexed: 01/04/2023] Open
Abstract
OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which could be helpful for further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows consistency with existing gene interactions and the structure of ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ2 = 0.2428, p < 2.2e-16).
|
48
|
PoplarGene: poplar gene network and resource for mining functional information for genes from woody plants. Sci Rep 2016; 6:31356. [PMID: 27515999 PMCID: PMC4981870 DOI: 10.1038/srep31356] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/21/2016] [Accepted: 07/18/2016] [Indexed: 01/05/2023] Open
Abstract
Poplar is not only an important resource for the production of paper, timber and other wood-based products, but it has also emerged as an ideal model system for studying woody plants. To better understand the biological processes underlying various traits in poplar, e.g., wood development, a comprehensive functional gene interaction network is much needed. Here, we constructed a genome-wide functional gene network for poplar (covering ~70% of the 41,335 poplar genes) and created the network web service PoplarGene, offering comprehensive functional interactions and extensive poplar gene functional annotations. PoplarGene incorporates two network-based gene prioritization algorithms, neighborhood-based prioritization and context-based prioritization, which can be used to perform gene prioritization in a complementary manner. Furthermore, the co-functional information in PoplarGene can be applied to other woody plant proteomes with high efficiency via orthology transfer. In addition to poplar gene sequences, the webserver also accepts Arabidopsis reference genes as input to guide the search for novel candidate functional genes in PoplarGene. We believe that PoplarGene (http://bioinformatics.caf.ac.cn/PoplarGene and http://124.127.201.25/PoplarGene) will greatly benefit the research community, facilitating studies of poplar and other woody plants.
|
49
|
Chen HR, Sherr DH, Hu Z, DeLisi C. A network based approach to drug repositioning identifies plausible candidates for breast cancer and prostate cancer. BMC Med Genomics 2016; 9:51. [PMID: 27475327 PMCID: PMC4967295 DOI: 10.1186/s12920-016-0212-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 03/03/2016] [Accepted: 07/20/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The high cost and long time required to bring drugs to market are driving efforts to repurpose FDA-approved drugs, that is, to find new uses for which they were not intended, and thereby to reduce the overall cost of commercialization and shorten the lag between drug discovery and availability. We report on the development, testing and application of a promising new approach to repositioning. METHODS Our approach is based on mining a human functional linkage network for inversely correlated modules of drug and disease gene targets. The method takes account of multiple information sources, including gene mutation, gene expression, and the functional connectivity and proximity of within-module genes. RESULTS The method was used to identify candidates for treating breast and prostate cancer. We found that (i) the recall rate for FDA-approved drugs for breast (prostate) cancer is 20/20 (10/11), while the rates for drugs in clinical trials were 131/154 and 82/106; (ii) the ROC/AUC performance substantially exceeds that of comparable methods; (iii) preliminary in vitro studies indicate that 5/5 candidates have therapeutic indices superior to that of Doxorubicin in MCF7 and SUM149 cancer cell lines. We briefly discuss the biological plausibility of the candidates at a molecular level in the context of the biological processes that they mediate. CONCLUSIONS Our method appears to offer promise for the identification of multi-targeted drug candidates that can correct aberrant cellular functions. In particular, the computational performance exceeded that of other CMap-based methods, and in vitro experiments indicate that 5/5 candidates have therapeutic indices superior to that of Doxorubicin in MCF7 and SUM149 cancer cell lines. The approach has the potential to provide a more efficient drug discovery pipeline.
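The recall figures quoted in this abstract are given as raw counts, and a few lines of arithmetic turn them into rates. The sketch below is only an illustration: the dictionary keys are hypothetical labels, and it assumes that 131/154 and 82/106 refer to breast and prostate cancer respectively, which the abstract implies by its ordering but does not state outright.

```python
# (recovered drugs, total drugs) per category, copied from the abstract.
# The pairing of the clinical-trial counts with each cancer type is an assumption.
recall_counts = {
    "breast, FDA approved": (20, 20),
    "prostate, FDA approved": (10, 11),
    "breast, clinical trials": (131, 154),
    "prostate, clinical trials": (82, 106),
}

# Recall = recovered / total for each category.
recall = {label: hit / total for label, (hit, total) in recall_counts.items()}

for label, value in recall.items():
    print(f"{label}: {value:.1%}")
```

Under that assumption the rates come to roughly 100%, 91%, 85% and 77%.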
Affiliation(s)
- Hsiao-Rong Chen
- Bioinformatics Program, College of Engineering, Boston University, Boston, MA, USA; Graduate Program in Translational Molecular Medicine, Boston University School of Medicine, Boston, MA, USA
- David H Sherr
- Department of Environmental Health, Boston University School of Public Health, Boston, MA, USA
- Zhenjun Hu
- Bioinformatics Program, College of Engineering, Boston University, Boston, MA, USA
- Charles DeLisi
- Bioinformatics Program, College of Engineering, Boston University, Boston, MA, USA; Department of Biomedical Engineering, Boston University, Boston, MA, USA.
|
50
|
Jalili M, Salehzadeh-Yazdi A, Yaghmaie M, Ghavamzadeh A, Alimoghaddam K. Cancerome: A hidden informative subnetwork of the diseasome. Comput Biol Med 2016; 76:173-7. [PMID: 27468170 DOI: 10.1016/j.compbiomed.2016.07.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/08/2016] [Revised: 06/30/2016] [Accepted: 07/18/2016] [Indexed: 11/18/2022]
Abstract
Neoplastic disorders are a leading cause of mortality and morbidity worldwide. Studying the relationships between different cancers using high throughput-generated data may elucidate undisclosed aspects of cancer etiology, diagnosis, and treatment. Several studies have described relationships between different diseases based on genes, proteins, pathways, gene ontology, comorbidity, symptoms, and other features. In this study, we first constructed an integrated human disease network based on nine different biological aspects, including molecular, functional, and clinical features. Next, we extracted the cancerome as a cancer-related subnetwork. Further investigation of cancerome could reveal hidden mechanisms of cancer and could be useful in developing new diagnostic tests and effective new drugs.
Affiliation(s)
- Mahdi Jalili
- Hematology, Oncology and SCT Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Ali Salehzadeh-Yazdi
- Hematology, Oncology and SCT Research Center, Tehran University of Medical Sciences, Tehran, Iran; Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany
- Marjan Yaghmaie
- Hematology, Oncology and SCT Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Ardeshir Ghavamzadeh
- Hematology, Oncology and SCT Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Kamran Alimoghaddam
- Hematology, Oncology and SCT Research Center, Tehran University of Medical Sciences, Tehran, Iran.
|