1. Cassan O, Lecellier CH, Martin A, Bréhélin L, Lèbre S. Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana. Bioinformatics 2024; 40:btae415. [PMID: 38913855 PMCID: PMC11227367 DOI: 10.1093/bioinformatics/btae415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024]
Abstract
MOTIVATIONS: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. RESULTS: We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and recovers master regulators of nitrate induction. AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.
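To make the idea of a weighted LASSO penalty concrete, the following is a minimal sketch and not the authors' implementation (their code is in R, in the repository cited above). It assumes a hypothetical mapping `prior_to_penalty` in which a transcription factor with prior support (for example a binding motif in the target promoter) is penalized less as the integration strength `alpha` grows. The mapping, the function names, and the toy data are all illustrative assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prior_to_penalty(prior, alpha):
    """Hypothetical mapping: prior in [0, 1], alpha in [0, 1] is the integration strength."""
    return [1.0 - alpha * p for p in prior]


def weighted_lasso(X, y, lam, penalty, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam * sum_j penalty[j] * |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j (assumed non-zero)
            for i in range(n):
                pred_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam * penalty[j]) / (norm / n)
    return b


# Toy data (made up): target expression = 1.0 * TF0 + 0.5 * TF1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1.5, 0.5, -0.5, -1.5]
prior = [0.0, 1.0]  # assumption: only TF1 has prior support

b_no_prior = weighted_lasso(X, y, 0.6, prior_to_penalty(prior, 0.0))
b_full_prior = weighted_lasso(X, y, 0.6, prior_to_penalty(prior, 1.0))
```

In this toy setting the weaker regulator TF1 is shrunk to zero when the prior is ignored (`alpha = 0`) and is retained when the prior is fully trusted (`alpha = 1`), which is the kind of trade-off a gene-specific choice of integration strength has to arbitrate.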
Affiliation(s)
- Océane Cassan
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- Charles-Henri Lecellier
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IGMM, Univ Montpellier, CNRS, Montpellier, 34090, France
- Antoine Martin
  - IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, 34060, Montpellier, France
- Sophie Lèbre
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IMAG, Univ Montpellier, CNRS, Montpellier, 34090, France
  - Université Paul-Valéry-Montpellier 3, Montpellier, 34090, France
2. Yu Y, Wang L, Hou W, Xue Y, Liu X, Li Y. Identification and validation of aging-related genes in heart failure based on multiple machine learning algorithms. Front Immunol 2024; 15:1367235. [PMID: 38686376 PMCID: PMC11056574 DOI: 10.3389/fimmu.2024.1367235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/03/2024] [Indexed: 05/02/2024]
Abstract
Background: In the face of continued growth in the elderly population, the need to understand and combat age-related cardiac decline becomes even more urgent, requiring us to uncover new pathological and cardioprotective pathways. Methods: We obtained the aging-related genes of heart failure through WGCNA and the CellAge database. We elucidated the biological functions and signaling pathways involved in heart failure and aging through GO and KEGG enrichment analysis. We used three machine learning algorithms (LASSO, RF and SVM-RFE) to further screen the aging-related genes of heart failure, and fitted and verified them through a variety of machine learning algorithms. We searched for drugs to treat age-related heart failure through the DSigDB database. Finally, we used CIBERSORT to complete immune infiltration analysis of aging samples. Results: We obtained 57 up-regulated and 195 down-regulated aging-related genes in heart failure through WGCNA and the CellAge database. GO and KEGG enrichment analysis showed that aging-related genes are mainly involved in mechanisms such as cellular senescence and the cell cycle. We further screened aging-related genes through machine learning and obtained 14 key genes. We verified the results on the test set and 2 external validation sets using 15 machine learning algorithm models and 207 combinations, and the highest accuracy was 0.911. Through screening of the DSigDB database, we believe that rimonabant and lovastatin have the potential to delay aging and protect the heart. The results of immune infiltration analysis showed that there were significant differences between Macrophages M2 and T cells CD8 in aging myocardium. Conclusion: We identified aging signature genes and potential therapeutic drugs for heart failure through bioinformatics and multiple machine learning algorithms, providing new ideas for studying the mechanism and treatment of age-related cardiac decline.
Affiliation(s)
- Yiding Yu
  - Shandong University of Traditional Chinese Medicine, Jinan, China
- Lin Wang
  - Shandong University of Traditional Chinese Medicine, Jinan, China
- Wangjun Hou
  - Shandong University of Traditional Chinese Medicine, Jinan, China
- Yitao Xue
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
- Xiujuan Liu
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
- Yan Li
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
3. Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. NPJ Syst Biol Appl 2024; 10:35. [PMID: 38565850 PMCID: PMC10987498 DOI: 10.1038/s41540-024-00361-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/19/2024] [Indexed: 04/04/2024]
Abstract
Gene regulatory mechanisms (GRMs) control the formation of spatial and temporal expression patterns that can serve as regulatory signals for the development of complex shapes. Synthetic developmental biology aims to engineer such genetic circuits for understanding and producing desired multicellular spatial patterns. However, designing synthetic GRMs for complex, multi-dimensional spatial patterns is a current challenge due to the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given two-dimensional spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms (including the number of genes necessary for the formation of the target spatial pattern), we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple, to produce approximate patterns, or complex, to produce precise patterns. We demonstrate the approach by automatically designing GRMs that can produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile approach to systematically design and discover complex genetic circuits producing spatial patterns.
Affiliation(s)
- Reza Mousavi
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
- Daniel Lobo
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
  - Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, Baltimore, Baltimore, MD, USA
4. Li Y, Hu Y, Jiang F, Chen H, Xue Y, Yu Y. Combining WGCNA and machine learning to identify mechanisms and biomarkers of ischemic heart failure development after acute myocardial infarction. Heliyon 2024; 10:e27165. [PMID: 38455553 PMCID: PMC10918227 DOI: 10.1016/j.heliyon.2024.e27165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 01/15/2024] [Accepted: 02/26/2024] [Indexed: 03/09/2024]
Abstract
Background: Ischemic heart failure (IHF) is a serious complication after acute myocardial infarction (AMI). Understanding the mechanism of IHF after AMI will help us conduct early diagnosis and treatment. Methods: We obtained the AMI dataset GSE66360 and the IHF dataset GSE57338 from the GEO database, and screened overlapping genes common to both diseases through WGCNA. Subsequently, we performed GO and KEGG enrichment analysis on the overlapping genes to elucidate the common mechanism of AMI and IHF. Machine learning algorithms were also used to identify key biomarkers. Finally, we performed immune cell infiltration analysis on the datasets to further evaluate immune cell changes in AMI and IHF. Results: We obtained 74 overlapping genes of AMI and IHF through WGCNA, and the enrichment analysis results mainly focused on immune and inflammation-related mechanisms. Through the three machine learning algorithms LASSO, RF and SVM-RFE, we finally obtained the four hub genes IL1B, TIMP2, IFIT3, and P2RY2, and verified them in the IHF dataset GSE116250, where the diagnostic model reached AUC = 0.907. The results of immune infiltration analysis showed that 8 types of immune cells were significantly different in AMI samples, and 6 types of immune cells were significantly different in IHF samples. Conclusion: We explored the mechanism of IHF after AMI by WGCNA, enrichment analysis, and immune infiltration analysis. Four potential diagnostic candidate genes and therapeutic targets were identified by machine learning algorithms. This provides a new idea for the pathogenesis, diagnosis, and treatment of IHF after AMI.
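Several abstracts in this list report a diagnostic AUC. As a reminder of what that number measures, here is a minimal sketch of the rank-based (Mann-Whitney) definition: the probability that a randomly chosen diseased sample receives a higher model score than a randomly chosen control. The scores below are made up for illustration and have no connection to the datasets named above.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores for diseased and control samples.
diseased_scores = [0.9, 0.8, 0.4]
control_scores = [0.5, 0.3, 0.2]
example_auc = auc(diseased_scores, control_scores)  # 8 of 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a reported value only becomes meaningful when it is obtained on samples that were not used to select the genes.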
Affiliation(s)
- Yan Li
  - Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
- Ying Hu
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
- Feng Jiang
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
- Haoyu Chen
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
- Yitao Xue
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
- Yiding Yu
  - Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
5. Shi S, Guo Y, Wang Q, Huang Y. Artificial neural network-based gene screening and immune cell infiltration analysis of osteosarcoma feature. J Gene Med 2024; 26:e3622. [PMID: 37964329 DOI: 10.1002/jgm.3622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/10/2023] [Accepted: 10/15/2023] [Indexed: 11/16/2023]
Abstract
BACKGROUND The present study aimed to construct an artificial neural network (ANN) model that leverages characteristic genes associated with osteosarcoma (OS) to enable accurate prognostication for OS patients. METHODS Our research revealed 467 differentially expressed genes (DEGs) via gene expression contrast analysis, consisting of 345 downregulated genes and 122 upregulated genes. Gene Ontology (GO) enrichment analysis illuminated functions primarily encompassing T-cell activation, secretory granule lumen and antioxidant activity, among others. Through Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, we discovered significant correlations between the DEGs and certain pathways, including phagosome, Staphylococcus aureus infection and human T-cell leukemia virus 1 infection. We then screened out 30 characteristic DEGs (CDEGs) based on random forest analysis and constructed the ANN model using the gene score matrix. To verify the credibility and accuracy of the ANN model, we performed internal and external validation processes, which affirmed our model's predictive capabilities. RESULTS The study further delved into the analysis of immune cell infiltration and its correlation with the target CDEGs, revealing disparities in the infiltration of 22 types of immune cells across different groups and their interrelationships. Moreover, we probed the expression of the two foremost CDEGs (YES1 and MFNG) in OS and normal tissues. We noted a positive relationship between the expression of YES1 and MFNG in OS tissues and the clinicopathological characteristics of OS patients. CONCLUSIONS Collectively, the findings of the present study validate the effectiveness of the CDEGs-based ANN model in predicting OS patients, which might facilitate early diagnosis and treatment of OS.
Affiliation(s)
- Shaoyan Shi
  - Department of Hand Surgery, Xi'an Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Yunshan Guo
  - Department of Hand Surgery, Xi'an Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Qian Wang
  - Department of Hand Surgery, Xi'an Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Yansheng Huang
  - Department of Hand Surgery, Xi'an Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, China
6. Ni L, Yu Q, You R, Chen C, Peng B. Development of the RF-GSEA Method for Identifying Disulfidptosis-Related Genes and Application in Hepatocellular Carcinoma. Curr Issues Mol Biol 2023; 45:9450-9470. [PMID: 38132439 PMCID: PMC10741996 DOI: 10.3390/cimb45120593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 12/23/2023]
Abstract
Disulfidptosis is a newly discovered mode of programmed cell death. Presently, a considerable number of genes related to disulfidptosis remain undiscovered, and its significance in hepatocellular carcinoma (HCC) remains unknown. We have developed an analytical method called RF-GSEA for identifying potential genes associated with disulfidptosis. This method draws inspiration from gene regulatory networks and graph theory, and it is implemented through a combination of a random forest regression model and Gene Set Enrichment Analysis. Subsequently, to validate the practical value of this method, we applied it to hepatocellular carcinoma. Based on the RF-GSEA method, we developed a disulfidptosis-related signature. Lastly, we examined how the disulfidptosis-related signature is connected to HCC prognosis, the tumor microenvironment, the effectiveness of immunotherapy, and sensitivity to chemotherapy drugs. The RF-GSEA method identified a total of 220 disulfidptosis-related genes, from which 7 were selected to construct the disulfidptosis-related signature. The high-disulfidptosis-related score group had a worse prognosis compared to the low-disulfidptosis-related score group and showed lower infiltration levels of immune-promoting cells. The high-disulfidptosis-related score group had a higher likelihood of benefiting from immunotherapy compared to the low-disulfidptosis-related score group. The RF-GSEA method is a powerful tool for identifying disulfidptosis-related genes. The disulfidptosis-related signature effectively predicts HCC prognosis, immunotherapy response, and drug sensitivity.
Affiliation(s)
- Bin Peng
  - School of Public Health, Chongqing Medical University, Chongqing 400016, China
7. Yu Y, Liu X, Xue Y, Li Y. Identification of immune-related genes for the diagnosis of ischemic heart failure based on bioinformatics. iScience 2023; 26:108121. [PMID: 37867954 PMCID: PMC10587531 DOI: 10.1016/j.isci.2023.108121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/14/2023] [Accepted: 09/29/2023] [Indexed: 10/24/2023]
Abstract
The role of immune cells in the pathogenesis of ischemic heart failure (IHF) is well-established. However, identifying key genes in patients with IHF remains a challenge. We obtained two IHF datasets from the GEO database (GSE76701 and GSE21610), and identified four potential diagnostic candidate genes for IHF by using bioinformatics and machine learning algorithms, namely RNASE2, MFAP4, CHRDL1, and KCNN3. We constructed a nomogram and validated the diagnostic value of these genes on an additional GEO dataset (GSE57338). The results showed that these four genes had high diagnostic value (area under the curve value of 0.961). Furthermore, our immune infiltration analysis revealed the presence of three dysregulated immune cell types in IHF, namely macrophages M2, monocytes, and T cells gamma delta. We also explored the potential molecular mechanisms of IHF. These findings provide new insights into the pathogenesis, diagnosis, and treatment of IHF.
Affiliation(s)
- Yiding Yu
  - Shandong University of Traditional Chinese Medicine, Jinan 250014, China
- Xiujuan Liu
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan 250014, China
- Yitao Xue
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan 250014, China
- Yan Li
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan 250014, China
8. Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are a way of describing the interactions between genes, which contributes to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. Firstly, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method achieves an appreciable improvement over the state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in the Python language, and is available at: https://github.com/lab319/iLSGRN.
Affiliation(s)
- Yiming Wu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
  - Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
9. Hsieh PH, Lopes-Ramos CM, Zucknick M, Sandve GK, Glass K, Kuijjer ML. Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data. Bioinformatics 2023; 39:btad610. [PMID: 37802917 PMCID: PMC10598588 DOI: 10.1093/bioinformatics/btad610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 08/05/2023] [Accepted: 10/05/2023] [Indexed: 10/08/2023]
Abstract
MOTIVATION Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples. Coordinated expression of genes may indicate that they are controlled by the same transcriptional regulatory program, or involved in common biological processes. Gene co-expression is generally estimated from RNA-Sequencing data, which are commonly normalized to remove technical variability. Here, we demonstrate that certain normalization methods, in particular quantile-based methods, can introduce false-positive associations between genes. These false-positive associations can consequently hamper downstream co-expression network analysis. Quantile-based normalization can, however, be extremely powerful. In particular, when preprocessing large-scale heterogeneous data, quantile-based normalization methods such as smooth quantile normalization can be applied to remove technical variability while maintaining global differences in expression for samples with different biological attributes. RESULTS We developed SNAIL (Smooth-quantile Normalization Adaptation for the Inference of co-expression Links), a normalization method based on smooth quantile normalization specifically designed for modeling of co-expression measurements. We show that SNAIL avoids formation of false-positive associations in co-expression as well as in downstream network analyses. Using SNAIL, one can avoid arbitrary gene filtering and retain associations to genes that only express in small subgroups of samples. This highlights the method's potential future impact on network modeling and other association-based approaches in large-scale heterogeneous data. AVAILABILITY AND IMPLEMENTATION The implementation of the SNAIL algorithm and code to reproduce the analyses described in this work can be found in the GitHub repository https://github.com/kuijjerlab/PySNAIL.
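For readers unfamiliar with the baseline this abstract builds on, the sketch below shows plain quantile normalization only. It is not smooth quantile normalization and not SNAIL (the authors' implementation is in the PySNAIL repository cited above). The toy matrix is made up and deliberately contains no tied values, because this simplified version does not average ties.

```python
def quantile_normalize(matrix):
    """matrix[g][s] is the expression of gene g in sample s; returns a normalized copy."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    # For each sample, the gene indices ordered from lowest to highest expression.
    order = [
        sorted(range(n_genes), key=lambda g, s=s: matrix[g][s])
        for s in range(n_samples)
    ]
    # Reference distribution: mean across samples of the r-th smallest value.
    reference = [
        sum(matrix[order[s][r]][s] for s in range(n_samples)) / n_samples
        for r in range(n_genes)
    ]
    # Each gene receives the reference value that matches its rank in that sample.
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        for r, g in enumerate(order[s]):
            out[g][s] = reference[r]
    return out


# Toy expression matrix (made up): 4 genes (rows) by 3 samples (columns).
expression = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 5, 6],
    [4, 2, 8],
]
normalized = quantile_normalize(expression)
```

After this step every sample has exactly the same set of values, and only the gene ranks within each sample carry information. Forcing all samples onto one distribution in this way is the property the abstract identifies as a possible source of false-positive associations between genes.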
Affiliation(s)
- Ping-Han Hsieh
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, Oslo 0318, Norway
  - Department of Informatics, University of Oslo, Oslo 0316, Norway
- Camila Miranda Lopes-Ramos
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
  - Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, United States
- Manuela Zucknick
  - Oslo Centre for Biostatistics and Epidemiology, Institute of Basic Medical Sciences, University of Oslo, Oslo 0317, Norway
- Kimberly Glass
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, United States
- Marieke Lydia Kuijjer
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, Oslo 0318, Norway
  - Department of Pathology, Leiden University Medical Center, Leiden 2300RC, The Netherlands
  - Leiden Center of Computational Oncology, Leiden University Medical Center, Leiden 2300RC, The Netherlands
10. Henao JD, Lauber M, Azevedo M, Grekova A, Theis F, List M, Ogris C, Schubert B. Multi-omics regulatory network inference in the presence of missing data. Brief Bioinform 2023; 24:bbad309. [PMID: 37670505 PMCID: PMC10516394 DOI: 10.1093/bib/bbad309] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/08/2022] [Revised: 05/06/2023] [Accepted: 05/29/2023] [Indexed: 09/07/2023]
Abstract
A key problem in systems biology is the discovery of regulatory mechanisms that drive phenotypic behaviour of complex biological systems in the form of multi-level networks. Modern multi-omics profiling techniques probe these fundamental regulatory networks but are often hampered by experimental restrictions leading to missing data or partially measured omics types for subsets of individuals due to cost restrictions. In such scenarios, in which missing data is present, classical computational approaches to infer regulatory networks are limited. In recent years, approaches have been proposed to infer sparse regression models in the presence of missing information. Nevertheless, these methods have not been adopted for regulatory network inference yet. In this study, we integrated regression-based methods that can handle missingness into KiMONo, a Knowledge guided Multi-Omics Network inference approach, and benchmarked their performance on commonly encountered missing data scenarios in single- and multi-omics studies. Overall, two-step approaches that explicitly handle missingness performed best for a wide range of random- and block-missingness scenarios on imbalanced omics-layers dimensions, while methods implicitly handling missingness performed best on balanced omics-layers dimensions. Our results show that robust multi-omics network inference in the presence of missing data with KiMONo is feasible and thus allows users to leverage available multi-omics data to its full extent.
Affiliation(s)
- Juan D Henao
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Michael Lauber
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising
- Manuel Azevedo
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Anastasiia Grekova
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Fabian Theis
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
  - Department of Mathematics, Technical University of Munich, 85748 Garching bei München, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising
- Christoph Ogris
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Benjamin Schubert
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
  - Department of Mathematics, Technical University of Munich, 85748 Garching bei München, Germany
11. Luo L, Lin H, Huang J, Lin B, Huang F, Luo H. Risk factors and prognostic nomogram for patients with second primary cancers after lung cancer using classical statistics and machine learning. Clin Exp Med 2023; 23:1609-1620. [PMID: 35821159 DOI: 10.1007/s10238-022-00858-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Accepted: 06/20/2022] [Indexed: 11/03/2022]
Abstract
Previous studies have revealed an increased risk of second primary cancers (SPC) after lung cancer. Prognostic prediction models for SPC patients after lung cancer are particularly needed to guide screening. Therefore, this study retrospectively analyzed the Surveillance, Epidemiology, and End Results (SEER) database using classical statistics and machine learning to explore the risk factors and construct a novel overall survival (OS) prediction nomogram for patients with SPC after lung cancer. Data of patients with SPC after lung cancer, covering 2000 to 2016, were gathered from the SEER database. The incidence of SPC after lung cancer was calculated using standardized incidence ratios (SIRs). Cox proportional hazards regression, machine learning (ML), Kaplan-Meier (KM) methods, and log-rank tests were conducted to identify the important prognostic factors for predicting OS. These significant prognostic factors were used for the development of an OS prediction nomogram. In total, 10,487 SPC samples from the SEER database were randomly divided into training and validation cohorts (model construction and internal validation). In the random forest (RF) and extreme gradient boosting (XGBoost) feature importance ranking models, age was the most important variable, which was also reflected in the nomogram. Moreover, the models that combined machine learning with Cox proportional hazards had better predictive performance than the model that used Cox proportional hazards alone (AUC = 0.762 in RF, AUC = 0.737 in XGBoost, AUC = 0.722 in COX). Calibration curves and decision curve analysis (DCA) curves also indicated that the nomogram has excellent clinical utility. A web-based dynamic nomogram calculator was made accessible at https://httseer.shinyapps.io/DynNomapp/ (link as given in the source). The prognosis characteristics of SPC following lung cancer were systematically reviewed. The dynamic nomogram we constructed can provide survival predictions to assist clinicians in making individualized decisions.
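The abstract relies on standardized incidence ratios without defining them. A minimal sketch of the standard definition follows: the number of observed cases divided by the number expected if the cohort had experienced the reference population's stratum-specific rates. The function name, strata, and all numbers are hypothetical and are not taken from SEER or from the study.

```python
def standardized_incidence_ratio(observed, person_years, reference_rates):
    """SIR = observed cases / expected cases under the reference rates.

    person_years[k] and reference_rates[k] refer to the same stratum
    (for example an age band); the rates are cases per person-year.
    """
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed / expected


# Hypothetical cohort with two age strata.
cohort_person_years = [1000.0, 2000.0]
population_rates = [0.01, 0.02]  # expected cases: 10 + 40 = 50
sir = standardized_incidence_ratio(75, cohort_person_years, population_rates)
```

A value above 1 means more cases were observed in the cohort than the reference rates would predict, which is how an increased risk of a second primary cancer is usually expressed.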
Affiliation(s)
- Lianxiang Luo
  - The Marine Biomedical Research Institute, Guangdong Medical University, Zhanjiang, 524023, Guangdong, China
  - Southern Marine Science and Engineering Guangdong Laboratory (Zhanjiang), Zhanjiang, 524023, Guangdong, China
  - The Marine Biomedical Research Institute of Guangdong Zhanjiang, Zhanjiang, 524023, Guangdong, China
- Haowen Lin
  - The First Clinical College, Guangdong Medical University, Zhanjiang, 524023, Guangdong, China
- Jiahui Huang
  - The First Clinical College, Guangdong Medical University, Zhanjiang, 524023, Guangdong, China
- Baixin Lin
  - The First Clinical College, Guangdong Medical University, Zhanjiang, 524023, Guangdong, China
- Fangfang Huang
  - Graduate School, Guangdong Medical University, Zhanjiang, 524023, Guangdong, China
- Hui Luo
  - The Marine Biomedical Research Institute, Guangdong Medical University, Zhanjiang, 524023, Guangdong, China
  - Southern Marine Science and Engineering Guangdong Laboratory (Zhanjiang), Zhanjiang, 524023, Guangdong, China
  - The Marine Biomedical Research Institute of Guangdong Zhanjiang, Zhanjiang, 524023, Guangdong, China
12
|
Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. bioRxiv 2023:2023.07.26.550573. [PMID: 37546866 PMCID: PMC10402059 DOI: 10.1101/2023.07.26.550573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Synthetic developmental biology aims to engineer gene regulatory mechanisms (GRMs) for understanding and producing desired multicellular patterns and shapes. However, designing GRMs for spatial patterns is a current challenge due to the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms (including the number of genes necessary for the formation of the target pattern), we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple, to produce approximate patterns, or complex, to produce precise patterns. We demonstrate the approach by automatically designing GRMs that can produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile approach to systematically design and discover pattern-producing genetic circuits.
Affiliation(s)
- Reza Mousavi
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Daniel Lobo
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
  - Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, School of Medicine, 22 S. Greene Street, Baltimore, MD 21201, USA
|
13
|
Young T, Laroche O, Walker SP, Miller MR, Casanovas P, Steiner K, Esmaeili N, Zhao R, Bowman JP, Wilson R, Bridle A, Carter CG, Nowak BF, Alfaro AC, Symonds JE. Prediction of Feed Efficiency and Performance-Based Traits in Fish via Integration of Multiple Omics and Clinical Covariates. Biology 2023; 12:1135. [PMID: 37627019 PMCID: PMC10452023 DOI: 10.3390/biology12081135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
Fish aquaculture is a rapidly expanding global industry, set to support growing demands for sources of marine protein. Enhancing feed efficiency (FE) in farmed fish is required to reduce production costs and improve sector sustainability. Recognising that organisms are complex systems whose emerging phenotypes are the product of multiple interacting molecular processes, systems-based approaches are expected to deliver new biological insights into FE and growth performance. Here, we establish 14 diverse layers of multi-omics and clinical covariates to assess their capacities to predict FE and associated performance traits in a fish model (Oncorhynchus tshawytscha) and uncover the influential variables. Inter-omic relatedness between the different layers revealed several significant concordances, particularly between datasets originating from similar material/tissue and between blood indicators and some of the proteomic (liver), metabolomic (liver), and microbiomic layers. Single- and multi-layer random forest (RF) regression models showed that integration of all data layers provides greater FE prediction power than any single-layer model alone. Although FE was among the most challenging of the traits we attempted to predict, the mean accuracy of 40 different FE models in terms of root-mean-square errors normalized to percentage was 30.4%, supporting RF as a feature selection tool and approach for complex trait prediction. Major contributions to the integrated FE models were derived from layers of proteomic and metabolomic data, with substantial influence also provided by the lipid composition layer. A correlation matrix of the top 27 variables in the models highlighted FE trait-associations with faecal bacteria (Serratia spp.), palmitic and nervonic acid moieties in whole body lipids, levels of free glycerol in muscle, and N-acetylglutamic acid content in liver. In summary, we identified subsets of molecular characteristics for the assessment of commercially relevant performance-based metrics in farmed Chinook salmon.
Affiliation(s)
- Tim Young
  - Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
  - The Centre for Biomedical and Chemical Sciences, School of Science, Auckland University of Technology, Private Bag 92006, Auckland 1142, New Zealand
- Matthew R. Miller
  - Cawthron Institute, Nelson 7010, New Zealand
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Noah Esmaeili
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Ruixiang Zhao
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- John P. Bowman
  - Tasmanian Institute of Agricultural Research, University of Tasmania, Hobart 7005, Australia
- Richard Wilson
  - Central Science Laboratory, Research Division, University of Tasmania, Hobart 7001, Australia
- Andrew Bridle
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Chris G. Carter
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
  - Blue Economy Cooperative Research Centre, Launceston 7250, Australia
- Barbara F. Nowak
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Andrea C. Alfaro
  - Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
- Jane E. Symonds
  - Cawthron Institute, Nelson 7010, New Zealand
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
|
14
|
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023] Open
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
  - Barcelona Supercomputing Center, Barcelona, Spain
|
15
|
Tian L, Wu W, Yu T. Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features. Biomolecules 2023; 13:1153. [PMID: 37509188 PMCID: PMC10377046 DOI: 10.3390/biom13071153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 06/26/2023] [Accepted: 06/30/2023] [Indexed: 07/30/2023] Open
Abstract
Random Forest (RF) is a widely used machine learning method with good performance on classification and regression tasks. It works well under low sample size situations, which benefits applications in the field of biology. For example, gene expression data often involve much larger numbers of features (p) compared to the size of samples (n). Though the predictive accuracy using RF is often high, there are some problems when selecting important genes using RF. The important genes selected by RF are usually scattered on the gene network, which conflicts with the biological assumption of functional consistency between effective features. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF) for identifying highly connected important features by involving the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs and achieve equivalent classification accuracy to RF. To evaluate the capability of our proposed method, we conducted simulation experiments and applied the method to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas, and a human embryonic stem cell RNA-seq dataset (GSE93593). The resulting high classification accuracy, connectivity of selected sub-graphs, and interpretable feature selection results suggest the method is a helpful addition to graph-based classification models and feature selection procedures.
Affiliation(s)
- Leqi Tian
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Wenbin Wu
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tianwei Yu
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China
  - Guangdong Provincial Key Laboratory of Big Data Computing, Shenzhen 518172, China
|
16
|
Mbebi AJ, Nikoloski Z. Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection. PLoS Comput Biol 2023; 19:e1010832. [PMID: 37523414 PMCID: PMC10414675 DOI: 10.1371/journal.pcbi.1010832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 08/10/2023] [Accepted: 07/11/2023] [Indexed: 08/02/2023] Open
Abstract
Despite extensive research efforts, reconstruction of gene regulatory networks (GRNs) from transcriptomics data remains a pressing challenge in systems biology. While non-linear approaches for reconstruction of GRNs show improved performance over simpler alternatives, we do not yet understand whether joint modelling of multiple target genes may improve performance, even under linearity assumptions. To address this problem, we propose two novel approaches that cast the GRN reconstruction problem as a blend between regularized multivariate regression and graphical models that combine the L2,1-norm with classical regularization techniques. We used data and networks from the DREAM5 challenge to show that the proposed models provide consistently good performance in comparison to contenders whose performance varies with data sets from simulation and experiments from the model unicellular organisms Escherichia coli and Saccharomyces cerevisiae. Since the models' formulation facilitates the prediction of master regulators, we also used the resulting findings to identify master regulators over all data sets as well as their plasticity across different environments. Our results demonstrate that the identified master regulators are in line with experimental evidence from the model bacterium E. coli. Together, our study demonstrates that simultaneous modelling of several target genes results in improved inference of GRNs and can be used as an alternative in different applications.
Affiliation(s)
- Alain J. Mbebi
  - Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
- Zoran Nikoloski
  - Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
|
17
|
Cho H, Banf M, Shahzad Z, Van Leene J, Bossi F, Ruffel S, Bouain N, Cao P, Krouk G, De Jaeger G, Lacombe B, Brandizzi F, Rhee SY, Rouached H. ARSK1 activates TORC1 signaling to adjust growth to phosphate availability in Arabidopsis. Curr Biol 2023; 33:1778-1786.e5. [PMID: 36963384 PMCID: PMC10175222 DOI: 10.1016/j.cub.2023.03.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 10/15/2022] [Revised: 02/12/2023] [Accepted: 03/02/2023] [Indexed: 03/26/2023]
Abstract
Nutrient sensing and signaling are essential for adjusting growth and development to available resources. Deprivation of the essential mineral phosphorus (P) inhibits root growth [1]. The molecular processes that sense P limitation to trigger early root growth inhibition are not yet known. Target of rapamycin (TOR) kinase is a central regulatory hub in eukaryotes to adapt growth to internal and external nutritional cues [2,3]. How nutritional signals are transduced to TOR to control plant growth remains unclear. Here, we identify Arabidopsis-root-specific kinase 1 (ARSK1), which attenuates initial root growth inhibition in response to P limitation. We demonstrate that ARSK1 phosphorylates and stabilizes the regulatory-associated protein of TOR 1B (RAPTOR1B), a component of the TOR complex 1, to adjust root growth to P availability. These findings uncover signaling components acting upstream of TOR to balance growth to P availability.
Affiliation(s)
- Huikyong Cho
  - The Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA; Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Michael Banf
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Zaigham Shahzad
  - Department of Life Sciences, Lahore University of Management Sciences, Lahore 54792, Pakistan
- Jelle Van Leene
  - Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Flavia Bossi
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Sandrine Ruffel
  - Institute for Plant Sciences of Montpellier, University Montpellier, CNRS, INRAE, Montpellier 34060, France
- Nadia Bouain
  - Institute for Plant Sciences of Montpellier, University Montpellier, CNRS, INRAE, Montpellier 34060, France
- Pengfei Cao
  - MSU DOE-Plant Research Laboratory, Michigan State University, East Lansing, MI 48824, USA
- Gabriel Krouk
  - Institute for Plant Sciences of Montpellier, University Montpellier, CNRS, INRAE, Montpellier 34060, France
- Geert De Jaeger
  - Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Benoit Lacombe
  - Institute for Plant Sciences of Montpellier, University Montpellier, CNRS, INRAE, Montpellier 34060, France
- Federica Brandizzi
  - MSU DOE-Plant Research Laboratory, Michigan State University, East Lansing, MI 48824, USA
- Seung Y Rhee
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Hatem Rouached
  - The Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA; Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
|
18
|
Jihad M, Yet İ. Multiomics Integration at Single-Cell Resolution Using Bayesian Networks: A Case Study in Hepatocellular Carcinoma. OMICS: A Journal of Integrative Biology 2023; 27:24-33. [PMID: 36602810 DOI: 10.1089/omi.2022.0170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Multiomics data integration is one of the leading frontiers of complex disease research and integrative biology. The advances in single-cell sequencing technologies offer yet another crucial dimension in multiomics research. The single-cell studies enable the study and integration of multiomics data simultaneously in the same cell. We report in this study multiomics data integration in single-cell resolution using Bayesian networks (BNs) in a case study of hepatocellular carcinoma (HCC). A BN encodes the conditional dependencies/independencies of variables using a graphical model with an accompanying joint probability. RNA-seq and Reduced Representation Bisulfite Sequencing data were analyzed separately, and copy number variations were estimated by the hidden Markov model method. Several BN models were constructed to reveal omics' causal and associational relationships. These methods were subjected to a validation study using an independent data set. We show the heterogeneity of the multiple cellular layers of HCC at single-cell omics resolution by identifying best-fitted BN models of 295 genes. We also provide novel insights into the multiomics mechanistic relationships in the human lymphocyte antigen class I genes in HCC. To the best of our knowledge, this is the first study to focus on integrating omics data using a machine learning algorithm, BNs, at the single-cell resolution using a case study of HCC.
Affiliation(s)
- Muntadher Jihad
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- İdil Yet
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
|
19
|
Zhang K, Zhang C, Wang K, Teng X, Chen M. Identifying diagnostic markers and constructing a prognostic model for small-cell lung cancer based on blood exosome-related genes and machine-learning methods. Front Oncol 2022; 12:1077118. [PMID: 36620585 PMCID: PMC9814973 DOI: 10.3389/fonc.2022.1077118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022] Open
Abstract
Background Small-cell lung cancer (SCLC) usually presents as an extensive disease with a poor prognosis at the time of diagnosis. Exosomes are rich in biological information and have a powerful impact on tumor progression and metastasis. Therefore, this study aimed to screen for diagnostic markers of blood exosomes in SCLC patients and to build a prognostic model. Methods We identified blood exosome differentially expressed (DE) RNAs in the exoRBase cohort and identified feature RNAs using three algorithms: LASSO, Random Forest, and SVM-RFE. Then, we identified DE genes (DEGs) between SCLC tissues and normal lung tissues in the GEO cohort and obtained exosome-associated DEGs (EDEGs) by intersection with exosomal DEmRNAs. Finally, we performed univariate Cox, LASSO, and multivariate Cox regression analyses on EDEGs to construct the model. We then compared the patients' overall survival (OS) between the two risk groups and assessed the independent prognostic value of the model using receiver operating characteristic (ROC) curve analysis. Results We identified 952 DEmRNAs, 210 DElncRNAs, and 190 DEcircRNAs in exosomes and identified 13 feature RNAs with good diagnostic value. Then, we obtained 274 EDEGs and constructed a risk model containing 7 genes (TBX21, ZFHX2, HIST2H2BE, LTBP1, SIAE, HIST1H2AL, and TSPAN9). Low-risk patients had a longer OS time than high-risk patients. The risk model can independently predict the prognosis of SCLC patients, with areas under the ROC curve (AUCs) of 0.820 at 1 year, 0.952 at 3 years, and 0.989 at 5 years. Conclusions We identified 13 valuable diagnostic markers in the exosomes of SCLC patients and constructed a new promising prognostic model for SCLC.
|
20
|
Ismail E, Gad W, Hashem M. HEC-ASD: a hybrid ensemble-based classification model for predicting autism spectrum disorder disease genes. BMC Bioinformatics 2022; 23:554. [PMID: 36544099 PMCID: PMC9768984 DOI: 10.1186/s12859-022-05099-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
PURPOSE Autism spectrum disorder (ASD) is the most prevalent disease today. Its causes may be attributed 80% to genetic factors and 20% to environmental factors. In spite of this, the majority of current research is concerned with environmental causes, and only a small proportion with the genetic causes of the disease. Autism is a complex disease, which makes it difficult to identify the genes that cause it. METHODS A hybrid ensemble-based classification (HEC-ASD) model for predicting ASD genes using gradient boosting machines is proposed. The proposed model utilizes gene ontology (GO) to construct a gene functional similarity matrix using the hybrid gene similarity (HGS) method. HGS measures the semantic similarity between genes effectively. It combines a graph-based method, such as the Wang method, with the number of directed children's nodes of gene term from GO. Moreover, an ensemble gradient boosting classifier is adapted to enhance the prediction of genes, forming a robust classification model. RESULTS The proposed model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database. The experimental results are promising as they improve the classification performance for predicting ASD genes. The results are compared with other approaches that used a gene regulatory network (GRN), a protein-protein interaction (PPI) network, or GO. The HEC-ASD model reaches the highest prediction accuracy of 0.88% using ensemble learning classifiers. CONCLUSION The proposed model demonstrates that an ensemble learning technique using gradient boosting is effective in predicting autism spectrum disorder genes. Moreover, the HEC-ASD model utilized GO rather than using a PPI network and GRN.
Affiliation(s)
- Eman Ismail
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Walaa Gad
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Mohamed Hashem
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
|
21
|
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany; Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
|
22
|
Wang Y, Huang Y, Yang M, Yu Y, Chen X, Ma L, Xiao L, Liu C, Liu B, Yuan X. Comprehensive Pan-Cancer Analyses of Immunogenic Cell Death as a Biomarker in Predicting Prognosis and Therapeutic Response. Cancers (Basel) 2022; 14:cancers14235952. [PMID: 36497433 PMCID: PMC9736000 DOI: 10.3390/cancers14235952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 11/29/2022] [Accepted: 11/29/2022] [Indexed: 12/04/2022] Open
Abstract
Immunogenic cell death (ICD), a form of regulated cell death, is related to anticancer therapy. Due to the absence of widely accepted markers, ICD-related phenotypes across cancer types have remained uncharacterized. Here, we defined the ICD score to delineate the ICD landscape across 33 cancer types and 31 normal tissue types based on transcriptomic, proteomic and epigenetic data from multiple databases. We found that the ICD score showed cancer type-specific associations with genomic and immune features. Importantly, the ICD score had the potential to predict therapy response and patient prognosis in multiple cancer types. We also developed an ICD-related prognostic model by machine learning and Cox regression analysis. Single-cell level analysis revealed intra-tumor ICD state heterogeneity and communication between ICD-based clusters of T cells and other immune cells in the tumor microenvironment in colon cancer. For the first time, we identified IGF2BP3 as a potential ICD regulator in colon cancer. In conclusion, our study provides a comprehensive framework for evaluating the relation between ICD and clinical relevance, gaining insights into the identification of ICD as a potential cancer-related biomarker and therapeutic target.
Affiliation(s)
- Bo Liu
  - Correspondence: (B.L.); (X.Y.)
|
23
|
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
|
24
|
Hao Y, Lu L, Liu A, Lin X, Xiao L, Kong X, Li K, Liang F, Xiong J, Qu L, Li Y, Li J. Integrating bioinformatic strategies in spatial life science research. Brief Bioinform 2022; 23:bbac415. [PMID: 36198665 PMCID: PMC9677476 DOI: 10.1093/bib/bbac415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 08/15/2022] [Accepted: 08/27/2022] [Indexed: 12/14/2022] Open
Abstract
As space exploration programs progress, manned space missions will become more frequent and farther away from Earth, putting a greater emphasis on astronaut health. Through the collaborative efforts of researchers from various countries, the effect of the space environment factors on living systems is gradually being uncovered. Although a large number of interconnected research findings have been produced, their connection seems to be confused, and many unknown effects are left to be discovered. Simultaneously, several valuable data resources have emerged, accumulating data measuring biological effects in space that can be used to further investigate the unknown biological adaptations. In this review, the previous findings and their correlations are sorted out to facilitate the understanding of biological adaptations to space and the design of countermeasures. The biological effect measurement methods/data types are also organized to provide references for experimental design and data analysis. To aid deeper exploration of the data resources, we summarized common characteristics of the data generated from longitudinal experiments, outlined challenges or caveats in data analysis and provided corresponding solutions by recommending bioinformatics strategies and available models/tools.
Affiliation(s)
- Yangyang Hao
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Liang Lu
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Anna Liu
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Xue Lin
- Department of Bioinformatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Li Xiao
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Xiaoyue Kong
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Kai Li
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Fengji Liang
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Jianghui Xiong
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Lina Qu
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Yinghui Li
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Jian Li
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
|
25
|
Zhang H, Zhang N, Wu W, Zhou R, Li S, Wang Z, Dai Z, Zhang L, Liu Z, Zhang J, Luo P, Liu Z, Cheng Q. Machine learning-based tumor-infiltrating immune cell-associated lncRNAs for predicting prognosis and immunotherapy response in patients with glioblastoma. Brief Bioinform 2022; 23:6711411. [PMID: 36136350 DOI: 10.1093/bib/bbac386] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 04/25/2022] [Revised: 07/29/2022] [Accepted: 08/10/2022] [Indexed: 12/14/2022] Open
Abstract
Long noncoding ribonucleic acids (RNAs; lncRNAs) have been associated with cancer immunity regulation. However, the roles of immune cell-specific lncRNAs in glioblastoma (GBM) remain largely unknown. In this study, a novel computational framework was constructed to screen tumor-infiltrating immune cell-associated lncRNAs (TIIClnc) and develop a TIIClnc signature by integratively analyzing the transcriptome data of purified immune cells, GBM cell lines and bulk GBM tissues using six machine learning algorithms. The TIIClnc signature distinguished survival outcomes of GBM patients across four independent datasets, including the Xiangya in-house dataset, and, more importantly, outperformed 95 previously established signatures in gliomas. The TIIClnc signature was revealed to be an indicator of the infiltration level of immune cells and predicted the response outcomes of immunotherapy. The positive correlation between the TIIClnc signature and CD8, PD-1 and PD-L1 was verified in the Xiangya in-house dataset. As a newly demonstrated predictive biomarker, the TIIClnc signature enabled a more precise selection of the GBM population who would benefit from immunotherapy and should be validated and applied in the near future.
Affiliation(s)
- Hao Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China.,Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, China
| | - Nan Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,One-third Lab, College of Bioinformatics Science and Technology, Harbin Medical University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
| | - Wantao Wu
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,Department of Oncology, Xiangya Hospital, Central South University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
| | - Ran Zhou
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, University of Manchester, UK
| | - Shuyu Li
- Department of Thyroid and Breast Surgery, Tongji Hospital, Tongji Medical College of Huazhong University of Science and Technology, China
| | - Zeyu Wang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
| | - Ziyu Dai
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
| | - Liyang Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
| | - Zaoqu Liu
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou, China
| | - Jian Zhang
- Department of Oncology, Zhujiang Hospital, Southern Medical University, China
| | - Peng Luo
- Department of Oncology, Zhujiang Hospital, Southern Medical University, China
| | - Zhixiong Liu
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
| | - Quan Cheng
- Department of Neurosurgery, Xiangya Hospital, Central South University, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
|
26
|
Hawe JS, Saha A, Waldenberger M, Kunze S, Wahl S, Müller-Nurasyid M, Prokisch H, Grallert H, Herder C, Peters A, Strauch K, Theis FJ, Gieger C, Chambers J, Battle A, Heinig M. Network reconstruction for trans acting genetic loci using multi-omics data and prior information. Genome Med 2022; 14:125. [PMID: 36344995 PMCID: PMC9641770 DOI: 10.1186/s13073-022-01124-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 10/11/2022] [Indexed: 11/09/2022] Open
Abstract
BACKGROUND Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view on biological systems and their integration is crucial for gaining insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor and use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information. METHODS We devised a new strategy to integrate QTL with human population scale multi-omics data. State-of-the-art network inference methods including BDgraph and glasso were applied to these data. Comprehensive prior information to guide network inference was manually curated from large-scale biological databases. The inference approach was extensively benchmarked using simulated data and cross-cohort replication analyses. Best performing methods were subsequently applied to real-world human cohort data. RESULTS Our benchmarks showed that prior-based strategies outperform methods without prior information in simulated data and show better replication across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass for which we generated novel functional hypotheses. CONCLUSIONS We demonstrate that existing biological knowledge can improve the integrative analysis of networks underlying trans associations and generate novel hypotheses about regulatory mechanisms.
Affiliation(s)
- Johann S Hawe
- Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,German Heart Centre Munich, Department of Cardiology, Technical University Munich, Munich, Germany.,Department of Informatics, Technical University of Munich, Garching, Germany
| | - Ashis Saha
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Melanie Waldenberger
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Sonja Kunze
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Simone Wahl
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Martina Müller-Nurasyid
- Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,IBE, Faculty of Medicine, LMU Munich, 81377, Munich, Germany.,Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany.,Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
| | - Holger Prokisch
- Institute of Human Genetics, School of Medicine, Technische Universität München, Munich, Germany
| | - Harald Grallert
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,German Center for Diabetes Research (DZD), Neuherberg, Germany
| | - Christian Herder
- German Center for Diabetes Research (DZD), Neuherberg, Germany.,Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Düsseldorf, Germany.,Division of Endocrinology and Diabetology, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Annette Peters
- Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Konstantin Strauch
- Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany.,Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Munich, Germany
| | - Fabian J Theis
- Department of Informatics, Technical University of Munich, Garching, Germany.,Department of Mathematics, Technical University of Munich, Garching, Germany
| | - Christian Gieger
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,German Center for Diabetes Research (DZD), Neuherberg, Germany
| | - John Chambers
- Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK.,Lee Kong Chian School of Medicine, Nanyang Technological University, 308232, Singapore, Singapore
| | - Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Matthias Heinig
- Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Department of Informatics, Technical University of Munich, Garching, Germany.,Munich Heart Association, Partner Site Munich, DZHK (German Centre for Cardiovascular Research), 10785, Berlin, Germany.
|
27
|
Cummins B, Motta FC, Moseley RC, Deckard A, Campione S, Gameiro M, Gedeon T, Mischaikow K, Haase SB. Experimental guidance for discovering genetic networks through hypothesis reduction on time series. PLoS Comput Biol 2022; 18:e1010145. [PMID: 36215333 PMCID: PMC9584434 DOI: 10.1371/journal.pcbi.1010145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Revised: 10/20/2022] [Accepted: 09/05/2022] [Indexed: 11/19/2022] Open
Abstract
Large programs of dynamic gene expression, like cell cycles and circadian rhythms, are controlled by a relatively small "core" network of transcription factors and post-translational modifiers, working in concerted mutual regulation. Recent work suggests that system-independent, quantitative features of the dynamics of gene expression can be used to identify core regulators. We introduce an approach of iterative network hypothesis reduction from time-series data in which increasingly complex features of the dynamic expression of individual genes, pairs of genes, and entire collections of genes are used to infer functional network models that can produce the observed transcriptional program. The culmination of our work is a computational pipeline, Iterative Network Hypothesis Reduction from Temporal Dynamics (Inherent dynamics pipeline), that provides a priority listing of targets for genetic perturbation to experimentally infer network structure. We demonstrate the capability of this integrated computational pipeline on synthetic and yeast cell-cycle data.
Affiliation(s)
- Breschine Cummins
- Department of Mathematical Sciences, Montana State University, Bozeman, Montana, United States of America
| | - Francis C. Motta
- Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, Florida, United States of America
| | - Robert C. Moseley
- Department of Biology, Duke University, Durham, North Carolina, United States of America
| | - Anastasia Deckard
- Geometric Data Analytics, Durham, North Carolina, United States of America
| | - Sophia Campione
- Department of Biology, Duke University, Durham, North Carolina, United States of America
| | - Marcio Gameiro
- Department of Mathematics, Rutgers University, New Brunswick, New Jersey, United States of America
- Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, São Paulo, Brazil
| | - Tomáš Gedeon
- Department of Mathematical Sciences, Montana State University, Bozeman, Montana, United States of America
| | - Konstantin Mischaikow
- Department of Mathematics, Rutgers University, New Brunswick, New Jersey, United States of America
| | - Steven B. Haase
- Department of Biology, Duke University, Durham, North Carolina, United States of America
|
28
|
Bai Z, Xie M, Hu B, Luo D, Wan C, Peng J, Shi Z. Estimation of Soil Organic Carbon Using Vis-NIR Spectral Data and Spectral Feature Bands Selection in Southern Xinjiang, China. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22166124. [PMID: 36015885 PMCID: PMC9413329 DOI: 10.3390/s22166124] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Revised: 08/13/2022] [Accepted: 08/15/2022] [Indexed: 05/27/2023]
Abstract
Soil organic carbon (SOC) plays an important role in the global carbon cycle and soil fertility supply. Rapid and accurate estimation of SOC content could provide critical information for crop production, soil management and soil carbon pool regulation. Many researchers have confirmed the feasibility and great potential of visible and near-infrared (Vis-NIR) spectroscopy in evaluating SOC content rapidly and accurately. Here, to evaluate the feasibility of different spectral band selection methods for SOC prediction, we collected a total of 330 surface soil samples from the cotton field in the Alar Reclamation area in the southern part of Xinjiang, which is located in the arid region of northwest China. Then, we estimated the SOC content using laboratory Vis-NIR spectra. Particle swarm optimization (PSO), competitive adaptive reweighted sampling (CARS) and ant colony optimization (ACO) were adopted to select SOC feature bands. Partial least squares regression (PLSR), random forest (RF) and convolutional neural network (CNN) inversion models were constructed using full-band (400-2400 nm) spectra (R) and feature bands, respectively. We also analyzed the effects of spectral feature band selection methods and modeling methods on the prediction accuracy of SOC. The results indicated that: (1) There are significant differences in the feature bands selected using different methods. The feature band selection methods substantially reduced the spectral variable dimensionality and model complexity. The models built from the feature bands selected by the CARS, PSO and ACO methods showed different potential for improving model accuracy compared with the full-band models. (2) The CNN model had the best performance for predicting SOC. The R2 of the optimal CNN model was 0.90 in the validation, an improvement of 0.05 and 0.04 over the PLSR and RF models, respectively. (3) The highest prediction accuracy was achieved by the CNN model using the feature bands selected by CARS (validation set R2 = 0.90, RMSE = 0.97 g kg-1, RPD = 3.18, RPIQ = 3.11). This study indicated that using the CARS method to select spectral feature bands, combined with the CNN modeling method, can predict SOC content with higher accuracy.
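The validation statistics quoted in this abstract (R2, RMSE, RPD, RPIQ) can be illustrated with a short sketch. It assumes the conventional definitions (RPD as the standard deviation of the observed values divided by RMSE, RPIQ as their interquartile range divided by RMSE); the abstract does not state which variants the authors used, and the function name `validation_metrics` and the choice of the "inclusive" quartile method are illustrative assumptions, not the paper's code.

```python
import math
import statistics


def validation_metrics(observed, predicted):
    """Return R2, RMSE, RPD and RPIQ for paired observed/predicted values.

    Assumed definitions (not confirmed by the abstract):
      RPD  = sample standard deviation of observed / RMSE
      RPIQ = interquartile range of observed / RMSE
    """
    n = len(observed)
    mean_obs = statistics.fmean(observed)
    # Residual and total sums of squares
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # Quartiles of the observed values (quantile convention is an assumption)
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "RPD": statistics.stdev(observed) / rmse,
        "RPIQ": (q3 - q1) / rmse,
    }
```

Higher RPD and RPIQ both mean the prediction error is small relative to the spread of the reference measurements; RPIQ is often preferred when the observed values are skewed.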
Affiliation(s)
- Zijin Bai
- College of Agriculture, Tarim University, Alar 843300, China
| | - Modong Xie
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
| | - Bifeng Hu
- Department of Land Resource Management, School of Tourism and Urban Management, Jiangxi University of Finance and Economics, Nanchang 330013, China
| | - Defang Luo
- College of Agriculture, Tarim University, Alar 843300, China
| | - Chang Wan
- College of Mechanical and Electrical Engineering, Tarim University, Alar 843300, China
| | - Jie Peng
- College of Agriculture, Tarim University, Alar 843300, China
| | - Zhou Shi
- Institute of Applied Remote Sensing and Information Technology, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
|
29
|
Multiomics to elucidate inflammatory bowel disease risk factors and pathways. Nat Rev Gastroenterol Hepatol 2022; 19:399-409. [PMID: 35301463 PMCID: PMC9214275 DOI: 10.1038/s41575-022-00593-y] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Accepted: 02/21/2022] [Indexed: 02/07/2023]
Abstract
Inflammatory bowel disease (IBD) is an immune-mediated disease of the intestinal tract, with complex pathophysiology involving genetic, environmental, microbiome, immunological and potentially other factors. Epidemiological data have provided important insights into risk factors associated with IBD, but are limited by confounding, biases and data quality, especially when pertaining to risk factors in early life. Multiomics platforms provide granular high-throughput data on numerous variables simultaneously and can be leveraged to characterize molecular pathways and risk factors for chronic diseases, such as IBD. Herein, we describe omics platforms that can advance our understanding of IBD risk factors and pathways, and available omics data on IBD and other relevant diseases. We highlight knowledge gaps and emphasize the importance of birth, at-risk and pre-diagnostic cohorts, and neonatal blood spots in omics analyses in IBD. Finally, we discuss network analysis, a powerful bioinformatics tool to assemble high-throughput data and derive clinical relevance.
|
30
|
Jiang X, Zhang X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 2022; 23:165. [PMID: 35524190 PMCID: PMC9074326 DOI: 10.1186/s12859-022-04696-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Accepted: 04/25/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Current gene regulatory network (GRN) inference methods are notorious for a great number of indirect interactions hidden in the predictions. Filtering out the indirect interactions from direct ones remains an important challenge in the reconstruction of GRNs. To address this issue, we developed a redundancy silencing and network enhancement technique (RSNET) for inferring GRNs. RESULTS To assess the performance of the RSNET method, we ran experiments on several gold-standard networks using a simulation study, the DREAM challenge dataset and an Escherichia coli network. The results show that the RSNET method performed better than the compared methods in sensitivity and accuracy. As a case study, we used RSNET to construct a functional GRN for apple fruit ripening from gene expression data. CONCLUSIONS In the proposed method, the redundant interactions including weak and indirect connections are silenced adaptively by recursive optimization, and the highly dependent nodes are constrained in the model to keep the real interactions. This study provides a useful tool for inferring clean networks. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s12859-022-04696-w.
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China.,Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China.,University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China.,Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China.
|
31
|
Liu X, Shi N, Wang Y, Ji Z, He S. Data-Driven Boolean Network Inference Using a Genetic Algorithm With Marker-Based Encoding. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1558-1569. [PMID: 33513105 DOI: 10.1109/tcbb.2021.3055646] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
The inference of Boolean networks is crucial for analyzing the topology and dynamics of gene regulatory networks. Many data-driven approaches using evolutionary algorithms have been proposed based on time-series data. However, the ability to infer both network topology and dynamics is restricted by their inflexible encoding schemes. To address this problem, we propose a novel Boolean network inference algorithm for inferring both network topology and dynamics simultaneously. The main idea is that, we use a marker-based genetic algorithm to encode both regulatory nodes and logical operators in a chromosome. By using the markers and introducing more logical operators, the proposed algorithm can infer more diverse candidate Boolean functions. The proposed algorithm is applied to five networks, including two artificial Boolean networks and three real-world gene regulatory networks. Compared with other algorithms, the experimental results demonstrate that our proposed algorithm infers more accurate topology and dynamics.
|
32
|
Saremi M, Amirmazlaghani M. Reconstruction of Gene Regulatory Networks Using Multiple Datasets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1827-1839. [PMID: 33539303 DOI: 10.1109/tcbb.2021.3057241] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
MOTIVATION Laboratory gene regulatory data for a species are sporadic. Despite the abundance of gene regulatory network algorithms that employ single data sets, few algorithms can combine the vast but dispersed sources of data and extract the potential information. To compensate for this shortage, we developed an algorithm called GENEREF that can accumulate information from multiple types of data sets in an iterative manner, with each iteration boosting the performance of the prediction results. RESULTS The algorithm is examined extensively on data extracted from the quintuple DREAM4 networks and DREAM5's Escherichia coli and Saccharomyces cerevisiae networks and sub-networks. Many single-dataset and multi-dataset algorithms were compared to test the performance of the algorithm. Results show that GENEREF surpasses non-ensemble state-of-the-art multi-perturbation algorithms on the selected networks and is competitive with current multiple-dataset algorithms. Specifically, it outperforms dynGENIE3 and is on par with iRafNet. We also argue that a scoring method solely based on the AUPR criterion would be more trustworthy than the traditional score. AVAILABILITY The Python implementation along with the data sets and results can be downloaded from github.com/msaremi/GENEREF.
|
33
|
Zhang Y, He Y, Chen Q, Yang Y, Gong M. Fusion prior gene network for high reliable single-cell gene regulatory network inference. Comput Biol Med 2022; 143:105279. [PMID: 35134605 DOI: 10.1016/j.compbiomed.2022.105279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 01/25/2022] [Accepted: 01/29/2022] [Indexed: 11/03/2022]
Abstract
Single-cell RNA sequencing technology provides an opportunity to discover gene regulatory networks (GRNs) that control cell differentiation and drive cell type transformation. However, inference is hampered by high loss and high noise in the sequencing data, and the resulting networks contain many pseudo-connections. To solve these problems, we propose a framework called Fusion prior gene network for Gene Regulatory Network inference Accuracy Enhancement (FGRNAE) to infer a highly reliable gene regulatory network. Specifically, based on the Single-Cell RNA-sequencing Network Propagation and network Fusion (scNPF) preprocessing framework, we employ the Random Walk with Restart on the prior gene network to interpolate the missing data. Furthermore, we infer the network using the Random Forest algorithm with the results achieved above. In addition, we apply data from the Co-Function Network to build a meta-gene network and select the regulatory connections with the Markov Random Field. Extensive experiments based on datasets from BEELINE validate the effectiveness of our framework for improving the accuracy of inference.
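The Random Walk with Restart step mentioned in this abstract can be sketched generically as follows. This is the textbook iteration p ← (1 − r)·W·p + r·p0 on a column-normalized adjacency matrix, not the FGRNAE or scNPF implementation; the function name, the restart probability of 0.5 and the convergence tolerance are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.5,
                             tol=1e-10, max_iter=1000):
    """Propagate seed scores over a network by random walk with restart.

    adjacency   : square list of lists, adjacency[i][j] = weight of edge j -> i
    seed_scores : non-negative starting scores, one per node (sum > 0)
    restart     : probability of jumping back to the seed distribution
    """
    n = len(adjacency)
    # Column-normalize so each column is a transition distribution
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    total = sum(seed_scores)
    p0 = [s / total for s in seed_scores]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

In an imputation setting, the seed vector would hold the observed expression of one cell and the smoothed scores would fill in values for genes whose neighbors in the prior network are expressed; how the framework actually combines these values is not described in the abstract.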
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Yuchen He
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Qingyuan Chen
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yihan Yang
- International College, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
| | - Meiqin Gong
- West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
|
34
|
Aluru M, Shrivastava H, Chockalingam SP, Shivakumar S, Aluru S. EnGRaiN: a supervised ensemble learning method for recovery of large-scale gene regulatory networks. Bioinformatics 2022; 38:1312-1319. [PMID: 34888624 DOI: 10.1093/bioinformatics/btab829] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 10/29/2021] [Accepted: 12/03/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Reconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods that differ between the types of interactions they uncover with varying trade-offs between sensitivity and specificity have been proposed. To leverage benefits of multiple such methods, ensemble network methods that combine predictions from resulting networks have been developed, promising results better than or as good as the individual networks. Perhaps owing to the difficulty in obtaining accurate training examples, these ensemble methods hitherto are unsupervised. RESULTS In this article, we introduce EnGRaiN, the first supervised ensemble learning method to construct gene networks. The supervision for training is provided by small training datasets of true edge connections (positives) and edges known to be absent (negatives) among gene pairs. We demonstrate the effectiveness of EnGRaiN using simulated datasets as well as a curated collection of Arabidopsis thaliana datasets we created from microarray datasets available from public repositories. EnGRaiN shows better results not only in terms of receiver operating characteristic and PR characteristics for both real and simulated datasets compared with unsupervised methods for ensemble network construction, but also generates networks that can be mined for elucidating complex biological interactions. AVAILABILITY AND IMPLEMENTATION EnGRaiN software and the datasets used in the study are publicly available at the github repository: https://github.com/AluruLab/EnGRaiN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maneesha Aluru
- Department of Biology, Georgia Institute of Technology, Atlanta, GA 30308, USA
- Sriram P Chockalingam
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA 30308, USA
- Shruti Shivakumar
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30308, USA
- Srinivas Aluru
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA 30308, USA; Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30308, USA
|
35
|
On principal graphical models with application to gene network. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2021.107344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
36
|
Abstract
Cancer is a genetic disease in which multiple genes are perturbed. Thus, information about the regulatory relationships between genes is necessary for the identification of biomarkers and therapeutic targets. In this review, methods for inference of gene regulatory networks (GRNs) from transcriptomics data that are used in cancer research are introduced. The methods are classified into three categories according to the analysis model. The first category includes methods that use pair-wise measures between genes, including correlation coefficient and mutual information. The second category includes methods that determine the genetic regulatory relationship using multivariate measures, which consider the expression profiles of all genes concurrently. The third category includes methods using supervised and integrative approaches. The supervised approach estimates the regulatory relationship using a supervised learning method that constructs a regression or classification model for predicting whether there is a regulatory relationship between genes with input data of gene expression profiles and class labels of prior biological knowledge. The integrative method is an expansion of the supervised method and uses more data and biological knowledge for predicting the regulatory relationship. Furthermore, simulation and experimental validation of the estimated GRNs are also discussed in this review. This review identified that most GRN inference methods are not specific for cancer transcriptome data, and such methods are required for better understanding of cancer pathophysiology. In addition, more systematic methods for validation of the estimated GRNs need to be developed in the context of cancer biology.
|
37
|
Lecca P. Machine Learning for Causal Inference in Biological Networks: Perspectives of This Challenge. FRONTIERS IN BIOINFORMATICS 2021; 1:746712. [PMID: 36303798 PMCID: PMC9581010 DOI: 10.3389/fbinf.2021.746712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/24/2021] [Accepted: 09/08/2021] [Indexed: 11/13/2022]
Abstract
Most machine learning-based methods predict outcomes rather than understanding causality. Machine learning methods have proved efficient at finding correlations in data, but poor at determining causation. This issue severely limits the applicability of machine learning methods to infer the causal relationships between the entities of a biological network, and more generally of any dynamical system, such as medical intervention strategies and clinical outcomes system, that is representable as a network. From the perspective of those who want to use the results of network inference not only to understand the mechanisms underlying the dynamics, but also to understand how the network reacts to external stimuli (e.g., environmental factors, therapeutic treatments), tools that can understand the causal relationships between data are in high demand. Given the increasing popularity of machine learning techniques in computational biology and the recent literature proposing the use of machine learning techniques for the inference of biological networks, we would like to present the challenges that mathematics and computer science research faces in generalising machine learning to an approach capable of understanding causal relationships, and the prospects that achieving this will open up for the medical application domains of systems biology, the main paradigm of which is precisely network biology at any physical scale.
|
38
|
Montesinos-López OA, Montesinos-López A, Mosqueda-Gonzalez BA, Montesinos-López JC, Crossa J, Ramirez NL, Singh P, Valladares-Anguiano FA. A zero altered Poisson random forest model for genomic-enabled prediction. G3-GENES GENOMES GENETICS 2021; 11:6042695. [PMID: 33693599 PMCID: PMC8022945 DOI: 10.1093/g3journal/jkaa057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/03/2020] [Accepted: 12/10/2020] [Indexed: 12/23/2022]
Abstract
In genomic selection choosing the statistical machine learning model is of paramount importance. In this paper, we present an application of a zero altered random forest model with two versions (ZAP_RF and ZAPC_RF) to deal with excess zeros in count response variables. The proposed model was compared with the conventional random forest (RF) model and with the conventional Generalized Poisson Ridge regression (GPR) using two real datasets, and we found that, in terms of prediction performance, the proposed zero inflated random forest model outperformed the conventional RF and GPR models.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430 Guadalajara, Jalisco, México
- José Crossa
- Colegio de Postgraduados, Montecillos, Edo. de México CP 56230, México; International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Edo. de México, México
- Nerida Lozano Ramirez
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Edo. de México, México
- Pawan Singh
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Edo. de México, México
|
39
|
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022]
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany; (Y.P.); (D.H.)
- Dominik Heider
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany; (Y.P.); (D.H.)
- Anne-Christin Hauschild
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany; (Y.P.); (D.H.)
- Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
|
40
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 154] [Impact Index Per Article: 51.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022]
Abstract
Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, only include one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
Affiliation(s)
- Milan Picard
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Corresponding author.
|
41
|
Camargo Rodriguez AV. Integrative Modelling of Gene Expression and Digital Phenotypes to Describe Senescence in Wheat. Genes (Basel) 2021; 12:909. [PMID: 34208213 PMCID: PMC8230903 DOI: 10.3390/genes12060909] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/08/2021] [Revised: 05/19/2021] [Accepted: 06/02/2021] [Indexed: 12/27/2022]
Abstract
Senescence is the final stage of leaf development and is critical for plants' fitness as nutrient relocation from leaves to reproductive organs takes place. Although senescence is key in nutrient relocation and yield determination in cereal grain production, there is limited understanding of the genetic and molecular mechanisms that control it in major staple crops such as wheat. Senescence is a highly orchestrated continuum of interacting pathways throughout the lifecycle of a plant. Levels of gene expression, morphogenesis, and phenotypic development all play key roles. Yet, most studies focus on a short window immediately after anthesis. This approach clearly leaves out key components controlling the activation, development, and modulation of the senescence pathway before anthesis, as well as during the later developmental stages, during which grain development continues. Here, a computational multiscale modelling approach integrates multi-omics developmental data to attempt to simulate senescence at the molecular and plant level. To recreate the senescence process in wheat, core principles were borrowed from Arabidopsis thaliana, a more widely researched plant model. The resulting model describes temporal gene regulatory networks and their effect on plant morphology leading to senescence. Digital phenotypes generated from images using a phenomics platform were used to capture the dynamics of plant development. This work provides the basis for the application of computational modelling to advance understanding of the complex biological trait senescence. This supports the development of a predictive framework enabling its prediction in changing or extreme environmental conditions, with a view to targeted selection for optimal lifecycle duration for improving resilience to climate change.
|
42
|
Mousavi R, Konuru SH, Lobo D. Inference of dynamic spatial GRN models with multi-GPU evolutionary computation. Brief Bioinform 2021; 22:6217729. [PMID: 33834216 DOI: 10.1093/bib/bbab104] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/10/2020] [Revised: 02/15/2021] [Accepted: 03/09/2021] [Indexed: 02/06/2023]
Abstract
Reverse engineering mechanistic gene regulatory network (GRN) models with a specific dynamic spatial behavior is an inverse problem without analytical solutions in general. Instead, heuristic machine learning algorithms have been proposed to infer the structure and parameters of a system of equations able to recapitulate a given gene expression pattern. However, these algorithms are computationally intensive as they need to simulate millions of candidate models, which limits their applicability and requires high computational resources. Graphics processing unit (GPU) computing is an affordable alternative for accelerating large-scale scientific computation, yet no method is currently available to exploit GPU technology for the reverse engineering of mechanistic GRNs from spatial phenotypes. Here we present an efficient methodology to parallelize evolutionary algorithms using GPU computing for the inference of mechanistic GRNs that can develop a given gene expression pattern in a multicellular tissue area or cell culture. The proposed approach is based on multi-CPU threads running the lightweight crossover, mutation and selection operators and launching GPU kernels asynchronously. Kernels can run in parallel in a single or multiple GPUs and each kernel simulates and scores the error of a model using the thread parallelism of the GPU. We tested this methodology for the inference of spatiotemporal mechanistic gene regulatory networks (GRNs), including topology and parameters, that can develop a given 2D gene expression pattern. The results show a 700-fold speedup with respect to a single CPU implementation. This approach can streamline the extraction of knowledge from biological and medical datasets and accelerate the automatic design of GRNs for synthetic biology applications.
Affiliation(s)
- Reza Mousavi
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
- Sri Harsha Konuru
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
- Daniel Lobo
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
|
43
|
Brooks MD, Juang CL, Katari MS, Alvarez JM, Pasquino A, Shih HJ, Huang J, Shanks C, Cirrone J, Coruzzi GM. ConnecTF: A platform to integrate transcription factor-gene interactions and validate regulatory networks. PLANT PHYSIOLOGY 2021; 185:49-66. [PMID: 33631799 PMCID: PMC8133578 DOI: 10.1093/plphys/kiaa012] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/07/2020] [Accepted: 10/27/2020] [Indexed: 05/08/2023]
Abstract
Deciphering gene regulatory networks (GRNs) is both a promise and challenge of systems biology. The promise lies in identifying key transcription factors (TFs) that enable an organism to react to changes in its environment. The challenge lies in validating GRNs that involve hundreds of TFs with hundreds of thousands of interactions with their genome-wide targets experimentally determined by high-throughput sequencing. To address this challenge, we developed ConnecTF, a species-independent, web-based platform that integrates genome-wide studies of TF-target binding, TF-target regulation, and other TF-centric omic datasets and uses these to build and refine validated or inferred GRNs. We demonstrate the functionality of ConnecTF by showing how integration within and across TF-target datasets uncovers biological insights. Case study 1 uses integration of TF-target gene regulation and binding datasets to uncover TF mode-of-action and identify potential TF partners for 14 TFs in abscisic acid signaling. Case study 2 demonstrates how genome-wide TF-target data and automated functions in ConnecTF are used in precision/recall analysis and pruning of an inferred GRN for nitrogen signaling. Case study 3 uses ConnecTF to chart a network path from NLP7, a master TF in nitrogen signaling, to direct secondary TF2s and to its indirect targets in a Network Walking approach. The public version of ConnecTF (https://ConnecTF.org) contains 3,738,278 TF-target interactions for 423 TFs in Arabidopsis, 839,210 TF-target interactions for 139 TFs in maize (Zea mays), and 293,094 TF-target interactions for 26 TFs in rice (Oryza sativa). The database and tools in ConnecTF will advance the exploration of GRNs in plant systems biology applications for model and crop species.
Affiliation(s)
- Matthew D Brooks
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- USDA ARS Global Change and Photosynthesis Research Unit, Urbana, IL, USA
- Che-Lun Juang
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Manpreet Singh Katari
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- José M Alvarez
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Angelo Pasquino
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Hung-Jui Shih
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Ji Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Carly Shanks
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Jacopo Cirrone
- Courant Institute for Mathematical Sciences, Department of Computer Science, New York University NY, USA
- Gloria M Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Author for communication: (G.C.)
|
44
|
Liu H, Wang X, Tang K, Peng E, Xia D, Chen Z. Machine learning-assisted decision-support models to better predict patients with calculous pyonephrosis. Transl Androl Urol 2021; 10:710-723. [PMID: 33718073 PMCID: PMC7947454 DOI: 10.21037/tau-20-1208] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/24/2023]
Abstract
Background To develop a machine learning (ML)-assisted model capable of accurately identifying patients with calculous pyonephrosis before making treatment decisions by integrating multiple clinical characteristics. Methods We retrospectively collected data from patients with obstructed hydronephrosis who underwent retrograde ureteral stent insertion, percutaneous nephrostomy (PCN), or percutaneous nephrolithotomy (PCNL). The study cohort was divided into training and testing datasets in a 70:30 ratio for further analysis. We developed 5 ML-assisted models from 22 clinical features using logistic regression (LR), LR optimized by least absolute shrinkage and selection operator (Lasso) regularization (Lasso-LR), support vector machine (SVM), extreme gradient boosting (XGBoost), and random forest (RF). The area under the curve (AUC) was applied to determine the model with the highest discrimination. Decision curve analysis (DCA) was used to investigate the clinical net benefit associated with using the predictive models. Results A total of 322 patients were included, with 225 patients in the training dataset, and 97 patients in the testing dataset. The XGBoost model showed good discrimination with the AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.981, 0.991, 0.962, 1.000, 1.000, and 0.989, respectively, followed by SVM [AUC =0.985, 95% confidence interval (CI): 0.970–1.000], Lasso-LR (AUC =0.977, 95% CI: 0.958–0.996), LR (AUC =0.936, 95% CI: 0.905–0.968), and RF (AUC =0.920, 95% CI: 0.870–0.970). Validation of the model showed that SVM yielded the highest AUC (0.977, 95% CI: 0.952–1.000), followed by Lasso-LR (AUC =0.959, 95% CI: 0.921–0.997), XGBoost (AUC =0.958, 95% CI: 0.902–1.000), LR (AUC =0.932, 95% CI: 0.878–0.987), and RF (AUC =0.868, 95% CI: 0.779–0.958) in the testing dataset. 
Conclusions Our ML-based models had good discrimination in predicting patients with obstructed hydronephrosis at high risk of harboring pyonephrosis, and the use of these models may be greatly beneficial to urologists in treatment planning, patient selection, and decision-making.
Affiliation(s)
- Hailang Liu
- Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xinguang Wang
- Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Kun Tang
- Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Ejun Peng
- Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Ding Xia
- Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Zhiqiang Chen
- Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
|
45
|
Wani N, Raza K. MKL-GRNI: A parallel multiple kernel learning approach for supervised inference of large-scale gene regulatory networks. PeerJ Comput Sci 2021; 7:e363. [PMID: 33817013 PMCID: PMC7924726 DOI: 10.7717/peerj-cs.363] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/19/2020] [Accepted: 12/29/2020] [Indexed: 06/12/2023]
Abstract
High throughput multi-omics data generation coupled with heterogeneous genomic data fusion are defining new ways to build computational inference models. These models are scalable and can support very large genome sizes with the added advantage of exploiting additional biological knowledge from the integration framework. However, the limitation with such an arrangement is the huge computational cost involved when learning from very large datasets in a sequential execution environment. To overcome this issue, we present a multiple kernel learning (MKL) based gene regulatory network (GRN) inference approach wherein multiple heterogeneous datasets are fused using MKL paradigm. We formulate the GRN learning problem as a supervised classification problem, whereby genes regulated by a specific transcription factor are separated from other non-regulated genes. A parallel execution architecture is devised to learn a large scale GRN by decomposing the initial classification problem into a number of subproblems that run as multiple processes on a multi-processor machine. We evaluate the approach in terms of increased speedup and inference potential using genomic data from Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The results thus obtained demonstrate that the proposed method exhibits better classification accuracy and enhanced speedup compared to other state-of-the-art methods while learning large scale GRNs from multiple and heterogeneous datasets.
Affiliation(s)
- Nisar Wani
- Govt. Degree College Baramulla, Jammu & Kashmir, India
- Khalid Raza
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India
|
46
|
Kimura S, Fukutomi R, Tokuhisa M, Okada M. Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods. Front Genet 2021; 11:595912. [PMID: 33384716 PMCID: PMC7770182 DOI: 10.3389/fgene.2020.595912] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/17/2020] [Accepted: 11/23/2020] [Indexed: 11/17/2022]
Abstract
Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses outputs from the feature selection methods to adjust the confidence values of all of the candidate regulations that have been computed by the random-forest-based inference method. Numerical experiments showed that the combined application with the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. The combined application with the feature selection methods moreover makes the computational cost higher. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
Affiliation(s)
- Shuhei Kimura
- Faculty of Engineering, Tottori University, Tottori, Japan
- Ryo Fukutomi
- Graduate School of Sustainability Science, Tottori University, Tottori, Japan
- Mariko Okada
- Laboratory of Cell Systems, Institute of Protein Research, Osaka University, Osaka, Japan
|
47
|
Petralia F, Tignor N, Reva B, Koptyra M, Chowdhury S, Rykunov D, Krek A, Ma W, Zhu Y, Ji J, Calinawan A, Whiteaker JR, Colaprico A, Stathias V, Omelchenko T, Song X, Raman P, Guo Y, Brown MA, Ivey RG, Szpyt J, Guha Thakurta S, Gritsenko MA, Weitz KK, Lopez G, Kalayci S, Gümüş ZH, Yoo S, da Veiga Leprevost F, Chang HY, Krug K, Katsnelson L, Wang Y, Kennedy JJ, Voytovich UJ, Zhao L, Gaonkar KS, Ennis BM, Zhang B, Baubet V, Tauhid L, Lilly JV, Mason JL, Farrow B, Young N, Leary S, Moon J, Petyuk VA, Nazarian J, Adappa ND, Palmer JN, Lober RM, Rivero-Hinojosa S, Wang LB, Wang JM, Broberg M, Chu RK, Moore RJ, Monroe ME, Zhao R, Smith RD, Zhu J, Robles AI, Mesri M, Boja E, Hiltke T, Rodriguez H, Zhang B, Schadt EE, Mani DR, Ding L, Iavarone A, Wiznerowicz M, Schürer S, Chen XS, Heath AP, Rokita JL, Nesvizhskii AI, Fenyö D, Rodland KD, Liu T, Gygi SP, Paulovich AG, Resnick AC, Storm PB, Rood BR, Wang P. Integrated Proteogenomic Characterization across Major Histological Types of Pediatric Brain Cancer. Cell 2020; 183:1962-1985.e31. [PMID: 33242424 PMCID: PMC8143193 DOI: 10.1016/j.cell.2020.10.044] [Citation(s) in RCA: 164] [Impact Index Per Article: 41.0] [Received: 12/18/2019] [Revised: 06/19/2020] [Accepted: 10/26/2020] [Indexed: 02/06/2023]
Abstract
We report a comprehensive proteogenomics analysis, including whole-genome sequencing, RNA sequencing, and proteomics and phosphoproteomics profiling, of 218 tumors across 7 histological types of childhood brain cancer: low-grade glioma (n = 93), ependymoma (32), high-grade glioma (25), medulloblastoma (22), ganglioglioma (18), craniopharyngioma (16), and atypical teratoid rhabdoid tumor (12). Proteomics data identify common biological themes that span histological boundaries, suggesting that treatments used for one histological type may be applied effectively to other tumors sharing similar proteomics features. Immune landscape characterization reveals diverse tumor microenvironments across and within diagnoses. Proteomics data further reveal functional effects of somatic mutations and copy number variations (CNVs) not evident in transcriptomics data. Kinase-substrate association and co-expression network analysis identify important biological mechanisms of tumorigenesis. This is the first large-scale proteogenomics analysis across traditional histological boundaries to uncover foundational pediatric brain tumor biology and inform rational treatment selection.
Affiliation(s)
- Francesca Petralia
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Nicole Tignor
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Boris Reva
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Mateusz Koptyra
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Shrabanti Chowdhury
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Dmitry Rykunov
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Azra Krek
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Weiping Ma
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Yuankun Zhu
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jiayi Ji
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Anna Calinawan
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Antonio Colaprico
- Department of Public Health Science, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Vasileios Stathias
- Department of Pharmacology, Institute for Data Science and Computing, Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33146, USA
- Tatiana Omelchenko
- Cell Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Xiaoyu Song
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Pichai Raman
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Bioinformatics and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yiran Guo
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Miguel A Brown
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Richard G Ivey
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- John Szpyt
- Thermo Fisher Scientific Center for Multiplexed Proteomics, Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Sanjukta Guha Thakurta
- Thermo Fisher Scientific Center for Multiplexed Proteomics, Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Marina A Gritsenko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Karl K Weitz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Gonzalo Lopez
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Selim Kalayci
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zeynep H Gümüş
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Seungyeul Yoo
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Hui-Yin Chang
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Karsten Krug
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02412, USA
- Lizabeth Katsnelson
- Institute for Systems Genetics; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Ying Wang
- Institute for Systems Genetics; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Jacob J Kennedy
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Lei Zhao
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Krutika S Gaonkar
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Bioinformatics and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Brian M Ennis
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Bo Zhang
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Valerie Baubet
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lamiya Tauhid
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jena V Lilly
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jennifer L Mason
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Bailey Farrow
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Nathan Young
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Sarah Leary
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Cancer and Blood Disorders Center, Seattle Children's Hospital, Seattle, WA 98105, USA; Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Jamie Moon
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Vladislav A Petyuk
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Javad Nazarian
- Children's National Research Institute, George Washington University School of Medicine, Washington, DC 20010, USA; Department of Oncology, Children's Research Center, University Children's Hospital Zürich, Zürich 8032, Switzerland
- Nithin D Adappa
- Department of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA 19104, USA
- James N Palmer
- Department of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Robert M Lober
- Department of Neurosurgery, Dayton Children's Hospital, Dayton, OH 45404, USA
- Samuel Rivero-Hinojosa
- Children's National Research Institute, George Washington University School of Medicine, Washington, DC 20010, USA
- Liang-Bo Wang
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 631110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Joshua M Wang
- Institute for Systems Genetics; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Matilda Broberg
- Institute for Systems Genetics; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Rosalie K Chu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Ronald J Moore
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Matthew E Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Rui Zhao
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Richard D Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Jun Zhu
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ana I Robles
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Mehdi Mesri
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Emily Boja
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Tara Hiltke
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Eric E Schadt
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- D R Mani
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02412, USA
- Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 631110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63110, USA
- Antonio Iavarone
- Institute for Cancer Genetics, Department of Neurology, Department of Pathology and Cell Biology, Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York, NY 10032, USA
- Maciej Wiznerowicz
- Poznan University of Medical Sciences, 61-701 Poznań, Poland; International Institute for Molecular Oncology, 61-203 Poznań, Poland
- Stephan Schürer
- Department of Pharmacology, Institute for Data Science and Computing, Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33146, USA
- Xi S Chen
- Department of Public Health Science, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Allison P Heath
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jo Lynne Rokita
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Bioinformatics and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Alexey I Nesvizhskii
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- David Fenyö
- Institute for Systems Genetics; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Karin D Rodland
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA; Department of Cell, Developmental, and Cancer Biology, Oregon Health & Science University, Portland, OR 97221, USA
- Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Steven P Gygi
- Thermo Fisher Scientific Center for Multiplexed Proteomics, Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Adam C Resnick
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
- Phillip B Storm
- Center for Data-Driven Discovery in Biomedicine, Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
- Brian R Rood
- Children's National Research Institute, George Washington University School of Medicine, Washington, DC 20010, USA.
- Pei Wang
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
48
Wang Y, Li M, Ji R, Wang M, Zheng L. Comparison of Soil Total Nitrogen Content Prediction Models Based on Vis-NIR Spectroscopy. Sensors 2020; 20:s20247078. [PMID: 33321833 PMCID: PMC7763030 DOI: 10.3390/s20247078] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/29/2020] [Revised: 11/24/2020] [Accepted: 12/07/2020] [Indexed: 01/20/2023]
Abstract
Visible and near-infrared (Vis-NIR) spectroscopy is one of the most important methods for non-destructive and rapid detection of soil total nitrogen (STN) content. To find a practical way to build an STN content prediction model, three conventional machine learning methods and one deep learning approach are investigated, and their predictive performances are compared using a public dataset called LUCAS Soil (19,019 samples). The three conventional machine learning methods are ordinary least squares estimation (OLSE), random forest (RF), and extreme learning machine (ELM); for the deep learning method, three different convolutional neural network (CNN) structures incorporating the Inception module are constructed and investigated. To clarify the effectiveness of different pre-treatments for predicting STN content, the three conventional machine learning methods are combined with four pre-processing approaches (baseline correction, smoothing, dimensional reduction, and feature selection), and the resulting models are compared. The results indicate that the baseline-corrected and smoothed ELM model reaches practical precision (coefficient of determination (R2) = 0.89, root mean square error of prediction (RMSEP) = 1.60 g/kg, and residual prediction deviation (RPD) = 2.34). Among the three CNN structures, the one with more 1 × 1 convolutions performs better (R2 = 0.93, RMSEP = 0.95 g/kg, and RPD = 3.85 in the optimal case). In addition, to evaluate the influence of dataset characteristics on the model, the LUCAS dataset was divided into subsets according to dataset size, organic carbon (OC) content, and country. The results show that the deep learning method is more effective and practical than conventional machine learning methods and that, given enough data samples, it can be used to build a robust, high-accuracy STN content prediction model for the same type of soil with similar agricultural treatment.
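The three figures of merit quoted in this abstract (R2, RMSEP, and RPD) can be computed from paired reference and predicted values. The following is a minimal sketch, not the authors' code: the function name `regression_metrics` and the toy values are illustrative assumptions, and RPD is taken here as the sample standard deviation of the reference values divided by RMSEP, a common convention that the abstract does not spell out.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, RMSEP and RPD for paired reference/predicted values.

    Assumption: RPD = sample SD of the reference values / RMSEP.
    Other conventions (e.g. using the population SD) exist.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Root mean square error of prediction
    rmsep = float(np.sqrt(np.mean(residuals ** 2)))

    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    # Residual prediction deviation
    rpd = float(np.std(y_true, ddof=1)) / rmsep
    return {"r2": r2, "rmsep": rmsep, "rpd": rpd}


if __name__ == "__main__":
    # Toy values in g/kg, invented for illustration only
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    print(regression_metrics(observed, predicted))
```

Under this convention a higher RPD means the prediction error is small relative to the natural spread of the reference values, which is why the CNN result (RPD = 3.85) is reported as stronger than the ELM result (RPD = 2.34).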
Affiliation(s)
- Yueting Wang
- Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China
- Key Laboratory of Agricultural Informatization Standardization, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100083, China
- Minzan Li
- Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China
- Ronghua Ji
- Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China
- Minjuan Wang
- Key Laboratory of Agricultural Informatization Standardization, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100083, China
- Lihua Zheng
- Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China
- Key Laboratory of Agricultural Informatization Standardization, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100083, China
49
Kalayci S, Petralia F, Wang P, Gümüş ZH. ProNetView-ccRCC: A Web-Based Portal to Interactively Explore Clear Cell Renal Cell Carcinoma Proteogenomics Networks. Proteomics 2020; 20:e2000043. [PMID: 32358997 PMCID: PMC7606637 DOI: 10.1002/pmic.202000043] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/19/2020] [Revised: 04/20/2020] [Indexed: 11/11/2022]
Abstract
To better understand the molecular basis of cancer, the NCI's Clinical Proteomics Tumor Analysis Consortium (CPTAC) has been performing comprehensive large-scale proteogenomic characterizations of multiple cancer types. Gene and protein regulatory networks are subsequently being derived from these proteogenomic profiles, and serve as tools to gain a systems-level understanding of the molecular regulatory factories underlying these diseases. However, it remains a challenge to effectively visualize and navigate the resulting network models, which capture higher-order structures in the proteogenomic profiles. There is a pressing need for a new open community resource tool for intuitive visual exploration, interpretation, and communication of these gene/protein regulatory networks by the cancer research community. In this work, ProNetView-ccRCC (http://ccrcc.cptac-network-view.org/) is introduced: an interactive web-based network exploration portal for investigating the phosphopeptide co-expression network inferred from the CPTAC clear cell renal cell carcinoma (ccRCC) phosphoproteomics data. ProNetView-ccRCC enables quick, user-intuitive visual interactions with the ccRCC tumor phosphoprotein co-expression network comprised of 3614 genes, as well as 30 functional pathway-enriched network modules. Users can interact with the network portal and conveniently query for associations between the abundance of each phosphopeptide in the network and clinical variables such as tumor grade.
Affiliation(s)
- Selim Kalayci
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Francesca Petralia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Pei Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Zeynep H. Gümüş
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
50
Liu W, Sun X, Peng L, Zhou L, Lin H, Jiang Y. RWRNET: A Gene Regulatory Network Inference Algorithm Using Random Walk With Restart. Front Genet 2020; 11:591461. [PMID: 33101398 PMCID: PMC7545090 DOI: 10.3389/fgene.2020.591461] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/04/2020] [Accepted: 09/02/2020] [Indexed: 11/30/2022] Open
Abstract
Inferring gene regulatory networks from expression data is essential for identifying complex regulatory relationships among genes and revealing the mechanisms of certain diseases. Various computational methods have been developed for inferring gene regulatory networks. However, these methods focus on the local topology of the network rather than on the global topology. From a network optimisation standpoint, emphasising the global topology of the network also reduces redundant regulatory relationships. In this study, we propose a novel network inference algorithm using Random Walk with Restart (RWRNET) that combines local and global topology relationships. The method first captures the local topology through three elements of random walk and then combines the local topology with the global topology by Random Walk with Restart. The Markov Blanket discovery algorithm is then used to deal with isolated genes. The proposed method is compared with several state-of-the-art methods on six benchmark datasets. Experimental results demonstrate the effectiveness of the proposed method.
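For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of a generic Random Walk with Restart on a small graph. It is not the RWRNET algorithm: the function name, the restart probability of 0.3, and the toy path graph are illustrative assumptions, and the abstract does not describe how RWRNET builds its transition matrix or handles its "three elements of random walk".

```python
import numpy as np


def random_walk_with_restart(adjacency, seed, restart_prob=0.3,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker restarting at `seed`.

    Iterates p <- (1 - c) * W @ p + c * e, where W is the column-normalised
    adjacency matrix, c the restart probability and e the seed indicator.
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    # Isolated nodes have zero column sums; avoid division by zero.
    # (Their columns stay all-zero, so they receive and pass on no mass.)
    col_sums[col_sums == 0] = 1.0
    W = A / col_sums

    e = np.zeros(A.shape[0])
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (W @ p) + restart_prob * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    # Toy undirected path graph 0 - 1 - 2 - 3 (hypothetical example)
    path = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]])
    print(random_walk_with_restart(path, seed=0))
```

The resulting vector scores every node by its proximity to the seed through the whole graph rather than through direct edges only, which is the sense in which a restart-based walk brings in global topology.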
Affiliation(s)
- Wei Liu
- School of Computer Science, Xiangtan University, Xiangtan, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Xingen Sun
- School of Computer Science, Xiangtan University, Xiangtan, China
- Li Peng
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
- Lili Zhou
- School of Computer Science, Xiangtan University, Xiangtan, China
- Hui Lin
- School of Computer Science, Xiangtan University, Xiangtan, China
- Yi Jiang
- School of Computer Science, Xiangtan University, Xiangtan, China