1
Qin XY, Shirakami Y, Honda M, Yeh SH, Numata K, Lai YY, Li CL, Wei F, Xu Y, Imai K, Takai K, Chuma M, Komatsu N, Furutani Y, Gailhouste L, Aikata H, Chayama K, Enomoto M, Tateishi R, Kawaguchi K, Yamashita T, Kaneko S, Nagaoka K, Tanaka M, Sasaki Y, Tanaka Y, Baba H, Miura K, Ochi S, Masaki T, Kojima S, Matsuura T, Shimizu M, Chen PJ, Moriwaki H, Suzuki H. Serum MYCN as a predictive biomarker of prognosis and therapeutic response in the prevention of hepatocellular carcinoma recurrence. Int J Cancer 2024; 155:582-594. [PMID: 38380807 DOI: 10.1002/ijc.34893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Revised: 01/22/2024] [Accepted: 01/31/2024] [Indexed: 02/22/2024]
Abstract
Expression of the proto-oncogene MYCN marked a cancer stem-like cell population in hepatocellular carcinoma (HCC) and served as a therapeutic target of acyclic retinoid (ACR), an orally administered vitamin A derivative that has demonstrated promising efficacy and safety in reducing HCC recurrence. This study investigated the role of MYCN as a predictive biomarker of therapeutic response to ACR and of HCC prognosis. MYCN gene expression in HCC was analyzed in The Cancer Genome Atlas and a Taiwanese cohort (N = 118). Serum MYCN protein levels were assessed in healthy controls (N = 15), patients with HCC (N = 116), pre- and post-surgical patients with HCC (N = 20), and a subset of patients from a phase 3 clinical trial of ACR (N = 68, NCT01640808). MYCN gene expression was increased in HCC tumors and positively correlated with HCC recurrence in non-cirrhotic or single-tumor patients. Serum MYCN protein levels were higher in patients with HCC, decreased after surgical resection of HCC, and were associated with liver functional reserve and fibrosis markers, as well as with long-term HCC prognosis (>4 years). Subgroup analysis of the phase 3 clinical trial of ACR identified serum MYCN as the risk factor most strongly associated with HCC recurrence. Patients with HCC who had higher serum MYCN levels after 4 weeks of ACR treatment had a significantly higher risk of recurrence (hazard ratio 3.27; p = .022). In conclusion, serum MYCN holds promise for biomarker-based precision medicine in the prevention of HCC, the long-term prognosis of early-stage HCC, and the identification of high-response subgroups for ACR-based treatment.
Affiliation(s)
- Xian-Yang Qin
- Laboratory for Cellular Function Conversion Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Liver Cancer Prevention Research Unit, RIKEN Cluster for Pioneering Research, Saitama, Japan
- Yohei Shirakami
- Department of Gastroenterology, Graduate School of Medicine, Gifu University, Gifu, Japan
- Masao Honda
- Department of Gastroenterology, Graduate School of Medical Sciences, Kanazawa University, Kanazawa, Japan
- Shiou-Hwei Yeh
- Department of Microbiology, National Taiwan University College of Medicine, Taipei, Taiwan
- Kazushi Numata
- Gastroenterological Center, Yokohama City University Medical Center, Yokohama, Japan
- Ya-Yun Lai
- Department of Microbiology, National Taiwan University College of Medicine, Taipei, Taiwan
- Chiao-Ling Li
- Department of Microbiology, National Taiwan University College of Medicine, Taipei, Taiwan
- Feifei Wei
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Yali Xu
- Laboratory for Cellular Function Conversion Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Kenji Imai
- Department of Gastroenterology, Graduate School of Medicine, Gifu University, Gifu, Japan
- Koji Takai
- Department of Gastroenterology, Graduate School of Medicine, Gifu University, Gifu, Japan
- Makoto Chuma
- Gastroenterological Center, Yokohama City University Medical Center, Yokohama, Japan
- Nagisa Komatsu
- Gastroenterological Center, Yokohama City University Medical Center, Yokohama, Japan
- Yutaka Furutani
- Liver Cancer Prevention Research Unit, RIKEN Cluster for Pioneering Research, Saitama, Japan
- Department of Laboratory Medicine, The Jikei University School of Medicine, Tokyo, Japan
- Luc Gailhouste
- Liver Cancer Prevention Research Unit, RIKEN Cluster for Pioneering Research, Saitama, Japan
- Laboratory for Brain Development and Disorders, RIKEN Center for Brain Science, Saitama, Japan
- Kazuaki Chayama
- Collaborative Research Laboratory of Medical Innovation, Hiroshima University, Hiroshima, Japan
- Hiroshima Institute of Life Sciences, Hiroshima, Japan
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Masaru Enomoto
- Department of Hepatology, Osaka City University Graduate School of Medicine, Osaka, Japan
- Ryosuke Tateishi
- Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kazunori Kawaguchi
- Department of Gastroenterology, Graduate School of Medical Sciences, Kanazawa University, Kanazawa, Japan
- Tatsuya Yamashita
- Department of Gastroenterology, Graduate School of Medical Sciences, Kanazawa University, Kanazawa, Japan
- Shuichi Kaneko
- Department of Gastroenterology, Graduate School of Medical Sciences, Kanazawa University, Kanazawa, Japan
- Katsuya Nagaoka
- Department of Gastroenterology and Hepatology, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
- Motohiko Tanaka
- Department of Gastroenterology and Hepatology, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
- Public Health and Welfare Bureau, City of Kumamoto, Kumamoto, Japan
- Yutaka Sasaki
- Department of Gastroenterology and Hepatology, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
- Department of Gastroenterology, Osaka Central Hospital, Osaka, Japan
- Yasuhito Tanaka
- Department of Gastroenterology and Hepatology, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
- Hideo Baba
- Department of Gastroenterological Surgery, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
- Kouichi Miura
- Division of Gastroenterology, Jichi Medical University School of Medicine, Tochigi, Japan
- Sae Ochi
- Department of Laboratory Medicine, The Jikei University School of Medicine, Tokyo, Japan
- Takahiro Masaki
- Department of Laboratory Medicine, The Jikei University School of Medicine, Tokyo, Japan
- Soichi Kojima
- Liver Cancer Prevention Research Unit, RIKEN Cluster for Pioneering Research, Saitama, Japan
- Tomokazu Matsuura
- Liver Cancer Prevention Research Unit, RIKEN Cluster for Pioneering Research, Saitama, Japan
- Department of Laboratory Medicine, The Jikei University School of Medicine, Tokyo, Japan
- Masahito Shimizu
- Department of Gastroenterology, Graduate School of Medicine, Gifu University, Gifu, Japan
- Pei-Jer Chen
- Graduate Institute of Clinical Medicine, Department of Internal Medicine, National Taiwan University College of Medicine and Hospital, Taipei, Taiwan
- Hisataka Moriwaki
- Department of Gastroenterology, Graduate School of Medicine, Gifu University, Gifu, Japan
- Harukazu Suzuki
- Laboratory for Cellular Function Conversion Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
2
Gómez-Pascual A, Rocamora-Pérez G, Ibanez L, Botía JA. Targeted co-expression networks for the study of traits. Sci Rep 2024; 14:16675. [PMID: 39030261 PMCID: PMC11271532 DOI: 10.1038/s41598-024-67329-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 07/10/2024] [Indexed: 07/21/2024] Open
Abstract
Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used approach for generating gene co-expression networks. However, networks generated with this tool usually contain large modules whose many functional annotations are hard to decipher. We have developed TGCN, a new method to create Targeted Gene Co-expression Networks. This method identifies the transcripts that best predict the trait of interest from gene expression using a refinement of LASSO regression, and then builds the co-expression modules around those transcripts. Algorithm properties were characterized using expression data from 13 brain regions in the Genotype-Tissue Expression project. Compared with WGCNA, TGCN produces more precise modules with more specific yet rich biological meaning. We then illustrate its applicability by creating an APP-TGCN on The Religious Orders Study and Memory and Aging Project dataset, aiming to identify the molecular pathways specifically associated with the role of APP in Alzheimer's disease. The main biological findings were further validated in two independent cohorts. In conclusion, we provide a new framework for creating targeted networks that are smaller, biologically relevant and useful in high-throughput, hypothesis-driven research. The TGCN R package is available on GitHub: https://github.com/aliciagp/TGCN.
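The two-stage idea described above (select the transcripts that best predict a trait with a LASSO-type model, then grow a co-expression module around each selected transcript) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the TGCN package: it uses ordinary LASSO fitted by coordinate descent instead of the authors' refinement, and it defines module membership by a fixed absolute Pearson correlation cutoff. The function names, `lam=0.1` and `cutoff=0.7` are hypothetical choices.

```python
import math
from typing import Dict, List


def standardize(x: List[float]) -> List[float]:
    """Center to mean 0 and scale to unit population variance (a constant vector maps to zeros)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in x]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation as the mean product of standardized values."""
    return sum(x * y for x, y in zip(standardize(a), standardize(b))) / len(a)


def soft_threshold(z: float, lam: float) -> float:
    """LASSO shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_cd(X: Dict[str, List[float]], y: List[float], lam: float,
             n_iter: int = 200) -> Dict[str, float]:
    """Ordinary LASSO fitted by cyclic coordinate descent on standardized predictors.

    Minimizes (1/2n) * ||y_c - X_std @ b||^2 + lam * ||b||_1, where y_c is the centered trait.
    """
    n = len(y)
    cols = {g: standardize(v) for g, v in X.items()}
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # y_c - X_std @ b, with b = 0 initially
    beta = {g: 0.0 for g in cols}
    for _ in range(n_iter):
        for g, xg in cols.items():
            # Correlation of x_g with the partial residual; for a non-constant
            # standardized column, (1/n) * ||x_g||^2 == 1, so no rescaling is needed.
            rho = sum(xi * (ri + xi * beta[g]) for xi, ri in zip(xg, resid)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[g]
            if delta != 0.0:
                resid = [ri - delta * xi for xi, ri in zip(xg, resid)]
                beta[g] = new
    return beta


def targeted_modules(X: Dict[str, List[float]], y: List[float],
                     lam: float = 0.1, cutoff: float = 0.7) -> Dict[str, List[str]]:
    """Seeds are genes with nonzero LASSO coefficients; each module holds the genes
    whose absolute Pearson correlation with its seed is at least `cutoff`."""
    beta = lasso_cd(X, y, lam)
    seeds = [g for g, b in beta.items() if b != 0.0]
    return {
        s: sorted(g for g in X if g != s and abs(pearson(X[s], X[g])) >= cutoff)
        for s in seeds
    }
```

For example, with trait `y = [1, 2, 3, 4, 5, 6, 7, 8]`, a gene `A` identical to `y`, a gene `B = [1, 3, 2, 4, 6, 5, 7, 8]` (r of about 0.95 with `A`) and a gene `C = [1, -1, -1, 1, 1, -1, -1, 1]` orthogonal to both, the sketch keeps only `A` as a seed and returns `{"A": ["B"]}`. Ordinary LASSO tends to keep one transcript out of a group of highly correlated ones, so `B` enters the module through co-expression rather than through the regression.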
Affiliation(s)
- A Gómez-Pascual
- Communications Engineering and Information Department, University of Murcia, 30100, Murcia, Spain
- G Rocamora-Pérez
- Department of Genetics and Genomic Medicine Research and Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
- L Ibanez
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, 63110, USA
- Department of Neurology, Washington University School of Medicine, Saint Louis, MO, 63110, USA
- J A Botía
- Communications Engineering and Information Department, University of Murcia, 30100, Murcia, Spain.
3
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023] Open
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships these methods infer are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors). The method removes the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli shows that EIEPCF achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used EIEPCF to reconstruct a cold-resistance-specific GRN from cold-resistance gene expression data in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
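Eliminating a confounder's indirect effect by comparing residuals can be illustrated with the classic partial-correlation construction: regress both the regulator and the target on the confounder, then correlate the residuals. This sketch assumes a single observed confounder, simple linear regression and Pearson correlation as the residual similarity; EIEPCF's actual similarity measure and its handling of several confounders may differ, and the function names are hypothetical.

```python
import math
from typing import List


def residuals(y: List[float], z: List[float]) -> List[float]:
    """Residuals of the least-squares fit y ~ a + b * z (one confounder z)."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    szz = sum((v - mz) ** 2 for v in z)
    slope = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz if szz > 0 else 0.0
    return [(yi - my) - slope * (zi - mz) for zi, yi in zip(z, y)]


def corr(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def residual_similarity(regulator: List[float], target: List[float],
                        confounder: List[float]) -> float:
    """Correlation between regulator and target after regressing each on the
    confounder, i.e. the first-order partial correlation."""
    return corr(residuals(regulator, confounder), residuals(target, confounder))
```

If a confounder `z` drives both a regulator `x = 2z + noise1` and a target `t = 3z + noise2` with independent noise, the raw correlation between `x` and `t` can be high while their residual similarity is close to zero, so the apparent regulator-target link is attributed to `z` rather than kept as a direct edge.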
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
4
Niu Y, Luo J, Zong C. Single-cell total-RNA profiling unveils regulatory hubs of transcription factors. Nat Commun 2024; 15:5941. [PMID: 39009595 PMCID: PMC11251146 DOI: 10.1038/s41467-024-50291-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 07/03/2024] [Indexed: 07/17/2024] Open
Abstract
Recently developed RNA velocity methods use master equations to describe the kinetics of the RNA life cycle, from unspliced RNA to spliced (i.e., mature) RNA to degradation. To feed this kinetic analysis, simultaneous measurement of unspliced and spliced RNA in single cells is highly desirable. However, most single-cell RNA-seq chemistries primarily capture mature RNA species to measure gene expression. Here, we develop a one-step, total-RNA chemistry-based single-cell RNA-seq method: snapTotal-seq. We benchmark this method against multiple single-cell RNA-seq assays on their performance in the kinetic analysis of the cell cycle by RNA velocity. Next, using LASSO regression between transcription factors, we identify the critical regulatory hubs mediating cell cycle dynamics. We also apply snapTotal-seq to profile oncogene-induced senescence and identify the key regulatory hubs governing entry into senescence. Furthermore, from a comparative analysis of unspliced and spliced RNA, we identify a significant portion of genes whose expression changes occur in spliced RNA but not to the same degree in unspliced RNA, indicating that these changes are mainly controlled by post-transcriptional regulation. Overall, we demonstrate that snapTotal-seq can provide enriched information about gene regulation, especially during transitions between cell states.
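The master-equation view above corresponds to the standard kinetic model du/dt = α - βu and ds/dt = βu - γs for unspliced (u) and spliced (s) RNA. A common way to turn paired u and s measurements into RNA velocity is the steady-state model popularized by velocyto: fix β = 1, estimate γ as the slope of u against s, and take v = u - γs for each cell. The sketch below is a minimal one-gene version of that model, assuming a least-squares fit through the origin over all cells (velocyto restricts the fit to extreme quantiles); it is not the snapTotal-seq analysis pipeline, and the function names are hypothetical.

```python
from typing import List, Tuple


def fit_gamma(u: List[float], s: List[float]) -> float:
    """Least-squares slope through the origin for u ~ gamma * s (beta fixed to 1)."""
    sxx = sum(x * x for x in s)
    return sum(a * b for a, b in zip(u, s)) / sxx if sxx > 0 else 0.0


def steady_state_velocity(u: List[float], s: List[float]) -> Tuple[float, List[float]]:
    """Return gamma and the per-cell velocity v = u - gamma * s for one gene.

    v > 0: more unspliced RNA than the steady state predicts (gene being induced);
    v < 0: less unspliced RNA than expected (gene being repressed).
    """
    gamma = fit_gamma(u, s)
    return gamma, [a - gamma * b for a, b in zip(u, s)]
```

For four cells with spliced counts `[1, 2, 3, 4]` and unspliced counts `[0.5, 1.0, 1.5, 3.0]`, the fitted γ is 19/30 and only the last cell, whose unspliced level jumps ahead of its spliced level, gets a positive velocity.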
Affiliation(s)
- Yichi Niu
- Department of Molecular and Human Genetics, Houston, TX, USA
- Genetics & Genomics Program, Houston, TX, USA
- Jiayi Luo
- Department of Molecular and Human Genetics, Houston, TX, USA
- Cancer and Cell Biology Program, Houston, TX, USA
- Chenghang Zong
- Department of Molecular and Human Genetics, Houston, TX, USA.
- Genetics & Genomics Program, Houston, TX, USA.
- Cancer and Cell Biology Program, Houston, TX, USA.
- Integrative Molecular and Biomedical Sciences Program, Houston, TX, USA.
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, USA.
- McNair Medical Institute, Baylor College of Medicine, Houston, TX, USA.
5
Khemka N, Morris G, Kazemzadeh L, Costard LS, Neubert V, Bauer S, Rosenow F, Venø MT, Kjems J, Henshall DC, Prehn JHM, Connolly NMC. Integrative network analysis of miRNA-mRNA expression profiles during epileptogenesis in rats reveals therapeutic targets after emergence of first spontaneous seizure. Sci Rep 2024; 14:15313. [PMID: 38961125 PMCID: PMC11222454 DOI: 10.1038/s41598-024-66117-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 06/27/2024] [Indexed: 07/05/2024] Open
Abstract
Epileptogenesis is the process by which a normal brain becomes hyperexcitable and capable of generating spontaneous recurrent seizures. The extensive dysregulation of gene expression associated with epileptogenesis is shaped, in part, by microRNAs (miRNAs) - short, non-coding RNAs that negatively regulate protein levels. Functional miRNA-mediated regulation can, however, be difficult to elucidate due to the complexity of miRNA-mRNA interactions. Here, we integrated miRNA and mRNA expression profiles sampled over multiple time-points during and after epileptogenesis in rats, and applied bi-clustering and Bayesian modelling to construct temporal miRNA-mRNA-mRNA interaction networks. Network analysis and enrichment of network inference with sequence- and human disease-specific information identified key regulatory miRNAs with the strongest influence on the mRNA landscape, and miRNA-mRNA interactions closely associated with epileptogenesis and subsequent epilepsy. Our findings underscore the complexity of miRNA-mRNA regulation, can be used to prioritise miRNA targets in specific systems, and offer insights into key regulatory processes in epileptogenesis with therapeutic potential for further investigation.
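As a much simpler illustration than the bi-clustering and Bayesian modelling used in the study, candidate miRNA-mRNA interactions are often first nominated by strong negative correlation across time points, optionally restricted to sequence-based target predictions. This sketch shows only that filtering step; the threshold `r_max=-0.8`, the optional `targets` mapping and all function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; 0.0 if either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def anticorrelated_pairs(mirna: Dict[str, List[float]],
                         mrna: Dict[str, List[float]],
                         r_max: float = -0.8,
                         targets: Optional[Dict[str, Set[str]]] = None
                         ) -> List[Tuple[str, str, float]]:
    """Candidate (miRNA, mRNA, r) triples whose time-course profiles have r <= r_max.

    If `targets` is given (e.g. sequence-based target predictions), only the listed
    mRNAs are tested for each miRNA. Results are sorted from the most negative r.
    """
    pairs = []
    for m, m_prof in mirna.items():
        allowed = targets.get(m, set()) if targets is not None else mrna.keys()
        for g in sorted(allowed):
            if g not in mrna:
                continue
            r = pearson(m_prof, mrna[g])
            if r <= r_max:
                pairs.append((m, g, r))
    return sorted(pairs, key=lambda p: (p[2], p[0], p[1]))
```

Anticorrelation alone cannot separate direct miRNA repression from shared upstream regulation, which is one motivation for the network-level modelling described in the abstract.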
Affiliation(s)
- Niraj Khemka
- Centre for Systems Medicine & Dept. of Physiology & Medical Physics, RCSI University of Medicine and Health Sciences, Dublin, Ireland
- Gareth Morris
- FutureNeuro SFI Research Centre, RCSI University of Medicine and Health Sciences, Dublin, Ireland
- Neuroscience, Physiology and Pharmacology, University College London, London, UK
- Division of Neuroscience, University of Manchester, Manchester, UK
- Laleh Kazemzadeh
- Centre for Systems Medicine & Dept. of Physiology & Medical Physics, RCSI University of Medicine and Health Sciences, Dublin, Ireland
- Lara S Costard
- Epilepsy Center, Department of Neurology, Philipps University Marburg, Marburg, Germany
- Epilepsy Center Frankfurt Rhine-Main, Neurocenter, University Hospital Frankfurt and Center for Personalized Translational Epilepsy Research, Goethe-University, Frankfurt, Germany
- Valentin Neubert
- Epilepsy Center, Department of Neurology, Philipps University Marburg, Marburg, Germany
- Epilepsy Center Frankfurt Rhine-Main, Neurocenter, University Hospital Frankfurt and Center for Personalized Translational Epilepsy Research, Goethe-University, Frankfurt, Germany
- Sebastian Bauer
- Epilepsy Center, Department of Neurology, Philipps University Marburg, Marburg, Germany
- Epilepsy Center Frankfurt Rhine-Main, Neurocenter, University Hospital Frankfurt and Center for Personalized Translational Epilepsy Research, Goethe-University, Frankfurt, Germany
- Felix Rosenow
- Epilepsy Center, Department of Neurology, Philipps University Marburg, Marburg, Germany
- Epilepsy Center Frankfurt Rhine-Main, Neurocenter, University Hospital Frankfurt and Center for Personalized Translational Epilepsy Research, Goethe-University, Frankfurt, Germany
- Morten T Venø
- Interdisciplinary Nanoscience Center, Dept. of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Omiics ApS, Aarhus, Denmark
- Jørgen Kjems
- Interdisciplinary Nanoscience Center, Dept. of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- David C Henshall
- Centre for Systems Medicine & Dept. of Physiology & Medical Physics, RCSI University of Medicine and Health Sciences, Dublin, Ireland
- FutureNeuro SFI Research Centre, RCSI University of Medicine and Health Sciences, Dublin, Ireland
- Jochen H M Prehn
- Centre for Systems Medicine & Dept. of Physiology & Medical Physics, RCSI University of Medicine and Health Sciences, Dublin, Ireland.
- FutureNeuro SFI Research Centre, RCSI University of Medicine and Health Sciences, Dublin, Ireland.
- Niamh M C Connolly
- Centre for Systems Medicine & Dept. of Physiology & Medical Physics, RCSI University of Medicine and Health Sciences, Dublin, Ireland.
- FutureNeuro SFI Research Centre, RCSI University of Medicine and Health Sciences, Dublin, Ireland.
6
Chee FT, Harun S, Mohd Daud K, Sulaiman S, Nor Muhammad NA. Exploring gene regulation and biological processes in insects: Insights from omics data using gene regulatory network models. Prog Biophys Mol Biol 2024; 189:1-12. [PMID: 38604435 DOI: 10.1016/j.pbiomolbio.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 04/13/2024]
Abstract
A gene regulatory network (GRN) comprises complicated, intertwined gene-regulator relationships. Understanding GRN dynamics will help unravel the complexity behind observed gene expression. Insect gene regulation is often complicated because of insects' complex life cycles and diverse ecological adaptations. This review provides an update on current mathematical methods for modelling GRNs in insect science. Several popular GRN architecture models are discussed, together with examples of their applications in insect science. In the last part of the review, the models are compared in terms of network scalability, computational complexity, robustness to noise and biological relevance.
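Boolean networks are one of the classic GRN modelling formalisms: each gene is either on or off, and its next state is a logical function of its regulators. The abstract does not list which architectures the review covers, so the sketch below is a generic illustration rather than a model taken from it. It applies synchronous updates and follows a trajectory until a state repeats, returning the attractor; the three-gene rules described after it are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[bool, ...]
Rule = Callable[[Dict[str, bool]], bool]


def step(genes: List[str], rules: Dict[str, Rule], state: State) -> State:
    """Synchronous update: every gene reads the same current state."""
    current = dict(zip(genes, state))
    return tuple(bool(rules[g](current)) for g in genes)


def find_attractor(genes: List[str], rules: Dict[str, Rule], start: State) -> List[State]:
    """Iterate from `start` until a state repeats and return the cycle reached.

    The state space is finite (2 ** len(genes)), so the loop always terminates;
    a fixed point is returned as a cycle of length 1.
    """
    first_seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(genes, rules, state)
    return trajectory[first_seen[state]:]
```

With the rules A' = NOT C, B' = A, C' = B (a three-gene negative feedback loop), starting from (True, False, False) the network cycles through six states before returning to the start; replacing A' = NOT C with A' = A instead settles into the fixed point (True, True, True).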
Affiliation(s)
- Fong Ting Chee
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sarahani Harun
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Kauthar Mohd Daud
- Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
- Suhaila Sulaiman
- FGV R&D Sdn Bhd, FGV Innovation Center, PT23417 Lengkuk Teknologi, Bandar Baru Enstek, 71760 Nilai, Negeri Sembilan, Malaysia
- Nor Azlan Nor Muhammad
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia.
7
Sun L, Zhang A, Liang F. Time-varying dynamic Bayesian network learning for an fMRI study of emotion processing. Stat Med 2024; 43:2713-2733. [PMID: 38690642 PMCID: PMC11195441 DOI: 10.1002/sim.10096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 04/01/2024] [Accepted: 04/19/2024] [Indexed: 05/02/2024]
Abstract
This article presents a novel method for learning time-varying dynamic Bayesian networks. The proposed method breaks down the dynamic Bayesian network learning problem into a sequence of regression inference problems and tackles each one using the Markov neighborhood regression technique. Notably, the method scales with data dimensionality, accommodates time-varying network structure, and naturally handles multi-subject data. The proposed method is consistent and, as supported by extensive numerical experiments, outperforms existing methods in estimation accuracy and computational efficiency. To showcase its effectiveness, we apply it to an fMRI study investigating the effective connectivity among regions of interest (ROIs) during an emotion-processing task. Our findings reveal the pivotal role of the subcortical-cerebellum in emotion processing.
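The decomposition into per-node regressions can be sketched as follows: within each time window, regress every node at time t on all nodes at time t - 1 and keep edges with large coefficients, so the learned edge set can change from window to window. This is a simplified stand-in under stated assumptions: it uses ordinary least squares with a hard threshold on non-overlapping windows, not Markov neighborhood regression, and it handles a single subject only. The window size, threshold and function names are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lagged_ols(series: Matrix, target: int) -> List[float]:
    """OLS without intercept of node `target` at time t on all nodes at time t - 1.

    series[t][k] is the value of node k at time t.
    """
    p = len(series[0])
    X, y = series[:-1], [row[target] for row in series[1:]]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def windowed_networks(series: Matrix, window: int,
                      threshold: float = 0.5) -> List[Set[Tuple[int, int]]]:
    """One edge set per non-overlapping window; edge (k, j) means node k at t - 1
    predicts node j at t with |coefficient| > threshold."""
    p = len(series[0])
    nets = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        edges: Set[Tuple[int, int]] = set()
        for j in range(p):  # one regression problem per node per window
            coef = lagged_ols(chunk, j)
            edges |= {(k, j) for k in range(p) if abs(coef[k]) > threshold}
        nets.append(edges)
    return nets
```

For the eight-step series [[1, 2], [2, 1], [1, 2], [2, 1], [1, 1], [1, 2], [1, 3], [1, 4]] with `window=4`, the sketch returns the edge sets {(1, 0), (0, 1)} and {(0, 0), (0, 1), (1, 1)}: the two nodes drive each other in the first window, while in the second node 0 persists and node 1 accumulates node 0.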
Affiliation(s)
- Lizhe Sun
- Beijing International Center for Mathematical Research, Peking University and Department of Statistics, Purdue University
- Faming Liang
- Department of Statistics, Purdue University, West Lafayette, IN 47907
8
Unger Avila P, Padvitski T, Leote AC, Chen H, Saez-Rodriguez J, Kann M, Beyer A. Gene regulatory networks in disease and ageing. Nat Rev Nephrol 2024:10.1038/s41581-024-00849-7. [PMID: 38867109 DOI: 10.1038/s41581-024-00849-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/15/2024] [Indexed: 06/14/2024]
Abstract
The precise control of gene expression is required for the maintenance of cellular homeostasis and proper cellular function, and the declining control of gene expression with age is considered a major contributor to age-associated changes in cellular physiology and disease. The coordination of gene expression can be represented through models of the molecular interactions that govern gene expression levels, so-called gene regulatory networks. Gene regulatory networks can represent interactions that occur through signal transduction, those that involve regulatory transcription factors, or statistical models of gene-gene relationships based on the premise that certain sets of genes tend to be coexpressed across a range of conditions and cell types. Advances in experimental and computational technologies have enabled the inference of these networks on an unprecedented scale and at unprecedented precision. Here, we delineate different types of gene regulatory networks and their cell-biological interpretation. We describe methods for inferring such networks from large-scale, multi-omics datasets and present applications that have aided our understanding of cellular ageing and disease mechanisms.
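The statistical, co-expression type of network described above is often built by thresholding pairwise correlations into an adjacency matrix and reading modules off the resulting graph. A minimal sketch, assuming Pearson correlation, an unsigned cutoff on |r| and connected components as modules; these are hypothetical choices, and many methods, such as WGCNA, use soft thresholding and hierarchical clustering instead.

```python
import math
from typing import Dict, List, Set


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; 0.0 if either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def coexpression_modules(expr: Dict[str, List[float]],
                         cutoff: float = 0.8) -> List[List[str]]:
    """Link genes with |r| >= cutoff, then return connected components as modules."""
    genes = list(expr)
    adj: Dict[str, Set[str]] = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= cutoff:
                adj[g].add(h)
                adj[h].add(g)
    seen: Set[str] = set()
    modules = []
    for g in genes:
        if g in seen:
            continue
        seen.add(g)
        stack, comp = [g], []
        while stack:  # depth-first traversal of one connected component
            cur = stack.pop()
            comp.append(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(comp))
    return modules
```

Because the cutoff is applied to |r|, strongly anticorrelated genes land in the same module as positively correlated ones; a signed network would instead keep only positive correlations.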
Affiliation(s)
- Paula Unger Avila
- Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Tsimafei Padvitski
- Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Ana Carolina Leote
- Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- He Chen
- Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Julio Saez-Rodriguez
- Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Martin Kann
- Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Andreas Beyer
- Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany.
- Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.
- Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany.
9
Roohani Y, Huang K, Leskovec J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat Biotechnol 2024; 42:927-935. [PMID: 37592036 PMCID: PMC11180609 DOI: 10.1038/s41587-023-01905-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/09/2022] [Accepted: 07/12/2023] [Indexed: 08/19/2023]
Abstract
Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. However, the combinatorial explosion in the number of possible multigene perturbations severely limits experimental interrogation. Here, we present graph-enhanced gene activation and repression simulator (GEARS), a method that integrates deep learning with a knowledge graph of gene-gene relationships to predict transcriptional responses to both single and multigene perturbations using single-cell RNA-sequencing data from perturbational screens. GEARS is able to predict outcomes of perturbing combinations consisting of genes that were never experimentally perturbed. GEARS exhibited 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen and identified the strongest interactions twice as well as prior approaches. Overall, GEARS can predict phenotypically distinct effects of multigene perturbations and thus guide the design of perturbational experiments.
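Genetic interactions in combinatorial screens are commonly scored against an additive expectation: the expected effect of perturbing A and B together is the sum of their single-perturbation effects relative to control, and the interaction is the deviation from that sum. The sketch below computes this baseline per gene; it is not the GEARS model or its scoring of interaction subtypes, and it assumes expression values on a scale where effects add (for example, log-expression). The function names are hypothetical.

```python
import math
from typing import List


def additive_deviation(ctrl: List[float], a: List[float], b: List[float],
                       ab: List[float]) -> List[float]:
    """Per-gene deviation of the double perturbation from the additive expectation.

    deviation = (ab - ctrl) - [(a - ctrl) + (b - ctrl)], with all effects
    measured relative to the unperturbed control.
    """
    return [(xab - c) - ((xa - c) + (xb - c))
            for c, xa, xb, xab in zip(ctrl, a, b, ab)]


def interaction_magnitude(dev: List[float]) -> float:
    """Euclidean norm of the deviation vector; 0 means a purely additive combination."""
    return math.sqrt(sum(d * d for d in dev))
```

A large magnitude flags a combination whose transcriptional outcome is not explained by its single perturbations; assigning a direction or subtype to the interaction would need further rules not shown here.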
Affiliation(s)
- Yusuf Roohani
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Kexin Huang
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA.
10
Wu Z, Sinha S. SPREd: a simulation-supervised neural network tool for gene regulatory network reconstruction. Bioinform Adv 2024; 4:vbae011. [PMID: 38444538 PMCID: PMC10913396 DOI: 10.1093/bioadv/vbae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 11/08/2023] [Accepted: 01/18/2024] [Indexed: 03/07/2024]
Abstract
Summary: Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd," is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g. correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show that SPREd outperforms the state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA, and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold-standard benchmarks of GRN reconstruction and show that it performs significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step toward incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
Availability and implementation: Data and code are available from https://github.com/iiiime/SPREd.
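The input representation described above (expression relationships between the target gene and each TF, and between pairs of TFs) can be sketched with two standard measures, Pearson correlation and a histogram-based mutual information estimate. This is an assumption-laden illustration: the equal-width binning with `bins=3`, the feature layout and the function names are hypothetical, and SPREd's exact feature set and neural network are not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; 0.0 if either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def discretize(x: List[float], bins: int) -> List[int]:
    """Equal-width binning into 0 .. bins - 1 (a constant vector maps to bin 0)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in x]


def mutual_info(a: List[float], b: List[float], bins: int = 3) -> float:
    """Plug-in mutual information (in nats) of the binned profiles."""
    da, db = discretize(a, bins), discretize(b, bins)
    n = len(a)
    pa, pb, pab = Counter(da), Counter(db), Counter(zip(da, db))
    return sum((c / n) * math.log((c / n) / ((pa[i] / n) * (pb[j] / n)))
               for (i, j), c in pab.items())


def pair_features(target: List[float],
                  tfs: Dict[str, List[float]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """(correlation, mutual information) for target-TF pairs and for TF-TF pairs."""
    names = sorted(tfs)
    feats = {("target", t): (pearson(target, tfs[t]), mutual_info(target, tfs[t]))
             for t in names}
    for i, t1 in enumerate(names):
        for t2 in names[i + 1:]:
            feats[(t1, t2)] = (pearson(tfs[t1], tfs[t2]), mutual_info(tfs[t1], tfs[t2]))
    return feats
```

The TF-TF features matter because, as the abstract notes, high co-expression among TFs is what makes it hard to tell which correlated TF is the actual regulator.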
Affiliation(s)
- Zijun Wu
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- Saurabh Sinha
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
  - H. Milton Steward School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
11
Hou L, Geng Z, Yuan Z, Shi X, Wang C, Chen F, Li H, Xue F. MRSL: a causal network pruning algorithm based on GWAS summary data. Brief Bioinform 2024; 25:bbae086. [PMID: 38487847 PMCID: PMC10940843 DOI: 10.1093/bib/bbae086] [Received: 07/29/2023] [Revised: 02/01/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
Causal discovery is a powerful tool for uncovering underlying structures by analyzing purely observational data. Genetic variants can provide useful complementary information for structure learning. Recently, Mendelian randomization (MR) studies have provided abundant marginal causal relationships between traits. Here, we propose MRSL (MR-based structure learning algorithm), a causal network pruning algorithm based on these marginal causal relationships. MRSL combines graph theory with multivariable MR to learn the conditional causal structure using only genome-wide association study (GWAS) summary statistics. Specifically, MRSL uses topological sorting to improve the precision of structure learning. It introduces MR-separation in place of d-separation and proposes three candidate sufficient separating sets for MR-separation. In simulations, MRSL achieved up to a 2-fold higher F1 score and 100 times faster computation than eight other competing methods. Furthermore, we applied MRSL to 26 biomarkers and 44 International Classification of Diseases 10 (ICD10)-defined diseases using GWAS summary data from UK Biobank. The results cover most of the expected causal links that have biological interpretations, as well as several new links supported by clinical case reports or previous observational studies.
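The abstract says MRSL uses topological sorting but does not give its procedure. As a generic illustration only, not MRSL's implementation, a topological order of a directed acyclic graph of marginal causal relationships can be computed with Kahn's algorithm. In order-based structure learning generally, only traits earlier in the order are considered as candidate causes of later ones. The trait names and edges below are hypothetical.

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm over (cause, effect) pairs; raises ValueError on a cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
    # Start from traits with no incoming marginal causal edge (sorted for determinism)
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; no topological order exists")
    return order

# Hypothetical traits T1..T4 with marginal causal edges
print(topological_order(["T4", "T2", "T1", "T3"],
                        [("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T3", "T4")]))
```

Raising on a cycle is a deliberate choice: marginal MR estimates can contain bidirectional pairs, and those would have to be resolved before any order exists.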
Affiliation(s)
- Lei Hou
  - Beijing International Center for Mathematical Research, Peking University, Beijing, People’s Republic of China, 100871
- Zhi Geng
  - School of Mathematics and Statistics, Beijing Technology and Business University, Beijing, People’s Republic of China, 100048
- Zhongshang Yuan
  - Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Xu Shi
  - Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Chuan Wang
  - Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Feng Chen
  - School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Hongkai Li
  - Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Fuzhong Xue
  - Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
  - Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
12
Chaturvedi A, Som A. Inference of Dynamic Growth Regulatory Network in Cancer Using High-Throughput Transcriptomic Data. Methods Mol Biol 2024; 2719:51-77. [PMID: 37803112 DOI: 10.1007/978-1-0716-3461-5_4] [Indexed: 10/08/2023]
Abstract
Growth is regulated by gene expression variation at different developmental stages of biological processes such as cell differentiation, disease progression, or drug response. In cancer, a stage-specific regulatory model constructed to infer the dynamic expression changes in genes contributing to tissue growth or proliferation is referred to as a dynamic growth regulatory network (dGRN). Over the past decade, gene expression data have been widely used to reconstruct dGRNs by computing correlations between differentially expressed genes (DEGs). A wide variety of pipelines are available for constructing GRNs from DEGs, and the choice of a particular method or tool depends on the nature of the study. In this protocol, we outline a step-by-step guide for the analysis of DEGs using RNA-Seq data, from data acquisition, pre-processing, and mapping to a reference genome through construction of a correlation-based co-expression network and further downstream analysis. We also outline the steps for incorporating publicly available interaction/regulation information into the dGRN, followed by relevant topological inferences. This tutorial is designed to give early-career researchers an accessible and comprehensive overview of the methodologies used to infer dGRNs from transcriptomics data.
Affiliation(s)
- Aparna Chaturvedi
  - Centre of Bioinformatics, Institute of Interdisciplinary Studies, University of Allahabad, Prayagraj, India
- Anup Som
  - Centre of Bioinformatics, Institute of Interdisciplinary Studies, University of Allahabad, Prayagraj, India
13
Bernaola N, Michiels M, Larrañaga P, Bielza C. Learning massive interpretable gene regulatory networks of the human brain by merging Bayesian networks. PLoS Comput Biol 2023; 19:e1011443. [PMID: 38039337 PMCID: PMC10745139 DOI: 10.1371/journal.pcbi.1011443] [Received: 09/09/2022] [Revised: 12/22/2023] [Accepted: 08/19/2023] [Indexed: 12/03/2023]
Abstract
We present the Fast Greedy Equivalence Search (FGES)-Merge, a new method for learning the structure of gene regulatory networks via merging locally learned Bayesian networks, based on the fast greedy equivalence search algorithm. The method is competitive with the state of the art in terms of the Matthews correlation coefficient, which takes into account both precision and recall, while also improving upon it in terms of speed, scaling up to tens of thousands of variables and being able to use empirical knowledge about the topological structure of gene regulatory networks. To showcase the ability of our method to scale to massive networks, we apply it to learning the gene regulatory network for the full human genome using data from samples of different brain structures (from the Allen Human Brain Atlas). Furthermore, this Bayesian network model should predict interactions between genes in a way that is clear to experts, following the current trends in explainable artificial intelligence. To achieve this, we also present a new open-access visualization tool that facilitates the exploration of massive networks and can aid in finding nodes of interest for experimental tests.
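The Matthews correlation coefficient used to evaluate FGES-Merge is computed from confusion-matrix counts: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a minimal illustration under two stated assumptions: edges are compared as directed pairs with self-loops excluded, and MCC is set to 0 when it is undefined. The paper's exact edge-matching convention (for example, comparing skeletons or equivalence classes) may differ.

```python
from math import sqrt

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when MCC is undefined
    return (tp * tn - fp * fn) / denom

def edge_counts(true_edges, predicted_edges, n_nodes):
    """Confusion counts over all ordered node pairs (directed edges, no self-loops)."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    tp = len(true_set & pred_set)
    fp = len(pred_set - true_set)
    fn = len(true_set - pred_set)
    tn = n_nodes * (n_nodes - 1) - tp - fp - fn
    return tp, tn, fp, fn

# Hypothetical 3-gene example: one correct edge, one spurious, one missed
counts = edge_counts({(0, 1), (1, 2)}, {(0, 1), (2, 0)}, n_nodes=3)
print(counts, matthews_corrcoef(*counts))  # (1, 3, 1, 1) 0.25
```

Because TN dominates in sparse gene networks, MCC is more informative than accuracy here: predicting no edges at all yields high accuracy but an MCC of 0.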
Affiliation(s)
- Niko Bernaola
  - Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
- Mario Michiels
  - Centro Integral de Neurociencias Abarca Campal, Hospital Universitario HM Puerta del Sur, Madrid, Spain
- Pedro Larrañaga
  - Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
- Concha Bielza
  - Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
14
Kudo H, Han N, Yokoyama D, Matsumoto T, Chien MF, Kikuchi J, Inoue C. Bayesian network highlights the contributing factors for efficient arsenic phytoextraction by Pteris vittata in a contaminated field. Sci Total Environ 2023; 899:165654. [PMID: 37478955 DOI: 10.1016/j.scitotenv.2023.165654] [Received: 05/12/2023] [Revised: 07/17/2023] [Accepted: 07/17/2023] [Indexed: 07/23/2023]
Abstract
Phytoextraction is a low-cost and eco-friendly method for removing pollutants, such as arsenic (As), from contaminated soil. Pteris vittata is one of the most studied As hyperaccumulators for soil remediation. Although microbe-assisted phytoextraction has been considered a promising soil remediation method, microbes have not yet been successfully harnessed for it because the interactions between microbes and plants are complex and difficult to understand. This problem can possibly be addressed with a multi-omics approach using a Bayesian network. However, few studies have used Bayesian networks to analyze plant-microbe interactions. Therefore, to understand this complex interaction and to facilitate efficient As phytoextraction using microbial inoculants, we conducted field cultivation experiments at two sites with different total As contents (62 and 8.9 mg/kg). Metabolome and microbiome data were obtained from rhizosphere soil samples using nuclear magnetic resonance and high-throughput sequencing, respectively, and a Bayesian network was applied to the resulting multi-omics data. At the highly As-contaminated site, inoculation with Pseudomonas sp. strain m307, an arsenite-oxidizing microbe carrying multiple copies of the arsenite oxidase gene, increased the As concentration in P. vittata shoots to 157.5 mg/kg, 1.5-fold higher than under the other treatments. The Bayesian network showed that strain m307 contributed to As accumulation in P. vittata. Furthermore, the network showed that microbes belonging to the order MND1 positively contributed to As accumulation in P. vittata. Based on the ecological characteristics of MND1, it was suggested that the rhizosphere of P. vittata inoculated with strain m307 was under low-nitrogen conditions. Strain m307 may have induced low-nitrogen conditions via arsenite oxidation accompanied by nitrate reduction, potentially resulting in microbial iron reduction or the prevention of microbial iron oxidation. These conditions may have enhanced the bioavailability of arsenate, leading to increased As accumulation in P. vittata.
Affiliation(s)
- Hiroshi Kudo
  - Graduate School of Environmental Studies, Tohoku University, 6-6-20 Aoba, Aramaki, Aoba-ku, Sendai, Miyagi 980-8579, Japan
- Ning Han
  - Graduate School of Environmental Studies, Tohoku University, 6-6-20 Aoba, Aramaki, Aoba-ku, Sendai, Miyagi 980-8579, Japan
- Daiki Yokoyama
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Tomoko Matsumoto
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Mei-Fang Chien
  - Graduate School of Environmental Studies, Tohoku University, 6-6-20 Aoba, Aramaki, Aoba-ku, Sendai, Miyagi 980-8579, Japan
- Jun Kikuchi
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
- Chihiro Inoue
  - Graduate School of Environmental Studies, Tohoku University, 6-6-20 Aoba, Aramaki, Aoba-ku, Sendai, Miyagi 980-8579, Japan
15
Wu Z, Sinha S. SPREd: A simulation-supervised neural network tool for gene regulatory network reconstruction. bioRxiv 2023:2023.11.09.566399. [PMID: 38014297 PMCID: PMC10680606 DOI: 10.1101/2023.11.09.566399] [Indexed: 11/29/2023]
Abstract
Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd," is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g., correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold-standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step towards incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
Affiliation(s)
- Zijun Wu
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Saurabh Sinha
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
  - H. Milton Steward School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30318, USA
16
Jiang Z, Chen C, Xu Z, Wang X, Zhang M, Zhang D. SIGNET: transcriptome-wide causal inference for gene regulatory networks. Sci Rep 2023; 13:19371. [PMID: 37938594 PMCID: PMC10632394 DOI: 10.1038/s41598-023-46295-6] [Received: 07/18/2023] [Accepted: 10/30/2023] [Indexed: 11/09/2023]
Abstract
Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships, but it constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm in a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).
Affiliation(s)
- Zhongli Jiang
  - Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Zhenyu Xu
  - Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Min Zhang
  - Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Epidemiology and Biostatistics, University of California, Irvine, CA, 92617, USA
- Dabao Zhang
  - Department of Epidemiology and Biostatistics, University of California, Irvine, CA, 92617, USA
17
Alipourfard B, Gao J. From correlation to causation using directed topological overlap matrix: Applications in genomics. Methods 2023; 219:58-67. [PMID: 37743033 DOI: 10.1016/j.ymeth.2023.09.005] [Received: 03/30/2023] [Revised: 07/23/2023] [Accepted: 09/06/2023] [Indexed: 09/26/2023]
Abstract
Most causal discovery tools assume the local causal Markov condition. However, the theoretical assumptions that underlie the local causal Markov condition are often not met in practice. This is especially true in genomics, where measurement errors, averaging effects, and feedback loops significantly undermine the validity of the local causal Markov condition. Furthermore, these causal discovery algorithms require very large samples, often orders of magnitude larger than what is available. In this paper, relaxing the local causal Markov condition and using Reichenbach's common cause principle instead, we present a more flexible approach to causal discovery, the directed topological overlap matrix (DTOM). DTOM is robust to measurement errors, averaging effects, and feedback loops, and is significantly more sample efficient. We study the utility of DTOM for discovering causal relations in biological data using three real gene expression datasets. We first examine whether DTOM can help identify the Myostatin mutation in Piedmontese cattle by contrasting the muscle transcriptomes of Piedmontese and Wagyu crosses; the Myostatin mutation causes the double-muscling for which Piedmontese cattle are known. We then consider a large-scale gene deletion study in yeast and show that DTOM allows us to identify the deleted gene in a sample knowing only the set of differentially expressed genes in that sample. Finally, we examine the progression of Alzheimer's disease (AD) through the lens of DTOM. The genes implicated by our DTOM analysis as having a causal role in the progression of AD were significantly enriched in cellular components that have repeatedly been implicated in the progression of AD.
Affiliation(s)
- Jean Gao
  - University of Texas at Arlington, 701 W Nedderman Dr, Arlington, TX 76013, USA
18
Anwar MA, Arshed N, Tiwari AK. Nexus between biomass energy, economic growth, and ecological footprints: empirical investigation from belt and road initiative economies. Environ Sci Pollut Res Int 2023; 30:115527-115542. [PMID: 37884709 DOI: 10.1007/s11356-023-30481-0] [Received: 02/23/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Several emerging economies, including those participating in the Belt and Road Initiative (BRI), are having difficulty attaining the sustainable development goals. The efficient use of biomass energy sources plays an essential role in attaining these goals, especially among developing economies. This study empirically investigates the association among ecological footprints, biomass energy demand, and per capita income for 30 BRI economies from 1995 to 2021. The study uses cointegration and panel quantile regression (PQR) to identify the relationships among these variables. The empirical outcomes indicate a significant negative relationship between biomass energy demand and ecological footprints, especially among economies with high ecological footprints. Moreover, the findings confirm a significant negative relationship between per capita income and ecological footprints, while the square of per capita income shows a significant positive association with ecological footprints. These estimates confirm the environmental Kuznets curve (EKC) hypothesis for per capita income and ecological footprints. The findings of the current study help determine the optimum level of modern biomass energy consumption, which helps attain economic growth without compromising ecological sustainability.
Affiliation(s)
- Muhammad Awais Anwar
  - Department of Economics, Division of Management and Administrative Science, University of Education, Lahore, Pakistan
- Noman Arshed
  - Department of Economics, Division of Management and Administrative Science, University of Education, Lahore, Pakistan
19
Khatun R, Akter M, Islam MM, Uddin MA, Talukder MA, Kamruzzaman J, Azad AKM, Paul BK, Almoyad MAA, Aryal S, Moni MA. Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data. Genes (Basel) 2023; 14:1802. [PMID: 37761941 PMCID: PMC10530870 DOI: 10.3390/genes14091802] [Received: 08/15/2023] [Revised: 09/10/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods.
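The two ensemble ideas in this abstract, rank aggregation across feature selectors and weighted average voting across classifiers, can be sketched generically. This is a minimal illustration, not the authors' EFSM or VT code: mean-rank aggregation is one common way to combine rankings, and the rankings, per-model class probabilities (standing in for SVM, k-NN, and decision tree outputs), and weights below are all hypothetical.

```python
def aggregate_ranks(rankings):
    """Mean-rank aggregation; each ranking lists the same features, best first."""
    features = set(rankings[0])  # assumes every ranking covers the same feature set
    totals = {f: 0.0 for f in features}
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            totals[feature] += position
    # Lower mean rank = more consistently important; ties broken by name for determinism
    return sorted(features, key=lambda f: (totals[f] / len(rankings), f))

def weighted_soft_vote(prob_rows, weights):
    """Weighted average of per-model class-probability vectors.

    Returns (index of the winning class, averaged probabilities).
    """
    n_classes = len(prob_rows[0])
    total_w = sum(weights)
    avg = [sum(w * probs[c] for probs, w in zip(prob_rows, weights)) / total_w
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Hypothetical rankings from three feature-selection methods
print(aggregate_ranks([["gene_a", "gene_b", "gene_c"],
                       ["gene_b", "gene_a", "gene_c"],
                       ["gene_b", "gene_c", "gene_a"]]))  # ['gene_b', 'gene_a', 'gene_c']

# Hypothetical class probabilities for one sample from three classifiers
label, avg = weighted_soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]], [0.5, 0.3, 0.2])
print(label, [round(p, 2) for p in avg])  # 1 [0.43, 0.57]
```

Averaging probabilities rather than counting hard votes lets a confident minority model outweigh two uncertain ones, which is the usual motivation for weighted soft voting.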
Affiliation(s)
- Rabea Khatun
  - Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka 1207, Bangladesh
- Maksuda Akter
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Manowarul Islam
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Ashraf Uddin
  - School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Md. Alamin Talukder
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Joarder Kamruzzaman
  - Centre for Smart Analytics, Federation University Australia, Ballarat, VIC 3842, Australia
- AKM Azad
  - Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11564, Saudi Arabia
- Bikash Kumar Paul
  - Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
  - Department of Software Engineering, Daffodil International University (DIU), Dhaka 1342, Bangladesh
- Muhammad Ali Abdulllah Almoyad
  - Department of Basic Medical Sciences, College of Applied Medical Sciences in Khamis Mushyt King Khalid University, Abha 61412, Saudi Arabia
- Sunil Aryal
  - School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Mohammad Ali Moni
  - Artificial Intelligence & Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
20
Cao X, Zhang L, Islam MK, Zhao M, He C, Zhang K, Liu S, Sha Q, Wei H. TGPred: efficient methods for predicting target genes of a transcription factor by integrating statistics, machine learning and optimization. NAR Genom Bioinform 2023; 5:lqad083. [PMID: 37711605 PMCID: PMC10498345 DOI: 10.1093/nargab/lqad083] [Received: 11/16/2022] [Revised: 05/30/2023] [Accepted: 08/30/2023] [Indexed: 09/16/2023]
Abstract
Four statistical selection methods for inferring transcription factor (TF)-target gene (TG) pairs were developed by coupling a mean squared error (MSE) or Huber loss function with an elastic net (ENET) or least absolute shrinkage and selection operator (Lasso) penalty. Two methods were also developed for inferring pathway gene regulatory networks (GRNs) by combining the Huber or MSE loss function with a network (Net)-based penalty. To solve these regressions, we improved an accelerated proximal gradient descent (APGD) algorithm to optimize the parameter selection process, resulting in an algorithm that is as effective as, but much faster than, the commonly used convex optimization solver. Synthetic data generated in a general setting were used to test the four TF-TG identification methods; ENET-based methods performed better than Lasso-based methods. Synthetic data generated from two network settings were used to test Huber-Net and MSE-Net, which outperformed all other methods. The TF-TG identification methods were also tested with SND1 and gl3 overexpression transcriptomic data; Huber-ENET and MSE-ENET outperformed all other methods when genome-wide predictions were performed. The TF-TG identification methods fill the gap left by the lack of a method for genome-wide prediction of a TF's TGs and have potential for validating ChIP/DAP-seq results, while the two Net-based methods are instrumental for predicting pathway GRNs.
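To make the MSE-ENET setup concrete, here is a minimal sketch of a generic accelerated proximal gradient method (FISTA) for elastic-net regression of a gene's expression on TF expression. It is not TGPred's modified APGD, whose changes are not described in the abstract. The objective scaling, the parameters `lam` and `alpha`, and the Frobenius-norm step-size bound are assumptions. A Huber variant would replace only the MSE gradient line.

```python
from math import sqrt

def soft_threshold(v, thr):
    """Proximal operator of thr * |v| (the Lasso part of the penalty)."""
    if v > thr:
        return v - thr
    if v < -thr:
        return v + thr
    return 0.0

def elastic_net_fista(X, y, lam=0.1, alpha=0.5, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2) with FISTA.

    X: n rows, each a list of p TF expression values; y: expression of one gene.
    """
    n, p = len(X), len(X[0])
    # Step size from an upper bound on the smooth part's Lipschitz constant
    # (squared Frobenius norm / n >= largest eigenvalue of X^T X / n).
    lipschitz = sum(v * v for row in X for v in row) / n + lam * (1 - alpha)
    step = 1.0 / lipschitz
    b = [0.0] * p
    z = b[:]  # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        resid = [yi - sum(xij * zj for xij, zj in zip(row, z)) for row, yi in zip(X, y)]
        # Gradient of the MSE term plus the ridge part of the elastic-net penalty
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n + lam * (1 - alpha) * z[j]
                for j in range(p)]
        b_new = [soft_threshold(z[j] - step * grad[j], step * lam * alpha) for j in range(p)]
        t_new = (1 + sqrt(1 + 4 * t * t)) / 2
        z = [bn + ((t - 1) / t_new) * (bn - bo) for bn, bo in zip(b_new, b)]
        b, t = b_new, t_new
    return b

# Hypothetical data: 4 samples, 3 orthogonal TF profiles; the gene tracks TF0 only
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2, -2, 2, -2]
print([round(c, 3) for c in elastic_net_fista(X, y)])  # TF1 and TF2 coefficients are exactly 0
```

TFs with nonzero coefficients would be reported as candidate regulators. The exact zeros come from the soft-thresholding step, which is what lets ENET-style penalties act as a selection method rather than only a shrinkage method.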
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Ling Zhang
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Md Khairul Islam
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Mingxia Zhao
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Cheng He
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Kui Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Sanzhen Liu
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Hairong Wei
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
21
Federico A, Kern J, Varelas X, Monti S. Structure Learning for Gene Regulatory Networks. PLoS Comput Biol 2023; 19:e1011118. [PMID: 37200395 DOI: 10.1371/journal.pcbi.1011118] [Received: 02/14/2022] [Revised: 05/31/2023] [Accepted: 04/20/2023] [Indexed: 05/20/2023]
Abstract
Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of the high-throughput "omics" data typically available. To overcome this challenge, often referred to as the "small n, large p problem," we exploit known organizing principles of biological networks, which are sparse, modular, and likely to share a large portion of their underlying architecture. We present SHINE (Structure Learning for Hierarchical Networks), a framework for defining data-driven structural constraints and incorporating a shared learning paradigm to efficiently learn multiple Markov networks from high-dimensional data at large p/n ratios not previously feasible. We evaluated SHINE on Pan-Cancer data comprising 23 tumor types and found that the learned tumor-specific networks exhibit the expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in the literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival, as well as potential therapeutic targets for modulating known breast cancer disease genes.
Affiliation(s)
- Anthony Federico
  - Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Joseph Kern
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Xaralabos Varelas
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Stefano Monti
  - Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
22
Saint-Antoine M, Singh A. Benchmarking Gene Regulatory Network Inference Methods on Simulated and Experimental Data. bioRxiv 2023:2023.05.12.540581. [PMID: 37215029 PMCID: PMC10197678 DOI: 10.1101/2023.05.12.540581] [Indexed: 05/24/2023]
Abstract
Although gene regulatory network inference has been studied for more than a decade, it is still unclear how well network inference methods work when applied to real data. Attempts to benchmark these methods on experimental data have yielded mixed results: sometimes even the best methods fail to outperform random guessing, and in other cases they perform reasonably well. One of the most valuable contributions one can currently make to the field is therefore to benchmark methods on experimental data for which the true underlying network is already known, and to report the results so that we can get a clearer picture of their efficacy. In this paper, we report results from what is, to our knowledge, the first benchmarking of network inference methods on single-cell E. coli transcriptomic data. We report a moderate level of accuracy for the methods: better than random chance but still far from perfect. We also find that some methods that were strong and accurate on microarray and bulk RNA-seq data did not perform as well on the single-cell data. Additionally, we benchmark a simple network inference method (Pearson correlation) on data generated through computer simulations in order to draw conclusions about general best practices in network inference studies. We predict that network inference would be more accurate using proteomic rather than transcriptomic data, which could become relevant if high-throughput proteomic experimental methods are developed in the future. We also show through simulations that a simplified model of gene expression that skips the mRNA step tends to substantially overestimate the accuracy of network inference methods, and we advise against using this model in future in silico benchmarking studies.
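The abstract uses Pearson correlation as a simple baseline for inferring a network that is then compared with a known reference network. A minimal sketch of that kind of evaluation, assuming each gene pair is scored by the absolute correlation of its expression values across samples and predictions are checked by precision among the top-k edges; the function names, the toy data and the precision-at-k metric are illustrative assumptions, not details taken from the paper:

```python
import math
from itertools import combinations

# Illustrative sketch only: names and the precision@k metric are hypothetical.


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def rank_edges(expr):
    """Score every undirected gene pair by |r| and sort strongest first.

    expr maps gene name -> list of expression values (one per sample or cell).
    """
    scored = [
        (abs(pearson(expr[g1], expr[g2])), frozenset((g1, g2)))
        for g1, g2 in combinations(sorted(expr), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [edge for _, edge in scored]


def precision_at_k(ranked_edges, true_edges, k):
    """Fraction of the top-k predicted edges that appear in the known network."""
    return sum(1 for edge in ranked_edges[:k] if edge in true_edges) / k
```

For example, with `expr = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 1, 4, 2, 3]}`, the pair A-B (r = 1) is ranked first, so `precision_at_k(rank_edges(expr), {frozenset({"A", "B"})}, 1)` is 1.0. Correlation only ranks candidate edges; it cannot separate direct from indirect regulation, which is one reason such baselines land between random guessing and the true network.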
Affiliation(s)
- Michael Saint-Antoine
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE USA 19716
- Abhyudai Singh
- Department of Electrical and Computer Engineering, Biomedical Engineering, Mathematical Sciences, Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE USA 19716
23
Wang Q, Guo M, Chen J, Duan R. A gene regulatory network inference model based on pseudo-siamese network. BMC Bioinformatics 2023; 24:163. [PMID: 37085776 PMCID: PMC10122305 DOI: 10.1186/s12859-023-05253-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 03/24/2023] [Indexed: 04/23/2023]
Abstract
MOTIVATION Gene regulatory networks (GRNs) arise from the intricate interactions between transcription factors (TFs) and their target genes during the growth and development of organisms. Inferring GRNs can unveil the underlying gene interactions in living systems and facilitate investigation of the relationship between gene expression patterns and phenotypic traits. Although several machine-learning models have been proposed for inferring GRNs from single-cell RNA sequencing (scRNA-seq) data, some of them, such as Boolean and tree-based networks, are sensitive to noise and may struggle with the high noise and dimensionality of real scRNA-seq data, as well as with the sparse nature of gene regulatory relationships. Inferring large-scale GRNs therefore remains a formidable challenge. RESULTS This study proposes a multilevel, multi-structure framework called pseudo-Siamese GRN (PSGRN) for inferring large-scale GRNs from time-series expression datasets. Within the pseudo-Siamese network, we applied a gated recurrent unit to capture the temporal features of each TF and target matrix, and then applied the DenseNet framework to learn the spatial features of the merged matrices. Finally, we applied a sigmoid function to evaluate interactions. We constructed two maize sub-datasets, including gene expression levels and GRNs, from existing open-source maize multi-omics data and compared PSGRN with other GRN inference methods, including GENIE3, GRNBoost2, nonlinear ordinary differential equations, CNNC, and DGRNS. Our results show that PSGRN outperforms these state-of-the-art methods. PSGRN allows GRNs to be inferred from scRNA-seq data while capturing the temporal and spatial features of TFs and their target genes. The results show the model's robustness and generalization, laying a theoretical foundation for maize genotype-phenotype associations with implications for breeding work.
Affiliation(s)
- Qian Wang
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Jian Chen
- College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
- Ran Duan
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
24
Pandey AK, Loscalzo J. Network medicine: an approach to complex kidney disease phenotypes. Nat Rev Nephrol 2023:10.1038/s41581-023-00705-0. [PMID: 37041415 DOI: 10.1038/s41581-023-00705-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 03/13/2023] [Indexed: 04/13/2023]
Abstract
Scientific reductionism has been the basis of disease classification and understanding for more than a century. However, the reductionist approach of characterizing diseases from a limited set of clinical observations and laboratory evaluations has proven insufficient in the face of exponential growth in data generated from transcriptomics, proteomics, metabolomics and deep phenotyping. A new systematic method is needed to organize these datasets and to build new definitions of disease that incorporate both biological and environmental factors, so as to describe more precisely the ever-growing complexity of phenotypes and their underlying molecular determinants. Network medicine provides such a conceptual framework, bridging these vast quantities of data while providing an individualized understanding of disease. The modern application of network medicine principles is yielding new insights into the pathobiology of chronic kidney diseases and renovascular disorders by expanding the understanding of pathogenic mediators, novel biomarkers and new options for renal therapeutics. These efforts affirm network medicine as a robust paradigm for advancing the diagnosis and treatment of kidney disorders.
Affiliation(s)
- Arvind K Pandey
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA
- Joseph Loscalzo
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA
25
Furxhi I, Bengalli R, Motta G, Mantecca P, Kose O, Carriere M, Haq EU, O’Mahony C, Blosi M, Gardini D, Costa A. Data-Driven Quantitative Intrinsic Hazard Criteria for Nanoproduct Development in a Safe-by-Design Paradigm: A Case Study of Silver Nanoforms. ACS Applied Nano Materials 2023; 6:3948-3962. [PMID: 36938492 PMCID: PMC10012170 DOI: 10.1021/acsanm.3c00173] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/11/2023] [Accepted: 01/20/2023] [Indexed: 06/18/2023]
Abstract
Current European Union (EU) policies, namely the Green Deal, envisage safe and sustainable practices for chemicals, including nanoforms (NFs), at the earliest stages of innovation. A theoretical safe and sustainable by design (SSbD) framework has been established through EU collaborative efforts toward defining quantitative criteria in each SSbD dimension, namely the human and environmental safety dimension and the environmental, social, and economic sustainability dimensions. In this study, we target the safety dimension and demonstrate the path toward quantitative intrinsic hazard criteria derived from findable, accessible, interoperable, and reusable data. Data were curated and merged to develop new approach methodologies, namely quantitative structure-activity relationship models based on regression and classification machine learning algorithms, with the aim of predicting a hazard class. The models use system-dependent (i.e., hydrodynamic size and polydispersity index) and non-system-dependent (i.e., elemental composition and core size) nanoscale features in combination with biological in vitro attributes and experimental conditions for various silver NFs, functional antimicrobial textiles, and cosmetics applications. In a second step, interpretable rules (criteria), each accompanied by a certainty factor, were obtained by exploiting a Bayesian network structure crafted by expert reasoning. The probabilistic model shows a predictive capability of ≈78% (average accuracy across all hazard classes). In this work, we show how we shifted from the conceptualization of the SSbD framework toward realistic implementation with pragmatic instances. This study reveals (i) quantitative intrinsic hazard criteria to be considered in the safety aspects during the synthesis stage, (ii) the challenges involved, and (iii) future directions for generating and distilling such criteria to feed SSbD paradigms.
Specifically, the criteria can guide material engineers to synthesize NFs that are inherently safer than alternative nanoformulations at the earliest stages of innovation, while the models enable fast and cost-efficient in silico toxicological screening of previously synthesized NFs and of hypothetical, yet-to-be-synthesized NFs.
Affiliation(s)
- Irini Furxhi
- Transgero Ltd, Limerick V42V384, Ireland
- Department of Accounting and Finance, Kemmy Business School, University of Limerick, Limerick V94T9PX, Ireland
- Rossella Bengalli
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, Milano 20126, Italy
- Giulia Motta
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, Milano 20126, Italy
- Paride Mantecca
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, Milano 20126, Italy
- Ozge Kose
- Univ. Grenoble Alpes, CEA, CNRS, Grenoble INP, IRIG, SYMMES, Grenoble 38000, France
- Marie Carriere
- Univ. Grenoble Alpes, CEA, CNRS, Grenoble INP, IRIG, SYMMES, Grenoble 38000, France
- Ehtsham Ul Haq
- Department of Physics, and Bernal Institute, University of Limerick, Limerick V94TC9PX, Ireland
- Charlie O’Mahony
- Department of Physics, and Bernal Institute, University of Limerick, Limerick V94TC9PX, Ireland
- Magda Blosi
- Istituto di Scienza e Tecnologia dei Materiali Ceramici (CNR-ISTEC), Via Granarolo, 64, Faenza 48018, Ravenna, Italy
- Davide Gardini
- Istituto di Scienza e Tecnologia dei Materiali Ceramici (CNR-ISTEC), Via Granarolo, 64, Faenza 48018, Ravenna, Italy
- Anna Costa
- Istituto di Scienza e Tecnologia dei Materiali Ceramici (CNR-ISTEC), Via Granarolo, 64, Faenza 48018, Ravenna, Italy
26
Constrained expectation-maximisation for inference of social graphs explaining online user–user interactions. Social Network Analysis and Mining 2023. [DOI: 10.1007/s13278-023-01037-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
27
Zhao H, Datta S, Duan ZH. An Integrated Approach of Learning Genetic Networks From Genome-Wide Gene Expression Data Using Gaussian Graphical Model and Monte Carlo Method. Bioinform Biol Insights 2023; 17:11779322231152972. [PMID: 36865982 PMCID: PMC9972065 DOI: 10.1177/11779322231152972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Accepted: 01/02/2023] [Indexed: 03/02/2023]
Abstract
Global genetic networks provide additional information for the analysis of human diseases, beyond traditional analyses that focus on single genes or local networks. The Gaussian graphical model (GGM) is widely applied to learn genetic networks because it defines an undirected graph encoding the conditional dependence between genes. Many GGM-based algorithms have been proposed for learning genetic network structures. Because the number of gene variables is typically far larger than the number of samples collected, and real genetic networks are typically sparse, the graphical lasso implementation of the GGM has become a popular tool for inferring the conditional interdependence among genes. However, although graphical lasso performs well on low-dimensional data sets, it is computationally expensive and can be inefficient or even impossible to apply directly to genome-wide gene expression data sets. In this study, the Monte Carlo Gaussian graphical model (MCGGM) method was proposed to learn global genetic networks. This method uses a Monte Carlo approach to sample subnetworks from genome-wide gene expression data and graphical lasso to learn the structures of the subnetworks. The learned subnetworks are then integrated to approximate a global genetic network. The proposed method was first evaluated on a relatively small real data set of RNA-seq expression levels. The results indicate that the method has a strong ability to decode interactions with high conditional dependence among genes. The method was then applied to genome-wide RNA-seq expression data sets. Among the gene interactions with high interdependence in the estimated global networks, most have been reported in the literature as playing important roles in different human cancers. These results also validate the ability and reliability of the proposed method to identify high conditional dependences among genes in large-scale data sets.
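The MCGGM workflow described above has three steps: sample gene subsets at random, learn a GGM subnetwork on each subset, and integrate the subnetworks into one global network. A minimal, dependency-free sketch of that outline follows, under loud assumptions: each subnetwork is estimated with plain (unpenalized) partial correlations from the inverted sample covariance, not with the graphical lasso the authors use, and integration is assumed to be a simple average of |partial correlation| over the draws in which a gene pair appears. All function names and the averaging rule are hypothetical.

```python
import math
import random
from itertools import combinations

# Sketch of Monte Carlo subnetwork sampling + integration.
# Assumption: unpenalized partial correlations stand in for graphical lasso.


def covariance(rows):
    """Sample covariance matrix of the columns of rows (n samples x k genes)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (m must be non-singular)."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[k:] for row in a]


def partial_correlations(rows):
    """Partial correlations from the precision matrix P: -P_ij / sqrt(P_ii P_jj)."""
    p = invert(covariance(rows))
    return {
        (i, j): -p[i][j] / math.sqrt(p[i][i] * p[j][j])
        for i, j in combinations(range(len(p)), 2)
    }


def monte_carlo_network(data, subnet_size, n_draws, seed=0):
    """Average |partial correlation| per gene pair over random gene subsets.

    data is a list of samples, each a list of gene values.
    Returns {(gene_i, gene_j): score} with gene_i < gene_j.
    """
    rng = random.Random(seed)
    n_genes = len(data[0])
    totals, counts = {}, {}
    for _ in range(n_draws):
        genes = sorted(rng.sample(range(n_genes), subnet_size))
        sub = [[row[g] for g in genes] for row in data]
        for (i, j), rho in partial_correlations(sub).items():
            edge = (genes[i], genes[j])
            totals[edge] = totals.get(edge, 0.0) + abs(rho)
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: totals[edge] / counts[edge] for edge in totals}
```

Because each subnetwork is small, the covariance inversion only needs more samples than `subnet_size`, which is the point of sampling subnetworks when genes far outnumber samples. Swapping in a penalized estimator (for example scikit-learn's `GraphicalLasso`) at the `partial_correlations` step would bring the sketch closer to the published method.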
Affiliation(s)
- Haitao Zhao
- Department of Mathematics and Computer Science, The University of North Carolina at Pembroke, Pembroke, NC, USA
- Sujay Datta
- Department of Statistics, The University of Akron, Akron, OH, USA
- Zhong-Hui Duan
- Department of Computer Science, The University of Akron, Akron, OH, USA
28
Valentim CA, Rabi JA, David SA. Cellular-automaton model for tumor growth dynamics: Virtualization of different scenarios. Comput Biol Med 2023; 153:106481. [PMID: 36587567 DOI: 10.1016/j.compbiomed.2022.106481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 12/08/2022] [Accepted: 12/25/2022] [Indexed: 12/29/2022]
Abstract
Mathematical oncology has emerged as a research field that applies either continuous or discrete models to describe cancer-related phenomena mathematically. Such models are usually expressed as differential equations; however, tumor composition involves specific cellular structure and can be probabilistic in nature, often requiring tailor-made approaches. In this context, cell-based models allow individual parameters, which may vary in both time and space, to be monitored independently. Building on existing tumor growth models in the literature, this study introduces cellular-automaton simulation strategies that admit heterogeneous cell populations while capturing both single-cell and cell-cluster behaviors. In this agent-based computational model, each tumor cell is restricted to four possible courses of action: proliferation, migration, apoptosis, or quiescence. Despite the apparent simplicity of these actions, the model can represent different complex tumor features depending on parameter settings. This study virtualized five different scenarios, showcasing the model's ability to represent tumor dynamics including alternating dormancy periods, cell death instability, and cluster formation. Implementation techniques are also explored, together with prospective expansion of the model toward deterministic features. The proposed stochastic cellular automaton model effectively simulates different tumor growth scenarios and is a promising tool for in silico modeling, with potential for expansion to support research in mathematical oncology and thereby improve diagnostic tools and/or personalized treatment.
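To make the four-action rule concrete, here is a minimal stochastic cellular automaton in which every cell on a square lattice chooses, once per sweep, between proliferation, migration, apoptosis and quiescence. This is an illustrative sketch, not the authors' model: the lattice size, the von Neumann neighborhood, the random-order asynchronous update and the action probabilities are all assumptions.

```python
import random

# Hypothetical action labels for the four behaviors named in the abstract.
PROLIFERATE, MIGRATE, DIE, QUIESCE = "proliferate", "migrate", "apoptosis", "quiescence"


def empty_neighbors(cell, occupied, size):
    """Von Neumann neighbors of cell that lie on the grid and are unoccupied."""
    x, y = cell
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [
        (i, j) for i, j in candidates
        if 0 <= i < size and 0 <= j < size and (i, j) not in occupied
    ]


def step(occupied, size, probs, rng):
    """Advance the automaton one sweep, visiting occupied sites in random order.

    occupied is a set of (x, y) sites holding a tumor cell (modified in place).
    probs maps each action to its relative weight.
    """
    sites = list(occupied)
    rng.shuffle(sites)
    for site in sites:
        if site not in occupied:
            continue  # site was vacated earlier in this sweep
        action = rng.choices(list(probs), weights=list(probs.values()))[0]
        free = empty_neighbors(site, occupied, size)
        if action == DIE:
            occupied.remove(site)
        elif action == PROLIFERATE and free:
            occupied.add(rng.choice(free))  # daughter cell in a free neighbor
        elif action == MIGRATE and free:
            occupied.remove(site)
            occupied.add(rng.choice(free))
        # QUIESCE, or no free neighbor: the cell stays where it is
    return occupied
```

For instance, starting from `grid = {(5, 5)}` on an 11 x 11 lattice with all weight on proliferation, one call to `step` yields two cells and a second call yields four. Mixing the weights (say, mostly quiescence with occasional bursts of proliferation) is the kind of parameter setting through which such a model can produce dormancy periods or clusters; the sketch keeps only the lattice occupancy, so heterogeneous cell states would need a per-cell record instead of a set.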
Affiliation(s)
- Carlos A Valentim
- Department of Biosystems Engineering, University of São Paulo, Pirassununga, Brazil
- José A Rabi
- Department of Biosystems Engineering, University of São Paulo, Pirassununga, Brazil
- Sergio A David
- Department of Biosystems Engineering, University of São Paulo, Pirassununga, Brazil
29
Alsharaiah MA, Samarasinghe S, Kulasiri D. Proteins as fuzzy controllers: Auto tuning a biological fuzzy inference system to predict protein dynamics in complex biological networks. Biosystems 2023; 224:104826. [PMID: 36610587 DOI: 10.1016/j.biosystems.2023.104826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 11/30/2022] [Accepted: 01/02/2023] [Indexed: 01/06/2023]
Abstract
Biological systems such as the mammalian cell cycle are complex systems consisting of a large number of molecular species that interact in ways that produce complex nonlinear system dynamics. Discrete models such as Boolean models and continuous models such as ordinary differential equations (ODEs) have been widely used to study these systems. Boolean models are simple and can capture qualitative system behaviour, but they cannot capture the continuous trends of protein concentrations, whereas ODE models capture continuous trends but require kinetic parameters, whose availability is limited. Further, as systems grow larger, model complexity becomes an issue for parameterization, analysis and interpretation. Also, molecular systems operate under uncertainty and noise, and our understanding of molecular processes is generally qualitative, characterised by vagueness, imprecision and ambiguity. Hence, as more data are generated, there is a greater need for simpler data-driven methods that can approximate continuous system behaviour and represent vagueness and ambiguity without requiring kinetic parameters. Fuzzy inference is one such promising method, with the ability to work with vague or imprecise qualitative biological knowledge. In this study, we propose a fuzzy inference system for representing the continuous behaviour of proteins and apply it to some key proteins in the mammalian cell cycle system. The method introduced here is novel for protein interaction systems and cell cycle proteins. We propose a three-stage approach to developing fuzzy protein controllers. In stage one, the protein system is studied for interactions: we studied some significant core controllers of the mammalian cell cycle, together with their producers and degraders, as presented in a published ODE model.
Based on observations from a dataset generated from that model, in the second stage we developed fuzzy inference systems (FIS), which involved deriving and processing fuzzy IF-THEN rules, and manually tuned the FIS to predict the dynamics of individual proteins. In stage three, we employed particle swarm optimisation (PSO) to optimise the FIS and further enhance prediction accuracy. System dynamics simulation results of the optimised FIS models were in close agreement with the benchmark ODE model results. The results show that the FIS models closely approximate the comprehensive benchmark model, robustly representing continuous protein dynamics while expressing the control of protein behaviour in an intuitive and transparent format without requiring kinetic parameters. Therefore, FIS models can be an alternative to ODEs in network modelling. Further, FIS models can be assembled to develop large complex systems without losing information or accuracy.
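The second stage above turns qualitative knowledge into fuzzy IF-THEN rules. A minimal sketch of how such a rule base produces a continuous output, assuming two normalized inputs (producer and degrader levels), triangular "low"/"high" membership functions, four hand-written rules, and zero-order Sugeno inference (min for AND, weighted average of rule outputs). The abstract does not specify the inference type, the membership shapes or the rules, so all of these are hypothetical; they are also the kind of parameters an optimiser such as PSO could tune in the third stage.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms on a normalized [0, 1] concentration scale.
TERMS = {
    "low": lambda x: tri(x, -1.0, 0.0, 1.0),
    "high": lambda x: tri(x, 0.0, 1.0, 2.0),
}

# Hypothetical IF-THEN rules: (producer term, degrader term) -> rate of change.
RULES = [
    (("high", "low"), 1.0),   # strong production, weak degradation: protein rises
    (("low", "high"), -1.0),  # weak production, strong degradation: protein falls
    (("high", "high"), 0.0),  # balanced: roughly steady
    (("low", "low"), 0.0),    # little turnover: roughly steady
]


def infer_rate(producer, degrader, rules=RULES):
    """Zero-order Sugeno inference: min for AND, weighted average of rule outputs."""
    num = den = 0.0
    for (p_term, d_term), output in rules:
        weight = min(TERMS[p_term](producer), TERMS[d_term](degrader))
        num += weight * output
        den += weight
    return num / den if den > 0 else 0.0
```

With these rules, `infer_rate(1.0, 0.0)` returns 1.0, `infer_rate(0.5, 0.5)` returns 0.0, and `infer_rate(0.75, 0.25)` returns about 0.33, so the output varies smoothly between the rule consequents rather than jumping between Boolean states.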
Affiliation(s)
- Sandhya Samarasinghe
- Complex Systems, Big Data and Informatics Initiative (CSBII), Lincoln University, Christchurch, New Zealand; Centre for Advanced Computational Solutions, Lincoln University, Christchurch, New Zealand
- Don Kulasiri
- Complex Systems, Big Data and Informatics Initiative (CSBII), Lincoln University, Christchurch, New Zealand; Centre for Advanced Computational Solutions, Lincoln University, Christchurch, New Zealand
30
Choi S, Kim Y, Park G. Densely connected sub-Gaussian linear structural equation model learning via ℓ1- and ℓ2-regularized regressions. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2023.107691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
31
Ye Q, Guo NL. Inferencing Bulk Tumor and Single-Cell Multi-Omics Regulatory Networks for Discovery of Biomarkers and Therapeutic Targets. Cells 2022; 12:cells12010101. [PMID: 36611894 PMCID: PMC9818242 DOI: 10.3390/cells12010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022]
Abstract
Accurate biomarkers and effective therapeutic targets remain insufficient in current cancer treatment. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integrating multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, multi-omics data harmonization methods are introduced, and common approaches to molecular network inference are summarized. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in computational efficiency, scalability, and accuracy when constructing genome-scale multi-omics networks in bulk tumors and single cells. Based on the constructed multi-modal regulatory networks, graph-theoretic network centrality metrics can be used to prioritize candidate biomarkers and therapeutic targets. Our approach of integrating multi-omics profiles from a patient cohort with large-scale patient EMRs, such as the SEER-Medicare cancer registry, combined with extensive external validation, can identify potential biomarkers applicable to large patient populations. These methodologies form a conceptually innovative framework for analyzing the information available from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes.
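The review proposes ranking nodes of an inferred regulatory network by centrality to prioritize candidates. A minimal sketch of that prioritization step only (it does not construct a PLBIN, which the abstract does not detail), assuming a simple undirected edge list without duplicate edges and using normalized degree centrality as the metric; the function names and the toy node labels are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality, degree / (n - 1), for an undirected edge list.

    Assumes a simple graph: duplicate edges would be counted more than once.
    """
    degree = defaultdict(int)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


def prioritize(edges, top_k):
    """Return the top_k nodes by degree centrality, breaking ties by name."""
    cent = degree_centrality(edges)
    return sorted(cent, key=lambda node: (-cent[node], node))[:top_k]
```

On the toy network `[("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]`, node A touches every other node and scores 1.0, so `prioritize(edges, 2)` returns `["A", "B"]`. Degree is the simplest choice; betweenness or eigenvector centrality would follow the same rank-then-select pattern.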
Affiliation(s)
- Qing Ye
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Nancy Lan Guo
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Department of Occupational and Environmental Health Sciences, School of Public Health, West Virginia University, Morgantown, WV 26506, USA
- Correspondence: ; Tel.: +1-304-293-6455
32
Discovery and classification of complex multimorbidity patterns: unravelling chronicity networks and their social profiles. Sci Rep 2022; 12:20004. [PMID: 36411299 PMCID: PMC9678882 DOI: 10.1038/s41598-022-23617-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 11/02/2022] [Indexed: 11/23/2022]
Abstract
Multimorbidity can be defined as the presence of two or more chronic diseases in an individual. This condition is associated with reduced quality of life, increased disability, greater functional impairment, increased health care utilisation, greater fragmentation of care and complexity of treatment, and increased mortality. Thus, understanding its epidemiology and inherent complexity is essential to improve patients' quality of life and to reduce the costs associated with multi-pathology. In this paper, using data from the European Health Survey, we explore the application of Mixed Graphical Models, and their combination with social network analysis techniques, for the discovery and classification of complex multimorbidity patterns. The results show the usefulness and versatility of this graph-based approach for studying multimorbidity, as graphs offer the researcher a holistic view of the relational structure of high-dimensional data with variables of different types.
33
Das T, Kaur H, Gour P, Prasad K, Lynn AM, Prakash A, Kumar V. Intersection of network medicine and machine learning towards investigating the key biomarkers and pathways underlying amyotrophic lateral sclerosis: a systematic review. Brief Bioinform 2022; 23:6780269. [PMID: 36411673 DOI: 10.1093/bib/bbac442] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/24/2022] [Revised: 08/12/2022] [Accepted: 09/13/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Network medicine is an emerging area of research that focuses on delving into the molecular complexity of disease, leading to the discovery of network biomarkers and therapeutic targets. Amyotrophic lateral sclerosis (ALS) is a complicated rare disease with unknown pathogenesis and no available treatment. In ALS, network properties appear to be potential biomarkers that can be beneficial in disease-related applications when explored independently or in tandem with machine learning (ML) techniques. OBJECTIVE This systematic literature review explores recent trends in network medicine and implementations of network-based ML algorithms in ALS. We aim to provide an overview of the identified primary studies and gather details on the potential biomarkers and pathways they identify. METHODS We searched for and examined primary studies from PubMed and Dimensions.ai, published between 2018 and 2022, that reported network medicine perspectives and the coupling of ML techniques. Each abstract and full-text study was individually evaluated, and relevant studies were included in the review once they met the inclusion and exclusion criteria. RESULTS We identified 109 eligible primary-study publications for this systematic review. The data coalesced into two themes: the application of network science to identify disease modules and promising biomarkers in ALS, and network-based ML approaches. CONCLUSION This systematic review gives an overview of network medicine approaches and implementations of network-based ML algorithms in ALS for determining new disease genes and identifying critical pathways and therapeutic targets for personalized treatment.
Affiliation(s)
- Trishala Das
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Harbinder Kaur
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Pratibha Gour
- Dept. of Plant Molecular Biology, University of Delhi, South Campus, New Delhi-110021, India
- Kartikay Prasad
- Amity Institute of Neuropsychology & Neurosciences (AINN), Amity University, Noida, UP-201303, India
- Andrew M Lynn
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Amresh Prakash
- Amity Institute of Integrative Sciences and Health, Amity University Haryana, Gurgaon-122413, India
- Vijay Kumar
- Amity Institute of Neuropsychology & Neurosciences (AINN), Amity University, Noida, UP-201303, India
34
Bhandari N, Walambe R, Kotecha K, Khare SP. A comprehensive survey on computational learning methods for analysis of gene expression data. Front Mol Biosci 2022; 9:907150. [PMID: 36458095 PMCID: PMC9706412 DOI: 10.3389/fmolb.2022.907150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Accepted: 09/28/2022] [Indexed: 09/19/2023]
Abstract
Computational analysis methods, including machine learning, have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods have been used for comparative analysis of gene expression data. However, more complex analyses, such as classifying sample observations or discovering feature genes, require sophisticated computational approaches. In this review, we compile various statistical and computational tools used in the analysis of expression microarray data. Although the methods are discussed in the context of expression microarrays, they can also be applied to the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values and the methods and approaches usually employed for their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery, along with their evaluation parameters, are described in detail. We believe that this detailed review will help users select appropriate methods for preprocessing and analyzing their data based on the expected outcome.
Affiliation(s)
- Nikita Bhandari
- Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rahee Walambe
- Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
- Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Satyajeet P. Khare
- Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
35
Wang Q, Dong A, Zhao J, Wang C, Griffin C, Gragnoli C, Xue F, Wu R. Vaginal microbiota networks as a mechanistic predictor of aerobic vaginitis. Front Microbiol 2022; 13:998813. [DOI: 10.3389/fmicb.2022.998813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 09/09/2022] [Indexed: 11/13/2022]
Abstract
Aerobic vaginitis (AV) is a complex vaginal dysbiosis thought to be caused by micro-ecological changes in the vaginal microbiota. While most studies have focused on how changes in the abundance of individual microbes are associated with the emergence of AV, we still lack a complete mechanistic atlas of the microbe-AV link. Network modeling is central to understanding the structure and function of any microbial community assembly. By representing microbial abundances as nodes and ecological interactions among microbes as edges, microbial networks can reveal how each microbe functions and how microbes cooperate or compete with one another to mediate the dynamics of microbial communities. However, existing approaches can estimate either the strength or the direction of a microbe-microbe link, but not both, and thus fail to capture the full topological characteristics of a network, especially from high-dimensional microbial data. We combine the allometric scaling law and evolutionary game theory to derive a functional graph theory that can characterize bidirectional, signed, and weighted interaction networks from any data domain. We apply our theory to characterize the causal interdependence between microbial interactions and AV. From functional networks arising from different functional modules, we find that Lactobacillus, the only favorable genus from Firmicutes among all identified genera, maintains vaginal microbial symbiosis through upregulation by other microbes rather than through any intrinsic capacity. Among Lactobacillus species, the ratio of L. crispatus to L. iners is positively associated with healthier, acidic vaginal ecosystems. In a less healthy alkaline ecosystem, L. crispatus establishes a contradictory relationship with other microbes, leading to a decrease in its population relative to L. iners. We identify topological changes in vaginal microbiota networks as the menstrual cycle shifts from the follicular to the luteal phase. Our network tool provides a mechanistic approach to disentangle the internal workings of microbiota assembly and to predict its causal relationships with human diseases, including AV.
36
Kelly J, Berzuini C, Keavney B, Tomaszewski M, Guo H. A review of causal discovery methods for molecular network analysis. Mol Genet Genomic Med 2022; 10:e2055. [PMID: 36087049 PMCID: PMC9544222 DOI: 10.1002/mgg3.2055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 07/12/2022] [Accepted: 08/18/2022] [Indexed: 11/08/2022]
Abstract
BACKGROUND With the increasing availability and size of multi-omics datasets, investigating the causal relationships between molecular phenotypes has become an important aspect of exploring the underlying biology and genetics. An increasing number of methodologies have been developed and applied to molecular networks to investigate these causal interactions. METHODS We introduce and review the available methods for building large-scale causal molecular networks that have been developed and applied in the past decade. RESULTS In this review, we identify and summarize the existing methods for inferring causality in large-scale causal molecular networks and discuss important factors that will need to be considered in future research in this area. CONCLUSION Existing methods for inferring causal molecular networks have their own strengths and limitations, so there is no single best approach; the choice instead rests with the researcher. This review also discusses some of the current limitations to the biological interpretation of these networks and important factors to consider in future studies of molecular networks.
Affiliation(s)
- Jack Kelly
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Carlo Berzuini
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Bernard Keavney
- Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Division of Cardiology and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
- Maciej Tomaszewski
- Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Manchester Heart Centre and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
- Hui Guo
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
37
Shan D, Ali M, Shahid M, Arif A, Waheed MQ, Xia X, Trethowan R, Tester M, Poland J, Ogbonnaya FC, Rasheed A, He Z, Li H. Genetic networks underlying salinity tolerance in wheat uncovered with genome-wide analyses and selective sweeps. Theor Appl Genet 2022; 135:2925-2941. [PMID: 35915266 DOI: 10.1007/s00122-022-04153-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/01/2022] [Accepted: 06/16/2022] [Indexed: 06/15/2023]
Abstract
A genetic framework underpinning salinity tolerance at the reproductive stage was revealed by genome-wide SNP markers and major adaptability genes in synthetic-derived wheats, and trait-associated loci were used to predict phenotypes. Using wild relatives of crops to identify genes related to improved productivity and resilience to climate extremes is a priority area of crop genetic improvement. High salinity is a widespread constraint on crop production, and the development of salt-tolerant cultivars is a sustainable solution. We evaluated a panel of 294 wheat accessions comprising synthetic-derived wheat lines (SYN-DERs) and modern bread wheat advanced lines under control and high-salinity conditions at two locations. The GWAS revealed a quantitative genetic framework of more than 200 minor-effect loci underlying salinity tolerance at the reproductive stage. The significant trait-associated SNPs were used to predict phenotypes with a GBLUP model, and the prediction accuracy (r²) ranged from 0.57 to 0.74. The r² values for flag leaf weight, days to flowering, biomass, and number of spikes per plant were all above 0.70, validating the phenotypic effects of the loci discovered in this study. Furthermore, the germplasm sets were compared to identify selective sweeps associated with salt tolerance loci in SYN-DERs. Six loci associated with salinity tolerance were found to be differentially selected in the SYN-DERs (12.4 Mb on chromosome (chr)1B, 7.1 Mb on chr2A, 11.2 Mb on chr2D, 200 Mb on chr3D, 600 Mb on chr6B, and 700.9 Mb on chr7B). A total of 228 reported markers and genes, including 17 well-characterized genes, were uncovered using GWAS and EigenGWAS. A linkage disequilibrium (LD) block on chr5A containing the Vrn-A1 gene at 575 Mb, together with its homeologs on chr5D, was strongly associated with multiple yield-related traits and flowering time under salinity stress. The diversity panel was screened with more than 68 kompetitive allele-specific PCR (KASP) markers for functional genes in wheat, revealing pleiotropic effects of the superior alleles of Rht-1, TaGASR-A1, and TaCwi-A1 under salinity stress. To make effective use of the extensive genetic information obtained from the GWAS, a genetic interaction network was constructed to reveal correlations among the investigated traits. Together, the genetic network, GWAS, selective sweep, and functional gene survey data provided a quantitative genetic framework for identifying differentially retained loci associated with salinity tolerance in wheat.
Affiliation(s)
- Danting Shan
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT-China Office, 12 Zhongguancun South Street, Beijing, 100081, China
- Nanfan Research Institute, CAAS, Sanya, 572024, Hainan, China
- Mohsin Ali
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT-China Office, 12 Zhongguancun South Street, Beijing, 100081, China
- Nanfan Research Institute, CAAS, Sanya, 572024, Hainan, China
- Mohammed Shahid
- International Center for Biosaline Agriculture (ICBA), Al Ruwayyah 2, Academic City, Dubai, UAE
- Anjuman Arif
- National Institute of Agriculture and Biology (NIAB), Faisalabad, Pakistan
- Xianchun Xia
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT-China Office, 12 Zhongguancun South Street, Beijing, 100081, China
- Richard Trethowan
- Plant Breeding Institute, School of Life and Environmental Sciences, The University of Sydney, Sydney, 2006, Australia
- Mark Tester
- Division of Biological and Environmental Sciences and Engineering (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Jesse Poland
- Division of Biological and Environmental Sciences and Engineering (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Kansas State University, Manhattan, KS, USA
- Awais Rasheed
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT-China Office, 12 Zhongguancun South Street, Beijing, 100081, China
- Zhonghu He
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT-China Office, 12 Zhongguancun South Street, Beijing, 100081, China
- Huihui Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT-China Office, 12 Zhongguancun South Street, Beijing, 100081, China
- Nanfan Research Institute, CAAS, Sanya, 572024, Hainan, China
38
Dirmeier S, Beerenwinkel N. Structured hierarchical models for probabilistic inference from perturbation screening data. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Simon Dirmeier
- Department of Biosystems Science and Engineering, ETH Zurich
39
Gamage HN, Chetty M, Shatte A, Hallinan J. Filter feature selection based Boolean Modelling for Genetic Network Inference. Biosystems 2022; 221:104757. [PMID: 36007675 DOI: 10.1016/j.biosystems.2022.104757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 08/04/2022] [Accepted: 08/04/2022] [Indexed: 11/02/2022]
Abstract
The reconstruction of gene regulatory networks (GRNs) from time series gene expression data is highly relevant to the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most have low computational efficiency and cannot cope with high-dimensional gene expression data that have few samples. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is applied here to GRN inference for the first time. The filtered gene list is then narrowed further using a mutual information-based min-redundancy max-relevance criterion that eliminates irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building on our previous research, a Pearson correlation coefficient-based Boolean modelling approach is used to efficiently identify the optimal regulatory rules associated with the selected regulatory genes. The proposed approach was evaluated on gene expression datasets from small- and medium-scale real gene networks. It was more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy, with a higher number of true positives, than other state-of-the-art methods, while also outperforming them in Dynamic Accuracy and efficiency.
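The second filtering stage in this abstract relies on a mutual information-based min-redundancy max-relevance (mRMR) criterion. The following is a minimal sketch of the generic greedy mRMR procedure (difference form: relevance to the target minus mean redundancy with already-selected genes) on discretized profiles. It is not the authors' implementation: it omits the ReliefF pre-filter and the resampling step, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts.
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    return mi


def mrmr(candidates, target, k):
    """Greedy mRMR selection of up to k genes.

    candidates: dict gene -> discretized expression sequence
    target: discretized expression sequence of the target gene
    """
    relevance = {g: mutual_information(v, target) for g, v in candidates.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(candidates)):
        remaining = [g for g in candidates if g not in selected]

        def score(g):
            # Relevance minus mean redundancy with the genes already chosen.
            redundancy = sum(
                mutual_information(candidates[g], candidates[s]) for s in selected
            ) / len(selected)
            return relevance[g] - redundancy

        selected.append(max(remaining, key=score))
    return selected


# Hypothetical toy data: the target depends on both gA and gC; gB duplicates gA.
a = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * x + y for x, y in zip(a, c)]
chosen = mrmr({"gA": a, "gB": list(a), "gC": c}, target, k=2)
```

All three toy genes are equally relevant, but once gA is chosen the duplicate gB is penalized for redundancy, so gC is picked second. That is the behaviour the redundancy term is meant to produce.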
Affiliation(s)
- Madhu Chetty
- Health Innovation and Transformation Centre, Federation University, Victoria, Australia
- Adrian Shatte
- Health Innovation and Transformation Centre, Federation University, Victoria, Australia
40
Chowdhury S, Wang R, Yu Q, Huntoon CJ, Karnitz LM, Kaufmann SH, Gygi SP, Birrer MJ, Paulovich AG, Peng J, Wang P. DAGBagM: learning directed acyclic graphs of mixed variables with an application to identify protein biomarkers for treatment response in ovarian cancer. BMC Bioinformatics 2022; 23:321. [PMID: 35931981 PMCID: PMC9354326 DOI: 10.1186/s12859-022-04864-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 07/28/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Applying directed acyclic graph (DAG) models to proteogenomic data has been shown to be effective for detecting causal biomarkers of complex diseases. However, unsolved challenges remain in DAG learning when jointly modeling binary clinical outcome variables and continuous biomarker measurements. RESULTS In this paper, we propose a new tool, DAGBagM, to learn DAGs with both continuous and binary nodes. By using appropriate models, DAGBagM allows either continuous or binary nodes to be parent or child nodes. It employs a bootstrap aggregating strategy to reduce false positives in edge inference. At the same time, the aggregation procedure provides a flexible framework for robustly incorporating prior information on edges. CONCLUSIONS Through extensive simulation experiments, we demonstrate that DAGBagM outperforms alternative strategies for modeling mixed types of nodes. In addition, DAGBagM is computationally more efficient than two competing methods. Applying DAGBagM to proteogenomic datasets from ovarian cancer studies, we identify potential protein biomarkers for platinum-refractory/resistant response in ovarian cancer. DAGBagM is available as a GitHub repository at https://github.com/jie108/dagbagM.
Affiliation(s)
- Shrabanti Chowdhury
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Ru Wang
- Department of Statistics, University of California, Davis, CA, 95616, USA
- Qing Yu
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Catherine J Huntoon
- Division of Oncology Research and Department of Oncology, Mayo Clinic, Rochester, MN, 55905, USA
- Larry M Karnitz
- Division of Oncology Research and Department of Oncology, Mayo Clinic, Rochester, MN, 55905, USA
- Scott H Kaufmann
- Division of Oncology Research, Mayo Clinic, Rochester, MN, 55905, USA
- Steven P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Michael J Birrer
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Amanda G Paulovich
- Clinical Research Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jie Peng
- Department of Statistics, University of California, Davis, CA, 95616, USA
- Pei Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
41
Pušnik Ž, Mraz M, Zimic N, Moškon M. Review and assessment of Boolean approaches for inference of gene regulatory networks. Heliyon 2022; 8:e10222. [PMID: 36033302 PMCID: PMC9403406 DOI: 10.1016/j.heliyon.2022.e10222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 04/22/2022] [Accepted: 08/03/2022] [Indexed: 10/25/2022]
Abstract
Boolean descriptions of gene regulatory networks can provide insight into interactions between genes. Boolean networks have predictive power, are easy to understand, and can be used to simulate the observed networks under different scenarios. We review fundamental and state-of-the-art methods for inference of Boolean networks. We introduce a methodology for straightforward evaluation of Boolean inference approaches based on the generation of evaluation datasets, application of selected inference methods, and evaluation of performance measures to guide the selection of the best method for a given inference problem. We demonstrate this procedure on the inference methods REVEAL (REVerse Engineering ALgorithm), Best-Fit Extension, MIBNI (Mutual Information-based Boolean Network Inference), GABNI (Genetic Algorithm-based Boolean Network Inference), and ATEN (AND/OR Tree ENsemble algorithm), which infer Boolean descriptions of gene regulatory networks from discretised time series data. Boolean inference approaches tend to perform better in terms of dynamic accuracy and slightly worse in terms of structural correctness. We believe that the proposed methodology and guidelines will help researchers develop Boolean inference approaches with good predictive capability while maintaining structural correctness and biological relevance.
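To make one of the listed methods more concrete, here is a simplified, unweighted sketch of the idea behind Best-Fit Extension for a single target gene. For every candidate set of up to `max_inputs` regulators, the best-fitting truth table assigns each observed input pattern its majority next-state value, and the regulator set with the fewest unavoidable mismatches wins. This is an illustrative reimplementation under stated assumptions (binary states, consecutive time points given as state transitions), not the code evaluated in the review; the function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def best_fit(transitions, target, max_inputs=2):
    """Exhaustive Best-Fit-style search for one target gene.

    transitions: list of (state_t, state_t_plus_1) pairs; states are tuples of 0/1
    target: index of the gene whose next value is being explained
    Returns (error_count, regulator_indices, truth_table) with the fewest errors.
    """
    n_genes = len(transitions[0][0])
    best = None
    for k in range(1, max_inputs + 1):
        for regs in combinations(range(n_genes), k):
            # Count how often each input pattern leads to target = 0 or 1.
            counts = defaultdict(Counter)
            for state, nxt in transitions:
                pattern = tuple(state[r] for r in regs)
                counts[pattern][nxt[target]] += 1
            # Majority output per observed pattern; minority counts are errors.
            table = {p: c.most_common(1)[0][0] for p, c in counts.items()}
            errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
            if best is None or errors < best[0]:
                best = (errors, regs, table)
    return best


# Hypothetical toy network: gene 2's next value is gene0 AND gene1.
states = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
transitions = [(s, (s[1], s[0], s[0] & s[1])) for s in states]
errors, regulators, rule = best_fit(transitions, target=2)
```

Ties go to the smaller regulator set: candidate sets are enumerated in order of increasing size, and only a strictly lower error count replaces the current best.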
Affiliation(s)
- Žiga Pušnik
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Miha Mraz
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Nikolaj Zimic
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Miha Moškon
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
42
Gupta S, Vundavilli H, Osorio RSA, Itoh MN, Mohsen A, Datta A, Mizuguchi K, Tripathi LP. Integrative Network Modeling Highlights the Crucial Roles of Rho-GDI Signaling Pathway in the Progression of Non-Small Cell Lung Cancer. IEEE J Biomed Health Inform 2022; 26:4785-4793. [PMID: 35820010 DOI: 10.1109/jbhi.2022.3190038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Non-small cell lung cancer (NSCLC) is the most prevalent form of lung cancer and a leading cause of cancer-related deaths worldwide. Using an integrative approach, we analyzed a publicly available merged NSCLC transcriptome dataset with machine learning, protein-protein interaction (PPI) networks, and Bayesian modeling to pinpoint key cellular factors and pathways likely to be involved in the onset and progression of NSCLC. First, we generated multiple prediction models using various machine learning classifiers to distinguish NSCLC from healthy cohorts. Our models achieved prediction accuracies ranging from 0.83 to 1.0, with XGBoost emerging as the best performer. Next, using functional enrichment analysis (and gene co-expression network analysis with WGCNA) of the feature-selected genes, we determined that genes involved in Rho GTPase signaling that modulate actin stability and the cytoskeleton were likely to be crucial in NSCLC. We further assembled a PPI network for the feature-selected genes and partitioned it using Markov clustering to detect protein complexes functionally relevant to NSCLC. Finally, we modeled the perturbations in RhoGDI signaling using a Bayesian network; our simulations suggest that aberrations in ARHGEF19 and/or RAC2 gene activity contributed to impaired MAPK signaling and disrupted actin and cytoskeleton organization, and were arguably key contributors to the onset of tumorigenesis in NSCLC. We hypothesize that targeted measures to correct aberrant ARHGEF19 and/or RAC2 function could conceivably rescue the cancerous phenotype in NSCLC. Our findings offer promising avenues for early predictive biomarker discovery, targeted therapeutic intervention, and improved clinical outcomes in NSCLC.
43
Chaudhuri A, Mohanty AK, Satpathy M. A Parallelizable Model for Analyzing Cancer Tissue Heterogeneity. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2039-2048. [PMID: 34077367 DOI: 10.1109/tcbb.2021.3085894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
In a cancer study, the heterogeneous nature of a cell population creates many challenges. Efficiently determining the compositional breakup of a cell population from gene expression measurements is critical to the success of a cancer study. This paper presents a new model for analyzing heterogeneity in cancer tissue using Markov chain Monte Carlo (MCMC) algorithms; we aim to compute the proportion-wise breakup of the cell population on a GPU. We also show that the model's computation time does not depend on the input data size, because the computations required to estimate the compositional breakup are parallelized. The model uses qPCR (quantitative polymerase chain reaction) gene expression data to determine the compositional breakup of the heterogeneous cell population. We test the model on synthetic data and on real-world data collected from fibroblasts. We also show how well the model scales to expression data for hundreds of genes.
44
Dsouza KB, Maslova A, Al-Jibury E, Merkenschlager M, Bhargava VK, Libbrecht MW. Learning representations of chromatin contacts using a recurrent neural network identifies genomic drivers of conformation. Nat Commun 2022; 13:3704. [PMID: 35764630 PMCID: PMC9240038 DOI: 10.1038/s41467-022-31337-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Accepted: 06/15/2022] [Indexed: 11/28/2022]
Abstract
Despite the availability of chromatin conformation capture experiments, discerning the relationship between the 1D genome and 3D conformation remains a challenge, which limits our understanding of their effect on gene expression and disease. We propose Hi-C-LSTM, a method that produces low-dimensional latent representations summarizing intra-chromosomal Hi-C contacts via a recurrent long short-term memory (LSTM) neural network model. We find that these representations contain all the information needed to recreate the observed Hi-C matrix with high accuracy, outperforming existing methods. These representations enable the identification of a variety of conformation-defining genomic elements, including nuclear compartments and conformation-related transcription factors. They also enable in silico perturbation experiments that measure the influence of cis-regulatory elements on conformation.
Affiliation(s)
- Kevin B Dsouza
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
- Alexandra Maslova
- School of Computing Science, Simon Fraser University, Burnaby, Canada
- Ediem Al-Jibury
- MRC, London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Department of Computing, Imperial College London, London, UK
- Matthias Merkenschlager
- MRC, London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Vijay K Bhargava
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
45
Thermodynamic Modelling of Transcriptional Control: A Sensitivity Analysis. Mathematics 2022. [DOI: 10.3390/math10132169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Modelling is a tool used to decipher the biochemical mechanisms involved in transcriptional control. Experimental evidence in genetics is usually supported by theoretical models in order to evaluate the effects of all the possible interactions that can occur in these complicated processes. Models derived from the thermodynamic method are critical in this effort because they can take into account multiple mechanisms operating simultaneously at the molecular micro-scale and relate them to transcriptional initiation at the tissue macro-scale. This work adapts computational techniques to this context in order to theoretically evaluate the role played by several biochemical mechanisms. The value of this theoretical analysis lies in the fact that it can be contrasted with biological experiments in which the response to perturbations in the environment of the transcriptional machinery is evaluated in terms of genetically activated/repressed regions. Theoretically reproducing these experiments leads to a sensitivity analysis whose results are expressed in terms of the elasticity of a threshold function that determines those activated/repressed regions. Studying this elasticity function in thermodynamic models already proposed in the literature reveals that certain modelling approaches can alter the balance between the biochemical mechanisms considered, which can cause false or misleading outcomes. Re-evaluating classical thermodynamic models gives a more accurate and complete picture of the interactions involved in gene regulation and transcriptional control, enabling more specific predictions. This sensitivity approach provides a definite advantage in interpreting a wide range of genetic experimental results.
46
Suter P, Kuipers J, Beerenwinkel N. Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks. Brief Bioinform 2022; 23:6604993. [PMID: 35679575 PMCID: PMC9294428 DOI: 10.1093/bib/bbac219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022]
Abstract
Dynamic Bayesian networks (DBNs) can be used to discover gene regulatory networks (GRNs) from time series gene expression data. Here, we propose a strategy for learning DBNs from gene expression data using a Bayesian approach that scales to large networks and targets models with high predictive accuracy. Our framework can be used to learn DBNs for multiple groups of samples and to highlight differences and similarities in their GRNs. We learn these DBN models under different structural and parametric assumptions and select the optimal model based on cross-validated predictive accuracy. We show in simulation studies that our approach is better at preventing overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second case, we compared normal and tumor cells of colorectal tissue. The DBN-based classifier achieved higher classification accuracy on both datasets than previously reported. For the colorectal cancer dataset, our analysis suggested that the GRNs of cancer and normal tissues differ substantially, with the differences most pronounced in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in the gene networks of cancer and normal cells may be used for the discovery of targeted therapies.
Affiliation(s)
- Polina Suter
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Switzerland
47
Network Approaches to Integrate Analyses of Genetics and Metabolomics Data with Applications to Fetal Programming Studies. Metabolites 2022; 12:metabo12060512. [PMID: 35736446 PMCID: PMC9229972 DOI: 10.3390/metabo12060512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Revised: 05/27/2022] [Accepted: 05/27/2022] [Indexed: 02/04/2023]
Abstract
The integration of genetics and metabolomics data demands careful accounting of complex dependencies, particularly when modelling familial omics data, e.g., to study fetal programming of related maternal–offspring phenotypes. Efforts to identify genetically determined metabotypes using classic genome-wide association approaches have proven useful for characterizing complex disease, but their conclusions are often limited to a series of variant–metabolite associations. We adapt Bayesian network models to integrate metabotypes with maternal–offspring genetic dependencies and metabolic profile correlations in order to investigate the mechanisms underlying maternal–offspring phenotypic associations. Using data from the multiethnic Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, we demonstrate that the strategic specification of ordered dependencies, pre-filtering of candidate metabotypes, incorporation of metabolite dependencies, and penalized network estimation methods clarify potential mechanisms for fetal programming of newborn adiposity and metabolic outcomes. Exploring Bayesian network growth over a range of penalty parameters, coupled with interactive plotting, facilitates the interpretation of network edges. These methods are broadly applicable to the integration of diverse omics data for related individuals.
48
Yu C, Wang J. Data mining and mathematical models in cancer prognosis and prediction. Medical Review (Berlin, Germany) 2022; 2:285-307. [PMID: 37724193 PMCID: PMC10388766 DOI: 10.1515/mr-2021-0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 12/29/2021] [Indexed: 09/20/2023]
Abstract
Cancer is a fatal and complex disease. Differences between individuals with the same cancer type, or within the same patient at different stages of cancer development, may require distinct treatments. Pathological differences are reflected at the tissue, cellular and gene levels. Interactions between cancer cells and their surrounding microenvironment can also influence cancer progression and metastasis. Understanding all of these mechanistically and quantitatively is a huge challenge. Researchers have applied pattern recognition algorithms such as machine learning and data mining to predict cancer types or classifications. With rapidly growing computing power, researchers have begun to integrate huge data sets, multi-dimensional data types and information. Cells are controlled by gene expression, which is determined by promoter sequences and transcription regulators. For example, changes in gene expression through these underlying mechanisms can alter cell progression through the cell cycle. Such molecular activities can be governed by the underlying gene regulatory networks, which are essential for cancer study when the relevant information and regulatory relationships are clear and available. In this review, we briefly introduce several machine learning methods for cancer prediction and classification, including Artificial Neural Networks (ANNs), Decision Trees (DTs), Support Vector Machines (SVMs) and naive Bayes. We then describe a few typical models for building gene regulatory networks from available data, such as correlation, regression and Bayesian methods. These methods can aid cancer diagnosis and the prediction of susceptibility, recurrence, survival, etc. Finally, we summarize and compare the modeling methods for analyzing the development and progression of cancer through gene regulatory networks. These models can provide possible physical strategies to analyze cancer progression in a systematic and quantitative way.
Affiliation(s)
- Chong Yu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, China
- Department of Statistics, JiLin University of Finance and Economics, Changchun, Jilin Province, China
- Jin Wang
- Department of Chemistry and of Physics and Astronomy, State University of New York, Stony Brook, NY, USA
49
Dynamic Uncertainty Quantification and Risk Prediction Based on the Grey Mathematics and Outcrossing Theory. Applied Sciences-Basel 2022. [DOI: 10.3390/app12115389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Starting from the practical condition of small samples under time-invariant and time-variant uncertainties, this paper presents a complete non-probabilistic analysis procedure comprising uncertainty quantification, uncertainty propagation, and reliability evaluation. First, a Grey systems approach is proposed to determine the boundary laws of static intervals and dynamic interval processes. By combining a second-order Taylor expansion with the smallest parametric interval set, the structural response histories are then determined from the quantified uncertainty results. Additionally, following the first-passage concept from classical random process theory, a time-dependent reliability measure based on the interval process model is developed to give a more detailed estimate of structural safety over the whole life cycle. Finally, a numerical example and an experimental application are discussed to demonstrate the use and reasonableness of the developed methodology.
50
Vadapalli S, Abdelhalim H, Zeeshan S, Ahmed Z. Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine. Brief Bioinform 2022; 23:6590150. [PMID: 35595537 DOI: 10.1093/bib/bbac191] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 03/08/2022] [Revised: 04/02/2022] [Accepted: 04/26/2022] [Indexed: 12/16/2022]
Abstract
Precision medicine uses genetic, environmental and lifestyle factors to more accurately diagnose and treat disease in specific groups of patients, and it is considered one of the most promising medical efforts of our time. The use of genetics is arguably the most data-rich and complex component of precision medicine. The grand challenge today is the successful assimilation of genetics into precision medicine in a way that translates across different ancestries, diverse diseases and other distinct populations, which will require clever use of artificial intelligence (AI) and machine learning (ML) methods. Our goal here was to review and compare the scientific objectives, methodologies, datasets, data sources, ethics and gaps of AI/ML approaches used in genomics and precision medicine. We selected high-quality literature published within the last 5 years that was indexed and available through PubMed Central. Our scope was narrowed to articles that reported the application of AI/ML algorithms for statistical and predictive analyses using whole-genome and/or whole-exome sequencing for gene variants, and RNA-seq and microarrays for gene expression. We did not limit our search to specific diseases or data sources. Based on the scope of our review and our comparative analysis criteria, we identified 32 different AI/ML approaches applied in various genomics studies and report widely adopted AI/ML algorithms for predictive diagnostics across several diseases.
Affiliation(s)
- Sreya Vadapalli
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- Habiba Abdelhalim
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- Saman Zeeshan
- Rutgers Cancer Institute of New Jersey, Rutgers University, 195 Little Albany St, New Brunswick, NJ, USA
- Zeeshan Ahmed
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- Department of Medicine, Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson St, New Brunswick, NJ, USA