1
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379; PMCID: PMC10589915; DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
Motivation: Gene regulatory networks (GRNs) describe the interactions between genes and help reveal the biological mechanisms at work in the cell. Reconstructing GRNs from gene expression data has been a central computational problem in systems biology. However, because large-scale GRNs are high-dimensional and non-linear, inferring them accurately and efficiently remains a challenging task.
Results: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model that leverages the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models; this trains the non-linear ordinary differential equation model of GRNs effectively and improves the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method achieves an appreciable improvement over state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method.
Availability and implementation: The proposed method is written in Python and is available at https://github.com/lab319/iLSGRN.
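As a rough illustration of the fusion idea in this abstract, the sketch below combines two per-regulator importance vectors (of the kind an XGBoost model and a Random Forest model might each produce) by normalizing each to sum to one and mixing them. The function name `fuse_importances`, the equal weighting, and the example scores are assumptions made for illustration only; the abstract does not say how iLSGRN actually fuses the two models.

```python
def fuse_importances(imp_a, imp_b, weight_a=0.5):
    """Combine two importance dicts {regulator: score} into one ranking.

    Hypothetical helper: each vector is normalized to sum to 1, then the
    two are mixed with weights weight_a and (1 - weight_a).
    """
    def normalize(imp):
        total = sum(imp.values())
        if total == 0:
            return {g: 0.0 for g in imp}
        return {g: v / total for g, v in imp.items()}

    a, b = normalize(imp_a), normalize(imp_b)
    genes = set(a) | set(b)
    fused = {
        g: weight_a * a.get(g, 0.0) + (1 - weight_a) * b.get(g, 0.0)
        for g in genes
    }
    # Highest fused importance first; ties broken by name for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up importance scores for three candidate regulators of one target gene.
xgb_like = {"G1": 6.0, "G2": 3.0, "G3": 1.0}
rf_like = {"G1": 0.5, "G2": 0.3, "G3": 0.2}
ranking = fuse_importances(xgb_like, rf_like)
print([gene for gene, _ in ranking])
```

Normalizing first keeps one model from dominating simply because its raw scores are on a larger scale; other fusion rules (rank averaging, for instance) would be equally plausible here.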
Affiliation(s)
- Yiming Wu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
  - Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
2
Wekesa JS, Kimwele M. A review of multi-omics data integration through deep learning approaches for disease diagnosis, prognosis, and treatment. Front Genet 2023; 14:1199087. [PMID: 37547471; PMCID: PMC10398577; DOI: 10.3389/fgene.2023.1199087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 07/11/2023] [Indexed: 08/08/2023]
Abstract
Accurate diagnosis is the key to providing prompt and explicit treatment and disease management. The recognized biological method for the molecular diagnosis of infectious pathogens is polymerase chain reaction (PCR). Recently, deep learning approaches have come to play a vital role in accurately identifying disease-related genes for diagnosis, prognosis, and treatment. The models reduce the time and cost of wet-lab experimental procedures. Consequently, sophisticated computational approaches have been developed to facilitate the detection of cancer, a leading cause of death globally, and other complex diseases. In this review, we systematically evaluate the recent trends in multi-omics data analysis based on deep learning techniques and their application in disease prediction. We highlight the current challenges in the field and discuss how advances in deep learning methods and their optimization for application are vital in overcoming them. Ultimately, this review promotes the development of novel deep-learning methodologies for data integration, which is essential for disease detection and treatment.
3
Park H, Imoto S, Miyano S. Gene Regulatory Network-Classifier: Gene Regulatory Network-Based Classifier and Its Applications to Gastric Cancer Drug (5-Fluorouracil) Marker Identification. J Comput Biol 2023; 30:223-243. [PMID: 36450117; DOI: 10.1089/cmb.2022.0181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
The complex mechanisms of diseases involve the disturbance of the molecular network, rather than disorder in a single gene, implying that single gene-based analysis is insufficient to understand these mechanisms. Gene regulatory networks (GRNs) have attracted a lot of interest and various approaches have been developed for their statistical inference and gene network-based analysis. Although various computational methods have been developed, relatively little attention has been paid to incorporation of biological knowledge into the computational approaches. Furthermore, existing studies on network-based analysis perform prediction/classification of status of cell lines based on preconstructed GRNs, implying that we cannot extract prediction/classification-specific gene networks, leading to difficulty in interpretation of biological mechanisms and marker identification related to the status of cancer cell lines. We developed a novel strategy to build a GRN-based classifier, called a GRN-classifier. The proposed GRN-classifier estimates GRNs and classifies cell lines simultaneously, where the gene network is estimated to minimize error in gene network estimation and the negative log-likelihood for classifying cell lines. Thus, we can identify biological status-specific gene regulatory systems, enabling us to achieve biologically reliable interpretation of the classification. We also propose an algorithm to implement the GRN-classifier based on coordinate descent update. Monte Carlo simulations were conducted to examine performance of the GRN-classifier. Results: Our strategy provides effective results in feature selection in the classification model and edge selection in gene network estimation. The GRN-classifier also shows outstanding classification accuracy. We apply the GRN-classifier to classify cancer cell lines into anticancer drug-related status, that is, 5-fluorouracil (5-FU)-sensitive/resistant and 5-FU target/nontarget cancer cell lines. 
We then identified 5-FU markers based on 5-FU-related status classification-specific gene networks. The mechanisms of the identified markers were verified through literature survey. Our results suggest that the molecular interplay between MYOF and AHNAK2 may play a crucial role in drug resistance and can provide information on the chemotherapy efficiency of 5-FU. It is also suggested that suppression of the identified 5-FU markers, including MYOF/AHNAK2 and AKR1C1/AKR1C3 may improve 5-FU resistance of cancer cell lines.
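The abstract mentions an algorithm based on coordinate descent updates but gives no details. As a generic illustration of that style of update, the sketch below runs cyclic coordinate descent on an L1-penalized least-squares (lasso) problem, where each coefficient is updated in turn by soft-thresholding. The lasso objective, the function names, and the toy data are all assumptions chosen because they are a familiar setting for coordinate descent; this is not the GRN-classifier's joint objective.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 one coordinate at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data: the first feature explains y well, the second barely at all.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)
```

In this toy run the weak second coefficient is driven exactly to zero, which is the sparsity behavior that makes penalized coordinate descent attractive for feature and edge selection.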
Affiliation(s)
- Heewon Park
  - M&D Data Science Center, Tokyo Medical and Dental University, Tokyo, Japan
- Seiya Imoto
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Satoru Miyano
  - M&D Data Science Center, Tokyo Medical and Dental University, Tokyo, Japan
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
4
Tiong KL, Sintupisut N, Lin MC, Cheng CH, Woolston A, Lin CH, Ho M, Lin YW, Padakanti S, Yeang CH. An integrated analysis of the cancer genome atlas data discovers a hierarchical association structure across thirty three cancer types. PLOS Digital Health 2022; 1:e0000151. [PMID: 36812605; PMCID: PMC9931374; DOI: 10.1371/journal.pdig.0000151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 10/31/2022] [Indexed: 06/18/2023]
Abstract
Cancer cells harbor molecular alterations at all levels of information processing. Genomic/epigenomic and transcriptomic alterations are inter-related between genes, within and across cancer types, and may affect clinical phenotypes. Despite abundant prior studies integrating cancer multi-omics data, none of them organizes these associations in a hierarchical structure and validates the discoveries in extensive external data. We infer this Integrated Hierarchical Association Structure (IHAS) from the complete data of The Cancer Genome Atlas (TCGA) and compile a compendium of cancer multi-omics associations. Intriguingly, diverse alterations on genomes/epigenomes from multiple cancer types impact the transcription of 18 Gene Groups. Half of them are further reduced to three Meta Gene Groups enriched with (1) immune and inflammatory responses, (2) embryonic development and neurogenesis, and (3) cell cycle process and DNA repair. Over 80% of the clinical/molecular phenotypes reported in TCGA are aligned with the combinatorial expressions of Meta Gene Groups, Gene Groups, and other IHAS subunits. Furthermore, IHAS derived from TCGA is validated in more than 300 external datasets, including multi-omics measurements and cellular responses upon drug treatments and gene perturbations in tumors, cancer cell lines, and normal tissues. To sum up, IHAS stratifies patients in terms of molecular signatures of its subunits, selects targeted genes or drugs for precision cancer therapy, and demonstrates that associations between survival times and transcriptional biomarkers may vary with cancer types. This rich information is critical for the diagnosis and treatment of cancers.
Affiliation(s)
- Khong-Loon Tiong
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Nardnisa Sintupisut
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Min-Chin Lin
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
  - Psomagen, Rockville, Maryland, United States of America
- Chih-Hung Cheng
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Andrew Woolston
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
  - Translational Cancer Immunotherapy & Genomics Lab, Barts Cancer Institute, Charterhouse Square, London, United Kingdom
- Chih-Hsu Lin
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
  - C3.ai, Redwood City, California, United States of America
- Mirrian Ho
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Yu-Wei Lin
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
  - AiLife Diagnostics, Pearland, Texas, United States of America
- Sridevi Padakanti
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Chen-Hsiang Yeang
  - Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
5
Koskinen M, Salmi JK, Loukola A, Mäkelä MJ, Sinisalo J, Carpén O, Renkonen R. Data-driven comorbidity analysis of 100 common disorders reveals patient subgroups with differing mortality risks and laboratory correlates. Sci Rep 2022; 12:18492. [PMID: 36323789; PMCID: PMC9630271; DOI: 10.1038/s41598-022-23090-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/01/2022] [Accepted: 10/25/2022] [Indexed: 11/07/2022]
Abstract
The populational heterogeneity of a disease, in part due to comorbidity, poses several complexities. Individual comorbidity profiles, on the other hand, contain useful information to refine phenotyping, prognostication, and risk assessment, and they provide clues to underlying biology. Nevertheless, the spectrum and the implications of the diagnosis profiles remain largely uncharted. Here we mapped comorbidity patterns in 100 common diseases using 4-year retrospective data from 526,779 patients and developed an online tool to visualize the results. Our analysis exposed disease-specific patient subgroups with distinctive diagnosis patterns, survival functions, and laboratory correlates. Computational modeling and real-world data shed light on the structure, variation, and relevance of populational comorbidity patterns, paving the way for improved diagnostics, risk assessment, and individualization of care. Variation in outcomes and biological correlates of a disease emphasizes the importance of evaluating the generalizability of current treatment strategies, as well as considering the limitations that selective inclusion criteria pose on clinical trials.
Affiliation(s)
- Miika Koskinen
  - Faculty of Medicine, University of Helsinki, Helsinki, Finland (GRID grid.7737.4; ISNI 0000 0004 0410 2071)
  - Helsinki Biobank, Helsinki University Hospital, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
  - Analytics and AI Development Services, Helsinki University Hospital, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
- Jani K. Salmi
  - Analytics and AI Development Services, Helsinki University Hospital, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
- Anu Loukola
  - Helsinki Biobank, Helsinki University Hospital, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
- Mika J. Mäkelä
  - Division of Allergology, Skin and Allergy Hospital, Helsinki University Hospital and Helsinki University, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
- Juha Sinisalo
  - Faculty of Medicine, University of Helsinki, Helsinki, Finland (GRID grid.7737.4; ISNI 0000 0004 0410 2071)
  - Heart and Lung Center, Helsinki University Hospital, and Helsinki University, Helsinki, Finland (GRID grid.7737.4; ISNI 0000 0004 0410 2071)
- Olli Carpén
  - Faculty of Medicine, University of Helsinki, Helsinki, Finland (GRID grid.7737.4; ISNI 0000 0004 0410 2071)
  - Helsinki Biobank, Helsinki University Hospital, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
  - HUS Diagnostics, Helsinki University Hospital, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
- Risto Renkonen
  - Faculty of Medicine, University of Helsinki, Helsinki, Finland (GRID grid.7737.4; ISNI 0000 0004 0410 2071)
  - HUS Diagnostics, Helsinki University Hospital, Helsinki, Finland (GRID grid.15485.3d; ISNI 0000 0000 9950 5666)
6
Mazaya M, Kwon YK. In Silico Pleiotropy Analysis in KEGG Signaling Networks Using a Boolean Network Model. Biomolecules 2022; 12:biom12081139. [PMID: 36009032; PMCID: PMC9406064; DOI: 10.3390/biom12081139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2022] [Revised: 08/10/2022] [Accepted: 08/15/2022] [Indexed: 11/16/2022]
Abstract
Pleiotropy, which refers to the ability of different mutations on the same gene to cause different pathological effects in human genetic diseases, is important in understanding system-level biological diseases. Although some biological experiments have been proposed, little is still known about pleiotropy in gene–gene dynamics, since most previous studies have been based on correlation analysis. Therefore, a new perspective is needed to investigate pleiotropy in terms of gene–gene dynamical characteristics. To quantify pleiotropy in terms of network dynamics, we propose a measure called in silico Pleiotropic Scores (sPS), which represents how much a gene is affected by a pair of different types of mutations on a Boolean network model. We found that our model can identify more candidate pleiotropic genes that are not known to be pleiotropic than the experimental database. In addition, we found that many types of functionally important genes tend to have higher sPS values than other genes; in other words, they are more pleiotropic. We investigated the relations of sPS with the structural properties in the signaling network and found that there are highly positive relations to degree, feedback loops, and centrality measures. This implies that the structural characteristics are principles to identify new pleiotropic genes. Finally, we found some biological evidence showing that sPS analysis is relevant to the real pleiotropic data and can be considered a novel candidate for pleiotropic gene research. Taken together, our results can be used to understand the dynamic pleiotropic characteristics in complex biological systems in terms of gene–phenotype relations.
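To make the idea of comparing two mutation types on a Boolean network concrete, the sketch below simulates a toy three-gene network under a knockout (gene pinned to False) and an overexpression (gene pinned to True), then measures how often the remaining genes end up in different states. The update rules, the function `divergence_score`, and the scoring formula are hypothetical illustrations; they are not the sPS definition from the paper and the network is not taken from KEGG.

```python
from itertools import product

# Toy 3-gene Boolean network with made-up rules.
RULES = {
    "A": lambda s: s["A"],                 # input node, keeps its state
    "B": lambda s: s["A"] or s["B"],       # latches on once A is active
    "C": lambda s: s["B"] and not s["A"],  # activated by B, repressed by A
}


def step(state, pinned):
    """One synchronous update; pinned genes are held at their mutant value."""
    nxt = {gene: bool(rule(state)) for gene, rule in RULES.items()}
    nxt.update(pinned)
    return nxt


def run(initial, pinned, n_steps=10):
    """Iterate the network; this toy network settles to a fixed point quickly."""
    state = dict(initial)
    state.update(pinned)
    for _ in range(n_steps):
        state = step(state, pinned)
    return state


def divergence_score(gene):
    """Fraction of (initial state, other gene) pairs whose final value differs
    between knockout and overexpression of `gene`."""
    genes = list(RULES)
    others = [g for g in genes if g != gene]
    differing = 0
    total = 0
    for values in product([False, True], repeat=len(genes)):
        init = dict(zip(genes, values))
        knocked_out = run(init, {gene: False})
        overexpressed = run(init, {gene: True})
        differing += sum(knocked_out[g] != overexpressed[g] for g in others)
        total += len(others)
    return differing / total


scores = {gene: divergence_score(gene) for gene in RULES}
print(scores)
```

Comparing states after a fixed number of steps is only valid here because the toy dynamics reach fixed points; a network with cyclic attractors would need explicit attractor detection instead.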
Affiliation(s)
- Maulida Mazaya
  - Research Center for Computing, National Research and Innovation Agency (BRIN), Cibinong Science Center, Jl. Raya Jakarta-Bogor KM 46, Cibinong 16911, West Java, Indonesia
- Yung-Keun Kwon
  - School of IT Convergence, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan 44610, Korea
7
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949; DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, many computational methods and tools/platforms have been developed to reveal intrinsic relationships between diseases, which can provide useful insights into the study of complex diseases, e.g. understanding the molecular mechanisms of diseases and discovering new treatments. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data, from phenotype to genotype, can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups according to how they use biomedical data: disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for the computation and analysis of disease-disease associations. Finally, we discuss and summarize the research on disease-disease associations. This review provides a systematic overview of current disease association research, which could promote the development and application of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
8
Fernandez-de-Cossio J, Fernandez-de-Cossio-Diaz J, Perera-Negrin Y. A self-consistent probabilistic formulation for inference of interactions. Sci Rep 2020; 10:21435. [PMID: 33293622; PMCID: PMC7722874; DOI: 10.1038/s41598-020-78496-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2020] [Accepted: 11/26/2020] [Indexed: 11/25/2022]
Abstract
Large molecular interaction networks are nowadays assembled in biomedical research, aided by important technological advances. Diverse interaction measures, whose input consists solely of the incidence of causal factors together with the corresponding outcome of the effect under study, are formulated without an obvious mathematical unity. Consequently, conceptual and practical ambivalences arise. We identify here a probabilistic requirement consistent with that input, and find, by the rules of probability theory, that it leads to a model multiplicative in the complement of the effect. These theoretical derivations also reveal important practical properties that have not been noticed before.
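One familiar model that is "multiplicative in the complement of the effect" is the noisy-OR form, in which the probability that the effect does not occur is the product of per-cause factors. The sketch below shows that form under the assumption that each present cause contributes an independent factor; the function name, the parameterization, and the numbers are illustrative, and whether this matches the paper's exact formulation is not something the abstract confirms.

```python
from math import prod


def effect_probability(cause_strengths, present):
    """P(effect) when the complement factorizes over the causes present:
    P(no effect) = product over present causes of (1 - strength)."""
    return 1.0 - prod(1.0 - cause_strengths[c] for c in present)


# Made-up strengths for two hypothetical causal factors.
strengths = {"x": 0.5, "y": 0.2}
p_none = effect_probability(strengths, [])          # no causes present
p_x = effect_probability(strengths, ["x"])          # only x present
p_xy = effect_probability(strengths, ["x", "y"])    # 1 - 0.5 * 0.8
print(p_none, p_x, p_xy)
```

With no causes present the empty product is 1, so the effect probability is 0; adding a cause can only lower the complement, so the effect probability never decreases.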
Affiliation(s)
- Jorge Fernandez-de-Cossio
  - Bioinformatics Department, Center for Genetic Engineering and Biotechnology (CIGB), PO Box 6162, CP10600, Havana, Cuba
- Yasser Perera-Negrin
  - Molecular Oncology Group, Pharmaceutical Division, Center for Genetic Engineering and Biotechnology (CIGB), PO Box 6162, CP10600, Havana, Cuba
9
Mi Z, Guo B, Yang X, Yin Z, Zheng Z. LAMP: disease classification derived from layered assessment on modules and pathways in the human gene network. BMC Bioinformatics 2020; 21:487. [PMID: 33126852; PMCID: PMC7597061; DOI: 10.1186/s12859-020-03800-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2020] [Accepted: 10/05/2020] [Indexed: 11/10/2022]
Abstract
Background: Classification of diseases based on genetic information is of great significance as the basis for precision medicine, increasing the understanding of disease etiology and revolutionizing personalized medicine. Much effort has been directed at understanding disease associations by constructing disease networks and classifying patient samples according to gene expression data. Integrating human gene networks overcomes the limited coverage of genes. Incorporating pathway information into the disease classification procedure addresses the challenge of cellular heterogeneity across patients.
Results: In this work, we propose a disease classification model, LAMP, which concentrates on the layered assessment on modules and pathways. Directed human gene interactions are the foundation for constructing the human gene network, in which the significant roles of disease and pathway genes are recognized. The fast unfolding algorithm identifies 11 modules in the largest connected component. Layered networks are then introduced to distinguish the positions of genes in propagating information from sources to targets. After gene screening, hierarchical clustering and a refinement process, 1726 diseases from KEGG are classified into 18 categories. It is also shown that diseases with overlapping genes may not belong to the same category in LAMP. Within each category, entropy is applied to measure the compositional complexity and to evaluate the prospects for combination diagnosis and gene-targeted therapy for diseases.
Conclusion: In this work, by collecting data from BioGRID and KEGG, we develop a disease classification model, LAMP, to help people view diseases from the perspective of commonalities in etiology and pathology. Comprehensive research on existing diseases can help meet the challenges of unknown diseases. The results provide suggestions for combination diagnosis and gene-targeted therapy, which motivates clinicians and researchers to reposition the understanding of diseases and explore diagnosis and therapy strategies.
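The entropy step mentioned in the Results can be illustrated with a plain Shannon entropy over the composition of a category. In the sketch below, each disease in a category is tagged with a label (here a hypothetical module name), and the entropy of the label distribution serves as the complexity measure. What LAMP actually takes the entropy over is not stated in the abstract, so the labels and the function `shannon_entropy` are assumptions for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical categories: one spread evenly over four modules, one homogeneous.
mixed_category = ["m1", "m2", "m3", "m4"]
pure_category = ["m1", "m1", "m1", "m1"]
print(shannon_entropy(mixed_category), shannon_entropy(pure_category))
```

A category whose members are spread evenly over many labels scores high, while a homogeneous category scores zero, which is the sense in which entropy captures compositional complexity.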
Affiliation(s)
- Zhilong Mi
  - Beijing Advanced Innovation Center for Big Data and Brain Computing and LMIB, Beihang University, Beijing, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong Province, China
  - School of Mathematical Sciences and Shenyuan Honors College, Beihang University, Beijing, China
- Binghui Guo
  - Beijing Advanced Innovation Center for Big Data and Brain Computing and LMIB, Beihang University, Beijing, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong Province, China
  - School of Mathematical Sciences and Shenyuan Honors College, Beihang University, Beijing, China
- Xiaobo Yang
  - Beijing Advanced Innovation Center for Big Data and Brain Computing and LMIB, Beihang University, Beijing, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong Province, China
  - School of Mathematical Sciences and Shenyuan Honors College, Beihang University, Beijing, China
- Ziqiao Yin
  - Beijing Advanced Innovation Center for Big Data and Brain Computing and LMIB, Beihang University, Beijing, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong Province, China
  - School of Mathematical Sciences and Shenyuan Honors College, Beihang University, Beijing, China
- Zhiming Zheng
  - Beijing Advanced Innovation Center for Big Data and Brain Computing and LMIB, Beihang University, Beijing, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong Province, China
  - School of Mathematical Sciences and Shenyuan Honors College, Beihang University, Beijing, China
10
Zelenova MA, Yurov YB, Vorsanova SG, Iourov IY. Laundering CNV data for candidate process prioritization in brain disorders. Mol Cytogenet 2019; 12:54. [PMID: 31890034; PMCID: PMC6933640; DOI: 10.1186/s13039-019-0468-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/14/2019] [Accepted: 12/17/2019] [Indexed: 01/29/2023]
Abstract
Background: Prioritization of genomic data has become a useful tool for uncovering the phenotypic effect of genetic variations (e.g. copy number variations, or CNV) and disease mechanisms. Owing to their complexity, brain disorders represent a major focus of genomic research aimed at revealing the pathologic significance of genomic changes leading to brain dysfunction. Here, we propose a “CNV data laundering” algorithm based on filtering and prioritizing genomic pathways retrieved from available databases for uncovering altered molecular pathways in brain disorders. The algorithm comprises seven consecutive steps of processing individual CNV data sets. First, the data are compared to in-house and web databases to discriminate recurrent non-pathogenic variants. Second, the CNV pool is confined to the genes predominantly expressed in the brain. Third, intergenic interactions are used for filtering causative CNV. Fourth, a network of interconnected elements specific for an individual genome variation set is created. Fifth, ontologic data (pathways/functions) are attributed to clusters of network elements. Sixth, the pathways are prioritized according to the significance of elements affected by CNV. Seventh, prioritized pathways are clustered according to the ontologies.
Results: The algorithm was applied to 191 CNV data sets obtained from children with brain disorders (intellectual disability and autism spectrum disorders) by SNP array molecular karyotyping. “CNV data laundering” has identified 13 pathway clusters (39 processes/475 genes) implicated in the phenotypic manifestations.
Conclusions: By elucidating altered molecular pathways in brain disorders, the algorithm may be used for uncovering disease mechanisms and genotype-phenotype correlations. These opportunities are much needed for developing therapeutic strategies in devastating neuropsychiatric diseases.
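A heavily simplified sketch of three of the seven steps (the first, second and sixth) is given below: genes hit only by known benign variants are dropped, the remainder is restricted to brain-expressed genes, and pathways are ranked by how many retained genes they contain. The function `prioritize_pathways`, the use of a simple gene count as the "significance" measure, and all gene and pathway names are placeholders invented for illustration; the abstract does not specify the actual databases, thresholds or scoring.

```python
from collections import Counter


def prioritize_pathways(cnv_genes, benign_genes, brain_genes, gene_to_pathways):
    """Filter CNV-affected genes, then rank pathways by retained gene count."""
    # Step 1 (simplified): discard genes known only from recurrent benign variants.
    candidates = [g for g in cnv_genes if g not in benign_genes]
    # Step 2 (simplified): keep genes flagged as predominantly brain-expressed.
    candidates = [g for g in candidates if g in brain_genes]
    # Step 6 (simplified): count retained genes per pathway as a stand-in
    # for the significance-based prioritization described in the abstract.
    hits = Counter(p for g in candidates for p in gene_to_pathways.get(g, ()))
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder inputs; none of these names refer to real genes or pathways.
ranked = prioritize_pathways(
    cnv_genes=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    benign_genes={"GENE_D"},
    brain_genes={"GENE_A", "GENE_B", "GENE_D"},
    gene_to_pathways={
        "GENE_A": ["pathway_1", "pathway_2"],
        "GENE_B": ["pathway_1"],
        "GENE_D": ["pathway_3"],
    },
)
print(ranked)
```

The interaction-based filtering, network construction, ontology attribution and final clustering steps are omitted here because they depend on external resources the abstract does not name.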
Affiliation(s)
- Maria A Zelenova
  - Mental Health Research Center, Moscow 115522, Russia
  - Academician Yu.E. Veltishchev Research Clinical Institute of Pediatrics, N.I. Pirogov Russian National Research Medical University, Ministry of Health of the Russian Federation, Moscow 125635, Russia
- Yuri B Yurov
  - Mental Health Research Center, Moscow 115522, Russia
  - Academician Yu.E. Veltishchev Research Clinical Institute of Pediatrics, N.I. Pirogov Russian National Research Medical University, Ministry of Health of the Russian Federation, Moscow 125635, Russia
- Svetlana G Vorsanova
  - Mental Health Research Center, Moscow 115522, Russia
  - Academician Yu.E. Veltishchev Research Clinical Institute of Pediatrics, N.I. Pirogov Russian National Research Medical University, Ministry of Health of the Russian Federation, Moscow 125635, Russia
- Ivan Y Iourov
  - Mental Health Research Center, Moscow 115522, Russia
  - Academician Yu.E. Veltishchev Research Clinical Institute of Pediatrics, N.I. Pirogov Russian National Research Medical University, Ministry of Health of the Russian Federation, Moscow 125635, Russia