1
Mukherjee A, Abraham S, Singh A, Balaji S, Mukunthan KS. From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies. Mol Biotechnol 2024:10.1007/s12033-024-01133-6. [PMID: 38565775 DOI: 10.1007/s12033-024-01133-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In the dynamic landscape of targeted therapeutics, drug discovery has pivoted towards understanding underlying disease mechanisms, placing a strong emphasis on molecular perturbations and target identification. This paradigm shift, crucial for drug discovery, is underpinned by big data, a transformative force in the current era. Omics data, characterized by its heterogeneity and enormity, has ushered biological and biomedical research into the big data domain. Acknowledging the significance of integrating diverse omics data strata, known as multi-omics studies, researchers delve into the intricate interrelationships among various omics layers. This review navigates the expansive omics landscape, showcasing tailored assays for each molecular layer through genomes to metabolomes. The sheer volume of data generated necessitates sophisticated informatics techniques, with machine-learning (ML) algorithms emerging as robust tools. These datasets not only refine disease classification but also enhance diagnostics and foster the development of targeted therapeutic strategies. Through the integration of high-throughput data, the review focuses on targeting and modeling multiple disease-regulated networks, validating interactions with multiple targets, and enhancing therapeutic potential using network pharmacology approaches. Ultimately, this exploration aims to illuminate the transformative impact of multi-omics in the big data era, shaping the future of biological research.
Affiliation(s)
- Arnab Mukherjee
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Suzanna Abraham
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Akshita Singh
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- S Balaji
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- K S Mukunthan
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
2
Zou M, Li H, Su D, Xiong Y, Wei H, Wang S, Sun H, Wang T, Xi Q, Zuo Y, Yang L. Integrating somatic mutation profiles with structural deep clustering network for metabolic stratification in pancreatic cancer: a comprehensive analysis of prognostic and genomic landscapes. Brief Bioinform 2023; 25:bbad430. [PMID: 38040491 PMCID: PMC10783866 DOI: 10.1093/bib/bbad430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/29/2023] [Accepted: 11/05/2023] [Indexed: 12/03/2023] Open
Abstract
Pancreatic cancer is a globally recognized highly aggressive malignancy, posing a significant threat to human health and characterized by pronounced heterogeneity. In recent years, researchers have uncovered that the development and progression of cancer are often attributed to the accumulation of somatic mutations within cells. However, cancer somatic mutation data exhibit characteristics such as high dimensionality and sparsity, which pose new challenges in utilizing these data effectively. In this study, we propagated the discrete somatic mutation data of pancreatic cancer through a network propagation model based on protein-protein interaction networks. This resulted in smoothed somatic mutation profile data that incorporate protein network information. Based on this smoothed mutation profile data, we obtained the activity levels of different metabolic pathways in pancreatic cancer patients. Subsequently, using the activity levels of various metabolic pathways in cancer patients, we employed a deep clustering algorithm to establish biologically and clinically relevant metabolic subtypes of pancreatic cancer. Our study holds scientific significance in classifying pancreatic cancer based on somatic mutation data and may provide a crucial theoretical basis for the diagnosis and immunotherapy of pancreatic cancer patients.
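The propagation step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the propagation model, so this assumes a random-walk-with-restart style update, F <- alpha * F * W + (1 - alpha) * F0, on a row-normalized adjacency matrix; the function name `propagate`, the value of `alpha`, and the four-gene toy network are all hypothetical and are not taken from the paper.

```python
import numpy as np

def propagate(mutations, adjacency, alpha=0.7, tol=1e-6, max_iter=1000):
    """Smooth a binary patient-by-gene mutation matrix over a gene network.

    Iterates F <- alpha * F @ W + (1 - alpha) * F0 until the largest change
    falls below tol, where W is the adjacency matrix with rows scaled to sum to 1.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0  # isolated genes pass no signal to neighbours
    w = adjacency / degree
    f0 = np.asarray(mutations, dtype=float)
    f = f0.copy()
    for _ in range(max_iter):
        f_next = alpha * f @ w + (1 - alpha) * f0
        if np.abs(f_next - f).max() < tol:
            return f_next
        f = f_next
    return f

# Hypothetical toy network: four genes in a chain 0-1-2-3,
# and one patient with a single mutation in gene 0.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
mutations = np.array([[1, 0, 0, 0]])
smoothed = propagate(mutations, adjacency)
print(smoothed.round(3))
```

In this toy case the sparse 0/1 profile becomes a dense profile in which unmutated genes receive a score that decays with network distance from the mutated gene, which is the sense in which the smoothed profile "incorporates protein network information".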
Affiliation(s)
- Min Zou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Honghao Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yuqiang Xiong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Haodong Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hongmei Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Tao Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qilemuge Xi
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
  - Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd. Hohhot 010010, China
  - Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
3
Wang C, Zhang H, Ma H, Wang Y, Cai K, Guo T, Yang Y, Li Z, Zhu Y. Inference of pan-cancer related genes by orthologs matching based on enhanced LSTM model. Front Microbiol 2022; 13:963704. [PMID: 36267181 PMCID: PMC9577021 DOI: 10.3389/fmicb.2022.963704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022] Open
Abstract
Many disease-related genes have been found to be associated with cancer diagnosis, which is useful for understanding the pathophysiology of cancer, generating targeted drugs, and developing new diagnostic and treatment techniques. With the development of the pan-cancer project and the ongoing expansion of sequencing technology, many scientists are focusing on mining common genes from The Cancer Genome Atlas (TCGA) across various cancer types. In this study, motivated by the benefits of reverse genetics, we attempted to infer pan-cancer associated genes by homology matching against the microbial model organism Saccharomyces cerevisiae (yeast). First, a background network of protein-protein interactions and a pathogenic gene set involving several cancer types in humans and yeast were created. Homology between human and yeast genes was then identified by homology matching, and the corresponding interaction sub-network was obtained. This followed the principle that homologous genes derived from a common ancestor may have similar expression. Then, using bidirectional long short-term memory (BiLSTM) in combination with adaptive integration of heterogeneous information, we further explored the topological characteristics of the yeast protein interaction network and presented a node representation score to evaluate the importance of nodes in the graph. Finally, the important yeast genes identified by ensemble classifiers were mapped back to their human homologs, which may be regarded as genes connected to all types of cancer. The performance of the BiLSTM model can be assessed through experiments on the database, and the biological importance of the prediction results can be confirmed by enrichment analysis, survival analysis, and other outcomes. The complete experimental protocols and programs are available at https://github.com/zhuyuan-cug/AI-BiLSTM/tree/master.
Affiliation(s)
- Chao Wang
  - Department of Surgery, Hepatic Surgery Center, Institute of Hepato-Pancreato-Biliary Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Houwang Zhang
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Haishu Ma
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
- Yawen Wang
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Ke Cai
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
- Tingrui Guo
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
- Yuanhang Yang
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Zhen Li
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Yuan Zhu
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Shanghai, China
  - *Correspondence: Yuan Zhu
4
Cheong JH, Wang SC, Park S, Porembka MR, Christie AL, Kim H, Kim HS, Zhu H, Hyung WJ, Noh SH, Hu B, Hong C, Karalis JD, Kim IH, Lee SH, Hwang TH. Development and validation of a prognostic and predictive 32-gene signature for gastric cancer. Nat Commun 2022; 13:774. [PMID: 35140202 PMCID: PMC8828873 DOI: 10.1038/s41467-022-28437-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 05/12/2021] [Accepted: 01/21/2022] [Indexed: 12/12/2022] Open
Abstract
Genomic profiling can provide prognostic and predictive information to guide clinical care. Biomarkers that reliably predict patient response to chemotherapy and immune checkpoint inhibition in gastric cancer are lacking. In this retrospective analysis, we use our machine learning algorithm NTriPath to identify a gastric-cancer specific 32-gene signature. Using unsupervised clustering on expression levels of these 32 genes in tumors from 567 patients, we identify four molecular subtypes that are prognostic for survival. We then built a support vector machine with linear kernel to generate a risk score that is prognostic for five-year overall survival and validate the risk score using three independent datasets. We also find that the molecular subtypes predict response to adjuvant 5-fluorouracil and platinum therapy after gastrectomy and to immune checkpoint inhibitors in patients with metastatic or recurrent disease. In sum, we show that the 32-gene signature is a promising prognostic and predictive biomarker to guide the clinical care of gastric cancer patients and should be validated using large patient cohorts in a prospective manner. The ability to predict the survival and response to treatment of cancer patients may improve patient care. Here, the authors generate a 32 gene signature that can predict the survival and response to treatment in gastric cancer patients.
Affiliation(s)
- Jae-Ho Cheong
  - Department of Surgery, Yonsei University College of Medicine, Seoul, South Korea
  - Department of Biochemistry and Molecular Biology, Yonsei University College of Medicine, Seoul, South Korea
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
- Sam C Wang
  - Division of Surgical Oncology, Department of Surgery, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Sunho Park
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
- Matthew R Porembka
  - Division of Surgical Oncology, Department of Surgery, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Alana L Christie
  - Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Hyunki Kim
  - Department of Pathology, Yonsei University College of Medicine, Seoul, South Korea
- Hyo Song Kim
  - Department of Internal Medicine, Division of Medical Oncology, Yonsei University College of Medicine, Seoul, South Korea
- Hong Zhu
  - Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Woo Jin Hyung
  - Department of Surgery, Yonsei University College of Medicine, Seoul, South Korea
- Sung Hoon Noh
  - Department of Surgery, Yonsei University College of Medicine, Seoul, South Korea
- Bo Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Changjin Hong
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
- John D Karalis
  - Division of Surgical Oncology, Department of Surgery, University of Texas Southwestern Medical Center, Dallas, TX, USA
- In-Ho Kim
  - Department of Internal Medicine, Division of Medical Oncology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
- Sung Hak Lee
  - Department of Hospital Pathology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
- Tae Hyun Hwang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
  - Department of Immunology, Mayo Clinic, Jacksonville, FL, USA
5
Lim H, Xie L. A New Weighted Imputed Neighborhood-Regularized Tri-Factorization One-Class Collaborative Filtering Algorithm: Application to Target Gene Prediction of Transcription Factors. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:126-137. [PMID: 31995498 PMCID: PMC7382975 DOI: 10.1109/tcbb.2020.2968442] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Identifying target genes of transcription factors (TFs) is crucial to understand transcriptional regulation. However, our understanding of genome-wide TF targeting profile is limited due to the cost of large-scale experiments and intrinsic complexity of gene regulation. Thus, computational prediction methods are useful to predict unobserved TF-gene associations. Here, we develop a new Weighted Imputed Neighborhood-regularized Tri-Factorization one-class collaborative filtering algorithm, WINTF. It predicts unobserved target genes for TFs using known but noisy, incomplete, and biased TF-gene associations and protein-protein interaction networks. Our benchmark study shows that WINTF significantly outperforms its counterpart matrix factorization-based algorithms and tri-factorization methods that do not include weight, imputation, and neighbor-regularization, for TF-gene association prediction. When evaluated by independent datasets, accuracy is 37.8 percent on the top 495 predicted associations, an enrichment factor of 4.19 compared with random guess. Furthermore, many predicted novel associations are supported by literature evidence. Although we only use canonical TF-gene interaction data, WINTF can directly be applied to tissue-specific data when available. Thus, WINTF provides a potentially useful framework to integrate multiple omics data for further improvement of TF-gene prediction and applications to other sparse and noisy biological data. The benchmark dataset and source code are freely available at https://github.com/XieResearchGroup/WINTF.
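The reported figures can be sanity-checked with a little arithmetic. This sketch assumes the enrichment factor is defined as the observed precision divided by the precision expected from random guessing; the abstract does not state the definition, so the implied baseline below is an inference rather than a reported result, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
precision_at_top = 0.378   # accuracy on the top-ranked predictions
top_n = 495                # number of top-ranked predictions evaluated
enrichment_factor = 4.19   # reported improvement over random guessing

# Approximate number of top-ranked predictions supported by independent data.
supported = round(precision_at_top * top_n)

# Assumption: enrichment factor = precision / random-guess precision,
# so the implied random-guess precision is precision / enrichment factor.
implied_baseline = precision_at_top / enrichment_factor

print(supported)                   # about 187 of 495 predictions
print(round(implied_baseline, 4))  # roughly a 9% hit rate for random guessing
```

Under that assumed definition, roughly 187 of the 495 top predictions were supported, against an expectation of about 9% for random guessing.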
6
Abstract
A key goal of cancer systems biology is to use big data to elucidate the molecular networks by which cancer develops. However, to date there has been no systematic evaluation of how far these efforts have progressed. In this Analysis, we survey six major systems biology approaches for mapping and modelling cancer pathways with attention to how well their resulting network maps cover and enhance current knowledge. Our sample of 2,070 systems biology maps captures all literature-curated cancer pathways with significant enrichment, although the strong tendency is for these maps to recover isolated mechanisms rather than entire integrated processes. Systems biology maps also identify previously underappreciated functions, such as a potential role for human papillomavirus-induced chromosomal alterations in ovarian tumorigenesis, and they add new genes to known cancer pathways, such as those related to metabolism, Hippo signalling and immunity. Notably, we find that many cancer networks have been provided only in journal figures and not for programmatic access, underscoring the need to deposit network maps in community databases to ensure they can be readily accessed. Finally, few of these findings have yet been clinically translated, leaving ample opportunity for future translational studies. Periodic surveys of cancer pathway maps, such as the one reported here, are critical to assess progress in the field and identify underserved areas of methodology and cancer biology.
Affiliation(s)
- Brent M Kuenzi
  - Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
7
Zolotovskaia M, Sorokin M, Garazha A, Borisov N, Buzdin A. Molecular Pathway Analysis of Mutation Data for Biomarkers Discovery and Scoring of Target Cancer Drugs. Methods Mol Biol 2020; 2063:207-234. [PMID: 31667773 DOI: 10.1007/978-1-0716-0138-9_16] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/10/2023]
Abstract
DNA mutations govern cancer development. Cancer mutation profiles vary dramatically among individuals. In some cases, they may serve as predictors of disease progression and response to therapies. However, the biomarker potential of cancer mutations can be dramatically enhanced (by several orders of magnitude) by applying a molecular pathway-based approach. We developed the Oncobox system for calculating pathway instability (PI) values for molecular pathways; PI values are aggregated mutation frequencies of the pathway members, normalized by gene length and by the number of genes in the pathway. PI scores can be effective biomarkers in different types of comparisons, for example, as cancer type biomarkers and as predictors of tumor response to target therapies. The latter option is implemented using mutation drug score (MDS) values, which algorithmically rank drugs by their capacity to interfere with the mutated molecular pathways. Here, we describe the mathematical basis and algorithms for the calculation, validation and implementation of PI and MDS values. An example analysis is provided encompassing 5956 human tumor mutation profiles of 15 cancer types from The Cancer Genome Atlas (TCGA) project, which together comprise 2,316,670 mutations in 19,872 genes and 1748 molecular pathways, thus enabling the ranking of 128 clinically approved target drugs. Our results indicate that the Oncobox PI and MDS approaches are highly useful for basic and applied aspects of molecular oncology and pharmacology research.
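The PI definition in this abstract (aggregated mutation frequencies of pathway members, normalized by gene length and by pathway size) can be sketched as follows. This is one plausible reading of the abstract, not the Oncobox implementation: it assumes PI is the mean, over pathway genes, of mutation count divided by gene length. The function name, gene identifiers, counts, and lengths are all hypothetical.

```python
def pathway_instability(mutation_counts, gene_lengths, pathway_genes):
    """Mean length-normalized mutation rate over the genes of one pathway.

    mutation_counts: dict mapping gene -> number of observed mutations
    gene_lengths:    dict mapping gene -> gene length (e.g. in base pairs)
    pathway_genes:   iterable of gene identifiers in the pathway
    Genes with no recorded length are skipped (an assumption of this sketch).
    """
    rates = [
        mutation_counts.get(gene, 0) / gene_lengths[gene]
        for gene in pathway_genes
        if gene in gene_lengths
    ]
    if not rates:
        return 0.0
    return sum(rates) / len(rates)

# Hypothetical toy data, made up purely for illustration.
counts = {"GENE_A": 2, "GENE_C": 3}
lengths = {"GENE_A": 1000, "GENE_B": 2000, "GENE_C": 1500}
pathway = ["GENE_A", "GENE_B", "GENE_C"]

pi_value = pathway_instability(counts, lengths, pathway)
print(pi_value)
```

Dividing by gene length keeps long genes from dominating the score simply because they offer more positions to mutate, and dividing by the number of genes makes pathways of different sizes comparable.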
Affiliation(s)
- Marianna Zolotovskaia
  - Omicsway Corp., Walnut, CA, USA
  - Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
- Maxim Sorokin
  - Omicsway Corp., Walnut, CA, USA
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Nikolay Borisov
  - Omicsway Corp., Walnut, CA, USA
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Anton Buzdin
  - Omicsway Corp., Walnut, CA, USA
  - Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
8
Resende TP, Marshall Lund L, Rossow S, Vannucci FA. Next-Generation Sequencing Coupled With in situ Hybridization: A Novel Diagnostic Platform to Investigate Swine Emerging Pathogens and New Variants of Endemic Viruses. Front Vet Sci 2019; 6:403. [PMID: 31803766 PMCID: PMC6873589 DOI: 10.3389/fvets.2019.00403] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/08/2018] [Accepted: 10/28/2019] [Indexed: 01/07/2023] Open
Abstract
Next generation sequencing (NGS) can be applied to identify and characterize the entire set of microbes within a sample. However, this platform does not provide a morphological context or a specific association between the viral or bacterial sequences detected and the histological lesions. This limitation has generated uncertainty about whether the sequences identified by NGS actually contribute to the clinical outcome. Although in situ hybridization (ISH) and immunohistochemistry (IHC) can both be used to detect pathogens in tissue samples, only ISH has the advantage of being rapidly developed in the context of an emerging disease, especially because it does not require the development of specific primary antibodies against the target pathogen. Based on the sequence information provided by NGS, ISH can check for the presence of a given pathogen within histological lesions by targeting its specific messenger RNA, helping to establish the relationship between the pathogen and the clinical outcome. In this mini review we have compiled results of the application of NGS-ISH to the investigation of challenging diagnostic cases or emerging pathogens in pigs, which resulted in the detection of porcine circovirus type 3, porcine parvovirus type 2, Senecavirus A, and Mycoplasma hyorhinis.
Affiliation(s)
- Talita P Resende
  - Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, United States
- Lacey Marshall Lund
  - Veterinary Diagnostic Laboratory, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, United States
- Stephanie Rossow
  - Veterinary Diagnostic Laboratory, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, United States
- Fabio A Vannucci
  - Veterinary Diagnostic Laboratory, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, United States
9
Čopar A, Zupan B, Zitnik M. Fast optimization of non-negative matrix tri-factorization. PLoS One 2019; 14:e0217994. [PMID: 31185054 PMCID: PMC6559648 DOI: 10.1371/journal.pone.0217994] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/22/2019] [Accepted: 05/22/2019] [Indexed: 11/18/2022] Open
Abstract
Non-negative matrix tri-factorization (NMTF) is a popular technique for learning low-dimensional feature representation of relational data. Currently, NMTF learns a representation of a dataset through an optimization procedure that typically uses multiplicative update rules. This procedure has had limited success, and its failure cases have not been well understood. We here perform an empirical study involving six large datasets comparing multiplicative update rules with three alternative optimization methods, including alternating least squares, projected gradients, and coordinate descent. We find that methods based on projected gradients and coordinate descent converge up to twenty-four times faster than multiplicative update rules. Furthermore, alternating least squares method can quickly train NMTF models on sparse datasets but often fails on dense datasets. Coordinate descent-based NMTF converges up to sixteen times faster compared to well-established methods.
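For orientation, the multiplicative update rules that this abstract treats as the baseline can be sketched as below. This is a minimal, unconstrained variant that minimizes the squared Frobenius error of X ≈ U S Vᵀ; it is not the authors' implementation, it omits any orthogonality constraints or regularization, and the function name `nmtf`, the rank choices, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def nmtf(x, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix x (n x m) as u @ s @ v.T.

    Uses plain multiplicative updates, which keep all factors non-negative
    because each step multiplies by a ratio of non-negative terms.
    Returns the factors and the reconstruction error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = x.shape
    u = rng.random((n, k1))
    s = rng.random((k1, k2))
    v = rng.random((m, k2))
    errors = []
    for _ in range(n_iter):
        # Each update is (negative gradient part) / (positive gradient part).
        u *= (x @ v @ s.T) / (u @ s @ v.T @ v @ s.T + eps)
        v *= (x.T @ u @ s) / (v @ s.T @ u.T @ u @ s + eps)
        s *= (u.T @ x @ v) / (u.T @ u @ s @ v.T @ v + eps)
        errors.append(np.linalg.norm(x - u @ s @ v.T))
    return u, s, v, errors

# Synthetic non-negative data with an exact low-rank tri-factor structure.
rng = np.random.default_rng(1)
x = rng.random((20, 3)) @ rng.random((3, 2)) @ rng.random((2, 15))

u, s, v, errors = nmtf(x, k1=3, k2=2)
print(errors[0], errors[-1])
```

Tracking the error per iteration, as done here, is the kind of convergence measurement on which the abstract's comparison with alternating least squares, projected gradients, and coordinate descent is based.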
Affiliation(s)
- Andrej Čopar
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Blaž Zupan
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States of America
- Marinka Zitnik
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
  - Department of Computer Science, Stanford University, Stanford, CA, United States of America
10
Lim H, Xie L. Target Gene Prediction of Transcription Factor Using a New Neighborhood-regularized Tri-factorization One-class Collaborative Filtering Algorithm. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2019; 2018:1-10. [PMID: 31061989 DOI: 10.1145/3233547.3233551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Identifying the target genes of transcription factors (TFs) is one of the key factors to understand transcriptional regulation. However, our understanding of genome-wide TF targeting profile is limited due to the cost of large scale experiments and intrinsic complexity. Thus, computational prediction methods are useful to predict the unobserved associations. Here, we developed a new one-class collaborative filtering algorithm tREMAP that is based on regularized, weighted nonnegative matrix tri-factorization. The algorithm predicts unobserved target genes for TFs using known gene-TF associations and protein-protein interaction network. Our benchmark study shows that tREMAP significantly outperforms its counterpart REMAP, a bi-factorization-based algorithm, for transcription factor target gene prediction in all four performance metrics AUC, MAP, MPR, and HLU. When evaluated by independent data sets, the prediction accuracy is 37.8% on the top 495 predicted associations, an enrichment factor of 4.19 compared with the random guess. Furthermore, many of the predicted novel associations by tREMAP are supported by evidence from literature. Although we only use canonical TF-target gene interaction data in this study, tREMAP can be directly applied to tissue-specific data sets. tREMAP provides a framework to integrate multiple omics data for the further improvement of TF target gene prediction. Thus, tREMAP is a potentially useful tool in studying gene regulatory networks. The benchmark data set and the source code of tREMAP are freely available at https://github.com/hansaimlim/REMAP/tree/master/TriFacREMAP.
Affiliation(s)
- Hansaim Lim
  - PhD program in Biochemistry, Graduate Center of the City University of New York NY 10016 United States
- Lei Xie
  - Department of Computer Science, Hunter College and Graduate Center, the City University of New York NY 10065 United States
11
Zolotovskaia MA, Sorokin MI, Roumiantsev SA, Borisov NM, Buzdin AA. Pathway Instability Is an Effective New Mutation-Based Type of Cancer Biomarkers. Front Oncol 2019; 8:658. [PMID: 30662873 PMCID: PMC6328788 DOI: 10.3389/fonc.2018.00658] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/19/2018] [Accepted: 12/12/2018] [Indexed: 01/20/2023] Open
Abstract
DNA mutations play a crucial role in cancer development and progression. Mutation profiles vary dramatically between cancer types and between individual tumors. Mutations of several individual genes are known to be reliable cancer biomarkers, although the number of such genes is tiny and does not enable differential diagnostics for most cancers. We report here a technique that dramatically increases the efficiency of cancer biomarker development from DNA mutation data. It includes a quantitative metric termed Pathway instability (PI), based on the mutation enrichment of intracellular molecular pathways. This method was tested on 5,956 tumor mutation profiles of 15 cancer types from The Cancer Genome Atlas (TCGA) project. In total, we screened 2,316,670 mutations in 19,872 genes and 1,748 molecular pathways. Our results demonstrated a considerable advantage of pathway-based mutation biomarkers over individual gene mutation profiles, as reflected by more than two orders of magnitude greater numbers of high-quality [ROC area-under-curve (AUC)>0.75] biomarkers. For example, the number of such high-quality mutational biomarkers distinguishing between different cancer types was only six for the individual gene mutations, but 660 for the pathway-based biomarkers. These results indicate that the PI value can be used as a new generation of complex cancer biomarkers that significantly outperform the existing gene mutation biomarkers.
Affiliation(s)
- Marianna A Zolotovskaia
  - Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
  - Oncobox Ltd., Moscow, Russia
- Maxim I Sorokin
  - The Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Omicsway Corp., Walnut, CA, United States
- Sergey A Roumiantsev
  - Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
- Nikolay M Borisov
  - Oncobox Ltd., Moscow, Russia
  - The Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Anton A Buzdin
  - The Laboratory of Clinical Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  - Omicsway Corp., Walnut, CA, United States
  - The Laboratory of Systems Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
12
|
Xi J, Wang M, Li A. Discovering mutated driver genes through a robust and sparse co-regularized matrix factorization framework with prior information from mRNA expression patterns and interaction network. BMC Bioinformatics 2018; 19:214. [PMID: 29871594 PMCID: PMC5989443 DOI: 10.1186/s12859-018-2218-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/24/2017] [Accepted: 05/24/2018] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: Discovery of mutated driver genes is one of the primary objectives in studying tumorigenesis. To discover relatively infrequently mutated driver genes from somatic mutation data, many existing methods incorporate an interaction network as prior information. However, the prior information in mRNA expression patterns, which has also proven highly informative about cancer progression, is not exploited by these existing network-based methods. RESULTS: To incorporate prior information from both the interaction network and mRNA expression, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Furthermore, our framework also applies Frobenius norm regularization to overcome the overfitting issue. A sparsity-inducing penalty is employed to obtain sparse scores in gene representations, of which the top-scored genes are selected as driver candidates. Evaluation experiments with known benchmarking genes indicate that the performance of our method benefits from the two types of prior information. Our method also outperforms the existing network-based methods and detects some driver genes that are not predicted by the competing methods. CONCLUSIONS: In summary, our proposed method can improve the performance of driver gene discovery by effectively incorporating prior information from the interaction network and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.
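As a rough illustration of the factorization idea, the following pure-Python sketch fits a nonnegative matrix factorization X ≈ WH to a samples-by-genes mutation matrix using multiplicative updates, with an L1 penalty on H (sparse gene scores) and a Frobenius penalty on W. It deliberately omits the network and mRNA-expression co-regularization terms that are the paper's actual contribution, and all function names, parameter values, and the toy matrix are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def objective(x, w, h, l1, l2):
    """0.5*||X - WH||_F^2 + l1*sum(H) + 0.5*l2*||W||_F^2."""
    r = matmul(w, h)
    fit = sum((xi - ri) ** 2 for xr, rr in zip(x, r) for xi, ri in zip(xr, rr))
    return 0.5 * fit + l1 * sum(map(sum, h)) + 0.5 * l2 * sum(v * v for row in w for v in row)


def sparse_nmf(x, rank, l1=0.1, l2=0.1, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF; the penalties enter the denominators."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + l1 + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + l2 * w[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def gene_scores(h):
    """Score each gene (column of H) by its largest component loading."""
    return [max(col) for col in zip(*h)]


# Hypothetical toy mutation matrix: 4 samples x 5 genes, last gene never mutated.
X_TOY = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 1, 1, 0]]
```

In this simplified setting, genes would be ranked by `gene_scores(h)` and the top-scored ones taken as driver candidates, mirroring the selection step described in the abstract.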
Affiliation(s)
- Jianing Xi
  - School of Information Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230027 China
- Minghui Wang
  - School of Information Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230027 China
  - Centers for Biomedical Engineering, University of Science and Technology of China, Huangshan Road, Hefei, 230027 China
- Ao Li
  - School of Information Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230027 China
  - Centers for Biomedical Engineering, University of Science and Technology of China, Huangshan Road, Hefei, 230027 China
|
13
|
Xi J, Li A, Wang M. A novel unsupervised learning model for detecting driver genes from pan-cancer data through matrix tri-factorization framework with pairwise similarities constraints. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.03.026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/21/2023]
|
14
|
Parimbelli E, Marini S, Sacchi L, Bellazzi R. Patient similarity for precision medicine: A systematic review. J Biomed Inform 2018; 83:87-96. [PMID: 29864490 DOI: 10.1016/j.jbi.2018.06.001] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 01/31/2018] [Revised: 05/16/2018] [Accepted: 06/01/2018] [Indexed: 12/19/2022]
Abstract
Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is currently challenging the broad definitions of guideline-defined patient groups. Precision medicine leverages genetic, phenotypic, or psychosocial characteristics to provide precise identification of patient subsets for treatment targeting. Defining a patient similarity measure is thus an essential step to allow stratification of patients into clinically meaningful subgroups. The present review investigates the use of patient similarity as a tool to enable precision medicine. 279 articles were analyzed along four dimensions: data types considered, clinical domains of application, data analysis methods, and translational stage of findings. Cancer-related research employing molecular profiling and standard data analysis techniques such as clustering constitutes the majority of the retrieved studies. Chronic and psychiatric diseases follow as the second most represented clinical domains. Interestingly, almost one quarter of the studies analyzed presented a novel methodology, with the most advanced employing data integration strategies and being portable to different clinical domains. Integration of such techniques into decision support systems constitutes an interesting trend for future research.
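A patient similarity measure can be as simple as a vector similarity over per-patient features. The sketch below uses cosine similarity, which is one common choice and not one the review singles out. The cohort, feature vectors, and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query, cohort, k=2):
    """Return the ids of the k cohort patients most similar to the query vector."""
    ranked = sorted(cohort, key=lambda pid: cosine(query, cohort[pid]), reverse=True)
    return ranked[:k]


# Hypothetical cohort: patient id -> feature vector (e.g. scaled molecular markers).
COHORT = {
    "P1": [1.0, 0.0, 1.0, 0.0],
    "P2": [1.0, 0.0, 0.9, 0.1],
    "P3": [0.0, 1.0, 0.0, 1.0],
}
```

Patients returned by `most_similar` would form the neighborhood used for stratification; real studies typically combine several data types, so the single flat vector here is a simplification.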
Affiliation(s)
- E Parimbelli
  - Telfer School of Management, University of Ottawa, Ottawa, Canada
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- S Marini
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- L Sacchi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- R Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
  - RCCS ICS Maugeri, Pavia, Italy
|
15
|
Stanfield Z, Coşkun M, Koyutürk M. Drug Response Prediction as a Link Prediction Problem. Sci Rep 2017; 7:40321. [PMID: 28067293 PMCID: PMC5220354 DOI: 10.1038/srep40321] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 08/24/2016] [Accepted: 12/01/2016] [Indexed: 12/23/2022] Open
Abstract
Drug response prediction is a well-studied problem in which the molecular profile of a given sample is used to predict the effect of a given drug on that sample. Effective solutions to this problem hold the key for precision medicine. In cancer research, genomic data from cell lines are often utilized as features to develop machine learning models predictive of drug response. Molecular networks provide a functional context for the integration of genomic features, thereby resulting in robust and reproducible predictive models. However, inclusion of network data increases dimensionality and poses additional challenges for common machine learning tasks. To overcome these challenges, we here formulate drug response prediction as a link prediction problem. For this purpose, we represent drug response data for a large cohort of cell lines as a heterogeneous network. Using this network, we compute “network profiles” for cell lines and drugs. We then use the associations between these profiles to predict links between drugs and cell lines. Through leave-one-out cross validation and cross-classification on independent datasets, we show that this approach leads to accurate and reproducible classification of sensitive and resistant cell line-drug pairs, with 85% accuracy. We also examine the biological relevance of the network profiles.
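The link-prediction framing can be sketched on a toy network. The abstract does not say how the "network profiles" are computed, so the code below swaps in a plain random walk with restart (RWR) from a cell line and treats the visiting probability of each drug node as a link score. This is a stand-in, not the authors' method. The edge list, node names, and parameters are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency lists from (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def rwr(adj, seed, restart=0.3, iters=100):
    """Random walk with restart: visiting probabilities of all nodes from `seed`.

    Assumes every node has at least one neighbor, so probability mass is conserved.
    """
    p = {v: 1.0 if v == seed else 0.0 for v in adj}
    for _ in range(iters):
        nxt = {v: (restart if v == seed else 0.0) for v in adj}
        for u, neighbors in adj.items():
            share = (1 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical "sensitive" links between cell lines (C*) and drugs (D*).
EDGES = [("C1", "D1"), ("C2", "D1"), ("C2", "D2"), ("C3", "D2"), ("C3", "D3")]
ADJ = build_adjacency(EDGES)
PROFILE_C1 = rwr(ADJ, "C1")
```

Here `PROFILE_C1["D2"]` is larger than `PROFILE_C1["D3"]`, so under this toy scoring D2 would be the more likely unobserved sensitive link for cell line C1, because C1 shares a drug with a cell line that responds to D2.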
Affiliation(s)
- Zachary Stanfield
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, 44106, USA
- Mustafa Coşkun
  - Department of Electrical Engineering and Computer Science, Case School of Engineering, Case Western Reserve University, Cleveland, OH, 44106, USA
- Mehmet Koyutürk
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, 44106, USA
  - Department of Electrical Engineering and Computer Science, Case School of Engineering, Case Western Reserve University, Cleveland, OH, 44106, USA
|