1
Li G, Hu Z, Luo X, Liu J, Wu J, Peng W, Zhu X. Identification of cancer driver genes based on hierarchical weak consensus model. Health Inf Sci Syst 2024; 12:21. [PMID: 38464463 PMCID: PMC10917728 DOI: 10.1007/s13755-024-00279-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 01/31/2024] [Indexed: 03/12/2024] Open
Abstract
Cancer is a complex gene mutation disease that derives from the accumulation of mutations during somatic cell evolution. With the advent of high-throughput technology, a large amount of omics data has been generated, and finding cancer-related driver genes in these data is a challenge. Early on, researchers developed many frequency-based driver gene identification methods, but these could not reliably identify driver genes with low mutation rates. Later, researchers developed network-based methods that fuse multi-omics data, but these rarely considered the connections among features. In this paper, after analyzing a large number of methods for integrating multi-omics data, a hierarchical weak consensus model for fusing multiple features is proposed according to the connections among features. By analyzing the connection between the PPI network and the co-mutation hypergraph network, this paper first proposes a new topological feature, called the co-mutation clustering coefficient (CMCC). Then, a hierarchical weak consensus model is used to integrate CMCC with mRNA and miRNA differential expression scores, yielding a new driver gene identification method, HWC. HWC is compared with seven current state-of-the-art methods on three types of cancer. The comparison results show that HWC has the best identification performance in terms of statistical evaluation indices, functional consistency and the partial area under the ROC curve. Supplementary Information: The online version contains supplementary material available at 10.1007/s13755-024-00279-6.
Affiliation(s)
- Gaoshi Li
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Zhipeng Hu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xinlong Luo
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jiafei Liu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jingli Wu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
- Xiaoshu Zhu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
  - School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China
2
Liu J, Zhang C, Shan Z. Application of Artificial Intelligence in Orthodontics: Current State and Future Perspectives. Healthcare (Basel) 2023; 11:2760. [PMID: 37893833 PMCID: PMC10606213 DOI: 10.3390/healthcare11202760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 10/11/2023] [Accepted: 10/16/2023] [Indexed: 10/29/2023] Open
Abstract
In recent years, there has been a notable emergence of artificial intelligence (AI) as a transformative force in multiple domains, including orthodontics. This review aims to provide a comprehensive overview of the present state of AI applications in orthodontics, which can be categorized into the following domains: (1) diagnosis, including cephalometric analysis, dental analysis, facial analysis, skeletal-maturation-stage determination and upper-airway obstruction assessment; (2) treatment planning, including decision making for extractions and orthognathic surgery, and treatment outcome prediction; and (3) clinical practice, including practice guidance, remote care, and clinical documentation. We have witnessed a broadening of the application of AI in orthodontics, accompanied by advancements in its performance. Additionally, this review outlines the existing limitations within the field and offers future perspectives.
Affiliation(s)
- Junqi Liu
  - Division of Paediatric Dentistry and Orthodontics, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Chengfei Zhang
  - Division of Restorative Dental Sciences, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Zhiyi Shan
  - Division of Paediatric Dentistry and Orthodontics, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
3
DeepDA-Ace: A Novel Domain Adaptation Method for Species-Specific Acetylation Site Prediction. MATHEMATICS 2022. [DOI: 10.3390/math10142364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
Protein lysine acetylation is an important type of post-translational modification (PTM), and it plays a crucial role in various cellular processes. Although many researchers have recently focused on developing computational tools for acetylation site prediction, most of these tools rely on traditional machine learning algorithms without species specificity and are still maintained as a single prediction model. Recent studies have shown that the acetylation sites of distinct species have evident location-specific differences; however, there is currently no integrated prediction model that can effectively predict acetylation sites across all species. Therefore, it is necessary to establish a framework for species-specific acetylation site prediction. In this work, we propose a domain adaptation framework, DeepDA-Ace, for species-specific acetylation site prediction, covering Rattus norvegicus, Schistosoma japonicum, Arabidopsis thaliana, and other species. In DeepDA-Ace, an attention-based densely connected convolutional neural network is designed to capture sequence features, and a semantic adversarial learning strategy is proposed to align the features of different species so as to achieve knowledge transfer. DeepDA-Ace outperformed both the general prediction model and the fine-tuning-based species-specific model across most species. The experimental results demonstrate that DeepDA-Ace is superior to the general and fine-tuning methods, and its precision exceeds 0.75 on most species. In addition, our method achieves at least a 5% improvement over existing acetylation prediction tools.
4
Yuan X, Ma C, Zhao H, Yang L, Wang S, Xi J. STIC: Predicting Single Nucleotide Variants and Tumor Purity in Cancer Genome. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2692-2701. [PMID: 32086221 DOI: 10.1109/tcbb.2020.2975181] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Single nucleotide variants (SNVs) play an important role in cellular proliferation and tumorigenesis in various types of human cancer. Next-generation sequencing (NGS) has provided high-throughput data at an unprecedented resolution to predict SNVs. Currently, there exist many computational methods for either germline or somatic SNV discovery from NGS data, but very few of them are versatile enough to adapt to every situation. In the absence of matched normal samples, the prediction of somatic SNVs from single-tumor samples becomes considerably challenging, especially when the tumor purity is unknown. Here, we propose a new approach, STIC, to predict somatic SNVs and estimate tumor purity from NGS data without matched normal samples. The main features of STIC include: (1) extracting a set of SNV-relevant features on each site and training the BP neural network algorithm on the features to predict SNVs; (2) creating an iterative process to distinguish somatic SNVs from germline ones by disturbing allele frequency; and (3) establishing a reasonable relationship between tumor purity and allele frequencies of somatic SNVs to accurately estimate the purity. We quantitatively evaluate the performance of STIC on both simulated and real sequencing datasets, and the results indicate that STIC outperforms competing methods.
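The link between tumor purity and somatic allele frequency mentioned in feature (3) can be illustrated with a textbook simplification. This is a minimal sketch and not STIC's actual model: it assumes clonal, heterozygous somatic SNVs in copy-neutral diploid regions, where the expected variant allele frequency (VAF) is half the purity. The function names `expected_vaf` and `estimate_purity` are hypothetical.

```python
from statistics import median


def expected_vaf(purity: float) -> float:
    """Expected VAF of a clonal heterozygous somatic SNV in a diploid region.

    Assumption (not from the abstract): tumor and normal cells are both
    diploid, so the variant sits on 1 of 2 copies in a fraction `purity`
    of the cells.
    """
    return purity / 2.0


def estimate_purity(somatic_vafs):
    """Invert the relation above: purity is about twice the median somatic VAF."""
    if not somatic_vafs:
        raise ValueError("need at least one somatic VAF")
    # Cap at 1.0 because sampling noise can push 2 * VAF above full purity.
    return min(1.0, 2.0 * median(somatic_vafs))
```

The median is used here only because it is robust to a few subclonal or mis-called sites; copy-number changes and subclonality would break the simple factor of two.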
5
Zhang W, Wang SL, Liu Y. Identification of Cancer Driver Modules Based on Graph Clustering from Multiomics Data. J Comput Biol 2021; 28:1007-1020. [PMID: 34529511 DOI: 10.1089/cmb.2021.0052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023] Open
Abstract
A major challenge in cancer genomics is to identify cancer driver genes and modules. Most existing methods to identify cancer driver modules (iCDM) identify groups of genes whose somatic mutational patterns exhibit either mutual exclusivity or high coverage of patient samples, without considering other biological information from multiomics data sets. Here we integrate mutual exclusivity, coverage, and protein-protein interaction information to construct an edge-weighted network, and present a graph clustering approach based on symmetric non-negative matrix factorization to iCDM. iCDM was tested on pan-cancer data and the results were compared with those from several advanced computational methods. Our approach outperformed other methods in recovering known cancer driver modules, and the identified driver modules showed high accuracy in classifying normal and tumor samples.
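How mutual exclusivity, coverage and protein-protein interaction information might combine into one edge weight can be sketched as follows. The abstract does not give the weighting formula, so the definitions below are assumptions chosen for illustration: coverage is the fraction of samples mutated in either gene, exclusivity is the fraction of those samples mutated in exactly one of the two, and the weight is their product for PPI-connected pairs. All function names are hypothetical.

```python
def coverage(samples_a, samples_b, n_samples):
    """Fraction of all samples with a mutation in gene A or gene B."""
    return len(samples_a | samples_b) / n_samples


def mutual_exclusivity(samples_a, samples_b):
    """Fraction of covered samples mutated in exactly one of the two genes."""
    union = samples_a | samples_b
    if not union:
        return 0.0
    # Symmetric difference keeps samples that carry only one of the mutations.
    return len(samples_a ^ samples_b) / len(union)


def edge_weight(samples_a, samples_b, n_samples, ppi_connected):
    """Illustrative weight: zero without a PPI edge, else coverage * exclusivity."""
    if not ppi_connected:
        return 0.0
    return coverage(samples_a, samples_b, n_samples) * mutual_exclusivity(
        samples_a, samples_b
    )
```

A matrix of such weights would be symmetric and non-negative, which is the form of input that symmetric non-negative matrix factorization expects.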
Affiliation(s)
- Wei Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
- Shu-Lin Wang
  - College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
- Yue Liu
  - College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
6
Wang Y, Xia Z, Deng J, Xie X, Gong M, Ma X. TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain. BMC Bioinformatics 2021; 22:274. [PMID: 34433414 PMCID: PMC8386056 DOI: 10.1186/s12859-021-04190-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/15/2021] [Accepted: 05/12/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Gene prioritization (gene ranking) aims to obtain the centrality of genes, which is critical for cancer diagnosis and therapy since key genes correspond to biomarkers or drug targets. Great efforts have been devoted to the gene ranking problem by exploring the similarity between candidate and known disease-causing genes. However, when the number of disease-causing genes is limited, these methods are not applicable, largely due to their low accuracy. In practice, the number of known disease-causing genes for cancers, particularly rare cancers, is very limited. Therefore, there is a critical need to design effective and efficient algorithms for gene ranking with limited prior disease-causing genes. RESULTS In this study, we propose a transfer learning based algorithm for gene prioritization (called TLGP) in a cancer (target domain) without disease-causing genes by transferring knowledge from other cancers (source domain). The underlying assumption is that knowledge shared by similar cancers improves the accuracy of gene prioritization. Specifically, TLGP first quantifies the similarity between the target and source domain by calculating the affinity matrix for genes. Then, TLGP automatically learns a fusion network for the target cancer by fusing the affinity matrix, pathogenic genes and genomic data of source cancers. Finally, genes in the target cancer are prioritized. The experimental results indicate that the learnt fusion network is more reliable than the gene co-expression network, implying that transferring knowledge from other cancers improves the accuracy of network construction. Moreover, TLGP outperforms state-of-the-art approaches in terms of accuracy, improving by at least 5%. CONCLUSION The proposed model and method provide an effective and efficient strategy for gene ranking by integrating genomic data from various cancers.
Affiliation(s)
- Yan Wang
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
  - Department of Library, Xidian University, South TaiBai Road, Xi’an, China
- Zuheng Xia
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
- Jingjing Deng
  - Department of Computer Science, Swansea University, Bay, UK
- Xianghua Xie
  - Department of Computer Science, Swansea University, Bay, UK
- Maoguo Gong
  - School of Electronic Engineering, Xidian University, South TaiBai Road, Xi’an, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
7
Fan L, Hou J, Qin G. Prediction of Disease Genes Based on Stage-Specific Gene Regulatory Networks in Breast Cancer. Front Genet 2021; 12:717557. [PMID: 34335705 PMCID: PMC8321251 DOI: 10.3389/fgene.2021.717557] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/31/2021] [Accepted: 06/24/2021] [Indexed: 11/13/2022] Open
Abstract
Breast cancer is one of the most common malignant tumors in women and seriously endangers women’s health. Great advances have been made over the last decades; however, most studies predict driver genes of breast cancer using biological experiments and/or computational methods without regard to stage information. In this study, we propose a computational framework to predict the disease genes of breast cancer based on stage-specific gene regulatory networks. Firstly, we screen out differentially expressed genes and hypomethylated/hypermethylated genes by comparing tumor samples with corresponding normal samples. Secondly, we construct three stage-specific gene regulatory networks by integrating RNA-seq profiles and TF-target pairs, and apply WGCNA to detect modules from these networks. Subsequently, we perform network topological analysis and gene set enrichment analysis. Finally, the key genes of specific modules for each stage are screened as candidate disease genes. We obtain seven stage-specific modules, and identify 20, 12, and 22 key genes for the three stages, respectively. Furthermore, 55%, 83%, and 64% of these genes are associated with breast cancer, for example E2F2, E2F8, TPX2, BUB1, and CKAP2L. These candidates may therefore be of great importance for further verification by cancer experts.
Affiliation(s)
- Linzhuo Fan
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Jinhong Hou
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Guimin Qin
  - School of Computer Science and Technology, Xidian University, Xi'an, China
8
Yu Z, Liu H, Du F, Tang X. GRMT: Generative Reconstruction of Mutation Tree From Scratch Using Single-Cell Sequencing Data. Front Genet 2021; 12:692964. [PMID: 34149820 PMCID: PMC8212059 DOI: 10.3389/fgene.2021.692964] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/09/2021] [Accepted: 05/17/2021] [Indexed: 12/11/2022] Open
Abstract
Single-cell sequencing (SCS) now reveals the landscape of genetic diversity at the single-cell level, and is particularly useful for reconstructing the evolutionary history of a tumor. Multiple types of noise make SCS data notoriously error-prone and significantly complicate tumor tree reconstruction. Existing methods for tumor phylogeny estimation suffer from either high computational intensity or low-resolution indication of clonal architecture, which makes new methods for efficient and accurate reconstruction of tumor trees necessary. We introduce GRMT (Generative Reconstruction of Mutation Tree from scratch), a method for inferring the tumor mutation tree from SCS data. GRMT exploits the k-Dollo parsimony model to allow each mutation to be gained once and lost at most k times. Under this constraint on mutation evolution, GRMT searches for mutation tree structures by generating the tree from scratch, implemented as an iterative process that gradually increases the tree size by introducing one new mutation at a time until a complete tree structure that contains all mutations is obtained. This enables GRMT to efficiently recover the chronological order of mutations and scale well to large datasets. Extensive evaluations on simulated and real datasets suggest that GRMT outperforms state-of-the-art methods on multiple performance metrics. The GRMT software is freely available at https://github.com/qasimyu/grmt.
Affiliation(s)
- Zhenhua Yu
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Huidong Liu
  - School of Information Engineering, Ningxia University, Yinchuan, China
- Fang Du
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Xiaofen Tang
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
9
Ebadi AR, Soleimani A, Ghaderzadeh A. Providing an optimized model to detect driver genes from heterogeneous cancer samples using restriction in subspace learning. Sci Rep 2021; 11:9171. [PMID: 33911156 PMCID: PMC8080706 DOI: 10.1038/s41598-021-88548-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2020] [Accepted: 04/13/2021] [Indexed: 11/09/2022] Open
Abstract
Extracting the drivers from mutated genes and separating driver genes from passenger genes are among the most controversial issues in cancer studies. Because of the heterogeneity of cancer, it is not possible to identify indicators under a group of associated drivers in order to identify a group of patients with diseases related to these subgroups. Therefore, the precise identification of the related driver genes using artificial intelligence techniques is still considered a challenge for researchers. In this research, a new method has been developed using unsupervised subspace learning with additional constraints, with the aim of extracting driver genes more precisely and accurately. The obtained results show that the proposed method is better able to predict driver genes and subgroups of driver genes, which have the highest degree of overlap, as measured by p-value, with known driver genes in valid databases. The driver genes in MSigDB serve as the benchmark, and the selected driver genes overlap more with them. In addition to including the driver genes defined in previous work, this article introduces newer driver genes and defines newer groups of driver genes compared with other methods. For 200 genes, the p-value of the proposed method was 9.21e-7, better than that of previous methods. The results show that the p-value of the proposed method is about 2.7 times lower than that of the driver sub method in terms of overlap, indicating that the proposed method can identify driver genes in cancerous tumors with greater accuracy and reliability.
Affiliation(s)
- Ali Reza Ebadi
  - Department of Computer Engineering, Sanandaj Branch, Islamic Azad University, Sanandaj, Iran
- Ali Soleimani
  - Department of Computer Engineering, College of Technical and Engineering, Malard Branch, Islamic Azad University, Tehran, Iran
- Abdulbaghi Ghaderzadeh
  - Department of Computer Engineering, Sanandaj Branch, Islamic Azad University, Sanandaj, Iran
10
Zhang W, Zeng Y, Wang L, Liu Y, Cheng YN. An Effective Graph Clustering Method to Identify Cancer Driver Modules. Front Bioeng Biotechnol 2020; 8:271. [PMID: 32318558 PMCID: PMC7154174 DOI: 10.3389/fbioe.2020.00271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/17/2020] [Accepted: 03/16/2020] [Indexed: 12/15/2022] Open
Abstract
Identifying the molecular modules that drive cancer progression can greatly deepen the understanding of cancer mechanisms and provide useful information for targeted therapies. Most methods currently addressing this issue primarily use mutual exclusivity without making full use of the extra layer of module properties. In this paper, we propose MCLCluster to identify cancer driver modules. It uses somatic mutation data, Cancer Cell Fraction (CCF) data, a gene functional interaction network and a protein-protein interaction (PPI) network to derive module properties based on mutual exclusivity, connectivity in the PPI network and functional similarity of genes. We have taken three measures to ensure the effectiveness of our algorithm. First, we use CCF data to choose stronger signals and more confident mutations. Second, the weighted gene functional interaction network is used to quantify gene functional similarity in the PPI network. Third, a Markov-based graph clustering method is exploited to extract the candidate modules. MCLCluster is tested on two TCGA datasets (GBM and BRCA), and identifies several well-known oncogenic driver modules and some modules with functionally associated driver genes. In addition, we compare it with Multi-Dendrix, FSME Cluster and RME on simulated datasets with background noise and passenger rate, and MCLCluster outperforms all of these methods.
Affiliation(s)
- Wei Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
- Yifu Zeng
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Yue Liu
  - College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
- Yi-Nan Cheng
  - College of Science, Southern University of Science and Technology, Shenzhen, China
11
Al Hajri Q, Dash S, Feng WC, Garner HR, Anandakrishnan R. Identifying multi-hit carcinogenic gene combinations: Scaling up a weighted set cover algorithm using compressed binary matrix representation on a GPU. Sci Rep 2020; 10:2022. [PMID: 32029803 PMCID: PMC7005272 DOI: 10.1038/s41598-020-58785-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/29/2019] [Accepted: 01/20/2020] [Indexed: 01/16/2023] Open
Abstract
Despite decades of research, effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). Therefore, treatments that may be effective in some cases are not effective in others. We previously developed an algorithm for identifying combinations of carcinogenic genes with mutations (multi-hit combinations), which could suggest a likely cause for individual instances of cancer. Most cancers are estimated to require three or more hits. However, the computational complexity of the algorithm scales exponentially with the number of hits, making it impractical for identifying combinations of more than two hits. To identify combinations of greater than two hits, we used a compressed binary matrix representation, and optimized the algorithm for parallel execution on an NVIDIA V100 graphics processing unit (GPU). With these enhancements, the optimized GPU implementation was on average an estimated 12,144 times faster than the original integer matrix based CPU implementation, for the 3-hit algorithm, allowing us to identify 3-hit combinations. The 3-hit combinations identified using a training set were able to differentiate between tumor and normal samples in a separate test set with 90% overall sensitivity and 93% overall specificity. We illustrate how the distribution of mutations in tumor and normal samples in the multi-hit gene combinations can suggest potential driver mutations for further investigation. With experimental validation, these combinations may provide insight into the etiology of cancer and a rational basis for targeted combination therapy.
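The idea behind a compressed binary matrix can be sketched on the CPU. This is an illustrative simplification, not the authors' GPU implementation or their weighted set cover scoring: each gene's mutation status across samples is packed into one integer bitmask, so the number of samples carrying every hit in a combination is a bitwise AND followed by a bit count. The function names are hypothetical.

```python
from itertools import combinations


def pack_row(bits):
    """Pack a 0/1 list (one entry per sample) into a single int bitmask."""
    mask = 0
    for i, bit in enumerate(bits):
        if bit:
            mask |= 1 << i
    return mask


def samples_with_all_hits(rows, genes):
    """Count samples that are mutated in every gene of the combination."""
    mask = rows[genes[0]]
    for gene in genes[1:]:
        mask &= rows[gene]  # one AND handles all samples at once
    return bin(mask).count("1")


def best_combination(rows, hits):
    """Exhaustively pick the `hits`-gene combination covering the most samples."""
    return max(
        combinations(sorted(rows), hits),
        key=lambda combo: samples_with_all_hits(rows, combo),
    )
```

The exhaustive search visits every one of the C(G, h) combinations of G genes, which is the growth in the number of hits that motivates the compressed representation and parallel execution described in the abstract.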
Affiliation(s)
- Qais Al Hajri
  - Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24060, USA
- Sajal Dash
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, 24060, USA
- Wu-Chun Feng
  - Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24060, USA
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, 24060, USA
- Harold R Garner
  - Department of Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, 29303, USA
- Ramu Anandakrishnan
  - Department of Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, 29303, USA
12
Feng X, Wang E, Cui Q. Gene Expression-Based Predictive Markers for Paclitaxel Treatment in ER+ and ER- Breast Cancer. Front Genet 2019; 10:156. [PMID: 30881385 PMCID: PMC6405635 DOI: 10.3389/fgene.2019.00156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/15/2018] [Accepted: 02/13/2019] [Indexed: 12/29/2022] Open
Abstract
One of the objectives of precision oncology is to identify a patient’s responsiveness to a given treatment and prevent potential overtreatment through molecular profiling. Predictive gene expression biomarkers are a promising and practical means to this end. The overall response rate of paclitaxel drugs in breast cancer has been reported to be in the range of 20–60% and is even lower for ER-positive patients. Predicting the responsiveness of breast cancer patients, either ER-positive or ER-negative, to paclitaxel treatment could prevent individuals with poor response to the therapy from undergoing excess exposure to the agent. In this study, we identified six sets of gene signatures whose gene expression profiles could robustly predict nonresponding patients with precision above 94% and recall above 93% on various discovery datasets (n = 469 for the largest set) and independent validation datasets (n = 278), using the previously developed Multiple Survival Screening algorithm, a random-sampling-based methodology. The gene signatures reported were stable even when half of the discovery datasets were swapped, demonstrating their robustness. We also reported a set of optimizations that enabled the algorithm to train on small-scale computational resources. The gene signatures and optimized methodology described in this study could be used for identifying unresponsiveness in patients with ER-positive or ER-negative breast cancers.
Affiliation(s)
- Xiaowen Feng
  - Department of Biomedical Informatics, School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Peking University, Beijing, China
  - Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Edwin Wang
  - Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
  - Faculty of Medicine, McGill University, Montreal, QC, Canada
- Qinghua Cui
  - Department of Biomedical Informatics, School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Peking University, Beijing, China
13
Dash S, Kinney NA, Varghese RT, Garner HR, Feng WC, Anandakrishnan R. Differentiating between cancer and normal tissue samples using multi-hit combinations of genetic mutations. Sci Rep 2019; 9:1005. [PMID: 30700767 PMCID: PMC6353925 DOI: 10.1038/s41598-018-37835-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/01/2018] [Accepted: 12/14/2018] [Indexed: 01/06/2023] Open
Abstract
Cancer is known to result from a combination of a small number of genetic defects. However, the specific combinations of mutations responsible for the vast majority of cancers have not been identified. Current computational approaches focus on identifying driver genes and mutations. Although these mutations can individually increase the risk of cancer, they do not result in cancer without additional mutations. We present a fundamentally different approach for identifying the cause of individual instances of cancer: we search for combinations of genes with carcinogenic mutations (multi-hit combinations) instead of individual driver genes or mutations. We developed an algorithm that identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples with 91% sensitivity (95% confidence interval (CI) = 89–92%) and 93% specificity (95% CI = 91–94%) on average across seventeen cancer types. We then present an approach based on mutational profiles that can be used to distinguish between driver and passenger mutations within these genes. These combinations, with experimental validation, can aid in better diagnosis, provide insights into the etiology of cancer, and provide a rational basis for designing targeted combination therapies.
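To make the reported sensitivity and specificity concrete, the following is a minimal Python sketch of how a set of multi-hit combinations could be scored against labeled samples. It is an illustration under stated assumptions, not the authors' algorithm: the decision rule (a sample is called tumor when every gene of at least one combination is mutated), the gene names, and the toy samples are all hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple


def has_multi_hit(mutated: Set[str], combos: List[FrozenSet[str]]) -> bool:
    """Return True if every gene of at least one combination is mutated.

    Assumed decision rule for this sketch; the paper's exact rule may differ.
    """
    return any(combo <= mutated for combo in combos)


def sensitivity_specificity(
    tumor: List[Set[str]],
    normal: List[Set[str]],
    combos: List[FrozenSet[str]],
) -> Tuple[float, float]:
    """Sensitivity = TP / tumor samples; specificity = TN / normal samples."""
    true_pos = sum(has_multi_hit(sample, combos) for sample in tumor)
    true_neg = sum(not has_multi_hit(sample, combos) for sample in normal)
    return true_pos / len(tumor), true_neg / len(normal)


# Hypothetical combinations and samples (placeholder gene names).
combos = [
    frozenset({"GENE_A", "GENE_B"}),
    frozenset({"GENE_C", "GENE_D", "GENE_E"}),
]
tumor = [
    {"GENE_A", "GENE_B", "GENE_X"},
    {"GENE_C", "GENE_D", "GENE_E"},
    {"GENE_A"},
    {"GENE_A", "GENE_B"},
]
normal = [
    {"GENE_A"},
    set(),
    {"GENE_C", "GENE_D"},
    {"GENE_A", "GENE_B"},
]

sens, spec = sensitivity_specificity(tumor, normal, combos)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data, one tumor sample carries only a single hit and one normal sample carries a full combination, so neither metric reaches 1.0.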
Affiliation(s)
- Sajal Dash
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | - Nicholas A Kinney
- Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA
- Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA
| | - Robin T Varghese
- Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA
- Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA
| | - Harold R Garner
- Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA
- Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA
| | - Wu-Chun Feng
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
| | - Ramu Anandakrishnan
- Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA
- Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA
| |
|
14
|
Wang P, Gao L, Hu Y, Li F. Feature related multi-view nonnegative matrix factorization for identifying conserved functional modules in multiple biological networks. BMC Bioinformatics 2018; 19:394. [PMID: 30373534 PMCID: PMC6206826 DOI: 10.1186/s12859-018-2434-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/08/2018] [Accepted: 10/15/2018] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Comprehensively analyzing multi-omics biological data under different conditions is important for understanding biological mechanisms at the system level. Multiple or multi-layer network models give new insight into analyzing these data simultaneously, for instance, by identifying conserved functional modules in multiple biological networks. However, because multiple networks are larger in scale and more complicated in structure than a single network, how to accurately and efficiently detect conserved functional biological modules remains a significant challenge. RESULTS: Here, we propose an efficient method, named ConMod, to discover conserved functional modules in multiple biological networks. We introduce two features to characterize multiple networks, so that all networks are compressed into two feature matrices. Module detection is performed only on the feature matrices by using multi-view non-negative matrix factorization (NMF), which is independent of the number of input networks. Experimental results on both synthetic and real biological networks demonstrate that our method is promising in identifying conserved modules in multiple networks, since it improves accuracy and efficiency compared with state-of-the-art methods. Furthermore, applying ConMod to co-expression networks of different cancers, we find gene modules shared across cancers, the majority of which have significant functional implications, such as ribosome biogenesis and immune response. In addition, by analyzing brain tissue-specific protein interaction networks, we detect conserved modules related to nervous system development, mRNA processing, etc. CONCLUSIONS: ConMod facilitates finding conserved modules in any number of networks with low time and space complexity, thereby serving as a valuable tool for inferring shared traits and biological functions of multiple biological systems.
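As background for the factorization step mentioned in the abstract, here is a minimal Python sketch of plain single-matrix NMF with the standard multiplicative updates for the Frobenius objective. It is not ConMod and not a multi-view formulation; the function name `nmf`, the toy matrix, and the rule of assigning each gene to the factor with the largest weight are assumptions made only for illustration.

```python
import numpy as np


def nmf(X: np.ndarray, k: int, n_iter: int = 500, eps: float = 1e-9, seed: int = 0):
    """Factor a non-negative matrix X (n x m) as W @ H with W, H >= 0.

    Uses multiplicative updates that minimize ||X - W H||_F.
    Single-view illustration only.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical 6-gene feature matrix with two dense blocks (two "modules").
X = np.zeros((6, 6))
X[:3, :3] = 1.0
X[3:, 3:] = 1.0

W, H = nmf(X, k=2)
# Assumed assignment rule: each gene goes to the factor with the largest weight.
labels = W.argmax(axis=1)
error = np.linalg.norm(X - W @ H)
print("module labels:", labels, "reconstruction error:", round(float(error), 4))
```

On block-structured input like this, the two factors would typically align with the two blocks, although multiplicative updates only guarantee convergence to a stationary point rather than a global optimum.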
Affiliation(s)
- Peizhuo Wang
- School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
| | - Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
| | - Feng Li
- School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
| |
|