1
Tran TD, Nguyen MT. C-Biomarker.net: A Cytoscape app for the identification of cancer biomarker genes from cores of large biomolecular networks. Biosystems 2023; 226:104887. [PMID: 36990379 DOI: 10.1016/j.biosystems.2023.104887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Revised: 03/22/2023] [Accepted: 03/24/2023] [Indexed: 03/30/2023]
Abstract
Although many studies have shown that biomarker genes for early cancer detection can be found in biomolecular networks, no suitable tool exists for discovering cancer biomarker genes from such networks. Accordingly, we developed a novel Cytoscape app called C-Biomarker.net, which can identify cancer biomarker genes from the cores of various biomolecular networks. Building on recent research, we designed and implemented the software around parallel algorithms proposed in this study for high-performance computing devices. We tested the software on networks of various sizes and determined the suitable network size for each running mode (CPU or GPU). Interestingly, applying the software to 17 cancer signaling pathways, we found that on average 70.59% of the top three nodes residing at the innermost core of each pathway are biomarker genes of the cancer corresponding to that pathway. Similarly, we found that 100% of the top ten nodes at the cores of both the Human Gene Regulatory (HGR) network and the Human Protein-Protein Interaction (HPPI) network are multi-cancer biomarkers. These case studies are reliable evidence for the performance of the cancer biomarker prediction function in the software. Through the case studies, we also suggest that the true cores of directed complex networks should be identified by the R-core algorithm rather than by K-core, as is usual. Finally, we compared the predictions of our software with those of other researchers and confirmed that our prediction method outperforms the other methods. Taken together, C-Biomarker.net is a reliable tool that efficiently detects biomarker nodes from the cores of various large biomolecular networks. The software is available at https://github.com/trantd/C-Biomarker.net.
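Editor's illustration: the abstract refers to the cores of a network and to K-core as the usual way of finding them. The sketch below shows only the standard K-core peeling idea on an undirected edge list; it is not the R-core algorithm or the parallel implementation described in the paper, and the function names `core_numbers` and `innermost_core` are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Standard K-core peeling: repeatedly remove the node with the smallest
    remaining degree. This is NOT the R-core algorithm from the paper.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node of smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


def innermost_core(edges):
    """Return the sorted nodes that have the largest core number."""
    core = core_numbers(edges)
    k_max = max(core.values())
    return sorted(node for node, c in core.items() if c == k_max)


# Toy network: a 4-clique (a, b, c, d) with a short tail (e, f).
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
             ("b", "d"), ("c", "d"), ("a", "e"), ("e", "f")]
print(innermost_core(toy_edges))
```

On the toy network the clique forms the innermost core, while the tail nodes are peeled away first. The simple `min` scan keeps the sketch short; it is quadratic in the number of nodes and is not meant for large networks.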
2
Li X, Xiang J, Wu FX, Li M. A Dual Ranking Algorithm Based on the Multiplex Network for Heterogeneous Complex Disease Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1993-2002. [PMID: 33577455 DOI: 10.1109/tcbb.2021.3059046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Identifying biomarkers of heterogeneous complex diseases has always been a focus of medical research. In previous studies, powerful network propagation methods have been applied to find marker genes related to specific diseases, but existing methods are mostly based on a single network and may therefore be strongly affected by the incompleteness of that network and by the neglect of a large amount of information about physical and functional interactions between biological components. Other methods that directly integrate multiple types of interactions into an aggregate network run the risk that different types of data conflict with each other and that the characteristics and topology of each individual network are lost. Meanwhile, biomarkers used in clinical trials should be few in number and have strong discriminative ability. In this study, we developed a multiplex network-based dual ranking framework (DualRank) for heterogeneous complex disease analysis. We applied the proposed method to heterogeneous complex diseases for diagnosis, prognosis, and classification. The results showed that DualRank outperformed competing methods and could identify a small number of biomarkers with strong prediction performance (average AUC = 0.818) and biological interpretability.
3
Li L, Cai D, Zhong H, Liu F, Jiang Q, Liang J, Li P, Song Y, Ji A, Jiao W, Song J, Li J, Chen Z, Li Q, Ke L. Mitochondrial dynamics and biogenesis indicators may serve as potential biomarkers for diagnosis of myasthenia gravis. Exp Ther Med 2022; 23:307. [PMID: 35340870 PMCID: PMC8931634 DOI: 10.3892/etm.2022.11236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/27/2021] [Accepted: 02/10/2022] [Indexed: 11/05/2022]
Abstract
Due to challenges in diagnosing myasthenia gravis (MG), identifying novel diagnostic biomarkers for this disease is essential. Mitochondria are key organelles that regulate multiple physiological functions, such as energy production, cell proliferation and cell death. In the present study, Mfn1/2, Opa1, Drp1, Fis1, AMPK, PGC-1α, NRF-1 and TFAM were compared between patients with MG and healthy subjects to identify potential diagnostic biomarkers for MG. Blood samples were collected from 50 patients with MG and 50 healthy subjects. The participants' demographic information and routine blood test results were recorded. Mitochondrial dynamics were evaluated and levels of Mfn1/2, Opa1, Drp1, Fis1, AMPK, PGC-1α, NRF-1 and TFAM were determined in peripheral blood mononuclear cells using western blotting and reverse transcription-quantitative PCR, respectively. Receiver operating characteristic curve analysis was used to evaluate the diagnostic accuracy of these indicators. The area under the curve values of Mfn1/2, Opa1, Drp1, Fis1, AMPK, PGC-1α, NRF-1 and TFAM ranged from 0.5408 to 0.8696. Compared with control subjects, mRNA expression levels of Mfn1/2, Opa1, AMPK, PGC-1α, NRF-1 and TFAM were lower, while those of Drp1 and Fis1 were higher in patients with MG. The protein expression levels of all these molecules were lower in patients with MG than in control subjects. These results suggested that mitochondrial dynamics and biogenesis indicators may be diagnostic biomarkers for MG.
Affiliation(s)
- Lanqi Li
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Donghong Cai
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Huiya Zhong
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Fengbin Liu
  - Department of Gastrosplenic Diseases, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Qilong Jiang
  - Department of Gastrosplenic Diseases, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Jian Liang
  - Guangdong Provincial Key Laboratory of New Drug Development and Research of Chinese Medicine, Mathematical Engineering Academy of Chinese Medicine, Guangzhou, Guangdong 510006, P.R. China
- Peiwu Li
  - Department of Gastrosplenic Diseases, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Yafang Song
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Aidong Ji
  - Clinical Medical College of Acupuncture, Moxibustion and Rehabilitation, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510006, P.R. China
- Wei Jiao
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Jingwei Song
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Jinqiu Li
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Zhiwei Chen
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
- Qing Li
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine
- Lingling Ke
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, P.R. China
4
Li X, Li M, Xiang J, Zhao Z, Shang X. SEPA: Signalling entropy-based algorithm to evaluate personalized pathway activation for survival analysis on pan-cancer data. Bioinformatics 2022; 38:2536-2543. [PMID: 35199150 DOI: 10.1093/bioinformatics/btac122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 01/16/2022] [Accepted: 02/21/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Biomarkers with prognostic ability and biological interpretability can be used to support decision-making in the survival analysis. Genes usually form functional modules to play synergistic roles, such as pathways. Predicting significant features from the functional level can effectively reduce the adverse effects of heterogeneity and obtain more reproducible and interpretable biomarkers. Personalized pathway activation inference can quantify the dysregulation of essential pathways involved in the initiation and progression of cancers, and can contribute to the development of personalized medical treatments. RESULTS In this study, we propose a novel method to evaluate personalized pathway activation based on signalling entropy for survival analysis (SEPA), which is a new attempt to introduce the information-theoretic entropy in generating pathway representation for each patient. SEPA effectively integrates pathway-level information into gene expression data, converting the high-dimensional gene expression data into the low-dimensional biological pathway activation scores. SEPA shows its classification power on the prognostic pan-cancer genomic data, and the potential pathway markers identified based on SEPA have statistical significance in the discrimination of high-risk and low-risk cohorts and are likely to be associated with the initiation and progress of cancers. The results show that SEPA scores can be used as an indicator to precisely distinguish cancer patients with different clinical outcomes, and identify important pathway features with strong discriminative power and biological interpretability. AVAILABILITY The MATLAB-package for SEPA is freely available from https://github.com/xingyili/SEPA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xingyi Li
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, China
- Min Li
  - School of Computer Science, Central South University, Changsha, Hunan, 410083, China
- Ju Xiang
  - School of Computer Science, Central South University, Changsha, Hunan, 410083, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, Hunan, 410219, China
- Zhelin Zhao
  - School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, China
5
Ma B, Yan G, Chai B, Hou X. XGBLC: an improved survival prediction model based on XGBoost. Bioinformatics 2022; 38:410-418. [PMID: 34586380 DOI: 10.1093/bioinformatics/btab675] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/20/2021] [Revised: 09/06/2021] [Accepted: 09/23/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Survival analysis using gene expression profiles plays a crucial role in the interpretation of clinical research and assessment of disease therapy programs. Several prediction models have been developed to explore the relationship between patients' covariates and survival. However, the high-dimensional genomic features limit the prediction performance of the survival model. Thus, an accurate and reliable prediction model is necessary for survival analysis using high-dimensional genomic data. RESULTS In this study, we proposed an improved survival prediction model based on XGBoost framework called XGBLC, which used Lasso-Cox to enhance the ability to analyze high-dimensional genomic data. The novel first- and second-order gradient statistics of Lasso-Cox were defined to construct the loss function of XGBLC. We extensively tested our XGBLC algorithm on both simulated and real-world datasets, and estimated the performance of models with 5-fold cross-validation. Based on 20 cancer datasets from The Cancer Genome Atlas (TCGA), XGBLC outperforms five state-of-the-art survival methods in terms of C-index, Brier score and AUC. The results show that XGBLC still keeps good accuracy and robustness by comparing the performance on the simulated datasets with different scales. The developed prediction model would be beneficial for physicians to understand the effects of patient's genomic characteristics on survival and make personalized treatment decisions. AVAILABILITY AND IMPLEMENTATION The implementation of XGBLC algorithm based on R language is available at: https://github.com/lab319/XGBLC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
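Editor's illustration: the abstract reports performance in terms of the C-index. As a minimal sketch of what that metric measures, the function below computes Harrell's concordance index for right-censored data. It is not part of XGBLC; the name `concordance_index` and the toy data are assumptions made for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third subject is censored at time 6.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of risk against survival time.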
Affiliation(s)
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Ge Yan
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bingjie Chai
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Xiaoyu Hou
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
6
Li XY, Xiang J, Wu FX, Li M. NetAUC: A network-based multi-biomarker identification method by AUC optimization. Methods 2021; 198:56-64. [PMID: 34364986 DOI: 10.1016/j.ymeth.2021.08.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2021] [Revised: 07/08/2021] [Accepted: 08/03/2021] [Indexed: 10/20/2022]
Abstract
Complex diseases are caused by a variety of factors, and their diagnosis, treatment and prognosis are usually difficult. Proteins play an indispensable role in living organisms and perform specific biological functions by interacting with other proteins or biomolecules. Because their dysfunction may lead to disease, it is natural to mine disease-related biomarkers from the protein-protein interaction network. AUC, the area under the receiver operating characteristic (ROC) curve, is regarded as a gold standard for evaluating the effectiveness of a binary classifier, as it measures the classification ability of an algorithm under an arbitrary class distribution or any misclassification cost. In this study, we have proposed a network-based multi-biomarker identification method by AUC optimization (NetAUC), which integrates gene expression and network information to identify biomarkers for complex disease analysis. The main purpose is to optimize two objectives simultaneously: maximizing AUC and minimizing the number of selected features. We have applied NetAUC to two types of disease analysis: 1) prognosis of breast cancer, 2) classification of similar diseases. The results show that NetAUC can identify a small panel of disease-related biomarkers with powerful classification ability and functional interpretability.
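Editor's illustration: the abstract treats AUC as the quantity to be maximized. A minimal sketch of how AUC can be computed directly from classifier scores is given below, using its interpretation as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function name `auc` and the toy scores are assumptions; this is not the NetAUC optimization itself.

```python
def auc(scores, labels):
    """Area under the ROC curve from scores and binary labels (1 = positive).

    Computed as the fraction of positive/negative pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical scores): 3 of the 4 pairs are ordered correctly.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise form makes clear why AUC does not depend on a decision threshold or on the class balance: only the relative ordering of positive and negative scores matters.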
Affiliation(s)
- Xing-Yi Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, 410219, Hunan, China
- Fang-Xiang Wu
  - Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
7
Yang Q, Tian GL, Qin JW, Wu BQ, Tan L, Xu L, Wu SZ, Yang JT, Jiang JH, Yu RQ. Coupling bootstrap with synergy self-organizing map-based orthogonal partial least squares discriminant analysis: Stable metabolic biomarker selection for inherited metabolic diseases. Talanta 2020; 219:121370. [DOI: 10.1016/j.talanta.2020.121370] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/29/2019] [Revised: 06/27/2020] [Accepted: 06/30/2020] [Indexed: 12/13/2022]
8
Feng ZY, Wang Y. ELF: Extract Landmark Features By Optimizing Topology Maintenance, Redundancy, and Specificity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:411-421. [PMID: 29994260 DOI: 10.1109/tcbb.2018.2846225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Feature selection is the process of selecting a subset of landmark features for model construction when there are many features and comparatively few samples. Far-reaching technological developments such as biological sequencing at the single-cell level make feature selection more challenging. The difficulty lies in four facts: the measured features are high-dimensional and noisy; dropouts make the data very sparse; many features are either redundant or irrelevant; and samples are not well labeled in the experiments. Here, we propose a new model called ELF (Extract Landmark Features) to address the above challenges. ELF aims to simultaneously maximize topology maintenance to keep the pairwise relationships among samples, minimize feature redundancy to diversify the features, and maximize feature specificity to make every selected feature more representative. This makes ELF a nonlinear combinatorial optimization problem. To solve this difficult problem, we propose a heuristic algorithm based on a greedy strategy. We show ELF's outstanding performance on two single-cell RNA-seq datasets. One covers direct reprogramming from mouse embryonic fibroblasts to induced neurons, and the other covers hepatoblast differentiation. ELF is able to choose only hundreds of landmark genes to maintain the cells' correlativity. Topology maintenance, redundancy removal, and specificity each play an important role in selecting landmark features and revealing cells' biological functions. In addition, ELF can be generally applied in other scenarios. We demonstrate that ELF can reveal pivotal pixels in the writing region and the human face in two public image datasets. We believe that ELF is a useful tool to obtain more interpretable results by revealing key features while clustering the samples well.
9
Yang W, Han J, Ma J, Feng Y, Hou Q, Wang Z, Yu T. Prediction of key gene function in spinal muscular atrophy using guilt by association method based on network and gene ontology. Exp Ther Med 2019; 17:2561-2566. [PMID: 30906446 PMCID: PMC6425128 DOI: 10.3892/etm.2019.7216] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/04/2018] [Accepted: 01/23/2019] [Indexed: 12/21/2022]
Abstract
The guilt by association (GBA) algorithm has been widely used to predict gene functions statistically, and a network-based approach may increase the confidence and veracity of identifying molecular signatures for diseases. The aim of the present study was to suggest a gene ontology (GO)-based method that integrates the GBA algorithm and network to identify key gene functions for spinal muscular atrophy (SMA). The inference of key gene functions comprised four steps: preparing gene lists and sets; extracting differentially expressed genes (DEGs) from microarray data using the linear models for microarray data (limma) package; constructing a co-expression matrix on the gene lists using the Spearman correlation coefficient method; and predicting gene functions with the GBA algorithm. Ultimately, key gene functions were predicted according to the area under the curve (AUC) index for GO terms, and the GO terms with AUC >0.7 were determined as the optimal gene functions for SMA. A total of 484 DEGs and 466 background GO terms were regarded as gene lists and sets for the subsequent analyses, respectively. The results obtained from the network-based GBA approach showed that 141 gene sets had good classification performance with AUC >0.5. Most significantly, 3 gene sets with AUC >0.7 were denoted as seed gene functions for SMA, including cell morphogenesis, which is involved in differentiation and ossification. In conclusion, we have predicted 3 key gene functions for SMA compared with controls using the network-based GBA algorithm. The findings may provide insights into the pathological and molecular mechanisms underlying SMA.
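Editor's illustration: one of the four steps in the abstract is building a co-expression matrix with the Spearman correlation coefficient. The sketch below shows that single step in plain Python, assuming expression profiles are given as a dictionary of equal-length lists; the names `average_ranks`, `spearman` and `coexpression_matrix` and the toy profiles are hypothetical, and the GBA prediction step itself is not shown.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def coexpression_matrix(expression):
    """Return {(gene_a, gene_b): Spearman correlation} for all gene pairs."""
    genes = sorted(expression)
    return {(g, h): spearman(expression[g], expression[h])
            for g in genes for h in genes}


# Toy expression profiles (hypothetical values across four samples).
profiles = {"g1": [1, 2, 3, 4], "g2": [10, 20, 30, 50], "g3": [4, 3, 2, 1]}
matrix = coexpression_matrix(profiles)
print(matrix[("g1", "g2")], matrix[("g1", "g3")])
```

Because the coefficient depends only on ranks, `g1` and `g2` are perfectly correlated even though their values are on different scales.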
Affiliation(s)
- Wenjiu Yang
  - Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
- Jing Han
  - Department of Ophthalmology, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
- Jinfeng Ma
  - Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
- Yujie Feng
  - Hepatobiliary Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
- Qingxian Hou
  - Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
- Zhijie Wang
  - Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
- Tengbo Yu
  - Sports Medicine, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
10
Zhu L, Zhang HB, Huang DS. LMMO: A Large Margin Approach for Refining Regulatory Motifs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:913-925. [PMID: 28391205 DOI: 10.1109/tcbb.2017.2691325] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/07/2023]
Abstract
Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, they usually have to sacrifice accuracy and may fail to fully leverage the potential of large datasets. Recently, it has been demonstrated that the motifs identified by DMDs can be significantly improved by maximizing the area under the receiver-operating characteristic curve (AUC) metric, which has been widely used in the literature to rank the performance of elicited motifs. However, existing approaches for motif refinement choose to directly maximize the non-convex and discontinuous AUC itself, which is known to be difficult and may lead to suboptimal solutions. In this paper, we propose Large Margin Motif Optimizer (LMMO), a large-margin-type algorithm for refining regulatory motifs. By relaxing the AUC cost function with the surrogate convex hinge loss, we show that the resultant learning problem can be cast as an instance of difference-of-convex (DC) programs, and solve it iteratively using the constrained concave-convex procedure (CCCP). To further save computational time, we combine LMMO with existing techniques for improving the scalability of large-margin-type algorithms, such as the cutting plane method. Experimental evaluations on synthetic and real data illustrate the performance of the proposed approach. The code of LMMO is freely available at: https://github.com/ekffar/LMMO.
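Editor's illustration: the abstract describes replacing the discontinuous AUC cost with a convex hinge-loss surrogate. The sketch below evaluates such a pairwise hinge surrogate on fixed scores and compares it with the pairwise ranking error that it upper-bounds when the margin is 1. It shows only the surrogate, not the motif model, the DC formulation or the CCCP solver; the function names and toy scores are assumptions.

```python
def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge loss over all positive/negative score pairs.

    Each pair contributes max(0, margin - (positive - negative)), which is
    zero only when the positive outscores the negative by at least `margin`.
    """
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += max(0.0, margin - (p - n))
    return total / (len(pos_scores) * len(neg_scores))


def pairwise_ranking_error(pos_scores, neg_scores):
    """Fraction of pairs where the positive does not outscore the negative.

    Ties are counted as errors here, so this is at least 1 - AUC.
    """
    wrong = sum(1 for p in pos_scores for n in neg_scores if p <= n)
    return wrong / (len(pos_scores) * len(neg_scores))


# Toy scores (hypothetical): one of the four pairs is ranked incorrectly.
positives = [2.0, 0.5]
negatives = [1.0, -1.0]
print(pairwise_hinge_loss(positives, negatives))
print(pairwise_ranking_error(positives, negatives))
```

With the default margin of 1, every misordered pair contributes at least 1 to the hinge sum, so the surrogate never falls below the ranking error while remaining convex in the scores.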
11
Quintero M, Adamoski D, Reis LMD, Ascenção CFR, Oliveira KRSD, Gonçalves KDA, Dias MM, Carazzolle MF, Dias SMG. Guanylate-binding protein-1 is a potential new therapeutic target for triple-negative breast cancer. BMC Cancer 2017; 17:727. [PMID: 29115931 PMCID: PMC5688804 DOI: 10.1186/s12885-017-3726-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 01/12/2017] [Accepted: 10/30/2017] [Indexed: 12/11/2022]
Abstract
Background Triple-negative breast cancer (TNBC) is characterized by a lack of estrogen and progesterone receptor expression (ESR and PGR, respectively) and an absence of human epithelial growth factor receptor (ERBB2) amplification. Approximately 15–20% of breast malignancies are TNBC. Patients with TNBC often have an unfavorable prognosis. In addition, TNBC represents an important clinical challenge since it does not respond to hormone therapy. Methods In this work, we integrated high-throughput mRNA sequencing (RNA-Seq) data from normal and tumor tissues (obtained from The Cancer Genome Atlas, TCGA) and cell lines obtained through in-house sequencing or available from the Gene Expression Omnibus (GEO) to generate a unified list of differentially expressed (DE) genes. Methylome and proteomic data were integrated into our analysis to give further support to our findings. Genes that were overexpressed in TNBC were then curated to retain new potentially druggable targets based on in silico analysis. Knock-down was used to assess gene importance for TNBC cell proliferation. Results Our pipeline analysis generated a list of 243 potential new targets for treating TNBC. We finally demonstrated that knock-down of Guanylate-Binding Protein 1 (GBP1), one of the candidate genes, selectively affected the growth of TNBC cell lines. Moreover, we showed that GBP1 expression was controlled by epidermal growth factor receptor (EGFR) in breast cancer cell lines. Conclusions We propose that GBP1 is a new potential druggable therapeutic target for treating TNBC with enhanced EGFR expression.
Affiliation(s)
- Melissa Quintero
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil
- Douglas Adamoski
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil; Graduate Program in Genetics and Molecular Biology, Institute of Biology, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Larissa Menezes Dos Reis
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil; Graduate Program in Genetics and Molecular Biology, Institute of Biology, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Carolline Fernanda Rodrigues Ascenção
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil; Graduate Program in Genetics and Molecular Biology, Institute of Biology, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Krishina Ratna Sousa de Oliveira
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil; Graduate Program in Genetics and Molecular Biology, Institute of Biology, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Kaliandra de Almeida Gonçalves
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil
- Marília Meira Dias
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil
- Marcelo Falsarella Carazzolle
  - Genomic and Expression Laboratory (LGE), Institute of Biology, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Sandra Martha Gomes Dias
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, 13083-970, Brazil
12
Cui Y, Li B, Li R. Decentralized Learning Framework of Meta-Survival Analysis for Developing Robust Prognostic Signatures. JCO Clin Cancer Inform 2017; 1:1-13. [PMID: 30657395 PMCID: PMC6873986 DOI: 10.1200/cci.17.00077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
PURPOSE A significant hurdle in developing reliable gene expression-based prognostic models has been the limited sample size, which can cause overfitting and false discovery. Combining data from multiple studies can enhance statistical power and reduce spurious findings, but how to address the biologic heterogeneity across different datasets remains a major challenge. Better meta-survival analysis approaches are needed. MATERIAL AND METHODS We presented a decentralized learning framework for meta-survival analysis without the need for data aggregation. Our method consisted of a series of proposals that together alleviated the influence of data heterogeneity and improved the performance of survival prediction. First, we transformed the gene expression profile of every sample into normalized percentile ranks to obtain platform-agnostic features. Second, we used Stouffer's meta-z approach in combination with Harrell's concordance index to prioritize and select genes to be included in the model. Third, we used survival discordance as a scale-independent model loss function. Instead of generating a merged dataset and training the model therein, we avoided comparing patients across datasets and individually evaluated the loss function on each dataset. Finally, we optimized the model by minimizing the joint loss function. RESULTS Through comprehensive evaluation on 31 public microarray datasets containing 6,724 samples of several cancer types, we demonstrated that the proposed method has outperformed (1) single prognostic genes identified using conventional meta-analysis, (2) multigene signatures trained on single datasets, (3) multigene signatures trained on merged datasets as well as by other existing meta-analysis methods, and (4) clinically applicable, established multigene signatures. CONCLUSION The decentralized learning approach can be used to effectively perform meta-analysis of gene expression data and to develop robust multigene prognostic signatures.
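Editor's illustration: the first two steps in the abstract are a percentile-rank transform of each sample's expression profile and Stouffer's meta-z combination across datasets. The sketch below shows both ideas in their simplest form. The exact normalization used in the paper is not given in the abstract, so the rank definition here (fraction of values less than or equal to each value) and the unweighted form of Stouffer's method are assumptions, as are the function names.

```python
import math


def percentile_ranks(values):
    """Map each value to the fraction of values in the sample that are <= it.

    The result depends only on the ordering of the values, not on their
    scale, which is what makes the feature platform-agnostic.
    """
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


def stouffer_z(z_scores):
    """Unweighted Stouffer combination of per-dataset z-scores."""
    return sum(z_scores) / math.sqrt(len(z_scores))


# Two hypothetical platforms measuring the same sample on different scales
# yield identical percentile ranks because the gene ordering is the same.
print(percentile_ranks([1, 3, 2]))
print(percentile_ranks([100, 3000, 250]))

# Four datasets with modest, consistent evidence combine into a stronger z.
print(stouffer_z([1.0, 1.0, 1.0, 1.0]))
```

Consistent weak signals reinforce each other under Stouffer's method, while signals of opposite sign cancel, which is why it suits prioritizing genes across heterogeneous datasets.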
Affiliation(s)
- Yi Cui
  - Yi Cui, Bailiang Li, and Ruijiang Li, Stanford University School of Medicine, Stanford, CA; Yi Cui, Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan
- Bailiang Li
  - Yi Cui, Bailiang Li, and Ruijiang Li, Stanford University School of Medicine, Stanford, CA; Yi Cui, Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan
- Ruijiang Li
  - Yi Cui, Bailiang Li, and Ruijiang Li, Stanford University School of Medicine, Stanford, CA; Yi Cui, Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan
Collapse
|
13
|
Ke BS, Chiang AJ, Chang YCI. Influence Analysis for the Area Under the Receiver Operating Characteristic Curve. J Biopharm Stat 2017; 28:722-734. [PMID: 28920760 DOI: 10.1080/10543406.2017.1377728] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 10/18/2022]
Abstract
Classification measures play essential roles in the assessment and construction of classifiers. Hence, determining how to prevent these measures from being affected by individual observations has become an important problem. In this paper, we propose several indexes based on the influence function and the concept of local influence to identify influential observations that affect the estimate of the area under the receiver operating characteristic curve (AUC), an important and commonly used measure. Cumulative lift charts are also used to reconcile disagreements among the proposed indexes. Both the AUC indexes and the graphical tools rely only on the classification scores, and both are applicable to any classifier that produces real-valued classification scores. A real data set is used for illustration.
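To make the idea of an observation influencing the AUC estimate concrete, the sketch below computes the empirical AUC from classification scores alone and then measures how much the estimate changes when each observation is deleted. This leave-one-out deletion measure is a simple stand-in chosen for illustration; it is not one of the influence-function or local-influence indexes proposed in the paper, and the function names are hypothetical.

```python
def empirical_auc(scores, labels):
    """Empirical AUC (Mann-Whitney form): the probability that a positive
    case scores higher than a negative case, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def leave_one_out_influence(scores, labels):
    """For each observation i, return AUC(all data) - AUC(data without i).
    Large absolute values flag observations that strongly affect the estimate."""
    full = empirical_auc(scores, labels)
    influence = []
    for i in range(len(scores)):
        reduced_scores = scores[:i] + scores[i + 1:]
        reduced_labels = labels[:i] + labels[i + 1:]
        influence.append(full - empirical_auc(reduced_scores, reduced_labels))
    return influence
```

In a toy case such as scores `[0.9, 0.8, 0.2, 0.1, 0.95]` with labels `[1, 1, 0, 0, 0]`, the negative case scored 0.95 has the largest absolute influence, since removing it raises the AUC from 2/3 to 1.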
Affiliation(s)
- Bo-Shiang Ke
- Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
- An Jen Chiang
- Department of Obstetrics and Gynecology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
|
14
|
Salim A, Amjesh R, Chandra SSV. An approach to forecast human cancer by profiling microRNA expressions from NGS data. BMC Cancer 2017; 17:77. [PMID: 28122525 PMCID: PMC5267436 DOI: 10.1186/s12885-016-3042-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/04/2016] [Accepted: 12/28/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: microRNAs are single-stranded non-coding RNA sequences 18-24 nucleotides in length. They play an important role in post-transcriptional regulation of gene expression. Evidence that microRNAs act as promoters or suppressors of several diseases, including cancer, is emerging. Recent studies have shown that microRNAs are differentially expressed in disease states compared with normal states. Profiling microRNA expression is a good way to estimate these differences in expression levels, which can then be used to understand the progression of an associated disease. METHODS: Machine learning techniques, applied to microRNA expression values obtained from NGS data, can be used to develop an effective disease prediction system. This paper discusses an approach for microRNA expression profiling, its normalization, and a support vector-based machine learning technique to develop a Cancer Prediction System. Presently, the system has been trained with data samples of hepatocellular carcinoma, carcinomas of the bladder and lung cancer. microRNAs related to specific types of cancer were used to build the classifier. RESULTS: When the system is trained and tested with 10-fold cross-validation, the prediction accuracy obtained is 97.56% for lung cancer, 97.82% for hepatocellular carcinoma and 95.0% for carcinomas of the bladder. The system is further validated with separate test sets, which show accuracies higher than 90%. A ranking based on differential expression marks the relative significance of each microRNA in the prediction process. CONCLUSIONS: The experimental results showed that microRNA expression profiling is an effective mechanism for disease identification, provided a sufficiently large database is available.
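The profiling, normalization and ranking steps mentioned in this abstract can be illustrated with a small sketch. The abstract does not specify the normalization or the ranking statistic, so everything here is an assumption made for illustration: read counts are scaled to reads per million (RPM), microRNAs are ranked by the absolute log2 fold change of mean RPM between disease and normal samples, and the function names and pseudocount are hypothetical. The support vector classifier itself is not shown.

```python
import math

def to_rpm(counts):
    """Scale one sample's raw read counts to reads per million (assumed normalization)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: c * 1e6 / total for name, c in counts.items()}

def mean_rpm(samples):
    """Average RPM per microRNA over a group of samples; absent entries count as 0."""
    normalized = [to_rpm(s) for s in samples]
    names = set().union(*normalized)
    return {n: sum(s.get(n, 0.0) for s in normalized) / len(normalized)
            for n in names}

def rank_by_log2_fold_change(disease_samples, normal_samples, pseudocount=1.0):
    """Rank microRNAs by |log2 fold change| of mean RPM, disease versus normal.
    The pseudocount avoids division by zero for unexpressed microRNAs."""
    disease = mean_rpm(disease_samples)
    normal = mean_rpm(normal_samples)
    fold_change = {
        name: math.log2((disease.get(name, 0.0) + pseudocount) /
                        (normal.get(name, 0.0) + pseudocount))
        for name in set(disease) | set(normal)
    }
    return sorted(fold_change.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Each sample is a dictionary mapping a microRNA name to its raw read count; the returned list pairs each name with its signed log2 fold change, most strongly changed first.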
Affiliation(s)
- A. Salim
- Department of Computer Science, College of Engineering Trivandrum, Sreekaryam, Thiruvananthapuram, India
- R. Amjesh
- Department of Computational Biology and BioInformatics, University of Kerala, Karyavattom, Thiruvananthapuram, India
- S. S. Vinod Chandra
- Department of Computational Biology and BioInformatics, University of Kerala, Karyavattom, Thiruvananthapuram, India
- Computer Center, University of Kerala, Thiruvananthapuram, India
|