1. CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network. Sci Rep 2019; 9:16927. [PMID: 31729414; PMCID: PMC6858312; DOI: 10.1038/s41598-019-53034-3] [Received: 09/19/2018] [Accepted: 10/21/2019]
Abstract
With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need to classify cancer type based on somatic alterations detected in sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features that can be derived from genomic data, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations, and further use them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which we tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive an ensemble-based cancer type prediction model that achieved up to 84% classification accuracy in nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
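The abstract reports accuracy from nested 10-fold cross-validation but does not describe how the folds were built. As a hedged sketch (not CPEM's code), the Python below lays out nested folds so that each outer test fold is never touched while features or hyperparameters are tuned on the inner folds. The function names `kfold` and `nested_cv_plan`, and the use of contiguous, unshuffled, unstratified folds, are simplifying assumptions; in practice one would shuffle and usually stratify by cancer type.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def kfold(indices: List[int], k: int) -> Iterator[Split]:
    """Yield (train, test) index lists for k contiguous, non-overlapping folds.

    Assumption: no shuffling or stratification, to keep the sketch short.
    """
    n = len(indices)
    for f in range(k):
        lo, hi = f * n // k, (f + 1) * n // k
        yield indices[:lo] + indices[hi:], indices[lo:hi]


def nested_cv_plan(n_samples: int, outer_k: int = 10, inner_k: int = 10):
    """Outer folds estimate accuracy; the inner folds are built only from each
    outer training set, which is where feature selection or hyperparameter
    tuning would happen."""
    plan = []
    for outer_train, outer_test in kfold(list(range(n_samples)), outer_k):
        inner_splits = list(kfold(outer_train, inner_k))
        plan.append((outer_train, outer_test, inner_splits))
    return plan


plan = nested_cv_plan(100)
outer_train, outer_test, inner = plan[0]
# Outer folds, outer test size, inner folds, inner validation size
print(len(plan), len(outer_test), len(inner), len(inner[0][1]))
```

The key design point is that the inner splits are computed from `outer_train` alone, so any model selection inside them cannot leak information from the outer test fold into the reported accuracy.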

2. Asahchop EL, Branton WG, Krishnan A, Chen PA, Yang D, Kong L, Zochodne DW, Brew BJ, Gill MJ, Power C. HIV-associated sensory polyneuropathy and neuronal injury are associated with miRNA-455-3p induction. JCI Insight 2018; 3:122450. [PMID: 30518697; DOI: 10.1172/jci.insight.122450] [Received: 05/29/2018] [Accepted: 10/24/2018]
Abstract
Symptomatic distal sensory polyneuropathy (sDSP) is common and debilitating in people with HIV/AIDS, leading to neuropathic pain, although the condition's cause is unknown. To investigate biomarkers and associated pathogenic mechanisms for sDSP, we examined plasma miRNA profiles in HIV/AIDS patients with or without sDSP in 2 independent cohorts and assessed related pathogenic effects. In the Discovery Cohort (sDSP, n = 29; non-DSP, n = 40), array analyses showed several miRNAs that were increased in patients with sDSP compared with patients without sDSP. miR-455-3p displayed a 12-fold median increase in the sDSP group, which was confirmed by machine learning analyses and verified by reverse transcription PCR. In the Validation Cohort (sDSP, n = 16; non-DSP, n = 20; healthy controls, n = 15), significant upregulation of miR-455-3p was also observed in the sDSP group. Bioinformatic analyses revealed that miR-455-3p targeted multiple host genes implicated in peripheral nerve maintenance, including nerve growth factor (NGF) and related genes. Transfection of cultured human dorsal root ganglia with miR-455-3p showed a concentration-dependent reduction in neuronal β-III tubulin expression. Human neurons transfected with miR-455-3p demonstrated reduced neurite outgrowth and NGF expression, which was reversed by cotreatment with an anti-miR-455-3p antagomir. miR-455-3p represents a potential biomarker for HIV-associated sDSP and might also exert pathogenic effects leading to sDSP.
Affiliation(s)
- Eugene L Asahchop
  - Department of Medicine (Neurology), University of Alberta, Edmonton, Alberta, Canada
- William G Branton
  - Department of Medicine (Neurology), University of Alberta, Edmonton, Alberta, Canada
- Anand Krishnan
  - Department of Medicine (Neurology), University of Alberta, Edmonton, Alberta, Canada
- Patricia A Chen
  - Department of Medicine (Neurology), University of Alberta, Edmonton, Alberta, Canada
- Dong Yang
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada
- Linglong Kong
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada
- Douglas W Zochodne
  - Department of Medicine (Neurology), University of Alberta, Edmonton, Alberta, Canada; Neuroscience and Mental Health Institute, University of Alberta, Edmonton, Alberta, Canada
- Bruce J Brew
  - Departments of Neurology and HIV, St. Vincent's Hospital, and Peter Duncan Neurosciences Unit, St. Vincent's Centre for Applied Medical Research, University of New South Wales, Sydney, Australia
- M John Gill
  - Department of Medicine, University of Calgary, Calgary, Alberta, Canada
- Christopher Power
  - Department of Medicine (Neurology), University of Alberta, Edmonton, Alberta, Canada; Neuroscience and Mental Health Institute, University of Alberta, Edmonton, Alberta, Canada; Department of Medicine, University of Calgary, Calgary, Alberta, Canada

3. Yanamala N, Orandle MS, Kodali VK, Bishop L, Zeidler-Erdely PC, Roberts JR, Castranova V, Erdely A. Sparse Supervised Classification Methods Predict and Characterize Nanomaterial Exposures: Independent Markers of MWCNT Exposures. Toxicol Pathol 2017; 46:14-27. [PMID: 28934917; DOI: 10.1177/0192623317730575]
Abstract
Recent experimental evidence indicates significant pulmonary toxicity of multiwalled carbon nanotubes (MWCNTs), such as inflammation, interstitial fibrosis, granuloma formation, and carcinogenicity. Although numerous studies have explored the adverse potential of various CNTs, their comparability is often limited because of differences in administered dose, physicochemical characteristics, exposure methods, and end points monitored. Here, we addressed the problem with a sparse classification method, a supervised machine learning approach that can reduce the noise contained in redundant variables when discriminating between MWCNT-exposed and MWCNT-unexposed groups. A panel of proteins measured in bronchoalveolar lavage fluid (BAL) samples was used to predict exposure to various MWCNTs and to determine markers attributable to MWCNT exposure and toxicity in mice. Using a sparse support vector machine-based classification technique, we identified a small subset of proteins that clearly distinguished each exposure. Macrophage-derived chemokine (MDC/CCL22), in particular, was associated with various MWCNT exposures independently of the exposure method employed, that is, oropharyngeal aspiration versus inhalation. Sustained expression of some of the selected protein markers also suggests their potential role in MWCNT-induced toxicity and provides hypotheses for future mechanistic studies. Such approaches can be used more broadly in nanomaterial risk profiling studies to evaluate decisions related to dose/time-response relationships that could delineate experimental variables from exposure markers.
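The abstract names a sparse support vector machine but not its exact formulation. A common way to obtain sparsity is an L1 penalty on the weights of a linear SVM, which drives uninformative weights to exactly zero. The Python sketch below fits such a model by proximal gradient descent (ISTA) on the squared hinge loss; the loss, the solver, the parameters `lam` and `lr`, the absence of an intercept, the helper name `sparse_linear_svm`, and the toy data are all assumptions for illustration, not the study's settings.

```python
from typing import List, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,
    lr: float = 0.1,
    n_iter: int = 500,
) -> List[float]:
    """Hypothetical helper: L1-penalised linear SVM with squared hinge loss,
    no intercept, fitted by proximal gradient descent (ISTA). Labels are +1/-1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            slack = max(0.0, 1.0 - margin)
            # Gradient of slack**2 / n with respect to w_j is -2 * slack * y * x_j / n
            for j in range(d):
                grad[j] += -2.0 * slack * yi * xi[j] / n
        # Gradient step on the smooth loss, then the L1 shrinkage that zeroes weights
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Toy data (hypothetical): feature 0 tracks the label, feature 1 is unrelated,
# feature 2 is only weakly related to the label.
y = [1, -1, 1, -1, 1, -1, 1, -1]
X = [
    [float(label), f1, f2]
    for label, f1, f2 in zip(
        y,
        [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0],
        [0.1, 0.1, 0.1, -0.1, 0.1, 0.1, -0.1, -0.1],
    )
]
weights = sparse_linear_svm(X, y)
print([round(w, 3) for w in weights])
```

On this toy data only the first weight survives; the weakly related feature is removed by the L1 shrinkage rather than merely down-weighted. For real work, scikit-learn's `LinearSVC(penalty="l1", loss="squared_hinge", dual=False)` fits the same kind of model.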
Affiliation(s)
- Naveena Yanamala
  - Exposure Assessment Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, West Virginia, USA
- Marlene S Orandle
  - Pathology & Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, West Virginia, USA
- Vamsi K Kodali
  - Pathology & Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, West Virginia, USA
- Lindsey Bishop
  - Pathology & Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, West Virginia, USA
- Patti C Zeidler-Erdely
  - Pathology & Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, West Virginia, USA
- Jenny R Roberts
  - Allergy and Clinical Immunology Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, West Virginia, USA
- Vincent Castranova
  - Department of Pharmaceutical Sciences, West Virginia University, Morgantown, West Virginia, USA
- Aaron Erdely
  - Pathology & Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, West Virginia, USA

4. Verification of Three-Phase Dependency Analysis Bayesian Network Learning Method for Maize Carotenoid Gene Mining. Biomed Res Int 2017; 2017:1813494. [PMID: 28828382; PMCID: PMC5554554; DOI: 10.1155/2017/1813494] [Received: 02/03/2017] [Accepted: 06/27/2017]
Abstract
Background and Objective: Mining the genes related to maize carotenoid components is important for improving the carotenoid content and quality of maize.
Methods: Using an entropy estimation method with a Gaussian kernel probability density estimator, we applied the three-phase dependency analysis (TPDA) Bayesian network structure learning method to construct a network of maize genes and carotenoid component traits.
Results: Using two discretization methods with different discretization values, we compared the learning effect and efficiency of 10 Bayesian network structure learning methods. The method was verified and analyzed on a maize dataset from a global germplasm collection of 527 elite inbred lines.
Conclusions: The results confirmed the effectiveness of the TPDA method, which significantly outperformed the other 9 Bayesian network learning methods. It is an efficient method for mining genes associated with maize carotenoid component traits. The parameters obtained in these experiments will help carry out practical gene mining effectively in the future.
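TPDA decides which edges to keep by thresholding (conditional) mutual information between variables, and mutual information can be written in terms of entropies, for example I(X;Y) = H(X) + H(Y) - H(X,Y). The abstract says entropies were estimated with a Gaussian kernel density estimator but gives no further detail. The Python sketch below assumes the simple resubstitution estimator for one continuous variable, H ≈ -(1/n) Σ log f̂(x_i), with the normal-reference rule-of-thumb bandwidth h = 1.06·σ·n^(-1/5); the estimator, the bandwidth rule, the function names, and the sample values are all assumptions.

```python
import math
from statistics import stdev
from typing import Optional, Sequence


def normal_reference_bandwidth(xs: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth h = 1.06 * sample std * n**(-1/5)."""
    return 1.06 * stdev(xs) * len(xs) ** (-1 / 5)


def kde_entropy(xs: Sequence[float], h: Optional[float] = None) -> float:
    """Resubstitution entropy estimate in nats: H = -(1/n) * sum_i log f(x_i),
    where f is a Gaussian kernel density estimate built from the same sample."""
    n = len(xs)
    if h is None:
        h = normal_reference_bandwidth(xs)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    total_log_density = 0.0
    for xi in xs:
        density = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in xs)
        total_log_density += math.log(density)
    return -total_log_density / n


sample = [-2.0, -1.0, -0.5, 0.0, 0.3, 0.8, 1.5, 2.2]
wider = [2.0 * x for x in sample]
# Doubling the spread should raise differential entropy by log(2) nats.
print(kde_entropy(sample), kde_entropy(wider) - kde_entropy(sample))
```

Because the bandwidth scales with the sample's standard deviation, the estimate behaves like a differential entropy under rescaling: doubling every value adds exactly log 2, which is a quick sanity check on the implementation.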

5. García V, Sánchez JS, Cleofas-Sánchez L, Ochoa-Domínguez HJ, López-Orozco F. An Insight on the ‘Large G, Small n’ Problem in Gene-Expression Microarray Classification. Pattern Recognition and Image Analysis 2017. [DOI: 10.1007/978-3-319-58838-4_53]

6. Martins M, Santos C, Costa L, Frizera A. Feature reduction and multi-classification of different assistive devices according to the gait pattern. Disabil Rehabil Assist Technol 2015; 11:202-18. [PMID: 26337072; DOI: 10.3109/17483107.2015.1079652]
Abstract
Total knee arthroplasty (TKA) is a surgical procedure used in patients with osteoarthritis to improve their condition. How gait patterns differ from patient to patient, and how they are influenced by the prescribed assistive device (AD), is still not understood; this article addresses that question. A standard walker, crutches and a rollator were tested. Symmetry indexes of spatiotemporal and postural control features were calculated. Different feature selection techniques were investigated to select the features that best discriminate among the ADs, and classification was performed with a multi-class support vector machine. The results showed that the rollator provides the most symmetrical gait and crutches the least. For postural control parameters, the standard walker is the most stable AD and crutches the least stable. This means that different ADs should be used depending on the patient's problem and recovery goal. After a set of 16 important features was selected through correlation, it was demonstrated that they provide important quantitative information about functional capacity that is not captured by velocity, cadence and clinical scales. They were also capable of distinguishing the gait patterns influenced by each AD, showing that each patient has different needs during recovery.
Implications for Rehabilitation
- Understanding how gait patterns of post-surgical patients differ from person to person, and how they are influenced by the type of device prescribed during recovery, might help in physical therapy. Research specifically addressing these issues is still missing.
- Inter-limb asymmetry and postural control features can be evaluated in an outpatient setting, supplying important additional information about the individual gait pattern that is not captured by gait velocity, cadence and the scales usually used.
- The features calculated in this study provide information complementary to gait velocity, cadence and clinical scales for assessing the functional capacity of patients who have undergone TKA.
- The selected parameters constitute a new clinical tool for tracking the evolution of patients' recovery after TKA.
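The abstract mentions symmetry indexes but not their formula. One widely used definition expresses the right-left difference of a gait parameter as a percentage of the two sides' mean, SI = (X_R - X_L) / (0.5 · (X_R + X_L)) × 100, where 0 means perfect symmetry. The Python sketch below assumes that definition; the function name `symmetry_index` and the step times are hypothetical, and the study may have used a different index.

```python
def symmetry_index(right: float, left: float) -> float:
    """Percentage symmetry index: 0 means perfectly symmetric; the sign shows
    which side has the larger value (positive means right > left)."""
    mean = 0.5 * (right + left)
    if mean == 0:
        raise ValueError("mean of right and left values must be non-zero")
    return (right - left) / mean * 100.0


# Hypothetical right and left step times (s) for one patient with one device
print(round(symmetry_index(0.66, 0.60), 2))  # 9.52
```

Taking the absolute value gives a magnitude-only index, which suits comparisons such as "which device yields the most symmetrical gait" where the side does not matter.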
Affiliation(s)
- Anselmo Frizera
  - Electrical Engineering Department, Federal University of Espirito Santo, Vitória, Brazil

7. Vidyasagar M. Machine learning methods in the computational biology of cancer. Proc Math Phys Eng Sci 2014; 470:20140081. [PMID: 25002826; PMCID: PMC4032557; DOI: 10.1098/rspa.2014.0081] [Received: 01/30/2014] [Accepted: 03/25/2014]
Abstract
The objectives of this Perspective paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools to advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented and is applied to predict the time to tumour recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented.
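The Perspective presents its own sparse-regression algorithm, which the abstract does not describe. For orientation only, the Python sketch below shows the standard Lasso, which minimises (1/(2n))·||y - Xw||² + λ·||w||₁ by cyclic coordinate descent; it is a well-known baseline for sparse regression, not the paper's algorithm. The function names, λ (`lam`), and the data are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t (proximal operator of the L1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]], y: Sequence[float], lam: float, n_sweeps: int = 100
) -> List[float]:
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 one coordinate at a time.
    Assumes no all-zero columns and no intercept (centre y and X beforehand)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j
            rho = sum(
                row[j] * (yi - sum(row[k] * w[k] for k in range(d) if k != j))
                for row, yi in zip(X, y)
            ) / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


# Hypothetical centred design with orthogonal columns
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

With orthogonal columns the Lasso solution is simply the soft-thresholded least-squares coefficients (here 2.0 and 0.1), so λ = 0.5 shrinks the first coefficient to 1.5 and sets the weak second one to exactly zero.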
Affiliation(s)
- M Vidyasagar
  - Erik Jonsson School of Engineering and Computer Sciences, University of Texas at Dallas, 800 West Campbell Road, Richardson, TX 75080, USA

8. Zhou W, Dickerson JA. A novel class dependent feature selection method for cancer biomarker discovery. Comput Biol Med 2014; 47:66-75. [DOI: 10.1016/j.compbiomed.2014.01.014] [Received: 12/14/2013] [Revised: 01/23/2014] [Accepted: 01/28/2014]