1
Yao H, Zhang N, Zhang R, Duan M, Xie T, Pan J, Peng E, Huang J, Zhang Y, Xu X, Xu H, Zhou F, Wang G. Severity Detection for the Coronavirus Disease 2019 (COVID-19) Patients Using a Machine Learning Model Based on the Blood and Urine Tests. Front Cell Dev Biol 2020; 8:683. [PMID: 32850809 PMCID: PMC7411005 DOI: 10.3389/fcell.2020.00683] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 06/11/2020] [Accepted: 07/06/2020] [Indexed: 01/08/2023] Open
Abstract
The recent outbreak of coronavirus disease 2019 (COVID-19) has posed serious challenges to society in China and across the world. COVID-19 induces pneumonia in human hosts and is highly contagious between people. COVID-19 patients may develop severe symptoms, and some die of major organ failure. This study used machine learning algorithms to build a COVID-19 severity detection model. A support vector machine (SVM) demonstrated promising detection accuracy after 32 features were found to be significantly associated with COVID-19 severity. These 32 features were further screened for inter-feature redundancy. The final SVM model was trained on 28 features and achieved an overall accuracy of 0.8148. This work may facilitate estimating the risk that COVID-19 patients will develop severe symptoms. The 28 severity-associated biomarkers may also be investigated for the underlying mechanisms by which they are involved in COVID-19 infection.
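The abstract above does not say how the 32 features were screened for redundancy before 28 were kept. As a purely illustrative sketch, one common approach is a greedy filter on pairwise Pearson correlation. The correlation measure, the 0.9 threshold, and the feature names below are all assumptions made for the example, not details taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def drop_redundant(features, threshold=0.9):
    """Greedy redundancy filter (assumed method, not the paper's).

    `features` maps a feature name to its values across patients, listed in
    priority order. A feature is kept only if its absolute correlation with
    every feature already kept does not exceed `threshold`.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: "marker_b" is a rescaled copy of "marker_a".
toy = {
    "marker_a": [1, 2, 3, 4],
    "marker_b": [2, 4, 6, 8],
    "marker_c": [1, -1, 1, -1],
}
selected = drop_redundant(toy)
print(selected)
```

The surviving features would then be passed to the classifier. Because the filter is greedy, the order in which features are listed decides which member of a correlated pair is retained.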
Affiliation(s)
- Haochen Yao
  - Department of Pathogenobiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medical Science, Jilin University, Changchun, China
- Nan Zhang
  - The First Hospital of Jilin University, Jilin University, Changchun, China
- Ruochi Zhang
  - BioKnow Health Informatics Lab, College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Meiyu Duan
  - BioKnow Health Informatics Lab, College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Tianqi Xie
  - School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
- Jiahui Pan
  - Department of Pathogenobiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medical Science, Jilin University, Changchun, China
- Ejun Peng
  - Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Juanjuan Huang
  - Department of Pathogenobiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medical Science, Jilin University, Changchun, China
- Yingli Zhang
  - The First Hospital of Jilin University, Jilin University, Changchun, China
- Xiaoming Xu
  - The First Hospital of Jilin University, Jilin University, Changchun, China
- Hong Xu
  - The First Hospital of Jilin University, Jilin University, Changchun, China
- Fengfeng Zhou
  - BioKnow Health Informatics Lab, College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Guoqing Wang
  - Department of Pathogenobiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medical Science, Jilin University, Changchun, China
2
A survey on single and multi omics data mining methods in cancer data classification. J Biomed Inform 2020; 107:103466. [DOI: 10.1016/j.jbi.2020.103466] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/11/2019] [Revised: 05/01/2020] [Accepted: 05/31/2020] [Indexed: 01/09/2023]
3
Mattesen TB, Rasmussen MH, Sandoval J, Ongen H, Árnadóttir SS, Gladov J, Martinez-Cardus A, Castro de Moura M, Madsen AH, Laurberg S, Dermitzakis ET, Esteller M, Andersen CL, Bramsen JB. MethCORR modelling of methylomes from formalin-fixed paraffin-embedded tissue enables characterization and prognostication of colorectal cancer. Nat Commun 2020; 11:2025. [PMID: 32332866 PMCID: PMC7181739 DOI: 10.1038/s41467-020-16000-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/03/2019] [Accepted: 04/02/2020] [Indexed: 12/29/2022] Open
Abstract
Transcriptional characterization and classification have the potential to resolve the inter-tumor heterogeneity of colorectal cancer and improve patient management. Yet, robust transcriptional profiling is difficult using formalin-fixed, paraffin-embedded (FFPE) samples, which complicates testing in clinical and archival material. We present MethCORR, an approach that allows uniform molecular characterization and classification of fresh-frozen and FFPE samples. MethCORR identifies genome-wide correlations between RNA expression and DNA methylation in fresh-frozen samples. This information is used to infer gene expression information in FFPE samples from their methylation profiles. MethCORR is here applied to methylation profiles from 877 fresh-frozen/FFPE samples, and comparative analysis identifies the same two subtypes in four independent cohorts. Furthermore, subtype-specific prognostic biomarkers that predict relapse-free survival better (HR = 2.66, 95% CI [1.67-4.22], P < 0.001, log-rank test) than UICC tumor, node, metastasis (TNM) staging and microsatellite instability status are identified and validated using DNA methylation-specific PCR. The MethCORR approach is general and may be similarly successful for other cancer types.
Grants
- R01 CA207467 NCI NIH HHS
- This research is supported by grants from the European Commission FP7 project SYSCOL (UE7-SYSCOL-258236), the Novo Nordisk Foundation (NNF16OC0023182), the Danish National Advanced Technology Foundation (056-2010-1), the John and Birthe Meyer Foundation, the Danish Council for Independent Research (Medical Sciences) (DFF - 0602-02128B, DFF – 4183-00619, DFF - 7016-00332B), the Danish Council for Strategic Research (1309-00006B), the Danish Cancer Society (R40-A1965_11_S2, R56-A3110-12-S2, R107-A7035, R133-A8520), the National Cancer Institute of the National Institutes of Health (R01 CA207467), the Aage and Johanne Louis-Hansen’s Foundation (17-2-0457), the Knud and Edith Eriksen’s Memorial Foundation, the Neye Foundation and the Manufacturer Einar Willumsen’s Memorial Foundation (6000073)
Affiliation(s)
- Trine B Mattesen
  - Department of Molecular Medicine, Aarhus University Hospital, 8200, Aarhus, Denmark
- Mads H Rasmussen
  - Department of Molecular Medicine, Aarhus University Hospital, 8200, Aarhus, Denmark
- Juan Sandoval
  - Epigenomic Unit, Health Research Institute La Fe (ISSLaFe), Valencia, Spain
  - Biomarker and precision medicine Unit, Health Research Institute La Fe (ISSLaFe), Valencia, Spain
- Halit Ongen
  - Genetic Medicine and Development, University of Geneva Medical School-CMU, 1 Rue Michel-Servet, 1211, Geneva, Switzerland
- Sigrid S Árnadóttir
  - Department of Molecular Medicine, Aarhus University Hospital, 8200, Aarhus, Denmark
- Josephine Gladov
  - Department of Molecular Medicine, Aarhus University Hospital, 8200, Aarhus, Denmark
- Anna Martinez-Cardus
  - Badalona Applied Research Group in Oncology (B-ARGO), Germans Trias i Pujol Research Institute (IGTP), Badalona, Barcelona, Catalonia, Spain
  - Medical Oncology Service, Institute Catalan of Oncology (ICO), Badalona, Barcelona, Catalonia, Spain
- Manuel Castro de Moura
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Barcelona, Catalonia, Spain
- Anders H Madsen
  - Department of Surgery, Hospitalsenheden Vest, 7400, Herning, Denmark
- Søren Laurberg
  - Colorectal Surgical Unit, Department of Surgery, Aarhus University Hospital, 8200, Aarhus, Denmark
- Emmanouil T Dermitzakis
  - Genetic Medicine and Development, University of Geneva Medical School-CMU, 1 Rue Michel-Servet, 1211, Geneva, Switzerland
- Manel Esteller
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Barcelona, Catalonia, Spain
  - Centro de Investigacion Biomedica en Red Cancer (CIBERONC), Madrid, Spain
  - Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
  - Physiological Sciences Department, School of Medicine and Health Sciences, University of Barcelona (UB), Barcelona, Catalonia, Spain
- Claus L Andersen
  - Department of Molecular Medicine, Aarhus University Hospital, 8200, Aarhus, Denmark
- Jesper B Bramsen
  - Department of Molecular Medicine, Aarhus University Hospital, 8200, Aarhus, Denmark
4
Neums L, Meier R, Koestler DC, Thompson JA. Improving survival prediction using a novel feature selection and feature reduction framework based on the integration of clinical and molecular data. Pacific Symposium on Biocomputing 2020; 25:415-426. [PMID: 31797615 PMCID: PMC6941850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The accurate prediction of a cancer patient's risk of progression or death can guide clinicians in the selection of treatment and help patients in planning personal affairs. Predictive models based on patient-level data represent a tool for determining risk. Ideally, predictive models will use multiple sources of data (e.g., clinical, demographic, molecular, etc.). However, there are many challenges associated with data integration, such as overfitting and redundant features. In this paper we aim to address those challenges through the development of a novel feature selection and feature reduction framework that can handle correlated data. Our method begins by computing a survival distance score for gene expression, which in combination with a score for clinical independence, results in the selection of highly predictive genes that are non-redundant with clinical features. The survival distance score is a measure of variation of gene expression over time, weighted by the variance of the gene expression over all patients. Selected genes, in combination with clinical data, are used to build a predictive model for survival. We benchmark our approach against commonly used methods, namely lasso- as well as ridge-penalized Cox proportional hazards models, using three publicly available cancer data sets: kidney cancer (521 samples), lung cancer (454 samples) and bladder cancer (335 samples). Across all data sets, our approach built on the training set outperformed the clinical data alone in the test set in terms of predictive power with a c.Index of 0.773 vs 0.755 for kidney cancer, 0.695 vs 0.664 for lung cancer and 0.648 vs 0.636 for bladder cancer. 
Further, we were able to show increased predictive performance of our method compared to lasso-penalized models fit to both gene expression and clinical data, which had a c.Index of 0.767, 0.677, and 0.645, as well as increased or comparable predictive power compared to ridge models, which had a c.Index of 0.773, 0.668 and 0.650 for the kidney, lung, and bladder cancer data sets, respectively. Therefore, our score for clinical independence improves prognostic performance as compared to modeling approaches that do not consider combining non-redundant data. Future work will concentrate on optimizing the survival distance score in order to achieve improved results for all types of cancer.
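The c.Index values quoted in this abstract are concordance indices. As a minimal sketch of what that metric measures, the function below computes Harrell's concordance index in plain Python. The patient data are invented for illustration, and skipping pairs with tied survival times is a simplifying assumption; the abstract does not describe how the authors computed the statistic.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # `first` is the patient with the shorter observed time
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0  # higher risk matched the earlier event
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, all events observed.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 1, 1]
perfect = concordance_index(toy_times, toy_events, [4, 3, 2, 1])
reversed_ranking = concordance_index(toy_times, toy_events, [1, 2, 3, 4])
print(perfect, reversed_ranking)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences such as 0.773 vs 0.755 are modest in absolute terms.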
Affiliation(s)
- Lisa Neums
  - Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
  - University of Kansas Cancer Center, 8919 Parallel Parkway, Suite 326, Kansas City, KS 66112, USA
5
Yao H, Zhang N, Zhang R, Duan M, Xie T, Pan J, Peng E, Huang J, Zhang Y, Xu X, Xu H, Zhou F, Wang G. Severity Detection for the Coronavirus Disease 2019 (COVID-19) Patients Using a Machine Learning Model Based on the Blood and Urine Tests. Front Cell Dev Biol 2020. [PMID: 32850809 DOI: 10.2139/ssrn.3564426] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 04/30/2023] Open
6
López de Maturana E, Alonso L, Alarcón P, Martín-Antoniano IA, Pineda S, Piorno L, Calle ML, Malats N. Challenges in the Integration of Omics and Non-Omics Data. Genes (Basel) 2019; 10:238. [PMID: 30897838 PMCID: PMC6471713 DOI: 10.3390/genes10030238] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 01/02/2019] [Revised: 03/05/2019] [Accepted: 03/14/2019] [Indexed: 11/16/2022] Open
Abstract
Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented into clinics or public health domains. Clinical/epidemiological data tend to explain most of the variation of health-related traits, and their joint modeling with omics data is crucial to increase the algorithm’s predictive ability. Only a small number of published studies performed a “real” integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration concern the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts at OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies towards OnO data integration.
Affiliation(s)
- Evangelina López de Maturana
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Lola Alonso
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Pablo Alarcón
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Isabel Adoración Martín-Antoniano
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Silvia Pineda
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Lucas Piorno
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- M Luz Calle
  - Biosciences Department, University of Vic-Central University of Catalonia, Carrer de la Laura 13, 08570 Vic, Spain.
- Núria Malats
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.