1
Işık YE, Aydın Z. Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity. PeerJ 2023; 11:e15552. [PMID: 37404475 PMCID: PMC10317018 DOI: 10.7717/peerj.15552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 05/23/2023] [Indexed: 07/06/2023]
Abstract
Respiratory diseases are among the major health problems that place a burden on hospitals. Diagnosing infection and rapidly predicting severity without time-consuming clinical tests could help prevent the spread and progression of the disease, especially in countries where health systems have limited capacity. Personalized medicine studies involving statistics and computer technologies could help to address this need. In addition to individual studies, competitions are also held, such as the Dialogue for Reverse Engineering Assessment and Methods (DREAM) challenges, run by a community-driven organization with a mission to advance research in biology, bioinformatics, and biomedicine. One of these competitions was the Respiratory Viral DREAM Challenge, which aimed to develop early predictive biomarkers for respiratory virus infections. These efforts are promising; however, the prediction performance of the computational methods developed for detecting respiratory diseases still has room for improvement. In this study, we focused on improving the performance of predicting the infection and symptom severity of individuals infected with various respiratory viruses using gene expression data collected before and after exposure. The publicly available gene expression dataset GSE73072 in the Gene Expression Omnibus, containing samples exposed to four respiratory viruses (H1N1, H3N2, human rhinovirus (HRV), and respiratory syncytial virus (RSV)), was used as input data. Various preprocessing methods and machine learning algorithms were implemented and compared to achieve the best prediction performance.
The experimental results showed that the proposed approaches achieved an area under the precision-recall curve (AUPRC) of 0.9746 for infection (i.e., shedding) prediction (SC-1), an AUPRC of 0.9182 for symptom class prediction (SC-2), and a Pearson correlation of 0.6733 for symptom score prediction (SC-3), outperforming the best leaderboard scores of the Respiratory Viral DREAM Challenge (a 4.48% improvement for SC-1, a 13.68% improvement for SC-2, and a 13.98% improvement for SC-3). Additionally, over-representation analysis (ORA), which is a statistical method for objectively determining whether certain genes are more prevalent in pre-defined sets such as pathways, was applied using the most significant genes selected by feature selection methods. The results show that pathways associated with the 'adaptive immune system' and 'immune disease' are strongly linked to pre-infection and symptom development. These findings contribute to our knowledge about predicting respiratory infections and are expected to facilitate the development of future studies that concentrate on predicting not only infections but also the associated symptoms.
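The over-representation analysis mentioned in the abstract is commonly carried out as a one-sided hypergeometric test. The following is a minimal sketch under that assumption; the abstract does not say which test or tool the authors used, and the function name `ora_p_value` and the gene counts in the example are hypothetical.

```python
from math import comb


def ora_p_value(universe_size: int, pathway_size: int,
                selected_size: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes in the background set (assumed input)
    pathway_size:  number of background genes annotated to the pathway
    selected_size: number of genes chosen by feature selection
    overlap:       number of selected genes that fall in the pathway
    """
    upper = min(pathway_size, selected_size)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and min(pathway, selected)")
    if pathway_size > universe_size or selected_size > universe_size:
        raise ValueError("pathway and selected sets must fit in the universe")

    # Number of ways to draw the selected genes from the whole universe.
    total = comb(universe_size, selected_size)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(ora_p_value(universe_size=20000, pathway_size=100,
                      selected_size=50, overlap=5))
```

A small p-value suggests that the pathway contains more of the selected genes than chance alone would explain. In practice the p-values for many pathways would also need a multiple-testing correction, which this sketch omits.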
Affiliation(s)
- Yunus Emre Işık
  - Department of Management Information Systems, Sivas Cumhuriyet University, Sivas, Turkey
- Zafer Aydın
  - Department of Computer Engineering, Abdullah Gül University, Kayseri, Turkey
2
Görmez Y, Sabzekar M, Aydın Z. IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction. Proteins 2021; 89:1277-1288. [PMID: 33993559 DOI: 10.1002/prot.26149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2021] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 11/10/2022]
Abstract
There is a close relationship between the tertiary structure and the function of a protein. One of the important steps in determining the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications including PSSP. In this article, a novel deep learning model based on a convolutional neural network and a graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, and physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model, IGPRED, obtained Q3 accuracies of 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% on the CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
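The Q3 accuracy reported above is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the reference label. A minimal sketch of that metric follows; the function name `q3_accuracy` and the example strings are illustrative assumptions, not taken from the IGPRED code.

```python
def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of positions where the predicted H/E/C label is correct."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have the same length")
    if not true_ss:
        raise ValueError("sequences must not be empty")

    # Q3 is defined over the three-state alphabet only.
    unexpected = (set(true_ss) | set(pred_ss)) - set("HEC")
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")

    # Count per-residue agreements and express them as a percentage.
    matches = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * matches / len(true_ss)


if __name__ == "__main__":
    # Hypothetical nine-residue example with one wrong label (8 of 9 correct).
    print(round(q3_accuracy("HHHEEECCC", "HHHEEECCH"), 2))
```

Benchmark figures are usually computed over all residues in a dataset, so a per-dataset Q3 would pool the matches across proteins rather than average per-protein scores; which convention the paper follows is not stated in the abstract.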
Affiliation(s)
- Yasin Görmez
  - Faculty of Economics and Administrative Sciences, Management Information Systems, Sivas Cumhuriyet University, Sivas, Turkey
- Mostafa Sabzekar
  - Department of Computer Engineering, Birjand University of Technology, Birjand, Iran
- Zafer Aydın
  - Engineering Faculty, Computer Engineering Department, Abdullah Gül University, Kayseri, Turkey
3
Sieberts SK, Schaff J, Duda M, Pataki BÁ, Sun M, Snyder P, Daneault JF, Parisi F, Costante G, Rubin U, Banda P, Chae Y, Chaibub Neto E, Dorsey ER, Aydın Z, Chen A, Elo LL, Espino C, Glaab E, Goan E, Golabchi FN, Görmez Y, Jaakkola MK, Jonnagaddala J, Klén R, Li D, McDaniel C, Perrin D, Perumal TM, Rad NM, Rainaldi E, Sapienza S, Schwab P, Shokhirev N, Venäläinen MS, Vergara-Diaz G, Zhang Y, Wang Y, Guan Y, Brunner D, Bonato P, Mangravite LM, Omberg L. Crowdsourcing digital health measures to predict Parkinson's disease severity: the Parkinson's Disease Digital Biomarker DREAM Challenge. NPJ Digit Med 2021; 4:53. [PMID: 33742069 PMCID: PMC7979931 DOI: 10.1038/s41746-021-00414-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/16/2020] [Accepted: 02/08/2021] [Indexed: 12/16/2022]
Abstract
Consumer wearables and sensors are a rich source of data about patients' daily disease and symptom burden, particularly in the case of movement disorders like Parkinson's disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).
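The AUROC quoted for PD status can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition; it is an illustration with made-up labels and scores, not the challenge's scoring code, and it does not cover the AUPR figures.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive case, 0 for a negative case
    scores: model outputs, where higher means more likely positive
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Each positive/negative pair contributes 1 if ranked correctly,
    # 0.5 if tied, and 0 otherwise.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up example: three of the four positive/negative pairs are ordered correctly.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based formulation gives the same value more efficiently on large datasets.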
Affiliation(s)
- Marlena Duda
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Bálint Ármin Pataki
  - Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
- Jean-Francois Daneault
  - Dept of PM&R, Harvard Medical School, Spaulding Rehabilitation Hospital, Charlestown, MA, USA
  - Dept of Rehabilitation and Movement Sciences, Rutgers University, Newark, NJ, USA
- Federico Parisi
  - Dept of PM&R, Harvard Medical School, Spaulding Rehabilitation Hospital, Charlestown, MA, USA
  - Wyss Institute, Harvard University, Boston, MA, USA
- Gianluca Costante
  - Dept of PM&R, Harvard Medical School, Spaulding Rehabilitation Hospital, Charlestown, MA, USA
  - Wyss Institute, Harvard University, Boston, MA, USA
- Udi Rubin
  - Early Signal Foundation, 311 W 43rd Street, New York, NY, USA
- Peter Banda
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- E Ray Dorsey
  - Center for Health + Technology, University of Rochester, Rochester, NY, USA
- Zafer Aydın
  - Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Aipeng Chen
  - Prince of Wales Clinical School, UNSW Sydney, Sydney, Australia
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, Turku, Finland
- Carlos Espino
  - Early Signal Foundation, 311 W 43rd Street, New York, NY, USA
- Enrico Glaab
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Ethan Goan
  - School of Electrical Engineering and Robotics, Queensland University of Technology, Brisbane, QLD, Australia
- Yasin Görmez
  - Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Maria K Jaakkola
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Jitendra Jonnagaddala
  - School of Public Health and Community Medicine, UNSW Sydney, Sydney, Australia
  - WHO Collaborating Centre for eHealth, UNSW Sydney, Sydney, Australia
- Riku Klén
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, Turku, Finland
- Dongmei Li
  - Clinical and Translational Science Institute, University of Rochester Medical Center, Rochester, NY, USA
- Christian McDaniel
  - Artificial Intelligence, University of Georgia, Athens, GA, USA
  - Computer Science, University of Georgia, Athens, GA, USA
- Dimitri Perrin
  - School of Computer Science, Queensland University of Technology, Brisbane, QLD, Australia
- Nastaran Mohammadian Rad
  - Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
  - Fondazione Bruno Kessler (FBK), Via Sommarive 18, Povo, Trento, Italy
  - University of Trento, Trento, Italy
- Erin Rainaldi
  - Verily Life Sciences, 269 East Grand Ave, South San Francisco, CA, USA
- Stefano Sapienza
  - Dept of PM&R, Harvard Medical School, Spaulding Rehabilitation Hospital, Charlestown, MA, USA
- Patrick Schwab
  - Institute of Robotics and Intelligent Systems, ETH Zurich, Zurich, Switzerland
- Mikko S Venäläinen
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, Turku, Finland
- Gloria Vergara-Diaz
  - Dept of PM&R, Harvard Medical School, Spaulding Rehabilitation Hospital, Charlestown, MA, USA
- Yuqian Zhang
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yuanjia Wang
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Yuanfang Guan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Daniela Brunner
  - Early Signal Foundation, 311 W 43rd Street, New York, NY, USA
  - Dept. of Psychiatry, Columbia University, New York, NY, USA
- Paolo Bonato
  - Dept of PM&R, Harvard Medical School, Spaulding Rehabilitation Hospital, Charlestown, MA, USA
  - Wyss Institute, Harvard University, Boston, MA, USA
4
Fourati S, Talla A, Mahmoudian M, Burkhart JG, Klén R, Henao R, Yu T, Aydın Z, Yeung KY, Ahsen ME, Almugbel R, Jahandideh S, Liang X, Nordling TEM, Shiga M, Stanescu A, Vogel R, Pandey G, Chiu C, McClain MT, Woods CW, Ginsburg GS, Elo LL, Tsalik EL, Mangravite LM, Sieberts SK. A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection. Nat Commun 2018; 9:4418. [PMID: 30356117 PMCID: PMC6200745 DOI: 10.1038/s41467-018-06735-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/30/2018] [Accepted: 09/12/2018] [Indexed: 01/17/2023]
Abstract
The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveals little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.
Affiliation(s)
- Slim Fourati
  - Department of Pathology, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
- Aarthi Talla
  - Department of Pathology, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
- Mehrad Mahmoudian
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
  - Department of Future Technologies, University of Turku, FI-20014 Turku, Finland
- Joshua G Burkhart
  - Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, OR, 97239, USA
  - Laboratory of Evolutionary Genetics, Institute of Ecology and Evolution, University of Oregon, Eugene, OR, 97403, USA
- Riku Klén
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Ricardo Henao
  - Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
- Thomas Yu
  - Sage Bionetworks, Seattle, WA, 98121, USA
- Zafer Aydın
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
- Ka Yee Yeung
  - School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
- Mehmet Eren Ahsen
  - Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Reem Almugbel
  - School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
- Xiao Liang
  - School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
- Torbjörn E M Nordling
  - Department of Mechanical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
- Motoki Shiga
  - Department of Electrical, Electronic and Computer Engineering, Faculty of Engineering, Gifu University, Gifu, 501-1193, Japan
- Ana Stanescu
  - Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Computer Science, University of West Georgia, Carrolton, GA, 30116, USA
- Robert Vogel
  - Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598, USA
- Gaurav Pandey
  - Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Christopher Chiu
  - Section of Infectious Diseases and Immunity, Imperial College London, London, W12 0NN, UK
- Micah T McClain
  - Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
  - Medical Service, Durham VA Health Care System, Durham, NC, 27705, USA
  - Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Christopher W Woods
  - Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
  - Medical Service, Durham VA Health Care System, Durham, NC, 27705, USA
  - Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Geoffrey S Ginsburg
  - Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Laura L Elo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Ephraim L Tsalik
  - Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
  - Emergency Medicine Service, Durham VA Health Care System, Durham, NC, 27705, USA