1
Nawaz MS, Nawaz MZ, Junyi Z, Fournier-Viger P, Qu JF. Exploiting the sequential nature of genomic data for improved analysis and identification. Comput Biol Med 2024; 183:109307. [PMID: 39488052 DOI: 10.1016/j.compbiomed.2024.109307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 09/18/2024] [Accepted: 10/18/2024] [Indexed: 11/04/2024]
Abstract
Genomic data is growing exponentially, posing new challenges for sequence analysis and classification, particularly for managing and understanding harmful new viruses that may later cause pandemics. Recent genome sequence classification models yield promising performance. However, the majority of them do not consider the sequential arrangement of nucleotides and amino acids, a critical aspect for uncovering their inherent structure and function. To overcome this, we introduce GenoAnaCla, a novel approach for analyzing and classifying genome sequences, based on sequential pattern mining (SPM). The proposed approach first constructs and preprocesses datasets comprising RNA virus genome sequences in three formats: nucleotide, coding region, and protein. Then, to capture sequential features for the analysis and classification of viruses, GenoAnaCla extracts frequent sequential patterns and rules in three forms and in codons. Eight classifiers are utilized, and their effectiveness is assessed by employing a variety of evaluation metrics. A performance comparison demonstrates that the suggested approach surpasses the current state-of-the-art genome sequence classification and detection techniques with a 3.18% performance increase in accuracy on average.
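As a rough illustration of working with the coding-region format mentioned in this abstract, the sketch below splits a coding sequence into codons and counts them. The function names and the example sequence are hypothetical, and this is not the GenoAnaCla implementation, only a minimal example of treating codons as the units of a sequence.

```python
from collections import Counter


def split_codons(coding_seq: str) -> list[str]:
    """Split a coding-region sequence into non-overlapping triplets (codons).

    Trailing bases that do not fill a complete codon are dropped.
    """
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_counts(coding_seq: str) -> Counter:
    """Count how often each codon occurs in the sequence."""
    return Counter(split_codons(coding_seq))


# Made-up example sequence, for illustration only.
example = "AUGGCUAUGGCUUAA"
codons = split_codons(example)
counts = codon_counts(example)
```

Codon lists produced this way could then be passed to a pattern-mining step in the same manner as raw nucleotide strings.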
Affiliation(s)
- M Saqib Nawaz
  - College of Computer Science and Software Engineering, Shenzhen University, China.
- M Zohaib Nawaz
  - College of Computer Science and Software Engineering, Shenzhen University, China; Faculty of Computing and Information Technology, Department of Computer Science, University of Sargodha, Pakistan.
- Zhang Junyi
  - College of Computer Science and Software Engineering, Shenzhen University, China.
- Jun-Feng Qu
  - School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, Hubei, China.
2
Gugulothu P, Bhukya R. Coot-Lion optimized deep learning algorithm for COVID-19 point mutation rate prediction using genome sequences. Comput Methods Biomech Biomed Engin 2024; 27:1410-1429. [PMID: 37668061 DOI: 10.1080/10255842.2023.2244109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 07/08/2023] [Accepted: 07/28/2023] [Indexed: 09/06/2023]
Abstract
In this study, a deep quantum neural network (Deep QNN) based on the Lion-based Coot algorithm (LBCA) is employed to predict COVID-19. First, the genome sequences are subjected to feature extraction. The extracted features are then fused using the Bray-Curtis distance and a deep belief network (DBN). Lastly, the Deep QNN is used to predict COVID-19. The LBCA is obtained by integrating the Coot algorithm with the lion optimization algorithm (LOA). The COVID-19 predictions are made with mutation points. The LBCA-based Deep QNN achieved a testing accuracy of 0.941, a true positive rate of 0.931, and a false positive rate of 0.869.
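The abstract names the Bray-Curtis distance but does not say how it enters the feature-fusion step. The sketch below only shows the standard definition of the distance between two non-negative feature vectors, sum(|u_i - v_i|) / sum(|u_i + v_i|). The function name and the example vectors are illustrative assumptions, not the authors' code.

```python
def bray_curtis(u, v):
    """Bray-Curtis distance between two equal-length numeric vectors.

    Defined as sum(|u_i - v_i|) / sum(|u_i + v_i|). Returns 0.0 when the
    denominator is zero (both vectors are all zeros), a convention chosen
    here for convenience.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical feature vectors, for illustration only.
distance = bray_curtis([1, 2], [3, 4])
```

For non-negative inputs the value lies between 0 (identical vectors) and 1 (no shared non-zero components), which makes it easy to use as a similarity score between feature sets.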
Affiliation(s)
- Praveen Gugulothu
  - Department of Computer Science and Engineering, National Institute of Technology Warangal, Hanamkonda, Telangana 506004, India
- Raju Bhukya
  - Department of Computer Science and Engineering, National Institute of Technology Warangal, Hanamkonda, Telangana 506004, India
3
Dubey S, Verma DK, Kumar M. Real-time infectious disease endurance indicator system for scientific decisions using machine learning and rapid data processing. PeerJ Comput Sci 2024; 10:e2062. [PMID: 39145255 PMCID: PMC11323025 DOI: 10.7717/peerj-cs.2062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/25/2024] [Indexed: 08/16/2024]
Abstract
The SARS-CoV-2 virus, which induces an acute respiratory illness commonly referred to as COVID-19, was designated a pandemic by the World Health Organization due to its highly infectious nature and the associated public health risks it poses globally. Identifying the critical factors for predicting mortality is essential for improving patient therapy. Unlike other data types, such as computed tomography scans, X-rays, and ultrasounds, basic blood test results are widely accessible and can aid in predicting mortality. The present research advocates the use of machine learning (ML) methodologies for predicting the likelihood of mortality from infectious diseases such as COVID-19 by leveraging blood test data. Age, LDH (lactate dehydrogenase), lymphocytes, neutrophils, and hs-CRP (high-sensitivity C-reactive protein) are five extremely potent characteristics that, when combined, can accurately predict mortality in 96% of cases. By combining XGBoost feature importance with neural network classification, the optimal approach can predict mortality from infectious disease with exceptional accuracy, achieving a precision rate of 90% up to 16 days before the event. The suggested model's excellent predictive performance and practicality were confirmed through testing with three instances that depended on the days to the outcome. By carefully analyzing and identifying patterns in these significant biomarkers, insightful information has been obtained for simple application. This study offers potential remedies that could accelerate decision-making for targeted medical treatments within healthcare systems, using a timely, accurate, and reliable method.
Affiliation(s)
- Shivendra Dubey
  - Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
- Dinesh Kumar Verma
  - Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
- Mahesh Kumar
  - Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
4
Nawaz MS, Fournier-Viger P, Nawaz S, Zhu H, Yun U. SPM4GAC: SPM based approach for genome analysis and classification of macromolecules. Int J Biol Macromol 2024; 266:130984. [PMID: 38513910 DOI: 10.1016/j.ijbiomac.2024.130984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/16/2024] [Indexed: 03/23/2024]
Abstract
Genome sequence analysis and classification play critical roles in properly understanding an organism's main characteristics, functionalities, and changing (evolving) nature. However, the rapid expansion of genomic data makes genome sequence analysis and classification a challenging task due to the high computational requirements, proper management, and understanding of genomic data. Recently proposed models yielded promising results for the task of genome sequence classification. Nevertheless, these models often ignore the sequential nature of nucleotides, which is crucial for revealing their underlying structure and function. To address this limitation, we present SPM4GAC, a sequential pattern mining (SPM)-based framework to analyze and classify the macromolecule genome sequences of viruses. First, a large dataset containing the genome sequences of various RNA viruses is developed and transformed into a suitable format. On the transformed dataset, algorithms for SPM are used to identify frequent sequential patterns of nucleotide bases. The obtained frequent sequential patterns of bases are then used as features to classify different viruses. Ten classifiers are employed, and their performance is assessed by using several evaluation measures. Finally, a performance comparison of SPM4GAC with state-of-the-art methods for genome sequence classification/detection reveals that SPM4GAC performs better than those methods.
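The pipeline described above (mine frequent patterns of bases, then use them as classification features) can be sketched in a few lines. This is a simplified illustration under stated assumptions: it only considers contiguous patterns, whereas general sequential pattern mining also allows gaps, and the function names, the `max_len` and `min_support` parameters, and the toy sequences are all hypothetical rather than taken from SPM4GAC.

```python
from collections import Counter


def frequent_patterns(sequences, max_len=3, min_support=0.5):
    """Return contiguous patterns (length 1..max_len) that occur in at
    least a min_support fraction of the input sequences, sorted by name."""
    support = Counter()
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        for k in range(1, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        support.update(seen)
    threshold = min_support * len(sequences)
    return sorted(p for p, count in support.items() if count >= threshold)


def to_features(seq, patterns):
    """Encode a sequence as a binary vector: 1 if the pattern occurs in it."""
    return [1 if p in seq else 0 for p in patterns]


# Toy data, for illustration only.
toy_sequences = ["ACGU", "ACGA", "UUGA"]
patterns = frequent_patterns(toy_sequences, max_len=2, min_support=0.6)
features = [to_features(s, patterns) for s in toy_sequences]
```

The resulting binary vectors could be fed to any standard classifier; the abstract reports using ten of them but does not tie the approach to a particular one.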
Affiliation(s)
- M Saqib Nawaz
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
- Shoaib Nawaz
  - Department of Pharmacy, The University of Lahore, Sargodha Campus, Pakistan.
- Haowei Zhu
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
- Unil Yun
  - Sejong University, Seoul, Republic of Korea.
5
Ghosh A, Larrondo-Petrie MM, Pavlovic M. Revolutionizing Vaccine Development for COVID-19: A Review of AI-Based Approaches. INFORMATION 2023; 14:665. [DOI: 10.3390/info14120665] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/03/2025]
Abstract
The development of COVID-19 vaccines is rapidly being revolutionized using artificial intelligence-based technologies. Small compounds, peptides, and epitopes are collected to develop new therapeutics. These substances can also guide artificial intelligence-based modeling, screening, or creation. Machine learning techniques are used to leverage pre-existing data for COVID-19 drug detection and vaccine advancement, while artificial intelligence-based models are used for these purposes. Models based on artificial intelligence are used to evaluate and recognize the best candidate targets for future therapeutic development. Artificial intelligence-based strategies can be used to address issues with the safety and efficacy of COVID-19 vaccine candidates, as well as issues with manufacturing, storage, and logistics. Because antigenic peptides are effective at eliciting immune responses, artificial intelligence algorithms can assist in identifying the most promising COVID-19 vaccine candidates. Following COVID-19 vaccination, the first phase of the vaccine-induced immune response occurs when major histocompatibility complex (MHC) class II molecules (which typically bind peptides of 12–25 amino acids) recognize antigenic peptides. Therefore, AI-based models are used to identify the best COVID-19 vaccine candidates and ensure the efficacy and safety of vaccine-induced immune responses. This study explores the use of artificial intelligence-based approaches to address logistics, manufacturing, storage, safety, and effectiveness issues associated with several COVID-19 vaccine candidates. Additionally, we will evaluate potential targets for next-generation treatments and examine the role that artificial intelligence-based models can play in identifying the most promising COVID-19 vaccine candidates, while also considering the effectiveness of antigenic peptides in triggering immune responses.
The aim of this project is to gain insights into how artificial intelligence-based approaches could revolutionize the development of COVID-19 vaccines and how they can be leveraged to address challenges associated with vaccine development. In this work, we highlight potential barriers and solutions and focus on recent improvements in using artificial intelligence to produce COVID-19 drugs and vaccines, as well as the prospects for intelligent training in COVID-19 treatment discovery.
Affiliation(s)
- Aritra Ghosh
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Maria M. Larrondo-Petrie
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mirjana Pavlovic
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
6
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023]
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Parimalan Rangan
  - ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Trushar Shah
  - International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
- Vivek Thakur
  - Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
- Sanjay Kalia
  - Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
- Sean Mayes
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Abhishek Rathore
  - Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
7
Okeibunor JC, Jaca A, Iwu-Jaja CJ, Idemili-Aronu N, Ba H, Zantsi ZP, Ndlambe AM, Mavundza E, Muneene D, Wiysonge CS, Makubalo L. The use of artificial intelligence for delivery of essential health services across WHO regions: a scoping review. Front Public Health 2023; 11:1102185. [PMID: 37469694 PMCID: PMC10352788 DOI: 10.3389/fpubh.2023.1102185] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/18/2022] [Accepted: 06/19/2023] [Indexed: 07/21/2023]
Abstract
Background: Artificial intelligence (AI) is a broad branch of computer science aimed at constructing machines capable of simulating and performing tasks usually done by human beings. The aim of this scoping review is to map existing evidence on the use of AI in the delivery of medical care. Methods: We searched PubMed and Scopus in March 2022, screened identified records for eligibility, assessed full texts of potentially eligible publications, and extracted data from included studies in duplicate, resolving differences through discussion, arbitration, and consensus. We then conducted a narrative synthesis of extracted data. Results: Several AI methods have been used to detect, diagnose, classify, manage, treat, and monitor the prognosis of various health issues. These AI models have been used in various health conditions, including communicable diseases, non-communicable diseases, and mental health. Conclusions: Presently available evidence shows that AI models, predominantly deep learning and machine learning, can significantly advance medical care delivery regarding the detection, diagnosis, management, and prognosis monitoring of different illnesses.
Affiliation(s)
- Anelisa Jaca
  - Cochrane South Africa, South African Medical Research Council, Cape Town, South Africa
- Ngozi Idemili-Aronu
  - Department of Sociology/Anthropology, University of Nigeria, Nsukka, Nigeria
- Housseynou Ba
  - World Health Organization Regional Office for Africa, Brazzaville, Republic of Congo
- Zukiswa Pamela Zantsi
  - Cochrane South Africa, South African Medical Research Council, Cape Town, South Africa
- Asiphe Mavis Ndlambe
  - Cochrane South Africa, South African Medical Research Council, Cape Town, South Africa
- Edison Mavundza
  - World Health Organization Regional Office for Africa, Brazzaville, Republic of Congo
- Charles Shey Wiysonge
  - Cochrane South Africa, South African Medical Research Council, Cape Town, South Africa
  - HIV and Other Infectious Diseases Research Unit, South African Medical Research Council, Durban, South Africa
- Lindiwe Makubalo
  - World Health Organization Regional Office for Africa, Brazzaville, Republic of Congo
8
Nandhini K, Tamilpavai G. An Optimal Stacked ResNet-BiLSTM-Based Accurate Detection and Classification of Genetic Disorders. Neural Process Lett 2023:1-22. [PMID: 37359129 PMCID: PMC10196306 DOI: 10.1007/s11063-023-11195-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/13/2023] [Indexed: 06/28/2023]
Abstract
Genes are located inside the nucleus, and the genetic data is contained in deoxyribonucleic acid (DNA). A person's gene count ranges from 20,000 to 30,000. Even a minor alteration to the DNA sequence can be harmful if it affects the cell's fundamental functions. As a result, the gene begins to act abnormally. The sorts of genetic abnormalities brought on by mutation include chromosomal disorders, complex disorders, and single-gene disorders. Therefore, a detailed diagnosis method is required. Thus, we proposed an Elephant Herd Optimization-Whale Optimization Algorithm (EHO-WOA) optimized Stacked ResNet-Bidirectional Long Short-Term Memory (ResNet-BiLSTM) model for detecting genetic disorders. Here, a hybrid EHO-WOA algorithm is presented to assess the Stacked ResNet-BiLSTM architecture's fitness. The ResNet-BiLSTM design uses the genotype and gene expression phenotype as input data. Furthermore, the proposed method identifies rare genetic disorders such as Angelman syndrome, Rett syndrome, and Prader-Willi syndrome. The developed model demonstrates its effectiveness with greater accuracy, recall, specificity, precision, and F1-score. Thus, a wide range of DNA deficiencies, including Prader-Willi syndrome, Marfan syndrome, Early Onset Morbid Obesity, Rett syndrome, and Angelman syndrome, are predicted accurately.
Affiliation(s)
- K. Nandhini
  - Department of Computer Science and Engineering, Anna University, Chennai, India
- G. Tamilpavai
  - Department of Computer Science and Engineering, Government College of Engineering, Tirunelveli, India
9
Nawaz MS, Fournier-Viger P, He Y, Zhang Q. PSAC-PDB: Analysis and classification of protein structures. Comput Biol Med 2023; 158:106814. [PMID: 36989742 DOI: 10.1016/j.compbiomed.2023.106814] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/08/2022] [Revised: 03/09/2023] [Accepted: 03/20/2023] [Indexed: 03/29/2023]
Abstract
This paper presents a novel framework, called PSAC-PDB, for analyzing and classifying protein structures from the Protein Data Bank (PDB). PSAC-PDB first finds, analyzes, and identifies protein structures in PDB that are similar to a protein structure of interest using a protein structure comparison tool. Second, the amino acid (AA) sequences of the identified protein structures (obtained from PDB), their aligned amino acids (AAA) and aligned secondary structure elements (ASSE) (obtained by structural alignment), and frequent AA (FAA) patterns (discovered by sequential pattern mining) are used for the reliable detection/classification of protein structures. Eleven classifiers are used and their performance is compared using six evaluation metrics. Results show that three classifiers perform well overall, and that FAA patterns can be used to efficiently classify protein structures in place of providing the whole AA sequences, AAA or ASSE. Furthermore, better classification results are obtained using the AAA of protein structures rather than AA sequences. PSAC-PDB also performed better than state-of-the-art approaches for SARS-CoV-2 genome sequence classification.
10
A systematic review of artificial intelligence-based COVID-19 modeling on multimodal genetic information. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2023; 179:1-9. [PMID: 36809830 PMCID: PMC9938959 DOI: 10.1016/j.pbiomolbio.2023.02.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/03/2022] [Revised: 02/07/2023] [Accepted: 02/12/2023] [Indexed: 02/21/2023]
Abstract
This study systematically reviews the Artificial Intelligence (AI) methods developed to resolve the critical process of COVID-19 gene data analysis, including diagnosis, prognosis, biomarker discovery, drug responsiveness, and vaccine efficacy. This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched the PubMed, Embase, Web of Science, and Scopus databases to identify relevant articles from January 2020 to June 2022. It includes the published studies of AI-based COVID-19 gene modeling extracted through relevant keyword searches in academic databases. This study included 48 articles discussing AI-based genetic studies for several objectives. Ten articles discuss COVID-19 gene modeling with computational tools, and five articles evaluated ML-based diagnosis with an observed accuracy of 97% on SARS-CoV-2 classification. The gene-based prognosis study reviewed three articles and found host biomarkers detecting COVID-19 progression with 90% accuracy. Twelve manuscripts reviewed the prediction models with various genome analysis studies, nine articles examined gene-based in silico drug discovery, and another nine investigated AI-based vaccine development models. This study compiled the novel coronavirus gene biomarkers and targeted drugs identified through ML approaches from published clinical studies. This review provided sufficient evidence to delineate the potential of AI in analyzing complex gene information for COVID-19 modeling on multiple aspects such as diagnosis, drug discovery, and disease dynamics. AI models had a substantial positive impact by enhancing the efficiency of the healthcare system during the COVID-19 pandemic.
11
Protecting Sensitive Data in the Information Age: State of the Art and Future Prospects. FUTURE INTERNET 2022. [DOI: 10.3390/fi14110302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
The present information age is characterized by an ever-increasing digitalization. Smart devices quantify our entire lives. These collected data provide the foundation for data-driven services called smart services. They are able to adapt to a given context and thus tailor their functionalities to the user’s needs. It is therefore not surprising that their main resource, namely data, is nowadays a valuable commodity that can also be traded. However, this trend does not only have positive sides, as the gathered data reveal a lot of information about various data subjects. To prevent uncontrolled insights into private or confidential matters, data protection laws restrict the processing of sensitive data. One key factor in this regard is user-friendly privacy mechanisms. In this paper, we therefore assess current state-of-the-art privacy mechanisms. To this end, we initially identify forms of data processing applied by smart services. We then discuss privacy mechanisms suited for these use cases. Our findings reveal that current state-of-the-art privacy mechanisms provide good protection in principle, but there is no compelling one-size-fits-all privacy approach. This leads to further questions regarding the practicality of these mechanisms, which we present in the form of seven thought-provoking propositions.
12
Hybrid CNN-LSTM and modified wild horse herd Model-based prediction of genome sequences for genetic disorders. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103840] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
13
Ahmad M, Ahmed I, Jeon G. A sustainable advanced artificial intelligence-based framework for analysis of COVID-19 spread. ENVIRONMENT, DEVELOPMENT AND SUSTAINABILITY 2022:1-16. [PMID: 35993085 PMCID: PMC9379242 DOI: 10.1007/s10668-022-02584-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2021] [Accepted: 04/21/2022] [Indexed: 06/15/2023]
Abstract
The idea of sustainability aims to provide a protected operating environment that supports present needs without risking the capacity of coming generations to satisfy their demands in the future. With the advent of artificial intelligence, big data, and the Internet of Things, there is a tremendous paradigm transformation in how environmental data are managed and handled for sustainable applications in smart cities and societies. The ongoing COVID-19 (Coronavirus Disease) pandemic continues to have a severe impact on the world population's health. A continuous rise in the number of positive cases has put much stress on governing organizations worldwide, and they are finding it challenging to handle the situation. Artificial intelligence methods can be extended quite efficiently to monitor the disease, predict the pandemic's growth, and outline policies and strategies to control its transmission or spread. The combination of healthcare with big data and machine learning methods can improve the quality of life by providing better care services and creating cost-effective systems. Researchers have been using these techniques to fight against the COVID-19 pandemic. This paper emphasizes the analysis of different factors and symptoms and presents a sustainable framework to predict and detect COVID-19. First, we collected a data set containing information on different COVID-19 symptoms. Then, we explored various machine learning algorithms, including Logistic Regression, Naive Bayes, Decision Tree, Random Forest Classifier, Extreme Gradient Boost, K-Nearest Neighbour, and Support Vector Machine, to predict and detect COVID-19 lab results using this symptom information. The model might help to predict and detect the long-term spread of a pandemic and implement advanced proactive measures.
The findings show that Logistic Regression and Support Vector Machine outperformed the other machine learning algorithms in terms of accuracy, achieving 97.66% and 98%, respectively.
Affiliation(s)
- Misbah Ahmad
  - Center of Excellence in Information Technology, Institute of Management Sciences, 1-A, Sector E-5, Phase VII, Peshawar, Hayatabad Pakistan
- Imran Ahmed
  - School of Computing and Information Science, Anglia Ruskin University, Cambridge East Road, Cambridge, CB1 1PT UK
- Gwanggil Jeon
  - Department of Embedded Systems Engineering, Incheon National University, Incheon, Korea
14
Harikrishnan NB, Pranay SY, Nagaraj N. Classification of SARS-CoV-2 viral genome sequences using Neurochaos Learning. Med Biol Eng Comput 2022; 60:2245-2255. [PMID: 35668230 PMCID: PMC9170350 DOI: 10.1007/s11517-022-02591-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/27/2021] [Accepted: 04/28/2022] [Indexed: 12/01/2022]
Abstract
The high spread rate of the SARS-CoV-2 virus has put researchers all over the world in a demanding situation. The need of the hour is to develop novel learning algorithms that can effectively learn a general pattern by training with fewer genome sequences of coronavirus. Learning from very few training samples is necessary and important during the beginning of a disease outbreak when sequencing data is limited. This is because a successful detection and isolation of patients can curb the spread of the virus. However, this poses a huge challenge for machine learning and deep learning algorithms as they require huge amounts of training data to learn the pattern and distinguish from other closely related viruses. In this paper, we propose a new paradigm, Neurochaos Learning (NL), for classification of coronavirus genome sequences that addresses this specific problem. NL is inspired by the empirical evidence of chaos and non-linearity at the level of neurons in biological neural networks. The average sensitivity, specificity and accuracy for NL are 0.998, 0.999 and 0.998 respectively for the multiclass classification problem (SARS-CoV-2, Coronaviridae, Metapneumovirus, Rhinovirus and Influenza) using leave-one-out cross-validation. With just one training sample per class for 1000 independent random trials of training, we report an average macro F1-score > 0.99 for the classification of SARS-CoV-2 from SARS-CoV-1 genome sequences. We compare the performance of NL with K-nearest neighbours (KNN), logistic regression, random forest, SVM, and naïve Bayes classifiers. We foresee promising future applications in genome classification using NL with novel combinations of chaotic feature engineering and other machine learning algorithms.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11517-022-02591-3.
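For readers unfamiliar with the macro F1-score reported in this abstract, the following sketch computes it from true and predicted labels: the F1-score is calculated per class and the per-class values are averaged with equal weight. The function name and the toy labels are illustrative assumptions and are unrelated to the authors' data or code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Per-class F1 is 2*TP / (2*TP + FP + FN); a class with no true or
    predicted samples contributing to the denominator scores 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy labels, for illustration only ("cov2" and "cov1" are placeholder names).
truth = ["cov2", "cov2", "cov1", "cov1"]
predicted = ["cov2", "cov1", "cov1", "cov1"]
score = macro_f1(truth, predicted)
```

Because every class contributes equally, macro F1 penalizes a classifier that performs well only on the majority class, which matters when only a handful of samples per class is available.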
Affiliation(s)
- N. B. Harikrishnan
  - The University of Trans-Disciplinary Health Sciences and Technology, Bengaluru, 560064 Karnataka India
  - Consciousness Studies Programme, National Institute of Advanced Studies, Indian Institute of Science Campus, Bengaluru, 560012 Karnataka India
- S. Y. Pranay
  - Consciousness Studies Programme, National Institute of Advanced Studies, Indian Institute of Science Campus, Bengaluru, 560012 Karnataka India
- Nithin Nagaraj
  - Consciousness Studies Programme, National Institute of Advanced Studies, Indian Institute of Science Campus, Bengaluru, 560012 Karnataka India