1
Ismail E, Gad W, Hashem M. A hybrid Stacking-SMOTE model for optimizing the prediction of autistic genes. BMC Bioinformatics 2023; 24:379. [PMID: 37803253 PMCID: PMC10559615 DOI: 10.1186/s12859-023-05501-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 09/27/2023] [Indexed: 10/08/2023] Open
Abstract
PURPOSE Autism spectrum disorder (ASD) is a disease associated with the neurodevelopment of the brain. It can be observed in early childhood, with symptoms usually appearing within the first year of life. Currently, ASD can only be diagnosed based on apparent symptoms because of the lack of information on the genes related to the disease. Therefore, in this paper, we aim to predict the largest possible number of disease-causing genes to support better diagnosis. METHODS A hybrid stacking ensemble model with the Synthetic Minority Oversampling Technique (Stacking-SMOTE) is proposed to predict the genes associated with ASD. The proposed model uses the gene ontology database to measure the similarities between genes using a hybrid gene similarity (HGS) function. HGS is effective in measuring similarity because it combines the features of information gain-based methods and graph-based methods. The proposed model addresses the imbalance in the ASD dataset using the Synthetic Minority Oversampling Technique (SMOTE), which generates synthetic data rather than duplicating existing data, thereby reducing overfitting. In addition, a gradient boosting-based random forest (GBBRF) classifier is introduced as a new combination technique to enhance the prediction of ASD genes. Moreover, the GBBRF classifier is combined with random forest (RF), k-nearest neighbor, support vector machine (SVM), and logistic regression (LR) to form the proposed Stacking-SMOTE model and optimize the prediction of ASD genes. RESULTS The proposed Stacking-SMOTE model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database and a set of candidate ASD genes. The proposed SMOTE-based model outperforms other reported undersampling and oversampling techniques. In addition, GBBRF achieves higher accuracy than the basic classifiers. Moreover, the experimental results show that the proposed Stacking-SMOTE model outperforms existing ASD prediction models, with approximately 95.5% accuracy. CONCLUSION The proposed Stacking-SMOTE model demonstrates that SMOTE is effective in handling imbalanced autism data. In addition, the integration of gradient boosting with the random forest classifier (GBBRF) supports building a robust stacking ensemble model (Stacking-SMOTE).
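The abstract above notes that SMOTE generates synthetic minority samples rather than duplicating existing ones. As a rough illustration of that interpolation idea only, here is a minimal pure-Python sketch. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the paper, which does not state its implementation details in the abstract.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (illustrative only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority-class feature vectors (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=4)
```

Because each synthetic point is a convex combination of two existing minority samples, it stays inside the region spanned by the minority class instead of repeating an existing row.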
Affiliation(s)
- Eman Ismail
- Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Walaa Gad
- Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Mohamed Hashem
- Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
2
Fan Y, Xiong H, Sun G. DeepASDPred: a CNN-LSTM-based deep learning method for Autism spectrum disorders risk RNA identification. BMC Bioinformatics 2023; 24:261. [PMID: 37349705 DOI: 10.1186/s12859-023-05378-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2023] [Accepted: 06/06/2023] [Indexed: 06/24/2023] Open
Abstract
BACKGROUND Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders characterized by difficulty communicating with society and others, behavioral difficulties, and a brain that processes information differently than normal. Genetics has a strong impact on ASD associated with early onset and distinctive signs. Currently, all known ASD risk genes are able to encode proteins, and some de novo mutations disrupting protein-coding genes have been demonstrated to cause ASD. Next-generation sequencing technology enables high-throughput identification of ASD risk RNAs. However, these efforts are time-consuming and expensive, so an efficient computational model for ASD risk gene prediction is necessary. RESULTS In this study, we propose DeepASDPred, a predictor for ASD risk RNA based on deep learning. First, we use k-mers to encode the RNA transcript sequences as features and then fuse them with the corresponding gene expression values to construct a feature matrix. After combining the chi-square test and logistic regression to select the best feature subset, we input the selected features into a binary classification model built from a convolutional neural network and long short-term memory for training and classification. The results of tenfold cross-validation showed that our method outperformed the state-of-the-art methods. The dataset and source code are freely available at https://github.com/Onebear-X/DeepASDPred. CONCLUSIONS Our experimental results show that DeepASDPred has outstanding performance in identifying ASD risk RNA genes.
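The k-mer feature encoding step mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `kmer_frequencies`, the choice of k, the frequency normalization, and the example sequence are all hypothetical, since the abstract does not specify them.

```python
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalized frequency of every possible k-mer in seq,
    ordered lexicographically over the given alphabet (illustrative only)."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k over the sequence and count valid k-mers.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

# Made-up transcript fragment; k=2 gives a 16-dimensional feature vector.
features = kmer_frequencies("AUGGCUAUGG", k=2)
```

A vector like this could then be concatenated with expression values before feature selection, as the abstract describes, although the exact fusion scheme used by the authors is not given here.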
Affiliation(s)
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
- Hui Xiong
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
- Guicong Sun
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
3
Ismail E, Gad W, Hashem M. HEC-ASD: a hybrid ensemble-based classification model for predicting autism spectrum disorder disease genes. BMC Bioinformatics 2022; 23:554. [PMID: 36544099 PMCID: PMC9768984 DOI: 10.1186/s12859-022-05099-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
PURPOSE Autism spectrum disorder (ASD) is the most prevalent disease today. Its causes may be attributed 80% to genetic factors and 20% to environmental factors. In spite of this, the majority of current research is concerned with environmental causes, and only a small proportion with the genetic causes of the disease. Autism is a complex disease, which makes it difficult to identify the genes that cause it. METHODS A hybrid ensemble-based classification (HEC-ASD) model for predicting ASD genes using gradient boosting machines is proposed. The proposed model utilizes gene ontology (GO) to construct a gene functional similarity matrix using the hybrid gene similarity (HGS) method. HGS measures the semantic similarity between genes effectively. It combines a graph-based method, such as the Wang method, with the number of direct child nodes of a gene's term in GO. Moreover, an ensemble gradient boosting classifier is adapted to enhance the prediction of genes, forming a robust classification model. RESULTS The proposed model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database. The experimental results are promising, as they improve the classification performance for predicting ASD genes. The results are compared with other approaches that used a gene regulatory network (GRN), a protein-protein interaction (PPI) network, or GO. The HEC-ASD model reaches the highest prediction accuracy of 0.88 using ensemble learning classifiers. CONCLUSION The proposed model demonstrates that an ensemble learning technique using gradient boosting is effective in predicting autism spectrum disorder genes. Moreover, the HEC-ASD model utilizes GO rather than a PPI network or GRN.
Affiliation(s)
- Eman Ismail
- Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Walaa Gad
- Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Mohamed Hashem
- Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
4
Gagliano A, Murgia F, Capodiferro AM, Tanca MG, Hendren A, Falqui SG, Aresti M, Comini M, Carucci S, Cocco E, Lorefice L, Roccella M, Vetri L, Sotgiu S, Zuddas A, Atzori L. 1H-NMR-Based Metabolomics in Autism Spectrum Disorder and Pediatric Acute-Onset Neuropsychiatric Syndrome. J Clin Med 2022; 11:6493. [PMID: 36362721 PMCID: PMC9658067 DOI: 10.3390/jcm11216493] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/04/2022] [Revised: 10/24/2022] [Accepted: 10/27/2022] [Indexed: 11/03/2023] Open
Abstract
We recently described a unique plasma metabolite profile in subjects with pediatric acute-onset neuropsychiatric syndrome (PANS), suggesting pathogenic models involving specific patterns of neurotransmission, neuroinflammation, and oxidative stress. Here, we extend the analysis to a group of patients with autism spectrum disorder (ASD), as a consensus has recently emerged around its immune-mediated pathophysiology with a widespread involvement of brain networks. This observational case-control study enrolled patients referred for PANS and ASD from June 2019 to May 2020, as well as neurotypical age and gender-matched control subjects. Thirty-four PANS outpatients, fifteen ASD outpatients, and twenty-five neurotypical subjects underwent physical and neuropsychiatric evaluations, alongside serum metabolomic analysis with 1H-NMR. In supervised models, the metabolomic profile of ASD was significantly different from controls (p = 0.0001), with skewed concentrations of asparagine, aspartate, betaine, glycine, lactate, glucose, and pyruvate. Metabolomic separation was also observed between PANS and ASD subjects (p = 0.02), with differences in the concentrations of arginine, aspartate, betaine, choline, creatine phosphate, glycine, pyruvate, and tryptophan. We confirmed a unique serum metabolomic profile of PANS compared with both ASD and neurotypical subjects, distinguishing PANS as a pathophysiological entity per se. Tryptophan and glycine appear as neuroinflammatory fingerprints of PANS and ASD, respectively. In particular, a reduction in glycine would primarily affect NMDA-R excitatory tone, overall impairing downstream glutamatergic, dopaminergic, and GABAergic transmissions. Nonetheless, we found metabolomic similarities between PANS and ASD that suggest a putative role of N-methyl-D-aspartate receptor (NMDA-R) dysfunction in both disorders. Metabolomics-based approaches could contribute to the identification of novel ASD and PANS biomarkers.
Affiliation(s)
- Antonella Gagliano
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Department of Health Science, “Magna Graecia” University of Catanzaro, 88100 Catanzaro, Italy
- Federica Murgia
- Clinical Metabolomics Unit, Department of Biomedical Sciences, University of Cagliari, 09042 Cagliari, Italy
- Agata Maria Capodiferro
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Marcello Giuseppe Tanca
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Aran Hendren
- Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
- Stella Giulia Falqui
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Michela Aresti
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Martina Comini
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Sara Carucci
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Eleonora Cocco
- Multiple Sclerosis Regional Center, ASSL Cagliari, Department of Medical Sciences and Public Health, University of Cagliari, 09126 Cagliari, Italy
- Lorena Lorefice
- Multiple Sclerosis Regional Center, ASSL Cagliari, 09126 Cagliari, Italy
- Michele Roccella
- Department of Psychology, Educational Science and Human Movement, University of Palermo, 90128 Palermo, Italy
- Luigi Vetri
- Oasi Research Institute-IRCCS, Via Conte Ruggero 73, 94018 Troina, Italy
- Stefano Sotgiu
- Child Neuropsychiatry Unit, Department of Medicine, Surgery and Farmacy, University of Sassari, 07100 Sassari, Italy
- Alessandro Zuddas
- Child & Adolescent Neuropsychiatry Unit, Department of Biomedical Sciences, “A. Cao” Paediatric Hospital, University of Cagliari, 09121 Cagliari, Italy
- Luigi Atzori
- Clinical Metabolomics Unit, Department of Biomedical Sciences, University of Cagliari, 09042 Cagliari, Italy
5
Joudar SS, Albahri AS, Hamid RA. Triage and priority-based healthcare diagnosis using artificial intelligence for autism spectrum disorder and gene contribution: A systematic review. Comput Biol Med 2022; 146:105553. [PMID: 35561591 DOI: 10.1016/j.compbiomed.2022.105553] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/05/2022] [Revised: 04/03/2022] [Accepted: 04/20/2022] [Indexed: 11/03/2022]
Abstract
The exact nature, harmful effects and aetiology of autism spectrum disorder (ASD) have caused widespread confusion. Artificial intelligence (AI) science helps solve challenging diagnostic problems in the medical field through extensive experiments. Disease severity is closely related to triage decisions and prioritisation contexts in medicine because both have been widely used to diagnose various diseases via AI, machine learning and automated decision-making techniques. Recently, taking advantage of high-performance AI algorithms has achieved accessible success in diagnosing and predicting risks from clinical and biological data. In contrast, less progress has been made with ASD because of obscure reasons. According to the academic literature, ASD diagnosis works from a specific perspective, and much of the confusion arises from how AI techniques are currently integrated with the diagnosis of ASD with respect to triage and priority strategies and gene contributions. To this end, this study presents a systematic review of the literature to assess the respective AI methods using the available datasets, highlight the tools and strategies used for diagnosing ASD and investigate how AI trends contribute to distinguishing triage and priority for ASD and gene contributions. Accordingly, this study searched the Science Direct, IEEE Xplore Digital Library, Web of Science (WoS), PubMed, and Scopus databases. A set of 363 articles from 2017 to 2022 was collected to reveal a clear picture and a better understanding of all the academic literature through a final set of 18 articles. The retrieved articles were filtered according to the defined inclusion and exclusion criteria and classified into three categories. The first category includes 'Triage patients based on diagnosis methods', which accounts for 16.66% (n = 3/18). The second category includes 'Prioritisation for Risky Genes', which accounts for 66.6% (n = 12/18) and is classified into two subcategories: 'Mutations observation based' and 'Biomarkers and toxic chemical observations'. The third category includes 'E-triage using telehealth', which accounts for 16.66% (n = 3/18). This multidisciplinary systematic review revealed the taxonomy, motivations, recommendations and challenges of ASD research that need synergistic attention. Thus, this systematic review performs a comprehensive science mapping analysis and discusses the open issues that help perform and improve the recommended solution of ASD research direction. In addition, this study critically reviews the literature, attempts to address the current research gaps in knowledge and highlights weaknesses that require further research. Finally, a newly developed methodology has been suggested as future work for triaging and prioritising ASD patients according to their severity levels by using decision-making techniques.
Affiliation(s)
- Shahad Sabbar Joudar
- Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq; University of Technology, Baghdad, Iraq
- A S Albahri
- Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq
- Rula A Hamid
- Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq; College of Business Informatics, University of Information Technology and Communications (UOITC), Baghdad, Iraq