1
Karim T, Shaon MSH, Sultan MF, Hasan MZ, Kafy AA. ANNprob-ACPs: A novel anticancer peptide identifier based on probabilistic feature fusion approach. Comput Biol Med 2024; 169:107915. [PMID: 38171261 DOI: 10.1016/j.compbiomed.2023.107915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/05/2024]
Abstract
Anticancer peptides (ACPs) offer significant potential as cancer treatment drugs. Quickly identifying active compounds from protein sequences is crucial for healthcare and cancer treatment. In this paper, ANNprob-ACPs, a novel and effective model for detecting ACPs, is implemented based on nine feature encoding techniques: AAC, CC, W2V, DPC, PAAC, QSO, CTDC, CTDT, and CKSAAGP. After analyzing the performance of several machine learning models, the six best models were selected based on their overall performance on every evaluation metric. The probability scores of each model were subsequently aggregated and used as input to our meta-model, called ANNprob-ACPs. Our model outperformed all others, indicating its potential for identifying ACPs. The results of this study showed notable improvement in 10-fold cross-validation and independent testing, with accuracies of 93.72% and 90.62%, respectively. Our proposed model, ANNprob-ACPs, outperformed existing approaches in terms of accuracy and effectiveness in discovering ACPs. Using SHAP, this study found that the physicochemical properties of QSO and the compositional properties of DPC, AAC, and PAAC are more impactful for our model's performance; these properties have a major impact on a drug's interactions and future discoveries. Consequently, this model is important for future work and has a high probability of detecting ACPs. We developed a web server for ANNprob-ACPs, which is accessible at the ANNprob-ACPs webserver.
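Two of the nine encodings named in the abstract, AAC (amino acid composition) and DPC (dipeptide composition), have widely used textbook definitions: the fraction of each of the 20 standard residues, and the fraction of each of the 400 adjacent residue pairs. The following is a minimal Python sketch of those two definitions only, not the authors' implementation; the function names and the example peptide are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dpc(seq):
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1  # number of overlapping pairs
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]

def encode(seq):
    """Concatenate AAC and DPC into one 420-dimensional feature vector."""
    return aac(seq) + dpc(seq)

# Hypothetical example peptide, used only for illustration.
features = encode("ACDA")
```

Residues outside the 20-letter alphabet are simply not counted in this sketch, so the fractions for such sequences sum to less than 1.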
Affiliation(s)
- Tasmin Karim
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Md Shazzad Hossain Shaon
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Md Fahim Sultan
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Md Zahid Hasan
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Abdulla-Al Kafy
- Department of Urban & Regional Planning, Rajshahi University of Engineering & Technology (RUET), Rajshahi, 6204, Bangladesh.
2
Chong JWR, Tang DYY, Leong HY, Khoo KS, Show PL, Chew KW. Bridging artificial intelligence and fucoxanthin for the recovery and quantification from microalgae. Bioengineered 2023; 14:2244232. [PMID: 37578162 PMCID: PMC10431731 DOI: 10.1080/21655979.2023.2244232] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/15/2023] Open
Abstract
Fucoxanthin is a carotenoid that possesses various beneficial medicinal properties for human well-being. However, the current extraction technologies and quantification techniques are still lacking in terms of cost validation, high energy consumption, long extraction time, and low yield production. Artificial intelligence (AI) models can help relieve the bottleneck of the fucoxanthin extraction and quantification process by establishing new technologies and processes that involve big data, digitalization, and automation for efficient fucoxanthin production. This review highlights the application of AI models such as artificial neural network (ANN) and adaptive neuro fuzzy inference system (ANFIS), which are capable of learning patterns and relationships from large datasets, capturing non-linearity, and predicting optimal conditions that significantly impact the fucoxanthin extraction yield. In addition, combining them with a metaheuristic algorithm such as the genetic algorithm (GA) can further improve the search of the parameter space and the discovery of optimal conditions for ANN and ANFIS models, which results in high R² values ranging from 98.28% to 99.60% after optimization. AI models such as support vector machine (SVM), convolutional neural networks (CNNs), and ANN have also been leveraged for the quantification of fucoxanthin, using either computer vision based on the color space of images or regression analysis based on statistical data. The findings are reliable when modeling the concentration of pigments, with R² values ranging from 66.0% to 99.2%. This review examines the feasibility and potential of AI for extraction and quantification, which can reduce cost, increase fucoxanthin yields, and accelerate the development of fucoxanthin-based products.
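The R² figures quoted in the abstract are coefficients of determination expressed as percentages (for example, 0.9828 reported as 98.28%). As a minimal sketch of how such a value is computed from measured and predicted values, assuming the standard definition R² = 1 - SS_res / SS_tot; the function name and the numbers below are hypothetical and are not taken from the review:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot

# Hypothetical measured vs. model-predicted yields (illustrative values only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"R^2 = {100 * r_squared(measured, predicted):.2f}%")
```

A model that always predicts the mean of the measured values scores 0 under this definition, and a perfect model scores 1.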
Affiliation(s)
- Jun Wei Roy Chong
- Department of Chemical and Environmental Engineering, Faculty of Science and Engineering, University of Nottingham Malaysia, Jalan Broga, Semenyih, Selangor Darul Ehsan, Malaysia
- Doris Ying Ying Tang
- Department of Chemical and Environmental Engineering, Faculty of Science and Engineering, University of Nottingham Malaysia, Jalan Broga, Semenyih, Selangor Darul Ehsan, Malaysia
- Hui Yi Leong
- ISCO (Nanjing) Biotech-Company, Nanjing, Jiangning, China
- Kuan Shiong Khoo
- Department of Chemical Engineering and Materials Science, Yuan Ze University, Taoyuan, Taiwan
- Faculty of Allied Health Sciences, Chettinad Hospital and Research Institute, Chettinad Academy of Research and Education, Kelambakkam, Tamil Nadu, India
- Pau Loke Show
- Department of Chemical Engineering, Khalifa University, Abu Dhabi, United Arab Emirates
- Kit Wayne Chew
- School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore, Singapore
3
Monroe LK, Truong DP, Miner JC, Adikari SH, Sasiene ZJ, Fenimore PW, Alexandrov B, Williams RF, Nguyen HB. Conotoxin Prediction: New Features to Increase Prediction Accuracy. Toxins (Basel) 2023; 15:641. [PMID: 37999504 PMCID: PMC10675404 DOI: 10.3390/toxins15110641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 10/27/2023] [Accepted: 10/30/2023] [Indexed: 11/25/2023] Open
Abstract
Conotoxins are toxic, disulfide-bond-rich peptides from cone snail venom that target a wide range of receptors and ion channels with multiple pathophysiological effects. Conotoxins have extraordinary potential for medical therapeutics that include cancer, microbial infections, epilepsy, autoimmune diseases, neurological conditions, and cardiovascular disorders. Despite the potential for these compounds in novel therapeutic treatment development, the process of identifying and characterizing the toxicities of conotoxins is difficult, costly, and time-consuming. This challenge requires a series of diverse, complex, and labor-intensive biological, toxicological, and analytical techniques for effective characterization. While recent attempts, using machine learning based solely on primary amino acid sequences to predict biological toxins (e.g., conotoxins and animal venoms), have improved toxin identification, these methods are limited due to peptide conformational flexibility and the high frequency of cysteines present in toxin sequences. This results in an enumerable set of disulfide-bridged foldamers with different conformations of the same primary amino acid sequence that affect function and toxicity levels. Consequently, a given peptide may be toxic when its cysteine residues form a particular disulfide-bond pattern, while alternative bonding patterns (isoforms) or its reduced form (free cysteines with no disulfide bridges) may have little or no toxicological effects. Similarly, the same disulfide-bond pattern may be possible for other peptide sequences and result in different conformations that all exhibit varying toxicities to the same receptor or to different receptors. We present here new features, when combined with primary sequence features to train machine learning algorithms to predict conotoxins, that significantly increase prediction accuracy.
Affiliation(s)
- Lyman K. Monroe
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Duc P. Truong
- Theoretical Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Jacob C. Miner
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Samantha H. Adikari
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Zachary J. Sasiene
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Paul W. Fenimore
- Theoretical Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Boian Alexandrov
- Theoretical Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Robert F. Williams
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Hau B. Nguyen
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
4
Meng C, Pei Y, Bu Y, Zou Q, Ju Y. Machine learning-based antioxidant protein identification model: Progress and evaluation. J Cell Biochem 2023; 124:1825-1834. [PMID: 37877550 DOI: 10.1002/jcb.30491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 09/30/2023] [Accepted: 10/06/2023] [Indexed: 10/26/2023]
Abstract
Efficient and accurate identification of antioxidant proteins is of great significance. In recent years, many models for identifying antioxidant proteins have been proposed, but the low sensitivity and high dimensionality of the models are common problems. The generalization ability of the model needs to be improved. Researchers have tried different feature extraction algorithms and feature selection algorithms to obtain the most effective feature combination and have chosen more appropriate classification algorithms and tools to improve model performance. In this article, we systematically reviewed the data set of the most frequently used antioxidant proteins and the method selection for each step of model establishment and discussed the characteristics of each method. We have conducted a detailed analysis of recent research and believe that the practical ability and efficiency of model application can be improved by reducing model dimensions. The key to improving the performance of antioxidant protein recognition models in the future may lie in feature selection, so this paper also focuses on the combination of feature extraction and selection steps in the analysis of the model building process.
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot, China
- Yue Pei
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Yongbo Bu
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
5
Huang C, Zhu F, Zhang H, Wang N, Huang Q. Identification of S1PR4 as an immune modulator for favorable prognosis in HNSCC through machine learning. iScience 2023; 26:107693. [PMID: 37680482 PMCID: PMC10480314 DOI: 10.1016/j.isci.2023.107693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/25/2023] [Accepted: 08/17/2023] [Indexed: 09/09/2023] Open
Abstract
G protein-coupled receptors (GPCRs) are the largest family of membrane proteins and play a critical role as pharmacological targets. An improved understanding of GPCRs' involvement in tumor microenvironment may provide new perspectives for cancer therapy. This study used machine learning to classify head and neck squamous cell carcinoma (HNSCC) patients into two GPCR-based subtypes. Notably, these subtypes showed significant differences in prognosis, gene expression, and immune microenvironment, particularly CD8+ T cell infiltration. S1PR4 emerged as a key regulator distinguishing the subtypes, positively correlated with CD8+ T cell proportion and cytotoxicity in HNSCC. It was predominantly expressed in CX3CR1+CD8+ T cells among T cells. Upregulation of S1PR4 enhanced T cell function during CAR-T cell therapy, suggesting its potential in cancer immunotherapy. These findings highlight S1PR4 as an immune modulator for favorable prognosis in HNSCC, and offer a potential GPCR-targeted therapeutic option for HNSCC treatment.
Affiliation(s)
- Chenshen Huang
- Department of Gastrointestinal Surgery, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Department of General Surgery, Tongji Hospital, Tongji University School of Medicine, Shanghai, China
- Fengshuo Zhu
- Department of Oral Maxillofacial-Head and Neck Oncology, Shanghai Ninth People’s Hospital, College of Stomatology, Shanghai, China
- Jiao Tong University School of Medicine, National Clinical Research Center for Oral Disease, Shanghai Key Laboratory of Stomatology and Shanghai Research Institute of Stomatology, Shanghai, China
- Hao Zhang
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, China
- Ning Wang
- Department of Hepatobiliary and Pancreatic Surgery, Huzhou Central Hospital, Affiliated Hospital of Zhejiang University, Huzhou, China
- Qi Huang
- Department of General Surgery, Tongji Hospital, Tongji University School of Medicine, Shanghai, China
6
Yan Y, Shi Z, Wei H. ROSes-FINDER: a multi-task deep learning framework for accurate prediction of microorganism reactive oxygen species scavenging enzymes. Front Microbiol 2023; 14:1245805. [PMID: 37744924 PMCID: PMC10513406 DOI: 10.3389/fmicb.2023.1245805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 08/21/2023] [Indexed: 09/26/2023] Open
Abstract
Reactive oxygen species (ROS) are highly reactive molecules that play important roles in microbial biological processes. However, excessive accumulation of ROS can lead to oxidative stress and cellular damage. Microorganisms have evolved a diverse suite of enzymes to mitigate the harmful effects of ROS. Accurate prediction of ROS scavenging enzyme classes (ROSes) is crucial for understanding the mechanisms of oxidative stress and developing strategies to combat related diseases. Nevertheless, the existing approaches for categorizing ROS-related proteins exhibit certain drawbacks with regard to their precision and inclusiveness. To address this, we propose a new multi-task deep learning framework called ROSes-FINDER. This framework integrates three component methods using a voting-based approach to predict multiple ROSes properties simultaneously. It can identify whether a given protein sequence is a ROSes and determine its type. The three component methods used in the framework are ROSes-CNN, which extracts raw sequence encoding features, ROSes-NN, which predicts protein functions based on sequence information, and ROSes-XGBoost, which performs functional classification using ensemble machine learning. Comprehensive experiments demonstrate the superior performance and robustness of our method. ROSes-FINDER is freely available at https://github.com/alienn233/ROSes-Finder for predicting ROSes classes.
Affiliation(s)
- Yueyang Yan
- College of Veterinary Medicine, Jilin University, Changchun, China
- Zhanpeng Shi
- College of Veterinary Medicine, Jilin University, Changchun, China
- Haijian Wei
- Department of Organ Transplantation, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai City, China
7
Jian C, Chen S, Wang Z, Zhou Y, Zhang Y, Li Z, Jian J, Wang T, Xiang T, Wang X, Jia Y, Wang H, Gong J. Predicting delayed methotrexate elimination in pediatric acute lymphoblastic leukemia patients: an innovative web-based machine learning tool developed through a multicenter, retrospective analysis. BMC Med Inform Decis Mak 2023; 23:148. [PMID: 37537590 PMCID: PMC10398990 DOI: 10.1186/s12911-023-02248-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 07/26/2023] [Indexed: 08/05/2023] Open
Abstract
BACKGROUND: High-dose methotrexate (HD-MTX) is a potent chemotherapeutic agent used to treat pediatric acute lymphoblastic leukemia (ALL). HD-MTX is known to cause delayed elimination and drug-related adverse events. Therefore, close monitoring of delayed MTX elimination in ALL patients is essential. OBJECTIVE: This study aimed to identify the risk factors associated with delayed MTX elimination and to develop a predictive tool for its occurrence. METHODS: Patients who received MTX chemotherapy during hospitalization were selected for inclusion in our study. Univariate and least absolute shrinkage and selection operator (LASSO) methods were used to screen for relevant features. Four machine learning (ML) algorithms were then used to construct prediction models under different sampling methods. The performance of the models was evaluated using several indicators. Finally, the optimal model was deployed on a web page to create a visual prediction tool. RESULTS: The study included 329 patients with delayed MTX elimination and 1400 patients without delayed MTX elimination who met the inclusion criteria. Univariate and LASSO regression analysis identified eleven predictors, including age, weight, creatinine, uric acid, total bilirubin, albumin, white blood cell count, hemoglobin, prothrombin time, immunological classification, and co-medication with omeprazole. The XGBoost algorithm with SMOTE exhibited an AUROC of 0.897, AUPR of 0.729, sensitivity of 0.808, and specificity of 0.847, outperforming the other models, and had an AUROC of 0.788 in external validation. CONCLUSION: The XGBoost algorithm provides superior performance in predicting the delayed elimination of MTX. We have created a prediction tool to assist medical professionals in predicting MTX metabolic delay.
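The class imbalance reported above (329 delayed vs. 1400 non-delayed patients) is what SMOTE is meant to address: it creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea under stated assumptions; it is not the authors' pipeline, and the function name, the parameter defaults, and the toy data are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic

# Toy two-feature minority class (illustrative values only).
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_class, n_new=6, k=2)
```

In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into validation or external test data.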
Affiliation(s)
- Chang Jian
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Siqi Chen
- College of Pharmacy, Chongqing Medical University, Chongqing, China
- Zhuangcheng Wang
- Big Data Engineering Center, Children's Hospital of Chongqing Medical University, Chongqing, China
- Yang Zhou
- Department of Medicine, Affiliated Hospital of Nantong University, Jiangsu, China
- Yang Zhang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Ziyu Li
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Jie Jian
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Tingting Wang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Tianyu Xiang
- Department of Pharmacy, Children's Hospital of Chongqing Medical University, Chongqing, China
- Xiao Wang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Yuntao Jia
- Department of Pharmacy, Children's Hospital of Chongqing Medical University, Chongqing, China
- Huilai Wang
- Department of Information Center, University-Town Hospital of Chongqing Medical University, Chongqing, China.
- Jun Gong
- Department of Information Center, University-Town Hospital of Chongqing Medical University, Chongqing, China.
8
Yang C. Prediction of hearing preservation after acoustic neuroma surgery based on SMOTE-XGBoost. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:10757-10772. [PMID: 37322959 DOI: 10.3934/mbe.2023477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Prior to the surgical removal of an acoustic neuroma, the majority of patients anticipate that their hearing will be preserved to the greatest possible extent following surgery. This paper proposes a postoperative hearing preservation prediction model, based on the extreme gradient boosting tree (XGBoost), for class-imbalanced real hospital data. To reduce sample imbalance, the synthetic minority oversampling technique (SMOTE) is applied to increase the number of minority-class samples in the data. Multiple machine learning models are also used for the accurate prediction of surgical hearing preservation in acoustic neuroma patients. In comparison to research results from existing literature, the experimental results found the model proposed in this paper to be superior. In summary, the proposed method can make a significant contribution to the development of personalized preoperative diagnosis and treatment plans, supporting effective judgment of hearing retention in patients with acoustic neuroma following surgery, shortening a long medical treatment process, and saving medical resources.
Affiliation(s)
- Cenyi Yang
- School of Mathematics and Statistics, Central South University, Changsha 410083, China
9
Chakraborty A, Mitra S, Bhattacharjee M, De D, Pal AJ. Determining human-coronavirus protein-protein interaction using machine intelligence. MEDICINE IN NOVEL TECHNOLOGY AND DEVICES 2023; 18:100228. [PMID: 37056696 PMCID: PMC10077817 DOI: 10.1016/j.medntd.2023.100228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 03/29/2023] [Accepted: 04/01/2023] [Indexed: 04/08/2023] Open
Abstract
The Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) virus spread the novel CoronaVirus −19 (nCoV-19) pandemic, resulting in millions of fatalities globally. Recent research demonstrated that the Protein-Protein Interaction (PPI) between SARS-CoV-2 and human proteins is accountable for viral pathogenesis. However, many of these PPIs are poorly understood and unexplored, necessitating a more in-depth investigation to find latent yet critical interactions. This article elucidates the host-viral PPI through Machine Learning (ML) lenses and validates the biological significance of the same using web-based tools. ML classifiers are designed based on comprehensive datasets with five sequence-based features of human proteins, namely Amino Acid Composition, Pseudo Amino Acid Composition, Conjoint Triad, Dipeptide Composition, and Normalized Auto Correlation. A majority voting rule-based ensemble method composed of the Random Forest Model (RFM), AdaBoost, and Bagging technique is proposed that delivers encouraging statistical performance compared to other models employed in this work. The proposed ensemble model predicted a total of 111 possible SARS-CoV-2 human target proteins with a high likelihood factor ≥70%, validated by utilizing Gene Ontology (GO) and KEGG pathway enrichment analysis. Consequently, this research can aid in a deeper understanding of the molecular mechanisms underlying viral pathogenesis and provide clues for developing more efficient anti-COVID medications.
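The majority voting rule described in the abstract can be illustrated with a short sketch: each component classifier casts one label per sample, and the label chosen by most classifiers wins. This is a generic Python illustration, not the authors' code; the function name and the three prediction lists (standing in for Random Forest, AdaBoost, and Bagging outputs) are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length label lists, one list per classifier.
    With an odd number of binary classifiers there are no ties; otherwise
    the label encountered first among the tied ones is returned.
    """
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one classifier and equal-length outputs")
    voted = []
    for labels in zip(*predictions):  # labels for a single sample
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Hypothetical outputs (1 = interacting, 0 = non-interacting) for four proteins.
rf_pred = [1, 0, 1, 1]
ada_pred = [1, 1, 0, 1]
bag_pred = [0, 0, 1, 1]
print(majority_vote([rf_pred, ada_pred, bag_pred]))  # [1, 0, 1, 1]
```

Hard voting of this kind uses only the predicted labels; how the paper derives its "likelihood factor" threshold is not specified in the abstract and is not modelled here.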
Affiliation(s)
- Arijit Chakraborty
- Bachelor of Computer Application Department, The Heritage Academy, Kolkata, India
- Sajal Mitra
- Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata, India
- Debashis De
- Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
10
Han S, Zhang Z, Ma W, Gao J, Li Y. Nucleotide-Binding Oligomerization Domain (NOD)-Like Receptor Subfamily C (NLRC) as a Prognostic Biomarker for Glioblastoma Multiforme Linked to Tumor Microenvironment: A Bioinformatics, Immunohistochemistry, and Machine Learning-Based Study. J Inflamm Res 2023; 16:523-537. [PMID: 36798872 PMCID: PMC9926983 DOI: 10.2147/jir.s397305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Accepted: 02/03/2023] [Indexed: 02/12/2023] Open
Abstract
Purpose Glioblastoma multiforme (GBM) remains the deadliest primary brain tumor. We aimed to illuminate the role of nucleotide-binding oligomerization domain (NOD)-like receptor subfamily C (NLRC) in GBM. Patients and Methods Based on public database data (mainly The Cancer Genome Atlas [TCGA]), we performed bioinformatics analysis to visually evaluate the role and mechanism of NLRCs in GBM. Then, we validated our findings in a glioma tissue microarray (TMA) by immunohistochemistry (IHC), and the prognostic value of NOD1 was assessed via random forest (RF) models. Results In GBM tissues, the expression of NLRC members was significantly increased, which was related to the low survival rate of GBM. Additionally, Cox regression analysis revealed that the expression of NOD1 (among NLRCs) served as an independent prognostic marker. A nomogram based on multivariate analysis proved the effective predictive performance of NOD1 in GBM. Enrichment analysis showed that high expression of NOD1 could regulate extracellular structure, cell adhesion, and immune response to promote tumor progression. Then, immune infiltration analysis showed that NOD1 overexpression correlated with an enhanced immune response. Then, in a glioma TMA, the results of IHC revealed that the increase in NOD1 expression indicated high recurrence and poor prognosis of human glioma. Furthermore, the expression level of NOD1 showed good prognostic value in the TMA cohort via RF. Conclusion The value of NOD1 as a biomarker for GBM was demonstrated. The possible mechanisms may lie in the regulatory role of NLRC-related pathways in the tumor microenvironment.
Affiliation(s)
- Shiyuan Han
- Department of Neurosurgery, Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital (Dongdan Campus), Beijing, People’s Republic of China
- Zimu Zhang
- Department of General Surgery, Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital (Dongdan Campus), Beijing, People’s Republic of China
- Wenbin Ma
- Department of Neurosurgery, Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital (Dongdan Campus), Beijing, People’s Republic of China
- Jun Gao
- Department of Neurosurgery, Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital (Dongdan Campus), Beijing, People’s Republic of China
- Yongning Li
- Department of Neurosurgery, Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital (Dongdan Campus), Beijing, People’s Republic of China
- Department of International Medical Service, Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital (Dongdan campus), Beijing, People’s Republic of China
- Correspondence: Yongning Li, Department of Neurosurgery and Department of International Medical Service, Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital (Dongdan campus), No. 1 Shuaifuyuan Wangfujing Dongcheng District, Beijing, People’s Republic of China, Tel +86 13901074129, Fax +86 1069152530
11
Jiang L, Chen S, Wu Y, Zhou D, Duan L. Prediction of coronary heart disease in gout patients using machine learning models. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:4574-4591. [PMID: 36896513 DOI: 10.3934/mbe.2023212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Growing evidence shows that there is an increased risk of cardiovascular diseases among gout patients, especially coronary heart disease (CHD). Screening for CHD in gout patients based on simple clinical factors is still challenging. Here we aim to build a diagnostic model based on machine learning so as to avoid missed diagnoses or excessive examinations as much as possible. Over 300 patient samples collected from Jiangxi Provincial People's Hospital were divided into two groups (gout and gout+CHD). The prediction of CHD in gout patients has thus been modeled as a binary classification problem. A total of eight clinical indicators were selected as features for machine learning classifiers. A combined sampling technique was used to overcome the class imbalance in the training dataset. Eight machine learning models were used, including logistic regression, decision tree, ensemble learning models (random forest, XGBoost, LightGBM, GBDT), support vector machine (SVM), and neural networks. Our results showed that stepwise logistic regression and SVM achieved higher AUC values, while the random forest and XGBoost models performed better in terms of recall and accuracy. Furthermore, several high-risk factors were found to be effective indices in predicting CHD in gout patients, which provides insights into clinical diagnosis.
Affiliation(s)
- Lili Jiang
- Department of Rheumatology and Clinical Immunology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Sirong Chen
- School of Mathematical Sciences, Soochow University, Suzhou, China
| | - Yuanhui Wu
- Department of Rheumatology and Clinical Immunology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
| | - Da Zhou
- School of Mathematical Sciences, Xiamen University, Xiamen, China
| | - Lihua Duan
- Department of Rheumatology and Clinical Immunology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
| |
Collapse
|
12
|
Anteghini M, Haja A, Martins dos Santos VA, Schomaker L, Saccenti E. OrganelX web server for sub-peroxisomal and sub-mitochondrial protein localization and peroxisomal target signal detection. Comput Struct Biotechnol J 2022; 21:128-133. [PMID: 36544474 PMCID: PMC9747352 DOI: 10.1016/j.csbj.2022.11.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 11/28/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022] Open
Abstract
We present the OrganelX e-Science Web Server that provides a user-friendly implementation of the In-Pero and In-Mito classifiers for sub-peroxisomal and sub-mitochondrial localization of peroxisomal and mitochondrial proteins and the Is-PTS1 algorithm for detecting and validating potential peroxisomal proteins carrying a PTS1 signal sequence. The OrganelX e-Science Web Server is available at https://organelx.hpc.rug.nl/fasta/.
Affiliation(s)
- Marco Anteghini
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands; LifeGlimmer GmbH, Berlin, Germany; Corresponding author at: Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands (M. Anteghini).
- Asmaa Haja
- Bernoulli Institute, University of Groningen, Groningen, The Netherlands
- Vitor A.P. Martins dos Santos
- LifeGlimmer GmbH, Berlin, Germany; Bioprocess Engineering, Wageningen University & Research, Wageningen, The Netherlands
- Lambert Schomaker
- Bernoulli Institute, University of Groningen, Groningen, The Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands; Corresponding author at: Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands (M. Anteghini).
|
13
|
Promising perspectives on novel protein food sources combining artificial intelligence and 3D food printing for food industry. Trends Food Sci Technol 2022. [DOI: 10.1016/j.tifs.2022.05.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]
|
14
|
Badré A, Pan C. LINA: A Linearizing Neural Network Architecture for Accurate First-Order and Second-Order Interpretations. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2022; 10:36166-36176. [PMID: 35462722 PMCID: PMC9032252 DOI: 10.1109/access.2022.3163257] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/14/2023]
Abstract
While neural networks can provide high predictive performance, it was a challenge to identify the salient features and important feature interactions used for their predictions. This represented a key hurdle for deploying neural networks in many biomedical applications that require interpretability, including predictive genomics. In this paper, a linearizing neural network architecture (LINA) was developed to provide both first-order and second-order interpretations at both the instance-wise and the model-wise levels. LINA combines the representational capacity of a deep inner attention neural network with a linearized intermediate representation for model interpretation. In comparison with DeepLIFT, LIME, Grad*Input and L2X, the first-order interpretation of LINA had better Spearman correlation with the ground-truth importance rankings of features in synthetic datasets. In comparison with NID and GEH, the second-order interpretation results from LINA achieved better precision for identification of the ground-truth feature interactions in synthetic datasets. These algorithms were further benchmarked using predictive genomics as a real-world application. LINA identified larger numbers of important single nucleotide polymorphisms (SNPs) and salient SNP interactions than the other algorithms at given false discovery rates. The results showed accurate and versatile model interpretation using LINA.
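The first-order comparison in the abstract above relies on the Spearman correlation between estimated feature importances and ground-truth importance rankings. The sketch below shows how such a score can be computed with the standard library only: Spearman's rho is the Pearson correlation of the ranks, with tied values sharing their average rank. The importance values are invented for illustration and do not come from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: ground-truth vs. estimated importance of four features.
true_importance = [0.9, 0.5, 0.1, 0.0]
estimated_importance = [0.7, 0.8, 0.2, 0.1]
rho = spearman(true_importance, estimated_importance)
```

A value of 1 means the interpretation method orders the features exactly as the ground truth does; swapping the two most important features, as in the toy data, lowers the score.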
Affiliation(s)
- Adrien Badré
- School of Computer Science, The University of Oklahoma, Norman, OK 73019, USA
- Chongle Pan
- School of Computer Science, The University of Oklahoma, Norman, OK 73019, USA
|
15
|
Pantic I, Paunovic J, Pejic S, Drakulic D, Todorovic A, Stankovic S, Vucevic D, Cumic J, Radosavljevic T. Artificial intelligence approaches to the biochemistry of oxidative stress: Current state of the art. Chem Biol Interact 2022; 358:109888. [PMID: 35296431 DOI: 10.1016/j.cbi.2022.109888] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Revised: 03/04/2022] [Accepted: 03/09/2022] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI) and machine learning models are today frequently used for classification and prediction of various biochemical processes and phenomena. In recent years, numerous research efforts have focused on developing such models for assessment, categorization, and prediction of oxidative stress. Supervised machine learning can successfully automate the evaluation and quantification of oxidative damage in biological samples, as well as extract useful data from the abundance of experimental results. In this concise review, we cover the possible applications of neural networks, decision trees and regression analysis as three common strategies in machine learning. We also review recent works on the various weaknesses and limitations of artificial intelligence in biochemistry and related scientific areas. Finally, we discuss innovative future approaches by which AI can contribute to the automation of oxidative stress measurement and the diagnosis of diseases associated with oxidative damage.
Affiliation(s)
- Igor Pantic
- University of Belgrade, Faculty of Medicine, Institute of Medical Physiology, Laboratory for Cellular Physiology, Visegradska 26/II, RS-11129, Belgrade, Serbia; University of Haifa, 199 Abba Hushi Blvd, Mount Carmel, Haifa, IL, 3498838, Israel; Ben-Gurion University of the Negev, Faculty of Health Sciences, Department of Physiology and Cell Biology, 84105 Be'er Sheva, Israel.
- Jovana Paunovic
- University of Belgrade, Faculty of Medicine, Institute of Pathological Physiology, Dr Subotica 9, RS-11129, Belgrade, Serbia
- Snezana Pejic
- University of Belgrade, Vinca Institute of Nuclear Sciences, Department of Molecular Biology and Endocrinology, Mike Petrovica Alasa 12-14, RS-11351, Belgrade, Serbia
- Dunja Drakulic
- University of Belgrade, Vinca Institute of Nuclear Sciences, Department of Molecular Biology and Endocrinology, Mike Petrovica Alasa 12-14, RS-11351, Belgrade, Serbia
- Ana Todorovic
- University of Belgrade, Vinca Institute of Nuclear Sciences, Department of Molecular Biology and Endocrinology, Mike Petrovica Alasa 12-14, RS-11351, Belgrade, Serbia
- Sanja Stankovic
- University Clinical Centre of Serbia, Centre for Medical Biochemistry, Visegradska 26, RS-11000, Belgrade, Serbia; University of Kragujevac, Faculty of Medical Sciences, Svetozara Markovica 69, RS-34000, Kragujevac, Serbia
- Danijela Vucevic
- University of Belgrade, Faculty of Medicine, Institute of Pathological Physiology, Dr Subotica 9, RS-11129, Belgrade, Serbia
- Jelena Cumic
- University of Belgrade, Faculty of Medicine, University Clinical Centre of Serbia, Dr. Koste Todorovića 8, RS-11129, Belgrade, Serbia
- Tatjana Radosavljevic
- University of Belgrade, Faculty of Medicine, Institute of Pathological Physiology, Dr Subotica 9, RS-11129, Belgrade, Serbia
|
16
|
Ahmed Z, Zulfiqar H, Khan AA, Gul I, Dao FY, Zhang ZY, Yu XL, Tang L. iThermo: A Sequence-Based Model for Identifying Thermophilic Proteins Using a Multi-Feature Fusion Strategy. Front Microbiol 2022; 13:790063. [PMID: 35273581 PMCID: PMC8902591 DOI: 10.3389/fmicb.2022.790063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Accepted: 01/10/2022] [Indexed: 01/20/2023] Open
Abstract
Thermophilic proteins have important application value in biotechnology and industrial processes. The correct identification of thermophilic proteins provides important information for the application of these proteins in engineering. Biochemical methods for identifying thermophilic proteins are laborious, time-consuming, and costly. Therefore, there is an urgent need for a fast and accurate method to identify thermophilic proteins. Considering this urgency, we constructed a reliable benchmark dataset containing 1,368 thermophilic and 1,443 non-thermophilic proteins. A multi-layer perceptron (MLP) model based on a multi-feature fusion strategy was proposed to discriminate thermophilic proteins from non-thermophilic proteins. On an independent dataset, the proposed model achieved an accuracy of 96.26%, which demonstrates that the model has good application prospects. To make the model convenient to use, a user-friendly software package called iThermo was established and can be freely accessed at http://lin-group.cn/server/iThermo/index.html. The high accuracy of the model and the practicability of the developed software package indicate that this study can accelerate the discovery and engineering application of thermally stable proteins.
Affiliation(s)
- Zahoor Ahmed
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hasan Zulfiqar
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Abdullah Aman Khan
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Sichuan Artificial Intelligence Research Institute, Yibin, China
- Ijaz Gul
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Tsinghua Shenzhen International Graduate School, Institute of Biopharmaceutical and Health Engineering, Tsinghua University, Shenzhen, China
- Fu-Ying Dao
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhao-Yue Zhang
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xiao-Long Yu
- School of Materials Science and Engineering, Hainan University, Haikou, China
- Lixia Tang
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
17
|
Prediction of Intrinsically Disordered Proteins Using Machine Learning Based on Low Complexity Methods. ALGORITHMS 2022. [DOI: 10.3390/a15030086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Prediction of intrinsically disordered proteins is a hot area in the field of bioinformatics. Due to the high cost of evaluating the disordered regions of protein sequences using experimental methods, we used a low-complexity prediction scheme. Sequence complexity is used in this scheme to calculate five features for each residue of the protein sequence, including the Shannon entropy, the topological entropy, the permutation entropy and the weighted average values of two propensities. In particular, this is the first time that permutation entropy has been applied to the field of protein sequencing. In addition, in the data preprocessing stage, an appropriately sized sliding window and a comprehensive oversampling scheme can be used to improve the prediction performance of our scheme, and two ensemble learning algorithms are also used to verify the prediction results before and after. The results show that adding permutation entropy improves the performance of the prediction algorithm, in which the MCC value can be improved from the original 0.465 to 0.526 in our scheme, proving its universality. Finally, we compare the simulation results of our scheme with those of some existing schemes to demonstrate its effectiveness.
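Two of the per-residue complexity features named in the abstract above, Shannon entropy and permutation entropy, can be sketched with the standard library. This is an illustrative sketch only: the window size, the truncation of windows at the sequence ends, and the assumption that residues have already been mapped to numbers before permutation entropy is computed are choices made here, not details given in the abstract.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the symbol frequencies in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(sequence, window_size):
    """One entropy value per residue, from a window centred on it.

    Windows are simply truncated at the sequence ends (an assumption).
    """
    half = window_size // 2
    scores = []
    for i in range(len(sequence)):
        lo = max(0, i - half)
        hi = min(len(sequence), i + half + 1)
        scores.append(shannon_entropy(sequence[lo:hi]))
    return scores


def permutation_entropy(series, order=3):
    """Entropy (in bits) of the ordinal patterns of length `order`.

    `series` must be numeric; how residues are mapped to numbers is
    left open here because the abstract does not specify it.
    """
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    return -sum((c / total) * math.log2(c / total) for c in patterns.values())


# A low-complexity stretch (KKKK) scores lower than a mixed region.
scores = windowed_entropy("MKKKKAAA", 3)
```

A monotone series has a single ordinal pattern and therefore zero permutation entropy, while a series that alternates between two patterns equally often yields one bit.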
|
18
|
Tran HV, Nguyen QH. iAnt: Combination of Convolutional Neural Network and Random Forest Models Using PSSM and BERT Features to Identify Antioxidant Proteins. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210820095144] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Background: Reactive Oxygen Species (ROS) play many roles in the body, such as cell signaling, homeostasis, or protection from harmful bacteria. However, an excess of ROS in the body will damage lipids, proteins, and DNA. Many studies have shown that various environmental factors increase the amount of ROS produced in the body. Antioxidant proteins are responsible for neutralizing these ROS or free radicals. Although the amount of data on protein sequences has increased over the last two decades, we still lack bioinformatics tools that can accurately identify antioxidant protein sequences. Furthermore, biochemical methods to determine antioxidant proteins are very expensive and time-consuming. Therefore, a machine learning approach must be used to speed up the computation.
Methods: In this study, we propose a new method that combines a convolutional neural network and Random Forest using two features, the normalized PSSM and the best-selected feature of the ProtBert output.
Results: Our model gave very good results on the independent test dataset with 97.3% sensitivity and 95.9% specificity. Comparison with current state-of-the-art models shows that our model is superior. We have also deployed iAnt as a website with a user-friendly interface, available at http://antixiodant.nguyenhongquang.edu.vn.
Conclusion: iAnt has been developed to accurately identify antioxidant proteins. Its results outperform the existing state-of-the-art methods, and it is also available online.
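The results above are reported as sensitivity and specificity on an independent test set. For readers less familiar with these metrics, the following minimal sketch shows how both follow from the confusion-matrix counts. The labels are invented toy values, not the iAnt test data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of true positives recovered.
    specificity = TN / (TN + FP): share of true negatives recovered.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy labels: 1 = antioxidant protein, 0 = non-antioxidant.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both values matters when the classes are imbalanced, because accuracy alone can hide poor performance on the rarer class.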
Affiliation(s)
- Hoang V. Tran
- Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
- Quang H. Nguyen
- Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
|
19
|
Jabeen A, de March CA, Matsunami H, Ranganathan S. Machine Learning Assisted Approach for Finding Novel High Activity Agonists of Human Ectopic Olfactory Receptors. Int J Mol Sci 2021; 22:ijms222111546. [PMID: 34768977 PMCID: PMC8583936 DOI: 10.3390/ijms222111546] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/30/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 12/29/2022] Open
Abstract
Olfactory receptors (ORs) constitute the largest superfamily of G protein-coupled receptors (GPCRs). ORs are involved in sensing odorants as well as in other ectopic roles in non-nasal tissues. Matching the enormous repertoire of olfactory stimuli to their counterpart ORs through machine learning (ML) will enable understanding of the olfactory system, receptor characterization, and exploitation of their therapeutic potential. In the current study, we selected two broadly tuned ectopic human OR proteins, OR1A1 and OR2W1, for expanding their known chemical space by using molecular descriptors. We present a scheme for selecting the optimal features required to train an ML-based model, based on which we selected the random forest (RF) as the best performer. High activity agonist prediction involved screening five databases comprising ~23 M compounds, using the trained RF classifier. To evaluate the effectiveness of the machine learning based virtual screening and check receptor binding site compatibility, we docked the top target ligands to carefully developed receptor model structures. Finally, experimental validation of selected compounds with significant docking scores through in vitro assays revealed two high activity novel agonists for OR1A1 and one for OR2W1.
Affiliation(s)
- Amara Jabeen
- Applied BioSciences, Macquarie University, Sydney, NSW 2109, Australia
- Claire A. de March
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Hiroaki Matsunami
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Neurobiology, Duke Institute for Brain Sciences, Duke University, Durham, NC 27710, USA
- Correspondence: (H.M.); (S.R.)
- Shoba Ranganathan
- Applied BioSciences, Macquarie University, Sydney, NSW 2109, Australia
- Correspondence: (H.M.); (S.R.)
|
20
|
Ahmed H, Alarabi L, El-Sappagh S, Soliman H, Elmogy M. Genetic variations analysis for complex brain disease diagnosis using machine learning techniques: opportunities and hurdles. PeerJ Comput Sci 2021; 7:e697. [PMID: 34616886 PMCID: PMC8459785 DOI: 10.7717/peerj-cs.697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 08/05/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVES: This paper presents an in-depth review of the state-of-the-art genetic variations analysis to discover complex genes associated with the brain's genetic disorders. We first introduce the genetic analysis of complex brain diseases, genetic variation, and DNA microarrays. Then, the review focuses on available machine learning methods used for complex brain disease classification. Therein, we discuss the various datasets, preprocessing, feature selection and extraction, and classification strategies. In particular, we concentrate on studying single nucleotide polymorphisms (SNP) that support the highest resolution for genomic fingerprinting for tracking disease genes. Subsequently, the study provides an overview of the applications for some specific diseases, including autism spectrum disorder, brain cancer, and Alzheimer's disease (AD). The study argues that despite the significant recent developments in the analysis and treatment of genetic disorders, there are considerable challenges to elucidate causative mutations, especially from the viewpoint of implementing genetic analysis in clinical practice. The review finally provides a critical discussion on the applicability of genetic variations analysis for complex brain disease identification, highlighting the future challenges.
METHODS: We used a methodology for literature surveys to obtain data from academic databases. Criteria were defined for inclusion and exclusion. The selection of articles proceeded in three stages. In addition, the principal machine learning methods used to classify the disease were presented in each stage in more detail.
RESULTS: It was revealed that machine learning based on SNP was widely utilized to solve problems of genetic variation for complex diseases related to genes.
CONCLUSIONS: Despite significant developments in the diagnosis and treatment of genetic diseases over the past two decades, there is still a large percentage of cases in which the causative mutation cannot be determined, and a final genetic diagnosis remains elusive. So, we need to detect the variations of the genes related to brain disorders in the early disease stages.
Affiliation(s)
- Hala Ahmed
- Information Technology Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
- Louai Alarabi
- Department of Computer Science, Umm Al-Qura University, Makkah, Saudi Arabia
- Shaker El-Sappagh
- Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Information Systems Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
- Hassan Soliman
- Information Technology Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
- Mohammed Elmogy
- Information Technology Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
|
21
|
An Innovative Machine Learning Approach to Predict the Dietary Fiber Content of Packaged Foods. Nutrients 2021; 13:nu13093195. [PMID: 34579072 PMCID: PMC8470168 DOI: 10.3390/nu13093195] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/10/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 01/23/2023] Open
Abstract
Underconsumption of dietary fiber is prevalent worldwide and is associated with multiple adverse health conditions. Despite the importance of fiber, the labeling of fiber content on packaged foods and beverages is voluntary in most countries, making it challenging for consumers and policy makers to monitor fiber consumption. Here, we developed a machine learning approach for automated and systematic prediction of fiber content using nutrient information commonly available on packaged products. An Australian packaged food dataset with known fiber content information was divided into training (n = 8986) and test datasets (n = 2455). Utilization of a k-nearest neighbors machine learning algorithm explained a greater proportion of variance in fiber content than an existing manual fiber prediction approach (R² = 0.84 vs. R² = 0.68). Our findings highlight the opportunity to use machine learning to efficiently predict the fiber content of packaged products on a large scale.
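The approach above predicts fiber content with a k-nearest neighbors algorithm and scores it by the proportion of variance explained. A minimal standard-library sketch of both pieces follows. The nutrient features, the value of k, the unweighted averaging of neighbors and all numbers in the toy data are assumptions for illustration; the abstract does not specify them.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training rows
    (squared Euclidean distance, unweighted average)."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented toy products: (carbohydrate, sugar, protein) in g per 100 g,
# with fiber in g per 100 g as the target.
train_x = [[60.0, 5.0, 10.0], [62.0, 6.0, 11.0], [10.0, 9.0, 1.0], [12.0, 10.0, 1.5]]
train_y = [8.0, 9.0, 0.5, 1.0]
predicted = knn_predict(train_x, train_y, [61.0, 5.5, 10.5], k=2)
```

In practice the features would usually be scaled before computing distances, since nutrients measured on larger numeric ranges would otherwise dominate the neighbor search.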
|
22
|
Schizophrenia Detection Using Machine Learning Approach from Social Media Content. SENSORS 2021; 21:s21175924. [PMID: 34502815 PMCID: PMC8434514 DOI: 10.3390/s21175924] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/30/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/15/2022]
Abstract
Schizophrenia is a severe mental disorder that ranks among the leading causes of disability worldwide. However, many cases of schizophrenia remain untreated due to failure to diagnose, self-denial, and social stigma. With the advent of social media, individuals suffering from schizophrenia share their mental health problems and seek support and treatment options. Machine learning approaches are increasingly used for detecting schizophrenia from social media posts. This study aims to determine whether machine learning could be effectively used to detect signs of schizophrenia in social media users by analyzing their social media texts. To this end, we collected posts from the social media platform Reddit focusing on schizophrenia, along with non-mental health related posts (fitness, jokes, meditation, parenting, relationships, and teaching) for the control group. We extracted linguistic features and content topics from the posts. Using supervised machine learning, we classified posts belonging to schizophrenia and interpreted important features to identify linguistic markers of schizophrenia. We applied unsupervised clustering to the features to uncover a coherent semantic representation of words in schizophrenia. We identified significant differences in linguistic features and topics including increased use of third person plural pronouns and negative emotion words and symptom-related topics. We distinguished schizophrenic from control posts with an accuracy of 96%. Finally, we found that coherent semantic groups of words were the key to detecting schizophrenia. Our findings suggest that machine learning approaches could help us understand the linguistic characteristics of schizophrenia and identify schizophrenia or otherwise at-risk individuals using social media texts.
|
23
|
Pérez-Reynoso FD, Rodríguez-Guerrero L, Salgado-Ramírez JC, Ortega-Palacios R. Human-Machine Interface: Multiclass Classification by Machine Learning on 1D EOG Signals for the Control of an Omnidirectional Robot. SENSORS (BASEL, SWITZERLAND) 2021; 21:5882. [PMID: 34502773 PMCID: PMC8434373 DOI: 10.3390/s21175882] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/29/2021] [Revised: 08/24/2021] [Accepted: 08/26/2021] [Indexed: 01/25/2023]
Abstract
People with severe disabilities require assistance to perform their routine activities; a Human-Machine Interface (HMI) allows them to activate devices that respond according to their needs. In this work, an HMI based on electrooculography (EOG) is presented; the instrumentation is mounted on portable glasses that acquire both horizontal and vertical EOG signals. Each recorded eye movement is assigned a class and encoded with the one-hot encoding technique to test the precision and sensitivity of different machine learning classification algorithms capable of identifying new data from the eye recordings; the algorithm discriminates blinks so that they do not disturb the acquisition of the eyeball position commands. The classifier is implemented in the control of a three-wheeled omnidirectional robot to validate the response of the interface. This work proposes the classification of signals in real time and the customization of the interface, minimizing the user's learning curve. Preliminary results showed that it is possible to generate trajectories to control an omnidirectional robot, with a view to implementing a future assistance system that controls position through gaze orientation.
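The abstract above states that each eye-movement class is categorized using one-hot encoding. As a small illustration of that encoding step, the sketch below turns class labels into indicator vectors. The class names in `EYE_CLASSES` are hypothetical; the abstract does not list the actual classes used.

```python
# Hypothetical class names; the paper's actual label set is not given above.
EYE_CLASSES = ["up", "down", "left", "right", "blink"]


def one_hot(labels, classes):
    """Encode each label as a vector with a single 1 at its class index."""
    index = {name: i for i, name in enumerate(classes)}
    encoded = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1
        encoded.append(row)
    return encoded


encoded = one_hot(["left", "blink"], EYE_CLASSES)
```

Keeping "blink" as its own class is one way a classifier could separate blinks from directional commands, which is the behavior the abstract describes.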
Affiliation(s)
- Liliam Rodríguez-Guerrero
- Research Center on Technology of Information and Systems (CITIS), Electric and Control Academic Group, Universidad Autónoma del Estado de Hidalgo (UAEH), Pachuca de Soto 42039, Mexico
- Rocío Ortega-Palacios
- Biomedical Engineering, Universidad Politécnica de Pachuca (UPP), Zempoala 43830, Mexico
|
24
|
Ji C, Liu Z, Wang Y, Ni J, Zheng C. GATNNCDA: A Method Based on Graph Attention Network and Multi-Layer Neural Network for Predicting circRNA-Disease Associations. Int J Mol Sci 2021; 22:8505. [PMID: 34445212 PMCID: PMC8395191 DOI: 10.3390/ijms22168505] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2021] [Revised: 07/30/2021] [Accepted: 08/03/2021] [Indexed: 12/30/2022] Open
Abstract
Circular RNAs (circRNAs) are a new class of endogenous non-coding RNAs with a covalently closed loop structure. Researchers have revealed that circRNAs play an important role in human diseases. As experimental identification of interactions between circRNAs and diseases is time-consuming and expensive, effective computational methods for predicting potential circRNA-disease associations are urgently needed. In this study, we proposed a novel computational method named GATNNCDA, which combines a Graph Attention Network (GAT) and a multi-layer neural network (NN) to infer disease-related circRNAs. Specifically, GATNNCDA first integrates disease semantic similarity, circRNA functional similarity and the respective Gaussian Interaction Profile (GIP) kernel similarities. The integrated similarities are used as initial node features, and then GAT is applied for further feature extraction in the heterogeneous circRNA-disease graph. Finally, the NN-based classifier is introduced for prediction. The results of fivefold cross validation demonstrated that GATNNCDA achieved an average AUC of 0.9613 and AUPR of 0.9433 on the CircR2Disease dataset, and outperformed other state-of-the-art methods. In addition, case studies on breast cancer and hepatocellular carcinoma showed that 20 and 18 of the top 20 candidates, respectively, were confirmed in the validation datasets or published literature. Therefore, GATNNCDA is an effective and reliable tool for discovering circRNA-disease associations.
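The Gaussian Interaction Profile (GIP) kernel similarity mentioned above is commonly defined on the rows of a binary association matrix: each entity's interaction profile is its row, and the similarity of two entities is exp(-gamma * squared distance between their profiles), with gamma normalized by the mean squared norm of all profiles. The sketch below follows that common definition; the bandwidth parameter `gamma_prime = 1.0` and the toy matrix are assumptions, since the abstract does not give the paper's settings.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian Interaction Profile kernel over the rows of `profiles`.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), where
    gamma = gamma_prime / mean(||p_i||^2).
    Assumes at least one profile is non-zero (otherwise gamma is undefined).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Invented toy matrix: 3 circRNAs (rows) x 3 diseases (columns),
# 1 = known association.
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
circ_similarity = gip_kernel(associations)
```

Disease similarity can be obtained the same way by passing the transposed matrix, so that each profile is a column of known circRNA associations.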
Affiliation(s)
- Cunmei Ji
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China; (Z.L.); (Y.W.); (J.N.)
- Zhihao Liu
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China; (Z.L.); (Y.W.); (J.N.)
- Yutian Wang
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China; (Z.L.); (Y.W.); (J.N.)
- Jiancheng Ni
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China; (Z.L.); (Y.W.); (J.N.)
- Chunhou Zheng
- School of Artificial Intelligence, Anhui University, Hefei 230601, China
|
25
|
Shen Z, Liu T, Xu T. Accurate Identification of Antioxidant Proteins Based on a Combination of Machine Learning Techniques and Hidden Markov Model Profiles. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5770981. [PMID: 34413898 PMCID: PMC8369162 DOI: 10.1155/2021/5770981] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2021] [Revised: 07/15/2021] [Accepted: 07/26/2021] [Indexed: 01/19/2023]
Abstract
Antioxidant proteins (AOPs) play important roles in the management and prevention of several human diseases due to their ability to neutralize excess free radicals. However, the identification of AOPs by using wet-lab experimental techniques is often time-consuming and expensive. In this study, we proposed an accurate computational model, called AOP-HMM, to predict AOPs by extracting discriminatory evolutionary features from hidden Markov model (HMM) profiles. First, auto cross-covariance (ACC) variables were applied to transform the HMM profiles into fixed-length feature vectors. Then, we performed the analysis of variance (ANOVA) method to reduce the dimensionality of the raw feature space. Finally, a support vector machine (SVM) classifier was adopted to conduct the prediction of AOPs. To comprehensively evaluate the performance of the proposed AOP-HMM model, the 10-fold cross-validation (CV), the jackknife CV, and the independent test were carried out on two widely used benchmark datasets. The experimental results demonstrated that AOP-HMM outperformed most of the existing methods and could be used to quickly annotate AOPs and guide the experimental process.
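The abstract above uses auto cross-covariance (ACC) variables to turn variable-length HMM profiles into fixed-length feature vectors. A minimal sketch of one common formulation follows: for every lag and every ordered pair of profile columns, the mean-centred product of values `lag` positions apart is averaged over the sequence. The exact variant, normalization and maximum lag used in the paper are not stated in the abstract, so those details are assumptions here, and the small profiles are invented.

```python
def acc_transform(profile, max_lag):
    """Map an L x N profile (L residues, N columns) to a vector of
    length max_lag * N * N, independent of L.

    Entries with j1 == j2 are auto-covariances; the others are
    cross-covariances between two different columns.
    """
    length = len(profile)
    n_cols = len(profile[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[j] for row in profile) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                total = sum(
                    (profile[i][j1] - means[j1]) * (profile[i + lag][j2] - means[j2])
                    for i in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


# Two invented profiles of different lengths (5 and 7 residues, 3 columns).
short_profile = [[1, 0, 2], [0, 1, 1], [2, 2, 0], [1, 1, 1], [0, 2, 2]]
long_profile = short_profile + [[1, 0, 0], [2, 1, 1]]
short_vec = acc_transform(short_profile, max_lag=2)
long_vec = acc_transform(long_profile, max_lag=2)
```

Both vectors have the same length even though the proteins differ in length, which is what allows a fixed-input classifier such as an SVM to be trained on them.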
Affiliation(s)
- Zhehan Shen
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| | - Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| | - Ting Xu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| |
|
26
|
Reshi AA, Ashraf I, Rustam F, Shahzad HF, Mehmood A, Choi GS. Diagnosis of vertebral column pathologies using concatenated resampling with machine learning algorithms. PeerJ Comput Sci 2021; 7:e547. [PMID: 34395856 PMCID: PMC8323723 DOI: 10.7717/peerj-cs.547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/16/2020] [Accepted: 04/25/2021] [Indexed: 06/13/2023]
Abstract
Medical diagnosis through the classification of biomedical attributes is one of the exponentially growing fields in bioinformatics. Although a large number of approaches have been presented in the past, the wide use and superior performance of machine learning (ML) methods in medical diagnosis necessitate significant consideration of automatic diagnostic methods. This study proposes a novel approach called concatenated resampling (CR) to increase the efficacy of traditional ML algorithms. The performance is analyzed using four ML approaches, including tree-based ensemble approaches and a linear machine learning approach, for automatic diagnosis of inter-vertebral pathologies with increased. Besides, undersampling, over-sampling, and the proposed CR technique have been applied to the unbalanced training dataset to analyze the impact of these techniques on the accuracy of each classification model. Extensive experiments have been conducted to compare different classification models using several metrics including accuracy, precision, recall, and F1 score. Comparative analysis has been performed on the experimental results to identify the best performing classifier along with the application of the re-sampling technique. The results show that the extra tree classifier achieves an accuracy of 0.99 in association with the proposed CR technique.
Affiliation(s)
- Aijaz Ahmad Reshi
- College of Computer Science and Engineering, Department of Computer Science, Taibah University, Al Madinah Al Munawarah, Saudi Arabia
| | - Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongbuk, Gyeongsan-si, South Korea
| | - Furqan Rustam
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
| | - Hina Fatima Shahzad
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
| | - Arif Mehmood
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| | - Gyu Sang Choi
- Information and Communication Engineering, Yeungnam University, Gyeongbuk, Gyeongsan-si, South Korea
| |
|
27
|
Automated Identification of Sleep Disorder Types Using Triplet Half-Band Filter and Ensemble Machine Learning Techniques with EEG Signals. ELECTRONICS 2021. [DOI: 10.3390/electronics10131531] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/13/2023]
Abstract
A sleep disorder is a medical condition that affects an individual’s regular sleeping pattern and routine, hence negatively affecting the individual’s health. The traditional procedures used by clinicians to identify sleep disorders involve questionnaires and polysomnography (PSG), which are subjective, time-consuming, and inconvenient. Hence, automated sleep disorder identification is required to overcome these limitations. In this study, we propose a method using electroencephalogram (EEG) signals for the automated identification of six sleep disorders, namely insomnia, nocturnal frontal lobe epilepsy (NFLE), narcolepsy, rapid eye movement behavior disorder (RBD), periodic leg movement disorder (PLM), and sleep-disordered breathing (SDB). To the best of our knowledge, this is one of the first studies undertaken to identify sleep disorders using EEG signals from the cyclic alternating pattern (CAP) sleep database. After sleep-scoring EEG epochs, we created eight different data subsets of EEG epochs to develop the proposed model. A novel optimal triplet half-band filter bank (THFB) is used to obtain the subbands of EEG signals. We extracted Hjorth parameters from subbands of EEG epochs. The selected features are fed to various supervised machine learning algorithms for the automated classification of sleep disorders. Using an ensemble boosted trees classifier, our proposed system obtained the highest accuracy of 99.2%, 98.2%, 96.2%, 98.3%, 98.8%, and 98.8% for insomnia, narcolepsy, NFLE, PLM, RBD, and SDB classes against normal healthy subjects, respectively. We also attained the highest accuracy of 91.3% in identifying the type of sleep disorder. The proposed method is simple, fast, and efficient, and may help medical practitioners diagnose various sleep disorders accurately and in less time at sleep clinics and homes.
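As an illustration of the Hjorth parameters mentioned above, the following minimal sketch computes activity, mobility, and complexity for a single signal using their standard definitions. It is not the authors' pipeline: the function name `hjorth_parameters` and the synthetic sine wave standing in for an EEG subband are assumptions made for this example.

```python
import math
from typing import List, Tuple


def _variance(values: List[float]) -> float:
    """Population variance of a list of samples."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def _diff(values: List[float]) -> List[float]:
    """First-order difference, a discrete stand-in for the time derivative."""
    return [b - a for a, b in zip(values, values[1:])]


def hjorth_parameters(signal: List[float]) -> Tuple[float, float, float]:
    """Return (activity, mobility, complexity) of a sampled signal.

    activity   = var(x)
    mobility   = sqrt(var(x') / var(x))
    complexity = mobility(x') / mobility(x)
    A constant signal has zero variance and would cause a division by zero.
    """
    d1 = _diff(signal)
    d2 = _diff(d1)
    var0, var1, var2 = _variance(signal), _variance(d1), _variance(d2)
    activity = var0
    mobility = math.sqrt(var1 / var0)
    complexity = math.sqrt(var2 / var1) / mobility
    return activity, mobility, complexity


# Synthetic stand-in for one subband epoch: 5 cycles of a sine over 1000 samples.
omega = 2 * math.pi * 5 / 1000
epoch = [math.sin(omega * n) for n in range(1000)]
activity, mobility, complexity = hjorth_parameters(epoch)
print(activity, mobility, complexity)
```

For a pure sinusoid the complexity is close to 1, which makes it a convenient sanity check before applying the function to real subband signals.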
|
28
|
Anteghini M, Martins dos Santos V, Saccenti E. In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins. Int J Mol Sci 2021; 22:6409. [PMID: 34203866 PMCID: PMC8232616 DOI: 10.3390/ijms22126409] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/13/2021] [Revised: 05/31/2021] [Accepted: 06/09/2021] [Indexed: 01/28/2023] Open
Abstract
Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods focus on assigning protein sequences to subcellular compartments, but there are no specific tools tailored for the sub-localisation (matrix vs. membrane) of peroxisome proteins. We present here In-Pero, a new method for predicting protein sub-peroxisomal cellular localisation. In-Pero combines standard machine learning approaches with recently proposed multi-dimensional deep-learning representations of the protein amino-acid sequence. It showed a classification accuracy above 0.9 in predicting peroxisomal matrix and membrane proteins. The method is trained and tested using a double cross-validation approach on a curated data set comprising 160 peroxisomal proteins with experimental evidence for sub-peroxisomal localisation. We further show that the proposed approach can be easily adapted (In-Mito) to the prediction of mitochondrial protein localisation obtaining performances for certain classes of proteins (matrix and inner-membrane) superior to existing tools.
Affiliation(s)
- Marco Anteghini
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands;
- LifeGlimmer GmbH, 12163 Berlin, Germany
| | - Vitor Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands;
- LifeGlimmer GmbH, 12163 Berlin, Germany
| | - Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands;
| |
|
29
|
Prediction of African Swine Fever Virus Inhibitors by Molecular Docking-Driven Machine Learning Models. Molecules 2021; 26:molecules26123592. [PMID: 34208385 PMCID: PMC8231271 DOI: 10.3390/molecules26123592] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/17/2021] [Revised: 05/23/2021] [Accepted: 06/09/2021] [Indexed: 01/31/2023] Open
Abstract
African swine fever virus (ASFV) causes a highly contagious and severe hemorrhagic viral disease with high mortality in domestic pigs of all ages. Although the virus is harmless to humans, the ongoing ASFV epidemic could have severe economic consequences for global food security. Recent studies have found a few antiviral agents that can inhibit ASFV infections. However, currently, there are no vaccines or antiviral drugs. Hence, there is an urgent need to identify new drugs to treat ASFV. Based on the structural information data on the targets of ASFV, we used molecular docking and machine learning models to identify novel antiviral agents. We confirmed that compounds with high affinity present in the region of interest belonged to subsets in the chemical space using principal component analysis and k-means clustering in molecular docking studies of FDA-approved drugs. These methods predicted pentagastrin as a potential antiviral drug against ASFVs. Finally, it was also observed that the compound had an inhibitory effect on AsfvPolX activity. Results from the present study suggest that molecular docking and machine learning models can play an important role in identifying potential antiviral drugs against ASFVs.
|
30
|
Application of artificial intelligence for detection of chemico-biological interactions associated with oxidative stress and DNA damage. Chem Biol Interact 2021; 345:109533. [PMID: 34051207 DOI: 10.1016/j.cbi.2021.109533] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/11/2021] [Revised: 05/17/2021] [Accepted: 05/24/2021] [Indexed: 12/16/2022]
Abstract
In recent years, various AI-based methods have been developed to uncover chemico-biological interactions associated with DNA damage and oxidative stress. Decision trees, Bayesian networks, random forests, logistic regression models, support vector machines, and deep learning tools have great potential in molecular biology and toxicology, and it is estimated that in the future they will greatly contribute to our understanding of the molecular and cellular mechanisms associated with DNA damage and repair. In this concise review, we discuss recent attempts to build machine learning tools for the assessment of radiation-induced DNA damage as well as algorithms that can analyze data from the most frequently used DNA damage assays in molecular biology. We also review recent works on the detection of antioxidant proteins with machine learning, and the use of AI-related methods for prediction and evaluation of noncoding DNA sequences. Finally, we discuss previously published research on the potential application of machine learning tools in aging research.
|
31
|
Toma M, Concu R. Computational Biology: A New Frontier in Applied Biology. BIOLOGY 2021; 10:biology10050374. [PMID: 33925472 PMCID: PMC8145007 DOI: 10.3390/biology10050374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/21/2021] [Accepted: 04/25/2021] [Indexed: 11/16/2022]
Abstract
All living things are related to one another [...].
Affiliation(s)
- Milan Toma
- Serota Academic Center (Room 138), New York Institute of Technology, Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, Northern Boulevard, P.O. Box 8000, Old Westbury, NY 11568, USA
- Correspondence: (M.T.); (R.C.)
| | - Riccardo Concu
- Faculty of Science, University of Porto, Rua do Campo Alegre, s/n, 4169-007 Porto, Portugal
- Correspondence: (M.T.); (R.C.)
| |
|
32
|
Abstract
Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory outcomes, including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB and Multi-Layer Perceptron (MLP) models is the best model for predicting the cure class.
|
33
|
Boosted Prediction of Antihypertensive Peptides Using Deep Learning. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11052316] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Heart attack and other heart-related diseases are among the main causes of fatalities in the world. These diseases, and some other severe problems like kidney failure and paralysis, are mainly caused by hypertension. Since bioactive peptides extracted from naturally occurring food substances possess antihypertensive activity, these antihypertensive peptides (AHTPs) can function as prospective replacements for existing pharmacological drugs, with no or fewer side effects. Such naturally occurring peptides can be identified using in-silico approaches, which have been proven to save huge amounts of time and money in the identification of effective peptides. The proposed methodology is a deep learning-based in-silico approach for the identification of AHTPs. An ensemble method is proposed that combines convolutional neural network (CNN) and support vector machine (SVM) classifiers. Amino acid composition (AAC) and g-gap dipeptide composition (DPC) techniques are used for feature extraction. The proposed methodology has been evaluated on two standard antihypertensive peptide sequence datasets. The model yields 95% accuracy on the benchmarking dataset and 88.9% accuracy on the independent dataset. Comparative analysis is provided to demonstrate that the proposed method outperforms existing state-of-the-art methods on both the benchmarking and independent datasets.
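The two feature encodings named above can be sketched as follows. This is a minimal illustration using the usual definitions of amino acid composition and g-gap dipeptide composition, not the authors' code; the function names `aac` and `g_gap_dpc`, the `gap` parameter, and the toy sequence are hypothetical.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(sequence: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    return [sequence.count(a) / len(sequence) for a in AMINO_ACIDS]


def g_gap_dpc(sequence: str, gap: int = 0) -> List[float]:
    """g-gap dipeptide composition: frequency of residue pairs separated by `gap` positions.

    gap=0 gives the ordinary (adjacent) dipeptide composition. Pairs that contain
    a non-standard residue are not counted but still contribute to the denominator.
    """
    n_pairs = len(sequence) - gap - 1
    if n_pairs <= 0:
        raise ValueError("sequence is too short for the requested gap")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
    return [counts[p] / n_pairs for p in DIPEPTIDES]


# Toy peptide, chosen only to make the counts easy to follow by hand.
peptide = "ACAC"
features = aac(peptide) + g_gap_dpc(peptide, gap=0) + g_gap_dpc(peptide, gap=1)
print(len(features))
```

Concatenating the 20 AAC values with one or more 400-value g-gap vectors gives a fixed-length input regardless of peptide length, which is the property a CNN or SVM classifier would rely on.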
|
34
|
Auliah FN, Nilamyani AN, Shoombuatong W, Alam MA, Hasan MM, Kurata H. PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations. Int J Mol Sci 2021; 22:ijms22042120. [PMID: 33672741 PMCID: PMC7924619 DOI: 10.3390/ijms22042120] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/16/2021] [Revised: 02/12/2021] [Accepted: 02/18/2021] [Indexed: 12/30/2022] Open
Abstract
Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites using sequence features are greatly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of PUP-Fuse with curated datasets is freely available.
Affiliation(s)
- Firda Nurul Auliah
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Andi Nur Nilamyani
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA;
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Correspondence:
| |
|
35
|
A Novel Approach for Cognitive Clustering of Parkinsonisms through Affinity Propagation. ALGORITHMS 2021. [DOI: 10.3390/a14020049] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]
Abstract
Cluster analysis is widely applied in the neuropsychological field for exploring patterns in cognitive profiles, but traditional hierarchical and non-hierarchical approaches are often ineffective or even inapplicable to certain types of data. Moreover, these traditional approaches require the number of clusters to be specified in advance, based on a priori knowledge that is not always available. For this reason, we proposed a novel method for cognitive clustering through the affinity propagation (AP) algorithm. In particular, we applied AP clustering to the regression residuals of the Mini Mental State Examination scores (a commonly used screening tool for cognitive impairment) of a cohort of 49 Parkinson’s disease, 48 Progressive Supranuclear Palsy and 44 healthy control participants. We found four clusters, where two clusters (68 and 30 participants) showed almost intact cognitive performance, one cluster had a moderate cognitive impairment (34 participants), and the last cluster had a more extensive cognitive deficit (8 participants). The findings showed, for the first time, an intra- and inter-diagnostic heterogeneity in the cognitive profile of patients with Parkinsonisms. Our novel method of unsupervised learning could represent a reliable tool for supporting neuropsychologists in understanding the natural structure of cognitive performance in neurodegenerative diseases.
|
36
|
Prediction of Protein-ATP Binding Residues Based on Ensemble of Deep Convolutional Neural Networks and LightGBM Algorithm. Int J Mol Sci 2021; 22:ijms22020939. [PMID: 33477866 PMCID: PMC7832895 DOI: 10.3390/ijms22020939] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/25/2020] [Revised: 01/13/2021] [Accepted: 01/16/2021] [Indexed: 12/13/2022] Open
Abstract
Accurately identifying protein-ATP binding residues is important for protein function annotation and drug design. Previous studies have used classic machine-learning algorithms like support vector machine (SVM) and random forest to predict protein-ATP binding residues; however, as new machine-learning techniques are being developed, the prediction performance could be further improved. In this paper, an ensemble predictor that combines deep convolutional neural networks and LightGBM through an ensemble learning algorithm is proposed. Three subclassifiers have been developed: a multi-incepResNet-based predictor, a multi-Xception-based predictor, and a LightGBM predictor. The final prediction result is the combination of the outputs from the three subclassifiers with an optimized weight distribution. We examined the performance of our proposed predictor using two datasets: a classic ATP-binding benchmark dataset and a newly proposed ATP-binding dataset. Our predictor achieved area under the curve (AUC) values of 0.925 and 0.902 and Matthews Correlation Coefficient (MCC) values of 0.639 and 0.642, respectively, which are both better than those of other state-of-the-art prediction methods.
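The weighted combination of subclassifier outputs and the MCC metric mentioned above can be sketched in a few lines. This is an assumption-laden illustration, not the paper's method: the weights, the 0.5 decision threshold, the probability values, and the function names `weighted_ensemble` and `mcc` are all hypothetical, and how the weights were optimized is not described here.

```python
import math
from typing import List, Sequence


def weighted_ensemble(prob_lists: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weighted average of per-residue probabilities from several subclassifiers."""
    total = sum(weights)
    n_items = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_items)
    ]


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up outputs of three subclassifiers for four residues, with made-up weights.
sub_outputs = [
    [0.9, 0.2, 0.6, 0.1],
    [0.7, 0.4, 0.3, 0.2],
    [0.8, 0.1, 0.7, 0.3],
]
weights = [0.5, 0.3, 0.2]
combined = weighted_ensemble(sub_outputs, weights)
labels = [1 if p >= 0.5 else 0 for p in combined]  # assumed 0.5 threshold
print(combined, labels, mcc([1, 0, 1, 0], labels))
```

MCC is a natural companion to AUC for binding-residue prediction because binding residues are a small minority of all residues, and MCC accounts for all four cells of the confusion matrix.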
|
37
|
Quantitative Spectral Data Analysis Using Extreme Learning Machines Algorithm Incorporated with PCA. ALGORITHMS 2021. [DOI: 10.3390/a14010018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
Abstract
Extreme learning machine (ELM) is a popular randomization-based learning algorithm that provides a fast solution for many regression and classification problems. In this article, we present a method based on ELM for solving the spectral data analysis problem, which essentially is a class of inverse problems. It requires determining the structural parameters of a physical sample from the given spectroscopic curves. We propose approximating the unknown target inverse function with an ELM, adding a linear neuron to correct the localized effect caused by Gaussian basis functions. Unlike conventional methods involving intensive numerical computations, under the new conceptual framework, spectral data analysis becomes a task of learning from data. As spectral data are typically high-dimensional, the dimensionality reduction technique of principal component analysis (PCA) is applied to reduce the dimension of the dataset to ensure convergence. The proposed conceptual framework is illustrated using a set of simulated Rutherford backscattering spectra. The results show that the proposed method can achieve prediction inaccuracies of less than 1%, outperforming the predictions from the multi-layer perceptron and numerical-based techniques. The presented method could be implemented as application software for real-time spectral data analysis by integrating it into a spectroscopic data collection system.
|
38
|
Feature Selection from Lyme Disease Patient Survey Using Machine Learning. ALGORITHMS 2020. [DOI: 10.3390/a13120334] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/12/2023]
Abstract
Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants’ answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and k-nearest neighbors approaches. We first analyze the general performance of the model and then identify the most important features for predicting participant answers to GROC. After we identify the “key” features, we separate them from the dataset and demonstrate the effectiveness of these features at identifying GROC. In doing so, we highlight possible directions for future study both mathematically and clinically.
|
39
|
Abstract
The percentage of seniors in the global population is constantly growing, and solutions for fall detection and early detection of neuro-degenerative pathologies play a crucial role in increasing life expectancy and quality of life. This study aims to extend fall detection and effective recognition of early signs of diseases to new smart environments, conceiving the decentralization of diagnostic monitoring in everyday life activities in a more pervasive paradigm. Inspired by research outcomes, in this work an architecture is designed to detect falls in crowded indoor environments during events/exhibitions, to favor a timely and effective intervention. It also provides for continuous monitoring of the gait of seniors during the visit, thus extracting key features which are stored in a dedicated database. The proposed solution allows third-party researchers to perform analysis on the obtained gait datasets, through the adoption of advanced data-mining techniques for the detection of early signs of neuro-degenerative diseases and other pathologies. The architecture designed here aims to provide a step forward concerning the extension of smart monitoring environments for the detection of falls and early signs of pathologies in everyday life, in a more pervasive and decentralized paradigm.
|