1
Shen G, Ye F, Cheng W, Li Q. A modified deep learning method for Alzheimer's disease detection based on the facial submicroscopic features in mice. Biomed Eng Online 2024; 23:109. [PMID: 39482695 PMCID: PMC11526719 DOI: 10.1186/s12938-024-01305-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Accepted: 10/25/2024] [Indexed: 11/03/2024] Open
Abstract
Alzheimer's disease (AD) is a chronic disease among people aged 65 and older. As the aging population continues to grow at a rapid pace, AD has emerged as a pressing public health issue globally. Early detection of the disease is important, because increasing evidence has illustrated that early diagnosis holds the key to effective treatment of AD. In this work, we developed and refined a multi-layer cyclic Residual convolutional neural network model, specifically tailored to identify AD-related submicroscopic characteristics in the facial images of mice. Our experiments involved classifying the mice into two distinct groups: a normal control group and an AD group. Compared with the other deep learning models, the proposed model achieved a better detection performance in the dataset of the mouse experiment. The accuracy, sensitivity, specificity and precision for AD identification with our proposed model were as high as 99.78%, 100%, 99.65% and 99.44%, respectively. Moreover, the heat maps of AD correlation in the facial images of the mice were acquired with the class activation mapping algorithm. It was proven that the facial images contained AD-related submicroscopic features. Consequently, through our mouse experiments, we validated the feasibility and accuracy of utilizing a facial image-based deep learning model for AD identification. Therefore, the present study suggests the potential of using facial images for AD detection in humans through deep learning-based methods.
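The four figures quoted in the abstract above (accuracy, sensitivity, specificity, precision) are standard confusion-matrix ratios. The sketch below shows how they are usually computed. The counts are invented for illustration and are not the study's data, and the function name `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return the four ratios for a binary classifier.

    tp/fn count diseased samples called positive/negative,
    tn/fp count control samples called negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # all correct calls
        "sensitivity": tp / (tp + fn),     # recall on the diseased class
        "specificity": tn / (tn + fp),     # recall on the control class
        "precision": tp / (tp + fp),       # positive calls that are right
    }


# Hypothetical counts, NOT taken from the paper.
m = binary_metrics(tp=90, fp=2, tn=108, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

A sensitivity of 100% alongside a precision below 100%, as in the abstract, corresponds to zero false negatives with a small number of false positives.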
Affiliation(s)
- Guosheng Shen
- Institute of Modern Physics, Chinese Academy of Sciences, 509 Nanchang Road, Lanzhou, 730000, Gansu Province, China
- Key Laboratory of Basic Research On Heavy Ion Radiation Application in Medicine, Lanzhou, 730000, Gansu Province, China
- Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou, 730000, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Fei Ye
- Institute of Modern Physics, Chinese Academy of Sciences, 509 Nanchang Road, Lanzhou, 730000, Gansu Province, China
- Key Laboratory of Basic Research On Heavy Ion Radiation Application in Medicine, Lanzhou, 730000, Gansu Province, China
- Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou, 730000, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Wei Cheng
- Institute of Modern Physics, Chinese Academy of Sciences, 509 Nanchang Road, Lanzhou, 730000, Gansu Province, China
- Key Laboratory of Basic Research On Heavy Ion Radiation Application in Medicine, Lanzhou, 730000, Gansu Province, China
- Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou, 730000, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qiang Li
- Institute of Modern Physics, Chinese Academy of Sciences, 509 Nanchang Road, Lanzhou, 730000, Gansu Province, China.
- Key Laboratory of Basic Research On Heavy Ion Radiation Application in Medicine, Lanzhou, 730000, Gansu Province, China.
- Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou, 730000, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
2
Nyholm J, Ghazi AN, Ghazi SN, Sanmartin Berglund J. Prediction of dementia based on older adults' sleep disturbances using machine learning. Comput Biol Med 2024; 171:108126. [PMID: 38342045 DOI: 10.1016/j.compbiomed.2024.108126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 12/14/2023] [Accepted: 02/06/2024] [Indexed: 02/13/2024]
Abstract
BACKGROUND Dementia is the most common degenerative condition in older adults; it can be predicted using a number of indicators, and its progression can be slowed. One indicator of increased dementia risk is sleep disturbance. This study examines whether machine learning can predict dementia and which sleep disturbance factors are associated with it. METHODS This study uses five machine learning algorithms (gradient boosting, logistic regression, Gaussian naive Bayes, random forest and support vector machine) and data on the older population (60+) in Sweden from the Swedish National Study on Ageing and Care - Blekinge (n=4175). Each algorithm is evaluated with 10-fold stratified cross-validation; the results consist of the Brier score, used to check the accuracy of the predicted probabilities, and the feature importance, used to examine which factors affect dementia. The algorithms use 16 features covering personal and sleep disturbance factors. RESULTS Logistic regression found an association between dementia and sleep disturbances, although it is slight for the features in the study. Gradient boosting was the most accurate algorithm, with 92.9% accuracy, 0.926 F1-score, 0.974 ROC AUC and 0.056 Brier score. The significant factors differed between the machine learning algorithms. The variables that most consistently had the highest feature importance across all algorithms were sleeping more than two hours during the day, sex, education level, age, waking up during the night, and snoring. CONCLUSION There is an association between sleep disturbances and dementia, which machine learning algorithms can predict. The risk factors for dementia differ across the algorithms, but sleep disturbances can predict dementia.
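The evaluation protocol named in the abstract above (stratified 10-fold cross-validation scored with the Brier score) can be sketched with the standard library alone. This is an illustration under assumptions, not the authors' pipeline: `brier_score` and `stratified_folds` are hypothetical helper names, and the labels and probabilities are toy values.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class ratios in each."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    return folds


# Toy data: 20 positive and 80 negative labels (not the study's cohort).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=10)

# Toy probabilities for four samples; lower Brier score is better.
score = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
print(round(score, 4))
```

Each fold serves once as the held-out test set while the remaining nine are used for training; the per-fold scores are then averaged.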
Affiliation(s)
- Joel Nyholm
- Department of Computer Science, Blekinge Institute of Technology, Karlskrona, 37179, Blekinge, Sweden
- Ahmad Nauman Ghazi
- Department of Software Engineering, Blekinge Institute of Technology, Karlskrona, 37179, Blekinge, Sweden.
- Sarah Nauman Ghazi
- Department of Health, Blekinge Institute of Technology, Karlskrona, 37179, Blekinge, Sweden
3
Xu X, Li J, Zhu Z, Zhao L, Wang H, Song C, Chen Y, Zhao Q, Yang J, Pei Y. A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering (Basel) 2024; 11:219. [PMID: 38534493 PMCID: PMC10967767 DOI: 10.3390/bioengineering11030219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 02/15/2024] [Accepted: 02/21/2024] [Indexed: 03/28/2024] Open
Abstract
Disease diagnosis represents a critical and arduous endeavor within the medical field. Artificial intelligence (AI) techniques, spanning from machine learning and deep learning to large model paradigms, stand poised to significantly augment physicians in rendering more evidence-based decisions, thus presenting a pioneering solution for clinical practice. Traditionally, the amalgamation of diverse medical data modalities (e.g., image, text, speech, genetic data, physiological signals) is imperative to facilitate a comprehensive disease analysis, a topic of burgeoning interest among both researchers and clinicians in recent times. Hence, there exists a pressing need to synthesize the latest strides in multi-modal data and AI technologies in the realm of medical diagnosis. In this paper, we narrow our focus to five specific disorders (Alzheimer's disease, breast cancer, depression, heart disease, epilepsy), elucidating advanced endeavors in their diagnosis and treatment through the lens of artificial intelligence. Our survey not only delineates detailed diagnostic methodologies across varying modalities but also underscores commonly utilized public datasets, the intricacies of feature engineering, prevalent classification models, and envisaged challenges for future endeavors. In essence, our research endeavors to contribute to the advancement of diagnostic methodologies, furnishing invaluable insights for clinical decision making.
Affiliation(s)
- Xi Xu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Zhichao Zhu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Linna Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Huina Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Changwei Song
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Yining Chen
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Qing Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (X.X.); (J.L.); (Z.Z.); (L.Z.); (H.W.); (C.S.); (Y.C.)
- Jijiang Yang
- Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China;
- Yan Pei
- School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan;
4
Rana MM, Islam MM, Talukder MA, Uddin MA, Aryal S, Alotaibi N, Alyami SA, Hasan KF, Moni MA. A robust and clinically applicable deep learning model for early detection of Alzheimer's. IET Image Processing 2023; 17:3959-3975. [DOI: 10.1049/ipr2.12910] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/18/2023] [Accepted: 08/06/2023] [Indexed: 12/30/2024]
Abstract
Alzheimer's disease, often known as dementia, is a severe neurodegenerative disorder that causes irreversible memory loss by destroying brain cells. People die because there is no specific treatment for this disease. Alzheimer's is most common among seniors 65 years and older. However, the progress of this disease can be slowed if it is diagnosed earlier. Recently, artificial intelligence has instilled hope in the diagnosis of Alzheimer's disease by performing sophisticated analyses on extensive patient datasets, enabling the identification of subtle patterns that may elude human experts. Researchers have investigated various deep learning and machine learning models to diagnose this disease at an early stage using image datasets. In this paper, a new deep learning (DL) methodology is proposed, in which MRI images are fed into the model after various pre-processing techniques have been applied. The proposed Alzheimer's disease detection approach adopts transfer learning for multi-class classification using brain MRIs. The MRI images are classified into four categories: mild dementia (MD), moderate dementia (MOD), very mild dementia (VMD), and non-dementia (ND). The model is implemented and extensive performance analysis is performed. The findings show that the model obtains 97.31% accuracy. The model outperforms the state-of-the-art models in terms of accuracy, precision, recall, and F-score.
Affiliation(s)
- Md Masud Rana
- Department of Computer Science and Engineering Jagannath University Dhaka Bangladesh
- Md Manowarul Islam
- Department of Computer Science and Engineering Jagannath University Dhaka Bangladesh
- Md. Alamin Talukder
- Department of Computer Science and Engineering Jagannath University Dhaka Bangladesh
- Md Ashraf Uddin
- School of Information Technology Deakin University Geelong Waurn Ponds Campus Australia
- Sunil Aryal
- School of Information Technology Deakin University Geelong Waurn Ponds Campus Australia
- Naif Alotaibi
- Department of Mathematics and Statistics Faculty of Science Imam Mohammad Ibn Saud Islamic University (IMSIU) Riyadh Saudi Arabia
- Salem A. Alyami
- Department of Mathematics and Statistics Faculty of Science Imam Mohammad Ibn Saud Islamic University (IMSIU) Riyadh Saudi Arabia
- Khondokar Fida Hasan
- School of Professional Studies University of New South Wales Canberra ACT Australia
- Mohammad Ali Moni
- AI and Cyber Futures Institute Charles Sturt University Bathurst NSW Australia
5
Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer Learning for Classification of Alzheimer's Disease Based on Genome Wide Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2700-2711. [PMID: 37018274 DOI: 10.1109/tcbb.2022.3233869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
Alzheimer's disease (AD) is a type of brain disorder that is regarded as a degenerative disease because the corresponding symptoms aggravate with the time progression. Single nucleotide polymorphisms (SNPs) have been identified as relevant biomarkers for this condition. This study aims to identify SNPs biomarkers associated with the AD in order to perform a reliable classification of AD. In contrast to existing related works, we utilize deep transfer learning with varying experimental analysis for reliable classification of AD. For this purpose, the convolutional neural networks (CNN) are firstly trained over the genome-wide association studies (GWAS) dataset requested from the AD neuroimaging initiative. We then employ the deep transfer learning for further training of our CNN (as base model) over a different AD GWAS dataset, to extract the final set of features. The extracted features are then fed into Support Vector Machine for classification of AD. Detailed experiments are performed using multiple datasets and varying experimental configurations. The statistical outcomes indicate an accuracy of 89% which is a significant improvement when benchmarked with existing related works.
6
Jin Y, Ren Z, Wang W, Zhang Y, Zhou L, Yao X, Wu T. Classification of Alzheimer's disease using robust TabNet neural networks on genetic data. Mathematical Biosciences and Engineering: MBE 2023; 20:8358-8374. [PMID: 37161202 DOI: 10.3934/mbe.2023366] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/11/2023]
Abstract
Alzheimer's disease (AD) is one of the most common neurodegenerative diseases, and its onset is significantly associated with genetic factors. Because of its high specificity and accuracy, genetic testing is considered an important technique for AD diagnosis. In this paper, we present an improved deep learning (DL) algorithm, namely differential genes screening TabNet (DGS-TabNet), for AD binary and multi-class classification. For performance evaluation, the proposed approach was compared with three novel DL models (multi-layer perceptron (MLP), neural oblivious decision ensembles (NODE) and TabNet) as well as five classical machine learning (ML) methods (decision tree (DT), random forests (RF), gradient boosting decision tree (GBDT), light gradient boosting machine (LGBM) and support vector machine (SVM)) on public data from the Gene Expression Omnibus (GEO). Moreover, the biological interpretability of the globally important genetic features used for AD classification was examined with the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). The results demonstrated that the proposed DGS-TabNet achieved the best performance, with an accuracy of 93.80% for binary classification and 88.27% for multi-class classification. Meanwhile, the gene pathway analyses showed that the two most important global genetic features were AVIL and NDUFS4, and that the 22 feature genes obtained were partially correlated with AD pathogenesis. It was concluded that the proposed DGS-TabNet could be used to detect AD-susceptible genes, and the biological interpretability of the susceptible genes also suggested their potential as AD biomarkers.
Affiliation(s)
- Yu Jin
- College of Medical Imaging, Jiading District Central Hospital affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Zhe Ren
- College of Medical Imaging, Jiading District Central Hospital affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Wenjie Wang
- College of Medical Imaging, Jiading District Central Hospital affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Yulei Zhang
- College of Medical Imaging, Jiading District Central Hospital affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Liang Zhou
- College of Medical Imaging, Jiading District Central Hospital affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- Xufeng Yao
- College of Medical Imaging, Jiading District Central Hospital affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- Tao Wu
- College of Medical Imaging, Jiading District Central Hospital affiliated Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
7
Han K, Wang J, Wang Y, Zhang L, Yu M, Xie F, Zheng D, Xu Y, Ding Y, Wan J. A review of methods for predicting DNA N6-methyladenine sites. Brief Bioinform 2023; 24:6887111. [PMID: 36502371 DOI: 10.1093/bib/bbac514] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2022] [Revised: 10/07/2022] [Accepted: 10/27/2022] [Indexed: 12/14/2022] Open
Abstract
Deoxyribonucleic acid (DNA) N6-methyladenine (6mA) plays a vital role in various biological processes, and the accurate identification of its sites can provide a more comprehensive understanding of its biological effects. There are several methods for 6mA site prediction. With the continuous development of technology, traditional techniques with high costs and low efficiency are gradually being replaced by computational methods. The widely used computational methods fall into two categories: traditional machine learning and deep learning. We first list some existing experimental methods for identifying 6mA sites, then analyze the general process from sequence input to results in computational methods and review existing model architectures. Finally, the results are summarized and compared to help subsequent researchers choose the most suitable method for their work.
Affiliation(s)
- Ke Han
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China; College of Pharmacy, Harbin University of Commerce, Harbin, 150076, China
- Jianchun Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Yu Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Lei Zhang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Mengyao Yu
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Fang Xie
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Dequan Zheng
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Yaoqun Xu
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Jie Wan
- Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, 150001, China
8
Tsoi KKF, Jia P, Dowling NM, Titiner JR, Wagner M, Capuano AW, Donohue MC. Applications of artificial intelligence in dementia research. Cambridge Prisms. Precision Medicine 2022; 1:e9. [PMID: 38550934 PMCID: PMC10953738 DOI: 10.1017/pcm.2022.10] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/02/2022] [Revised: 10/24/2022] [Accepted: 11/08/2022] [Indexed: 11/06/2024]
Abstract
More than 50 million older people worldwide are suffering from dementia, and this number is estimated to increase to 150 million by 2050. Greater caregiver burdens and financial impacts on the healthcare system are expected as we wait for an effective treatment for dementia. Researchers are constantly exploring new therapies and screening approaches for the early detection of dementia. Artificial intelligence (AI) is widely applied in dementia research, including machine learning and deep learning methods for dementia diagnosis and progression detection. Computerized apps are also convenient tools for patients and caregivers to monitor cognitive function changes. Furthermore, social robots can potentially provide daily life support or guidance for the elderly who live alone. This review aims to provide an overview of AI applications in dementia research. We divided the applications into three categories according to different stages of cognitive impairment: (1) cognitive screening and training, (2) diagnosis and prognosis for dementia, and (3) dementia care and interventions. There are numerous studies on AI applications for dementia research. However, one challenge that remains is comparing the effectiveness of different AI methods in real clinical settings.
Affiliation(s)
- Kelvin K. F. Tsoi
- JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- Stanley Ho Big Data Decision Analytics Research Centre, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- Pingping Jia
- JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- N. Maritza Dowling
- Department of Acute and Chronic Care, School of Nursing, The George Washington University, Washington, DC, USA
- Department of Epidemiology and Biostatistics, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
- Maude Wagner
- Department of Neurological Sciences, Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, USA
- Ana W. Capuano
- Department of Neurological Sciences, Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, USA
- Michael C. Donohue
- Alzheimer’s Therapeutic Research Institute (ATRI), University of Southern California, Los Angeles, CA, USA
9
Feng C, Wu J, Wei H, Xu L, Zou Q. CRCF: A Method of Identifying Secretory Proteins of Malaria Parasites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2149-2157. [PMID: 34061749 DOI: 10.1109/tcbb.2021.3085589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Malaria is a mosquito-borne disease that results in millions of cases and deaths annually. The development of a fast computational method that identifies secretory proteins of the malaria parasite is important for research on antimalarial drugs and vaccines. Thus, a method was developed to identify the secretory proteins of malaria parasites. In this method, a reduced alphabet was selected to recode the original protein sequence. A feature synthesis method was used to synthesise three different types of feature information. Finally, the random forest method was used as a classifier to identify the secretory proteins. In addition, a web server was developed to share the proposed algorithm. Experiments using the benchmark dataset demonstrated that the overall accuracy achieved by the proposed method was greater than 97.8 percent using the 10-fold cross-validation method. Furthermore, the reduced schemes and characteristic performance analyses are discussed.
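Recoding a protein with a reduced alphabet, as described in the abstract above, means mapping the 20 amino acids onto a smaller set of group symbols before features are extracted. The sketch below illustrates the idea only: the six-group mapping is a hypothetical grouping by broad physicochemical class, not the scheme selected in the paper, and `recode` and `composition` are illustrative helper names.

```python
from collections import Counter

# Hypothetical grouping (an assumption for illustration, not the paper's scheme).
GROUPS = {
    "A": "AVLIMC",  # aliphatic or sulfur-containing
    "R": "FWYH",    # aromatic
    "P": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {aa: symbol for symbol, members in GROUPS.items() for aa in members}


def recode(sequence):
    """Map each residue to its group symbol; unknown residues become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "X") for aa in sequence.upper())


def composition(reduced, alphabet="ARPKDG"):
    """Frequency of each group symbol, one simple feature type."""
    counts = Counter(reduced)
    n = len(reduced) or 1  # avoid dividing by zero on empty input
    return [counts[symbol] / n for symbol in alphabet]


reduced = recode("MKTLLDEG")
features = composition(reduced)
print(reduced, features)
```

A feature vector like this one could then be passed to a classifier such as a random forest; the reduced alphabet keeps the vector short compared with features over all 20 residues.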
10
de Dieu Uwisengeyimana J, Nguchu BA, Wang Y, Zhang D, Liu Y, Jiang Z, Wang X, Qiu B. Longitudinal resting-state functional connectivity and regional brain atrophy-based biomarkers of preclinical cognitive impairment in healthy old adults. Aging Clin Exp Res 2022; 34:1303-1313. [PMID: 35023051 DOI: 10.1007/s40520-021-02067-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/03/2021] [Accepted: 12/27/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Intervention against age-related neurodegenerative diseases may be difficult once extensive structural and functional deteriorations have already occurred in the brain. AIM Investigating 6-year longitudinal changes and implications of regional brain atrophy and functional connectivity in the triple-network model as biomarkers of preclinical cognitive impairment in healthy aging. METHODS We acquired longitudinal cognitive scores and magnetic resonance imaging (MRI) data from 74 healthy old adults. Resting-state functional MRI (rs-fMRI) analysis was conducted using FSL6.0.1 to examine functional connectivity changes and regional brain morphometries were quantified using FreeSurfer5.3. Finally, we cross-validated and compared two support vector machine (SVM) regression models to predict future 6-year cognition score from the baseline regional brain atrophy and resting-state functional connectivity (rs-FC) measures. RESULTS After a 6-year follow-up, our results (P < 0.05-corrected) indicated significant connectivity reduction within all the three brain networks, significant differences in regional brain volumes and cortical thickness. We also observed significant improvement in episodic memory and significant decline in executive functions. Finally, comparing the two models, we observed that regional brain atrophy predictors were more efficient in approximating future 6-year cognitive scores (R = 0.756, P < 0.0001) than rs-FC predictors (R = 0.6, P < 0.0001). CONCLUSION This study used longitudinal data to keep subject variability low and to increase the validity of the results. We demonstrated significant changes in structural and functional MRI over 6 years. Our findings present a potential neuroimaging-based biomarker to detect cognitive impairment and prevent risks of neurodegenerative diseases in healthy old adults.
Affiliation(s)
- Jean de Dieu Uwisengeyimana
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Department of Electrical and Electronics Engineering, College of Science and Technology, University of Rwanda, Kigali, Rwanda
- Benedictor Alexander Nguchu
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Yaming Wang
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Du Zhang
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Yanpeng Liu
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Zhoufan Jiang
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Xiaoxiao Wang
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China.
- Bensheng Qiu
- Hefei National Lab for Physical Sciences at the Microscale and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, 230026, Anhui, China.
11
Wavelet-Based Fractal Analysis of rs-fMRI for Classification of Alzheimer's Disease. Sensors 2022; 22:s22093102. [PMID: 35590793 PMCID: PMC9100383 DOI: 10.3390/s22093102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2022] [Revised: 04/11/2022] [Accepted: 04/15/2022] [Indexed: 12/04/2022]
Abstract
The resting-state functional magnetic resonance imaging (rs-fMRI) modality has gained widespread acceptance as a promising method for analyzing a variety of neurological and psychiatric diseases. It is established that resting-state neuroimaging data exhibit fractal behavior, manifested in the form of slow-decaying auto-correlation and power-law scaling of the power spectrum across low-frequency components. With this property, the rs-fMRI signal can be broken down into fractal and nonfractal components. The fractal nature originates from several sources, such as cardiac fluctuations, respiration and system noise, and carries no information on the brain’s neuronal activities. As a result, the conventional correlation of rs-fMRI signals may not accurately reflect the functional dynamic of spontaneous neuronal activities. This problem can be solved by using a better representation of neuronal activities provided by the connectivity of nonfractal components. In this work, the nonfractal connectivity of rs-fMRI is used to distinguish Alzheimer’s patients from healthy controls. The automated anatomical labeling (AAL) atlas is used to extract the blood-oxygenation-level-dependent time series signals from 116 brain regions, yielding a 116 × 116 nonfractal connectivity matrix. From this matrix, significant connections evaluated using the p-value are selected as an input to a classifier for the classification of Alzheimer’s vs. normal controls. The nonfractal-based approach provides a good representation of the brain’s neuronal activity. It outperformed the fractal and Pearson-based connectivity approaches by 16.4% and 17.2%, respectively. The classification algorithm developed based on the nonfractal connectivity feature and support vector machine classifier has shown an excellent performance, with an accuracy of 90.3% and 83.3% for the XHSLF dataset and ADNI dataset, respectively. 
For further validation of our proposed work, we combined the two datasets (XHSLF+ADNI) and still achieved an accuracy of 90.2%. The proposed work outperformed the recently published work by a margin of 8.18% and 11.2%, respectively.
|
12
|
Zhang H, Zou Q, Ju Y, Song C, Chen D. Distance-based support vector machine to predict DNA N6-methyladenine modification. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220404145517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
DNA N6-methyladenine plays an important role in the restriction-modification system, which defends against invasion by foreign DNA. Experimental detection methods are time-consuming and costly, so computational methods have emerged. Support vector machine theory has received extensive attention in the bioinformatics field because of its solid theoretical foundation and many good characteristics.
Objective:
General machine learning methods include an important feature extraction step. This study omits that step and replaces it with an easy-to-obtain sequence distance matrix to obtain better results.
Method:
First, sequence alignment was used to obtain the similarity matrix. Then a novel transformation turned the similarity matrix into a distance matrix. Next, the similarity-distance matrix was made positive semi-definite so that it could be used as the kernel matrix. Finally, the LIBSVM software was applied to solve the support vector machine.
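The pipeline described in the Method paragraph can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the similarity-to-distance transform `d = sqrt(2 * (1 - s))`, the Gaussian mapping `exp(-gamma * d^2)`, and the diagonal-shift repair are all assumptions chosen for simplicity, since the abstract does not specify the actual transformations. The function names are hypothetical.

```python
import math


def similarity_to_distance(sim):
    """Map similarities in [0, 1] to distances via d = sqrt(2 * (1 - s)).

    This particular transform is an assumption for illustration only.
    """
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - s))) for s in row] for row in sim]


def distance_to_kernel(dist, gamma=1.0):
    """Turn distances into kernel-like values with exp(-gamma * d^2) (assumed)."""
    return [[math.exp(-gamma * d * d) for d in row] for row in dist]


def make_psd_by_diagonal_shift(k):
    """Repair a symmetric matrix so that it is positive semi-definite.

    Adds the smallest uniform diagonal shift that makes the matrix
    diagonally dominant. A symmetric, diagonally dominant matrix with a
    non-negative diagonal is positive semi-definite (Gershgorin circles).
    """
    n = len(k)
    shift = 0.0
    for i in range(n):
        off_diagonal = sum(abs(k[i][j]) for j in range(n) if j != i)
        shift = max(shift, off_diagonal - k[i][i])
    return [
        [k[i][j] + (shift if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Toy pairwise similarity matrix for three sequences (made-up values).
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
kernel = make_psd_by_diagonal_shift(distance_to_kernel(similarity_to_distance(sim)))
```

The repaired matrix could then be handed to an SVM solver that accepts a precomputed kernel. The diagonal shift is a conservative repair; clipping negative eigenvalues is a common alternative when a linear algebra library is available.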
Results:
The five-fold cross-validation of this model on rice and mouse data achieved excellent accuracy rates of 92.04% and 96.51%, respectively. This shows that the DB-SVM method has obvious advantages compared with traditional machine learning methods. Meanwhile, this model achieved accuracies of 0.943, 0.982, and 0.818; Matthews correlation coefficients of 0.944, 0.982, and 0.838; and F1 scores of 0.942, 0.982, and 0.840 for the rice, M. musculus, and cross-species genome datasets, respectively.
Conclusion:
These outcomes show that this model outperforms iIM-CNN and csDMA, the latest methods for predicting DNA 6mA modification.
Affiliation(s)
- Haoyu Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Chenggang Song
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Dong Chen
  - College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, China
|
13
|
Machine learning techniques for diagnosis of alzheimer disease, mild cognitive disorder, and other types of dementia. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103293] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 02/07/2023]
|
14
|
Li J, He S, Guo F, Zou Q. HSM6AP: a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. RNA Biol 2021; 18:1882-1892. [PMID: 33446014 PMCID: PMC8583144 DOI: 10.1080/15476286.2021.1875180] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/24/2020] [Revised: 12/02/2020] [Accepted: 01/08/2021] [Indexed: 01/21/2023] Open
Abstract
Recent studies have shown that RNA methylation modification can affect RNA transcription, metabolism, splicing and stability. In addition, RNA methylation modification has been associated with cancer, obesity and other diseases. Based on information about the human genome and machine learning, this paper discusses the effect of the fusion sequence and gene-level feature extraction on the accuracy of methylation site recognition. The discovery of new features exposed significant limitations of existing computational tools: (1) most prediction models are based solely on sequence features and use SVM or random forest as classification methods; (2) limited by the number of samples, the models may not achieve good performance. In order to establish a better prediction model for methylation sites, we must set specific weighting strategies for training samples and find more powerful and informative feature matrices to establish a comprehensive model. In this paper, we present HSM6AP, a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. Compared with existing methods, HSM6AP weights its training samples and explores a wide range of features. Max-Relevance-Max-Distance (MRMD) is employed for feature selection, and the feature matrix is generated by fusing single features. Extreme gradient boosting (XGBoost), an ensemble machine learning algorithm based on decision trees, is used for model training, and model performance is improved through parameter adjustment. Two rigorous independent data sets demonstrated the superiority of HSM6AP in identifying methylation sites. HSM6AP is an advanced predictor that can be directly employed by users (especially non-professional users) to predict methylation sites. Users can access our related tools and data sets at the following website: http://lab.malab.cn/~lijing/HSM6AP.html. The code of our tool is publicly accessible at https://github.com/lijingtju/HSm6AP.git.
Affiliation(s)
- Jing Li
  - Institute of computational biology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Shida He
  - Institute of computational biology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - Institute of computational biology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - Bioinformatics Laboratory, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
15
|
Tan MS, Cheah PL, Chin AV, Looi LM, Chang SW. A review on omics-based biomarkers discovery for Alzheimer's disease from the bioinformatics perspectives: Statistical approach vs machine learning approach. Comput Biol Med 2021; 139:104947. [PMID: 34678481 DOI: 10.1016/j.compbiomed.2021.104947] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/21/2021] [Revised: 10/12/2021] [Accepted: 10/12/2021] [Indexed: 12/26/2022]
Abstract
Alzheimer's Disease (AD) is a neurodegenerative disease that affects cognition and is the most common cause of dementia in the elderly. As the number of elderly individuals increases globally, the incidence and prevalence of AD are expected to increase. At present, AD is diagnosed clinically, according to accepted criteria. The essential elements in the diagnosis of AD include a patient's history, a physical examination and neuropsychological testing, in addition to appropriate investigations such as neuroimaging. The omics-based approach is an emerging field of study that may not only aid in the diagnosis of AD but also facilitate the exploration of factors that influence the development of the disease. Omics techniques, including genomics, transcriptomics, proteomics and metabolomics, may reveal the pathways that lead to neuronal death and identify biomolecular markers associated with AD. This will further facilitate an understanding of AD neuropathology. In this review, omics-based approaches that were implemented in studies on AD were assessed from a bioinformatics perspective. Current state-of-the-art statistical and machine learning approaches used in the single omics analysis of AD were compared based on correlations of variants, differential expression, functional analysis and network analysis. This was followed by a review of the approaches used in the integration and analysis of multi-omics of AD. The strengths and limitations of multi-omics analysis methods were explored and the issues and challenges associated with omics studies of AD were highlighted. Lastly, future studies in this area of research were justified.
Affiliation(s)
- Mei Sze Tan
  - Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Phaik-Leng Cheah
  - Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Ai-Vyrn Chin
  - Division of Geriatric Medicine, Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Lai-Meng Looi
  - Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Siow-Wee Chang
  - Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
|
16
|
Minhas S, Khanum A, Alvi A, Riaz F, Khan SA, Alsolami F, A Khan M. Early MCI-to-AD Conversion Prediction Using Future Value Forecasting of Multimodal Features. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:6628036. [PMID: 34608385 PMCID: PMC8487363 DOI: 10.1155/2021/6628036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/09/2020] [Revised: 07/27/2021] [Accepted: 08/12/2021] [Indexed: 11/18/2022]
Abstract
In Alzheimer's disease (AD) progression, it is imperative to identify the subjects with mild cognitive impairment before clinical symptoms of AD appear. This work proposes a technique for decision support in identifying subjects who will show transition from mild cognitive impairment (MCI) to AD in the future. We used robust predictors from multivariate MRI-derived biomarkers and neuropsychological measures (NM) and tracked their longitudinal trajectories to predict signs of AD in the MCI population. Assuming piecewise linear progression of the disease, we designed a novel weighted gradient offset-based technique to forecast the future marker value using readings from at least two previous follow-up visits. Later, the complete predictor trajectories are used as features for a standard support vector machine classifier to identify MCI-to-AD progressors amongst the MCI patients enrolled in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. We explored the performance of both unimodal and multimodal models in a 5-fold cross-validation setup. The proposed technique resulted in a high classification AUC of 91.2% and 95.7% for 6-month- and 1-year-ahead AD prediction, respectively, using multimodal markers. In the end, we discuss the efficacy of MRI markers as compared to NM for MCI-to-AD conversion prediction.
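The forecasting idea in this abstract, extrapolating a marker under a piecewise linear assumption from at least two earlier visits, can be illustrated with a short sketch. The abstract does not define the weighting or the offset term, so the linearly increasing weights on more recent segments below are an assumption, and the function name `forecast_marker` is hypothetical.

```python
def forecast_marker(times, values, horizon):
    """Forecast a marker value `horizon` time units after the last visit.

    times   : visit times in increasing order (e.g. months)
    values  : marker readings at those visits
    horizon : how far ahead of the last visit to forecast

    The slope of each segment between consecutive visits is computed, then
    the slopes are averaged with weights 1, 2, 3, ... so that recent
    segments count more (an assumed weighting, not taken from the paper).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("at least two visits with matching readings are required")
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    weights = range(1, len(slopes) + 1)
    weighted_slope = sum(w * s for w, s in zip(weights, slopes)) / sum(weights)
    # Extrapolate linearly from the most recent reading.
    return values[-1] + weighted_slope * horizon


# Toy example: a marker measured at months 0, 6 and 12, forecast 6 months ahead.
forecast = forecast_marker([0, 6, 12], [10.0, 9.0, 7.0], 6)
```

In the study, forecast values of this kind extend each predictor trajectory, and the completed trajectories serve as classifier features.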
Affiliation(s)
- Sidra Minhas
  - Department of Computer Science, Forman Christian College University, Lahore, Pakistan
- Aasia Khanum
  - Department of Computer Science, Forman Christian College University, Lahore, Pakistan
- Atif Alvi
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Farhan Riaz
  - Department of Computer Engineering, National University of Sciences & Technology, EME College, Rawalpindi, Pakistan
- Shoab A Khan
  - Department of Computer Engineering, National University of Sciences & Technology, EME College, Rawalpindi, Pakistan
- Fawaz Alsolami
  - Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Muazzam A Khan
  - Department of Computer Sciences, Quaid I Azam University, Islamabad, Pakistan
|
17
|
Xu L, Ru X, Song R. Application of Machine Learning for Drug-Target Interaction Prediction. Front Genet 2021; 12:680117. [PMID: 34234813 PMCID: PMC8255962 DOI: 10.3389/fgene.2021.680117] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/13/2021] [Accepted: 05/28/2021] [Indexed: 11/13/2022] Open
Abstract
Exploring drug–target interactions by biomedical experiments requires a lot of human, financial, and material resources. To save time and cost to meet the needs of the present generation, machine learning methods have been introduced into the prediction of drug–target interactions. The large amount of available drug and target data in existing databases, the evolving and innovative computer technologies, and the inherent characteristics of various types of machine learning have made machine learning techniques the mainstream method for drug–target interaction prediction research. In this review, details of the specific applications of machine learning in drug–target interaction prediction are summarized, the characteristics of each algorithm are analyzed, and the issues that need to be further addressed and explored for future research are discussed. The aim of this review is to provide a sound basis for the construction of high-performance models.
Affiliation(s)
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Xiaoqing Ru
  - Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Rong Song
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
|
18
|
Xu L, Jiang S, Wu J, Zou Q. An in silico approach to identification, categorization and prediction of nucleic acid binding proteins. Brief Bioinform 2021. [PMID: 32793956 DOI: 10.1101/2020.05.05.078741] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/12/2023] Open
Abstract
The interaction between proteins and nucleic acid plays an important role in many processes, such as transcription, translation and DNA repair. The mechanisms of related biological events can be understood by exploring the function of proteins in these interactions. The number of known protein sequences has increased rapidly in recent years, but the databases for describing the structure and function of protein have unfortunately grown quite slowly. Thus, improving such databases is meaningful for predicting protein-nucleic acid interactions. Furthermore, the mechanism of related biological events, such as viral infection or designing novel drug targets, can be further understood by understanding the function of proteins in these interactions. The information for each sequence, including its function and interaction sites, was collected and identified, and a database called PNIDB was built. The proteins in PNIDB were grouped into 27 classes, such as transcription, immune system, and structural protein. The function of each protein was then predicted using a machine learning method. Using our method, the predictor was trained on labeled sequences, and then the function of a protein was predicted based on the trained classifier. The prediction accuracy reached 77.43% in 10-fold cross-validation.
Affiliation(s)
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic
- Jin Wu
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
- Quan Zou
  - School of Management, Shenzhen Polytechnic
|
19
|
Abstract
Background:
Bioluminescence is a unique and significant phenomenon in nature. Bioluminescence is important for the lifecycle of some organisms and is valuable in biomedical research, including for gene expression analysis and bioluminescence imaging technology. In recent years, researchers have identified a number of methods for predicting bioluminescent proteins (BLPs), which have increased in accuracy, but could be further improved.
Method:
In this study, a new bioluminescent protein prediction method based on a voting algorithm is proposed. Four methods of feature extraction based on the amino acid sequence were used. In total, 314-dimensional features were extracted from amino acid composition, physicochemical properties and k-spacer amino acid pair composition. A voting algorithm was then used to build the model, with the aim of obtaining the highest MCC value and thereby establishing the optimal prediction model. To create the best-performing model, the selection of base classifiers and vote counting rules are discussed.
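The two ingredients named in the Method paragraph, a voting rule over base classifiers and the Matthews correlation coefficient (MCC) used to compare candidate models, can be sketched as follows. This is a generic illustration with made-up predictions, not the authors' model; the simple majority threshold is only one possible vote counting rule, and the function names are hypothetical.

```python
import math


def majority_vote(predictions, threshold=None):
    """Combine binary predictions (0/1) from several base classifiers.

    predictions : list with one list of labels per classifier
    threshold   : a sample is labelled 1 when strictly more than this many
                  classifiers vote 1; defaults to half of the classifiers
    """
    if threshold is None:
        threshold = len(predictions) / 2
    return [1 if sum(votes) > threshold else 0 for votes in zip(*predictions)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Made-up outputs of three base classifiers on four samples.
base_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
voted = majority_vote(base_predictions)
score = mcc([1, 0, 1, 0], voted)
```

Different subsets of base classifiers or different thresholds can be compared by their MCC on validation data, which matches the selection criterion the abstract describes.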
Results:
The proposed model achieved 93.4% accuracy, 93.4% sensitivity and 91.7% specificity in the test set, which was better than any other method. A previous prediction of bioluminescent proteins in three lineages was also improved using the model building method, resulting in greatly improved accuracy.
Affiliation(s)
- Shulin Zhao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba Science City, Japan
- Jun Zhang
  - Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Shuguang Han
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
20
|
Tăuţan AM, Ionescu B, Santarnecchi E. Artificial intelligence in neurodegenerative diseases: A review of available tools with a focus on machine learning techniques. Artif Intell Med 2021; 117:102081. [PMID: 34127244 DOI: 10.1016/j.artmed.2021.102081] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 07/14/2020] [Revised: 02/21/2021] [Accepted: 04/26/2021] [Indexed: 10/21/2022]
Abstract
Neurodegenerative diseases have shown an increasing incidence in the older population in recent years. A significant amount of research has been conducted to characterize these diseases. Computational methods, and particularly machine learning techniques, are now very useful tools in helping and improving the diagnosis as well as the disease monitoring process. In this paper, we provide an in-depth review on existing computational approaches used in the whole neurodegenerative spectrum, namely for Alzheimer's, Parkinson's, and Huntington's Diseases, Amyotrophic Lateral Sclerosis, and Multiple System Atrophy. We propose a taxonomy of the specific clinical features, and of the existing computational methods. We provide a detailed analysis of the various modalities and decision systems employed for each disease. We identify and present the sleep disorders which are present in various diseases and which represent an important asset for onset detection. We overview the existing data set resources and evaluation metrics. Finally, we identify current remaining open challenges and discuss future perspectives.
Affiliation(s)
- Alexandra-Maria Tăuţan
  - University "Politehnica" of Bucharest, Splaiul Independenţei 313, 060042 Bucharest, Romania
- Bogdan Ionescu
  - University "Politehnica" of Bucharest, Splaiul Independenţei 313, 060042 Bucharest, Romania
- Emiliano Santarnecchi
  - Berenson-Allen Center for Noninvasive Brain Stimulation, Harvard Medical School, 330 Brookline Avenue, Boston, United States
|
21
|
Zhao X, Yao H, Li X. Unearthing of Key Genes Driving the Pathogenesis of Alzheimer's Disease via Bioinformatics. Front Genet 2021; 12:641100. [PMID: 33936168 PMCID: PMC8085575 DOI: 10.3389/fgene.2021.641100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/13/2020] [Accepted: 03/15/2021] [Indexed: 01/23/2023] Open
Abstract
Alzheimer’s disease (AD) is a neurodegenerative disease with unelucidated molecular pathogenesis. Herein, we aimed to identify potential hub genes governing the pathogenesis of AD. The AD datasets of GSE118553 and GSE131617 were collected from the NCBI GEO database. The weighted gene coexpression network analysis (WGCNA), differential gene expression analysis, and functional enrichment analysis were performed to reveal the hub genes and verify their role in AD. Hub genes were validated by machine learning algorithms. We identified modules and their corresponding hub genes from the temporal cortex (TC), frontal cortex (FC), entorhinal cortex (EC), and cerebellum (CE). We obtained 33, 42, 42, and 41 hub genes in modules associated with AD in TC, FC, EC, and CE tissues, respectively. Significant differences were recorded in the expression levels of hub genes between AD and the control group in the TC and EC tissues (P < 0.05). The differences in the expressions of FCGRT, SLC1A3, PTN, PTPRZ1, and PON2 in the FC and CE tissues between the AD and control groups were significant (P < 0.05). The expression levels of PLXNB1, GRAMD3, and GJA1 differed significantly between the Braak NFT stages of AD. Overall, our study uncovered genes that may be involved in AD pathogenesis and revealed their potential for the development of AD biomarkers and appropriate AD therapeutic targets.
Affiliation(s)
- Xingxing Zhao
  - Department of Neurology, Bethune Hospital Affiliated to Shanxi Medical University, Taiyuan, China; Department of Cardiology, First Hospital of Shanxi Medical University, Taiyuan, China
- Hongmei Yao
  - Department of Cardiology, First Hospital of Shanxi Medical University, Taiyuan, China
- Xinyi Li
  - Department of Neurology, Bethune Hospital Affiliated to Shanxi Medical University, Taiyuan, China
|
22
|
Niu M, Lin Y, Zou Q. sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks. PLANT MOLECULAR BIOLOGY 2021; 105:483-495. [PMID: 33385273 DOI: 10.1007/s11103-020-01102-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/17/2020] [Accepted: 12/01/2020] [Indexed: 06/12/2023]
Abstract
KEY MESSAGE We proposed an ensemble convolutional neural network model to identify sgRNA high on-target activity in four crops and we used one-hot encoding and k-mers for sequence encoding. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has played an important role in the improvement of agronomic species, but there is a lack of effective bioinformatics tools to identify the activity of sgRNA in agronomic species. Therefore, it is necessary to develop a method based on machine learning to identify sgRNA high on-target activity. In this work, we proposed a simple convolutional neural network method to identify sgRNA high on-target activity. Our study used one-hot encoding and k-mers for sequence data conversion and a voting algorithm for constructing the convolutional neural network ensemble model sgRNACNN for the prediction of sgRNA activity. The ensemble model sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. The accuracy rates of the four crops in the sgRNACNN model were 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results showed that sgRNACNN realizes the identification of high on-target activity sgRNA of agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results have certain significance for guiding crop gene editing and academic research. The source code and relevant dataset can be found in the following link: https://github.com/nmt315320/sgRNACNN.git .
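The two sequence encodings named in this abstract, one-hot encoding and k-mers, can be illustrated with a short sketch. The four-letter `ACGT` alphabet, the choice of k-mer counts (rather than, say, frequencies), and the function names are assumptions for illustration; the abstract does not state the exact encoding details used by sgRNACNN.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def one_hot(seq):
    """Encode each base as a 4-element indicator vector in ACGT order.

    Bases outside the alphabet (e.g. N) become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq.upper()]


def kmer_counts(seq, k=2):
    """Count overlapping k-mers and return the counts in lexicographic k-mer order."""
    seq = seq.upper()
    counts = {"".join(letters): 0 for letters in product(ALPHABET, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown bases
            counts[kmer] += 1
    return [counts[kmer] for kmer in sorted(counts)]


encoded = one_hot("ACGT")
dinucleotides = kmer_counts("ACGT", k=2)
```

A one-hot matrix preserves base positions and suits convolutional layers, whereas a k-mer vector summarizes composition in a fixed length of 4^k regardless of sequence length.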
Affiliation(s)
- Mengting Niu
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yuan Lin
  - Department of System Integration, Sparebanken Vest, Bergen, Norway
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
23
|
Zhang Q, Wang Q, He C, Fan D, Zhu Y, Zang F, Tan C, Zhang S, Shu H, Zhang Z, Feng H, Wang Z, Xie C. Altered Regional Cerebral Blood Flow and Brain Function Across the Alzheimer's Disease Spectrum: A Potential Biomarker. Front Aging Neurosci 2021; 13:630382. [PMID: 33692680 PMCID: PMC7937726 DOI: 10.3389/fnagi.2021.630382] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/17/2020] [Accepted: 01/20/2021] [Indexed: 12/14/2022] Open
Abstract
Objective: To investigate variation in the characteristics of regional cerebral blood flow (rCBF), brain activity, and intrinsic functional connectivity (FC) across the Alzheimer's disease spectrum (ADS). Methods: The study recruited 20 individuals in each of the following categories: Alzheimer's disease (AD), mild cognitive impairment (MCI), subjective cognitive decline (SCD), and healthy control (HC). All participants completed the 3.0T resting-state functional MRI (rs-fMRI) and arterial spin labeling scans in addition to neuropsychological tests. Additionally, the normalized CBF, regional homogeneity (ReHo), and amplitude of low-frequency fluctuation (ALFF) of individual subjects were compared in the ADS. Moreover, the changes in intrinsic FC were investigated across the ADS using the abnormal rCBF regions as seeds and behavioral correlations. Finally, a support-vector classifier model of machine learning was used to distinguish individuals with ADS from HC. Results: Compared to the HC subjects, patients with AD showed the poorest level of rCBF in the left precuneus (LPCUN) and right middle frontal gyrus (RMFG) among all participants. In addition, there was a significant decrease in the ALFF in the bilateral posterior cingulate cortex (PCC) and ReHo in the right PCC. Moreover, RMFG- and LPCUN-based FC analysis revealed that the altered FCs were primarily located in the posterior brain regions. Finally, a combination of altered rCBF, ALFF, and ReHo in posterior cingulate cortex/precuneus (PCC/PCUN) showed a better ability to differentiate ADS from HC, AD from SCD and MCI, but not MCI from SCD. Conclusions: The study demonstrated the significance of an altered rCBF and brain activity in the early stages of ADS. These findings, therefore, present a potential diagnostic neuroimaging-based biomarker in ADS. Additionally, the study provides a better understanding of the pathophysiology of AD.
Affiliation(s)
- Qianqian Zhang
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Qing Wang
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Cancan He
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Dandan Fan
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Yao Zhu
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Feifei Zang
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Chang Tan
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Shaoke Zhang
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Hao Shu
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Zhijun Zhang
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China; Neuropsychiatric Institute, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China; The Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing, China
- Haixia Feng
  - Department of Nursing, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Zan Wang
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China
- Chunming Xie
  - Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China; Neuropsychiatric Institute, Affiliated ZhongDa Hospital, School of Medicine, Southeast University, Nanjing, China; The Key Laboratory of Developmental Genes and Human Disease, Southeast University, Nanjing, China
|
24
|
Huang Q, Zhou W, Guo F, Xu L, Zhang L. 6mA-Pred: identifying DNA N6-methyladenine sites based on deep learning. PeerJ 2021; 9:e10813. [PMID: 33604189 PMCID: PMC7866889 DOI: 10.7717/peerj.10813] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/28/2020] [Accepted: 12/30/2020] [Indexed: 01/03/2023] Open
Abstract
With the accumulation of data on 6mA modification sites, an increasing number of scholars have begun to focus on the identification of 6mA sites. Despite the recognized importance of 6mA sites, methods for their identification remain lacking, with most existing methods being aimed at their identification in individual species. In the present study, we aimed to develop an identification method suitable for multiple species. Based on previous research, we propose a method for 6mA site recognition. Our experiments prove that the proposed 6mA-Pred method is effective for identifying 6mA sites in genes from taxa such as rice, Mus musculus, and human. A series of experimental results show that 6mA-Pred is an excellent method. We provide the source code used in the study, which can be obtained from http://39.100.246.211:5004/6mA_Pred/.
Affiliation(s)
- Qianfei Huang
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenyang Zhou
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Lichao Zhang
  - School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China
|
25
|
He S, Guo F, Zou Q, Ding H. MRMD2.0: A Python Tool for Machine Learning with Feature Ranking and Reduction. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200503030350] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Indexed: 11/22/2022]
Abstract
Aims:
The study aims to find a way to reduce the dimensionality of the dataset.
Background:
Dimensionality reduction is a key issue in the machine learning process. It not only improves prediction performance but can also recommend the intrinsic features and help to explore the biological expression of the machine learning “black box”.
Objective:
A variety of feature selection algorithms are used to select data features to achieve
dimensionality reduction.
Methods:
First, MRMD2.0 integrated 7 different popular feature ranking algorithms with
PageRank strategy. Second, optimized dimensionality was detected with forward adding strategy.
Result:
We have achieved good results in our experiments.
Conclusion:
Several works have been tested with MRMD2.0. It showed well performance.
Otherwise, it also can draw the performance curves according to the feature dimensionality. If
users want to sacrifice accuracy for fewer features, they can select the dimensionality from the
performance curves.
Other:
We developed friendly python tools together with the web server. The users could upload
their csv, arff or libsvm format files. Then the webserver would help to rank features and find the
optimized dimensionality.
Affiliation(s)
- Shida He: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Hui Ding: Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
26
Cui F, Zhang Z, Zou Q. Sequence representation approaches for sequence-based protein prediction tasks that use deep learning. Brief Funct Genomics 2021; 20:61-73. [PMID: 33527980 DOI: 10.1093/bfgp/elaa030] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 12/01/2020] [Revised: 12/16/2020] [Accepted: 12/18/2020] [Indexed: 11/12/2022]
Abstract
Deep learning has been increasingly used in bioinformatics, especially in sequence-based protein prediction tasks, as large amounts of biological data are available and deep learning techniques have been developed rapidly in recent years. For sequence-based protein prediction tasks, the selection of a suitable model architecture is essential, whereas sequence data representation is a major factor in controlling model performance. Here, we summarized all the main approaches that are used to represent protein sequence data (amino acid sequence encoding or embedding), which include end-to-end embedding methods, non-contextual embedding methods and embedding methods that use transfer learning and others that are applied for some specific tasks (such as protein sequence embedding based on extracted features for protein structure predictions and graph convolutional network-based embedding for drug discovery tasks). We have also reviewed the architectures of various types of embedding models theoretically and the development of these types of sequence embedding approaches to facilitate researchers and users in selecting the model that best suits their requirements.
Affiliation(s)
- Feifei Cui: University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Zilong Zhang: University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
27
Bai Z, Chen M, Lin Q, Ye Y, Fan H, Wen K, Zeng J, Huang D, Mo W, Lei Y, Liao Z. Identification of Methicillin-Resistant Staphylococcus Aureus From Methicillin-Sensitive Staphylococcus Aureus and Molecular Characterization in Quanzhou, China. Front Cell Dev Biol 2021; 9:629681. [PMID: 33553185 PMCID: PMC7858276 DOI: 10.3389/fcell.2021.629681] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/15/2020] [Accepted: 01/04/2021] [Indexed: 12/17/2022]
Abstract
The aims of this study were to distinguish Methicillin-Resistant Staphylococcus aureus (MRSA) from Methicillin-Sensitive Staphylococcus aureus (MSSA) at the protein sequence level, to test the antibiotic susceptibility of all Staphylococcus aureus isolates from Quanzhou hospitals, and to define the virulence factors and molecular characteristics of the MRSA isolates. MRSA and MSSA Pfam protein sequences were used to extract feature vectors of 188D, n-gram and 400D. Weka software was applied to classify the two types of Staphylococcus aureus, and classification performance was evaluated. Antibiotic susceptibility testing of the 81 Staphylococcus aureus isolates was performed with the Mérieux Microbial Analysis Instrument. The 65 MRSA isolates were characterized by Panton-Valentine leukocidin (PVL), X polymorphic region of Protein A (spa), multilocus sequence typing (MLST), and staphylococcal chromosomal cassette mec (SCCmec) typing. Comparing the results of six Weka classifiers, the highest correctly classified rates were 91.94, 70.16, and 62.90% for 188D, n-gram and 400D, respectively. In antimicrobial susceptibility testing of the 81 Staphylococcus aureus isolates, the penicillin resistance rate was 100%, and no resistance to teicoplanin, linezolid, or vancomycin was observed. The resistance rates of the MRSA isolates to clindamycin, erythromycin and tetracycline were higher than those of the MSSA isolates. Among the 65 MRSA isolates, the positive rate of the PVL gene was 47.7% (31/65). Seventeen sequence types (STs) were identified among the 65 isolates, and ST59 was the most prevalent. SCCmec types III and IV were observed at 24.6 and 72.3%, respectively; two isolates could not be typed. Twenty-one spa types were identified, of which spa t437 (34/65, 52.3%) was the most predominant. The major MRSA clone type by molecular typing was CC59-ST59-spa t437-IV (28/65, 43.1%). Overall, 188D feature vectors can be applied to successfully distinguish MRSA from MSSA. In Quanzhou, the detection rate of the PVL virulence factor was high, suggesting a high pathogenic risk of MRSA infection.
Cross-infection of CA-MRSA and HA-MRSA was present and their molecular characteristics were increasingly blurred; HA-MRSA with typical CA-MRSA molecular characteristics has become an important cause of healthcare-related infections. CC59-ST59-spa t437-IV was the main clone type in Quanzhou, which was rare in other parts of mainland China.
Affiliation(s)
- Zhimin Bai: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Department of Clinical Laboratory, Jinjiang Municipal Hospital, Jinjiang, China
- Min Chen: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Microbiological Laboratory Sanming Center for Disease Control and Prevention, Sanming, China
- Qiaofa Lin: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Ying Ye: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Hongmei Fan: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Kaizhen Wen: Department of Clinical Laboratory, Jinjiang Municipal Hospital, Jinjiang, China
- Jianxing Zeng: Department of Clinical Laboratory, Jinjiang Municipal Hospital, Jinjiang, China
- Donghong Huang: Department of Clinical Laboratory, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Wenfei Mo: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Ying Lei: Department of Clinical Laboratory, Quanzhou Women's and Children's Hospital, Quanzhou, China
- Zhijun Liao: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
28
Liu T, Chen JM, Zhang D, Zhang Q, Peng B, Xu L, Tang H. ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features. Front Cell Dev Biol 2021; 8:621144. [PMID: 33490085 PMCID: PMC7820372 DOI: 10.3389/fcell.2020.621144] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/25/2020] [Accepted: 11/24/2020] [Indexed: 01/24/2023]
Abstract
Apolipoprotein is a group of plasma proteins that are associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer's disease, and diabetes. In order to investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to accurately identify and classify apolipoproteins. Although it is possible to identify apolipoproteins accurately through biochemical experiments, they are expensive and time-consuming. This work aims to establish a high-efficiency and high-accuracy prediction model for recognition of apolipoproteins and their subfamilies. We firstly constructed a high-quality benchmark dataset including 270 apolipoproteins and 535 non-apolipoproteins. Based on the dataset, pseudo-amino acid composition (PseAAC) and composition of k-spaced amino acid pairs (CKSAAP) were used as input vectors. To improve the prediction accuracy and eliminate redundant information, analysis of variance (ANOVA) was used to rank the features. And the incremental feature selection was utilized to obtain the best feature subset. Support vector machine (SVM) was proposed to construct the classification model, which could produce the accuracy of 97.27%, sensitivity of 96.30%, and specificity of 97.76% for discriminating apolipoprotein from non-apolipoprotein in 10-fold cross-validation. In addition, the same process was repeated to generate a new model for predicting apolipoprotein subfamilies. The new model could achieve an overall accuracy of 95.93% in 10-fold cross-validation. According to our proposed model, a convenient webserver called ApoPred was established, which can be freely accessed at http://tang-biolab.com/server/ApoPred/service.html. We expect that this work will contribute to apolipoprotein function research and drug development in relevant diseases.
Affiliation(s)
- Ting Liu: School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jia-Mao Chen: School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Dan Zhang: Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Qian Zhang: School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Bowen Peng: Division of International Cooperation, Health Commission of Sichuan Province, Chengdu, China
- Lei Xu: School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Hua Tang: School of Basic Medical Sciences, Southwest Medical University, Luzhou, China; Central Nervous System Drug Key Laboratory of Sichuan Province, Luzhou, China
29
Li J, Zhang L, He S, Guo F, Zou Q. SubLocEP: a novel ensemble predictor of subcellular localization of eukaryotic mRNA based on machine learning. Brief Bioinform 2021; 22:6059770. [PMID: 33388743 DOI: 10.1093/bib/bbaa401] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/02/2020] [Revised: 11/28/2020] [Accepted: 12/08/2020] [Indexed: 01/23/2023]
Abstract
MOTIVATION: mRNA location corresponds to the location of protein translation and contributes to precise spatial and temporal management of protein function. However, current assignment of subcellular localization of eukaryotic mRNA has important limitations: (1) turning multiple classifications into multiple dichotomies makes the training process tedious; (2) the majority of the models trained by classical algorithms are based on the extraction of single sequence information; (3) the existing state-of-the-art models have not reached an ideal level in terms of prediction and generalization ability. To achieve better assignment of subcellular localization of eukaryotic mRNA, a better and more comprehensive model must be developed. RESULTS: In this paper, SubLocEP is proposed as a two-layer integrated prediction model for accurate prediction of the location of sequence samples. Unlike the existing models based on limited features, SubLocEP comprehensively considers additional feature attributes and is combined with LightGBM to generate single-feature classifiers. The initial integration model (single-layer model) is generated according to the categories of a feature. Subsequently, two single-layer integration models are weighted (sequence-based: physicochemical properties = 3:2) to produce the final two-layer model. The performance of SubLocEP on independent datasets indicates that it is an accurate and stable prediction model with strong generalization ability. Additionally, an online tool has been developed that contains experimental data and maximizes user convenience for estimating the subcellular localization of eukaryotic mRNA.
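The 3:2 weighting of the two single-layer models can be illustrated with a minimal sketch. It assumes that each single-layer model outputs per-class probabilities and that the weights are applied as a normalized weighted average; the abstract does not state how the weighting is applied, and the function names, class labels and probabilities below are hypothetical.

```python
def combine_two_layer(seq_probs, phys_probs, w_seq=3.0, w_phys=2.0):
    """Weighted average of two per-class probability dicts (assumed scheme)."""
    total = w_seq + w_phys
    classes = seq_probs.keys() | phys_probs.keys()
    return {
        c: (w_seq * seq_probs.get(c, 0.0) + w_phys * phys_probs.get(c, 0.0)) / total
        for c in classes
    }


def predict_location(seq_probs, phys_probs):
    """Return the class with the highest combined probability."""
    combined = combine_two_layer(seq_probs, phys_probs)
    return max(combined, key=combined.get)


# Hypothetical outputs of the sequence-based and physicochemical models.
seq_probs = {"nucleus": 0.6, "cytoplasm": 0.4}
phys_probs = {"nucleus": 0.3, "cytoplasm": 0.7}
combined = combine_two_layer(seq_probs, phys_probs)
```

With these made-up inputs the sequence-based model favours the nucleus, but the combined score still favours the cytoplasm, because the physicochemical model's stronger vote outweighs the 3:2 weighting.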
Affiliation(s)
- Lichao Zhang: School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology
30
Lv Z, Ding H, Wang L, Zou Q. A Convolutional Neural Network Using Dinucleotide One-hot Encoder for identifying DNA N6-Methyladenine Sites in the Rice Genome. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.056] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/13/2022]
31
Zhang X, Ma H, Zou Q, Wu J. Analysis of Cyclin-Dependent Kinase 1 as an Independent Prognostic Factor for Gastric Cancer Based on Statistical Methods. Front Cell Dev Biol 2020; 8:620164. [PMID: 33365314 PMCID: PMC7750425 DOI: 10.3389/fcell.2020.620164] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/22/2020] [Accepted: 11/03/2020] [Indexed: 12/12/2022]
Abstract
OBJECTIVE: The aim of this study was to investigate the expression of cyclin-dependent kinase 1 (CDK1) in gastric cancer (GC), evaluate its relationship with the clinicopathological features and prognosis of GC, and analyze the advantage of CDK1 as a potential independent prognostic factor for GC. METHODS: The Cancer Genome Atlas (TCGA) data and corresponding clinical features of GC were collected. First, the target gene was selected by combining five topological analysis methods, and gene expression in paracancerous and GC tissues was analyzed with the Limma package and the Wilcoxon test. Second, the correlation between gene expression and clinical features was analyzed by logistic regression. Finally, survival analysis was carried out using the Kaplan-Meier method. The prognostic value of the gene was evaluated by univariate and multivariate Cox analyses, and its potential biological function was explored by gene set enrichment analysis (GSEA). RESULTS: CDK1 was selected as one of the most important genes associated with GC. The expression level of CDK1 in GC tissues was significantly higher than that in paracancerous tissues, which was significantly correlated with pathological stage and grade. The survival rate of the CDK1 high expression group was significantly lower than that of the low expression group. CDK1 expression was significantly correlated with overall survival (OS). CDK1 expression was mainly involved in prostate cancer, small cell lung cancer, and GC and was enriched in the WNT signaling pathway and T cell receptor signaling pathway. CONCLUSION: CDK1 may serve as an independent prognostic factor for GC. It is also expected to be a new target for molecular targeted therapy of GC.
Affiliation(s)
- Xu Zhang: School of Mathematics and Statistics, Southwest University, Chongqing, China
- Hua Ma: School of Mathematics and Statistics, Southwest University, Chongqing, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, China
- Jin Wu: School of Management, Shenzhen Polytechnic, Shenzhen, China
32
Mishra R, Li B. The Application of Artificial Intelligence in the Genetic Study of Alzheimer's Disease. Aging Dis 2020; 11:1567-1584. [PMID: 33269107 PMCID: PMC7673858 DOI: 10.14336/ad.2020.0312] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/02/2020] [Accepted: 03/12/2020] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease in which genetic factors contribute approximately 70% of etiological effects. Studies have found many significant genetic and environmental factors, but the pathogenesis of AD is still unclear. With the application of microarray and next-generation sequencing technologies, research using genetic data has shown explosive growth. In addition to conventional statistical methods for the processing of these data, artificial intelligence (AI) technology shows obvious advantages in analyzing such complex projects. This article first briefly reviews the application of AI technology in medicine and the current status of genetic research in AD. Then, a comprehensive review is focused on the application of AI in the genetic research of AD, including the diagnosis and prognosis of AD based on genetic data, the analysis of genetic variation, gene expression profile, gene-gene interaction in AD, and genetic analysis of AD based on a knowledge base. Although many studies have yielded some meaningful results, they are still in a preliminary stage. The main shortcomings include the limitations of the databases, failing to take advantage of AI to conduct a systematic biology analysis of multilevel databases, and lack of a theoretical framework for the analysis results. Finally, we outline directions for future development. It is crucial to develop high-quality, comprehensive, large-sample data-sharing resources; a multi-level system biology AI analysis strategy is one of the development directions, and computational creativity may play a role in theory model building, verification, and designing new intervention protocols for AD.
Affiliation(s)
- Rohan Mishra: Washington Institute for Health Sciences, Arlington, VA 22203, USA
- Bin Li: Washington Institute for Health Sciences, Arlington, VA 22203, USA; Georgetown University Medical Center, Washington D.C. 20057, USA
33
Wang C, Wu J, Xu L, Zou Q. NonClasGP-Pred: robust and efficient prediction of non-classically secreted proteins by integrating subset-specific optimal models of imbalanced data. Microb Genom 2020; 6:mgen000483. [PMID: 33245691 PMCID: PMC8116686 DOI: 10.1099/mgen.0.000483] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/14/2020] [Accepted: 11/06/2020] [Indexed: 01/01/2023]
Abstract
Non-classically secreted proteins (NCSPs) are proteins that are located in the extracellular environment, although there is a lack of known signal peptides or secretion motifs. They usually perform different biological functions in intracellular and extracellular environments, and several of their biological functions are linked to bacterial virulence and cell defence. Accurate protein localization is essential for all living organisms, however, the performance of existing methods developed for NCSP identification has been unsatisfactory and in particular suffer from data deficiency and possible overfitting problems. Further improvement is desirable, especially to address the lack of informative features and mining subset-specific features in imbalanced datasets. In the present study, a new computational predictor was developed for NCSP prediction of gram-positive bacteria. First, to address the possible prediction bias caused by the data imbalance problem, ten balanced subdatasets were generated for ensemble model construction. Then, the F-score algorithm combined with sequential forward search was used to strengthen the feature representation ability for each of the training subdatasets. Third, the subset-specific optimal feature combination process was adopted to characterize the original data from different aspects, and all subdataset-based models were integrated into a unified model, NonClasGP-Pred, which achieved an excellent performance with an accuracy of 93.23 %, a sensitivity of 100 %, a specificity of 89.01 %, a Matthew's correlation coefficient of 87.68 % and an area under the curve value of 0.9975 for ten-fold cross-validation. Based on assessment on the independent test dataset, the proposed model outperformed state-of-the-art available toolkits. For availability and implementation, see: http://lab.malab.cn/~wangchao/softwares/NonClasGP/.
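The first step in the abstract, generating ten balanced subdatasets from imbalanced data, can be sketched as follows. The abstract does not say how the subdatasets were drawn, so this sketch assumes random undersampling: every minority-class sample is kept and an equal-sized random sample of the majority class is drawn for each subset. The function name, labels and toy data are hypothetical.

```python
import random


def make_balanced_subsets(minority, majority, n_subsets=10, seed=0):
    """Return n_subsets lists of (sample, label) pairs with equal class sizes.

    Assumed scheme: keep every minority sample (label 1) and draw an
    equal-sized random sample of the majority class (label 0) per subset.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, k=len(minority))
        subset = [(s, 1) for s in minority] + [(s, 0) for s in drawn]
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


# Hypothetical toy data: 5 minority-class and 40 majority-class sequence IDs.
positives = [f"pos_{i}" for i in range(5)]
negatives = [f"neg_{i}" for i in range(40)]
subsets = make_balanced_subsets(positives, negatives)
```

One model would then be trained per subset and the per-subset models integrated into an ensemble, which is the role the abstract assigns to the unified NonClasGP-Pred model.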
Affiliation(s)
- Chao Wang: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, PR China
- Jin Wu: School of Management, Shenzhen Polytechnic, Shenzhen, PR China
- Lei Xu: School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, PR China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, PR China; Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, PR China
34
Ao C, Zhou W, Gao L, Dong B, Yu L. Prediction of antioxidant proteins using hybrid feature representation method and random forest. Genomics 2020; 112:4666-4674. [DOI: 10.1016/j.ygeno.2020.08.016] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 07/23/2020] [Revised: 08/10/2020] [Accepted: 08/13/2020] [Indexed: 12/19/2022]
35
Meng C, Wu J, Guo F, Dong B, Xu L. CWLy-pred: A novel cell wall lytic enzyme identifier based on an improved MRMD feature selection method. Genomics 2020; 112:4715-4721. [DOI: 10.1016/j.ygeno.2020.08.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/10/2020] [Revised: 08/04/2020] [Accepted: 08/13/2020] [Indexed: 10/25/2022]
36
Hou R, Wu J, Xu L, Zou Q, Wu YJ. Computational Prediction of Protein Arginine Methylation Based on Composition-Transition-Distribution Features. ACS OMEGA 2020; 5:27470-27479. [PMID: 33134710 PMCID: PMC7594152 DOI: 10.1021/acsomega.0c03972] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/18/2020] [Accepted: 10/06/2020] [Indexed: 06/11/2023]
Abstract
Arginine methylation is one of the most essential protein post-translational modifications. Identifying the site of arginine methylation is a critical problem in biology research. Unfortunately, biological experiments such as mass spectrometry are expensive and time-consuming. Hence, predicting arginine methylation by machine learning is an alternative fast and efficient way. In this paper, we focus on the systematic characterization of arginine methylation with composition-transition-distribution (CTD) features. The presented framework consists of three stages. In the first stage, we extract CTD features from 1750 samples and exploit decision tree to generate accurate prediction. The accuracy of prediction can reach 96%. In the second stage, the support vector machine can predict the number of arginine methylation sites with 0.36 R-squared. In the third stage, experiments carried out with the updated arginine methylation site data set show that utilizing CTD features and adopting random forest as the classifier outperform previous methods. The accuracy of identification can reach 82.1 and 82.5% in single methylarginine and double methylarginine data sets, respectively. The discovery presented in this paper can be helpful for future research on arginine methylation.
Affiliation(s)
- Ruiyan Hou: Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
- Jin Wu: School of Management, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu: School of Electronic and Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yi-Jun Wu: Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
37
Guo Z, Wang P, Liu Z, Zhao Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front Bioeng Biotechnol 2020; 8:584807. [PMID: 33195148 PMCID: PMC7642589 DOI: 10.3389/fbioe.2020.584807] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 07/18/2020] [Accepted: 09/11/2020] [Indexed: 01/19/2023]
Abstract
Thermophilicity is a very important property of proteins, as it sometimes determines denaturation and cell death. Thus, methods for predicting thermophilic proteins and non-thermophilic proteins are of interest and can contribute to the design and engineering of proteins. In this article, we describe the use of feature dimension reduction technology and LIBSVM to identify thermophilic proteins. The highest accuracy obtained by cross-validation was 96.02% with 119 parameters. When using only 16 features, we obtained an accuracy of 93.33%. We discuss the importance of the different characteristics in identification and report a comparison of the performance of support vector machine to that of other methods.
Affiliation(s)
- Zifan Guo: School of Aeronautics and Astronautic, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Pingping Wang: School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Zhendong Liu: School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Yuming Zhao: Information and Computer Engineering College, Northeast Forestry University, Harbin, China
38
Dou L, Li X, Zhang L, Xiang H, Xu L. iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier. J Proteome Res 2020; 20:191-201. [PMID: 33090794 DOI: 10.1021/acs.jproteome.0c00314] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/01/2023]
Abstract
Lysine glutarylation is a newly reported post-translational modification (PTM) that plays significant roles in regulating metabolic and mitochondrial processes. Accurate identification of protein glutarylation is the primary task to better investigate molecular functions and various applications. Because traditional biological sequencing techniques are time-consuming and expensive, and protein data are growing explosively, building precise computational models to rapidly identify glutarylation is a popular and feasible solution. In this work, we proposed a novel AdaBoost-based predictor called iGlu_AdaBoost to distinguish glutarylation and non-glutarylation sequences. Here, the top 37 features were chosen from a total of 1768 combined features using Chi2 followed by incremental feature selection (IFS) to build the model, including 188D, the composition of k-spaced amino acid pairs (CKSAAP), and enhanced amino acid composition (EAAC). With the help of the hybrid-sampling method SMOTE-Tomek, the AdaBoost algorithm achieved recall, specificity, and AUC values of 87.48%, 72.49%, and 0.89 over 10-fold cross validation and 72.73%, 71.92%, and 0.63 on an independent test, respectively. Further feature analysis inferred that the positively charged amino acids R and K play critical roles in glutarylation recognition. Our model showed good generalization ability and consistent prediction results for positive and negative samples, comparable to four published tools. The proposed predictor is an efficient tool to find potential glutarylation sites and provides helpful suggestions for further research on glutarylation mechanisms and related disease treatments.
Affiliation(s)
- Lijun Dou: School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiaoling Li: Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150000, China
- Lichao Zhang: School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China
- Huaikun Xiang: School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu: School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
39
A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:8926750. [PMID: 33133228 PMCID: PMC7591939 DOI: 10.1155/2020/8926750] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 08/04/2020] [Revised: 08/14/2020] [Accepted: 09/16/2020] [Indexed: 12/14/2022]
Abstract
With the development of computer technology, many machine learning algorithms have been applied to the field of biology, forming the discipline of bioinformatics. Protein function prediction is a classic research topic in this subject area. Though many scholars have made achievements in identifying protein by different algorithms, they often extract a large number of feature types and use very complex classification methods to obtain little improvement in the classification effect, and this process is very time-consuming. In this research, we attempt to utilize as few features as possible to classify vesicular transportation proteins and to simultaneously obtain a comparative satisfactory classification result. We adopt CTDC which is a submethod of the method of composition, transition, and distribution (CTD) to extract only 39 features from each sequence, and LibSVM is used as the classification method. We use the SMOTE method to deal with the problem of dataset imbalance. There are 11619 protein sequences in our dataset. We selected 4428 sequences to train our classification model and selected other 1832 sequences from our dataset to test the classification effect and finally achieved an accuracy of 71.77%. After dimension reduction by MRMD, the accuracy is 72.16%.
|
40
|
Li Q, Xu L, Li Q, Zhang L. Identification and Classification of Enhancers Using Dimension Reduction Technique and Recurrent Neural Network. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:8852258. [PMID: 33133227 PMCID: PMC7591959 DOI: 10.1155/2020/8852258] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/25/2020] [Revised: 09/16/2020] [Accepted: 09/30/2020] [Indexed: 12/21/2022]
Abstract
Enhancers are noncoding fragments in DNA sequences, which play an important role in gene transcription and translation. However, due to their high free scattering and positional variability, the identification and classification of enhancers have a higher level of complexity than those of coding genes. In order to solve this problem, many computer studies have been carried out in this field, but there are still some deficiencies in these prediction models. In this paper, we use various feature extraction strategies, dimension reduction technology, and a comprehensive application of machine model and recurrent neural network model to achieve accurate enhancer identification and classification, with accuracies of 76.7% and 84.9%, respectively. The model proposed in this paper is superior to the previous methods in performance index or feature dimension, which provides inspiration for the prediction of enhancers by computer technology in the future.
Affiliation(s)
- Qingwen Li
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Qingyuan Li
- Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China
| |
|
41
|
Xu L, Liang G, Chen B, Tan X, Xiang H, Liao C. A Computational Method for the Identification of Endolysins and Autolysins. Protein Pept Lett 2020; 27:329-336. [PMID: 31577192 DOI: 10.2174/0929866526666191002104735] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/10/2019] [Revised: 06/27/2019] [Accepted: 09/03/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Compared with antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way to cure bacterial infections. With the use of antibiotics, the problem of drug resistance is becoming more serious. Therefore, cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them is the purpose of breaking the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. OBJECTIVE In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzymes are helpful for killing bacteria, so it is meaningful to study their types. However, it is time-consuming to detect the type of cell lytic enzyme by experimental methods. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. METHODS We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, the classifier is trained on the labeled vectors using a support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. RESULTS Following the proposed method, the experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9). The performance of our proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. CONCLUSION This paper proposes an efficient method for identifying endolysins and autolysins, in which a support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
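To make the representation and the quoted arithmetic concrete, the following is a minimal sketch, assuming the common definition of tripeptide composition as the normalized frequency of each of the 20^3 = 8000 possible tripeptides in overlapping windows. The function names are hypothetical, and the confidence-based feature selection and the SVM described in the abstract are not reproduced here.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 8000 possible tripeptides, in a fixed order that defines the feature index.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(sequence: str) -> List[float]:
    """Frequency of every tripeptide over the overlapping windows of the sequence."""
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    # Windows containing non-standard residues are skipped (an assumption).
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid tripeptide")
    return [counts[t] / total for t in TRIPEPTIDES]


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain in percent, as in the abstract's (97.06-92.9)/92.9."""
    return (new - baseline) / baseline * 100.0


vector = tripeptide_composition("ACDAC")  # hypothetical toy sequence
gain = relative_improvement(97.06, 92.9)
print(len(vector), round(gain, 2))
```

The relative gain computed this way is about 4.48%, consistent with the "nearly 4.5%" reported against Ding's method.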
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Baowen Chen
- School of Software, Shenzhen Institute of Information Technology, Shenzhen, China
| | - Xu Tan
- School of Software, Shenzhen Institute of Information Technology, Shenzhen, China
| | - Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Changrui Liao
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen, China
| |
|
42
|
Xu L, Jiang S, Wu J, Zou Q. An in silico approach to identification, categorization and prediction of nucleic acid binding proteins. Brief Bioinform 2020; 22:5892348. [PMID: 32793956 DOI: 10.1093/bib/bbaa171] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 06/03/2020] [Revised: 06/22/2020] [Accepted: 07/01/2020] [Indexed: 01/29/2023] Open
Abstract
The interaction between proteins and nucleic acids plays an important role in many processes, such as transcription, translation and DNA repair. The mechanisms of related biological events can be understood by exploring the function of proteins in these interactions. The number of known protein sequences has increased rapidly in recent years, but the databases describing the structure and function of proteins have unfortunately grown quite slowly. Thus, improving such databases is meaningful for predicting protein-nucleic acid interactions. Furthermore, the mechanism of related biological events, such as viral infection or designing novel drug targets, can be further understood by understanding the function of proteins in these interactions. The information for each sequence, including its function and interaction sites, was collected and identified, and a database called PNIDB was built. The proteins in PNIDB were grouped into 27 classes, such as transcription, immune system, and structural protein. The function of each protein was then predicted using a machine learning method. Using our method, the predictor was trained on labeled sequences, and then the function of a protein was predicted based on the trained classifier. The prediction accuracy reached 77.43% in 10-fold cross validation.
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic
| | | | - Jin Wu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
| | - Quan Zou
- School of Management, Shenzhen Polytechnic
| |
|
43
|
Li Q, Zhou W, Wang D, Wang S, Li Q. Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model. Front Bioeng Biotechnol 2020; 8:892. [PMID: 32903381 PMCID: PMC7434836 DOI: 10.3389/fbioe.2020.00892] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/25/2020] [Accepted: 07/10/2020] [Indexed: 01/09/2023] Open
Abstract
Cancer is still a severe health problem globally. Cancer therapy traditionally involves the use of radiotherapy or anticancer drugs to kill cancer cells, but these methods are quite expensive and have side effects that can cause great harm to patients. With the discovery of anticancer peptides (ACPs), significant progress has been achieved in the therapy of tumors. Therefore, it is invaluable to accurately identify anticancer peptides. Although biochemical experiments can accomplish this, they are expensive and time-consuming. To promote the application of anticancer peptides in cancer therapy, machine learning can be used to recognize anticancer peptides by extracting their feature vectors. Nevertheless, in practice, machine learning models trained on high-dimensional features often perform poorly. To solve this problem, this paper put forward a 19-dimensional feature model based on anticancer peptide sequences, which has lower dimensionality and better performance than some existing methods. In addition, this paper also obtained a model with a low number of dimensions and acceptable performance. The few features identified in this study may represent the important features of anticancer peptides.
Affiliation(s)
- Qingwen Li
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Wenyang Zhou
- Center for Bioinformatics, School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Sui Wang
- Key Laboratory of Soybean Biology in Chinese Ministry of Education, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
| | - Qingyuan Li
- Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
| |
|
44
|
Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs. BIOMED RESEARCH INTERNATIONAL 2020; 2020:9235920. [PMID: 32596396 PMCID: PMC7273372 DOI: 10.1155/2020/9235920] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/14/2020] [Accepted: 04/22/2020] [Indexed: 11/17/2022]
Abstract
Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
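The CKSAAP descriptor named above can be sketched as follows. This is a minimal illustration under the usual definition (for each gap k, the frequency of every ordered residue pair separated by k positions, 400 values per gap); the function name `cksaap` and the default `max_gap=2` are assumptions, since the abstract does not state the gap values used.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature index.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence: str, max_gap: int = 2) -> List[float]:
    """Concatenate the pair frequencies for gaps k = 0..max_gap (400 values each)."""
    seq = sequence.upper()
    features: List[float] = []
    for k in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = 0
        # Pair residue i with the residue k positions further along (i + k + 1).
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # pairs with non-standard residues are skipped
                counts[pair] += 1
                n_pairs += 1
        features.extend(counts[p] / n_pairs if n_pairs else 0.0 for p in PAIRS)
    return features


# Hypothetical toy sequence: gap 0 gives AC, CA, AC; gap 1 gives AA, CC.
toy = cksaap("ACAC", max_gap=1)
print(len(toy))
```

A feature selection step, as described in the abstract, would then keep only the most informative of these pair frequencies before training the SVM.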
|
45
|
Wang C, Zhang Y, Han S. Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2468789. [PMID: 32566672 PMCID: PMC7275950 DOI: 10.1155/2020/2468789] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/03/2020] [Revised: 03/20/2020] [Accepted: 03/25/2020] [Indexed: 12/19/2022]
Abstract
Fungi play essential roles in many ecological processes, and taxonomic classification is fundamental for microbial community characterization and vital for the study and preservation of fungal biodiversity. To cope with massive fungal barcode data, tools that can handle extensive volumes of barcode sequences, especially the internal transcribed spacer (ITS) region, are necessary. However, high variation in the ITS region and computational requirements for processing high-dimensional features remain challenging for existing predictors. In this study, we developed Its2vec, a bioinformatics tool for the classification of fungal ITS barcodes to the species level. An ITS database covering more than 25,000 species in a broad range of fungal taxa was assembled. For dimensionality reduction, a word embedding algorithm was used to represent an ITS sequence as a dense low-dimensional vector. A random forest-based classifier was built for species identification. Benchmarking results showed that our model achieved an accuracy comparable to that of several state-of-the-art predictors, and more importantly, it could handle large datasets and greatly reduce dimensionality. We expect the Its2vec model to be helpful for fungal species identification and, thus, for revealing microbial community structures and deepening our understanding of their functional mechanisms.
Affiliation(s)
- Chao Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
| | - Shuguang Han
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
46
|
Meng C, Hu Y, Zhang Y, Guo F. PSBP-SVM: A Machine Learning-Based Computational Identifier for Predicting Polystyrene Binding Peptides. Front Bioeng Biotechnol 2020; 8:245. [PMID: 32296690 PMCID: PMC7137786 DOI: 10.3389/fbioe.2020.00245] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/31/2020] [Accepted: 03/09/2020] [Indexed: 12/11/2022] Open
Abstract
Polystyrene binding peptides (PSBPs) play a key role in the immobilization process. The correct identification of PSBPs is the first step of all related works. In this paper, we proposed a novel support vector machine-based bioinformatic identification model. This model contains four machine learning steps, including feature extraction, feature selection, model training and optimization. In a five-fold cross validation test, this model achieves an SN of 90.38%, an SP of 84.62%, an ACC of 87.50%, and an AUC of 0.90. This model outperforms the state-of-the-art identifier in terms of SN and ACC with a smaller feature set. Furthermore, we constructed a web server that includes the proposed model, which is freely accessible at http://server.malab.cn/PSBP-SVM/index.jsp.
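For readers unfamiliar with the abbreviations, the sketch below shows how sensitivity (SN), specificity (SP) and accuracy (ACC) follow from a confusion matrix. The counts are hypothetical: they were chosen only because they reproduce the rounded percentages quoted in the abstract and are not taken from the paper. AUC is not computed here because it requires ranked prediction scores.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "SN": tp / (tp + fn),                    # true positive rate
        "SP": tn / (tn + fp),                    # true negative rate
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # overall fraction correct
    }


# Hypothetical counts for a balanced set of 104 positives and 104 negatives.
metrics = classification_metrics(tp=94, fn=10, tn=88, fp=16)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```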
Affiliation(s)
- Chaolu Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Yang Hu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
|
47
|
Dou L, Li X, Ding H, Xu L, Xiang H. Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem? MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 19:293-303. [PMID: 31865116 PMCID: PMC6931122 DOI: 10.1016/j.omtn.2019.11.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/08/2019] [Revised: 10/29/2019] [Accepted: 11/11/2019] [Indexed: 01/01/2023]
Abstract
Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than those of the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Results at almost the same level were obtained for M_994 and the independent H_200 dataset, with a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features were selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were 70.54% and 72.45%, which were 2.39% and 0.65% higher than those of iPseU-CUU. For the S_200 dataset, the accuracy also improved by 8 percentage points, from 69% to 77%. However, there was no obvious improvement for H. sapiens, for which the accuracies were approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.
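Of the feature types listed, the NCP and ND encoding is the simplest to illustrate. The sketch below assumes the commonly used scheme in which each nucleotide is mapped to three binary chemical properties (ring structure, functional group, hydrogen bonding) and a fourth value gives the density of that nucleotide in the prefix ending at its position. The exact encoding used in the study is not given in the abstract, so the table and the function name `ncp_nd_encode` are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed NCP table: (ring structure, functional group, hydrogen bonding).
NCP: Dict[str, Tuple[int, int, int]] = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_nd_encode(rna: str) -> List[Tuple[float, ...]]:
    """Encode each position as its three NCP bits plus the running nucleotide density."""
    seq = rna.upper().replace("T", "U")
    seen: Dict[str, int] = {}
    rows: List[Tuple[float, ...]] = []
    for position, base in enumerate(seq, start=1):
        if base not in NCP:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        seen[base] = seen.get(base, 0) + 1
        density = seen[base] / position  # share of this base in seq[:position]
        rows.append((*NCP[base], density))
    return rows


encoded = ncp_nd_encode("AUA")  # hypothetical toy sequence
print(encoded)
```

Each sequence of length n becomes an n x 4 matrix, which would typically be flattened before being combined with other features such as BPB.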
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China.
| | - Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China.
| |
|
48
|
Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020; 8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022] Open
Abstract
One of the ubiquitous chemical modifications in RNA, pseudouridine modification is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. Several useful machine learning approaches have become available recently, with the increasing progress of next-generation sequencing technology; however, existing methods cannot predict sites with high accuracy. Thus, a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for prediction of pseudouridylation sites. To optimize feature representation and obtain a better model, the light gradient boosting machine algorithm and incremental feature selection strategy were used to select the optimum feature space vector for training the random forest model RF-PseU. Compared with previous state-of-the-art predictors, the results on the same benchmark data sets of three species demonstrate that RF-PseU performs better overall. The integrated average leave-one-out cross-validation and independent testing accuracy scores were 71.4% and 74.7%, respectively, representing increments of 3.63% and 4.77% versus the best existing predictor. Moreover, the final RF-PseU model for prediction was built on leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu.
Affiliation(s)
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
49
|
Huang Q, Zhang J, Wei L, Guo F, Zou Q. 6mA-RicePred: A Method for Identifying DNA N6-Methyladenine Sites in the Rice Genome Based on Feature Fusion. FRONTIERS IN PLANT SCIENCE 2020; 11:4. [PMID: 32076430 PMCID: PMC7006724 DOI: 10.3389/fpls.2020.00004] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/04/2019] [Accepted: 01/06/2020] [Indexed: 06/01/2023]
Abstract
MOTIVATION The biological function of N6-methyladenine DNA (6mA) in plants is largely unknown. Rice is one of the most important crops worldwide and is a model species for molecular and genetic studies. There are few methods for 6mA site recognition in the rice genome, and an effective computational method is needed. RESULTS In this paper, we propose a new computational method called 6mA-Pred to identify 6mA sites in the rice genome. 6mA-Pred employs a feature fusion method to combine advantageous features from other methods and thus obtain a new feature to identify 6mA sites. This method achieved an accuracy of 87.27% in the identification of 6mA sites with 10-fold cross-validation and achieved an accuracy of 85.6% in independent test sets.
Affiliation(s)
- Qianfei Huang
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Leyi Wei
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
50
|
Li Q, Dong B, Wang D, Wang S. Identification of Secreted Proteins From Malaria Protozoa With Few Features. IEEE ACCESS 2020; 8:89793-89801. [DOI: 10.1109/access.2020.2994206] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/02/2025]
|