1
|
Sharma A, Selvam S, Balaji PD, Madhavan T. ANN multi-layer perceptron for prediction of blood-brain barrier permeable compounds for central nervous system therapeutics. J Biomol Struct Dyn 2024:1-6. [PMID: 38497749 DOI: 10.1080/07391102.2024.2326671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Accepted: 02/28/2024] [Indexed: 03/19/2024]
Abstract
Endothelial cells produce a semipermeable barrier known as the blood-brain barrier (BBB) to keep undesired chemicals out of the central nervous system (CNS). However, this barrier also restricts the exploration of potential new medications due to insufficient exposure. To address this challenge, machine learning (ML) algorithms can be useful to predict the BBB permeability of chemical compounds. Support vector machines, continuous neural networks, and deep learning approaches have been used to identify compounds that can penetrate the BBB. However, predicting BBB permeability based solely on chemical structure can be difficult. In the current research, we developed an ML model using a large dataset to predict BBB permeability, which could be used for early-stage drug screening of potential CNS medications. Our artificial neural network ANN algorithm exhibited an accuracy of 0.94, specificity of 0.83, sensitivity of 0.97, AUC of 0.96, and MCC of 0.83. These metrics suggest that our model has a high accuracy rate in predicting BBB permeability and therefore has the potential to advance drug discovery efforts in the CNS. This study's outcomes demonstrate the potential for ML models to predict BBB permeability accurately, aiding in the identification of new CNS therapeutic options.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Aditi Sharma
- Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India
| | - Subathra Selvam
- Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India
| | - Priya Dharshini Balaji
- Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India
| | - Thirumurthy Madhavan
- Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India
| |
Collapse
|
2
|
Li H, Zhang J, Zhao Y, Yang W. Predicting Corynebacterium glutamicum promoters based on novel feature descriptor and feature selection technique. Front Microbiol 2023; 14:1141227. [PMID: 36937275 PMCID: PMC10018189 DOI: 10.3389/fmicb.2023.1141227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Accepted: 02/10/2023] [Indexed: 03/06/2023] Open
Abstract
The promoter is an important noncoding DNA regulatory element, which combines with RNA polymerase to activate the expression of downstream genes. In industry, artificial arginine is mainly synthesized by Corynebacterium glutamicum. Replication of specific promoter regions can increase arginine production. Therefore, it is necessary to accurately locate the promoter in C. glutamicum. In the wet experiment, promoter identification depends on sigma factors and DNA splicing technology, this is a laborious job. To quickly and conveniently identify the promoters in C. glutamicum, we have developed a method based on novel feature representation and feature selection to complete this task, describing the DNA sequences through statistical parameters of multiple physicochemical properties, filtering redundant features by combining analysis of variance and hierarchical clustering, the prediction accuracy of the which is as high as 91.6%, the sensitivity of 91.9% can effectively identify promoters, and the specificity of 91.2% can accurately identify non-promoters. In addition, our model can correctly identify 181 promoters and 174 non-promoters among 400 independent samples, which proves that the developed prediction model has excellent robustness.
Collapse
Affiliation(s)
- HongFei Li
- College of Life Science, Northeast Forestry University, Harbin, China
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Jingyu Zhang
- Department of Neurology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Yuming Zhao
- College of Life Science, Northeast Forestry University, Harbin, China
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- *Correspondence: Yuming Zhao, ; Wen Yang,
| | - Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
- *Correspondence: Yuming Zhao, ; Wen Yang,
| |
Collapse
|
3
|
Gu X, Ding Y, Xiao P, He T. A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins. Front Genet 2022; 13:935717. [PMID: 36506312 PMCID: PMC9727185 DOI: 10.3389/fgene.2022.935717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Accepted: 11/02/2022] [Indexed: 11/24/2022] Open
Abstract
There is a great deal of importance to SNARE proteins, and their absence from function can lead to a variety of diseases. The SNARE protein is known as a membrane fusion protein, and it is crucial for mediating vesicle fusion. The identification of SNARE proteins must therefore be conducted with an accurate method. Through extensive experiments, we have developed a model based on graph-regularized k-local hyperplane distance nearest neighbor model (GHKNN) binary classification. In this, the model uses the physicochemical property extraction method to extract protein sequence features and the SMOTE method to upsample protein sequence features. The combination achieves the most accurate performance for identifying all protein sequences. Finally, we compare the model based on GHKNN binary classification with other classifiers and measure them using four different metrics: SN, SP, ACC, and MCC. In experiments, the model performs significantly better than other classifiers.
Collapse
Affiliation(s)
- Xingyue Gu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Pengfeng Xiao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
| |
Collapse
|
4
|
Ma D, Chen Z, He Z, Huang X. A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem. Front Genet 2022; 12:818841. [PMID: 35154261 PMCID: PMC8832978 DOI: 10.3389/fgene.2021.818841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2021] [Accepted: 12/14/2021] [Indexed: 11/13/2022] Open
Abstract
Machine learning has been widely used to solve complex problems in engineering applications and scientific fields, and many machine learning-based methods have achieved good results in different fields. SNAREs are key elements of membrane fusion and required for the fusion process of stable intermediates. They are also associated with the formation of some psychiatric disorders. This study processes the original sequence data with the synthetic minority oversampling technique (SMOTE) to solve the problem of data imbalance and produces the most suitable machine learning model with the iLearnPlus platform for the identification of SNARE proteins. Ultimately, a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the cross-validation dataset, and a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the independent dataset (the adaptive skip dipeptide composition descriptor was used for feature extraction, and LightGBM with proper parameters was used as the classifier). These results demonstrate that this combination can perform well in the classification of SNARE proteins and is superior to other methods.
Collapse
|
5
|
Dou L, Zhou W, Zhang L, Xu L, Han K. Accurate identification of RNA D modification using multiple features. RNA Biol 2021; 18:2236-2246. [PMID: 33729104 PMCID: PMC8632091 DOI: 10.1080/15476286.2021.1898160] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Revised: 02/13/2021] [Accepted: 02/23/2021] [Indexed: 10/21/2022] Open
Abstract
As one of the common post-transcriptional modifications in tRNAs, dihydrouridine (D) has prominent effects on regulating the flexibility of tRNA as well as cancerous diseases. Facing with the expensive and time-consuming sequencing techniques to detect D modification, precise computational tools can largely promote the progress of molecular mechanisms and medical developments. We proposed a novel predictor, called iRNAD_XGBoost, to identify potential D sites using multiple RNA sequence representations. In this method, by considering the imbalance problem using hybrid sampling method SMOTEEEN, the XGBoost-selected top 30 features are applied to construct model. The optimized model showed high Sn and Sp values of 97.13% and 97.38% over jackknife test, respectively. For the independent experiment, these two metrics separately achieved 91.67% and 94.74%. Compared with iRNAD method, this model illustrated high generalizability and consistent prediction efficiencies for positive and negative samples, which yielded satisfactory MCC scores of 0.94 and 0.86, respectively. It is inferred that the chemical property and nucleotide density features (CPND), electron-ion interaction pseudopotential (EIIP and PseEIIP) as well as dinucleotide composition (DNC) are crucial to the recognition of D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.
Collapse
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, GuangdongChina
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, SichuanChina
| | - Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, HeilongjiangChina
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, GuangdongChina
| | - Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, HeilongjiangChina
| |
Collapse
|
6
|
Jiao S, Zou Q, Guo H, Shi L. iTTCA-RF: a random forest predictor for tumor T cell antigens. J Transl Med 2021; 19:449. [PMID: 34706730 PMCID: PMC8554859 DOI: 10.1186/s12967-021-03084-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2021] [Accepted: 09/16/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccines development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigen, more accurate tumor T cell antigen identification by existing methodology is still challenging. METHODS In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 tumor T cell antigens (negative samples). Four types feature encoding methods have been studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using random forest algorithm. RESULTS Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA . CONCLUSIONS We have proven that the proposed predictor iTTCA-RF is superior to the other latest models, and will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
Collapse
Affiliation(s)
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Huannan Guo
- Department of Oncology, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China.
| |
Collapse
|
7
|
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to exploit an effective method to predict secretory proteins of malaria parasites to develop effective cures and treatment. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies between different computational methods. Also, we discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.
Collapse
Affiliation(s)
- Ting Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| | - Jiamao Chen
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| | - Qian Zhang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| | - Kyle Hippe
- Department of Computer Science, Pacific Lutheran University. United States
| | - Cassandra Hunt
- Department of Computer Science, Pacific Lutheran University. United States
| | - Thu Le
- Department of Computer Science, Pacific Lutheran University. United States
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University. United States
| | - Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| |
Collapse
|
8
|
Chen L, Zhou X, Zeng T, Pan X, Zhang YH, Huang T, Fang Z, Cai YD. Recognizing Pattern and Rule of Mutation Signatures Corresponding to Cancer Types. Front Cell Dev Biol 2021; 9:712931. [PMID: 34513841 PMCID: PMC8427289 DOI: 10.3389/fcell.2021.712931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2021] [Accepted: 07/02/2021] [Indexed: 11/20/2022] Open
Abstract
Cancer has been generally defined as a cluster of systematic malignant pathogenesis involving abnormal cell growth. Genetic mutations derived from environmental factors and inherited genetics trigger the initiation and progression of cancers. Although several well-known factors affect cancer, mutation features and rules that affect cancers are relatively unknown due to limited related studies. In this study, a computational investigation on mutation profiles of cancer samples in 27 types was given. These profiles were first analyzed by the Monte Carlo Feature Selection (MCFS) method. A feature list was thus obtained. Then, the incremental feature selection (IFS) method adopted such list to extract essential mutation features related to 27 cancer types, find out 207 mutation rules and construct efficient classifiers. The top 37 mutation features corresponding to different cancer types were discussed. All the qualitatively analyzed gene mutation features contribute to the distinction of different types of cancers, and most of such mutation rules are supported by recent literature. Therefore, our computational investigation could identify potential biomarkers and prediction rules for cancers in the mutation signature level.
Collapse
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, China.,College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Xianchao Zhou
- School of Life Sciences and Technology, ShanghaiTech University, Shanghai, China.,Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Tao Zeng
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Xiaoyong Pan
- Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China
| | - Yu-Hang Zhang
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
| | - Tao Huang
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.,Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Zhaoyuan Fang
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
9
|
Charoenkwan P, Shoombuatong W, Nantasupha C, Muangmool T, Suprasert P, Charoenkwan K. iPMI: Machine Learning-Aided Identification of Parametrial Invasion in Women with Early-Stage Cervical Cancer. Diagnostics (Basel) 2021; 11:diagnostics11081454. [PMID: 34441388 PMCID: PMC8391438 DOI: 10.3390/diagnostics11081454] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 01/18/2023] Open
Abstract
Radical hysterectomy is a recommended treatment for early-stage cervical cancer. However, the procedure is associated with significant morbidities resulting from the removal of the parametrium. Parametrial cancer invasion (PMI) is found in a minority of patients but the efficient system used to predict it is lacking. In this study, we develop a novel machine learning (ML)-based predictive model based on a random forest model (called iPMI) for the practical identification of PMI in women. Data of 1112 stage IA-IIA cervical cancer patients who underwent primary surgery were collected and considered as the training dataset, while data from an independent cohort of 116 consecutive patients were used as the independent test dataset. Based on these datasets, iPMI-Econ was then developed by using basic clinicopathological data available prior to surgery, while iPMI-Power was also introduced by adding pelvic node metastasis and uterine corpus invasion to the iPMI-Econ. Both 10-fold cross-validations and independent test results showed that iPMI-Power outperformed other well-known ML classifiers (e.g., logistic regression, decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes, support vector machine, and extreme gradient boosting). Upon comparison, it was found that iPMI-Power was effective and had a superior performance to other well-known ML classifiers in predicting PMI. It is anticipated that the proposed iPMI may serve as a cost-effective and rapid approach to guide important clinical decision-making.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 73170, Thailand;
| | - Chalaithorn Nantasupha
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand; (C.N.); (T.M.); (P.S.)
| | - Tanarat Muangmool
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand; (C.N.); (T.M.); (P.S.)
| | - Prapaporn Suprasert
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand; (C.N.); (T.M.); (P.S.)
| | - Kittipat Charoenkwan
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand; (C.N.); (T.M.); (P.S.)
- Correspondence:
| |
Collapse
|
10
|
Li Y, Pu F, Wang J, Zhou Z, Zhang C, He F, Ma Z, Zhang J. Machine Learning Methods in Prediction of Protein Palmitoylation Sites: A Brief Review. Curr Pharm Des 2021; 27:2189-2198. [PMID: 33183190 DOI: 10.2174/1381612826666201112142826] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 07/27/2020] [Indexed: 11/22/2022]
Abstract
Protein palmitoylation is a fundamental and reversible post-translational lipid modification that involves a series of biological processes. Although a large number of experimental studies have explored the molecular mechanism behind the palmitoylation process, the computational methods has attracted much attention for its good performance in predicting palmitoylation sites compared with expensive and time-consuming biochemical experiments. The prediction of protein palmitoylation sites is helpful to reveal its biological mechanism. Therefore, the research on the application of machine learning methods to predict palmitoylation sites has become a hot topic in bioinformatics and promoted the development in the related fields. In this review, we briefly introduced the recent development in predicting protein palmitoylation sites by using machine learningbased methods and discussed their benefits and drawbacks. The perspective of machine learning-based methods in predicting palmitoylation sites was also provided. We hope the review could provide a guide in related fields.
Collapse
Affiliation(s)
- Yanwen Li
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Feng Pu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Jingru Wang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Zhiguo Zhou
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Chunhua Zhang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Fei He
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Jingbo Zhang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| |
Collapse
|
11
|
Chen XG, Zhang W, Yang X, Li C, Chen H. ACP-DA: Improving the Prediction of Anticancer Peptides Using Data Augmentation. Front Genet 2021; 12:698477. [PMID: 34276801 PMCID: PMC8279753 DOI: 10.3389/fgene.2021.698477] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Accepted: 06/07/2021] [Indexed: 12/09/2022] Open
Abstract
Anticancer peptides (ACPs) have provided a promising perspective for cancer treatment, and the prediction of ACPs is very important for the discovery of new cancer treatment drugs. It is time consuming and expensive to use experimental methods to identify ACPs, so computational methods for ACP identification are urgently needed. There have been many effective computational methods, especially machine learning-based methods, proposed for such predictions. Most of the current machine learning methods try to find suitable features or design effective feature learning techniques to accurately represent ACPs. However, the performance of these methods can be further improved for cases with insufficient numbers of samples. In this article, we propose an ACP prediction model called ACP-DA (Data Augmentation), which uses data augmentation for insufficient samples to improve the prediction performance. In our method, to better exploit the information of peptide sequences, peptide sequences are represented by integrating binary profile features and AAindex features, and then the samples in the training set are augmented in the feature space. After data augmentation, the samples are used to train the machine learning model, which is used to predict ACPs. The performance of ACP-DA exceeds that of existing methods, and ACP-DA achieves better performance in the prediction of ACPs compared with a method without data augmentation. The proposed method is available at http://github.com/chenxgscuec/ACPDA.
Collapse
Affiliation(s)
- Xian-Gan Chen
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, China.,Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
| | - Xiaofei Yang
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| | - Chenhong Li
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| | - Hengling Chen
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| |
Collapse
|
12
|
Wu X, Yu L. EPSOL: sequence-based protein solubility prediction using multidimensional embedding. Bioinformatics 2021; 37:4314-4320. [PMID: 34145885 DOI: 10.1093/bioinformatics/btab463] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Revised: 05/18/2021] [Accepted: 06/17/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The heterologous expression of recombinant protein requires host cells, such as Escherichia coli, and the solubility of protein greatly affects the protein yield. A novel and highly accurate solubility predictor that concurrently improves the production yield and minimizes production cost, and that forecasts protein solubility in an E. coli expression system before the actual experimental work is highly sought. RESULTS In this paper, EPSOL, a novel deep learning architecture for the prediction of protein solubility in an E. coli expression system, which automatically obtains comprehensive protein feature representations using multidimensional embedding, is presented. EPSOL outperformed all existing sequence-based solubility predictors and achieved 0.79 in accuracy and 0.58 in Matthew's correlation coefficient. The higher performance of EPSOL permits large-scale screening for sequence variants with enhanced manufacturability and predicts the solubility of new recombinant proteins in an E. coli expression system with greater reliability. AVAILABILITY AND IMPLEMENTATION EPSOL's best model and results can be downloaded from GitHub (https://github.com/LiangYu-Xidian/EPSOL). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Xiang Wu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
| |
Collapse
|
13
|
Liu M, Dong M, Jing C. A modified real-value negative selection detector-based oversampling approach for multiclass imbalance problems. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
14
|
Chen L, Li J, Chang M. Cancer Diagnosis and Disease Gene Identification via Statistical Machine Learning. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200207094947] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Diagnosing cancer and identifying the disease gene by using DNA microarray gene
expression data are the hot topics in current bioinformatics. This paper is devoted to the latest
development in cancer diagnosis and gene selection via statistical machine learning. A support
vector machine is firstly introduced for the binary cancer diagnosis. Then, 1-norm support vector
machine, doubly regularized support vector machine, adaptive huberized support vector machine
and other extensions are presented to improve the performance of gene selection. Lasso, elastic
net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso and
other sparse regression methods are also introduced for performing simultaneous binary cancer
classification and gene selection. In addition to introducing three strategies for reducing multiclass
to binary, methods of directly considering all classes of data in a learning model (multi_class
support vector, sparse multinomial regression, adaptive multinomial regression and so on) are
presented for performing multiple cancer diagnosis. Limitations and promising directions are also
discussed.
Collapse
Affiliation(s)
- Liuyuan Chen
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
| | - Juntao Li
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
| | - Mingming Chang
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
| |
Collapse
|
15
|
Qiu W, Lv Z, Hong Y, Jia J, Xiao X. BOW-GBDT: A GBDT Classifier Combining With Artificial Neural Network for Identifying GPCR-Drug Interaction Based on Wordbook Learning From Sequences. Front Cell Dev Biol 2021; 8:623858. [PMID: 33598456 PMCID: PMC7882597 DOI: 10.3389/fcell.2020.623858] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Accepted: 12/15/2020] [Indexed: 12/28/2022] Open
Abstract
Background: As a class of membrane protein receptors, G protein-coupled receptors (GPCRs) are very important for cells to complete normal life function and have been proven to be a major drug target for widespread clinical application. Hence, it is of great significance to find GPCR targets that interact with drugs in the process of drug development. However, identifying the interaction of the GPCR–drug pairs by experimental methods is very expensive and time-consuming on a large scale. As more and more database about GPCR–drug pairs are opened, it is viable to develop machine learning models to accurately predict whether there is an interaction existing in a GPCR–drug pair. Methods: In this paper, the proposed model aims to improve the accuracy of predicting the interactions of GPCR–drug pairs. For GPCRs, the work extracts protein sequence features based on a novel bag-of-words (BOW) model improved with weighted Silhouette Coefficient and has been confirmed that it can extract more pattern information and limit the dimension of feature. For drug molecules, discrete wavelet transform (DWT) is used to extract features from the original molecular fingerprints. Subsequently, the above-mentioned two types of features are contacted, and SMOTE algorithm is selected to balance the training dataset. Then, artificial neural network is used to extract features further. Finally, a gradient boosting decision tree (GBDT) model is trained with the selected features. In this paper, the proposed model is named as BOW-GBDT. Results: D92M and Check390 are selected for testing BOW-GBDT. D92M is used for a cross-validation dataset which contains 635 interactive GPCR–drug pairs and 1,225 non-interactive pairs. Check390 is used for an independent test dataset which consists of 130 interactive GPCR–drug pairs and 260 non-interactive GPCR–drug pairs, and each element in Check390 cannot be found in D92M. According to the results, the proposed model has a better performance in generation ability compared with the existing machine learning models. Conclusion: The proposed predictor improves the accuracy of the interactions of GPCR–drug pairs. In order to facilitate more researchers to use the BOW-GBDT, the predictor has been settled into a brand-new server, which is available at http://www.jci-bioinfo.cn/bowgbdt.
Collapse
Affiliation(s)
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Zhe Lv
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Yaoqiu Hong
- School of Information Engineering, Jingdezhen University, Jingdezhen, China
| | - Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| |
Collapse
|
16
|
Meng C, Wu J, Guo F, Dong B, Xu L. CWLy-pred: A novel cell wall lytic enzyme identifier based on an improved MRMD feature selection method. Genomics 2020; 112:4715-4721. [DOI: 10.1016/j.ygeno.2020.08.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Revised: 08/04/2020] [Accepted: 08/13/2020] [Indexed: 10/25/2022]
|
17
|
Zhai Y, Chen Y, Teng Z, Zhao Y. Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions. Front Cell Dev Biol 2020; 8:591487. [PMID: 33195258 PMCID: PMC7658297 DOI: 10.3389/fcell.2020.591487] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2020] [Accepted: 09/18/2020] [Indexed: 12/13/2022] Open
Abstract
Excessive oxidative stress responses can threaten our health, and thus it is essential to produce antioxidant proteins to regulate the body’s oxidative responses. The low number of antioxidant proteins makes it difficult to extract their representative features. Our experimental method did not use structural information but instead studied antioxidant proteins from a sequenced perspective while focusing on the impact of data imbalance on sensitivity, thus greatly improving the model’s sensitivity for antioxidant protein recognition. We developed a method based on the Composition of k-spaced Amino Acid Pairs (CKSAAP) and the Conjoint Triad (CT) features derived from the amino acid composition and protein-protein interactions. SMOTE and the Max-Relevance-Max-Distance algorithm (MRMD) were utilized to unbalance the training data and select the optimal feature subset, respectively. The test set used 10-fold crossing validation and a random forest algorithm for classification according to the selected feature subset. The sensitivity was 0.792, the specificity was 0.808, and the average accuracy was 0.8.
Collapse
Affiliation(s)
- Yixiao Zhai
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
| | - Yu Chen
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
| | - Zhixia Teng
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
| | - Yuming Zhao
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
| |
Collapse
|
18
|
Dou L, Li X, Zhang L, Xiang H, Xu L. iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier. J Proteome Res 2020; 20:191-201. [PMID: 33090794 DOI: 10.1021/acs.jproteome.0c00314] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
Lysine glutarylation is a newly reported post-translational modification (PTM) that plays significant roles in regulating metabolic and mitochondrial processes. Accurate identification of protein glutarylation is the primary task to better investigate molecular functions and various applications. Due to the common disadvantages of the time-consuming and expensive nature of traditional biological sequencing techniques as well as the explosive growth of protein data, building precise computational models to rapidly diagnose glutarylation is a popular and feasible solution. In this work, we proposed a novel AdaBoost-based predictor called iGlu_AdaBoost to distinguish glutarylation and non-glutarylation sequences. Here, the top 37 features were chosen from a total of 1768 combined features using Chi2 following incremental feature selection (IFS) to build the model, including 188D, the composition of k-spaced amino acid pairs (CKSAAP), and enhanced amino acid composition (EAAC). With the help of the hybrid-sampling method SMOTE-Tomek, the AdaBoost algorithm was performed with satisfactory recall, specificity, and AUC values of 87.48%, 72.49%, and 0.89 over 10-fold cross validation as well as 72.73%, 71.92%, and 0.63 over independent test, respectively. Further feature analysis inferred that positively charged amino acids RK play critical roles in glutarylation recognition. Our model presented the well generalization ability and consistency of the prediction results of positive and negative samples, which is comparable to four published tools. The proposed predictor is an efficient tool to find potential glutarylation sites and provides helpful suggestions for further research on glutarylation mechanisms and concerned disease treatments.
Collapse
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150000, China
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China
| | - Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
| |
Collapse
|
19
|
Chen T, Wang X, Chu Y, Wang Y, Jiang M, Wei DQ, Xiong Y. T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol 2020; 11:580382. [PMID: 33072049 PMCID: PMC7541839 DOI: 10.3389/fmicb.2020.580382] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2020] [Accepted: 08/21/2020] [Indexed: 12/19/2022] Open
Abstract
Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via type IV secretion system (T4SS) and cause diseases. However, experimental approaches to identify T4SEs are time- and resource-consuming, and the existing computational tools based on machine learning techniques have some obvious limitations such as the lack of interpretability in the prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm for accurate identification of type IV effectors based on optimal features based on protein sequences. After trying 20 different types of features, the best performance was achieved when all features were fed into XGBoost by the 5-fold cross validation in comparison with other machine learning methods. Then, the ReliefF algorithm was adopted to get the optimal feature set on our dataset, which further improved the model performance. T4SE-XGB exhibited highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contribution of features to model predictions. The identification of key features can contribute to improved understanding of multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. In addition to type IV effector prediction, we believe that the proposed framework can provide instructive guidance for similar studies to construct prediction methods on related biological problems. The data and source code of this study can be freely accessed at https://github.com/CT001002/T4SE-XGB.
Collapse
Affiliation(s)
- Tianhang Chen
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Xiangeng Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
| | - Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| | - Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Mingming Jiang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| |
Collapse
|
20
|
Li Q, Zhou W, Wang D, Wang S, Li Q. Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model. Front Bioeng Biotechnol 2020; 8:892. [PMID: 32903381 PMCID: PMC7434836 DOI: 10.3389/fbioe.2020.00892] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2020] [Accepted: 07/10/2020] [Indexed: 01/09/2023] Open
Abstract
Cancer is still a severe health problem globally. The therapy of cancer traditionally involves the use of radiotherapy or anticancer drugs to kill cancer cells, but these methods are quite expensive and have side effects, which will cause great harm to patients. With the find of anticancer peptides (ACPs), significant progress has been achieved in the therapy of tumors. Therefore, it is invaluable to accurately identify anticancer peptides. Although biochemical experiments can solve this work, this method is expensive and time-consuming. To promote the application of anticancer peptides in cancer therapy, machine learning can be used to recognize anticancer peptides by extracting the feature vectors of anticancer peptides. Nevertheless, poor performance usually be found in training the machine learning model to utilizing high-dimensional features in practice. In order to solve the above job, this paper put forward a 19-dimensional feature model based on anticancer peptide sequences, which has lower dimensionality and better performance than some existing methods. In addition, this paper also separated a model with a low number of dimensions and acceptable performance. The few features identified in this study may represent the important features of anticancer peptides.
Collapse
Affiliation(s)
- Qingwen Li
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Wenyang Zhou
- Center for Bioinformatics, School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Sui Wang
- Key Laboratory of Soybean Biology in Chinese Ministry of Education, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
| | - Qingyuan Li
- Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
| |
Collapse
|
21
|
Predicting Preference of Transcription Factors for Methylated DNA Using Sequence Information. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 22:1043-1050. [PMID: 33294291 PMCID: PMC7691157 DOI: 10.1016/j.omtn.2020.07.035] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/27/2020] [Accepted: 07/28/2020] [Indexed: 12/12/2022]
Abstract
Transcription factors play key roles in cell-fate decisions by regulating 3D genome conformation and gene expression. The traditional view is that methylation of DNA hinders transcription factors binding to them, but recent research has shown that many transcription factors prefer to bind to methylated DNA. Therefore, identifying such transcription factors and understanding their functions is a stepping-stone for studying methylation-mediated biological processes. In this paper, a two-step discriminated method was proposed to recognize transcription factors and their preference for methylated DNA based only on sequences information. In the first step, the proposed model was used to discriminate transcription factors from non-transcription factors. The areas under the curve (AUCs) are 0.9183 and 0.9116, respectively, for the 5-fold cross-validation test and independent dataset test. Subsequently, for the classification of transcription factors that prefer methylated DNA and transcription factors that prefer non-methylated DNA, our model could produce the AUCs of 0.7744 and 0.7356, respectively, for the 5-fold cross-validation test and independent dataset test. Based on the proposed model, a user-friendly web server called TFPred was built, which can be freely accessed at http://lin-group.cn/server/TFPred/.
Collapse
|
22
|
Feng P, Feng L. Recent Advances on Antioxidant Identification Based on Machine Learning Methods. Curr Drug Metab 2020; 21:804-809. [PMID: 32682368 DOI: 10.2174/1389200221666200719001449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2020] [Revised: 03/17/2020] [Accepted: 05/13/2020] [Indexed: 11/22/2022]
Abstract
Antioxidants are molecules that can prevent damages to cells caused by free radicals. Recent studies also demonstrated that antioxidants play roles in preventing diseases. However, the number of known molecules with antioxidant activity is very small. Therefore, it is necessary to identify antioxidants from various resources. In the past several years, a series of computational methods have been proposed to identify antioxidants. In this review, we briefly summarized recent advances in computationally identifying antioxidants. The challenges and future perspectives for identifying antioxidants were also discussed. We hope this review will provide insights into researches on antioxidant identification.
Collapse
Affiliation(s)
- Pengmian Feng
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Lijing Feng
- School of Sciences, North China University of Science and Technology, Tangshan 063000, China
| |
Collapse
|
23
|
Missing Value Estimation Methods Research for Arrhythmia Classification Using the Modified Kernel Difference-Weighted KNN Algorithms. BIOMED RESEARCH INTERNATIONAL 2020; 2020:7141725. [PMID: 32685521 PMCID: PMC7327608 DOI: 10.1155/2020/7141725] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/27/2020] [Revised: 05/07/2020] [Accepted: 05/09/2020] [Indexed: 12/12/2022]
Abstract
Electrocardiogram (ECG) signal is critical to the classification of cardiac arrhythmia using some machine learning methods. In practice, the ECG datasets are usually with multiple missing values due to faults or distortion. Unfortunately, many established algorithms for classification require a fully complete matrix as input. Thus it is necessary to impute the missing data to increase the effectiveness of classification for datasets with a few missing values. In this paper, we compare the main methods for estimating the missing values in electrocardiogram data, e.g., the “Zero method”, “Mean method”, “PCA-based method”, and “RPCA-based method” and then propose a novel KNN-based classification algorithm, i.e., a modified kernel Difference-Weighted KNN classifier (MKDF-WKNN), which is fit for the classification of imbalance datasets. The experimental results on the UCI database indicate that the “RPCA-based method” can successfully handle missing values in arrhythmia dataset no matter how many values in it are missing and our proposed classification algorithm, MKDF-WKNN, is superior to other state-of-the-art algorithms like KNN, DS-WKNN, DF-WKNN, and KDF-WKNN for uneven datasets which impacts the accuracy of classification.
Collapse
|
24
|
Zhao T, Hu Y, Zang T, Wang Y. Identifying Protein Biomarkers in Blood for Alzheimer's Disease. Front Cell Dev Biol 2020; 8:472. [PMID: 32626709 PMCID: PMC7314983 DOI: 10.3389/fcell.2020.00472] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Accepted: 05/20/2020] [Indexed: 12/26/2022] Open
Abstract
Background: At present, the main diagnostic methods for Alzheimer's disease (AD) are positron emission tomography (PET) scanning of the brain and analysis of cerebrospinal fluid (CSF) sample, but these methods are expensive and harmful to patients. Recently, more researchers focus on diagnosing AD by detecting biomarkers in blood, which is a cheaper and harmless way. Therefore, identifying AD-related proteins in blood can help treatment and diagnosis. Methods: We proposed a hypothesis that similar diseases share similar proteins. Diseases with similar symptoms are caused by abnormalities of similar proteins. Assuming that the similarities between AD and other diseases obey the normal distribution, we developed an iterative method based on disease similarity (IBDS). We combined Elastic Network (EN) with Minimum angle regression (MAR) to find the optimal solution. Finally, we used case studies and Summary data Mendelian Random (SMR) to verify our method. Results: We selected 39 diseases which are highly related to AD. They correspond 1,481 kinds of proteins. One hundred and eighty-four proteins are reported to be related to AD in Uniprot and the number would be 284 with our method. The AUC of our method by cross-validation is 0.9251 which is much higher than previous methods. Conclusion: In this paper, we presented a novel method for prioritizing AD-related proteins. Seven proteins have tissue specificity in blood among these 284 proteins, which could be used to diagnose AD in future. Case studies and SMR have been used to prove the relationship between these 7 proteins and AD. Availability and Implementation: https://github.com/zty2009/Identifying-Protein-Biomarkers-in-Blood-for-Alzheimer-s-Disease.
Collapse
Affiliation(s)
- Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Tianyi Zang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yadong Wang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
Collapse
|
25
|
Meng C, Guo F, Zou Q. CWLy-SVM: A support vector machine-based tool for identifying cell wall lytic enzymes. Comput Biol Chem 2020; 87:107304. [PMID: 32580129 DOI: 10.1016/j.compbiolchem.2020.107304] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2019] [Revised: 06/07/2020] [Accepted: 06/08/2020] [Indexed: 12/21/2022]
Abstract
Cell wall lytic enzymes, as an important biotechnical tool in drug development, agriculture and the food industry, have attracted more research attention. In this research, the accurate identification of cell wall lytic enzymes is one of the key and fundamental tasks. In this study, in order to eliminate the inefficiency of in vitro experiments, a support vector machine-based cell wall lytic enzyme identification model was constructed using bioinformatics. This machine learning process includes feature extraction, feature selection, model training and optimization. According to the jackknife cross validation test, this model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and that it has powerful cell wall lytic enzyme identification ability. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to construct a user friendly web server called the CWLy-SVM to identify cell wall lytic enzymes, which is available at http://server.malab.cn/CWLy-SVM/index.jsp.
Collapse
Affiliation(s)
- Chaolu Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| |
Collapse
|
26
|
PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method. BIOMED RESEARCH INTERNATIONAL 2020; 2020:7297631. [PMID: 32352006 PMCID: PMC7174956 DOI: 10.1155/2020/7297631] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/02/2020] [Accepted: 04/01/2020] [Indexed: 12/02/2022]
Abstract
DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, the four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that the PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone.
Collapse
|
27
|
Meng C, Hu Y, Zhang Y, Guo F. PSBP-SVM: A Machine Learning-Based Computational Identifier for Predicting Polystyrene Binding Peptides. Front Bioeng Biotechnol 2020; 8:245. [PMID: 32296690 PMCID: PMC7137786 DOI: 10.3389/fbioe.2020.00245] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Accepted: 03/09/2020] [Indexed: 12/11/2022] Open
Abstract
Polystyrene binding peptides (PSBPs) play a key role in the immobilization process. The correct identification of PSBPs is the first step of all related works. In this paper, we proposed a novel support vector machine-based bioinformatic identification model. This model contains four machine learning steps, including feature extraction, feature selection, model training and optimization. In a five-fold cross validation test, this model achieves 90.38, 84.62, 87.50, and 0.90% SN, SP, ACC, and AUC, respectively. The performance of this model outperforms the state-of-the-art identifier in terms of the SN and ACC with a smaller feature set. Furthermore, we constructed a web server that includes the proposed model, which is freely accessible at http://server.malab.cn/PSBP-SVM/index.jsp.
Collapse
Affiliation(s)
- Chaolu Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, China.,College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Yang Hu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
Collapse
|
28
|
Zhang ZM, Tan JX, Wang F, Dao FY, Zhang ZY, Lin H. Early Diagnosis of Hepatocellular Carcinoma Using Machine Learning Method. Front Bioeng Biotechnol 2020; 8:254. [PMID: 32292778 PMCID: PMC7122481 DOI: 10.3389/fbioe.2020.00254] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Accepted: 03/12/2020] [Indexed: 12/18/2022] Open
Abstract
Hepatocellular carcinoma (HCC) is a serious cancer which ranked the fourth in cancer-related death worldwide. Hence, more accurate diagnostic models are urgently needed to aid the early HCC diagnosis under clinical scenarios and thus improve HCC treatment and survival. Several conventional methods have been used for discriminating HCC from cirrhosis tissues in patients without HCC (CwoHCC). However, the recognition successful rates are still far from satisfactory. In this study, we applied a computational approach that based on machine learning method to a set of microarray data generated from 1091 HCC samples and 242 CwoHCC samples. The within-sample relative expression orderings (REOs) method was used to extract numerical descriptors from gene expression profiles datasets. After removing the unrelated features by using maximum redundancy minimum relevance (mRMR) with incremental feature selection, we achieved “11-gene-pair” which could produce outstanding results. We further investigated the discriminate capability of the “11-gene-pair” for HCC recognition on several independent datasets. The wonderful results were obtained, demonstrating that the selected gene pairs can be signature for HCC. The proposed computational model can discriminate HCC and adjacent non-cancerous tissues from CwoHCC even for minimum biopsy specimens and inaccurately sampled specimens, which can be practical and effective for aiding the early HCC diagnosis at individual level.
Collapse
Affiliation(s)
- Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jiu-Xin Tan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zhao-Yue Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
29
|
Sun S, Ding H, Wang D, Han S. Identifying Antifreeze Proteins Based on Key Evolutionary Information. Front Bioeng Biotechnol 2020; 8:244. [PMID: 32274383 PMCID: PMC7113384 DOI: 10.3389/fbioe.2020.00244] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2020] [Accepted: 03/09/2020] [Indexed: 01/08/2023] Open
Abstract
Antifreeze proteins are important antifreeze materials that have been widely used in industry, including in cryopreservation, de-icing, and food storage applications. However, the quantity of some commercially produced antifreeze proteins is insufficient for large-scale industrial applications. Further, many antifreeze proteins have properties such as cytotoxicity, severely hindering their applications. Understanding the mechanisms underlying the protein-ice interactions and identifying novel antifreeze proteins are, therefore, urgently needed. In this study, to uncover the mechanisms underlying protein-ice interactions and provide an efficient and accurate tool for identifying antifreeze proteins, we assessed various evolutionary features based on position-specific scoring matrices (PSSMs) and evaluated their importance for discriminating of antifreeze and non-antifreeze proteins. We then parsimoniously selected seven key features with the highest importance. We found that the selected features showed opposite tendencies (regarding the conservation of certain amino acids) between antifreeze and non-antifreeze proteins. Five out of the seven features had relatively high contributions to the discrimination of antifreeze and non-antifreeze proteins, as revealed by a principal component analysis, i.e., the conservation of the replacement of Cys, Trp, and Gly in antifreeze proteins by Ala, Met, and Ala, respectively, in the related proteins, and the conservation of the replacement of Arg in non-antifreeze proteins by Ser and Arg in the related proteins. Based on the seven parsimoniously selected key features, we established a classifier using support vector machine, which outperformed the state-of-the-art tools. These results suggest that understanding evolutionary information is crucial to designing accurate automated methods for discriminating antifreeze and non-antifreeze proteins. Our classifier, therefore, is an efficient tool for annotating new proteins with antifreeze functions based on sequence information and can facilitate their application in industry.
Collapse
Affiliation(s)
- Shanwen Sun
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Shuguang Han
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
30
|
Zhang ZY, Yang YH, Ding H, Wang D, Chen W, Lin H. Design powerful predictor for mRNA subcellular location prediction in Homo sapiens. Brief Bioinform 2020; 22:526-535. [PMID: 31994694 DOI: 10.1093/bib/bbz177] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2019] [Revised: 11/05/2019] [Accepted: 11/21/2019] [Indexed: 12/14/2022] Open
Abstract
Messenger RNAs (mRNAs) shoulder special responsibilities that transmit genetic code from DNA to discrete locations in the cytoplasm. The locating process of mRNA might provide spatial and temporal regulation of mRNA and protein functions. The situ hybridization and quantitative transcriptomics analysis could provide detail information about mRNA subcellular localization; however, they are time consuming and expensive. It is highly desired to develop computational tools for timely and effectively predicting mRNA subcellular location. In this work, by using binomial distribution and one-way analysis of variance, the optimal nonamer composition was obtained to represent mRNA sequences. Subsequently, a predictor based on support vector machine was developed to identify the mRNA subcellular localization. In 5-fold cross-validation, results showed that the accuracy is 90.12% for Homo sapiens (H. sapiens). The predictor may provide a reference for the study of mRNA localization mechanisms and mRNA translocation strategies. An online web server was established based on our models, which is available at http://lin-group.cn/server/iLoc-mRNA/.
Collapse
Affiliation(s)
- Zhao-Yue Zhang
- Center for Informational Biology at University of Electronic Science and Technology of China
| | - Yu-He Yang
- Center for Informational Biology at University of Electronic Science and Technology of China
| | - Hui Ding
- Center for Informational Biology at University of Electronic Science and Technology of China
| | - Dong Wang
- Department of Bioinformatics at Southern Medical University
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy at Chengdu University of Traditional Chinese Medicine
| | - Hao Lin
- Center for Informational Biology at University of Electronic Science and Technology of China
| |
Collapse
|
31
|
Miao YY, Zhao W, Li GP, Gao Y, Du PF. Predicting Endoplasmic Reticulum Resident Proteins Using Auto-Cross Covariance Transformation With a U-Shaped Residue Weight-Transfer Function. Front Genet 2020; 10:1231. [PMID: 31921288 PMCID: PMC6932965 DOI: 10.3389/fgene.2019.01231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2019] [Accepted: 11/06/2019] [Indexed: 11/13/2022] Open
Abstract
Background: The endoplasmic reticulum (ER) is an important organelle in eukaryotic cells. It is involved in many important biological processes, such as cell metabolism, protein synthesis, and post-translational modification. The proteins that reside within the ER are called ER-resident proteins. These proteins are closely related to the biological functions of the ER. The difference between the ER-resident proteins and other non-resident proteins should be carefully studied. Methods: We developed a support vector machine (SVM)-based method. We developed a U-shaped weight-transfer function and used it, along with the positional-specific physiochemical properties (PSPCP), to integrate together sequence order information, signaling peptides information, and evolutionary information. Result: Our method achieved over 86% accuracy in a jackknife test. We also achieved roughly 86% sensitivity and 67% specificity in an independent dataset test. Our method is capable of identifying ER-resident proteins.
Collapse
Affiliation(s)
- Yang-Yang Miao
- College of Intelligence and Computing, Tianjin University, Tianjin, China.,School of Chemical Engineering, Tianjin University, Tianjin, China
| | - Wei Zhao
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Guang-Ping Li
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Yang Gao
- School of Medicine, Nankai University, Tianjin, China
| | - Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
Collapse
|
32
|
Basith S, Manavalan B, Hwan Shin T, Lee G. Machine intelligence in peptide therapeutics: A next‐generation tool for rapid disease screening. Med Res Rev 2020; 40:1276-1314. [DOI: 10.1002/med.21658] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2019] [Revised: 11/26/2019] [Accepted: 12/16/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Shaherin Basith
- Department of PhysiologyAjou University School of MedicineSuwon Republic of Korea
| | | | - Tae Hwan Shin
- Department of PhysiologyAjou University School of MedicineSuwon Republic of Korea
| | - Gwang Lee
- Department of PhysiologyAjou University School of MedicineSuwon Republic of Korea
| |
Collapse
|
33
|
Wang X, Zhu X, Ye M, Wang Y, Li CD, Xiong Y, Wei DQ. STS-NLSP: A Network-Based Label Space Partition Method for Predicting the Specificity of Membrane Transporter Substrates Using a Hybrid Feature of Structural and Semantic Similarity. Front Bioeng Biotechnol 2019; 7:306. [PMID: 31781551 PMCID: PMC6851049 DOI: 10.3389/fbioe.2019.00306] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022] Open
Abstract
Membrane transport proteins play crucial roles in the pharmacokinetics of substrate drugs, the drug resistance in cancer and are vital to the process of drug discovery, development and anti-cancer therapeutics. However, experimental methods to profile a substrate drug against a panel of transporters to determine its specificity are labor intensive and time consuming. In this article, we aim to develop an in silico multi-label classification approach to predict whether a substrate can specifically recognize one of the 13 categories of drug transporters ranging from ATP-binding cassette to solute carrier families using both structural fingerprints and chemical ontologies information of substrates. The data-driven network-based label space partition (NLSP) method was utilized to construct the model based on a hybrid of similarity-based feature by the integration of 2D fingerprint and semantic similarity. This method builds predictors for each label cluster (possibly intersecting) detected by community detection algorithms and takes union of label sets for a compound as final prediction. NLSP lies into the ensembles of multi-label classifier category in multi-label learning field. We utilized Cramér's V statistics to quantify the label correlations and depicted them via a heatmap. The jackknife tests and iterative stratification based cross-validation method were adopted on a benchmark dataset to evaluate the prediction performance of the proposed models both in multi-label and label-wise manner. Compared with other powerful multi-label methods, ML-kNN, MTSVM, and RAkELd, our multi-label classification model of NLPS-RF (random forest-based NLSP) has proven to be a feasible and effective model, and performed satisfactorily in the predictive task of transporter-substrate specificity. The idea behind NLSP method is intriguing and the power of NLSP remains to be explored for the multi-label learning problems in bioinformatics. The benchmark dataset, intermediate results and python code which can fully reproduce our experiments and results are available at https://github.com/dqwei-lab/STS.
Collapse
Affiliation(s)
- Xiangeng Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
| | - Mingzhi Ye
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Cheng-Dong Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| |
Collapse
|