Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chen C, Zhang Q, Yu B, Yu Z, Lawrence PJ, Ma Q, Zhang Y. Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier. Comput Biol Med 2020;123:103899. [DOI: 10.1016/j.compbiomed.2020.103899] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 01/23/2020] [Revised: 06/28/2020] [Accepted: 06/28/2020] [Indexed: 10/23/2022]



Cited by Other Article(s)

Predicting Protein–Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence. Biology 2022;11:biology11070995. [PMID: 36101379 PMCID: PMC9311754 DOI: 10.3390/biology11070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 05/27/2022] [Accepted: 06/29/2022] [Indexed: 11/17/2022]

Li X, Han P, Wang G, Chen W, Wang S, Song T. SDNN-PPI: self-attention with deep neural network effect on protein-protein interaction prediction. BMC Genomics 2022;23:474. [PMID: 35761175 PMCID: PMC9235110 DOI: 10.1186/s12864-022-08687-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 05/07/2022] [Accepted: 06/10/2022] [Indexed: 12/20/2022]

Abstract

BACKGROUND

Protein-protein interactions (PPIs) dominate intracellular molecules to perform a series of tasks such as transcriptional regulation, information transduction, and drug signalling. The traditional wet experiment method to obtain PPIs information is costly and time-consuming.

RESULT

In this paper, SDNN-PPI, a PPI prediction method based on self-attention and deep learning, is proposed. The method adopts amino acid composition (AAC), conjoint triad (CT), and auto covariance (AC) to extract global and local features of protein sequences, and leverages self-attention to enhance DNN feature extraction to more effectively accomplish the prediction of PPIs. To verify the generalization ability of SDNN-PPI, 5-fold cross-validation is performed on the intraspecific interaction datasets of Saccharomyces cerevisiae (core subset) and human, where the accuracy reaches 95.48% and 98.94%, respectively. Accuracies of 93.15% and 88.33% are obtained on the interspecific interaction datasets of human-Bacillus anthracis and human-Yersinia pestis, respectively. On the independent datasets of Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, all prediction accuracies are 100%, which is higher than previous PPI prediction methods. To further evaluate the advantages and disadvantages of the model, the one-core and crossover networks are used to predict PPIs, and the data show that the model correctly predicts the interaction pairs in the networks.

CONCLUSION

In this paper, the AAC, CT and AC methods are used to encode the sequence, and the SDNN-PPI method is proposed to predict PPIs based on a self-attention deep neural network. Satisfactory results are obtained on interspecific and intraspecific data sets, and good performance is also achieved in cross-species prediction. It can also correctly predict the protein interactions of cell and tumor information contained in the one-core network and crossover network. The SDNN-PPI proposed in this paper not only explores the mechanism of protein-protein interaction, but also provides new ideas for drug design and disease prevention.
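As an illustration of the simplest of the three encodings named in this abstract, amino acid composition can be sketched as the fraction of each of the 20 standard residues in a sequence. This is a minimal sketch, not the SDNN-PPI implementation: the function names and the choice to concatenate the two proteins' vectors are assumptions made for the example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard residue in `sequence` (length-20 vector)."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Hypothetical pair encoding: concatenate the AAC vectors of both proteins."""
    return amino_acid_composition(seq_a) + amino_acid_composition(seq_b)


if __name__ == "__main__":
    vec = amino_acid_composition("AACD")
    print(vec[:3])  # fractions for A, C, D
```

Non-standard characters are ignored here rather than rejected; a real pipeline would need an explicit policy for ambiguous residues.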


Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022;106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]

Dhal SB, Jungbluth K, Lin R, Sabahi SP, Bagavathiannan M, Braga-Neto U, Kalafatis S. A Machine-Learning-Based IoT System for Optimizing Nutrient Supply in Commercial Aquaponic Operations. Sensors (Basel, Switzerland) 2022;22:3510. [PMID: 35591199 PMCID: PMC9104751 DOI: 10.3390/s22093510] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/09/2022] [Revised: 05/01/2022] [Accepted: 05/03/2022] [Indexed: 11/16/2022]

Bhagat SK, Tiyasha T, Kumar A, Malik T, Jawad AH, Khedher KM, Deo RC, Yaseen ZM. Integrative artificial intelligence models for Australian coastal sediment lead prediction: An investigation of in-situ measurements and meteorological parameters effects. Journal of Environmental Management 2022;309:114711. [PMID: 35182982 DOI: 10.1016/j.jenvman.2022.114711] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/13/2021] [Revised: 01/17/2022] [Accepted: 02/09/2022] [Indexed: 06/14/2023]

Abstract

Heavy metals (HMs) such as Lead (Pb) have played a vital role in increasing the sediments of the Australian bay's ecosystem. Several meteorological parameters, i.e., minimum, maximum and average temperature (T_min, T_max and T_avg, °C) and rainfall (R_n, mm), and their interactions with the other batch HMs are hypothesized to have a high impact on decision-making strategies to minimize the impacts of Pb. Three feature selection (FS) algorithms, namely the Boruta method, genetic algorithm (GA) and extreme gradient boosting (XGBoost), were investigated to select the highly important predictors for Pb concentration in the coastal bay sediments of Australia. These FS algorithms were statistically evaluated using a principal component analysis (PCA) Biplot along with the correlation metrics describing the statistical characteristics that exist in the input and output parameter space of the models. To ensure a high accuracy attained by the applied predictive artificial intelligence (AI) models, i.e., XGBoost, support vector machine (SVM) and random forest (RF), an auto-hyper-parameter tuning process using a Grid-search approach was also implemented. Cu, Ni, Ce, and Fe were selected by all three applied FS algorithms, whereas the T_avg and R_n inputs remained the essential parameters identified by GA and Boruta. The order of the FS outcome was XGBoost > GA > Boruta based on the applied statistical examination and the PCA Biplot results, and the order of applied AI predictive models was XGBoost-SVM > GA-SVM > Boruta-SVM, where the SVM model remained at the top performance among the other statistical metrics. Based on the Taylor diagram for model evaluation, the RF model was only marginally different; overall, the proposed integrative AI model provided evidence of a robust and reliable predictive technique for coastal sediment Pb prediction.


Xu Z, York LM, Seethepalli A, Bucciarelli B, Cheng H, Samac DA. Objective Phenotyping of Root System Architecture Using Image Augmentation and Machine Learning in Alfalfa (Medicago sativa L.). Plant Phenomics (Washington, D.C.) 2022;2022:9879610. [PMID: 35479182 PMCID: PMC9012978 DOI: 10.34133/2022/9879610] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/13/2021] [Accepted: 03/03/2022] [Indexed: 12/28/2022]

Sahni G, Mewara B, Lalwani S, Kumar R. CF-PPI: Centroid based new feature extraction approach for Protein-Protein Interaction Prediction. J Exp Theor Artif In 2022. [DOI: 10.1080/0952813x.2022.2052189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Pan J, You ZH, Li LP, Huang WZ, Guo JX, Yu CQ, Wang LP, Zhao ZY. DWPPI: A Deep Learning Approach for Predicting Protein–Protein Interactions in Plants Based on Multi-Source Information With a Large-Scale Biological Network. Front Bioeng Biotechnol 2022;10:807522. [PMID: 35387292 PMCID: PMC8978800 DOI: 10.3389/fbioe.2022.807522] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/02/2021] [Accepted: 02/25/2022] [Indexed: 12/30/2022]

Yu B, Wang X, Zhang Y, Gao H, Wang Y, Liu Y, Gao X. RPI-MDLStack: Predicting RNA-protein interactions through deep learning with stacking strategy and LASSO. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]

Wang M, Song L, Zhang Y, Gao H, Yan L, Yu B. Malsite-Deep: Prediction of protein malonylation sites through deep learning and multi-information fusion based on NearMiss-2 strategy. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108191] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/04/2023]

Wu Y, Sun L, Sun X, Wang B. A hybrid XGBoost-ISSA-LSTM model for accurate short-term and long-term dissolved oxygen prediction in ponds. Environmental Science and Pollution Research International 2022;29:18142-18159. [PMID: 34686955 DOI: 10.1007/s11356-021-17020-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/19/2021] [Accepted: 10/09/2021] [Indexed: 06/13/2023]

Abstract

Dissolved oxygen (DO) is one of the most critical factors for measuring water quality in ponds, and it greatly affects the healthy growth of aquatic organisms. To improve the prediction accuracy of DO and grasp its changing trends, a novel hybrid DO prediction model based on the long short-term memory network (LSTM) optimized by an improved sparrow search algorithm (ISSA) is proposed. Firstly, to discard redundant information and improve the calculation speed of the model, the key factors that have a greater correlation with DO are selected as the input parameters by extreme gradient boosting (XGBoost). Secondly, to expand the searching range of sparrows and balance the global and local search, we introduce an adaptive factor exponential declining strategy for producers, and an arcsine decreasing strategy for scouters, which nonlinearly decreases with the increase of iterations. Besides, we also improve the position updating of scouters, making the sparrows gradually move to the best position. Finally, LSTM is optimized by ISSA to get the best initial weights and thresholds to construct an XGBoost-ISSA-LSTM DO prediction model. Specifically, we first analyze the method for water quality prediction, which can make short-term predictions (about 1 h and 2 h) and long-term predictions (about 12 h and 24 h) of DO. In 1-h prediction, the root mean square error (RMSE) of the model is 0.5571, the mean absolute error (MAE) is 0.2572, and the R² is 0.9276. In 24-h prediction, the RMSE of the model is 0.6310, the MAE is 0.4562, and the R² is 0.9082. The experimental results show that the proposed model has better generalization performance and higher prediction accuracy compared with other common models. Therefore, the presented model based on XGBoost-ISSA-LSTM is more effective and could meet the actual demand of accurate prediction of DO.
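The three error measures reported in this abstract follow standard definitions, which can be sketched as below. This is a minimal sketch using the textbook formulas; the function name and the toy numbers are assumptions for illustration and are unrelated to the paper's data.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Compute RMSE, MAE and R^2 (coefficient of determination) for paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when the observed values are all identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

RMSE and MAE are in the units of the measured quantity, while R² is unitless, which is why the abstract reports all three side by side.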


Industrial Internet of Things for Condition Monitoring and Diagnosis of Dry Vacuum Pumps in Atomic Layer Deposition Equipment. Electronics 2022. [DOI: 10.3390/electronics11030375] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]

Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021;9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022]

Abstract

Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic unit controlling biological traits. They are the guarantee of the basic structures and functions in organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, there is a class of genes that exist in a conserved form in multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, the irreplaceable part of the body’s life activities is the essential gene. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful to assess the potential risks of disease, and to help determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve the quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes.
In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.


Maruf FA, Pratama R, Song G. DNN-Boost: Somatic mutation identification of tumor-only whole-exome sequencing data using deep neural network and XGBoost. J Bioinform Comput Biol 2021;19:2140017. [PMID: 34895111 DOI: 10.1142/s0219720021400175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]

O'Neil LJ, Hu P, Liu Q, Islam MM, Spicer V, Rech J, Hueber A, Anaparti V, Smolik I, El-Gabalawy HS, Schett G, Wilkins JA. Proteomic Approaches to Defining Remission and the Risk of Relapse in Rheumatoid Arthritis. Front Immunol 2021;12:729681. [PMID: 34867950 PMCID: PMC8636686 DOI: 10.3389/fimmu.2021.729681] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2021] [Accepted: 10/20/2021] [Indexed: 12/29/2022]

Affiliation(s)

Liam J O'Neil: Section of Rheumatology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada; Manitoba Centre for Proteomics and Systems Biology, University of Manitoba and Health Sciences Centre, Winnipeg, MB, Canada
Pingzhao Hu: Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada; Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
Qian Liu: Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada; Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
Md Mohaiminul Islam: Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada; Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
Victor Spicer: Manitoba Centre for Proteomics and Systems Biology, University of Manitoba and Health Sciences Centre, Winnipeg, MB, Canada
Juergen Rech: Department of Medicine, Friedrich-Alexander University Erlangen-Nuernberg and Universitaetsklinikum Erlangen, Erlangen, Germany
Axel Hueber: Department of Medicine, Friedrich-Alexander University Erlangen-Nuernberg and Universitaetsklinikum Erlangen, Erlangen, Germany
Vidyanand Anaparti: Manitoba Centre for Proteomics and Systems Biology, University of Manitoba and Health Sciences Centre, Winnipeg, MB, Canada
Irene Smolik: Section of Rheumatology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada
Hani S El-Gabalawy: Section of Rheumatology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada; Manitoba Centre for Proteomics and Systems Biology, University of Manitoba and Health Sciences Centre, Winnipeg, MB, Canada
Georg Schett: Department of Medicine, Friedrich-Alexander University Erlangen-Nuernberg and Universitaetsklinikum Erlangen, Erlangen, Germany
John A Wilkins: Section of Rheumatology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada; Manitoba Centre for Proteomics and Systems Biology, University of Manitoba and Health Sciences Centre, Winnipeg, MB, Canada


Jiang F, Ma J. A comprehensive study of macro factors related to traffic fatality rates by XGBoost-based model and GIS techniques. Accident; Analysis and Prevention 2021;163:106431. [PMID: 34758411 DOI: 10.1016/j.aap.2021.106431] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/13/2020] [Revised: 07/09/2021] [Accepted: 09/30/2021] [Indexed: 06/13/2023]

Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B. DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier. Interdiscip Sci 2021;14:311-330. [PMID: 34731411 DOI: 10.1007/s12539-021-00488-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/26/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/12/2022]

Abstract

Accurate prediction of drug-target interactions (DTIs), which is often used in the fields of drug discovery and drug repositioning, is regarded as a key challenge in the study of drug science. In this paper, a new method called DeepStack-DTIs is proposed to predict DTIs. First, for the target protein, pseudo-position specific score matrix, pseudo amino acid composition and SPIDER3 are used to extract the different feature information of the target protein. Meanwhile, the path-based fingerprint features of each drug are extracted. Then, the synthetic minority oversampling technique (SMOTE) and light gradient boosting machine (LightGBM) are used for data balancing and feature selection, respectively. Finally, the processed features are input to the deep-stacked ensemble classifier composed of gated recurrent unit (GRU), deep neural network (DNN), support vector machine (SVM), eXtreme gradient boosting (XGBoost) and logistic regression (LR) to predict DTIs. Under five-fold cross-validation and compared with existing methods, the proposed method achieves higher prediction accuracy on the gold standard dataset. To evaluate the predictive power of DeepStack-DTIs, we validate the method on another dataset and predict the drug-target interaction network. The results indicate that DeepStack-DTIs has better predictive ability than the other methods, and provides novel insights for the prediction of DTIs. A novel method DeepStack-DTIs for drug-target interactions prediction. PsePSSM, PseAAC, SPIDER3 and FP2 are fused to convert protein sequence and drug molecule information into digital information, respectively. The SMOTE algorithm is used to balance the dataset and LightGBM feature selection algorithm is employed to remove redundant and irrelevant features to select the optimal feature subset. This optimal feature subset is inputted into the deep-stacked ensemble classifier to predict drug-target interactions.
The experimental results show DeepStack-DTIs method can significantly improve the prediction accuracy of drug-target interactions.


Noh B, Yoon H, Youm C, Kim S, Lee M, Park H, Kim B, Choi H, Noh Y. Prediction of Decline in Global Cognitive Function Using Machine Learning with Feature Ranking of Gait and Physical Fitness Outcomes in Older Adults. International Journal of Environmental Research and Public Health 2021;18:ijerph182111347. [PMID: 34769864 PMCID: PMC8582857 DOI: 10.3390/ijerph182111347] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/27/2021] [Revised: 10/26/2021] [Accepted: 10/27/2021] [Indexed: 11/30/2022]

Pan J, Li LP, Yu CQ, You ZH, Guan YJ, Ren ZH. Sequence-Based Prediction of Plant Protein-Protein Interactions by Combining Discrete Sine Transformation With Rotation Forest. Evol Bioinform Online 2021;17:11769343211050067. [PMID: 34671178 PMCID: PMC8521741 DOI: 10.1177/11769343211050067] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/26/2021] [Accepted: 09/13/2021] [Indexed: 11/24/2022]

Prediction for understanding the effectiveness of antiviral peptides. Comput Biol Chem 2021;95:107588. [PMID: 34655913 DOI: 10.1016/j.compbiolchem.2021.107588] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/12/2021] [Revised: 10/01/2021] [Accepted: 10/02/2021] [Indexed: 11/20/2022]

Mahapatra S, Sahu SS. ANOVA-particle swarm optimization-based feature selection and gradient boosting machine classifier for improved protein-protein interaction prediction. Proteins 2021;90:443-454. [PMID: 34528291 DOI: 10.1002/prot.26236] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/25/2020] [Revised: 08/09/2021] [Accepted: 09/03/2021] [Indexed: 01/22/2023]

BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information. Computational and Mathematical Methods in Medicine 2021;2021:7764764. [PMID: 34484416 PMCID: PMC8413034 DOI: 10.1155/2021/7764764] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/16/2021] [Accepted: 08/13/2021] [Indexed: 01/19/2023]

Shi R, Xu X, Li J, Li Y. Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107538] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 11/16/2022]

Kang EM, Ryu IH, Lee G, Kim JK, Lee IS, Jeon GH, Song H, Kamiya K, Yoo TK. Development of a Web-Based Ensemble Machine Learning Application to Select the Optimal Size of Posterior Chamber Phakic Intraocular Lens. Transl Vis Sci Technol 2021;10:5. [PMID: 34111253 PMCID: PMC8107636 DOI: 10.1167/tvst.10.6.5] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 02/01/2023]

Liu Y, Jin S, Song L, Han Y, Yu B. Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier. J Mol Graph Model 2021;107:107962. [PMID: 34198216 DOI: 10.1016/j.jmgm.2021.107962] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/04/2020] [Revised: 05/03/2021] [Accepted: 06/02/2021] [Indexed: 01/29/2023]

Kaushik M, Chandra Joshi R, Kushwah AS, Gupta MK, Banerjee M, Burget R, Dutta MK. Cytokine gene variants and socio-demographic characteristics as predictors of cervical cancer: A machine learning approach. Comput Biol Med 2021;134:104559. [PMID: 34147008 DOI: 10.1016/j.compbiomed.2021.104559] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/14/2021] [Revised: 05/30/2021] [Accepted: 06/04/2021] [Indexed: 01/03/2023]

Abstract

Cervical cancer is still one of the most prevalent cancers in women and a significant cause of mortality. Cytokine gene variants and socio-demographic characteristics have been reported as biomarkers for determining the cervical cancer risk in the Indian population. This study was designed to apply a machine learning-based model using these risk factors for better prognosis and prediction of cervical cancer. This study includes the dataset of cytokine gene variants, clinical and socio-demographic characteristics of normal healthy control subjects, and cervical cancer cases. Different risk factors, including demographic details and cytokine gene variants, were analysed using different machine learning approaches. Various statistical parameters were used for evaluating the proposed method. After multi-step data processing and random splitting of the dataset, machine learning methods were applied and evaluated with 5-fold cross-validation and also tested on the unseen data records of a collected dataset for proper evaluation and analysis. The proposed approaches were verified after analysing various performance metrics. The logistic regression technique achieved the highest average accuracy of 82.25% and the highest average F1-score of 82.58% among all the methods. Ridge classifiers and the Gaussian Naïve Bayes classifier achieved the highest sensitivity (85%). The ridge classifier surpasses most of the machine learning classifiers with 84.78% accuracy and 97.83% sensitivity. The risk factors analysed in this study can be taken as biomarkers in developing a cervical cancer diagnosis system. The outcomes demonstrate that the machine learning assisted analysis of cytokine gene variants and socio-demographic characteristics can be utilised effectively for predicting the risk of developing cervical cancer.
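The 5-fold cross-validation mentioned in this abstract (and in several others in this list) partitions the samples into five disjoint folds and holds each one out in turn. The following is a minimal sketch of the index bookkeeping only, assuming a simple shuffled, unstratified split; the function name and the seed parameter are hypothetical and not taken from any of the cited papers.

```python
import random
from typing import Iterator


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))
```

A per-fold metric such as accuracy would be computed on each held-out fold and then averaged, which is what an "average accuracy" under cross-validation refers to.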


Wang X, Zhang Y, Yu B, Salhi A, Chen R, Wang L, Liu Z. Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis. Comput Biol Med 2021;134:104516. [PMID: 34119922 DOI: 10.1016/j.compbiomed.2021.104516] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/29/2020] [Revised: 05/24/2021] [Accepted: 05/24/2021] [Indexed: 12/22/2022]

Shen Z, Wu Q, Wang Z, Chen G, Lin B. Diabetic Retinopathy Prediction by Ensemble Learning Based on Biochemical and Physical Data. Sensors 2021;21:s21113663. [PMID: 34070287 PMCID: PMC8197325 DOI: 10.3390/s21113663] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/19/2021] [Revised: 05/15/2021] [Accepted: 05/20/2021] [Indexed: 11/16/2022]

Wang CY, Lee SJ. Regional Population Forecast and Analysis Based on Machine Learning Strategy. Entropy 2021;23:e23060656. [PMID: 34073825 PMCID: PMC8225119 DOI: 10.3390/e23060656] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/13/2021] [Revised: 05/14/2021] [Accepted: 05/18/2021] [Indexed: 01/29/2023]

Chen YZ, Wang ZZ, Wang Y, Ying G, Chen Z, Song J. nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning. Brief Bioinform 2021;22:6277413. [PMID: 34002774 DOI: 10.1093/bib/bbab146] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/22/2021] [Revised: 03/18/2021] [Accepted: 03/25/2021] [Indexed: 12/20/2022]

Abstract

Lysine crotonylation (Kcr) is a newly discovered type of protein post-translational modification and has been reported to be involved in various pathophysiological processes. High-resolution mass spectrometry is the primary approach for identification of Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and expensive when compared with computational approaches. To date, several predictors for Kcr site prediction have been developed, most of which are capable of predicting crotonylation sites on either histones alone or mixed histone and nonhistone proteins together. These methods exhibit high diversity in their algorithms, encoding schemes, feature selection techniques and performance assessment strategies. However, none of them were designed for predicting Kcr sites on nonhistone proteins. Therefore, it is desirable to develop an effective predictor for identifying Kcr sites from the large amount of nonhistone sequence data. For this purpose, we first provide a comprehensive review on six methods for predicting crotonylation sites. Second, we develop a novel deep learning-based computational framework termed as CNNrgb for Kcr site prediction on nonhistone proteins by integrating different types of features. We benchmark its performance against multiple commonly used machine learning classifiers (including random forest, logitboost, naïve Bayes and logistic regression) by performing both 10-fold cross-validation and independent test. The results show that the proposed CNNrgb framework achieves the best performance with high computational efficiency on large datasets. Moreover, to facilitate users' efforts to investigate Kcr sites on human nonhistone proteins, we implement an online server called nhKcr and compare it with other existing tools to illustrate the utility and robustness of our method. The nhKcr web server and all the datasets utilized in this study are freely accessible at http://nhKcr.erc.monash.edu/.

Karabulut OC, Karpuzcu BA, Türk E, Ibrahim AH, Süzek BE. ML-AdVInfect: A Machine-Learning Based Adenoviral Infection Predictor. Front Mol Biosci 2021;8:647424. [PMID: 34026828 PMCID: PMC8139618 DOI: 10.3389/fmolb.2021.647424] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/29/2020] [Accepted: 04/22/2021] [Indexed: 01/08/2023] Open

Abstract

Adenoviruses (AdVs) constitute a diverse family with many pathogenic types that infect a broad range of hosts. Understanding the pathogenesis of adenoviral infections is not only clinically relevant but also important to elucidate the potential use of AdVs as vectors in therapeutic applications. For an adenoviral infection to occur, attachment of the viral ligand to a cellular receptor on the host organism is a prerequisite and, in this sense, it is a criterion to decide whether an adenoviral infection can potentially happen. The interaction between any virus and its corresponding host organism is a specific kind of protein-protein interaction (PPI), and several experimental techniques, including high-throughput methods, are being used to explore such interactions. As a result, data on virus-host interactions have been accumulating, with a significant portion reported in publicly available bioinformatics resources. There is, however, no computational model that integrates and interprets the existing data to draw concise decisions, such as whether an infection happens or not. In this study, accepting the cellular entry of AdV as a decisive parameter for infectivity, we have developed a machine learning-based methodology, more precisely one based on a support vector machine (SVM), to predict whether adenoviral infection can take place in a given host. For this purpose, we used the sequence data of the known receptors of AdVs, identified sets of adenoviral ligands and their respective host species, and eventually constructed a comprehensive adenovirus–host interaction dataset. Then, we performed interaction predictions through publicly available virus-host PPI tools and constructed an AdV infection predictor model using SVM with RBF kernel, with an overall sensitivity, specificity, and AUC of 0.88 ± 0.011, 0.83 ± 0.064, and 0.86 ± 0.030, respectively.
ML-AdVInfect is the first of its kind as an effective predictor to screen the infection capacity along with anticipating any cross-species shifts. We anticipate that the approach that led to ML-AdVInfect can be adapted to make predictions for other viral infections.
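
The abstract above reports sensitivity, specificity, and AUC for a binary predictor. A minimal sketch of how these three metrics are commonly computed from labels and scores follows. The function names and toy data are hypothetical, and the code assumes labels coded as 1 (positive) and 0 (negative) with at least one example of each class; it is not the authors' evaluation code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: fraction of true positives recovered.
    # Specificity: fraction of true negatives recovered.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, purely illustrative.
    labels = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 0, 1]
    print(sensitivity_specificity(labels, predicted))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise formulation of AUC is quadratic in the number of examples, which is fine for a small illustration; rank-based implementations are preferable for large datasets.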

Prasasty VD, Hutagalung RA, Gunadi R, Sofia DY, Rosmalena R, Yazid F, Sinaga E. Prediction of human-Streptococcus pneumoniae protein-protein interactions using logistic regression. Comput Biol Chem 2021;92:107492. [PMID: 33964803 DOI: 10.1016/j.compbiolchem.2021.107492] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/02/2021] [Accepted: 04/21/2021] [Indexed: 02/07/2023]

Zhang Q, Liu P, Wang X, Zhang Y, Han Y, Yu B. StackPDB: Predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106921] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 02/08/2023]

Novaes MT, Ferreira de Carvalho OL, Guimarães Ferreira PH, Nunes Tiraboschi TL, Silva CS, Zambrano JC, Gomes CM, de Paula Miranda E, Abílio de Carvalho Júnior O, de Bessa Júnior J. Prediction of secondary testosterone deficiency using machine learning: A comparative analysis of ensemble and base classifiers, probability calibration, and sampling strategies in a slightly imbalanced dataset. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022] Open

Wei L, He W, Malik A, Su R, Cui L, Manavalan B. Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework. Brief Bioinform 2020;22:5956930. [PMID: 33152766 DOI: 10.1093/bib/bbaa275] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 08/26/2020] [Revised: 09/14/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022] Open