1. Ahmed FS, Aly S, Liu X. EPI-Trans: an effective transformer-based deep learning model for enhancer promoter interaction prediction. BMC Bioinformatics 2024; 25:216. [PMID: 38890584 DOI: 10.1186/s12859-024-05784-9] [Received: 12/19/2023] [Accepted: 04/15/2024] [Indexed: 06/20/2024] [Open access]
Abstract
BACKGROUND Enhancer-promoter interactions (EPIs) are crucial for human development: EPIs in the genome play a key role in regulating transcription. However, experimental approaches for classifying EPIs are too costly in effort, time, and resources. As a result, a growing number of studies develop computational techniques, particularly deep learning and other machine learning methods, to address this problem. Unfortunately, most current computational methods are based on convolutional neural networks, recurrent neural networks, or a combination of the two, which do not take into account contextual details and the long-range interactions between enhancer and promoter sequences. This study presents EPI-Trans, a new transformer-based model designed to overcome these limitations. The multi-head attention mechanism in the transformer automatically learns features that represent the long-range interrelationships between enhancer and promoter sequences. Furthermore, a generic model with transferability is built that can serve as a pre-trained model for various cell lines, and its parameters are then fine-tuned on a particular cell line's dataset to improve performance. RESULTS On six benchmark cell lines, the average AUROC of the specific, generic, and best models is 94.2%, 95%, and 95.7%, respectively, while the average AUPR is 80.5%, 66.1%, and 79.6%, respectively. CONCLUSIONS This study proposes a transformer-based deep learning model for EPI prediction. Comparative results on the evaluated cell lines show that EPI-Trans outperforms other state-of-the-art techniques and provides superior performance in recognizing EPIs.
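The attention step described above can be illustrated with a minimal NumPy sketch of multi-head scaled dot-product attention in which enhancer tokens act as queries over promoter tokens. This is not EPI-Trans's published architecture: the cross-attention layout, the embedding sizes, and the random (untrained) projection matrices are assumptions made for illustration, and `multi_head_cross_attention` is a hypothetical helper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(enh, pro, num_heads, rng):
    """Enhancer tokens (queries) attend over promoter tokens (keys/values).

    enh: (Le, d) and pro: (Lp, d) token embeddings; d must be divisible by
    num_heads. Projection matrices are random here, purely for illustration.
    """
    d = enh.shape[1]
    assert d % num_heads == 0, "embedding size must split evenly across heads"
    dh = d // num_heads
    wq, wk, wv, wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split_heads(x):
        # (L, d) -> (heads, L, dh)
        return x.reshape(x.shape[0], num_heads, dh).transpose(1, 0, 2)

    q, k, v = split_heads(enh @ wq), split_heads(pro @ wk), split_heads(pro @ wv)
    # Scaled dot-product attention weights per head: (heads, Le, Lp).
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)
    heads = weights @ v                                   # (heads, Le, dh)
    # Concatenate heads back to (Le, d), then apply the output projection.
    out = heads.transpose(1, 0, 2).reshape(enh.shape[0], d) @ wo
    return out, weights


rng = np.random.default_rng(0)
enhancer = rng.standard_normal((30, 64))   # 30 enhancer tokens, width 64
promoter = rng.standard_normal((20, 64))   # 20 promoter tokens, width 64
out, weights = multi_head_cross_attention(enhancer, promoter, num_heads=8, rng=rng)
print(out.shape, weights.shape)  # (30, 64) (8, 30, 20)
```

Each head produces a (Le, Lp) weight matrix whose rows sum to one, so large entries mark enhancer-promoter position pairs that the layer links directly, without routing information through intermediate positions as a recurrent network would.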
Affiliation(s)
- Fatma S Ahmed
- Department of Computer Science and Technology, Xiamen University, Xiamen, 361005, China.
- Department of Electrical Engineering, Aswan University, Aswan, 81542, Egypt.
- Saleh Aly
- Department of Electrical Engineering, Aswan University, Aswan, 81542, Egypt.
- Department of Information Technology, Majmaah University, 11952, Majmaah, Saudi Arabia.
- Xiangrong Liu
- Department of Computer Science and Technology, Xiamen University, Xiamen, 361005, China
2. Kaur D, Arora A, Vigneshwar P, Raghava GPS. Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods. Proteomics 2024:e2400004. [PMID: 38803012 DOI: 10.1002/pmic.202400004] [Received: 01/04/2024] [Revised: 04/29/2024] [Accepted: 05/13/2024] [Indexed: 05/29/2024]
Abstract
Peptide hormones are genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. Initially, we developed similarity-based methods using the BLAST and MERCI software. Although these similarity-based methods provided a high probability of correct prediction, they had limitations: they returned no hits for many sequences and could therefore predict only a limited set of sequences. To overcome these limitations, we further developed machine learning and deep learning-based models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent/validation dataset. To combine the strengths of similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96, an accuracy of 89.79%, and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To help researchers predict and design hormone peptides, we developed a web server called HOPPred, which offers a unique feature for identifying hormone-associated motifs within hormone peptides. The server can be accessed at: https://webs.iiitd.edu.in/raghava/hoppred/.
Affiliation(s)
- Dashleen Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Akanksha Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Palani Vigneshwar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
3. Akbar S, Zou Q, Raza A, Alarfaj FK. iAFPs-Mv-BiTCN: Predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks. Artif Intell Med 2024; 151:102860. [PMID: 38552379 DOI: 10.1016/j.artmed.2024.102860] [Received: 10/27/2023] [Revised: 02/21/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024]
Abstract
Globally, fungal infections have become a major health concern in humans. Fungal diseases generally occur when an invading fungus establishes itself on a specific part of the body and becomes difficult for the human immune system to resist. The recent emergence of COVID-19 has sharply increased the incidence of various nosocomial fungal infections. Existing wet-laboratory-based drug development is expensive and time-consuming, and the resulting medications may have adverse side effects on normal cells. In the last decade, peptide therapeutics have gained significant attention due to their high specificity in targeting affected cells without harming healthy cells. Motivated by the significance of peptide-based therapies, we developed a highly discriminative prediction scheme called iAFPs-Mv-BiTCN to correctly predict antifungal peptides. The training peptides are encoded using word embedding methods such as skip-gram and the attention-based Bidirectional Encoder Representations from Transformers (BERT). Additionally, transform-based evolutionary features are generated using the pseudo position-specific scoring matrix with discrete wavelet transform (PsePSSM-DWT). The word embedding and evolutionary descriptors are fused into a single vector to compensate for the limitations of single encoding methods. A Shapley Additive exPlanations (SHAP)-based global interpretation approach is applied to select the optimal feature set and thereby reduce training costs. The selected feature set is used to train a bidirectional temporal convolutional network (BiTCN). The proposed iAFPs-Mv-BiTCN model achieved a predictive accuracy of 98.15% and an AUC of 0.99 on the training samples. On the independent samples, our model obtained an accuracy of 94.11% and an AUC of 0.98. Our iAFPs-Mv-BiTCN model outperformed existing models with approximately 4% and 5% higher accuracy on the training and independent samples, respectively.
The reliability and efficacy of the proposed iAFPs-Mv-BiTCN model make it a valuable tool for scientists, and it may prove beneficial in pharmaceutical design and academic research.
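PsePSSM-DWT is only named in the abstract. The sketch below shows, under stated assumptions, the general idea of turning a variable-length position-specific scoring matrix (PSSM) into a fixed-length descriptor with a discrete wavelet transform. The Haar wavelet, two decomposition levels, the summary statistics, and the helper names `haar_dwt` and `pssm_dwt_features` are hypothetical choices rather than the settings used by iAFPs-Mv-BiTCN, and the pseudo-PSSM step is omitted.

```python
import numpy as np


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT along axis 0.

    Odd-length inputs are padded by repeating the last row. Returns the
    (approximation, detail) coefficients, each about half the input length.
    """
    x = np.asarray(signal, dtype=float)
    if x.shape[0] % 2:
        x = np.concatenate([x, x[-1:]], axis=0)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def pssm_dwt_features(pssm, levels=2):
    """Fixed-length descriptor from an (L, 20) PSSM of any length L.

    At each level the approximation is decomposed again, and every amino-acid
    column of the approximation and detail coefficients is summarised by its
    mean, standard deviation, minimum and maximum.
    """
    feats = []
    approx = np.asarray(pssm, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        for coeffs in (approx, detail):
            feats.extend([coeffs.mean(axis=0), coeffs.std(axis=0),
                          coeffs.min(axis=0), coeffs.max(axis=0)])
    return np.concatenate(feats)


rng = np.random.default_rng(1)
pssm = rng.standard_normal((17, 20))     # stand-in PSSM for a 17-residue peptide
features = pssm_dwt_features(pssm)
print(features.shape)  # (320,) = 2 levels x 2 coefficient sets x 4 stats x 20 columns
```

Because the output length depends only on `levels` and the 20 PSSM columns, peptides of different lengths map to vectors of the same size, which makes fusion with fixed-size embedding vectors straightforward.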
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China.
- Ali Raza
- Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, KP 25124, Pakistan
- Fawaz Khaled Alarfaj
- Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa 31982, Saudi Arabia
4. Ghazikhani H, Butler G. Exploiting protein language models for the precise classification of ion channels and ion transporters. Proteins 2024. [PMID: 38656743 DOI: 10.1002/prot.26694] [Received: 08/31/2023] [Revised: 03/26/2024] [Accepted: 04/08/2024] [Indexed: 04/26/2024]
Abstract
This study introduces TooT-PLM-ionCT, a comprehensive framework that consolidates three distinct systems, each tailored to one of the following tasks: distinguishing ion channels (ICs) from membrane proteins (MPs), distinguishing ion transporters (ITs) from MPs, and differentiating ICs from ITs. Drawing on the strengths of six protein language models (PLMs), including ProtBERT, ProtBERT-BFD, ESM-1b, ESM-2 (650M parameters), and ESM-2 (15B parameters), TooT-PLM-ionCT employs a combination of traditional classifiers and deep learning models for nuanced protein classification. When validated on an existing dataset from previous researchers, our systems demonstrated superior performance in identifying ITs from MPs and distinguishing ICs from ITs, with the IC-MP discrimination achieving state-of-the-art results. Following recommendations for additional validation, we introduced a new dataset, significantly enhancing the robustness and generalization of our models across bioinformatics challenges. This new evaluation underscored the effectiveness of TooT-PLM-ionCT in adapting to novel data while maintaining high classification accuracy. Furthermore, this study explores critical factors affecting classification accuracy, such as dataset balancing, the impact of using frozen versus fine-tuned PLM representations, and the difference between half- and full-precision floating-point computations. To facilitate broader application and accessibility, a web server (https://tootsuite.encs.concordia.ca/service/TooT-PLM-ionCT) has been developed, allowing users to evaluate unknown protein sequences with our specialized systems for the IC-MP, IT-MP, and IC-IT classification tasks.
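The half- versus full-precision comparison mentioned above can be probed with a small NumPy experiment. This sketch assumes random stand-in embeddings and a random linear classifier rather than real PLM outputs or the TooT-PLM-ionCT classifiers; it casts the same data to float16 and float32 and reports the largest logit difference, the fraction of samples whose predicted class is unchanged, and the memory saving. `precision_agreement` is a hypothetical helper, and the width 1280 matches the ESM-2 650M embedding size but is otherwise arbitrary.

```python
import numpy as np


def precision_agreement(emb, w):
    """Compare linear-classifier logits computed in float32 and float16.

    Returns (max absolute logit difference, fraction of samples whose predicted
    class, i.e. the sign of the logit, is unchanged, float16/float32 memory ratio).
    """
    emb32, w32 = emb.astype(np.float32), w.astype(np.float32)
    emb16, w16 = emb.astype(np.float16), w.astype(np.float16)
    full = emb32 @ w32
    half = (emb16 @ w16).astype(np.float32)
    max_abs_diff = float(np.max(np.abs(full - half)))
    agreement = float(np.mean((full > 0) == (half > 0)))
    memory_ratio = emb16.nbytes / emb32.nbytes
    return max_abs_diff, agreement, memory_ratio


rng = np.random.default_rng(0)
# Hypothetical stand-ins: 1,000 per-protein embeddings of width 1280 and a
# random linear classifier scaled so that logits are roughly unit variance.
embeddings = rng.standard_normal((1000, 1280))
weights = rng.standard_normal(1280) / np.sqrt(1280)
diff, agree, ratio = precision_agreement(embeddings, weights)
print(f"max |logit diff| = {diff:.2e}, class agreement = {agree:.3f}, memory ratio = {ratio}")
```

Float16 halves memory, but predictions only change for samples whose logits lie within the rounding error of the decision boundary; real PLM embeddings with large-magnitude components can behave differently, which is why the comparison has to be made on the actual data.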
Affiliation(s)
- Hamed Ghazikhani
- Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
- Gregory Butler
- Centre for Structural and Functional Genomics, Concordia University, Montréal, Québec, Canada
5. Routila E, Mahran R, Salminen S, Irjala H, Haapio E, Kytö E, Ventelä S, Petterson K, Routila J, Gidwani K, Leivo J. Identification of stemness-related glycosylation changes in head and neck squamous cell carcinoma. BMC Cancer 2024; 24:443. [PMID: 38600440 PMCID: PMC11005150 DOI: 10.1186/s12885-024-12161-5] [Received: 12/15/2023] [Accepted: 03/21/2024] [Indexed: 04/12/2024] [Open access]
Abstract
BACKGROUND Altered glycosylation is a hallmark of cancer and is associated with therapy resistance and tumor behavior. In this study, we investigated the glycosylation profile of the stemness-related proteins OCT4, CIP2A, MET, and LIMA1 in head and neck squamous cell carcinoma (HNSCC) tumors. METHODS Tumor, adjacent normal tissue, and blood samples from 25 patients were collected together with clinical details. After tissue processing, lectin-based glycovariant screens were performed. RESULTS A strong correlation between the glycosylation profiles of all four stemness-related proteins was observed in tumor tissue, whereas glycosylation differed between tumor tissue, adjacent normal tissue, and serum. CONCLUSIONS A mannose- and galactose-rich glycosylation niche associated with stemness-related proteins was identified.
Affiliation(s)
- E Routila
- Department of Life Technologies, University of Turku, Kiinamyllynkatu 10, 20520, Turku, Finland.
- InFLAMES Research Flagship, University of Turku, 20014, Turku, Finland.
- FICAN West Cancer Centre, Turku, Finland.
- R Mahran
- Department of Life Technologies, University of Turku, Kiinamyllynkatu 10, 20520, Turku, Finland
- FICAN West Cancer Centre, Turku, Finland
- Department of Chemistry, University of Turku, Henrikinkatu 2, 20500, Turku, Finland
- S Salminen
- Department of Life Technologies, University of Turku, Kiinamyllynkatu 10, 20520, Turku, Finland
- FICAN West Cancer Centre, Turku, Finland
- H Irjala
- Department for Otorhinolaryngology- Head and Neck surgery, University of Turku and Turku University Hospital, Savitehtaankatu 5, 20520, Turku, Finland
- E Haapio
- Department for Otorhinolaryngology- Head and Neck surgery, University of Turku and Turku University Hospital, Savitehtaankatu 5, 20520, Turku, Finland
- E Kytö
- Department for Otorhinolaryngology- Head and Neck surgery, University of Turku and Turku University Hospital, Savitehtaankatu 5, 20520, Turku, Finland
- S Ventelä
- FICAN West Cancer Centre, Turku, Finland
- Department for Otorhinolaryngology- Head and Neck surgery, University of Turku and Turku University Hospital, Savitehtaankatu 5, 20520, Turku, Finland
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
- K Petterson
- Department of Life Technologies, University of Turku, Kiinamyllynkatu 10, 20520, Turku, Finland
- J Routila
- FICAN West Cancer Centre, Turku, Finland
- Department for Otorhinolaryngology- Head and Neck surgery, University of Turku and Turku University Hospital, Savitehtaankatu 5, 20520, Turku, Finland
- K Gidwani
- Department of Life Technologies, University of Turku, Kiinamyllynkatu 10, 20520, Turku, Finland
- J Leivo
- Department of Life Technologies, University of Turku, Kiinamyllynkatu 10, 20520, Turku, Finland
- InFLAMES Research Flagship, University of Turku, 20014, Turku, Finland
- FICAN West Cancer Centre, Turku, Finland
6. Jin B, Wen X, Tian H, Guo H, Hao M, Wu J, Li X, Ren Y, Wang X, Ren X. Standardized uptake value max of the primary lesion combined with tumor markers for clinically predicting distant metastasis in de novo lung adenocarcinoma. Cancer Med 2024; 13:e6961. [PMID: 38549459 PMCID: PMC10979183 DOI: 10.1002/cam4.6961] [Received: 08/30/2023] [Revised: 12/22/2023] [Accepted: 01/12/2024] [Indexed: 04/01/2024] [Open access]
Abstract
BACKGROUND To examine the maximum standardized uptake value of the primary lesion (pSUVmax) combined with tumor markers (TMs) for clinically predicting distant metastasis in de novo lung adenocarcinoma. METHODS This retrospective observational study examined individuals diagnosed with de novo lung adenocarcinoma at Shanxi Cancer Hospital between February 2015 and December 2019. RESULTS In total, 532 de novo lung adenocarcinoma cases were included. Patients were aged 60.8 ± 9.7 years; 224 were women, and 268 had distant metastasis. The areas under the curves (AUCs) of pSUVmax, lactate dehydrogenase (LDH), carcinoembryonic antigen (CEA), cytokeratin-19 fragment (CYFRA21-1), carbohydrate antigen 125 (CA125), and the grade of TMs for predicting distant metastasis were 0.742, 0.601, 0.671, 0.700, 0.736, and 0.745, respectively. The combination of pSUVmax, LDH, CEA, CYFRA21-1, CA125, and the grade of TMs predicted distant metastasis with an AUC of 0.816 (95% CI: 0.781-0.851), a sensitivity of 89.2%, a specificity of 58.7%, a positive predictive value of 73.7%, and a negative predictive value of 79.7%. CONCLUSIONS pSUVmax combined with serum levels of LDH, CEA, CYFRA21-1, and CA125 and the grade of TMs may perform well in predicting distant metastasis of de novo lung adenocarcinoma.
Affiliation(s)
- Baoli Jin
- Department of Radiation Oncology, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Xiaolian Wen
- Department of Oncology, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Hanji Tian
- Department of Surgery, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Mingyan Hao
- Department of Administration, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Jing Wu
- Department of Radiation Oncology, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Xiaomin Li
- Department of Radiation Oncology, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Yuejun Ren
- Department of MR/CT, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Xin Wang
- Department of Surgery, First Hospital of Shanxi Medical University, Taiyuan, China
- Xiaolu Ren
- Department of Radiation Oncology, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
7. Gu L, Chen T, Li J, Huang YA, Du Z, Leung VCM, Chen J. Hybrid Bayesian Optimization-Based Graphical Discovery for Methylation Sites Prediction. IEEE J Biomed Health Inform 2024; 28:1917-1926. [PMID: 37801389 DOI: 10.1109/jbhi.2023.3322560] [Indexed: 10/08/2023]
Abstract
Protein methylation is one of the most important reversible post-translational modifications (PTMs) and plays a vital role in the regulation of gene expression. Protein methylation sites serve as biomarkers in cardiovascular and pulmonary diseases, influencing various aspects of normal cell biology and pathogenesis. Nonetheless, most existing computational methods for protein methylation site prediction (PMSP) are built on protein sequences, and few leverage the topological information of proteins. To address this issue, we propose an innovative framework for predicting Methylation Sites using Graphs (GraphMethySite) that employs a graph convolutional network in conjunction with Bayesian optimization (BO) to automatically discover the graphical structure surrounding a candidate site and improve predictive accuracy. To extract the optimal subgraphs associated with methylation sites, we extend GraphMethySite by coupling it with a hybrid Bayesian optimization (together named GraphMethySite+) to determine and visualize the topological relevance among amino-acid residues. We evaluated our framework on two extended protein methylation datasets, and empirical results demonstrate that it outperforms existing state-of-the-art methylation prediction methods.
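One way to picture "the graphical structure surrounding a candidate site" is a residue contact graph cut down to a k-hop neighborhood around the site. The sketch below assumes C-alpha coordinates are available, links residues within a distance threshold, and extracts the neighborhood by breadth-first search. The threshold, the hop count, and the helper names `contact_graph` and `k_hop_subgraph` are hypothetical; structural choices of this kind are what a Bayesian optimizer could search over, but GraphMethySite's actual graph construction is not described here.

```python
import numpy as np
from collections import deque


def contact_graph(coords, threshold=8.0):
    """Adjacency lists linking residues whose C-alpha atoms lie within
    `threshold` angstroms of each other (self-loops excluded)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist <= threshold) & ~np.eye(len(coords), dtype=bool)
    return [np.flatnonzero(row).tolist() for row in adj]


def k_hop_subgraph(neighbors, site, k):
    """Sorted residue indices reachable from `site` in at most k edges (BFS)."""
    depth = {site: 0}
    queue = deque([site])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nb in neighbors[node]:
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return sorted(depth)


# Toy "protein": six residues on a straight line, 5 angstroms apart.
coords = [(5.0 * i, 0.0, 0.0) for i in range(6)]
nbrs = contact_graph(coords, threshold=6.0)
print(k_hop_subgraph(nbrs, site=2, k=1))  # [1, 2, 3]
print(k_hop_subgraph(nbrs, site=2, k=2))  # [0, 1, 2, 3, 4]
```

The resulting index set defines the subgraph that a graph convolutional network would then classify; varying `threshold` and `k` changes how much spatial context the model sees around the candidate residue.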
8. Shen C, Mao D, Tang J, Liao Z, Chen S. Prediction of LncRNA-Protein Interactions Based on Kernel Combinations and Graph Convolutional Networks. IEEE J Biomed Health Inform 2024; 28:1937-1948. [PMID: 37327093 DOI: 10.1109/jbhi.2023.3286917] [Indexed: 06/18/2023]
Abstract
Complexes of long non-coding RNAs (lncRNAs) bound to proteins can participate in regulating biological processes at various stages of an organism's life. However, given the growing number of known lncRNAs and proteins, verifying lncRNA-protein interactions (LPIs) with traditional biological experiments is time-consuming and laborious. With the improvement of computing power, computational prediction of LPIs has therefore gained new opportunities. Building on state-of-the-art work, this article proposes a framework called LncRNA-Protein Interactions based on Kernel Combinations and Graph Convolutional Networks (LPI-KCGCN). We first construct kernel matrices for both lncRNAs and proteins from sequence features, sequence similarity features, expression features, and gene ontology. These kernel matrices are then reconstructed to serve as input for the next step. Combined with known LPIs, the reconstructed similarity matrices, which can be used as features of the topology map of the LPI network, are used to extract latent representations in the lncRNA and protein spaces with a two-layer graph convolutional network. The final predicted matrix is obtained by training the network to produce score matrices with respect to lncRNAs and proteins. Different LPI-KCGCN variants are ensembled to derive the final prediction results, which are tested on balanced and unbalanced datasets. Five-fold cross-validation shows that, on a dataset with 15.5% positive samples, the optimal combination of feature information achieves an AUC of 0.9714 and an AUPR of 0.9216. On another, highly unbalanced dataset with only 5% positive samples, LPI-KCGCN also outperformed state-of-the-art methods, achieving an AUC of 0.9907 and an AUPR of 0.9267.
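The kernel-combination and two-layer GCN pipeline described above can be sketched in NumPy. Assumptions: kernels are combined by a weighted average (equal weights by default), lncRNA and protein nodes share one heterogeneous adjacency matrix built from the combined kernels and the known-interaction matrix `y`, node features are the rows of that matrix, and the weights are random rather than trained. The helper names are hypothetical, and the exact LPI-KCGCN reconstruction and ensembling steps are not reproduced.

```python
import numpy as np


def combine_kernels(kernels, weights=None):
    """Weighted average of same-shaped similarity kernels (equal weights by default)."""
    stack = np.stack(kernels)
    if weights is None:
        weights = np.full(len(stack), 1.0 / len(stack))
    return np.tensordot(np.asarray(weights, dtype=float), stack, axes=1)


def normalize_adjacency(a):
    """Symmetric GCN normalisation D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_scores(k_lnc, k_pro, y, hidden=32, out=16, seed=0):
    """Two-layer GCN over the lncRNA-protein graph; returns an (n_lnc, n_pro)
    matrix of interaction scores in [0, 1]. Weights are random (untrained)."""
    rng = np.random.default_rng(seed)
    n_lnc = y.shape[0]
    a = np.block([[k_lnc, y], [y.T, k_pro]])   # heterogeneous adjacency
    a_norm = normalize_adjacency(a)
    x = a                                       # node features: rows of A
    w1 = rng.standard_normal((x.shape[1], hidden)) / np.sqrt(x.shape[1])
    w2 = rng.standard_normal((hidden, out)) / np.sqrt(hidden)
    h = np.maximum(a_norm @ x @ w1, 0.0)        # layer 1 with ReLU
    h = a_norm @ h @ w2                         # layer 2, linear
    h_lnc, h_pro = h[:n_lnc], h[n_lnc:]
    return 1.0 / (1.0 + np.exp(-(h_lnc @ h_pro.T)))


def gaussian_kernel(features):
    """Gaussian similarity kernel exp(-||u - v||^2) between feature rows."""
    sq = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq)


rng = np.random.default_rng(0)
# Hypothetical data: 5 lncRNAs and 4 proteins, each with several feature views
# standing in for sequence, expression, and gene ontology features.
k_lnc = combine_kernels([gaussian_kernel(rng.standard_normal((5, 3))) for _ in range(2)])
k_pro = combine_kernels([gaussian_kernel(rng.standard_normal((4, 3))) for _ in range(3)])
y = (rng.random((5, 4)) < 0.3).astype(float)    # known interactions (0/1)
scores = gcn_scores(k_lnc, k_pro, y)
print(scores.shape)  # (5, 4)
```

Each layer applies the symmetric-normalized propagation rule H' = σ(D^{-1/2}(A + I)D^{-1/2} H W), with ReLU after the first layer and the identity after the second; in a real model the weights would be fitted against the known interactions, for example with a binary cross-entropy loss on `y`.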
9. Bommanapally V, Abeyrathna D, Chundi P, Subramaniam M. Super resolution-based methodology for self-supervised segmentation of microscopy images. Front Microbiol 2024; 15:1255850. [PMID: 38533330 PMCID: PMC10963421 DOI: 10.3389/fmicb.2024.1255850] [Received: 07/09/2023] [Accepted: 02/15/2024] [Indexed: 03/28/2024] [Open access]
Abstract
Data-driven artificial intelligence (AI) and machine learning (ML) image analysis approaches have gained considerable momentum for analyzing microscopy images in bioengineering, biotechnology, and medicine. The success of these approaches relies crucially on the availability of high-quality microscopy images, which is often a challenge due to the diverse experimental conditions and modes under which these images are obtained. In this study, we propose using recent ML-based image super-resolution (SR) techniques to improve the quality of microscopy images, incorporate them into multiple ML-based image analysis tasks, and present a comprehensive study of the impact of SR techniques on the segmentation of microscopy images. The impact of four Generative Adversarial Network (GAN)- and transformer-based SR techniques on microscopy image quality is measured using three well-established quality metrics. These SR techniques are incorporated into multiple deep network pipelines that use supervised, contrastive, and non-contrastive self-supervised methods to semantically segment microscopy images from multiple datasets. Our results show that the quality of microscopy images directly influences ML model performance, and that both supervised and self-supervised network pipelines using SR images outperform baselines without SR by 2%-6%. Based on our experiments, we also establish that an image quality improvement in the threshold range [20-64] for the complemented Perception-based Image Quality Evaluator (PIQE) metric can be used by domain experts as a pre-condition for incorporating SR techniques to significantly improve segmentation performance. We also present a plug-and-play software platform developed to integrate SR techniques with various deep networks using supervised and self-supervised learning methods.
Affiliation(s)
- Vidya Bommanapally
- Department of Computer Science, University of Nebraska, Omaha, NE, United States
10. Palacios A, Acharya P, Peidl A, Beck M, Blanco E, Mishra A, Bawa-Khalfe T, Pakhrin S. SumoPred-PLM: human SUMOylation and SUMO2/3 sites Prediction using Pre-trained Protein Language Model. NAR Genom Bioinform 2024; 6:lqae011. [PMID: 38327870 PMCID: PMC10849187 DOI: 10.1093/nargab/lqae011] [Received: 06/21/2023] [Revised: 11/17/2023] [Accepted: 01/17/2024] [Indexed: 02/09/2024] [Open access]
Abstract
SUMOylation is an essential post-translational modification system with the ability to regulate nearly all aspects of cellular physiology. The three major paralogues, SUMO1, SUMO2, and SUMO3, are small ubiquitin-like modifiers that form covalent bonds with lysine residues at consensus sites in protein substrates. Biochemical studies continue to identify unique biological functions for protein targets conjugated to SUMO1 versus the highly homologous SUMO2 and SUMO3 paralogues. Yet the field has failed to harness contemporary AI approaches, including pre-trained protein language models, to fully expand and/or recognize the SUMOylated proteome. Herein, we present a novel deep learning-based approach called SumoPred-PLM for human SUMOylation prediction, which achieves a sensitivity, specificity, Matthews correlation coefficient, and accuracy of 74.64%, 73.36%, 0.48, and 74.00%, respectively, on the CPLM 4.0 independent test dataset. In addition, this platform uses contextualized embeddings obtained from a pre-trained protein language model, ProtT5-XL-UniRef50, to identify SUMO2/3-specific conjugation sites. The results demonstrate that SumoPred-PLM is a powerful and unique computational tool for predicting SUMOylation sites in proteins and accelerating discovery.
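The Matthews correlation coefficient is a correlation in [-1, 1] rather than a percentage. The sketch below computes sensitivity, specificity, accuracy, and MCC from confusion-matrix counts. The counts in the usage example are hypothetical: they assume a balanced test set of 10,000 positives and 10,000 negatives scaled to the reported sensitivity and specificity, not the actual CPLM 4.0 test set sizes. `binary_metrics` is a hypothetical helper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical balanced test set: 10,000 positives and 10,000 negatives,
# scaled to the reported sensitivity (74.64%) and specificity (73.36%).
sens, spec, acc, mcc = binary_metrics(tp=7464, fn=2536, tn=7336, fp=2664)
print(f"sens={sens:.4f} spec={spec:.4f} acc={acc:.4f} mcc={mcc:.3f}")
# sens=0.7464 spec=0.7336 acc=0.7400 mcc=0.480
```

Under this balanced assumption the reported rates reproduce an accuracy of 74.00% and an MCC of about 0.48, consistent with the figures quoted for SumoPred-PLM.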
Affiliation(s)
- Andrew Vargas Palacios
- Department of Computer Science and Engineering Technology, University of Houston-Downtown, 1 Main St., Houston, TX 77002, USA
- Pujan Acharya
- Department of Computer Science and Engineering Technology, University of Houston-Downtown, 1 Main St., Houston, TX 77002, USA
- Anthony Stephen Peidl
- Department of Biology and Biochemistry, Center for Nuclear Receptors & Cell Signaling, University of Houston, Houston, TX 77204, USA
- Moriah Rene Beck
- Department of Chemistry and Biochemistry, Wichita State University, 1845 Fairmount St., Wichita, KS 67260, USA
- Eduardo Blanco
- Department of Computer Science, University of Arizona, 1040 4th St., Tucson, AZ 85721, USA
- Avdesh Mishra
- Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX 78363, USA
- Tasneem Bawa-Khalfe
- Department of Biology and Biochemistry, Center for Nuclear Receptors & Cell Signaling, University of Houston, Houston, TX 77204, USA
- Subash Chandra Pakhrin
- Department of Computer Science and Engineering Technology, University of Houston-Downtown, 1 Main St., Houston, TX 77002, USA
11. Nopour R. Screening ovarian cancer by using risk factors: machine learning assists. Biomed Eng Online 2024; 23:18. [PMID: 38347611 PMCID: PMC10863117 DOI: 10.1186/s12938-024-01219-x] [Received: 09/12/2023] [Accepted: 02/06/2024] [Indexed: 02/15/2024] [Open access]
Abstract
BACKGROUND AND AIM Ovarian cancer (OC) is a prevalent and aggressive malignancy that poses a significant public health challenge. The lack of preventive strategies for OC increases morbidity, mortality, and other negative consequences. Screening for OC through risk prediction could be a powerful preventive strategy, yet it has not received much attention. This study therefore aimed to use machine learning approaches as predictive aids to screen high-risk groups for OC and achieve practical preventive goals. MATERIALS AND METHODS As this study is data-driven and retrospective, we used data from 1516 women with suspected OC, drawn from a centralized database covering six clinical settings in Sari City from 2015 to 2019. Six machine learning (ML) algorithms, namely XG-Boost, Random Forest (RF), J-48, support vector machine (SVM), K-nearest neighbor (KNN), and artificial neural network (ANN), were used to construct prediction models for OC. To choose the best model, we compared the prediction models using the area under the receiver operating characteristic curve (AU-ROC). RESULTS XG-Boost, with an AU-ROC of 0.93 (95% CI: 0.91-0.95), was the best-performing model for predicting OC. CONCLUSIONS ML approaches offer substantial predictive efficiency and interoperability, and can support powerful preventive strategies by screening high-risk groups for OC.
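An AU-ROC and its 95% confidence interval, as reported for the XG-Boost model, can be computed without any ML library. This sketch uses the rank (Mann-Whitney) formulation of AU-ROC and a percentile bootstrap for the interval; it is a generic illustration with made-up scores, the paper does not state which CI method it used, and `auroc` and `bootstrap_ci` are hypothetical helpers.

```python
import numpy as np


def auroc(labels, scores):
    """AU-ROC as P(random positive scores higher than random negative); ties count 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for the AU-ROC."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), size=len(labels))
        if labels[idx].all() or not labels[idx].any():
            continue  # resample contains one class only; AU-ROC undefined
        stats.append(auroc(labels[idx], scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

rng = np.random.default_rng(42)
# Made-up risk scores: 200 controls and 100 cases with a shifted distribution.
y = np.r_[np.zeros(200), np.ones(100)]
s = np.r_[rng.normal(0.0, 1.0, 200), rng.normal(1.5, 1.0, 100)]
print(round(auroc(y, s), 3), bootstrap_ci(y, s, n_boot=500))
```

The rank formulation makes clear that AU-ROC depends only on how the model orders patients, not on calibrated probabilities, which is why it is a natural criterion for comparing heterogeneous models such as XG-Boost, J-48, and KNN.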
Affiliation(s)
- Raoof Nopour
- Department of Health Information Management, Student Research Committee, School of Health Management and Information Sciences Branch, Iran University of Medical Sciences, Tehran, Iran.
12. Gonzalez Pepe I, Chatelain Y, Kiar G, Glatard T. Numerical stability of DeepGOPlus inference. PLoS One 2024; 19:e0296725. [PMID: 38285635 PMCID: PMC10824456 DOI: 10.1371/journal.pone.0296725] [Received: 04/27/2023] [Accepted: 12/16/2023] [Indexed: 01/31/2024] [Open access]
Abstract
Convolutional neural networks (CNNs) are currently among the most widely used deep neural network (DNN) architectures and achieve state-of-the-art performance on many problems. Originally applied to computer vision tasks, CNNs work well with any data that has a spatial relationship, not only images, and have been applied in many fields. However, recent works have highlighted numerical stability challenges in DNNs, which also relate to their known sensitivity to noise injection. These challenges can jeopardise their performance and reliability. This paper investigates DeepGOPlus, a CNN that predicts protein function. DeepGOPlus has achieved state-of-the-art performance and can successfully annotate the abundant protein sequences emerging in proteomics. We determine the numerical stability of the model's inference stage by quantifying the numerical uncertainty resulting from perturbations of the underlying floating-point data. In addition, we explore the opportunity to use reduced-precision floating-point formats for DeepGOPlus inference to reduce memory consumption and latency. This is achieved by instrumenting DeepGOPlus' execution using Monte Carlo Arithmetic, a technique that experimentally quantifies floating-point operation errors, and VPREC, a tool that emulates results in customizable floating-point precision formats. The focus is on the inference stage because it is the primary deliverable of the DeepGOPlus model and is widely applicable across different environments. Overall, our results show that although the DeepGOPlus CNN is very stable numerically, it can only be selectively implemented with lower-precision floating-point formats. We conclude that predictions obtained from the pre-trained DeepGOPlus model are numerically very reliable and use existing floating-point formats efficiently.
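Monte Carlo Arithmetic perturbs the result of each floating-point operation with random noise at a chosen virtual precision t, repeats the computation many times, and estimates the number of significant digits from the spread of the outcomes. The sketch below is a simplified pure-Python illustration on a dot product, assuming scalar random rounding of the form x + 2^(e-t)·ξ with ξ ~ U(-1/2, 1/2); it is not the tooling used in the study, and `round_to_precision` only mimics the mantissa rounding that a tool like VPREC performs (exponent range is ignored). All helper names are hypothetical.

```python
import math
import numpy as np


def mca_round(x, t, rng):
    """Random-rounding perturbation at virtual precision t bits:
    x + 2**(e - t) * xi, where x = m * 2**e with 0.5 <= |m| < 1 and
    xi ~ U(-1/2, 1/2). Exact zeros are left unperturbed."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(x)
    return x + math.ldexp(rng.uniform(-0.5, 0.5), e - t)


def mca_dot(a, b, t, rng):
    """Dot product in which every multiply and every add result is perturbed."""
    acc = 0.0
    for ai, bi in zip(a, b):
        acc = mca_round(acc + mca_round(float(ai) * float(bi), t, rng), t, rng)
    return acc


def significant_digits(samples):
    """Estimate s = -log10(sigma / |mu|) from repeated MCA runs."""
    samples = np.asarray(samples, dtype=float)
    mu, sigma = samples.mean(), samples.std(ddof=1)
    return math.inf if sigma == 0 else -math.log10(sigma / abs(mu))


def round_to_precision(x, p):
    """Round x to p mantissa bits (round-half-even), ignoring exponent range."""
    m, e = math.frexp(x)
    return math.ldexp(round(m * 2 ** p) / 2 ** p, e)


rng = np.random.default_rng(0)
a, b = rng.standard_normal(100), rng.standard_normal(100)
for t in (24, 10):  # roughly float32-like and float16-like virtual precisions
    runs = [mca_dot(a, b, t, rng) for _ in range(30)]
    print(t, round(significant_digits(runs), 2))
print(round_to_precision(1 / 3, 10))  # 0.33349609375
```

Lowering t from 24 to 10 removes about four decimal digits of agreement between runs; applied to a full network, the same procedure indicates which layers can tolerate a reduced-precision format and which cannot.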
Affiliation(s)
- Inés Gonzalez Pepe
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Qc, Canada
- Yohan Chatelain
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Qc, Canada
- Gregory Kiar
- Computational Neuroimaging Laboratory, Child Mind Institute, New York, NY, United States of America
- Tristan Glatard
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Qc, Canada
13
Zhang L, Xiao K, Wang X, Kong L. A novel fusion technology utilizing complex network and sequence information for FAD-binding site identification. Anal Biochem 2024; 685:115401. [PMID: 37981176 DOI: 10.1016/j.ab.2023.115401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/08/2023] [Accepted: 11/14/2023] [Indexed: 11/21/2023]
Abstract
Flavin adenine dinucleotide (FAD) binding sites play an increasingly important role as targets for inhibiting bacterial infections. To exploit protein topological structure information as a complementary source for identifying FAD-binding sites, we designed a novel fusion technique based on sequence and complex-network features. The specially designed feature vectors were combined and fed into CatBoost for model construction. Moreover, because the minority class (positive samples) is more significant for biological research, a random under-sampling technique was applied to address the class imbalance. Compared with previous methods, our methods achieved the best results on two independent test datasets. In particular, the MCC values obtained by FADsite and FADsite_seq were 14.37%-53.37% and 21.81%-60.81% higher than those of existing methods on Test6, and they showed improvements of 6.03%-21.96% and 19.77%-35.70%, respectively, on Test4. Meanwhile, statistical tests show that our methods differ significantly from the state-of-the-art methods, and the cross-entropy loss shows that our methods have high certainty. These results demonstrate the effectiveness of using sequence and complex-network information to identify FAD-binding sites, and the approach may complement other biological studies. The data and source code are available at https://github.com/Kangxiaoneuq/FADsite.
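Random under-sampling, as mentioned in the abstract above, balances the training data by discarding a random subset of the majority class. Below is a minimal sketch. It assumes binary labels (1 = binding site, 0 = non-binding site) and a 1:1 target ratio. The function name and the ratio are illustrative assumptions, not details taken from FADsite.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def random_undersample(X: Sequence[T], y: Sequence[int], seed: int = 0) -> tuple[list[T], list[int]]:
    """Keep every minority-class sample plus an equal-sized random subset of the majority class."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(y) if label == 1]
    negatives = [i for i, label in enumerate(y) if label == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = minority + rng.sample(majority, len(minority))  # sampling without replacement
    rng.shuffle(kept)  # avoid returning the samples grouped by class
    return [X[i] for i in kept], [y[i] for i in kept]
```

Usage: with 10 positive and 90 negative samples, `random_undersample(X, y)` returns 20 samples, 10 from each class. Under-sampling throws information away, so in practice it is often repeated with different seeds, or combined with an ensemble.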
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China
- Kang Xiao
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
- Xueting Wang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
- Liang Kong
- Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China
14
Ma X, Liang Y, Zhang S. iAVPs-ResBi: Identifying antiviral peptides by using deep residual network and bidirectional gated recurrent unit. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:21563-21587. [PMID: 38124610 DOI: 10.3934/mbe.2023954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Human history is also the history of the fight against viral diseases. From the eradication of viruses to coexistence with them, advances in biomedicine have led to a more objective understanding of viruses and a corresponding increase in the tools and methods to combat them. More recently, antiviral peptides (AVPs) have been discovered and, owing to their advantages, have had a great impact as antiviral drugs. It is therefore necessary to develop a prediction model that accurately identifies AVPs. In this paper, we develop the iAVPs-ResBi model, which uses k-spaced amino acid pairs (KSAAP), encoding based on grouped weight (EBGW), enhanced grouped amino acid composition (EGAAC) based on the N5C5 sequence, and composition, transition and distribution (CTD) based on physicochemical properties for multi-feature extraction. We then adopt a bidirectional long short-term memory (BiLSTM) network to fuse the features and obtain the most discriminative information from the multiple original feature sets. Finally, the deep model is built by combining an improved residual network and a bidirectional gated recurrent unit (BiGRU) to perform classification. The results are better than those of existing methods, with accuracies of 95.07%, 98.07%, 94.29% and 97.50% on the four datasets, showing that iAVPs-ResBi can be used as an effective tool for identifying antiviral peptides. The datasets and code are freely available at https://github.com/yunyunliang88/iAVPs-ResBi.
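The k-spaced amino acid pair (KSAAP) encoding named above counts how often each ordered residue pair occurs with exactly k residues between the two members. One common formulation is sketched below. The range of k and the normalization by the number of valid pairs are assumptions. The other descriptors (EBGW, EGAAC, CTD) and the BiLSTM/ResNet/BiGRU model are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def ksaap(seq: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count residue pairs (seq[i], seq[i + k + 1]) and divide by the
    number of valid pairs, giving 400 features per k (400 * (k_max + 1) in total).
    """
    seq = seq.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs that contain non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

For the peptide "ACAC" with `k_max=1`, the k = 0 block gives AC = 2/3 and CA = 1/3. The k = 1 block gives AA = 1/2 and CC = 1/2.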
Affiliation(s)
- Xinyan Ma
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
15
Sanjurjo-de-No A, Pérez-Zuriaga AM, García A. Analysis and prediction of injury severity in single micromobility crashes with Random Forest. Heliyon 2023; 9:e23062. [PMID: 38144294 PMCID: PMC10746459 DOI: 10.1016/j.heliyon.2023.e23062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 11/24/2023] [Accepted: 11/24/2023] [Indexed: 12/26/2023]
Abstract
Urban micromobility represents a significant shift towards sustainable cities, which underscores the importance of its safety. With the surge in micromobility adoption, collisions involving micromobility devices such as bicycles and e-scooters have increased sharply in recent years. The second most common crash type involving these vehicles is one in which only a micromobility vehicle is involved (single micromobility crashes). This study analyzed 6030 single micromobility crashes that occurred in Spanish urban areas from 2016 to 2020. The Random Forest methodology was applied to create a classification model to characterize these crashes, predict their injury severity, and identify the primary influencing factors. Fatal and serious-injury crashes are relatively few compared with slight-injury crashes, so the data are imbalanced; to address this, the Synthetic Minority Oversampling Technique (SMOTE) was applied. The results indicate that certain behaviors, such as not wearing a helmet, riding for leisure, and speeding violations, may increase injury severity. Additionally, crashes occurring at intersections or on cycle lanes with poor pavement conditions are likely to result in more severe outcomes. Furthermore, the concurrent presence of several other factors also increases crash injury severity. These findings may provide valuable insights to authorities, supporting their decision-making to enhance micromobility safety and thereby promoting more equitable and sustainable urban environments.
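SMOTE creates synthetic minority-class records by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below shows only that interpolation step, for purely numeric features. It simplifies the method in two ways:

- It picks seed samples at random rather than cycling through each one.
- It does not handle categorical crash attributes such as helmet use or crash location. A variant such as SMOTE-NC would be needed for those.

In practice, a library implementation such as imbalanced-learn's `SMOTE` would normally be used. The function below is a hypothetical illustration.

```python
import math
import random


def smote(minority: list[tuple[float, ...]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbors (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every new point lies between two existing minority points, SMOTE densifies the minority region instead of duplicating records. This is its main advantage over plain random oversampling.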
Affiliation(s)
- Ana María Pérez-Zuriaga
- Highway Engineering Research Group (HERG), Universitat Politècnica de València, Camino de Vera, s/n, 46022 Valencia, Spain
- Alfredo García
- Highway Engineering Research Group (HERG), Universitat Politècnica de València, Camino de Vera, s/n, 46022 Valencia, Spain
16
Lou LL, Qiu WR, Liu Z, Xu ZC, Xiao X, Huang SF. Stacking-ac4C: an ensemble model using mixed features for identifying n4-acetylcytidine in mRNA. Front Immunol 2023; 14:1267755. [PMID: 38094296 PMCID: PMC10716444 DOI: 10.3389/fimmu.2023.1267755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023]
Abstract
N4-acetylcytidine (ac4C) is a modification of cytidine at the nitrogen-4 position that plays a significant role in the translation of mRNA. However, the precise mechanism by which ac4C modifies translated mRNA remains unclear. Identifying ac4C sites with conventional experimental methods is labor-intensive and time-consuming, so there is an urgent need for a method that can rapidly recognize ac4C sites. In this paper, we propose a comprehensive ensemble learning model, the Stacking-based heterogeneous integrated ac4C model, designed specifically to identify ac4C sites. The model integrates three distinct feature extraction methods: Kmer, electron-ion interaction pseudo-potential values (PseEIIP), and pseudo-K-tuple nucleotide composition (PseKNC). It also incorporates the Cluster Centroids algorithm to improve its handling of imbalanced data and to alleviate underfitting. In independent testing, our model improves MCC by 15.61% and ROC by 5.97% compared with existing models. To test the model's adaptability, we also used a balanced dataset assembled by the authors of iRNA-ac4C. On this dataset, our model increased Sn by 4.1%, Acc by nearly 1%, and ROC by 0.35%. The code is freely available at https://github.com/louliliang/ST-ac4C.git, allowing users to quickly build their own models without dealing with complicated mathematical equations.
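The Kmer descriptor listed above is the normalized frequency of every length-k nucleotide word in a sequence window. Below is a minimal sketch under these assumptions:

- The alphabet is RNA, so T is read as U.
- The default k is 3.
- Frequencies are normalized by the number of counted windows.

PseEIIP, PseKNC, Cluster Centroids and the stacking ensemble itself are not shown.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 3, alphabet: str = "ACGU") -> list[float]:
    """Normalized frequency of every length-k word over `alphabet` (len(alphabet)**k features)."""
    seq = seq.upper().replace("T", "U") if "U" in alphabet else seq.upper()
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):  # slide a window of length k with step 1
        word = seq[i:i + k]
        if word in counts:  # ignore windows containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]
```

For "ACGUACGU" with k = 2, there are seven windows. "AC" occurs twice, so its feature (the second in A, C, G, U lexicographic order) is 2/7.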
Affiliation(s)
- Li-Liang Lou
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Wang-Ren Qiu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Zi Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Zhao-Chun Xu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Shun-Fa Huang
- School of Information Engineering, Jingdezhen University, Jingdezhen, China
17
Xia F, Guo F, Liu Z, Zeng J, Ma X, Yu C, Li C. Enhanced CT combined with texture analysis for differential diagnosis of pleomorphic adenoma and adenolymphoma. BMC Med Imaging 2023; 23:169. [PMID: 37891554 PMCID: PMC10612226 DOI: 10.1186/s12880-023-01129-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023]
Abstract
OBJECTIVE This study sought to evaluate the value of the general characteristics of enhanced CT images, and of the histogram parameters of each phase, in distinguishing pleomorphic adenoma (PA) from adenolymphoma (AL). METHODS The imaging features and histogram parameters of preoperative enhanced CT images in 20 patients with PA and 29 patients with AL were analyzed. Tumor morphology and histogram parameters of PA and AL were compared. Receiver operating characteristic (ROC) curve analysis, including the area under the curve (AUC), sensitivity, and specificity, was used to determine the differential diagnostic performance of single-phase parameters and multi-phase parameter combinations. RESULTS The differences in CT value and net enhancement value in the arterial phase (AP) were significant (p < 0.05). Among the nine histogram parameters of each phase, the flat sweep phase (FSP) and AP means, the 10th, 50th, 90th, and 99th percentiles, the AP variance, and the venous phase (VP) kurtosis also differed significantly (p < 0.05). ROC analysis showed that, for single-parameter differential diagnosis, the 90th percentile of FSP yielded the largest AUC (0.870). In multi-parameter diagnosis, the combination of the FSP mean, the AP 90th percentile, and the VP kurtosis performed best, with an AUC of 0.925 and a sensitivity and specificity of 0.900 and 0.850, respectively. CONCLUSION Histogram analysis of enhanced CT images is valuable for differentiating PA from AL. Moreover, combining single-phase or multi-phase parameters can improve differential diagnostic efficiency.
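The histogram parameters referred to above (mean, variance, percentiles, kurtosis) are first-order statistics of the CT attenuation values inside a tumor region of interest (ROI). The sketch below shows how they can be computed from a list of voxel values, assuming the ROI has already been segmented. The percentile interpolation method and the use of excess kurtosis are choices made here, not details reported in the study.

```python
import statistics


def histogram_features(values: list[float]) -> dict[str, float]:
    """First-order histogram parameters of the CT values (HU) inside a region of interest."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    # 99 cut points: cuts[p - 1] is the p-th percentile (linear interpolation, "inclusive" method)
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    fourth_moment = statistics.fmean((v - mean) ** 4 for v in values)
    kurtosis = fourth_moment / variance**2 - 3.0 if variance > 0 else float("nan")  # excess kurtosis
    return {
        "mean": mean,
        "variance": variance,
        "p10": cuts[9],
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
        "kurtosis": kurtosis,
    }
```

Running this once per phase (FSP, AP, VP) yields the per-phase parameters. Those parameters can then be compared between groups or combined in an ROC analysis.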
Affiliation(s)
- Feifei Xia
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Shihezi University, Shihezi, 832000, China
- Foqing Guo
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Shihezi University, Shihezi, 832000, China
- Zhe Liu
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Shihezi University, Shihezi, 832000, China
- Jie Zeng
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Shihezi University, Shihezi, 832000, China
- Xuehua Ma
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Shihezi University, Shihezi, 832000, China
- Chongqing Yu
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Shihezi University, Shihezi, 832000, China
- Changxue Li
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Shihezi University, Shihezi, 832000, China
18
Jia J, Wei Z, Sun M. EMDL_m6Am: identifying N6,2'-O-dimethyladenosine sites based on stacking ensemble deep learning. BMC Bioinformatics 2023; 24:397. [PMID: 37880673 PMCID: PMC10598967 DOI: 10.1186/s12859-023-05543-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND N6,2'-O-dimethyladenosine (m6Am) is an abundant RNA methylation modification on vertebrate mRNAs and is present in the transcription initiation region of mRNAs. It has recently been shown experimentally to be associated with several human disorders, involving obesity genes and stomach cancer, among others. Correctly identifying m6Am sites is therefore important for understanding their role in RNA regulation. RESULTS This study proposes a novel deep learning-based m6Am prediction model, EMDL_m6Am. The model uses one-hot encoding to express the feature map of the RNA sequence and recognizes m6Am sites by integrating different CNN models via stacking: DenseNet, an Inflated Convolutional Network (DCNN), and a Deep Multiscale Residual Network (MSRN). On the training dataset, the model's sensitivity (Sn), specificity (Sp), accuracy (ACC), Matthews correlation coefficient (MCC), and area under the curve (AUC) reach 86.62%, 88.94%, 87.78%, 0.7590, and 0.8778, respectively. On the independent test set, the corresponding values are 82.25%, 79.72%, 80.98%, 0.6199, and 0.8211. CONCLUSIONS The experimental results demonstrate that EMDL_m6Am substantially improves the prediction of m6Am sites and could provide a valuable reference for future studies. The source code and experimental data are available at: https://github.com/13133989982/EMDL-m6Am .
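One-hot encoding turns each nucleotide of the RNA window into a four-element indicator vector. The result is an L × 4 feature map that the CNN base learners can convolve over. Below is a minimal sketch. The A/C/G/U column order and the all-zero row for ambiguous bases are assumptions. The abstract does not give the window length used by EMDL_m6Am.

```python
def one_hot(seq: str, alphabet: str = "ACGU") -> list[list[int]]:
    """L x len(alphabet) one-hot matrix; T is treated as U, unknown bases (e.g. N) become all-zero rows."""
    index = {base: i for i, base in enumerate(alphabet)}
    if "U" in index:
        index.setdefault("T", index["U"])  # accept DNA-style input for RNA windows
    matrix = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix
```

For example, `one_hot("ACGN")` returns `[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]`.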
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China.
- Zhangying Wei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Mingwei Sun
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
19
Le NQK, Xu L. Optimizing Hyperparameter Tuning in Machine Learning to Improve the Predictive Performance of Cross-Species N6-Methyladenosine Sites. ACS OMEGA 2023; 8:39420-39426. [PMID: 37901522 PMCID: PMC10600906 DOI: 10.1021/acsomega.3c05074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/28/2023] [Indexed: 10/31/2023]
Abstract
DNA N6-methyladenosine (6mA) modification carries significant epigenetic information and plays a pivotal role in biological functions, thereby profoundly affecting human development. Precise and reliable detection of 6mA sites is integral to understanding the mechanisms underpinning DNA modification. Current methods for identifying such sites are primarily experimental and are often time-consuming and costly. Consequently, the rise of computational methods for identifying 6mA sites provides a welcome alternative. Our research introduces a novel model to discern DNA 6mA sites in cross-species genomes. This model, developed through machine learning, uses extracted sequence information. Hyperparameter tuning was employed to determine the most effective feature combination and model implementation, thereby capturing vital information from the sequences. Our model demonstrated higher accuracy than existing models under five-fold cross-validation. Our study thus supports the reliability and efficiency of the model as a valuable tool for supplementing experimental research.
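The abstract does not say which search strategy was used for hyperparameter tuning. As an illustration only, an exhaustive grid search scored by five-fold cross-validation can be written as below. `fit_score` is a hypothetical user-supplied callable: it trains a model with the given parameters and returns a validation score. The folds are shuffled but not stratified. `threshold_accuracy` is a toy stand-in for a real classifier.

```python
import itertools
import random
import statistics
from typing import Callable, Iterator, Sequence


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k shuffled, non-stratified folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def grid_search(X: Sequence, y: Sequence, grid: dict[str, list],
                fit_score: Callable, k: int = 5, seed: int = 0) -> tuple[dict, float]:
    """Return the parameter combination with the highest mean k-fold score."""
    keys = sorted(grid)
    best_params, best_score = {}, float("-inf")
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = [
            fit_score(params,
                      [X[i] for i in train], [y[i] for i in train],
                      [X[i] for i in test], [y[i] for i in test])
            for train, test in kfold_indices(len(X), k, seed)  # same folds for every candidate
        ]
        mean_score = statistics.fmean(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def threshold_accuracy(params, X_train, y_train, X_test, y_test) -> float:
    """Hypothetical toy model: predict 1 when the feature reaches a threshold (training data unused)."""
    predictions = [int(x >= params["threshold"]) for x in X_test]
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


if __name__ == "__main__":
    X = list(range(20))
    y = [int(x >= 10) for x in X]
    print(grid_search(X, y, {"threshold": [5, 10, 15]}, threshold_accuracy))
```

Reusing the same folds for every candidate keeps the comparison between parameter settings fair. An unbiased performance estimate would still require data held out from the search itself.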
Affiliation(s)
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 110, Taiwan
- AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Ling Xu
- NUS-ISS, National University of Singapore, Singapore, 119615, Singapore
20
Solanki A, Griffin Z, Sutradhar PR, Pradhan K, Merritt C, Ganguly A, Riedel M. Neural network execution using nicked DNA and microfluidics. PLoS One 2023; 18:e0292228. [PMID: 37856428 PMCID: PMC10586678 DOI: 10.1371/journal.pone.0292228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 09/17/2023] [Indexed: 10/21/2023]
Abstract
DNA has been discussed as a potential medium for data storage. Potentially it could be denser, could consume less energy, and could be more durable than conventional storage media such as hard drives, solid-state storage, and optical media. However, performing computations on the data stored in DNA is a largely unexplored challenge. This paper proposes an integrated circuit (IC) based on microfluidics that can perform complex operations such as artificial neural network (ANN) computation on data stored in DNA. We envision such a system to be suitable for highly dense, throughput-demanding bio-compatible applications such as an intelligent Organ-on-Chip or other biomedical applications that may not be latency-critical. It computes entirely in the molecular domain without converting data to electrical form, making it a form of in-memory computing on DNA. The computation is achieved by topologically modifying DNA strands through the use of enzymes called nickases. A novel scheme is proposed for representing data stochastically through the concentration of the DNA molecules that are nicked at specific sites. The paper provides details of the biochemical design, as well as the design, layout, and operation of the microfluidics device. Benchmarks are reported on the performance of neural network computation.
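Representing a value stochastically by the concentration of nicked molecules follows the logic of stochastic computing. Suppose each molecule is nicked at a site independently, with probability equal to the value. Then the fraction of molecules nicked at two independent sites is the product of the two values. The simulation below only illustrates that arithmetic under the independence assumption. The molecule model, the function names, and the multiplexer-based scaled addition are illustrative assumptions. They do not describe the paper's enzymatic or microfluidic design.

```python
import random


def encode(value: float, n_molecules: int, rng: random.Random) -> list[bool]:
    """Nick each molecule at one site independently with probability `value` (0 <= value <= 1)."""
    return [rng.random() < value for _ in range(n_molecules)]


def decode(nicked: list[bool]) -> float:
    """Read the value back as the fraction (concentration) of nicked molecules."""
    return sum(nicked) / len(nicked)


def multiply(a: list[bool], b: list[bool]) -> list[bool]:
    """Molecules nicked at both independently encoded sites: P(a and b) = P(a) * P(b)."""
    return [x and y for x, y in zip(a, b)]


def scaled_add(a: list[bool], b: list[bool], select: list[bool]) -> list[bool]:
    """Multiplexer: read site a where `select` is nicked, else site b.

    With P(select) = 0.5 the result encodes (P(a) + P(b)) / 2.
    """
    return [x if s else y for x, y, s in zip(a, b, select)]
```

Products and scaled sums are the two operations a neuron's weighted sum needs. The precision of the decoded value improves only with the square root of the number of molecules.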
Affiliation(s)
- Arnav Solanki
- Department of Electrical and Computer Engineering, University of Minnesota Twin-Cities, Minneapolis, MN, United States of America
- Zak Griffin
- Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, United States of America
- Purab Ranjan Sutradhar
- Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, United States of America
- Karisha Pradhan
- Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, United States of America
- Caiden Merritt
- Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, United States of America
- Amlan Ganguly
- Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, United States of America
- Marc Riedel
- Department of Electrical and Computer Engineering, University of Minnesota Twin-Cities, Minneapolis, MN, United States of America
21
Sinha D, Dasmandal T, Paul K, Yeasin M, Bhattacharjee S, Murmu S, Mishra DC, Pal S, Rai A, Archak S. MethSemble-6mA: an ensemble-based 6mA prediction server and its application on promoter region of LBD gene family in Poaceae. FRONTIERS IN PLANT SCIENCE 2023; 14:1256186. [PMID: 37877081 PMCID: PMC10591185 DOI: 10.3389/fpls.2023.1256186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 09/01/2023] [Indexed: 10/26/2023]
Abstract
The Lateral Organ Boundaries Domain (LBD)-containing genes are a set of plant-specific transcription factors. They are crucial for controlling organ development and defense mechanisms, as well as anthocyanin synthesis and nitrogen metabolism. It is imperative to understand how methylation regulates their expression by predicting methylation sites in their promoters, particularly in major crop species. In this study, we developed a user-friendly prediction server for accurate prediction of 6mA sites by incorporating a robust feature set, viz., Binary Encoding of Mono-nucleotide DNA. Our model, MethSemble-6mA, outperformed other state-of-the-art tools in terms of accuracy (93.12%). Furthermore, we used the tool to investigate the pattern of probable 6mA sites in the upstream promoter regions of the LBD-containing genes in Triticum aestivum and its allied species. On average, each selected species had four 6mA sites. We found that, with speciation and over the course of evolution in wheat, the frequency of methylation has decreased, while a few sites remain conserved. This points to gene birth and alteration of gene expression through methylation over time within a species, and reflects functional conservation throughout evolution. DNA methylation is a vital event in almost all plant developmental processes (e.g., genomic imprinting and gametogenesis), along with other life processes. Our findings on the epigenetic regulation of LBD-containing genes therefore have broad implications for basic and applied research. Additionally, MethSemble-6mA (http://cabgrid.res.in:5799/) will serve as a useful resource for plant breeders interested in pursuing epigenetics-based crop improvement research.
Affiliation(s)
- Dipro Sinha
- ICAR-Indian Agricultural Statistics Research Institute, Delhi, India
- Graduate School, ICAR-Indian Agricultural Research Institute, Delhi, India
- Tanwy Dasmandal
- ICAR-Indian Agricultural Statistics Research Institute, Delhi, India
- Graduate School, ICAR-Indian Agricultural Research Institute, Delhi, India
- ICAR-National Bureau of Fish Genetic Resources, Lucknow, India
- Krishnayan Paul
- Graduate School, ICAR-Indian Agricultural Research Institute, Delhi, India
- ICAR-National Institute for Plant Biotechnology, Delhi, India
- Md Yeasin
- ICAR-Indian Agricultural Statistics Research Institute, Delhi, India
- Sougata Bhattacharjee
- Graduate School, ICAR-Indian Agricultural Research Institute, Delhi, India
- ICAR-National Institute for Plant Biotechnology, Delhi, India
- ICAR-Indian Agricultural Research Institute, Hazaribagh, Jharkhand, India
- Sneha Murmu
- ICAR-Indian Agricultural Statistics Research Institute, Delhi, India
- Soumen Pal
- ICAR-Indian Agricultural Statistics Research Institute, Delhi, India
- Anil Rai
- Indian Council of Agricultural Research, Delhi, India
- Sunil Archak
- ICAR-National Bureau of Plant Genetic Resources, Delhi, India
22
Wu Q, Chang Y, Yang C, Liu H, Chen F, Dong H, Chen C, Luo Q. Adjuvant chemotherapy or no adjuvant chemotherapy? A prediction model for the risk stratification of recurrence or metastasis of nasopharyngeal carcinoma combining MRI radiomics with clinical factors. PLoS One 2023; 18:e0287031. [PMID: 37751422 PMCID: PMC10522047 DOI: 10.1371/journal.pone.0287031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 05/28/2023] [Indexed: 09/28/2023]
Abstract
BACKGROUND Should adjuvant chemotherapy (AC) be offered to patients with nasopharyngeal carcinoma (NPC)? Different guidelines provide different recommendations. METHODS In this retrospective study, a total of 140 patients were enrolled and followed for 3 years, and 24 clinical features were collected. Imaging features on the enhanced MRI sequence were extracted using the PyRadiomics platform. The Pearson correlation coefficient and random forest were used to select the features associated with recurrence or metastasis. A clinical-radiomics model (CRM) was constructed by Cox multivariable analysis in the training cohort and validated in the validation cohort. All patients were divided into high- and low-risk groups by the median Rad-score of the model. Kaplan-Meier survival curves were used to compare the 3-year recurrence- or metastasis-free rate (RMFR) of patients with or without AC in the high- and low-risk groups. RESULTS In total, 960 imaging features were extracted. The CRM was constructed from nine features (seven imaging features and two clinical factors). In the training cohort, the area under the curve (AUC) of the CRM for 3-year RMFR was 0.872 (P < 0.001), with a sensitivity and specificity of 0.935 and 0.672, respectively. In the validation cohort, the AUC was 0.864 (P < 0.001), with a sensitivity and specificity of 1.00 and 0.75, respectively. Kaplan-Meier curves showed that the 3-year RMFR and 3-year cancer-specific survival (CSS) rate in the high-risk group were significantly lower than those in the low-risk group (P < 0.001). In the high-risk group, patients who received AC had a higher 3-year RMFR than those who did not (78.6% vs. 48.1%, p = 0.03). CONCLUSION With the aim of increasing RMFR, a prediction model for NPC based on two clinical factors and seven imaging features suggests that AC should be added for patients in the high-risk group but not for those in the low-risk group.
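Two steps in the methods above can be sketched in a few lines: risk stratification by the median Rad-score, and the Kaplan-Meier estimate of the recurrence- or metastasis-free rate. This minimal illustration makes two assumptions:

- Scores strictly above the median are labelled high-risk.
- Censored patients leave the risk set after any events that occur at the same time.

It is not the PyRadiomics/Cox pipeline used in the study, and the log-rank comparison is omitted.

```python
import statistics


def median_split(scores: list[float]) -> list[str]:
    """Label each patient 'high' if the score is above the cohort median, else 'low'."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate S(t) at each distinct event time.

    events[i] is 1 if recurrence/metastasis was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censorings at t leave the risk set afterwards
    return curve
```

For follow-up times [1, 2, 2, 3, 4] with events [1, 1, 0, 1, 0], the estimated curve steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3. Fitting one curve per group (high/low risk, with/without AC) gives the comparison described in the abstract.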
Affiliation(s)
- Qiaoyuan Wu
- The Public Experimental Center of Medicine, Department of Pathology, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, P. R. China
- Yonghu Chang
- School of Medical Information Engineering of Zunyi Medical University, Zunyi Medical University, Zunyi, Guizhou, P. R. China
- Cheng Yang
- The Third Clinical Medical College of Ningxia Medical University, Yinchuan, Ningxia, P. R. China
- Heng Liu
- Department of Radiology, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, P. R. China
- Fang Chen
- The Public Experimental Center of Medicine, Department of Pathology, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, P. R. China
- Hui Dong
- The Public Experimental Center of Medicine, Department of Pathology, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, P. R. China
- Cheng Chen
- Department of Thoracic Surgery, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, P. R. China
- Qing Luo
- The Public Experimental Center of Medicine, Department of Pathology, Affiliated Hospital of Zunyi Medical University, Zunyi, Guizhou, P. R. China
23
Wang A, Meng Q, Wang M. Spectrum Sensing Method Based on Residual Dense Network and Attention. SENSORS (BASEL, SWITZERLAND) 2023; 23:7791. [PMID: 37765847 PMCID: PMC10534694 DOI: 10.3390/s23187791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/06/2023] [Accepted: 09/08/2023] [Indexed: 09/29/2023]
Abstract
This paper proposes a collaborative spectrum sensing method based on a Residual Dense Network and attention mechanisms. The method addresses two problems of traditional CNN-based spectrum sensing methods with deep network structures, vanishing gradients and limited feature extraction capability, and also avoids network degradation in such deep networks. The method stacks and normalizes the time-domain information of the signal to construct a two-dimensional matrix, which is mapped to a grayscale image. The grayscale images are divided into training and test sets, and the training set is used to train the neural network to extract deep features. Finally, the test set is fed into the trained network for spectrum sensing. Experimental results show that, at low signal-to-noise ratios, the proposed method achieves better spectrum sensing performance than traditional collaborative spectrum sensing methods.
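The preprocessing step described above stacks time-domain samples into a two-dimensional matrix, normalizes it, and maps it to a grayscale image. It might look like the following sketch. The row-major stacking, min-max normalization and 8-bit output are assumptions made for illustration. In cooperative sensing, rows could instead correspond to different sensing users, which the abstract does not specify.

```python
def signal_to_grayscale(samples: list[float], rows: int, cols: int) -> list[list[int]]:
    """Stack rows*cols time-domain samples row by row and min-max scale them to 0..255."""
    if len(samples) < rows * cols:
        raise ValueError("not enough samples for the requested image size")
    window = samples[: rows * cols]
    lo, hi = min(window), max(window)
    span = hi - lo
    pixels = [round(255 * (v - lo) / span) if span else 0 for v in window]  # constant input -> black image
    return [pixels[r * cols:(r + 1) * cols] for r in range(rows)]
```

For example, `signal_to_grayscale([0.0, 1.0, 2.0, 3.0], 2, 2)` gives `[[0, 85], [170, 255]]`. The resulting images can be fed to any CNN that accepts single-channel input.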
Affiliation(s)
- Anyi Wang
- School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
- Qifeng Meng
- School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
- Mingbo Wang
- School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
24
Zhang P, Wu H. IChrom-Deep: An Attention-Based Deep Learning Model for Identifying Chromatin Interactions. IEEE J Biomed Health Inform 2023; 27:4559-4568. [PMID: 37402191 DOI: 10.1109/jbhi.2023.3292299] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/06/2023]
Abstract
Identification of chromatin interactions is crucial for advancing our knowledge of gene regulation. However, because of the limitations of high-throughput experimental techniques, there is an urgent need to develop computational methods for predicting chromatin interactions. In this study, we propose a novel attention-based deep learning model, termed IChrom-Deep, to identify chromatin interactions using sequence features and genomic features. Experimental results on datasets from three cell lines demonstrate that IChrom-Deep achieves satisfactory performance and is superior to previous methods. We also investigate the effects of DNA sequence and associated features, as well as genomic features, on chromatin interactions, and highlight the applicable scenarios of some features, such as sequence conservation and distance. Moreover, we identify a few genomic features that are extremely important across different cell lines. IChrom-Deep achieves comparable performance using only these significant genomic features as when using all genomic features. IChrom-Deep may therefore serve as a useful tool for future studies that seek to identify chromatin interactions.
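The abstract does not detail IChrom-Deep's attention layer. For orientation only, the generic scaled dot-product attention, softmax(QKᵀ/√d)V, is sketched below in plain Python. The source does not state whether IChrom-Deep uses this exact form. It also does not say how its sequence and genomic features would be mapped to queries, keys and values.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: list[list[float]], K: list[list[float]],
                                 V: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Return (output, weights) for softmax(Q K^T / sqrt(d)) V, with rows as vectors."""
    d = len(K[0])  # key/query dimension
    weights = [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K])
        for q in Q
    ]
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return output, weights
```

Each output row is a weighted average of the value vectors. The weights show which inputs the model attends to, and this is what makes attention useful for interpreting feature importance.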

25
Chen D, Wang R, Jiang Y, Xing Z, Sheng Q, Liu X, Wang R, Xie H, Zhao L. Application of artificial neural network in daily prediction of bleeding in ICU patients treated with anti-thrombotic therapy. BMC Med Inform Decis Mak 2023; 23:171. [PMID: 37653495 PMCID: PMC10470146 DOI: 10.1186/s12911-023-02274-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 08/23/2023] [Indexed: 09/02/2023]
Abstract
OBJECTIVES Anti-thrombotic therapy is the basis of thrombosis prevention and treatment, and bleeding is its main adverse event. Existing laboratory indicators cannot accurately reflect real-time coagulation function, so tools are needed that dynamically evaluate the risks and benefits of anti-thrombotic treatment in order to prescribe accurate therapy. METHODS A model for daily prediction of bleeding risk in ICU patients treated with anti-thrombotic therapy was built using recurrent neural networks, a deep learning algorithm, and its results and performance were compared with those of clinicians. RESULTS There were no statistically significant differences in baseline characteristics. ROC curves of the four models were plotted for the validation and test sets. In the validation set, the one-layer GRU had a larger AUC (0.9462; 95% CI, 0.9147-0.9778). In the test set, the ROC curves showed that the two-layer LSTM outperformed the one-layer GRU, with an AUC of 0.8391 (95% CI, 0.7786-0.8997), while the one-layer GRU had better specificity (sensitivity 0.5942; specificity 0.9300). Fleiss' kappa for junior clinicians, senior clinicians, and machine learning classifiers was 0.0984, 0.4562, and 0.8012, respectively. CONCLUSIONS Recurrent neural networks were applied for the first time to daily prediction of bleeding risk in ICU patients treated with anti-thrombotic therapy. Deep learning classifiers were more reliable and consistent than human classifiers, and the machine learning classifier showed strong reliability. The deep learning algorithm significantly outperformed human classifiers in prediction time.
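Fleiss' kappa extends Cohen's kappa to more than two raters by comparing the observed pairwise agreement among raters with the agreement expected by chance. The abstract does not say how the ratings were tabulated. The following is therefore a minimal sketch of the standard formula, assuming that every patient-day is labeled by the same number of raters. The function name, table layout, and example counts are hypothetical.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who put subject i (e.g. one patient-day)
    into category j (e.g. 0 = no bleeding, 1 = bleeding). Every subject must
    be rated by the same number of raters, and there must be at least two.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need the same number (>= 2) of raters for every subject")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # p_j: share of all assignments that went to category j
    p: List[float] = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # P_i: fraction of agreeing rater pairs on subject i
    agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(agreement) / n_subjects  # mean observed agreement
    p_e = sum(pj * pj for pj in p)       # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Three hypothetical raters labeling four patient-days as no-bleed / bleed.
table = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(table), 4))  # prints 0.625
```

A value of 1 means perfect agreement. A value of 0 means agreement no better than chance. This is why the reported 0.0984 for junior clinicians indicates near-chance consistency.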
Affiliation(s)
- Daonan Chen
- Department of Critical Care Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, No. 650 New Songjiang Road, Songjiang, Shanghai, 201600, China
- Rui Wang
- Department of Critical Care Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, No. 650 New Songjiang Road, Songjiang, Shanghai, 201600, China
- Yihan Jiang
- Department of Critical Care Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, No. 650 New Songjiang Road, Songjiang, Shanghai, 201600, China
- Zijian Xing
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Qiuyang Sheng
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Xiaoqing Liu
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Ruilan Wang
- Department of Critical Care Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, No. 650 New Songjiang Road, Songjiang, Shanghai, 201600, China
- Hui Xie
- Department of Critical Care Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, No. 650 New Songjiang Road, Songjiang, Shanghai, 201600, China
- Lina Zhao
- Department of Critical Care Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, No. 650 New Songjiang Road, Songjiang, Shanghai, 201600, China

26
Yin W, Yang T, Wan G, Zhou X. Identification of image genetic biomarkers of Alzheimer's disease by orthogonal structured sparse canonical correlation analysis based on a diagnostic information fusion. Math Biosci Eng 2023; 20:16648-16662. [PMID: 37920027 DOI: 10.3934/mbe.2023741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2023]
Abstract
Alzheimer's disease (AD) is an irreversible neurodegenerative disease whose incidence increases every year. Because AD causes cognitive impairment and personality changes, it places a heavy burden on families and society. Imaging genetics treats the structure and function of the brain as a phenotype and studies how genetic variation influences them. Using structural magnetic resonance imaging data and transcriptome data from AD and healthy control samples in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, this paper proposes an orthogonal structured sparse canonical correlation analysis algorithm based on diagnostic information fusion. The algorithm adds structural constraints on brain regions of interest (ROIs), and integrating the diagnostic information of samples can improve the correlation performance between samples. The results showed that the algorithm could extract the correlation between the two data modalities and identify the brain regions most affected by multiple risk genes, together with their biological significance. In addition, we verified the diagnostic significance of the risk ROIs and risk genes for AD. The code of the proposed algorithm is available at https://github.com/Wanguangyu111/OSSCCA-DIF.
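The generic sparse CCA problem that structured variants build on is shown below for readers unfamiliar with the method. It is the textbook penalized form, in the style of Witten, Tibshirani and Hastie's penalized matrix decomposition. It is not the authors' OSSCCA-DIF objective, whose orthogonality and diagnostic-fusion terms are not given in the abstract. In the formula, X (n × p) holds standardized ROI imaging features and Y (n × q) holds standardized transcriptome features for the same n subjects. The penalty Ω is an assumed structured penalty, for example a group-lasso norm over groups of ROIs.

```latex
\max_{\mathbf{u},\,\mathbf{v}} \; \mathbf{u}^{\top} X^{\top} Y \mathbf{v}
\quad \text{subject to} \quad
\|\mathbf{u}\|_2^2 \le 1,\;
\|\mathbf{v}\|_2^2 \le 1,\;
\Omega(\mathbf{u}) \le c_1,\;
\|\mathbf{v}\|_1 \le c_2
```

The nonzero entries of the canonical weight vectors u and v mark candidate risk ROIs and risk genes. With standardized columns, the objective u^T X^T Y v is proportional to the covariance between the projections Xu and Yv.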
Affiliation(s)
- Wei Yin
- Department of Radiology, Xianning Central Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Hubei 437000, China
- Tao Yang
- Department of Radiology, Xianning Central Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Hubei 437000, China
- GuangYu Wan
- Department of Radiology, Xianning Central Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Hubei 437000, China
- Xiong Zhou
- Department of Radiology, Xianning Central Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Hubei 437000, China

27
Sun X, Zhao J, Guo C, Zhu X. Early Prediction of Epilepsy after Encephalitis in Childhood Based on EEG and Clinical Features. Emerg Med Int 2023; 2023:8862598. [PMID: 37485251 PMCID: PMC10359137 DOI: 10.1155/2023/8862598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 03/20/2023] [Accepted: 03/30/2023] [Indexed: 07/25/2023]
Abstract
Objective The present study was designed to establish and evaluate a model for early prediction of epilepsy after encephalitis in childhood based on electroencephalogram (EEG) and clinical features. Methods A total of 255 patients with encephalitis were randomly divided into training and validation sets. They were classified into postencephalitic epilepsy (PE) and non-postencephalitic epilepsy (non-PE) groups according to whether epilepsy occurred one year after discharge. Univariate and multivariate logistic regression analyses were used to screen for risk factors for PE, and the identified risk factors were used to establish and validate a model. Results Of the 255 patients with encephalitis, 209 were in the non-PE group and 46 were in the PE group. Univariate and multivariate logistic regression analyses showed that hemoglobin (OR = 0.968, 95% CI = 0.951-0.958), epilepsy frequency (OR = 0.968, 95% CI = 0.951-0.958), and EEG slow wave/fast wave frequency (S/F) in the occipital region were independent influencing factors for PE (P < 0.05). The prediction model based on these factors is: -0.031 × hemoglobin - 2.113 × epilepsy frequency + 7.836 × occipital region S/F + 1.595. The area under the ROC curve (AUC) of the model for diagnosing PE was 0.835 in the training set and 0.712 in the validation set. Conclusion Peripheral blood hemoglobin, the number of epileptic seizures in the acute stage of encephalitis, and EEG slow wave/fast wave frequencies can be used as predictors of epilepsy after encephalitis.
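The sketch below evaluates the published prediction model and converts the score to a probability. It assumes that the model is the linear predictor (log-odds) of the multivariate logistic regression, which the methods suggest but the abstract does not state. The abstract also does not give the units of hemoglobin or how epilepsy frequency and occipital S/F are coded, so the inputs below are placeholders. The function names are hypothetical.

```python
import math


def pe_linear_predictor(hemoglobin: float, epilepsy_frequency: float, occipital_sf: float) -> float:
    """Score from the abstract; coefficients are copied as published."""
    return -0.031 * hemoglobin - 2.113 * epilepsy_frequency + 7.836 * occipital_sf + 1.595


def to_probability(log_odds: float) -> float:
    """Numerically stable logistic function.

    ASSUMPTION: the score is the log-odds from the multivariate logistic
    regression; the abstract does not state this explicitly.
    """
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical inputs: units and coding are not given in the abstract.
score = pe_linear_predictor(hemoglobin=120.0, epilepsy_frequency=1.0, occipital_sf=0.8)
print(f"score={score:.3f}, probability={to_probability(score):.3f}")
```

The two branches of `to_probability` avoid overflow in `math.exp` for large negative or positive scores.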
Affiliation(s)
- Xiaojuan Sun
- Department of Pediatrics, The Second Affiliated Hospital of Nantong University, Nantong First People's Hospital, Nantong, Jiangsu, China
- Jinhua Zhao
- Department of Pediatrics, The Second Affiliated Hospital of Nantong University, Nantong First People's Hospital, Nantong, Jiangsu, China
- Chunyun Guo
- Department of Pediatrics, The Second Affiliated Hospital of Nantong University, Nantong First People's Hospital, Nantong, Jiangsu, China
- Xiaoxiao Zhu
- Department of Pediatrics, The Second Affiliated Hospital of Nantong University, Nantong First People's Hospital, Nantong, Jiangsu, China

28
Maghrabi MMT, Swaminathan H, Kumar S, Bakr MH, Ali SM. Enhanced Performance of Artificial-Neural-Network-Based Equalization for Short-Haul Fiber-Optic Communications. Sensors (Basel) 2023; 23:5952. [PMID: 37447800 DOI: 10.3390/s23135952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/15/2023]
Abstract
This work proposes an efficient, easy-to-implement, single-layer artificial neural network (ANN)-based equalizer with improved compensation performance. The proposed equalizer effectively mitigates the distortions induced in short-haul fiber-optic communication systems based on intensity modulation and direct detection (IMDD). Its compensation performance is significantly improved by a newly introduced advanced training scheme. The efficiency and robustness of the proposed ANN equalizer are illustrated on 10- and 28-Gbaud short-reach optical-fiber communication systems. Compared with the efficient but computationally expensive maximum likelihood sequence estimator (MLSE), the proposed ANN equalizer significantly reduces the computational cost of equalization and the storage memory requirements. It also achieves a better bit error rate than the MLSE.
Affiliation(s)
- Mahmoud M T Maghrabi
- Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Engineering Mathematics and Physics, Cairo University, Giza 12613, Egypt
- Hariharan Swaminathan
- Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
- Shiva Kumar
- Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
- Mohamed H Bakr
- Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
- Shirook M Ali
- School of Mechanical and Electrical Engineering Technology, Sheridan College, Brampton, ON L6Y 5H9, Canada

29
Zhou H, Tan W, Shi S. DeepGpgs: a novel deep learning framework for predicting arginine methylation sites combined with Gaussian prior and gated self-attention mechanism. Brief Bioinform 2023; 24:7000314. [PMID: 36694944 DOI: 10.1093/bib/bbad018] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 12/26/2022] [Accepted: 01/04/2023] [Indexed: 01/26/2023]
Abstract
Protein arginine methylation is an important posttranslational modification (PTM) associated with protein functional diversity and with pathological conditions including cancer. Identification of methylation binding sites facilitates a better understanding of the molecular function of proteins. Recent developments in deep neural networks have led to a proliferation of deep learning-based methylation identification studies, because such methods predict quickly and accurately. In this paper, we propose DeepGpgs, an advanced deep learning model incorporating a Gaussian prior and a gated attention mechanism. We introduce a residual network channel to extract the evolutionary information of proteins. Then we combine adaptive embedding with bidirectional long short-term memory networks to form a context-shared encoder layer. This is followed by a gated multi-head attention mechanism that captures global information about the sequence. A Gaussian prior is injected into the sequence to assist in predicting PTMs. We also propose a weighted joint loss function to alleviate the false negative problem. We empirically show that DeepGpgs improves the Matthews correlation coefficient by 6.3% on the arginine methylation independent test set compared with existing state-of-the-art methylation site prediction methods. Furthermore, DeepGpgs is robust in phosphorylation site prediction for SARS-CoV-2, which indicates that it has good transferability and the potential to be extended to predicting other modification sites. The open-source code and data of DeepGpgs can be obtained from https://github.com/saizhou1/DeepGpgs.
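The headline metric here is the Matthews correlation coefficient (MCC). It suits imbalanced site-prediction data because it uses all four cells of the confusion matrix. The abstract does not say whether the 6.3% gain is absolute or relative, so the sketch below only shows how MCC itself is computed from binary counts. The function name and example counts are illustrative.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when a row or column of the confusion
    matrix is empty, a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for an imbalanced site-prediction test set.
print(round(matthews_corrcoef(tp=40, tn=900, fp=30, fn=30), 3))  # prints 0.539
```

With these counts, accuracy would be 94%, yet MCC is only about 0.54. This gap is why MCC is preferred when positive sites are rare.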
Affiliation(s)
- Haiwei Zhou
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Wenxi Tan
- School of Mathematical Sciences, Fudan University, Shanghai 200433, China
- Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China

30
Single-cell and bulk RNA sequencing identifies T cell marker genes score to predict the prognosis of pancreatic ductal adenocarcinoma. Sci Rep 2023; 13:3684. [PMID: 36878969 PMCID: PMC9988929 DOI: 10.1038/s41598-023-30972-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/24/2022] [Accepted: 03/03/2023] [Indexed: 03/08/2023]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal malignancies, and few biomarkers have been identified to predict its prognosis and its response to immune checkpoint blockade (ICB). This study aimed to explore the ability of a T cell marker genes score (TMGS) to predict patients' overall survival (OS) and treatment response to ICB by integrating single-cell RNA sequencing (scRNA-seq) and bulk RNA-seq data. Multi-omics data of PDAC were used in this study. Uniform manifold approximation and projection (UMAP) was used for dimensionality reduction and cluster identification. The non-negative matrix factorization (NMF) algorithm was applied for molecular subtype clustering. Least Absolute Shrinkage and Selection Operator (LASSO)-Cox regression was adopted to construct the TMGS. The prognosis, biological characteristics, mutation profile, and immune function status of the different groups were compared. Two molecular subtypes were identified via NMF: proliferative PDAC (C1) and immune PDAC (C2). These subtypes showed distinct prognoses and biological characteristics. The TMGS was developed from 10 T cell marker genes (TMGs) through LASSO-Cox regression and is an independent prognostic factor for OS in PDAC. Enrichment analysis indicated that cell cycle and cell proliferation-related pathways are significantly enriched in the high-TMGS group. In addition, high TMGS is associated with more frequent KRAS, TP53, and CDKN2A germline mutations than low TMGS. Furthermore, high TMGS is significantly associated with attenuated antitumor immunity and reduced immune cell infiltration compared with low TMGS. However, high TMGS is also correlated with a higher tumor mutation burden (TMB), lower expression of inhibitory immune checkpoint molecules, and a lower immune dysfunction score, and thus with a higher ICB response rate. Conversely, low TMGS is associated with a favorable response to chemotherapeutic agents and targeted therapy.
By combining scRNA-seq and bulk RNA-seq data, we identified a novel biomarker, TMGS, which has remarkable performance in predicting the prognosis and guiding the treatment pattern for patients with PDAC.
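A LASSO-Cox signature score such as the TMGS is typically a weighted sum of gene expression values, with the weights taken from the fitted Cox coefficients. Patients are then split into high and low groups. The abstract gives neither the 10 genes, their weights, nor the cut-off. In the minimal sketch below, the gene names, coefficients, cohort values, and median cut-off are therefore all hypothetical.

```python
from statistics import median
from typing import Dict

# HYPOTHETICAL signature: the paper's 10 genes and LASSO-Cox weights are not in the abstract.
COEFFICIENTS: Dict[str, float] = {"GENE_A": 0.42, "GENE_B": -0.18, "GENE_C": 0.07}


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Linear Cox predictor: sum of coefficient x expression over signature genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def stratify(cohort: Dict[str, Dict[str, float]], coefficients: Dict[str, float]) -> Dict[str, str]:
    """Split samples into 'high' and 'low' groups at the median score (assumed cut-off)."""
    scores = {sample: risk_score(expr, coefficients) for sample, expr in cohort.items()}
    cutoff = median(scores.values())
    return {sample: "high" if s > cutoff else "low" for sample, s in scores.items()}


cohort = {
    "P1": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 0.0},
    "P2": {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 0.0},
    "P3": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
}
print(stratify(cohort, COEFFICIENTS))
```

Because LASSO shrinks most coefficients to exactly zero, only the surviving genes (10 in the study) contribute to the sum.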

31
Enhanced Preprocessing Approach Using Ensemble Machine Learning Algorithms for Detecting Liver Disease. Biomedicines 2023; 11:581. [PMID: 36831118 PMCID: PMC9953600 DOI: 10.3390/biomedicines11020581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/18/2023]
Abstract
Liver disease has increased sharply worldwide, and many people are dying without even knowing that they have it. Because it produces few symptoms, liver disease is extremely difficult to detect before its final stage. Early detection allows patients to begin treatment sooner, which can save lives. Ensemble learning algorithms have become increasingly popular because they perform better than traditional machine learning algorithms. In this context, this paper proposes a novel architecture based on ensemble learning and enhanced preprocessing to predict liver disease using the Indian Liver Patient Dataset (ILPD). Six ensemble learning algorithms are applied to the ILPD, and their results are compared with those of existing studies. To improve accuracy, the proposed model uses several data preprocessing methods, such as data balancing, feature scaling, and feature selection, together with appropriate imputation. Multivariate imputation is applied to fill in missing values. A log1p transformation was applied to skewed columns, along with standardization, min-max scaling, maximum absolute scaling, and robust scaling. Features are selected using several methods, including univariate selection, feature importance, and a correlation matrix. The preprocessed data are used to train Gradient Boosting, XGBoost, Bagging, Random Forest, Extra Trees, and Stacking ensemble learning algorithms. The results of the six models were compared with each other, as well as with the models used in other research works. The proposed models using the Extra Trees and Random Forest classifiers outperformed the other methods, with the highest testing accuracies of 91.82% and 86.06%, respectively, suggesting that the method is a practical real-world solution for detecting liver disease.
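The abstract names the column transforms but not their formulas or settings. Below is a minimal plain-Python sketch of the standard definitions. The skewness threshold that triggers log1p and all function names are assumptions, not the authors' choices. In practice, scikit-learn's `StandardScaler`, `MinMaxScaler`, `MaxAbsScaler`, and `RobustScaler` provide the same scalings and should be fit on the training split only.

```python
import math
from statistics import mean, median, pstdev, quantiles
from typing import List

SKEW_THRESHOLD = 1.0  # ASSUMED cut-off for a "skewed" column; not given in the abstract


def skewness(xs: List[float]) -> float:
    """Population skewness: mean cubed deviation divided by sd**3."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        return 0.0
    return mean((x - mu) ** 3 for x in xs) / sd ** 3


def maybe_log1p(xs: List[float]) -> List[float]:
    """Apply log(1 + x) only to strongly right-skewed columns (values must be > -1)."""
    return [math.log1p(x) for x in xs] if skewness(xs) > SKEW_THRESHOLD else list(xs)


def standardize(xs: List[float]) -> List[float]:
    """Zero mean, unit (population) standard deviation."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd if sd else 0.0 for x in xs]


def min_max(xs: List[float]) -> List[float]:
    """Rescale to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def max_abs(xs: List[float]) -> List[float]:
    """Divide by the largest absolute value, giving the range [-1, 1]."""
    m = max(abs(x) for x in xs)
    return [x / m if m else 0.0 for x in xs]


def robust(xs: List[float]) -> List[float]:
    """Center on the median and divide by the interquartile range (needs >= 2 values)."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(xs)
    return [(x - med) / iqr if iqr else 0.0 for x in xs]


# Hypothetical right-skewed lab column with one extreme value.
column = [187.0, 160.0, 490.0, 182.0, 195.0, 208.0, 154.0, 2110.0]
print([round(v, 2) for v in robust(maybe_log1p(column))])
```

Robust scaling uses the median and IQR, so a single extreme value like 2110 barely moves the center and spread of the other values.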

32
Pavón-Pulido N, Blasco-García JD, López-Riquelme JA, Feliu-Batlle J, Oterino-Bono R, Herrero MT. JUNO Project: Deployment and Validation of a Low-Cost Cloud-Based Robotic Platform for Reliable Smart Navigation and Natural Interaction with Humans in an Elderly Institution. Sensors (Basel) 2023; 23:483. [PMID: 36617079 PMCID: PMC9824260 DOI: 10.3390/s23010483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 12/23/2022] [Accepted: 12/29/2022] [Indexed: 06/17/2023]
Abstract
This paper describes the main results of the JUNO project, a proof of concept developed in the Region of Murcia, Spain. In this project, a smart assistant robot capable of smart navigation and natural human interaction has been developed and deployed, and it is being validated in an elderly care institution with real elderly users. The robot focuses on helping people carry out cognitive stimulation exercises and other entertainment activities. It can detect and recognize people, navigate safely through the residence, and acquire information about users' attention while they do these exercises. All of this information can be shared through the Cloud if needed, and health professionals, caregivers, and relatives can access it under the highest standards of privacy required in these environments. Several tests have been performed to validate the system. It combines classic techniques and new Deep Learning-based methods to carry out the required tasks, including semantic navigation, face detection and recognition, speech-to-text and text-to-speech conversion, and natural language processing. It works in both local and Cloud-based environments, resulting in an economically affordable system. The paper also discusses the limitations of the platform and proposes several solutions to the drawbacks detected in this kind of complex environment, where the fragility of users must also be considered.
Affiliation(s)
- Nieves Pavón-Pulido
- Automation, Electrical Engineering and Electronic Technology Department, Industrial Engineering Technical School, Technical University of Cartagena, 30202 Cartagena, Spain
- Jesús Damián Blasco-García
- Clinical and Experimental Neuroscience (NiCE), Institute for Aging Research, Biomedical Institute for Bio-Health Research of Murcia (IMIB-Arrixaca), School of Medicine, University of Murcia, Campus Mare Nostrum, 30120 Murcia, Spain
- Juan Antonio López-Riquelme
- Automation, Electrical Engineering and Electronic Technology Department, Industrial Engineering Technical School, Technical University of Cartagena, 30202 Cartagena, Spain
- Jorge Feliu-Batlle
- Automation, Electrical Engineering and Electronic Technology Department, Industrial Engineering Technical School, Technical University of Cartagena, 30202 Cartagena, Spain
- Roberto Oterino-Bono
- Automation, Electrical Engineering and Electronic Technology Department, Industrial Engineering Technical School, Technical University of Cartagena, 30202 Cartagena, Spain
- María Trinidad Herrero
- Clinical and Experimental Neuroscience (NiCE), Institute for Aging Research, Biomedical Institute for Bio-Health Research of Murcia (IMIB-Arrixaca), School of Medicine, University of Murcia, Campus Mare Nostrum, 30120 Murcia, Spain