1
Khandelwal M, Kumar Rout R. DeepPRMS: advanced deep learning model to predict protein arginine methylation sites. Brief Funct Genomics 2024:elae001. [PMID: 38267081 DOI: 10.1093/bfgp/elae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 11/17/2023] [Accepted: 01/03/2024] [Indexed: 01/26/2024]
Abstract
Protein methylation is a post-translational modification that is crucial for various cellular processes, including transcriptional activity and DNA repair. Correctly predicting protein methylation sites is fundamental for research and drug discovery. Experimental techniques such as methyl-specific antibodies, chromatin immunoprecipitation and mass spectrometry can identify protein methylation sites, but they are time-consuming and costly. Predicting methylation sites in silico may help researchers identify candidate sites for further examination and make site-specific investigations and downstream characterization easier to carry out. In this research, we propose a novel deep learning-based predictor, named DeepPRMS, to identify protein methylation sites in primary sequences. DeepPRMS uses a gated recurrent unit (GRU) to extract sequential information and a convolutional neural network (CNN) to extract spatial information from the primary sequences. The latent representations of the GRU and CNN models are combined so that the two kinds of information can interact. On the independent test data set, DeepPRMS obtained an accuracy of 85.32%, a specificity of 84.94%, a sensitivity of 85.80% and a Matthews correlation coefficient of 0.71. The results indicate that DeepPRMS predicts protein methylation sites with high accuracy and outperforms state-of-the-art models. DeepPRMS is expected to effectively guide future experiments for identifying potential methylated protein sites. The web server is available at http://deepprms.nitsri.ac.in/.
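The accuracy, specificity, sensitivity and Matthews correlation coefficient quoted in this abstract are all functions of the confusion matrix on a test set. The following is a minimal sketch of how such metrics are conventionally computed; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the DeepPRMS paper.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives (hypothetical inputs here).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true sites that were predicted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only, not results from the paper.
metrics = confusion_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside sensitivity and specificity when positive and negative sites are imbalanced.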
Affiliation(s)
- Monika Khandelwal
  - Computer Science & Engineering, National Institute of Technology Srinagar, Hazratbal, Srinagar 190006, Jammu and Kashmir, India
- Ranjeet Kumar Rout
  - Computer Science & Engineering, National Institute of Technology Srinagar, Hazratbal, Srinagar 190006, Jammu and Kashmir, India
2
Khanna NN, Singh M, Maindarkar M, Kumar A, Johri AM, Mentella L, Laird JR, Paraskevas KI, Ruzsa Z, Singh N, Kalra MK, Fernandes JFE, Chaturvedi S, Nicolaides A, Rathore V, Singh I, Teji JS, Al-Maini M, Isenovic ER, Viswanathan V, Khanna P, Fouda MM, Saba L, Suri JS. Polygenic Risk Score for Cardiovascular Diseases in Artificial Intelligence Paradigm: A Review. J Korean Med Sci 2023; 38:e395. [PMID: 38013648 PMCID: PMC10681845 DOI: 10.3346/jkms.2023.38.e395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 10/15/2023] [Indexed: 11/29/2023]
Abstract
Cardiovascular disease (CVD)-related mortality and morbidity heavily strain society. The relationship between external risk factors and our genetics has not been well established. It is widely acknowledged that environmental influences and individual behaviours play a significant role in CVD vulnerability, leading to the development of polygenic risk scores (PRS). We employed the PRISMA search method to locate pertinent research and literature and to extensively review artificial intelligence (AI)-based PRS models for CVD risk prediction. Furthermore, we analyzed and compared conventional vs. AI-based solutions for PRS. We summarized the recent advances in our understanding of the use of AI-based PRS for risk prediction of CVD. Our study proposes three hypotheses: i) Multiple genetic variations and risk factors can be incorporated into AI-based PRS to improve the accuracy of CVD risk prediction. ii) AI-based PRS for CVD circumvents the drawbacks of conventional PRS calculators by incorporating a larger variety of genetic and non-genetic components, allowing for more precise and individualised risk estimations. iii) Using AI approaches, it is possible to significantly reduce the dimensionality of huge genomic datasets, resulting in more accurate and effective disease risk prediction models. Our study highlighted that the AI-PRS model outperformed traditional PRS calculators in predicting CVD risk. Furthermore, using AI-based methods to calculate PRS may increase the precision of risk predictions for CVD and have significant ramifications for individualized prevention and treatment plans.
Affiliation(s)
- Narendra N Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
  - Asia Pacific Vascular Society, New Delhi, India
- Manasvi Singh
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
  - Bennett University, Greater Noida, India
- Mahesh Maindarkar
  - Asia Pacific Vascular Society, New Delhi, India
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
  - School of Bioengineering Sciences and Research, Maharashtra Institute of Technology's Art, Design and Technology University, Pune, India
- Amer M Johri
  - Department of Medicine, Division of Cardiology, Queen's University, Kingston, Canada
- Laura Mentella
  - Department of Medicine, Division of Cardiology, University of Toronto, Toronto, Canada
- John R Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St. Helena, CA, USA
- Zoltan Ruzsa
  - Invasive Cardiology Division, University of Szeged, Szeged, Hungary
- Narpinder Singh
  - Department of Food Science and Technology, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- Seemant Chaturvedi
  - Department of Neurology & Stroke Program, University of Maryland, Baltimore, MD, USA
- Andrew Nicolaides
  - Vascular Screening and Diagnostic Centre and University of Nicosia Medical School, Cyprus
- Vijay Rathore
  - Nephrology Department, Kaiser Permanente, Sacramento, CA, USA
- Inder Singh
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
- Jagjit S Teji
  - Ann and Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
- Mostafa Al-Maini
  - Allergy, Clinical Immunology and Rheumatology Institute, Toronto, ON, Canada
- Esma R Isenovic
  - Department of Radiobiology and Molecular Genetics, National Institute of The Republic of Serbia, University of Belgrade, Beograd, Serbia
- Puneet Khanna
  - Department of Anaesthesiology, AIIMS, New Delhi, India
- Mostafa M Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, USA
- Luca Saba
  - Department of Radiology, Azienda Ospedaliero Universitaria, Cagliari, Italy
- Jasjit S Suri
  - Asia Pacific Vascular Society, New Delhi, India
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
  - Department of Computer Engineering, Graphic Era Deemed to be University, Dehradun, India
3
Khandelwal M, Rout RK. PRMxAI: protein arginine methylation sites prediction based on amino acid spatial distribution using explainable artificial intelligence. BMC Bioinformatics 2023; 24:376. [PMID: 37794362 PMCID: PMC10548713 DOI: 10.1186/s12859-023-05491-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND Protein methylation, a post-translational modification, is crucial in regulating various cellular functions. Studying arginine methylation is necessary to understand crucial biochemical activities and biological functions, such as gene regulation and signal transduction. Experimental methods, including ChIP-chip, mass spectrometry, and methylation-specific antibodies, exist for identifying methylated proteins, but they are expensive and tedious. As a result, computational methods based on machine learning play an efficient role in predicting arginine methylation sites. RESULTS In this research, a novel method called PRMxAI is proposed to predict arginine methylation sites. PRMxAI extracts sequence-based features, such as dipeptide composition, physicochemical properties, amino acid composition, and information theory-based features (Arimoto, Havrda-Charvat, Renyi, and Shannon entropy), to represent the protein sequences in numerical form. Several machine learning algorithms (decision trees, naive Bayes, random forest, support vector machines, and k-nearest neighbors) are compared to select the best classifier. The random forest algorithm is selected as the underlying classifier for the PRMxAI model. The performance of PRMxAI is evaluated with 10-fold cross-validation, and it yields 87.17% and 90.40% accuracy on the mono-methylarginine and di-methylarginine data sets, respectively. This research also examines the impact of various features on both data sets using explainable artificial intelligence. CONCLUSIONS The proposed PRMxAI shows the effectiveness of the features for predicting arginine methylation sites. Additionally, the SHapley Additive exPlanation method is used to interpret the predictive mechanism of the proposed model. The results indicate that the proposed PRMxAI model outperforms other state-of-the-art predictors.
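Two of the feature families named in this abstract, amino acid composition and entropy-based features, can be illustrated with the textbook definitions. The sketch below is an assumption-laden illustration, not the PRMxAI implementation: the function names, the order `alpha = 2` for the Renyi entropy and the example peptide are all hypothetical, and the paper may normalize or window sequences differently.

```python
import math
from collections import Counter

# The 20 standard amino acids in a fixed (arbitrary) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Relative frequency of each standard amino acid in the sequence."""
    counts = Counter(seq)
    n = len(seq)
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]


def shannon_entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)) over non-zero p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (alpha != 1), in bits."""
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy")
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


def sequence_features(seq):
    """Concatenate composition and entropy values into one feature vector."""
    comp = amino_acid_composition(seq)
    return comp + [shannon_entropy(comp), renyi_entropy(comp)]


# Hypothetical peptide window centred on an arginine (R).
features = sequence_features("GSGRGGFGG")
print(len(features))
```

A vector of this kind, extended with dipeptide and physicochemical features, is the sort of fixed-length numerical input that a random forest classifier can consume.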
Affiliation(s)
- Monika Khandelwal
  - Computer Science and Engineering Department, National Institute of Technology Srinagar, Hazratbal, Srinagar, J&K 190006 India
- Ranjeet Kumar Rout
  - Computer Science and Engineering Department, National Institute of Technology Srinagar, Hazratbal, Srinagar, J&K 190006 India
4
Rout RK, Umer S, Khandelwal M, Pati S, Mallik S, Balabantaray BK, Qin H. Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification. Front Genet 2023; 14:1154120. [PMID: 37152988 PMCID: PMC10156977 DOI: 10.3389/fgene.2023.1154120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/30/2023] [Accepted: 04/04/2023] [Indexed: 05/09/2023]
Abstract
Introduction: Essential genes are indispensable for the survival of various species. These genes are a family linked to critical cellular activities for species survival. They code for proteins that regulate central metabolism, gene translation, deoxyribonucleic acid replication, and fundamental cellular structure and that facilitate intracellular and extracellular transport. Essential genes preserve crucial genomic information that may hold the key to a detailed knowledge of life and evolution. Essential gene studies have long been regarded as a vital topic in computational biology due to their relevance. An essential gene is composed of adenine, guanine, cytosine, and thymine and their various combinations. Methods: This paper presents a novel method of extracting information on the stationary patterns of nucleotides such as adenine, guanine, cytosine, and thymine in each gene. For this purpose, co-occurrence matrices are derived that provide the statistical distribution of stationary patterns of nucleotides in the genes, which is helpful in establishing the relationship between the nucleotides. To extract discriminant features, energy, entropy, homogeneity, contrast, and dissimilarity are computed from each co-occurrence matrix and then concatenated across all matrices to form a feature vector representing each essential gene. Finally, supervised machine learning algorithms are applied for essential gene classification based on the extracted fixed-dimensional feature vectors. Results: For comparison, some existing state-of-the-art feature representation techniques such as Shannon entropy (SE), Hurst exponent (HE), fractal dimension (FD), and their combinations have been utilized. Discussion: An extensive experiment has been performed on classifying the essential genes of five species, showing the robustness and effectiveness of the proposed methodology.
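The Methods summary describes deriving nucleotide co-occurrence matrices and computing energy, entropy, homogeneity, contrast, and dissimilarity from each. A minimal sketch of that idea follows, using the standard texture-feature definitions from image analysis. The paper's exact definition of a "stationary pattern" is not given in the abstract, so the pairing rule (bases separated by a fixed `offset`), the base ordering `ACGT` and the function names are assumptions made for illustration.

```python
import math

# Assumed base ordering; contrast and dissimilarity depend on this choice.
BASES = "ACGT"


def cooccurrence_matrix(seq, offset=1):
    """Normalized 4x4 matrix of base pairs found `offset` positions apart."""
    index = {b: i for i, b in enumerate(BASES)}
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq, seq[offset:]):
        if a in index and b in index:  # skip ambiguous symbols such as N
            counts[index[a]][index[b]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * 4 for _ in range(4)]
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Five descriptors of a normalized co-occurrence matrix p."""
    cells = [(i, j, p[i][j]) for i in range(4) for j in range(4)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
    }


def gene_feature_vector(seq, offsets=(1, 2, 3)):
    """Concatenate the five features over several offsets (hypothetical set)."""
    vector = []
    for d in offsets:
        feats = texture_features(cooccurrence_matrix(seq, d))
        vector.extend(feats.values())
    return vector


print(len(gene_feature_vector("ATGGCGTACGTTAGC")))
```

Because every gene yields the same number of values regardless of its length, the resulting vectors can be passed directly to a supervised classifier.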
Affiliation(s)
- Ranjeet Kumar Rout
  - National Institute of Technology Srinagar, Hazratbal, Jammu and Kashmir, India
- Saiyed Umer
  - Aliah University, Kolkata, West Bengal, India
- Monika Khandelwal
  - National Institute of Technology Srinagar, Hazratbal, Jammu and Kashmir, India
- Smitarani Pati
  - Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India
- Saurav Mallik
  - Harvard T H Chan School of Public Health, Boston, United States
  - Department of Pharmacology and Toxicology, University of Arizona, Tucson, AZ, United States
- Hong Qin
  - Department of Computer Science and Engineering, University of Tennessee at Chattanooga, Chattanooga, TN, United States

*Correspondence: Saurav Mallik; Hong Qin