1. Guo C, Wang X, Ren H. Databases and computational methods for the identification of piRNA-related molecules: A survey. Comput Struct Biotechnol J 2024; 23:813-833. [PMID: 38328006 PMCID: PMC10847878 DOI: 10.1016/j.csbj.2024.01.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/31/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024] Open
Abstract
Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs (ncRNAs) that play important roles in many biological processes and in the diagnosis and treatment of major cancers, and have thus become a hot research topic. This study aims to provide an in-depth review of computational piRNA-related research, including databases and computational models. Herein, we perform literature analysis and use comparative evaluation methods to summarize and analyze three aspects of computational piRNA-related research: (i) computational models for piRNA-related molecular identification tasks, (ii) computational models for piRNA-disease association prediction tasks, and (iii) computational resources and evaluation metrics for these tasks. This study shows that computational piRNA-related research has progressed significantly and exhibited promising performance in recent years, although it also suffers from the emerging challenges of inconsistent naming systems and a lack of data. Unlike other reviews on piRNA-related identification tasks, which focus on the organization of datasets and computational methods, we pay more attention to the analysis of computational models, algorithms, and performance, aiming to provide valuable references for computational piRNA-related identification tasks. This study will benefit the theoretical development and practical application of piRNAs by improving the understanding of the computational models and resources used to investigate the biological functions and clinical implications of piRNAs.
Affiliation(s)
- Chang Guo
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
- Xiaoli Wang
  - Institute of Reproductive Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Han Ren
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
  - Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou 510420, China
2. Chaudhary U, Banerjee S. Decoding the Non-coding: Tools and Databases Unveiling the Hidden World of "Junk" RNAs for Innovative Therapeutic Exploration. ACS Pharmacol Transl Sci 2024; 7:1901-1915. [PMID: 39022352 PMCID: PMC11249652 DOI: 10.1021/acsptsci.3c00388] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/25/2023] [Revised: 05/15/2024] [Accepted: 05/27/2024] [Indexed: 07/20/2024]
Abstract
Non-coding RNAs are pivotal regulators of gene and protein expression, exerting crucial influences on diverse biological processes. Their dysregulation is frequently implicated in the onset and progression of diseases, notably cancer. A profound comprehension of the intricate mechanisms governing ncRNAs is imperative for devising innovative therapeutic interventions against these debilitating conditions. Significantly, nearly 80% of our genome comprises ncRNAs, underscoring their centrality in cellular processes. The elucidation of ncRNA functions is pivotal for grasping the complexities of gene regulation and its implications for human health. Modern genome sequencing techniques yield vast datasets, stored in specialized databases. To harness this wealth of information and to understand the crosstalk of non-coding RNAs, knowledge of available databases is required, and many new sophisticated computational tools have emerged. These tools play a pivotal role in the identification, prediction, and annotation of ncRNAs, thereby facilitating their experimental validation. This Review succinctly outlines the current understanding of ncRNAs, emphasizing their involvement in disease development. It also highlights the databases and tools instrumental in classifying, annotating, and evaluating ncRNAs. By extracting meaningful biological insights from seemingly "junk" data, these tools empower scientists to unravel the intricate roles of ncRNAs in shaping human health.
Affiliation(s)
- Uma Chaudhary
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014, India
- Satarupa Banerjee
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014, India
3. Adnan A, Hongya W, Ali F, Khalid M, Alghushairy O, Alsini R. A bi-layer model for identification of piwiRNA using deep neural learning. J Biomol Struct Dyn 2024; 42:5725-5733. [PMID: 37608578 DOI: 10.1080/07391102.2023.2243523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 06/15/2023] [Indexed: 08/24/2023]
Abstract
piwiRNA is a kind of non-coding RNA (ncRNA) that cannot be translated into proteins. It helps in understanding gamete generation and the regulation of gene expression at both the transcriptional and post-transcriptional levels. piwiRNA has the function of instructing deadenylation, animal fertility, silencing transposons, fighting viruses, and regulating endogenous genes. Due to the great significance of piwiRNA, prediction of piwiRNA is essential for crucial cellular functions. Several predictors have been established for prediction of piwiRNA. However, improving the prediction of piwiRNA is highly desirable. In the current study, we developed a more promising predictor named BLP-piwiRNA. The features are explored by reverse complement k-mer, gapped k-mer composition, and k-mer composition. The feature sets of all descriptors are fused and the best features are selected by cascade and relief feature selection strategies. The best feature sets are provided to random forest (RF), deep neural network (DNN), and support vector machine (SVM). Model validation is examined by a 10-fold test. DNN with the optimal features of the cascade feature selection approach secured the highest prediction results. The results illustrate that BLP-piwiRNA effectively outperforms the existing studies. The proposed approach would be beneficial for both the research community and the drug development industry. BLP-piwiRNA would serve as novel biomarkers and therapeutic targets for tumor diagnostics and treatment. Communicated by Ramaswamy H. Sarma.
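As an illustration of two of the descriptors named in the abstract above, the following is a minimal sketch of k-mer composition and reverse complement k-mer composition for an RNA sequence. It is not the BLP-piwiRNA implementation: the function names, the ACGU alphabet, and the normalisation by window count are assumptions made for this example, and the gapped k-mer descriptor and the feature selection steps are not shown.

```python
from collections import Counter
from itertools import product

# Watson-Crick pairing for RNA (assumed alphabet: A, C, G, U, upper case).
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def kmer_composition(seq, k):
    """Return the normalised frequency of every possible k-mer in seq."""
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        "".join(p): counts["".join(p)] / windows
        for p in product("ACGU", repeat=k)
    }


def reverse_complement(kmer):
    """Reverse the k-mer and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def rc_kmer_composition(seq, k):
    """Merge each k-mer with its reverse complement into a single feature.

    The lexicographically smaller of the pair is used as the feature name.
    """
    merged = {}
    for kmer, freq in kmer_composition(seq, k).items():
        key = min(kmer, reverse_complement(kmer))
        merged[key] = merged.get(key, 0.0) + freq
    return merged
```

For k = 2 the plain composition has 16 features, while the reverse complement variant has 10, because the 12 non-palindromic dinucleotides collapse into 6 pairs. Fusing descriptors, as the abstract describes, would then amount to concatenating such feature vectors before selection.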
Affiliation(s)
- Adnan Adnan
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Wang Hongya
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Farman Ali
  - Department of Software Engineering, Sarhad University of Science and Information Technology, Peshawar, Pakistan
- Majdi Khalid
  - Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Omar Alghushairy
  - Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Raed Alsini
  - Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
4. Khan S, Uddin I, Khan M, Iqbal N, Alshanbari HM, Ahmad B, Khan DM. Sequence based model using deep neural network and hybrid features for identification of 5-hydroxymethylcytosine modification. Sci Rep 2024; 14:9116. [PMID: 38643305 DOI: 10.1038/s41598-024-59777-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/15/2024] [Indexed: 04/22/2024] Open
Abstract
RNA modifications are pivotal in the development of newly synthesized structures, showcasing a vast array of alterations across various RNA classes. Among these, 5-hydroxymethylcytosine (5HMC) stands out, playing a crucial role in gene regulation and epigenetic changes, yet its detection through conventional methods proves cumbersome and costly. To address this, we propose Deep5HMC, a robust learning model leveraging machine learning algorithms and discriminative feature extraction techniques for accurate 5HMC sample identification. Our approach integrates seven feature extraction methods and various machine learning algorithms, including Random Forest, Naive Bayes, Decision Tree, and Support Vector Machine. Through K-fold cross-validation, our model achieved a notable 84.07% accuracy rate, surpassing previous models by 7.59%, signifying its potential in early cancer and cardiovascular disease diagnosis. This study underscores the promise of Deep5HMC in offering insights for improved medical assessment and treatment protocols, marking a significant advancement in RNA modification analysis.
Affiliation(s)
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Islam Uddin
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mukhtaj Khan
  - Department of Information Technology, The University of Haripur, Haripur, Pakistan
- Nadeem Iqbal
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Huda M Alshanbari
  - Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Bakhtiyar Ahmad
  - Higher Education Department Afghanistan, Kabul, Afghanistan
- Dost Muhammad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
5. Ahmed SH, Bose DB, Khandoker R, Rahman MS. StackDPP: a stacking ensemble based DNA-binding protein prediction model. BMC Bioinformatics 2024; 25:111. [PMID: 38486135 PMCID: PMC10941422 DOI: 10.1186/s12859-024-05714-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 02/20/2024] [Indexed: 03/17/2024] Open
Abstract
BACKGROUND DNA-binding proteins (DNA-BPs) are the proteins that bind and interact with DNA. DNA-BPs regulate and affect numerous biological processes, such as, transcription and DNA replication, repair, and organization of the chromosomal DNA. Very few proteins, however, are DNA-binding in nature. Therefore, it is necessary to develop an efficient predictor for identifying DNA-BPs. RESULT In this work, we have proposed new benchmark datasets for the DNA-binding protein prediction problem. We discovered several quality concerns with the widely used benchmark datasets, PDB1075 (for training) and PDB186 (for independent testing), which necessitated the preparation of new benchmark datasets. Our proposed datasets UNIPROT1424 and UNIPROT356 can be used for model training and independent testing respectively. We have retrained selected state-of-the-art DNA-BP predictors in the new dataset and reported their performance results. We also trained a novel predictor using the new benchmark dataset. We extracted features from various feature categories, then used a Random Forest classifier and Recursive Feature Elimination with Cross-validation (RFECV) to select the optimal set of 452 features. We then proposed a stacking ensemble architecture as our final prediction model. Named Stacking Ensemble Model for DNA-binding Protein Prediction, or StackDPP in short, our model achieved 0.92, 0.92 and 0.93 accuracy in 10-fold cross-validation, jackknife and independent testing respectively. CONCLUSION StackDPP has performed very well in cross-validation testing and has outperformed all the state-of-the-art prediction models in independent testing. Its performance scores in cross-validation testing generalized very well in the independent test set. The source code of the model is publicly available at https://github.com/HasibAhmed1624/StackDPP . Therefore, we expect this generalized model can be adopted by researchers and practitioners to identify novel DNA-binding proteins.
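The abstract above reports accuracy under 10-fold cross-validation and jackknife testing. The following is a minimal sketch of how the train/test index splits for those two protocols can be generated. It is an illustration only, not code from the StackDPP repository: the function names are hypothetical, and the folds are contiguous and unshuffled, whereas in practice the samples would normally be shuffled (and often stratified by class) first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal contiguous folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_indices(n_samples):
    """Jackknife (leave-one-out): each sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)
```

A model would be retrained on each `train` list and scored on the matching `test` list, with the reported accuracy being the average (or pooled) result over all folds. Independent testing differs in that the test set is held out entirely and never used during training or model selection.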
Affiliation(s)
- Sheikh Hasib Ahmed
  - Department of CSE, BUET, ECE Building, West Palashi, Dhaka, 1000, Bangladesh
- Rafi Khandoker
  - Department of CSE, BUET, ECE Building, West Palashi, Dhaka, 1000, Bangladesh
- M Saifur Rahman
  - Department of CSE, BUET, ECE Building, West Palashi, Dhaka, 1000, Bangladesh
6. Shomali A, Vafaei Sadi MS, Bakhtiarizadeh MR, Aliniaeifard S, Trewavas A, Calvo P. Identification of intelligence-related proteins through a robust two-layer predictor. Commun Integr Biol 2022; 15:253-264. [DOI: 10.1080/19420889.2022.2143101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Affiliation(s)
- Aida Shomali
  - Department of Horticulture, College of Aburaihan, University of Tehran, Tehran, Iran
- Sasan Aliniaeifard
  - Department of Horticulture, College of Aburaihan, University of Tehran, Tehran, Iran
- Anthony Trewavas
  - School of Biological Sciences, Institute of Molecular Plant Science, University of Edinburgh, UK
- Paco Calvo
  - Minimal Intelligence Lab, University of Murcia, Spain
7. Gu X, Ding Y, Xiao P, He T. A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins. Front Genet 2022; 13:935717. [PMID: 36506312 PMCID: PMC9727185 DOI: 10.3389/fgene.2022.935717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 11/02/2022] [Indexed: 11/24/2022] Open
Abstract
There is a great deal of importance to SNARE proteins, and their absence from function can lead to a variety of diseases. The SNARE protein is known as a membrane fusion protein, and it is crucial for mediating vesicle fusion. The identification of SNARE proteins must therefore be conducted with an accurate method. Through extensive experiments, we have developed a model based on graph-regularized k-local hyperplane distance nearest neighbor model (GHKNN) binary classification. In this, the model uses the physicochemical property extraction method to extract protein sequence features and the SMOTE method to upsample protein sequence features. The combination achieves the most accurate performance for identifying all protein sequences. Finally, we compare the model based on GHKNN binary classification with other classifiers and measure them using four different metrics: SN, SP, ACC, and MCC. In experiments, the model performs significantly better than other classifiers.
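The four metrics named in the abstract above (SN, SP, ACC, and MCC) are standard functions of the binary confusion matrix. The sketch below computes them from raw counts; the function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example, not details taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from counts.

    tp/fp/tn/fn are the true-positive, false-positive, true-negative and
    false-negative counts of a binary classifier.
    """
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)     # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}
```

MCC is often the most informative of the four on imbalanced data, which is relevant here because the abstract mentions SMOTE upsampling: accuracy alone can look high when one class dominates, whereas MCC only approaches 1 when both classes are predicted well.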
Affiliation(s)
- Xingyue Gu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Pengfeng Xiao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Tao He
  - Beidahuang Industry Group General Hospital, Harbin, China
8. Chen B, Shi Y, Li J, Zhai J, Liu L, Liu W, Hu L, Zhao Y. Tissue Recognition Based on Electrical Impedance Classified by Support Vector Machine in Spinal Operation Area. Orthop Surg 2022; 14:2276-2285. [PMID: 35913262 PMCID: PMC9483044 DOI: 10.1111/os.13406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 06/24/2022] [Accepted: 06/25/2022] [Indexed: 11/26/2022] Open
Abstract
OBJECTIVE One of the major difficulties in spinal surgery is the injury of important tissues caused by tissue misclassification, which is the source of surgical complications. Accurate recognition of the tissues is the key to increasing safety and effectiveness as well as to reducing the complications of spinal surgery. The study aimed at tissue recognition in the spinal operation area based on electrical impedance and the boundaries of electrical impedance between cortical bone, cancellous bone, spinal cord, muscle, and nucleus pulposus. METHODS Two female white swine with a body weight of 40 kg were used to expose cortical bone, cancellous bone, spinal cord, muscle, and nucleus pulposus under general anesthesia and aseptic conditions. The electrical impedance of these tissues at 12 frequencies (in the range of 10-100 kHz) was measured by an electrochemical analyzer with a specially designed probe, at 22.0-25.0°C and 50%-60% humidity. Two types of tissue recognition models - one combining principal component analysis (PCA) and support vector machine (SVM) and the other combining SVM and ensemble learning - were constructed, and the boundaries of electrical impedance of the five tissues at 12 frequencies of current were determined. Linear correlation, two-way ANOVA, and paired t-tests were conducted to analyze the relationship between the electrical impedance of different tissues at different frequencies. RESULTS The results suggest that the differences in electrical impedance mainly came from tissue type (p < 0.0001), and the electrical impedance of the five kinds of tissue was statistically different from each other (p < 0.0001). The tissue recognition accuracy of the algorithm based on principal component analysis and support vector machine ranged from 83% to 100%, and the overall accuracy was 95.83%. The classification accuracy of the algorithm based on support vector machine and ensemble learning was 100%, and the boundaries of electrical impedance of the five tissues at various frequencies were calculated. CONCLUSION The electrical impedance of cortical bone, cancellous bone, spinal cord, muscle, and nucleus pulposus differed significantly in the 10-100 kHz frequency range. The application of support vector machine achieved accurate tissue recognition in the spinal operation area based on electrical impedance, which is expected to be translated and applied to tissue recognition during spinal surgery.
Affiliation(s)
- Bingrong Chen
  - Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yongwang Shi
  - MD Program, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jiahao Li
  - Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jiliang Zhai
  - Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Liang Liu
  - China Astronaut Research and Training Center, Beijing, China
- Wenyong Liu
  - School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Lei Hu
  - School of Mechanical Engineering and Automation, Beihang University, Beijing, China
- Yu Zhao
  - Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
9. Zhuang Y, Liu X, Zhong Y, Wu L. A Deep Ensemble Predictor for Identifying Anti-Hypertensive Peptides Using Pretrained Protein Embedding. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1986-1992. [PMID: 33760739 DOI: 10.1109/tcbb.2021.3068381] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Hypertension (HT), or high blood pressure is one of the most common and main causes in cardiovascular diseases, which is also related to a series of detrimental diseases in humans. Deficiencies in effective treatment in HT are often associated with a series of diseases including multi-infarct dementia, amputation, and renal failure. Therefore, identifying anti-hypertension peptides has the vital realistic significance. Although many bioactive peptides have been developed to reduce blood pressure, they are time-consuming and laborious. In views of the obstacles of the intrinsic methods in antihypertensive peptide (AHTP) classification, computational methods are suggested as a supplement to identify AHTPs. In this study, we develop a comprehensive feature representation algorithm based on pretrained model and convolutional neural network and apply the deep ensemble model to construct the prediction model. The new predictor is used to identify AHTPs in benchmark and independent datasets. It has been shown in the independent test set that the performance is better than the recent methods. Comparative results indicate that our model can shed some light on hypertension therapy and gains more insights of classifying AHTPs. The implements and codes can be found in https://github.com/yuanying566/AHPred-DE.
10. Zhang T, Chen L, Li R, Liu N, Huang X, Wong G. PIWI-interacting RNAs in human diseases: databases and computational models. Brief Bioinform 2022; 23:6603448. [PMID: 35667080 DOI: 10.1093/bib/bbac217] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/28/2022] [Revised: 04/24/2022] [Accepted: 05/09/2022] [Indexed: 11/12/2022] Open
Abstract
PIWI-interacting RNAs (piRNAs) are short 21-35 nucleotide molecules that comprise the largest class of non-coding RNAs and found in a large diversity of species including yeast, worms, flies, plants and mammals including humans. The most well-understood function of piRNAs is to monitor and protect the genome from transposons particularly in germline cells. Recent data suggest that piRNAs may have additional functions in somatic cells although they are expressed there in far lower abundance. Compared with microRNAs (miRNAs), piRNAs have more limited bioinformatics resources available. This review collates 39 piRNA specific and non-specific databases and bioinformatics resources, describes and compares their utility and attributes and provides an overview of their place in the field. In addition, we review 33 computational models based upon function: piRNA prediction, transposon element and mRNA-related piRNA prediction, cluster prediction, signature detection, target prediction and disease association. Based on the collection of databases and computational models, we identify trends and potential gaps in tool development. We further analyze the breadth and depth of piRNA data available in public sources, their contribution to specific human diseases, particularly in cancer and neurodegenerative conditions, and highlight a few specific piRNAs that appear to be associated with these diseases. This briefing presents the most recent and comprehensive mapping of piRNA bioinformatics resources including databases, models and tools for disease associations to date. Such a mapping should facilitate and stimulate further research on piRNAs.
Affiliation(s)
- Tianjiao Zhang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Liang Chen
  - Department of Computer Science, School of Engineering, Shantou University, Shantou, China
- Rongzhen Li
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Ning Liu
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Xiaobing Huang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Garry Wong
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
11. Ali SD, Alam W, Tayara H, Chong KT. Identification of Functional piRNAs Using a Convolutional Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1661-1669. [PMID: 33119510 DOI: 10.1109/tcbb.2020.3034313] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 06/11/2023]
Abstract
Piwi-interacting RNAs (piRNAs) are a distinct sub-class of small non-coding RNAs that are mainly responsible for germline stem cell maintenance, gene stability, and maintaining genome integrity by repression of transposable elements. piRNAs are also expressed aberrantly and associated with various kinds of cancers. To identify piRNAs and their role in guiding target mRNA deadenylation, the currently available computational methods require urgent improvements in performance. To facilitate this, we propose a robust predictor based on a lightweight and simplified deep learning architecture using a convolutional neural network (CNN) to extract significant features from raw RNA sequences without the need for more customized features. The proposed model's performance is comprehensively evaluated using k-fold cross-validation on a benchmark dataset. The proposed model significantly outperforms existing computational methods in the prediction of piRNAs and their role in target mRNA deadenylation. In addition, a user-friendly and publicly-accessible web server is available at http://nsclbio.jbnu.ac.kr/tools/2S-piRCNN/.
12. Samami E, Pourali G, Arabpour M, Fanipakdel A, Shahidsales S, Javadinia SA, Hassanian SM, Mohammadparast S, Avan A. The Potential Diagnostic and Prognostic Value of Circulating MicroRNAs in the Assessment of Patients With Prostate Cancer: Rational and Progress. Front Oncol 2022; 11:716831. [PMID: 35186706 PMCID: PMC8855122 DOI: 10.3389/fonc.2021.716831] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/29/2021] [Accepted: 12/31/2021] [Indexed: 12/20/2022] Open
Abstract
Prostate cancer (P.C.) is one of the most frequent diagnosed cancers among men and the first leading cause of death with an annual incidence of 1.4 million worldwide. Prostate-specific antigen is being used for screening/diagnosis of prostate disease, although it is associated with several limitations. Thus, identification of novel biomarkers is warranted for diagnosis of patients at earlier stages. MicroRNAs (miRNAs) are recently being emerged as potential biomarkers. It has been shown that these small molecules can be circulated in body fluids and prognosticate the risk of developing P.C. Several miRNAs, including MiR-20a, MiR-21, miR-375, miR-378, and miR-141, have been proposed to be expressed in prostate cancer. This review summarizes the current knowledge about possible molecular mechanisms and potential application of tissue specific and circulating microRNAs as diagnosis, prognosis, and therapeutic targets in prostate cancer.
Affiliation(s)
- Elham Samami
  - Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Tehran University of Medical Sciences, Tehran, Iran
- Ghazaleh Pourali
  - Cancer Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahla Arabpour
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Azar Fanipakdel
  - Cancer Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Seyed Alireza Javadinia
  - Vasei Clinical Research Development Unit, Sabzevar University of Medical Sciences, Sabzevar, Iran
- Seyed Mahdi Hassanian
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Saeid Mohammadparast
  - Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, Birmingham, AL, United States
- Amir Avan
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Basic Medical Sciences Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - *Correspondence: Amir Avan,
13. da Costa AH, Santos RACD, Cerri R. Investigating deep feedforward neural networks for classification of transposon-derived piRNAs. Complex Intell Syst 2022. [DOI: 10.1007/s40747-021-00531-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
PIWI-interacting RNAs (piRNAs) form an important class of non-coding RNAs that play a key role in gene expression regulation and genome integrity by silencing transposable elements. However, despite the importance of piRNAs and the wide application of deep learning in computational biology, there are few studies of deep learning for piRNA prediction. Moreover, current methods focus on using advanced architectures like CNN and its variations. This paper presents an investigation of deep feedforward network models for classification of human transposon-derived piRNAs. We developed a lightweight predictor (when compared to other deep learning methods) and we show by practical evidence that simple neural networks can perform as well as or better than complex neural networks when using the appropriate hyperparameters. For that, we train, analyze and compare the results of a multilayer perceptron with different hyperparameter choices, such as numbers of hidden layers, activation functions and optimizers, clarifying the advantages and disadvantages of each choice. Our proposed predictor reached an F-score of 0.872, outperforming other state-of-the-art methods for human transposon-derived piRNA classification. In addition, to better assess the generalization of our proposal, we also showed that it achieved competitive results when classifying piRNAs of other species.
14. ASRmiRNA: Abiotic Stress-Responsive miRNA Prediction in Plants by Using Machine Learning Algorithms with Pseudo K-Tuple Nucleotide Compositional Features. Int J Mol Sci 2022; 23:ijms23031612. [PMID: 35163534 PMCID: PMC8835813 DOI: 10.3390/ijms23031612] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/29/2021] [Revised: 01/23/2022] [Accepted: 01/26/2022] [Indexed: 02/04/2023] Open
Abstract
MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational method for prediction of miRNAs associated with abiotic stresses. Three types of datasets were used for prediction, i.e., miRNA, Pre-miRNA, and Pre-miRNA + miRNA. The pseudo K-tuple nucleotide compositional features were generated for each sequence to transform the sequence data into numeric feature vectors. Support vector machine (SVM) was employed for prediction. The area under receiver operating characteristics curve (auROC) of 70.21, 69.71, 77.94 and area under precision-recall curve (auPRC) of 69.96, 65.64, 77.32 percentages were obtained for miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively. Overall prediction accuracies for the independent test set were 62.33, 64.85, 69.21 percentages, respectively, for the three datasets. The SVM also achieved higher accuracy than other learning methods such as random forest, extreme gradient boosting, and adaptive boosting. To implement our method with ease, an online prediction server “ASRmiRNA” has been developed. The proposed approach is believed to supplement the existing effort for identification of abiotic stress-responsive miRNAs and Pre-miRNAs.
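The abstract above reports its headline results as auROC percentages. As a reminder of what that number measures, the sketch below computes auROC directly from labels and classifier scores using the pairwise-ranking (Mann-Whitney) formulation: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The function name is hypothetical and the quadratic pairwise loop is chosen for clarity, not speed; this is not code from the ASRmiRNA server.

```python
def au_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive-negative pairs ranked correctly; ties count as 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Multiplying the returned value by 100 gives the percentage form used in the abstract. A value of 0.5 corresponds to random ranking, which puts reported figures of roughly 70 to 78 percent in context.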
|
15
|
Hanusek K, Poletajew S, Kryst P, Piekiełko-Witkowska A, Bogusławska J. piRNAs and PIWI Proteins as Diagnostic and Prognostic Markers of Genitourinary Cancers. Biomolecules 2022; 12:biom12020186. [PMID: 35204687 PMCID: PMC8869487 DOI: 10.3390/biom12020186] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/04/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 12/30/2022] Open
Abstract
piRNAs (PIWI-interacting RNAs) are small non-coding RNAs that regulate transposon and gene expression. piRNAs utilise multiple mechanisms to affect gene expression, which makes them potentially more powerful regulators than microRNAs. The mechanisms by which piRNAs regulate transposon and gene expression include DNA methylation, histone modifications, and mRNA degradation. Genitourinary cancers (GC) are a large group of neoplasms that differ in incidence, clinical course, biology, and patient prognosis. Regardless of the GC type, metastatic disease remains a key therapeutic challenge, largely affecting patients’ survival rates. Recent studies indicate that piRNAs could serve as potentially useful biomarkers allowing for early cancer detection and therapeutic intervention at a non-advanced tumour stage, improving patient outcomes. Furthermore, studies in prostate cancer show that piRNAs contribute to cancer progression by affecting key oncogenic pathways such as PI3K/AKT. Here, we discuss recent findings on the biogenesis, mechanisms of action, and role of piRNAs and the associated PIWI proteins in GC. We also present tools that may be useful for studies on the functioning of piRNAs in cancers.
Affiliation(s)
- Karolina Hanusek
  - Centre of Postgraduate Medical Education, Department of Biochemistry and Molecular Biology, 01-813 Warsaw, Poland
- Sławomir Poletajew
  - Centre of Postgraduate Medical Education, II Department of Urology, 01-813 Warsaw, Poland
- Piotr Kryst
  - Centre of Postgraduate Medical Education, II Department of Urology, 01-813 Warsaw, Poland
- Agnieszka Piekiełko-Witkowska
  - Centre of Postgraduate Medical Education, Department of Biochemistry and Molecular Biology, 01-813 Warsaw, Poland
  - Correspondence: (A.P.-W.); (J.B.)
- Joanna Bogusławska
  - Centre of Postgraduate Medical Education, Department of Biochemistry and Molecular Biology, 01-813 Warsaw, Poland
  - Correspondence: (A.P.-W.); (J.B.)
|
16
|
Khan S, Khan M, Iqbal N, Amiruddin Abd Rahman M, Khalis Abdul Karim M. Deep-piRNA: Bi-Layered Prediction Model for PIWI-Interacting RNA Using Discriminative Features. COMPUTERS, MATERIALS & CONTINUA 2022; 72:2243-2258. [DOI: 10.32604/cmc.2022.022901] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/22/2021] [Accepted: 11/11/2021] [Indexed: 09/02/2023]
|
17
|
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. RESEARCH 2022. [DOI: 10.34133/research.0011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning in biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are the basic tasks to understand the biological functions of DNA, RNA, proteins, and peptides. However, there are hundreds of classification models developed for biological sequences, and the quite varied specific methods seem dizzying at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification method and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
Affiliation(s)
- Chunyan Ao
  - School of Computer Science and Technology, Xidian University, Xi’an, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Shihu Jiao
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yansu Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
18
|
Zheng Y, Wang H, Ding Y, Guo F. CEPZ: A Novel Predictor for Identification of DNase I Hypersensitive Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2768-2774. [PMID: 33481716 DOI: 10.1109/tcbb.2021.3053661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
DNase I hypersensitive sites (DHSs) have proven to be tightly associated with cis-regulatory elements, commonly indicating specific function on the chromatin structure. Thus, identifying DHSs plays a fundamental role in decoding gene regulatory behavior. While traditional experimental methods turn out to be time-consuming and expensive, computational techniques promise to be practical for discovering and analyzing regulatory factors. In this study, we applied an efficient model that considered composition information and physicochemical properties and effectively selected features with a boosting algorithm. CEPZ, our predictor, achieved a Matthews correlation coefficient of 0.7740 and an accuracy of 0.9113, more competitive than any previous predictor. This result suggests that it may become a useful tool for DHS research in the human and other complex genomes. Our research was anchored on the properties of dinucleotides, and we identified several dinucleotides whose distributions differ significantly between DHS and non-DHS samples, which are likely to have a special meaning in the chromatin structure. The datasets, feature sets and the relevant algorithm are available at https://github.com/YanZheng-16/CEPZ_DHS/.
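For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how accuracy and the Matthews correlation coefficient (MCC) are computed from the four cells of a binary confusion matrix. The function names are chosen for this example, and the counts in the usage note are illustrative and not taken from the CEPZ paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal sum is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

With 45 true positives, 45 true negatives, 5 false positives and 5 false negatives, accuracy is 0.9 and MCC is 0.8. MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes are imbalanced.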
|
19
|
Huang S, Yoshitake K, Asakawa S. A Review of Discovery Profiling of PIWI-Interacting RNAs and Their Diverse Functions in Metazoans. Int J Mol Sci 2021; 22:ijms222011166. [PMID: 34681826 PMCID: PMC8538981 DOI: 10.3390/ijms222011166] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/14/2021] [Indexed: 12/16/2022] Open
Abstract
PIWI-interacting RNAs (piRNAs) are a class of small non-coding RNAs (sncRNAs) that perform crucial biological functions in metazoans and defend against transposable elements (TEs) in germ lines. Recently, ubiquitously expressed piRNAs were discovered in soma and germ lines using small RNA sequencing (sRNA-seq) in humans and animals, providing new insights into the diverse functions of piRNAs. However, the role of piRNAs has not yet been fully elucidated, and sRNA-seq studies continue to reveal different piRNA activities in the genome. In this review, we summarize a set of simplified processes for piRNA analysis in order to provide a useful guide for researchers to perform piRNA research suitable for their study objectives. These processes can help expand the functional research on piRNAs from previously reported sRNA-seq results in metazoans. Ubiquitously expressed piRNAs have been discovered in the soma and germ lines in Annelida, Cnidaria, Echinodermata, Crustacea, Arthropoda, and Mollusca, but they are limited to germ lines in Chordata. The roles of piRNAs in TE silencing, gene expression regulation, epigenetic regulation, embryonic development, immune response, and associated diseases will continue to be discovered via sRNA-seq.
Affiliation(s)
- Songqian Huang
  - Correspondence: (S.H.); (S.A.); Tel.: +81-3-5841-5296 (S.A.); Fax: +81-3-5841-8166 (S.A.)
- Shuichi Asakawa
  - Correspondence: (S.H.); (S.A.); Tel.: +81-3-5841-5296 (S.A.); Fax: +81-3-5841-8166 (S.A.)
|
20
|
Alghamdi W, Alzahrani E, Ullah MZ, Khan YD. 4mC-RF: Improving the prediction of 4mC sites using composition and position relative features and statistical moment. Anal Biochem 2021; 633:114385. [PMID: 34571005 DOI: 10.1016/j.ab.2021.114385] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/17/2020] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 01/28/2023]
Abstract
N4-methylcytosine (4mC) is an important epigenetic modification that occurs enzymatically by the action of DNA methyltransferases. 4mC sites exist in prokaryotes and eukaryotes and play a vital role in regulating gene expression, DNA replication, and the cell cycle. Efficient and accurate prediction of 4mC sites is important for understanding the biological properties and functions of 4mC. Therefore, a sequence-based predictor, namely 4mC-RF, is proposed for identifying 4mC sites through the integration of statistical moments along with position- and composition-dependent features. Relative and absolute position-based features are computed to extract optimal features. Random Forest, a popular machine learning classifier, was used for training the model. Validation results were obtained through rigorous processes of self-consistency, 10-fold cross-validation, independent set testing, and jackknife testing, yielding 95.1%, 95.2%, 97.0%, and 94.7% accuracies, respectively. Our proposed model achieves higher prediction accuracies than existing models. Subsequently, the developed 4mC-RF model was deployed as a web server. A more accurate predictor of 4mC sites helps experimental scientists obtain faster, more efficient, and more cost-effective results.
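The validation schemes named in this abstract differ only in how samples are partitioned. The sketch below generates index splits for k-fold cross-validation and treats the jackknife test as the special case in which every sample forms its own fold (leave-one-out). It illustrates the splitting logic only: the function names are hypothetical, the folds are contiguous, and in practice the sample order would normally be shuffled first. No classifier from the paper is reproduced.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_splits(n_samples):
    """Leave-one-out: each sample is held out exactly once."""
    return k_fold_splits(n_samples, n_samples)
```

Under either scheme, overall accuracy is obtained by training on each `train` set, predicting the matching `test` set, and aggregating the predictions over all folds.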
Affiliation(s)
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P. O. Box 80221, Jeddah 21589, Saudi Arabia.
- Ebraheem Alzahrani
  - Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah 21589, Saudi Arabia.
- Malik Zaka Ullah
  - Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah 21589, Saudi Arabia.
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore 54770, Pakistan.
|
21
|
Bioinformatics and Machine Learning Approaches to Understand the Regulation of Mobile Genetic Elements. BIOLOGY 2021; 10:biology10090896. [PMID: 34571773 PMCID: PMC8465862 DOI: 10.3390/biology10090896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 09/06/2021] [Accepted: 09/07/2021] [Indexed: 11/22/2022]
Abstract
Simple Summary: Transposable elements (TEs) are DNA sequences that are, or were, able to move (transpose) within the genome of a single cell. They were first discovered by Barbara McClintock while working on maize, and they make up a large fraction of the genome. Transpositions can result in mutations and they can alter the genome size. Cells regulate the activity of TEs using a variety of mechanisms, such as chemical modifications of DNA and small RNAs. Machine learning (ML) is an interdisciplinary subject that studies computer algorithms that can improve through experience and by the use of data. ML has been successfully applied to a variety of problems in bioinformatics and has exhibited favorable precision and speed. Here, we provide a systematic and guided review on the ML and bioinformatic methods and tools that are used for the analysis of the regulation of TEs.
Abstract: Transposable elements (TEs, or mobile genetic elements, MGEs) are ubiquitous genetic elements that make up a substantial proportion of the genome of many species. The recent growing interest in understanding the evolution and function of TEs has revealed that TEs play a dual role in genome evolution, development, disease, and drug resistance. Cells regulate TE expression against uncontrolled activity that can lead to developmental defects and disease, using multiple strategies, such as DNA chemical modification, small RNA (sRNA) silencing, chromatin modification, as well as sequence-specific repressors. Advancements in bioinformatics and machine learning approaches are increasingly contributing to the analysis of the regulation mechanisms. A plethora of tools and machine learning approaches have been developed for prediction, annotation, and expression profiling of sRNAs, for methylation analysis of TEs, as well as for genome-wide methylation analysis through bisulfite sequencing data. In this review, we provide a guided overview of the bioinformatic and machine learning state of the art of fields closely associated with TE regulation and function.
|
22
|
Akbar S, Ahmad A, Hayat M, Rehman AU, Khan S, Ali F. iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Comput Biol Med 2021; 137:104778. [PMID: 34481183 DOI: 10.1016/j.compbiomed.2021.104778] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 06/13/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 11/26/2022]
Abstract
Tuberculosis (TB) is a worldwide illness caused by the bacterium Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting normal cells. However, due to the rapid growth in the number of peptide samples, accurately identifying antitubercular peptides has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve the results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physiochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods were then combined to generate a heterogeneous vector. Finally, utilizing individual and heterogeneous vectors, five classification models of distinct nature were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the suggested model's prediction and training capabilities. On the training and independent datasets, the proposed ensemble model achieved accuracies of 94.47% and 92.68%, respectively. Our proposed "iAtbP-Hyb-EnC" model outperformed existing predictors, reporting ~10% higher training accuracy. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.
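Of the three representations listed in this abstract, one-hot encoding is the simplest to illustrate. The sketch below is not taken from the iAtbP-Hyb-EnC source code: the function name, the alphabetical ordering of the 20 standard amino acids, and zero-padding to a fixed `max_len` are all assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, assumed order


def one_hot_encode(peptide, max_len):
    """Encode a peptide as a flat 0/1 vector of length max_len * 20.

    Hypothetical helper: residues beyond max_len are truncated, missing
    positions are zero-padded, and non-standard residues encode as zeros.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for pos in range(max_len):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(peptide):
            i = index.get(peptide[pos].upper())
            if i is not None:
                row[i] = 1
        vector.extend(row)
    return vector
```

A heterogeneous vector of the kind the abstract describes can then be formed by concatenating this list with the outputs of the other feature extractors for the same peptide.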
Affiliation(s)
- Shahid Akbar
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Ashfaq Ahmad
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Ateeq Ur Rehman
  - Department of Information Technology, The University of Haripur, KP, Pakistan.
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Farman Ali
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
|
23
|
Akmal MA, Hussain W, Rasool N, Khan YD, Khan SA, Chou KC. Using CHOU'S 5-Steps Rule to Predict O-Linked Serine Glycosylation Sites by Blending Position Relative Features and Statistical Moment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2045-2056. [PMID: 31985438 DOI: 10.1109/tcbb.2020.2968441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/10/2023]
Abstract
Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification due to its pivotal role and association with crucial physiological functions within most proteins. Identification of glycosylation sites in a polypeptide chain is not an easy task due to multiple impediments. Analytical identification of these sites is expensive and laborious. There is a dire need to develop a reliable computational method for precise determination of such sites, which can help researchers save time and effort. Herein, we propose a novel predictor, namely iGlycoS-PseAAC, by integrating Chou's Pseudo Amino Acid Composition (PseAAC) and relative/absolute position-based features. The self-consistency results show that the accuracy of the model on the benchmark dataset for prediction of O-linked glycosylation at serine sites is 98.8 percent. The overall accuracy of the predictor achieved through 10-fold cross-validation by combining the positive and negative results is 97.2 percent. The overall accuracy achieved through the jackknife test is 96.195 percent, obtained by aggregating all the prediction results. Thus the proposed predictor can help in predicting O-linked glycosylated serine sites in an efficient and accurate way. The overall results show that the accuracy of iGlycoS-PseAAC is higher than that of existing tools.
|
24
|
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021; 22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023] Open
Abstract
Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have largely promoted the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information of ncRNAs demystifies diverse functionalities of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs, reviews computational approaches proposed in the last 10 years to distinguish coding-RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We consider that our expert analysis will help Artificial Intelligence researchers find, on one platform, state-of-the-art performance, guidance on model selection for various tasks, the predominantly used sequence descriptors and neural architectures, and interpretations of inter-species and intra-species performance deviation.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
  - National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology, Islamabad 44000, Pakistan
  - School of Electrical Engineering & Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
  - DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
|
25
|
Zhang J, Chen Q, Liu B. DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1451-1463. [PMID: 31722485 DOI: 10.1109/tcbb.2019.2952338] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/10/2023]
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two kinds of crucial proteins, which are associated with various cellular activities and some important diseases. Accurate identification of DBPs and RBPs facilitates both theoretical research and real-world applications. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and vice versa, resulting in low prediction precision. Moreover, some proteins (DRBPs) that interact with both DNA and RNA play important roles in gene expression and cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L was proposed by combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). It is the first computational method that is able to identify DBPs, RBPs and DRBPs. Rigorous cross-validations and independent tests showed that DeepDRBP-2L is able to overcome the shortcomings of the existing methods and can go one step further to identify DRBPs. Application of DeepDRBP-2L to the tomato genome further demonstrated its performance. The webserver of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
|
26
|
Feng P, Feng L, Tang C. Comparison and Analysis of Computational Methods for Identifying N6-Methyladenosine Sites in Saccharomyces cerevisiae. Curr Pharm Des 2021; 27:1219-1229. [PMID: 33167827 DOI: 10.2174/1381612826666201109110703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2020] [Accepted: 07/20/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: N6-methyladenosine (m6A) plays critical roles in a broad range of biological processes. Knowledge about the precise location of m6A sites in the transcriptome is vital for deciphering its biological functions. Although experimental techniques have made substantial contributions to identifying m6A, they are still labor intensive and time consuming. As a complement to experimental methods, a series of computational approaches have been proposed in the past few years to identify m6A sites.
METHODS: To help researchers select appropriate methods for identifying m6A sites, it is necessary to conduct a comprehensive review and comparison of existing methods.
RESULTS: Since research on m6A in Saccharomyces cerevisiae is relatively clear, in this review we summarized recent progress in the computational prediction of m6A sites in S. cerevisiae and assessed the performance of existing computational methods. Finally, future directions for computationally identifying m6A sites are presented.
CONCLUSION: Taken together, we anticipate that this review will serve as an important guide for computational analysis of m6A modifications.
Affiliation(s)
- Pengmian Feng
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
- Lijing Feng
  - School of Sciences, North China University of Science and Technology, Tangshan 063000, China
- Chaohui Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
|
27
|
Chen S, Ben S, Xin J, Li S, Zheng R, Wang H, Fan L, Du M, Zhang Z, Wang M. The biogenesis and biological function of PIWI-interacting RNA in cancer. J Hematol Oncol 2021; 14:93. [PMID: 34118972 PMCID: PMC8199808 DOI: 10.1186/s13045-021-01104-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 04/11/2021] [Accepted: 06/03/2021] [Indexed: 02/07/2023] Open
Abstract
Small non-coding RNAs (ncRNAs) are vital regulators of biological activities, and aberrant levels of small ncRNAs are commonly found in precancerous lesions and cancer. PIWI-interacting RNAs (piRNAs) are a novel type of small ncRNA initially discovered in germ cells that have a specific length (24-31 nucleotides), bind to PIWI proteins, and show 2'-O-methyl modification at the 3'-end. Numerous studies have revealed that piRNAs can play important roles in tumorigenesis via multiple biological regulatory mechanisms, including silencing transcriptional and posttranscriptional gene processes and accelerating multiprotein interactions. piRNAs are emerging players in the malignant transformation of normal cells and participate in the regulation of cancer hallmarks. Most of the specific cancer hallmarks regulated by piRNAs are involved in sustaining proliferative signaling, resistance to cell death or apoptosis, and activation of invasion and metastasis. Additionally, piRNAs have been used as biomarkers for cancer diagnosis and prognosis and have great potential for clinical utility. However, research on the underlying mechanisms of piRNAs in cancer is limited. Here, we systematically reviewed recent advances in the biogenesis and biological functions of piRNAs and relevant bioinformatics databases with the aim of providing insights into cancer diagnosis and clinical applications. We also focused on some cancer hallmarks rarely reported to be related to piRNAs, which can promote in-depth research of piRNAs in molecular biology and facilitate their clinical translation into cancer treatment.
Affiliation(s)
- Silu Chen
- Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, People's Republic of China.,Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Shuai Ben
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Junyi Xin
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Shuwei Li
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Rui Zheng
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Hao Wang
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Lulu Fan
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Mulong Du
- Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China.,Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Zhengdong Zhang
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Meilin Wang
- Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, People's Republic of China.,Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China.,Suzhou Municipal Hospital, Gusu School, The Affiliated Suzhou Hospital of Nanjing Medical University, Nanjing Medical University, Nanjing, China.
| |
|
28
|
Computational Methods and Online Resources for Identification of piRNA-Related Molecules. Interdiscip Sci 2021; 13:176-191. [PMID: 33886096 DOI: 10.1007/s12539-021-00428-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/21/2020] [Revised: 03/26/2021] [Accepted: 03/29/2021] [Indexed: 02/07/2023]
Abstract
piRNAs are a class of small non-coding RNA molecules that interact with the PIWI family and have many important and diverse biological functions. The present review aims to provide guidelines and contribute to piRNA research. We focused on four types of identification models for piRNA-related molecules: piRNA, piRNA cluster, piRNA target, and disease-related piRNA. We evaluated the tools for the identification of piRNAs on five aspects: datasets, features, classifiers, performance, and usability. We found that 2lpiRNApred had the highest precision on datasets of model organisms, piRNN performed better on datasets of non-model organisms, and 2L-piRNA had the fastest recognition speed of all tools. In addition, we presented an overview of piRNA databases. The databases were divided into six categories: basic annotation, comprehensive annotation, isoform, cluster, target, and disease. We found that piRNA data of non-model organisms, piRNA target data, and piRNA-disease-associated data should be strengthened. Our review might assist researchers in selecting appropriate tools or datasets for their studies, reveal potential problems, and shed light on future bioinformatics studies.
|
29
|
Yao Y, Zhang S, Liang Y. iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2021; 32:317-331. [PMID: 33730950 DOI: 10.1080/1062936x.2021.1895884] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/21/2020] [Accepted: 02/23/2021] [Indexed: 06/12/2023]
Abstract
DNA replication is not only the basis of biological inheritance but also the most fundamental process in all living organisms. It plays a crucial role in the cell-division cycle and gene expression regulation. Hence, the accurate identification of origin of replication sites (ORIs) is of great significance for further understanding the regulatory mechanism of gene expression and treating genic diseases. In this paper, a novel, feasible and powerful model, namely iORI-ENST, is designed for identifying ORIs. Firstly, we extract different features by incorporating mono-nucleotide binary encoding and dinucleotide-based spatial autocorrelation. Subsequently, elastic net is utilized as the feature selection method to select the optimal feature set. Stacking learning, which combines random forest, AdaBoost, gradient boosting decision tree, extra trees and support vector machine, is then employed to distinguish ORIs from non-ORIs. Finally, the ORI sites are identified on the benchmark datasets S1 and S2 with accuracies of 91.41% and 95.07%, respectively. Meanwhile, an independent dataset S3 is employed to verify the validity and transferability of our model, and its accuracy reaches 91.10%. Compared with state-of-the-art methods, our model achieves better performance. The results show our model is a feasible, effective and powerful tool for identifying ORIs. The source code and datasets are available at https://github.com/YingyingYao/iORI-ENST.
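The mono-nucleotide binary encoding mentioned in this abstract can be illustrated with a minimal Python sketch. The function name `binary_encode`, the A/C/G/T ordering of the four positions, and the all-zero handling of unknown symbols are assumptions made here for illustration; they are not taken from the iORI-ENST source code.

```python
# Hypothetical illustration of mono-nucleotide binary (one-hot) encoding.
# The A/C/G/T ordering is an assumption, not the ordering used by iORI-ENST.
NUCLEOTIDE_CODES = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def binary_encode(sequence):
    """Return a flat list holding four 0/1 values per nucleotide."""
    features = []
    for base in sequence.upper():
        # Assumption: symbols outside A/C/G/T (for example N) become all zeros.
        features.extend(NUCLEOTIDE_CODES.get(base, (0, 0, 0, 0)))
    return features


print(binary_encode("ACGT"))
```

A sequence of length n yields a vector of length 4n, so fixed-length input windows give fixed-length feature vectors that a feature selector such as elastic net could then prune.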
Affiliation(s)
- Y Yao
- School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
| | - S Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
| | - Y Liang
- School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
| |
|
30
|
|
31
|
Awais M, Hussain W, Khan YD, Rasool N, Khan SA, Chou KC. iPhosH-PseAAC: Identify Phosphohistidine Sites in Proteins by Blending Statistical Moments and Position Relative Features According to the Chou's 5-Step Rule and General Pseudo Amino Acid Composition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:596-610. [PMID: 31144645 DOI: 10.1109/tcbb.2019.2919025] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Indexed: 06/09/2023]
Abstract
Protein phosphorylation is one of the key mechanisms in prokaryotes and eukaryotes and is responsible for various biological functions such as protein degradation, intracellular localization, the multitude of cellular processes, molecular association, cytoskeletal dynamics, and enzymatic inhibition/activation. Phosphohistidine (PhosH) has a key role in a number of biological processes, ranging from central metabolism to signalling in eukaryotes and bacteria. Thus, identification of phosphohistidine sites in a protein sequence is crucial, and experimental identification can be expensive, time-consuming, and laborious. To address this problem, we propose a novel computational model, namely iPhosH-PseAAC, for prediction of phosphohistidine sites in a given protein sequence using pseudo amino acid composition (PseAAC), statistical moments, and position relative features. The results of the proposed predictor are validated through self-consistency testing, 10-fold cross-validation, and jackknife testing. Self-consistency validation gave 100 percent accuracy, whereas cross-validation achieved an accuracy of 94.26 percent. Moreover, jackknife testing gave 97.07 percent accuracy for the proposed model. Thus, the proposed model iPhosH-PseAAC shows a strong ability to predict PhosH sites in given proteins.
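Jackknife testing, as used in this abstract, is leave-one-out validation: each sample is predicted by a model built on all remaining samples. The sketch below shows the procedure in Python with a 1-nearest-neighbour rule as a stand-in classifier; the abstract does not describe the classifier in enough detail to reproduce it, so the classifier, the function names, and the toy data are all assumptions.

```python
def nearest_neighbor_label(train, query):
    """Return the label of the training sample closest to `query`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda item: squared_distance(item[0], query))
    return label


def jackknife_accuracy(samples):
    """Leave-one-out accuracy over a list of (feature_tuple, label) pairs."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        # Hold out sample i and predict it from every other sample.
        training = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(training, features) == label:
            correct += 1
    return correct / len(samples)


# Toy, made-up data: two well-separated classes in two dimensions.
toy = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((1.0, 0.9), 1), ((0.9, 1.1), 1)]
print(jackknife_accuracy(toy))
```

Self-consistency testing, by contrast, evaluates on the same samples used for training, which is why it tends to report the highest accuracy of the three protocols.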
|
32
|
Khan YD, Alzahrani E, Alghamdi W, Ullah MZ. Sequence-based Identification of Allergen Proteins Developed by Integration of PseAAC and Statistical Moments via 5-Step Rule. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200424085947] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 01/24/2023]
Abstract
Background:
Allergens are antigens that can stimulate an atopic type I human hypersensitivity reaction by an immunoglobulin E (IgE) reaction. Some proteins are naturally more allergenic than others. The challenge for toxicologists is to identify properties that allow proteins to cause allergic sensitization and allergic diseases. The identification of allergen proteins is a critical task. The experimental identification of protein functions is laborious and costly; therefore, computer scientists have proposed various methods in the field of computational biology and bioinformatics using various data science approaches.
Objectives:
Herein, we report a novel predictor for the identification of allergen proteins.
Methods:
For feature extraction, statistical moments and various position-based features have been incorporated into Chou's pseudo amino acid composition (PseAAC), and are used for training of a neural network.
Results:
The predictor is validated through 10-fold cross-validation and jackknife testing, which gave accuracies of 99.43% and 99.87%, respectively.
Conclusions:
Thus, the proposed predictor can help predict allergen proteins efficiently and accurately and can provide baseline data for the discovery of new drugs and biomarkers.
Affiliation(s)
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, C II Johar Town, Lahore 54770, Pakistan
| | - Ebraheem Alzahrani
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 80221, Jeddah, Saudi Arabia
| | - Malik Zaka Ullah
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
| |
|
33
|
Yang XF, Zhou YK, Zhang L, Gao Y, Du PF. Predicting LncRNA Subcellular Localization Using Unbalanced Pseudo-k Nucleotide Compositions. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190902151038] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/17/2022]
Abstract
Background:
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that function in the regulation of gene expression. Growing evidence has shown that the biological functions of lncRNAs are intimately related to their subcellular localizations. Therefore, it is very important to determine lncRNA subcellular localization.
Methods:
In this paper, we proposed a novel method to predict the subcellular localization of lncRNAs. To utilize lncRNA sequence information more comprehensively, we exploited both k-mer nucleotide composition and sequence order correlated factors of lncRNA to formulate lncRNA sequences. Meanwhile, a feature selection technique based on the Analysis Of Variance (ANOVA) was applied to obtain the optimal feature subset. Finally, we used the support vector machine (SVM) to perform the prediction.
Results:
The AUC value of the proposed method can reach 0.9695, which indicates that the proposed predictor is an efficient and reliable tool for determining lncRNA subcellular localization. Furthermore, the predictor can reach a maximum overall accuracy of 90.37% in leave-one-out cross-validation, which clearly outperforms the existing state-of-the-art method.
Conclusion:
It is demonstrated that the proposed predictor is feasible and powerful for the prediction of lncRNA subcellular localization. To facilitate subsequent genetic sequence research, we shared the source code at https://github.com/NicoleYXF/lncRNA.
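The k-mer nucleotide composition named in the Methods can be sketched in Python as a vector of normalized k-mer frequencies. This covers only the plain k-mer part; the sequence order correlated factors and the "unbalanced" weighting of the published method are not reproduced. The function name, the ACGU alphabet, and the T-to-U conversion are assumptions for illustration.

```python
from itertools import product


def kmer_composition(sequence, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in lexicographic k-mer order."""
    # Assumption: DNA-style input is converted to RNA letters.
    sequence = sequence.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        # Windows containing symbols outside the alphabet are skipped.
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


vector = kmer_composition("AUGGCUAUGC", k=2)
print(len(vector))
```

The vector length is 4^k regardless of sequence length, which is what makes such features usable with a fixed-input classifier like an SVM.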
Affiliation(s)
- Xiao-Fei Yang
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Yuan-Ke Zhou
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Lin Zhang
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Yang Gao
- School of Medicine, Nankai University, Tianjin 300071, China
| | - Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| |
|
34
|
Zhang J, Chen Q, Liu B. iDRBP_MMC: Identifying DNA-Binding Proteins and RNA-Binding Proteins Based on Multi-Label Learning Model and Motif-Based Convolutional Neural Network. J Mol Biol 2020; 432:5860-5875. [DOI: 10.1016/j.jmb.2020.09.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/04/2020] [Revised: 08/12/2020] [Accepted: 09/04/2020] [Indexed: 11/28/2022]
|
35
|
Liu GH, Zhang BW, Qian G, Wang B, Mao B, Bichindaritz I. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1966-1980. [PMID: 31107658 DOI: 10.1109/tcbb.2019.2917429] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/09/2023]
Abstract
Prediction of protein subcellular location has become a hot topic because it has proven useful both for understanding disease mechanisms and for novel drug design. With the rapid development of automated microscopic imaging technology in recent years, bioimage-based classification methods for protein subcellular location have attracted considerable attention, because images can describe the protein distribution intuitively and in detail. In the current study, a prediction method for protein subcellular location was proposed based on multi-view image features extracted from three different views: the four texture features of the original image, the global and local features of the protein extracted from the protein channel images after color segmentation, and the global features of DNA extracted from the DNA channel image. The extracted features were then combined to improve the performance of subcellular localization prediction. By comparing the performance of different feature combinations under the same classifier, the best ensemble features could be obtained. In this work, a classifier based on Stacked Auto-encoders and the random forest was also put forward. To improve the prediction results, the deep network was combined with traditional statistical classification methods. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method.
|
36
|
Amanat S, Ashraf A, Hussain W, Rasool N, Khan YD. Identification of Lysine Carboxylation Sites in Proteins by Integrating Statistical Moments and Position Relative Features via General PseAAC. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190723114923] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 01/13/2023]
Abstract
Background:
Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Among these three, the covalent attachment of the carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. For studying such biological functions, it is essential to correctly determine the lysine sites sensitive to carboxylation.
Objective:
Herein, we present a machine learning-based computational model for the prediction of carboxylysine sites.
Methods:
Various position and composition relative features have been incorporated into the PseAAC for construction of feature vectors, and a neural network is employed as a classifier. The model is validated by jackknife, cross-validation, self-consistency, and independent testing.
Results:
The results of the self-consistency test showed that the model has 99.76% Acc, 99.76% Sp, 99.76% Sp, and 0.99 MCC. Using the jackknife method, prediction model validation gave 97.07% Acc, while for 10-fold cross-validation, prediction model validation gave 95.16% Acc.
Conclusion:
The result of independent dataset testing was 94.3%, which illustrates that the proposed model performs better than the existing model PreLysCar; however, the accuracy can be improved further in the future, owing to the increasing number of carboxylysine sites in proteins.
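The Acc, Sp, and MCC figures quoted in the Results follow from the four cells of a binary confusion matrix. A minimal Python sketch of the standard definitions is given below; the function name and the example counts are made up for illustration and are not values from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity and MCC from counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    # Sensitivity: fraction of true sites that are predicted as sites.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that are predicted as non-sites.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: MCC is reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts, chosen only to exercise the formulas.
print(binary_metrics(tp=45, tn=40, fp=10, fn=5))
```

MCC uses all four cells, so it is less easily inflated than accuracy when positive and negative sites are unevenly represented.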
Affiliation(s)
- Saba Amanat
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| | - Adeel Ashraf
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| | - Waqar Hussain
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| | - Nouman Rasool
- Department of Life Sciences, School of Science University of Management and Technology, Lahore, Pakistan
| | - Yaser D. Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| |
|
37
|
Wang L, Yang L, Feng YL, Zhang H. Evolutionary insights into the active-site structures of the metallo-β-lactamase superfamily from a classification study with support vector machine. J Biol Inorg Chem 2020; 25:1023-1034. [PMID: 32945939 DOI: 10.1007/s00775-020-01822-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2020] [Accepted: 09/05/2020] [Indexed: 12/01/2022]
Abstract
The metallo-β-lactamase (MβL) superfamily, which is intriguing due to its enzyme promiscuity, is a good model enzyme superfamily for studies of catalytic function evolution. Our previous study traced the evolution of the phosphotriesterase activity of the MβL superfamily and found that MβLs go through three typical active-site structures in the development of phosphotriesterase activity. In the present study, taking the three typical active-site structures as class labels, the classification and prediction models, which were established by support vector machine and amino acid composition, classified the MβL members into three classes. The indispensable amino acid compositions showed a surprising performance that was remarkably better than the performance of the dispensable amino acid compositions and even equal to the performance of the 20 native amino acids. We further traced the origin of the classification error and found that there was one subclass adopting a type of active-site structure that was the evolutionary transition between these classes. After that, our classification and prediction models were successfully used to predict several MβL active-site structures that lost the dinuclear structures during crystallization. In summary, our studies established a classification and prediction system for active-site structures that well compensated for experimental methods that recognize protein structure details and suggest that the indispensable amino acids contain much more protein structure information than the dispensable amino acids.
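The amino acid composition features used in this study can be sketched in Python as the relative frequency of each residue type in a sequence. The function name, the `residues` parameter for restricting the alphabet to a subset, and the choice to normalize by the number of counted residues are assumptions for illustration; the abstract does not specify which residues form the indispensable set or how the features were normalized.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 native amino acids


def amino_acid_composition(protein, residues=AMINO_ACIDS):
    """Return the relative frequency of each residue listed in `residues`."""
    protein = protein.upper()
    # Assumption: residues outside the chosen alphabet are ignored entirely.
    counted = [aa for aa in protein if aa in residues]
    if not counted:
        return [0.0] * len(residues)
    return [counted.count(aa) / len(counted) for aa in residues]


composition = amino_acid_composition("AACD")
print(composition[:4])
```

Passing a shorter `residues` string gives a lower-dimensional feature vector, which is one way a comparison between residue subsets and the full 20-residue alphabet could be set up before training a classifier such as an SVM.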
Affiliation(s)
- Lili Wang
- College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, 730070, People's Republic of China
| | - Ling Yang
- MIIT Key Laboratory of Critical Materials Technology for New Energy Conversion and Storage, Institute of Theoretical and Simulation Chemistry, School of Chemistry and Chemical Engineering, Harbin Institute of Technology, Harbin, 150080, People's Republic of China
| | - Yu-Lan Feng
- Biomedical Research Center, College of Life Science and Engineering, Northwest Minzu University, Lanzhou, 730030, People's Republic of China
| | - Hao Zhang
- Biomedical Research Center, College of Life Science and Engineering, Northwest Minzu University, Lanzhou, 730030, People's Republic of China.
| |
|
38
|
Khan F, Khan M, Iqbal N, Khan S, Muhammad Khan D, Khan A, Wei DQ. Prediction of Recombination Spots Using Novel Hybrid Feature Extraction Method via Deep Learning Approach. Front Genet 2020; 11:539227. [PMID: 33093842 PMCID: PMC7527634 DOI: 10.3389/fgene.2020.539227] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/10/2020] [Accepted: 08/13/2020] [Indexed: 01/20/2023] Open
Abstract
Meiotic recombination is the driving force of evolutionary development and an important source of genetic variation. Meiotic recombination does not take place randomly in a chromosome but occurs in certain regions of the chromosome. Regions with a higher rate of meiotic recombination events are considered hotspots, and regions where the frequency of recombination events is lower are called coldspots. Prediction of meiotic recombination spots provides useful information about the basic functionality of inheritance and genome diversity. This study proposes an intelligent computational predictor called iRSpots-DNN for the identification of recombination spots. The proposed predictor is based on a novel feature extraction method and an optimized deep neural network (DNN). The DNN was employed as the classification engine, whereas the novel feature extraction method was developed to extract meaningful features for the identification of hotspots and coldspots across the yeast genome. Unlike previous algorithms, the proposed feature extraction avoids bias among the selected features and preserves the sequence discriminant properties along with the sequence-structure information simultaneously. This study also considered other effective classifiers, namely support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF), to predict recombination spots. Experimental results on a benchmark dataset with 10-fold cross-validation showed that iRSpots-DNN achieved the highest accuracy, i.e., 95.81%. Additionally, the performance of the proposed iRSpots-DNN is significantly better than that of the existing predictors on a benchmark dataset. The relevant benchmark dataset and source code are freely available at: https://github.com/Fatima-Khan12/iRspot_DNN/tree/master/iRspot_DNN.
Affiliation(s)
- Fatima Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Mukhtaj Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Salman Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Dost Muhammad Khan
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Abbas Khan
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Dong-Qing Wei
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Ministry of Education, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| |
|
39
|
Lin Y, Zheng J, Lin D. PIWI-interacting RNAs in human cancer. Semin Cancer Biol 2020; 75:15-28. [PMID: 32877760 DOI: 10.1016/j.semcancer.2020.08.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/22/2020] [Revised: 08/16/2020] [Accepted: 08/23/2020] [Indexed: 12/11/2022]
Abstract
P-element-induced wimpy testis (PIWI) interacting RNAs (piRNAs) are a class of small regulatory RNAs mechanistically similar to but much less studied than microRNAs and small interfering RNAs. Today the best understood function of piRNAs is transposon control in animal germ cells, which has earned them the name 'guardians of the germline'. Several molecular/cellular characteristics of piRNAs, including high sequence diversity, lack of secondary structures, and target-oriented generation seem to serve this purpose. Recently, aberrant expressions of piRNAs and PIWI proteins have been implicated in a variety of malignant tumors and associated with cancer hallmarks such as cell proliferation, inhibited apoptosis, invasion, metastasis and increased stemness. Researchers have also demonstrated multiple mechanisms of piRNA-mediated target deregulation associated with cancer initiation, progression or dissemination. We review current research findings on the biogenesis, normal functions and cancer associations of piRNAs, highlighting their potentials as cancer diagnostic/prognostic biomarkers and therapeutic tools. Whenever applicable, we draw connections with other research fields to encourage intercommunity conversations. We also offer recommendations and cautions regarding the general process of cancer-related piRNA studies and the methods/tools used at each step. Finally, we call attention to some issues that, if left unsolved, might impede the future development of this field.
Affiliation(s)
- Yuan Lin
- Beijing Advanced Innovation Center for Genomics (ICG), Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, 100871, China.
| | - Jian Zheng
- Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China and Collaborative Innovation Center for Cancer Medicine, Guangzhou, 510060, China
| | - Dongxin Lin
- Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China and Collaborative Innovation Center for Cancer Medicine, Guangzhou, 510060, China; Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| |
|
40
|
Gachpazan M, Kashani H, Khazaei M, Hassanian SM, Rezayi M, Asgharzadeh F, Ghayour-Mobarhan M, Ferns GA, Avan A. The Impact of Statin Therapy on the Survival of Patients with Gastrointestinal Cancer. Curr Drug Targets 2020; 20:738-747. [PMID: 30539694 DOI: 10.2174/1389450120666181211165449] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/21/2018] [Revised: 10/25/2018] [Accepted: 12/05/2018] [Indexed: 12/13/2022]
Abstract
Statins are 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase inhibitors that may play an important role in the evolution of cancers, due to their effects on cancer cell metabolism. Statins affect several potential pathways, including cell proliferation, angiogenesis, apoptosis and metastasis. The number of trials assessing the putative clinical benefits of statins in cancer is increasing. Currently, there are several trials listed on the global trial identifier website clinicaltrials.gov. Given the compelling evidence from these trials in a variety of clinical settings, there have been calls for a clinical trial of statins in the adjuvant gastrointestinal cancer setting. However, randomized controlled trials on specific cancer types in relation to statin use, as well as studies on populations without a clinical indication for using statins, have elucidated some potential underlying biological mechanisms, and the investigation of different statins is probably warranted. It would be useful for these trials to incorporate the assessment of tumour biomarkers predictive of statin response in their design. This review summarizes the recent preclinical and clinical studies that assess the application of statins in the treatment of gastrointestinal cancers, with particular emphasis on their association with cancer risk.
Affiliation(s)
- Meysam Gachpazan
- Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran.,Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Hoda Kashani
- Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Majid Khazaei
- Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran.,Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Seyed Mahdi Hassanian
- Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran.,Department of Medical Biochemistry; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Majid Rezayi
- Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran.,Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Fereshteh Asgharzadeh
- Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Majid Ghayour-Mobarhan
- Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Gordon A Ferns
- Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex BN1 9PH, United Kingdom
| | - Amir Avan
- Metabolic syndrome Research center, Mashhad University of Medical Sciences, Mashhad, Iran.,Department of Modern Sciences and Technologies; Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.,Cancer Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| |
|
41
|
Guay C, Jacovetti C, Bayazit MB, Brozzi F, Rodriguez-Trejo A, Wu K, Regazzi R. Roles of Noncoding RNAs in Islet Biology. Compr Physiol 2020; 10:893-932. [PMID: 32941685 DOI: 10.1002/cphy.c190032] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]
Abstract
The discovery that most mammalian genome sequences are transcribed to ribonucleic acids (RNA) has revolutionized our understanding of the mechanisms governing key cellular processes and of the causes of human diseases, including diabetes mellitus. Pancreatic islet cells were found to contain thousands of noncoding RNAs (ncRNAs), including micro-RNAs (miRNAs), PIWI-associated RNAs, small nucleolar RNAs, tRNA-derived fragments, long non-coding RNAs, and circular RNAs. While the involvement of miRNAs in islet function and in the etiology of diabetes is now well documented, there is emerging evidence indicating that other classes of ncRNAs are also participating in different aspects of islet physiology. The aim of this article will be to provide a comprehensive and updated view of the studies carried out in human samples and rodent models over the past 15 years on the role of ncRNAs in the control of α- and β-cell development and function and to highlight the recent discoveries in the field. We not only describe the role of ncRNAs in the control of insulin and glucagon secretion but also address the contribution of these regulatory molecules in the proliferation and survival of islet cells under physiological and pathological conditions. It is now well established that most cells release part of their ncRNAs inside small extracellular vesicles, allowing the delivery of genetic material to neighboring or distantly located target cells. The role of these secreted RNAs in cell-to-cell communication between β-cells and other metabolic tissues as well as their potential use as diabetes biomarkers will be discussed. © 2020 American Physiological Society. Compr Physiol 10:893-932, 2020.
Affiliation(s)
- Claudiane Guay: Department of Fundamental Neurosciences, University of Lausanne, Lausanne, Switzerland; Department of Biomedical Sciences, University of Lausanne, Lausanne, Switzerland
- Cécile Jacovetti: Department of Fundamental Neurosciences, University of Lausanne, Lausanne, Switzerland; Department of Biomedical Sciences, University of Lausanne, Lausanne, Switzerland
- Mustafa Bilal Bayazit: Department of Fundamental Neurosciences, University of Lausanne, Lausanne, Switzerland; Department of Biomedical Sciences, University of Lausanne, Lausanne, Switzerland
- Flora Brozzi: Department of Fundamental Neurosciences, University of Lausanne, Lausanne, Switzerland; Department of Biomedical Sciences, University of Lausanne, Lausanne, Switzerland
- Adriana Rodriguez-Trejo: Department of Fundamental Neurosciences, University of Lausanne, Lausanne, Switzerland; Department of Biomedical Sciences, University of Lausanne, Lausanne, Switzerland
- Kejing Wu: Department of Fundamental Neurosciences, University of Lausanne, Lausanne, Switzerland; Department of Biomedical Sciences, University of Lausanne, Lausanne, Switzerland
- Romano Regazzi: Department of Fundamental Neurosciences, University of Lausanne, Lausanne, Switzerland; Department of Biomedical Sciences, University of Lausanne, Lausanne, Switzerland
|
42
|
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE: One of the most challenging problems is how to represent a biological sequence as a vector while retaining as much of its sequence-order information as possible. METHODS: To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, has been developed. RESULTS AND CONCLUSION: The 10-year recollection makes it increasingly clear that this proposal has indeed been very powerful.
Affiliation(s)
- Kuo-Chen Chou: Gordon Life Science Institute, Boston, Massachusetts 02478, United States; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
43
|
Hu Y, Lu Y, Wang S, Zhang M, Qu X, Niu B. Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs. Curr Drug Targets 2020; 20:488-500. [PMID: 30091413 DOI: 10.2174/1389450119666180809122244] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/17/2018] [Revised: 06/19/2018] [Accepted: 06/25/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Globally, the number of cancer patients and cancer deaths continues to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. OBJECTIVE: To study the application of machine learning in predicting anticancer drug activity, this review selects several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), and lists examples of their application in anticancer drug design. RESULTS: Machine learning contributes substantially to anticancer drug design, saving researchers time and cost. However, it can only be an assisting tool for drug design. CONCLUSION: This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction of anticancer drug activity are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.
Affiliation(s)
- Yan Hu: School of Life Sciences, Shanghai University, Shanghai 200444, China
- Yi Lu: School of Life Sciences, Shanghai University, Shanghai 200444, China
- Shuo Wang: School of Life Sciences, Shanghai University, Shanghai 200444, China
- Mengying Zhang: School of Life Sciences, Shanghai University, Shanghai 200444, China
- Xiaosheng Qu: National Engineering Laboratory of Southwest Endangered Medicinal Resources Development, Guangxi Botanical Garden of Medicinal Plants, 530023, Nanning, China
- Bing Niu: School of Life Sciences, Shanghai University, Shanghai 200444, China
|
44
|
Feng P, Wang Z. Recent Advances in Computational Methods for Identifying Anticancer Peptides. Curr Drug Targets 2020; 20:481-487. [PMID: 30068270 DOI: 10.2174/1389450119666180801121548] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/10/2018] [Revised: 05/28/2018] [Accepted: 05/28/2018] [Indexed: 01/10/2023]
Abstract
Anticancer peptides (ACPs) are small peptides that can kill cancer cells without damaging normal cells. In recent years, ACPs have been used pre-clinically for cancer treatment. Therefore, accurate identification of ACPs will promote their clinical application. As an alternative to labor-intensive experimental techniques, a series of computational methods have been proposed for identifying ACPs. In this review, we briefly summarize the current progress in computational identification of ACPs. The challenges and future perspectives in developing reliable methods for identifying ACPs are also discussed. We anticipate that this review could provide novel insights for future research on anticancer peptides.
Affiliation(s)
- Pengmian Feng: School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
- Zhenyi Wang: Center for Genomics and Computational Biology, School of Life Science, North China University of Science and Technology, Tangshan, 063000, China
|
45
|
Zuo Y, Zou Q, Lin J, Jiang M, Liu X. 2lpiRNApred: a two-layered integrated algorithm for identifying piRNAs and their functions based on LFE-GM feature selection. RNA Biol 2020; 17:892-902. [PMID: 32138598 PMCID: PMC7549647 DOI: 10.1080/15476286.2020.1734382] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/05/2019] [Revised: 12/16/2019] [Accepted: 02/18/2020] [Indexed: 12/18/2022] Open
Abstract
Piwi-interacting RNAs (piRNAs) are indispensable in transposon silencing, including in germ cell formation, germline stem cell maintenance, spermatogenesis, and oogenesis. piRNA pathways are amongst the major genome defence mechanisms, which maintain genome integrity. They also have important functions in tumorigenesis, as indicated by aberrantly expressed piRNAs being recently shown to play roles in the process of cancer development. A number of computational methods for identifying piRNAs have recently been proposed, but they still have not yielded satisfactory predictive performance. Moreover, only one computational method that identifies whether piRNAs function in inducting target mRNA deadenylation has been reported in the literature. In this study, we developed a two-layered integrated classifier algorithm, 2lpiRNApred. It identifies piRNAs in the first layer and determines whether they function in inducting target mRNA deadenylation in the second layer. A new feature selection algorithm, based on Luca fuzzy entropy and a Gaussian membership function (LFE-GM), was proposed to reduce the dimensionality of the features. Five feature extraction strategies (Kmer, General parallel correlation pseudo-dinucleotide composition, General series correlation pseudo-dinucleotide composition, Normalized Moreau-Broto autocorrelation, and Geary autocorrelation) and two types of classifier (Sparse Representation Classifier (SRC) and support vector machine with Mahalanobis distance-based radial basis function (SVMMDRBF)) were used to construct 2lpiRNApred. The results indicate that 2lpiRNApred performs significantly better than six other existing prediction tools.
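Of the five feature extraction strategies named in this abstract, Kmer is the simplest to illustrate. The following is a minimal sketch, not the 2lpiRNApred implementation: the function name `kmer_frequencies`, the RNA alphabet `ACGU`, the conversion of T to U, and the normalization by the number of counted windows are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration; windows containing characters
    outside `alphabet` are ignored.
    """
    # Treat DNA-style input as RNA so both notations map to one alphabet.
    seq = seq.upper().replace("T", "U")
    # Enumerate every possible k-mer so the vector length is always len(alphabet)**k.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


features = kmer_frequencies("UGACGUAAGC", k=2)
print(len(features))  # 16 features for k=2 over a 4-letter alphabet
```

A fixed-length vector of this kind is what allows sequences of different lengths to be passed to a feature selection step or a classifier.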
Affiliation(s)
- Yun Zuo: Department of Computer Science, Xiamen University, Xiamen, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
- Jianyuan Lin: Department of Computer Science, Xiamen University, Xiamen, China
- Min Jiang: Department of Cognitive Science and Technology, Xiamen University, Xiamen, China
- Xiangrong Liu: Department of Computer Science, Xiamen University, Xiamen, China
|
46
|
|
47
|
|
48
|
Wang J, Zhang P, Lu Y, Li Y, Zheng Y, Kan Y, Chen R, He S. piRBase: a comprehensive database of piRNA sequences. Nucleic Acids Res 2020; 47:D175-D180. [PMID: 30371818 PMCID: PMC6323959 DOI: 10.1093/nar/gky1043] [Citation(s) in RCA: 137] [Impact Index Per Article: 34.3] [Received: 08/15/2018] [Accepted: 10/22/2018] [Indexed: 12/25/2022] Open
Abstract
PIWI-interacting RNAs (piRNAs) are a class of small RNAs that is most abundantly expressed in the animal germline. Substantial research is under way to reveal the functions of piRNAs in the epigenetic and post-transcriptional regulation of transposons and genes. To collect and annotate these data, we developed piRBase, a database assisting piRNA functional study. Since its launch in 2014, piRBase has integrated 264 data sets from 21 organisms, and the number of collected piRNAs has reached 173 million. The latest piRBase release (v2.0, 2018) focuses more on the comprehensive annotation of piRNA sequences, as well as on the increasing number of piRNAs. In addition, piRBase release v2.0 also contains information on potential piRNA targets and disease-related piRNAs. All datasets in piRBase are free to access and available for browsing, searching and bulk download at http://www.regulatoryrna.org/database/piRNA/.
Affiliation(s)
- Jiajia Wang: Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Peng Zhang: Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yiping Lu: School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China
- Yanyan Li: Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Science, Beijing 100049, China
- Yu Zheng: Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Science, Beijing 100049, China
- Yunchao Kan: China-UK-NYNU-RRes Joint Laboratory of insect biology, Henan Key Laboratory of Insect Biology in Funiu Mountain, Nanyang Normal University, Nanyang, Henan 473061, China
- Runsheng Chen: Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Shunmin He: Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
|
49
|
Wei H, Xu Y, Liu B. iPiDi-PUL: identifying Piwi-interacting RNA-disease associations based on positive unlabeled learning. Brief Bioinform 2020; 22:5829704. [PMID: 32393982 DOI: 10.1093/bib/bbaa058] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/16/2020] [Revised: 03/15/2020] [Accepted: 03/24/2020] [Indexed: 12/20/2022] Open
Abstract
Accumulating research has revealed that Piwi-interacting RNAs (piRNAs) regulate the development of germ and stem cells and are closely associated with the progression of many diseases. As the number of detected piRNAs is increasing rapidly, it is important to computationally identify new piRNA-disease associations at low cost and to provide candidate piRNA targets for disease treatment. However, it is challenging to learn effective association patterns from the positive piRNA-disease associations and the large number of unknown piRNA-disease pairs. In this study, we proposed a computational predictor called iPiDi-PUL to identify piRNA-disease associations. iPiDi-PUL extracts the features of piRNA-disease associations from three biological data sources: piRNA sequence information, disease semantic terms and the available piRNA-disease association network. Principal component analysis (PCA) is then performed on these features to extract the key features. The training datasets are constructed from known positive associations and negative associations selected from the unknown pairs. Random forest classifiers trained on these different training sets are merged to give the predictive results via an ensemble learning approach. Finally, the web server of iPiDi-PUL was established at http://bliulab.net/iPiDi-PUL to help researchers explore the diseases associated with newly discovered piRNAs.
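The training-set construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the iPiDi-PUL code: the abstract does not say how negatives are selected from the unknown pairs, so uniform random sampling is assumed here, and the function name `build_training_sets`, the balanced 1:1 ratio, and the example pair identifiers are hypothetical.

```python
import random


def build_training_sets(positives, unlabeled, n_sets=5, seed=0):
    """Build several balanced training sets for positive-unlabeled learning.

    Each set contains all known positive pairs (label 1) plus an equally
    sized random sample of unlabeled pairs treated as negatives (label 0).
    Random sampling is an assumption, not the published selection rule.
    """
    rng = random.Random(seed)
    size = min(len(positives), len(unlabeled))
    training_sets = []
    for _ in range(n_sets):
        # Draw a different pseudo-negative sample for each base classifier.
        negatives = rng.sample(unlabeled, size)
        labeled = [(pair, 1) for pair in positives]
        labeled += [(pair, 0) for pair in negatives]
        training_sets.append(labeled)
    return training_sets


# Hypothetical piRNA-disease pairs, for illustration only.
known = [("piR-A", "disease-1"), ("piR-B", "disease-2")]
unknown = [("piR-%d" % i, "disease-3") for i in range(10)]
sets = build_training_sets(known, unknown, n_sets=3)
print(len(sets), len(sets[0]))
```

One classifier would then be trained per set and their predictions combined, for example by averaging predicted probabilities, which is the ensemble step the abstract refers to.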
|
50
|
Bekhouche S, Mohamed Ben Ali Y. Feature Selection in GPCR Classification Using BAT Algorithm. International Journal of Computational Intelligence and Applications 2020. [DOI: 10.1142/s1469026820500066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
G-Protein-Coupled Receptors (GPCRs) are a large family of membrane proteins, and some of them still remain orphans. Predicting GPCR functions is a challenging task that depends closely on their classification, which requires a digital representation of each protein chain as an attribute vector. A major problem of GPCR databases is their large number of features, which can produce combinatorial explosion and increase the complexity of classification algorithms. Feature selection techniques deal with this problem by reducing the dimension of the feature space while keeping the most relevant features. In this paper, we propose to use the BAT algorithm to extract the pertinent features and improve the classification results. We compared the results obtained by our system with two other bio-inspired algorithms, Evolutionary Algorithm and PSO search. The quality measures used for comparison are Error Rate, Accuracy, MCC and F-measure. Experimental results indicate that our system is more efficient.
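The comparison measures listed in this abstract (Error Rate, Accuracy, MCC and an F-measure) have standard definitions in terms of a confusion matrix. The sketch below assumes the binary case and takes the F-measure to be F1; both are simplifying assumptions, since GPCR classification is typically multi-class and the abstract does not state which variant was used. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "f1": f1,
        "mcc": mcc,
    }


print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC uses all four cells of the confusion matrix, so it remains informative on imbalanced data where accuracy alone can be misleading.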
Affiliation(s)
- Safia Bekhouche: Department of Computer Science, Badji Mokhtar University, Annaba 23000, Algeria
- Yamina Mohamed Ben Ali: Laboratory of Research in Informatics (LRI), Badji Mokhtar University, Annaba 23000, Algeria
|