1
Sinha T, Sadhukhan S, Panda AC. Computational Prediction of Gene Regulation by lncRNAs. Methods Mol Biol 2025; 2883:343-362. [PMID: 39702716 DOI: 10.1007/978-1-0716-4290-0_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
High-throughput sequencing technologies and innovative bioinformatics tools have revealed that most of the genome is transcribed into RNA. However, only a fraction of cellular RNAs are translated into proteins, while the majority are categorized as noncoding RNAs (ncRNAs). Noncoding RNAs longer than 200 nt that lack protein-coding ability are termed long noncoding RNAs (lncRNAs). Hundreds of studies have established that lncRNAs are a crucial RNA family regulating gene expression. Regulatory RNAs, including lncRNAs, modulate gene expression by interacting with RNA, DNA, and proteins. Several databases and computational tools have been developed to explore the functions of lncRNAs in cellular physiology. This chapter discusses the tools available for lncRNA functional analysis and provides a detailed workflow for the computational analysis of lncRNAs.
Affiliation(s)
- Tanvi Sinha
  - Institute of Life Sciences, Nalco Square, Bhubaneswar, Odisha, India
- Susovan Sadhukhan
  - Institute of Life Sciences, Nalco Square, Bhubaneswar, Odisha, India
- Amaresh C Panda
  - Institute of Life Sciences, Nalco Square, Bhubaneswar, Odisha, India
2
Tan L, Mengshan L, Yu F, Yelin L, Jihong Z, Lixin G. Predicting lncRNA-protein interactions using a hybrid deep learning model with dinucleotide-codon fusion feature encoding. BMC Genomics 2024; 25:1253. [PMID: 39732642 DOI: 10.1186/s12864-024-11168-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 12/18/2024] [Indexed: 12/30/2024]
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in numerous biological processes and are involved in complex human diseases through interactions with proteins. Accurate identification of lncRNA-protein interactions (LPI) can help elucidate the functional mechanisms of lncRNAs and provide scientific insights into the molecular mechanisms underlying related diseases. While many sequence-based methods have been developed to predict LPIs, efficiently extracting and effectively integrating potential feature information that reflects functional attributes from lncRNA and protein sequences remains a significant challenge. This paper proposes a Dinucleotide-Codon Fusion Feature encoding (DNCFF) and constructs an LPI prediction model based on deep learning, termed LPI-DNCFF. The Dual Nucleotide Visual Fusion Feature encoding (DNVFF) incorporates positional information of single nucleotides with subsequent nucleotide connections, while Codon Fusion Feature encoding (CFF) considers the specificity, molecular weight, and physicochemical properties of each amino acid. These encoding methods encapsulate rich and intuitive sequence information in limited encoding dimensions. The model comprehensively predicts LPIs by integrating global, local, and structural features, and inputs them into BiLSTM and attention layers to form a hybrid deep learning model. Experimental results demonstrate that LPI-DNCFF effectively predicts LPIs. The BiLSTM layer and attention mechanism can learn long-term dependencies and identify weighted key features, enhancing model performance. Compared to one-hot encoding, DNCFF more efficiently and thoroughly extracts potential sequence features. Compared to other existing methods, LPI-DNCFF achieved the best performance on the RPI1847 and ATH948 datasets, with MCC values of approximately 97.84% and 84.58%, respectively, outperforming the state-of-the-art method by about 1.44% and 3.48%.
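The contrast the abstract draws between one-hot encoding and nucleotide-pair encoding can be illustrated with a minimal Python/NumPy sketch. This is not the DNCFF or DNVFF scheme from the paper, whose positional and codon/amino-acid property terms are not specified here; it only shows, assuming an RNA alphabet `ACGU`, how pairing each nucleotide with its successor turns a 4-dimensional per-position code into a 16-dimensional one. The function names are hypothetical.

```python
from itertools import product

import numpy as np

NUCLEOTIDES = "ACGU"
# All 16 ordered nucleotide pairs in lexicographic order: "AA", "AC", ..., "UU".
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]
DINUC_INDEX = {dinuc: i for i, dinuc in enumerate(DINUCLEOTIDES)}


def one_hot(seq: str) -> np.ndarray:
    """Encode each nucleotide independently as a 4-dim indicator (L x 4)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        if base in NUCLEOTIDES:  # ambiguous bases such as "N" stay all-zero
            out[i, NUCLEOTIDES.index(base)] = 1.0
    return out


def dinucleotide_one_hot(seq: str) -> np.ndarray:
    """Encode each position together with its successor ((L-1) x 16)."""
    seq = seq.upper().replace("T", "U")
    out = np.zeros((max(len(seq) - 1, 0), 16), dtype=np.float32)
    for i in range(len(seq) - 1):
        j = DINUC_INDEX.get(seq[i : i + 2])
        if j is not None:  # skip pairs containing ambiguous bases
            out[i, j] = 1.0
    return out


print(one_hot("ACGU").shape)               # (4, 4)
print(dinucleotide_one_hot("ACGU").shape)  # (3, 16)
```

The pair encoding carries local context (which nucleotide follows which) at the cost of a wider per-position vector; richer schemes add further channels on top of this idea.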
Affiliation(s)
- Li Tan
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
- Li Mengshan
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
  - Ganzhou Power Supply Branch of State Grid Jiangxi Electric Power Co., Ltd, Ganzhou, 341000, Jiangxi, China
- Fu Yu
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
  - Ganzhou Power Supply Branch of State Grid Jiangxi Electric Power Co., Ltd, Ganzhou, 341000, Jiangxi, China
- Li Yelin
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
- Zhu Jihong
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
- Guan Lixin
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, 341000, Jiangxi, China
3
Dang Q. LncRNA DARS-AS1 in human cancers: A comprehensive review of its potency as a biomarker and therapeutic target. Gene 2024; 923:148566. [PMID: 38762015 DOI: 10.1016/j.gene.2024.148566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 05/20/2024]
Abstract
Long non-coding RNAs have emerged as important players in cancer biology. Increasing evidence has revealed their potential to improve cancer management, as they can serve as credible prognostic and diagnostic biomarkers. Recently, DARS-AS1 has gained significant attention for its role in facilitating tumor progression. Numerous studies have reported its upregulation in malignancies of different human body systems and revealed its association with cancer hallmarks as well as clinicopathological characteristics. Importantly, targeting DARS-AS1 holds promise in cancer therapy. In the current study, we provide an in-depth analysis of its expression status and explore the underlying mechanisms through which DARS-AS1 contributes to tumor initiation, growth, invasion, and metastasis. Additionally, we examine the correlation between DARS-AS1 expression and clinicopathological features of cancer patients, shedding light on its potential as a cancer biomarker. Furthermore, we discuss the therapeutic potential of targeting DARS-AS1 in cancer treatment, highlighting emerging strategies such as RNA interference and small-molecule inhibitors. A better understanding of its functional role can open new avenues for precision medicine and thus improve outcomes for cancer patients.
Affiliation(s)
- Qiucai Dang
  - Zhumadian Preschool Education College, Zhumadian, Henan Province 463000, China
4
Sharma S, Houfani AA, Foster LJ. Pivotal functions and impact of long con-coding RNAs on cellular processes and genome integrity. J Biomed Sci 2024; 31:52. [PMID: 38745221 PMCID: PMC11092263 DOI: 10.1186/s12929-024-01038-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 04/30/2024] [Indexed: 05/16/2024]
Abstract
Recent advances in uncovering the mysteries of the human genome suggest that long non-coding RNAs (lncRNAs) are important regulatory components. Although lncRNAs are known to affect gene transcription, their mechanisms and biological implications are still unclear. Experimental research has shown that lncRNA synthesis, subcellular localization, and interactions with macromolecules like DNA, other RNAs, or proteins can all have an impact on gene expression in various biological processes. In this review, we highlight and discuss the major mechanisms through which lncRNAs function as master regulators of the human genome. Specifically, the objective of our review is to examine how lncRNAs regulate different processes like cell division, cell cycle, and immune responses, and unravel their roles in maintaining genomic architecture and integrity.
Affiliation(s)
- Siddhant Sharma
  - Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Aicha Asma Houfani
  - Michael Smith Laboratories and Department of Biochemistry and Molecular Biology, University of British Columbia, 2185 E Mall, Vancouver, BC, V6T 1Z4, Canada
- Leonard J Foster
  - Michael Smith Laboratories and Department of Biochemistry and Molecular Biology, University of British Columbia, 2185 E Mall, Vancouver, BC, V6T 1Z4, Canada
5
Das G, Das T, Parida S, Ghosh Z. LncRTPred: Predicting RNA-RNA mode of interaction mediated by lncRNA. IUBMB Life 2024; 76:53-68. [PMID: 37606159 DOI: 10.1002/iub.2778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 07/19/2023] [Indexed: 08/23/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a significant role in various biological processes. It is therefore of utmost importance to elucidate their functions in order to understand the molecular mechanisms of complex biological systems. This versatile RNA molecule has diverse modes of interaction, one of which is lncRNA-mRNA interaction. Identifying its target mRNAs is thus essential to understand the function of an lncRNA explicitly. Existing lncRNA target prediction tools mainly adopt a thermodynamic approach; long execution times and the inability to perform real-time prediction limit their usage. Further, the lack of a negative training dataset has hindered the development of machine learning (ML)-based lncRNA target prediction tools. In this work, we have developed an ML-based lncRNA-mRNA target prediction model, 'LncRTPred'. Here we have addressed the existing problems by generating a reliable negative dataset and creating robust ML models. We identified non-interacting lncRNAs and mRNAs from the unlabelled dataset using BLAT and further filtered them to obtain a reliable set of outliers. LncRTPred provides a cumulative_model_score as the final output for each query. In terms of prediction accuracy, LncRTPred outperforms other popular target prediction protocols such as LncTar. Further, we have tested its performance against experimentally validated disease-specific lncRNA-mRNA interactions. Overall, the performance of LncRTPred depends heavily on the size of the training dataset, which is clearly reflected in the difference between its performance for human and mouse. When applied to unknown data, it performs better for human than for mouse because the mouse training dataset is smaller. The availability of more lncRNA-mRNA interaction data for mouse will improve the performance of LncRTPred in the future.
Both webserver and standalone versions of LncRTPred are available. Web server link: http://bicresources.jcbose.ac.in/zhumur/lncrtpred/index.html. Github Link: https://github.com/zglabDIB/LncRTPred.
Affiliation(s)
- Gourab Das
  - Division of Bioinformatics, Bose Institute, Kolkata, India
- Troyee Das
  - Division of Bioinformatics, Bose Institute, Kolkata, India
- Sibun Parida
  - Division of Bioinformatics, Bose Institute, Kolkata, India
- Zhumur Ghosh
  - Division of Bioinformatics, Bose Institute, Kolkata, India
6
Neatu R, Enekwa I, Thompson DJ, Schwalbe EC, Fois G, Abdelaal G, Veuger S, Frick M, Braubach P, Moschos SA. The Idiopathic Pulmonary Fibrosis-Associated Single Nucleotide Polymorphism RS35705950 Is Transcribed in a MUC5B Promoter Associated Long Non-Coding RNA (AC061979.1). Noncoding RNA 2022; 8:ncrna8060083. [PMID: 36548182 PMCID: PMC9781688 DOI: 10.3390/ncrna8060083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 11/25/2022] [Accepted: 11/30/2022] [Indexed: 12/13/2022]
Abstract
LncRNAs are involved in regulatory processes in the human genome, including gene expression. The rs35705950 SNP, previously associated with idiopathic pulmonary fibrosis (IPF), overlaps with the recently annotated lncRNA AC061979.1, a 1712-nucleotide transcript located within the MUC5B promoter at chromosome 11p15.5. To document the expression pattern of the transcript, we processed 3.9 Tbases of publicly available RNA-seq data across 27 independent studies involving lung airway epithelial cells. Epithelial lung cells showed expression of this putative promoter-associated noncoding RNA (pancRNA). The findings were independently validated in cell lines and primary cells. rs35705950 lies within a region of the expressed sequence that is conserved from fish to primates, indicating functional importance. These results implicate the rs35705950-containing AC061979.1 pancRNA as a novel component of the MUC5B expression control minicircuitry.
Affiliation(s)
- Ruxandra Neatu
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Ellison Building, Newcastle-Upon-Tyne NE1 8ST, UK
  - Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Central Parkway, Newcastle-Upon-Tyne NE1 3BZ, UK
- Ifeanyi Enekwa
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Ellison Building, Newcastle-Upon-Tyne NE1 8ST, UK
- Dean J. Thompson
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Ellison Building, Newcastle-Upon-Tyne NE1 8ST, UK
- Edward C. Schwalbe
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Ellison Building, Newcastle-Upon-Tyne NE1 8ST, UK
- Giorgio Fois
  - Institute of General Physiology, University of Ulm, Albert-Einstein-Allee 11, D89081 Ulm, Germany
- Gina Abdelaal
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Ellison Building, Newcastle-Upon-Tyne NE1 8ST, UK
- Stephany Veuger
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Ellison Building, Newcastle-Upon-Tyne NE1 8ST, UK
- Manfred Frick
  - Institute of General Physiology, University of Ulm, Albert-Einstein-Allee 11, D89081 Ulm, Germany
- Peter Braubach
  - Institute of Pathology, MHH Hannover, 30625 Hannover, Germany
- Sterghios A. Moschos
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Ellison Building, Newcastle-Upon-Tyne NE1 8ST, UK
  - Correspondence:
7
Jafari-Raddani F, Davoodi-Moghaddam Z, Yousefi AM, Ghaffari SH, Bashash D. An overview of long noncoding RNAs: Biology, functions, therapeutics, analysis methods, and bioinformatics tools. Cell Biochem Funct 2022; 40:800-825. [PMID: 36111699 DOI: 10.1002/cbf.3748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 09/05/2022] [Accepted: 09/06/2022] [Indexed: 12/15/2022]
Abstract
Long noncoding RNAs (lncRNAs) are a diverse class of RNAs whose functions are widespread in all branches of life and have been the focus of attention in the last decade. While a huge number of lncRNAs have been identified, there is still much work to be done and plenty to be learned. In the current review, we begin with the biogenesis and function of lncRNAs, which are involved in diverse cellular processes ranging from regulating chromosome architecture to controlling translation and post-translational modifications. We then address how overexpression, mutation, or deficiency of lncRNAs can affect cellular status and result in the pathogenesis of various human diseases. We also provide an overview of several studies concerning the application of lncRNAs either as diagnostic and prognostic biomarkers or as novel therapeutics. In addition, we introduce the currently available techniques for exploring details of lncRNAs such as their function, cellular localization, and structure. In the last section, because the exponentially growing data in this area need to be gathered and organized in comprehensive databases, we focus in particular on general and specialized databases. Taken together, with this review we aim to provide the latest information on different aspects of lncRNAs, highlight their importance in physiopathologic states, and take a step towards helping future studies.
Affiliation(s)
- Farideh Jafari-Raddani
  - Department of Hematology and Blood Banking, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Zeinab Davoodi-Moghaddam
  - Department of Hematology and Blood Banking, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amir-Mohammad Yousefi
  - Department of Hematology and Blood Banking, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed H Ghaffari
  - Hematology, Oncology and Stem Cell Transplantation Research Center, Shariati Hospital, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Davood Bashash
  - Department of Hematology and Blood Banking, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
8
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine learning-based methods for RNA data analysis—Volume II. Front Genet 2022; 13:1010089. [DOI: 10.3389/fgene.2022.1010089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/20/2022] [Indexed: 12/02/2022]
9
Peng L, Wang C, Tian X, Zhou L, Li K. Finding lncRNA-Protein Interactions Based on Deep Learning With Dual-Net Neural Architecture. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3456-3468. [PMID: 34587091 DOI: 10.1109/tcbb.2021.3116232] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 06/13/2023]
Abstract
The identification of lncRNA-protein interactions (LPIs) is important to understand the biological functions and molecular mechanisms of lncRNAs. However, most computational models are evaluated on a single dataset, which can result in prediction bias. Furthermore, previous models have not uncovered potential proteins (or lncRNAs) interacting with a new lncRNA (or protein). Finally, the performance of these models leaves room for improvement. In this study, we develop a Deep Learning framework with Dual-net Neural architecture to find potential LPIs (LPI-DLDN). First, five LPI datasets are collected. Second, the features of lncRNAs and proteins are extracted by Pyfeat and BioTriangle, respectively. Third, these features are concatenated into a vector after dimension reduction. Finally, a deep learning model with dual-net neural architecture is designed to classify lncRNA-protein pairs. LPI-DLDN is compared with six state-of-the-art LPI prediction methods (LPI-XGBoost, LPI-HeteSim, LPI-NRLMF, PLIPCOM, LPI-CNNCP, and Capsule-LPI) under four cross-validation settings. The results demonstrate the powerful LPI classification performance of LPI-DLDN. Case study analyses suggest that RP11-439E19.10 may interact with Q15717, and RP11-196G18.22 with Q9NUL5. The novelty of LPI-DLDN lies in integrating various biological features, designing a novel deep learning-based LPI identification framework, and selecting the optimal LPI feature subset based on feature importance ranking.
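The "dimension reduction, then concatenation" step can be sketched as follows. The abstract does not say which reduction LPI-DLDN uses, so PCA (computed with NumPy's SVD) is an assumed stand-in, random matrices stand in for the Pyfeat and BioTriangle feature tables, and `pca_reduce` and `pair_vectors` are hypothetical helper names.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project each row onto the top `n_components` principal axes."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def pair_vectors(lnc_feats, prot_feats, pairs, k_lnc, k_prot):
    """Reduce each feature table separately, then concatenate per (lncRNA, protein) pair."""
    lnc_red = pca_reduce(lnc_feats, k_lnc)
    prot_red = pca_reduce(prot_feats, k_prot)
    return np.stack([np.concatenate([lnc_red[i], prot_red[j]]) for i, j in pairs])


rng = np.random.default_rng(0)
lnc = rng.normal(size=(10, 50))   # stand-in for per-lncRNA features
prot = rng.normal(size=(8, 30))   # stand-in for per-protein features
x = pair_vectors(lnc, prot, pairs=[(0, 1), (3, 7)], k_lnc=5, k_prot=4)
print(x.shape)  # (2, 9)
```

Reducing the two tables separately keeps the lncRNA and protein feature budgets independent (`k_lnc`, `k_prot`), which is one reasonable design when the raw tables differ greatly in width.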
10
Shaath H, Vishnubalaji R, Elango R, Kardousha A, Islam Z, Qureshi R, Alam T, Kolatkar PR, Alajez NM. Long non-coding RNA and RNA-binding protein interactions in cancer: Experimental and machine learning approaches. Semin Cancer Biol 2022; 86:325-345. [PMID: 35643221 DOI: 10.1016/j.semcancer.2022.05.013] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 01/12/2022] [Revised: 05/16/2022] [Accepted: 05/20/2022] [Indexed: 01/27/2023]
Abstract
Understanding the complex and specific roles played by non-coding RNAs (ncRNAs), which comprise the bulk of the genome, is important for understanding virtually every hallmark of cancer. This large group of molecules plays pivotal roles in key regulatory mechanisms of various cellular processes. Regulatory mechanisms mediated by long non-coding RNA (lncRNA) and RNA-binding protein (RBP) interactions are well documented in several types of cancer. Their effects are enabled through networks affecting lncRNA and RBP stability, RNA metabolism (including N6-methyladenosine (m6A) modification and alternative splicing), subcellular localization, and numerous other mechanisms involved in cancer. In this review, we discuss the reciprocal interplay between lncRNAs and RBPs and their involvement in epigenetic regulation via histone modifications, as well as their key role in resistance to cancer therapy. Other aspects of RBPs, including their structural domains, provide deeper insight into how lncRNAs and RBPs interact and exert their biological functions. In addition, we discuss current state-of-the-art knowledge, facilitated by machine and deep learning approaches, that unravels such interactions in greater detail, as well as the potential to harness RNA-based therapeutics as an alternative treatment modality for cancer.
Affiliation(s)
- Hibah Shaath
  - Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
- Radhakrishnan Vishnubalaji
  - Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
- Ramesh Elango
  - Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
- Ahmed Kardousha
  - College of Health & Life Sciences, Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
- Zeyaul Islam
  - Diabetes Research Center (DRC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
- Rizwan Qureshi
  - College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
- Tanvir Alam
  - College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
- Prasanna R Kolatkar
  - College of Health & Life Sciences, Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
  - Diabetes Research Center (DRC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation, PO Box 34110, Doha, Qatar
- Nehad M Alajez
  - Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
  - College of Health & Life Sciences, Hamad Bin Khalifa University (HBKU), Qatar Foundation (QF), PO Box 34110, Doha, Qatar
11
Guo Z, Hui Y, Kong F, Lin X. Finding Lung-Cancer-Related lncRNAs Based on Laplacian Regularized Least Squares With Unbalanced Bi-Random Walk. Front Genet 2022; 13:933009. [PMID: 35938010 PMCID: PMC9355720 DOI: 10.3389/fgene.2022.933009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 06/03/2022] [Indexed: 11/13/2022]
Abstract
Lung cancer is one of the leading causes of cancer-related deaths, so it is important to find its biomarkers. Furthermore, an increasing number of studies report that long noncoding RNAs (lncRNAs) are closely linked to multiple complex human diseases. Inferring new lncRNA-disease associations helps to identify potential biomarkers for lung cancer and to further understand its pathogenesis, design new drugs, and formulate individualized therapeutic options for lung cancer patients. This study developed a computational method (LDA-RLSURW) that integrates Laplacian regularized least squares and unbalanced bi-random walk to discover possible lncRNA biomarkers for lung cancer. First, lncRNA and disease similarities were computed. Second, unbalanced bi-random walk was applied to the lncRNA network and the disease network to score associations between diseases and lncRNAs. Third, Laplacian regularized least squares was used to compute the association probability of each lncRNA-disease pair from the random walk scores. LDA-RLSURW was compared with 10 classical lncRNA-disease association (LDA) prediction methods and obtained the best AUC (0.9027) on the lncRNADisease database. We identified the top 30 lncRNAs associated with lung cancers and inferred that lncRNAs TUG1, PTENP1, and UCA1 may be biomarkers of lung neoplasms, non-small-cell lung cancer, and lung adenocarcinoma (LUAD), respectively.
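Laplacian regularized least squares, in its simplest graph form, smooths a score matrix over a similarity network. The sketch below assumes a single similarity matrix `W` and an initial score matrix `Y` (for example, random-walk scores) and solves the standard closed form; the symbols, the default `lam`, and how LDA-RLSURW combines lncRNA-side and disease-side results are assumptions not taken from the abstract.

```python
import numpy as np


def laplacian_rls(similarity: np.ndarray, scores: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Graph-regularised least squares.

    Minimises ||F - Y||_F^2 + lam * trace(F^T L F), where L = D - W is the
    unnormalised Laplacian of the similarity graph W. Setting the gradient
    2(F - Y) + 2 * lam * L F to zero gives F = (I + lam * L)^{-1} Y.
    """
    w = (similarity + similarity.T) / 2.0      # symmetrise W
    laplacian = np.diag(w.sum(axis=1)) - w      # L = D - W
    n = w.shape[0]
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(np.eye(n) + lam * laplacian, scores)


# Hypothetical 3-node graph: nodes 0 and 1 are similar, node 2 is isolated.
w = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
y = np.array([[1.0], [0.0], [0.0]])  # prior score only on node 0
print(laplacian_rls(w, y).ravel())   # [2/3, 1/3, 0] up to floating-point rounding
```

Because the Laplacian is positive semidefinite, `I + lam * L` is invertible for any `lam >= 0`; larger `lam` spreads more score from node 0 to its neighbour, while the isolated node keeps its prior.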
12
Li S, Wang B, Chang M, Hou R, Tian G, Tong L. A Novel Algorithm for Detecting Microsatellite Instability Based on Next-Generation Sequencing Data. Front Oncol 2022; 12:916379. [PMID: 35847873 PMCID: PMC9280483 DOI: 10.3389/fonc.2022.916379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 05/27/2022] [Indexed: 11/25/2022]
Abstract
Objectives: Microsatellite instability (MSI) is a condition of genetic hypermutability caused by the spontaneous gain or loss of nucleotides during DNA replication. MSI has proven clinically useful as an immunotherapy biomarker. The main DNA-based method for MSI detection is polymerase chain reaction (PCR) amplification followed by fragment length analysis, which is costly and laborious. We therefore developed a novel method to detect MSI from next-generation sequencing (NGS) data.
Methods: We chose six MSI markers. After alignment and read counting, a histogram was plotted showing the counts of different lengths for each marker. We then designed an algorithm to detect peaks in the generated histograms so that the number of peaks found in NGS data resembled that found with the PCR-based method.
Results: We selected nine samples as the training dataset, 101 samples for validation, and 68 samples as the test dataset from Chifeng Municipal Hospital, Inner Mongolia, China. The NGS-based method achieved 100% accuracy on the validation dataset and 98.53% accuracy on the test dataset, in which only one false positive was detected.
Conclusions: Accurate MSI calls were achieved using NGS data, which could provide MSI detection comparable to the gold-standard PCR-based methods.
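A minimal sketch of the histogram-and-peak idea, assuming each read yields one observed repeat length for a marker. The paper's actual peak-calling rules and thresholds are not given in the abstract; here a peak is a local maximum (a run of equal bars counts once) whose height reaches a hypothetical fraction `min_fraction` of the tallest bar, and both helper names are illustrative.

```python
from collections import Counter


def length_histogram(repeat_lengths):
    """Return (counts, first_length): reads per repeat length over a contiguous range.

    Expects a non-empty list of observed lengths.
    """
    counts = Counter(repeat_lengths)
    lo, hi = min(counts), max(counts)
    return [counts.get(n, 0) for n in range(lo, hi + 1)], lo


def count_peaks(histogram, min_fraction=0.2):
    """Count local maxima at least `min_fraction` as tall as the highest bar."""
    if not histogram:
        return 0
    threshold = min_fraction * max(histogram)
    padded = [0, *histogram, 0]  # lengths outside the range have zero reads
    peaks = 0
    i = 1
    while i < len(padded) - 1:
        here = padded[i]
        j = i
        while j + 1 < len(padded) - 1 and padded[j + 1] == here:
            j += 1  # extend over a plateau of equal bars
        if here >= threshold and here > padded[i - 1] and here > padded[j + 1]:
            peaks += 1
        i = j + 1
    return peaks


hist, first_len = length_histogram([12, 12, 13, 15])
print(hist, first_len)                                  # [2, 1, 0, 1] 12
print(count_peaks([1, 10, 3, 4, 3]))                    # 2
print(count_peaks([1, 10, 3, 4, 3], min_fraction=0.5))  # 1
```

The relative threshold is what lets a small "stutter" bump be ignored or counted; tuning such a cut-off on training samples so that NGS peak counts match PCR fragment analysis is the kind of calibration the abstract describes.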
Affiliation(s)
- Shijun Li
  - Pathology Department, Chifeng Municipal Hospital, Chifeng, China
- Bo Wang
  - Science Department, Geneis Beijing Co., Ltd., Beijing, China
- Miaomiao Chang
  - Pathology Department, Chifeng Municipal Hospital, Chifeng, China
- Rui Hou
  - Science Department, Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
  - Science Department, Geneis Beijing Co., Ltd., Beijing, China
  - *Correspondence: Geng Tian; Ling Tong
- Ling Tong
  - Pathology Department, Chifeng Municipal Hospital, Chifeng, China
13
Zhang T, Chen L, Li R, Liu N, Huang X, Wong G. PIWI-interacting RNAs in human diseases: databases and computational models. Brief Bioinform 2022; 23:6603448. [PMID: 35667080 DOI: 10.1093/bib/bbac217] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/28/2022] [Revised: 04/24/2022] [Accepted: 05/09/2022] [Indexed: 11/12/2022]
Abstract
PIWI-interacting RNAs (piRNAs) are short 21-35 nucleotide molecules that comprise the largest class of non-coding RNAs and are found in a large diversity of species, including yeast, worms, flies, plants and mammals including humans. The most well-understood function of piRNAs is to monitor and protect the genome from transposons, particularly in germline cells. Recent data suggest that piRNAs may have additional functions in somatic cells, although they are expressed there in far lower abundance. Compared with microRNAs (miRNAs), piRNAs have more limited bioinformatics resources available. This review collates 39 piRNA-specific and non-specific databases and bioinformatics resources, describes and compares their utility and attributes, and provides an overview of their place in the field. In addition, we review 33 computational models grouped by function: piRNA prediction, transposon element and mRNA-related piRNA prediction, cluster prediction, signature detection, target prediction and disease association. Based on the collection of databases and computational models, we identify trends and potential gaps in tool development. We further analyze the breadth and depth of piRNA data available in public sources and their contribution to specific human diseases, particularly cancer and neurodegenerative conditions, and highlight a few specific piRNAs that appear to be associated with these diseases. This briefing presents the most recent and comprehensive mapping of piRNA bioinformatics resources, including databases, models and tools for disease associations, to date. Such a mapping should facilitate and stimulate further research on piRNAs.
Affiliation(s)
- Tianjiao Zhang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Liang Chen
  - Department of Computer Science, School of Engineering, Shantou University, Shantou, China
- Rongzhen Li
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Ning Liu
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Xiaobing Huang
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Garry Wong
  - Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
14
Song J, Tian S, Yu L, Yang Q, Dai Q, Wang Y, Wu W, Duan X. RLF-LPI: An ensemble learning framework using sequence information for predicting lncRNA-protein interaction based on AE-ResLSTM and fuzzy decision. Math Biosci Eng 2022; 19:4749-4764. [PMID: 35430839 DOI: 10.3934/mbe.2022222] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/14/2023]
Abstract
Long non-coding RNAs (lncRNAs) play regulatory roles in many cell types, and recognizing lncRNA-protein interactions helps to reveal the functional mechanisms of lncRNAs. Identifying lncRNA-protein interactions with biological techniques is costly and time-consuming. Here, an ensemble learning framework, RLF-LPI, is proposed to predict lncRNA-protein interactions. The residual LSTM autoencoder module of RLF-LPI, combined with a fusion attention mechanism, extracts latent feature representations and, using the k-mer method, captures the dependencies between sequences and structures. Finally, the relationship between lncRNA and protein is learned through fuzzy decision. Experimental results show that RLF-LPI achieves an ACC of 0.912 on the ATH948 dataset and 0.921 on the ZEA22133 dataset, indicating that the proposed method predicts lncRNA-protein interactions better than other methods.
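k-mer composition, the sequence representation named in the abstract, can be computed as normalized counts of every possible length-k substring. The sketch assumes uppercase sequences over `ACGU` for RNA (a 20-letter amino-acid alphabet can be passed for proteins); the k values RLF-LPI uses and how it encodes structure are not described in the abstract, and `kmer_frequencies` is a hypothetical name.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int, alphabet: str = "ACGU") -> list:
    """Normalised counts of all len(alphabet)**k possible k-mers, in lexicographic order.

    Windows containing characters outside `alphabet` are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for start in range(len(seq) - k + 1):
        idx = index.get(seq[start : start + k].upper())
        if idx is not None:
            counts[idx] += 1
            total += 1
    # Sequences shorter than k (or fully ambiguous) yield an all-zero vector.
    return [c / total if total else 0.0 for c in counts]


# Concatenating k = 1..3 gives a 4 + 16 + 64 = 84-dimensional RNA feature vector.
rna = "AUGGCUACGUAG"
features = [f for k in (1, 2, 3) for f in kmer_frequencies(rna, k)]
print(len(features))  # 84
```

Normalizing by the number of valid windows makes vectors comparable across lncRNAs of very different lengths, which matters because lncRNAs range from about 200 nt to many kilobases.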
Affiliation(s)
- Jinmiao Song
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
- Shengwei Tian
- Department of Software, Xinjiang University, Urumqi 830008, China
- Key Laboratory of Signal and Information Processing, Xinjiang University, Urumqi 830008, China
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi 830008, China
- Long Yu
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
- Qimeng Yang
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
- Qiguo Dai
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
- Yuanxu Wang
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
- Weidong Wu
- Center for Science Education, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi 830001, China
- Xiaodong Duan
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
15
Chen M, Deng Y, Li A, Tan Y. Inferring Latent Disease-lncRNA Associations by Label-Propagation Algorithm and Random Projection on a Heterogeneous Network. Front Genet 2022; 13:798632. [PMID: 35186029 PMCID: PMC8854791 DOI: 10.3389/fgene.2022.798632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022]
Abstract
Long noncoding RNA (lncRNA), a class of non-coding RNA longer than 200 nucleotides, is associated with various complex diseases. Precisely identifying potential lncRNA–disease associations is important for understanding disease pathogenesis, developing new drugs, and designing individualized diagnosis and treatment methods for different human diseases. Compared with complex and costly biological experiments, computational methods can quickly and effectively predict potential lncRNA–disease associations, so developing computational methods for lncRNA–disease prediction is a promising avenue. However, because the prediction accuracy of state-of-the-art methods remains low, accurately and effectively identifying lncRNA–disease associations is still highly challenging. This article proposes an integrated method called LPARP, based on a label-propagation algorithm and random projection, to address this issue. Specifically, the label-propagation algorithm is first used to obtain estimated scores of lncRNA–disease associations, and random projections are then used to accurately predict disease-related lncRNAs. Empirical experiments showed that LPARP achieved good prediction performance on three gold-standard datasets, outperforming existing state-of-the-art prediction methods. It can also be used to predict associations for isolated diseases and new lncRNAs. Case studies of bladder cancer, esophageal squamous-cell carcinoma, and colorectal cancer further support the reliability of the method. The proposed LPARP algorithm can predict potential lncRNA–disease associations stably and effectively with less data, and it can serve as an effective and reliable tool for biomedical research.
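The label-propagation step described in the LPARP abstract can be illustrated with a minimal sketch. It assumes the generic iterative scheme F ← αSF + (1 − α)Y on a symmetrically normalised lncRNA similarity matrix S, with the known lncRNA–disease association matrix Y as the initial labels. The function name `label_propagation` and the default α = 0.8 are illustrative choices, not LPARP's published formulation, and the random-projection stage is not modelled.

```python
import numpy as np


def label_propagation(sim, labels, alpha=0.8, tol=1e-8, max_iter=1000):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y until the update stalls.

    sim    : (n, n) symmetric, non-negative similarity matrix (e.g. lncRNA-lncRNA).
    labels : (n, m) known association matrix Y (1 = known lncRNA-disease link).
    """
    W = np.asarray(sim, dtype=float)
    Y = np.asarray(labels, dtype=float)
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}; isolated nodes keep zero rows.
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * S @ F + (1 - alpha) * Y
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F
```

Column j of the result scores every lncRNA for disease j. Because α < 1 and the spectral radius of S is at most 1, the iteration converges to the closed form (1 − α)(I − αS)⁻¹Y; the iterative form is shown because it scales to large sparse networks.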
16
Peng L, Yuan R, Shen L, Gao P, Zhou L. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min 2021; 14:50. [PMID: 34861891 PMCID: PMC8642957 DOI: 10.1186/s13040-021-00277-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/29/2021] [Accepted: 08/22/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) are closely linked to various biological processes. Identifying interacting lncRNA-protein pairs contributes to understanding the functions and mechanisms of lncRNAs. Wet-lab experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduces prediction bias. RESULTS In this study, we develop an ensemble framework (LPI-EnEDT) with Extra tree and Decision Tree classifiers to classify imbalanced LPI data. First, five LPI datasets are assembled. Second, lncRNAs and proteins are separately characterized based on Pyfeat and BioTriangle, and their features are concatenated into a vector representing each lncRNA-protein pair. Finally, an ensemble framework with Extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. Comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively, and the average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest underlying associations between HOTTIP and Q9Y6M1, and between NRON and Q15717. CONCLUSIONS By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with Extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction inference for new lncRNAs (or proteins).
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China; College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ruya Yuan
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ling Shen
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Pengfei Gao
- College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
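The LPI-EnEDT abstract, like several others in this list, reports average AUC values under cross validation. As a minimal sketch of what that metric measures, AUC can be computed as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half. The function `roc_auc` is a hypothetical helper for illustration, not code from LPI-EnEDT.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # compare every positive with every negative (O(P * N))
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop keeps the definition explicit; for large datasets a rank-based (Mann-Whitney) computation gives the same value more efficiently.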
17
LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification. BMC Bioinformatics 2021; 22:568. [PMID: 34836494 PMCID: PMC8620196 DOI: 10.1186/s12859-021-04485-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/24/2021] [Accepted: 11/09/2021] [Indexed: 12/03/2022]
Abstract
Background Long noncoding RNAs (lncRNAs) are closely linked to a plethora of important cellular activities, and they exert their functions by binding to corresponding RNA-binding proteins. Since experimental techniques to detect lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, computation-based LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, so researchers may fail to measure their generalization ability. (2) The majority of methods were validated only under cross validation on lncRNA-protein pairs and did not investigate performance under other cross validations, especially cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation. Results Under a hybrid framework (LPI-HyADBS) integrating AdaBoost-based feature selection with classification models including a deep neural network (DNN), extreme gradient boosting (XGBoost), and SVM with a penalty coefficient of misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are assembled, each containing lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired based on Pyfeat. Third, the obtained features of lncRNAs and proteins are selected based on AdaBoost and concatenated to represent each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results from the three classifiers.
LPI-HyADBS is compared with six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS has the best LPI prediction performance under four different cross validations. In particular, LPI-HyADBS obtains better classification performance than the other six approaches on the constructed independent dataset. Case analyses suggest an association between ZNF667-AS1 and Q15717. Conclusions By integrating an AdaBoost-based feature selection approach with three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new links between lncRNAs and proteins. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04485-x.
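The difference between cross validation on lncRNA-protein pairs and cross validation on lncRNAs or proteins is that the latter hold out every pair involving a test lncRNA (or protein), which mimics prediction for new molecules. A minimal sketch under that reading follows; `cv_folds` and its `mode` values are hypothetical names, and the round-robin fold assignment is one simple choice among many, not the procedure used by LPI-HyADBS.

```python
import random


def cv_folds(pairs, k=5, mode="pair", seed=0):
    """Yield (train, test) lists of (lncRNA, protein, label) tuples.

    mode="pair":    folds over individual lncRNA-protein pairs.
    mode="lncrna":  folds over lncRNAs; all pairs of a test lncRNA are held out.
    mode="protein": folds over proteins; all pairs of a test protein are held out.
    """
    if mode == "pair":
        unit_of = list(range(len(pairs)))
    elif mode == "lncrna":
        unit_of = [p[0] for p in pairs]
    elif mode == "protein":
        unit_of = [p[1] for p in pairs]
    else:
        raise ValueError(f"unknown mode: {mode}")
    units = sorted(set(unit_of))  # sort first so a given seed gives reproducible folds
    random.Random(seed).shuffle(units)
    fold_of = {u: j % k for j, u in enumerate(units)}  # round-robin fold assignment
    for f in range(k):
        train = [p for p, u in zip(pairs, unit_of) if fold_of[u] != f]
        test = [p for p, u in zip(pairs, unit_of) if fold_of[u] == f]
        yield train, test
```

Under `mode="lncrna"` no test lncRNA ever appears in the training set, which is why scores under this setting (and under `mode="protein"`) are typically lower than under pair-level cross validation.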
18
LGFC-CNN: Prediction of lncRNA-Protein Interactions by Using Multiple Types of Features through Deep Learning. Genes (Basel) 2021; 12:genes12111689. [PMID: 34828296 PMCID: PMC8621699 DOI: 10.3390/genes12111689] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/25/2021] [Revised: 10/11/2021] [Accepted: 10/22/2021] [Indexed: 12/12/2022]
Abstract
Long noncoding RNA (lncRNA) plays a crucial role in many critical biological processes and participates in complex human diseases through interactions with proteins. Because identifying lncRNA–protein interactions through experimental methods is expensive and time-consuming, we propose LGFC-CNN, a novel deep learning method that combines raw sequence composition features, hand-designed features, and structure features to predict lncRNA–protein interactions. Two sequence preprocessing methods and two CNN modules (GloCNN and LocCNN) are used to extract global and local features from the raw sequences. Meanwhile, we select hand-designed features by comparing the predictive performance of different combinations of lncRNA and protein features. Furthermore, we obtain structure features and unify their dimensions through a Fourier transform. Finally, the four types of features are integrated to comprehensively predict lncRNA–protein interactions. Compared with other state-of-the-art methods on three lncRNA–protein interaction datasets, LGFC-CNN achieves the best performance, with accuracies of 94.14% on RPI21850, 92.94% on RPI7317, and 98.19% on RPI1847. The results show that LGFC-CNN can effectively predict lncRNA–protein interactions by combining raw sequence composition features, hand-designed features, and structure features.
19
Sabol M, Calleja-Agius J, Di Fiore R, Suleiman S, Ozcan S, Ward MP, Ozretić P. (In)Distinctive Role of Long Non-Coding RNAs in Common and Rare Ovarian Cancers. Cancers (Basel) 2021; 13:cancers13205040. [PMID: 34680193 PMCID: PMC8534192 DOI: 10.3390/cancers13205040] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/06/2021] [Revised: 10/04/2021] [Accepted: 10/06/2021] [Indexed: 02/05/2023]
Abstract
Rare ovarian cancers (ROCs) are ovarian cancers (OCs) with an annual incidence of fewer than 6 cases per 100,000 women. They affect women of all ages, but because of their low incidence and potential clinical inexperience in their management, diagnosis can be delayed, leading to a poor prognosis. The underlying causes of these tumors vary, but generally the tumors arise from alterations in gene/protein expression in cellular processes that regulate normal proliferation and its checkpoints. Mechanisms that dysregulate these cellular processes and lead to cancer include gene mutations, epimutations, non-coding RNA (ncRNA) regulation, and posttranscriptional and posttranslational modifications. Long non-coding RNAs (lncRNAs) are defined as transcribed RNA molecules more than 200 nucleotides in length that are not translated into proteins. They regulate gene expression through several mechanisms and therefore add another level of complexity to the regulatory mechanisms affecting tumor development. Since few studies have been performed on ROCs, in this review we summarize the mechanisms of action of lncRNAs in OC, with an emphasis on ROCs.
Affiliation(s)
- Maja Sabol
- Laboratory for Hereditary Cancer, Division of Molecular Medicine, Ruđer Bošković Institute, HR-10000 Zagreb, Croatia
- Jean Calleja-Agius
- Department of Anatomy, Faculty of Medicine and Surgery, University of Malta, MSD 2080 Msida, Malta
- Riccardo Di Fiore
- Department of Anatomy, Faculty of Medicine and Surgery, University of Malta, MSD 2080 Msida, Malta
- Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Sherif Suleiman
- Department of Anatomy, Faculty of Medicine and Surgery, University of Malta, MSD 2080 Msida, Malta
- Sureyya Ozcan
- Department of Chemistry, Middle East Technical University (METU), 06800 Ankara, Turkey
- Cancer Systems Biology Laboratory (CanSyl), Middle East Technical University (METU), 06800 Ankara, Turkey
- Mark P. Ward
- Department of Histopathology, Trinity St James’s Cancer Institute, Emer Casey Molecular Pathology Laboratory, Trinity College Dublin and Coombe Women’s and Infants University Hospital, D08 RX0X Dublin, Ireland
- Petar Ozretić
- Laboratory for Hereditary Cancer, Division of Molecular Medicine, Ruđer Bošković Institute, HR-10000 Zagreb, Croatia
- Correspondence: ; Tel.: +385-(1)-4571292
20
Zhou L, Wang Z, Tian X, Peng L. LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA-protein interaction identification. BMC Bioinformatics 2021; 22:479. [PMID: 34607567 PMCID: PMC8489074 DOI: 10.1186/s12859-021-04399-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 04/29/2021] [Accepted: 07/14/2021] [Indexed: 12/31/2022]
Abstract
Background Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovering lncRNA–protein interactions (LPIs) contributes to understanding the biological functions and mechanisms of lncRNAs. Although wet-lab experiments have identified some interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly used to uncover possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on a single dataset, which may result in prediction bias. Second, few of them can identify interaction data for new lncRNAs (or proteins). Finally, they fail to utilize the diverse biological information of lncRNAs and proteins. Results Using a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classifying unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are assembled. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are dimensionally reduced and concatenated into a vector representing each lncRNA–protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying links between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations, on lncRNAs, proteins, and lncRNA–protein pairs, respectively. It obtains the best average AUC and AUPR values in the majority of situations, significantly outperforming the other five LPI identification methods: its AUCs are 0.8321, 0.6815, and 0.9073, and its AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the strong classification ability of LPI-deepGBDT.
Case study analyses show that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637. Conclusions By integrating ensemble learning and hierarchical distributed representations into a multiple-layered deep architecture, this work improves LPI prediction performance and effectively probes interaction data for new lncRNAs/proteins.
Affiliation(s)
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Zhao Wang
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Xiongfei Tian
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Lihong Peng
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China; College of Life Sciences and Chemistry, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China.
21
Tian X, Shen L, Wang Z, Zhou L, Peng L. A novel lncRNA-protein interaction prediction method based on deep forest with cascade forest structure. Sci Rep 2021; 11:18881. [PMID: 34556758 PMCID: PMC8460650 DOI: 10.1038/s41598-021-98277-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/05/2021] [Accepted: 08/18/2021] [Indexed: 02/08/2023]
Abstract
Long noncoding RNAs (lncRNAs) regulate many biological processes by interacting with corresponding RNA-binding proteins. Identifying lncRNA-protein interactions (LPIs) is important for characterizing the biological functions and mechanisms of lncRNAs. Existing computational methods have been effectively applied to LPI prediction. However, the majority of them were evaluated on only one LPI dataset, resulting in prediction bias. More importantly, some models cannot discover possible LPIs for new lncRNAs (or proteins). In addition, their prediction performance remains limited. To address these problems, in this study we develop a Deep Forest-based LPI prediction method (LPIDF). First, five LPI datasets are obtained, and the corresponding sequence information of lncRNAs and proteins is collected. Second, features of lncRNAs and proteins are constructed based on four-nucleotide composition and BioSeq2vec with an encoder-decoder structure, respectively. Finally, a deep forest model with a cascade forest structure is developed to find new LPIs. We compare LPIDF with four classical association prediction models under three fivefold cross validations (CVs) on lncRNAs, proteins, and LPIs. LPIDF obtains better average AUCs of 0.9012, 0.6937, and 0.9457, and the best average AUPRs of 0.9022, 0.6860, and 0.9382, respectively, for the three CVs, significantly outperforming the other methods. The results show that the lncRNA FTX may interact with the protein P35637, which needs further validation.
Affiliation(s)
- Xiongfei Tian
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Ling Shen
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Zhenwu Wang
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China.
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China.
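The four-nucleotide composition feature named in the LPIDF abstract can be sketched as a normalised 4-mer frequency vector with 4^4 = 256 entries. This is an illustrative implementation under assumed conventions (T is read as U, windows containing ambiguous bases such as N are skipped, counts are divided by the number of valid windows); LPIDF's exact encoding may differ.

```python
from itertools import product


def kmer_composition(seq, k=4):
    """Normalised k-mer frequency vector over the RNA alphabet ACGU (4**k entries)."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other ambiguity codes
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(kmers)
```

Normalising by the number of windows makes vectors from lncRNAs of very different lengths comparable, which matters because lncRNA lengths range from about 200 nt to tens of kilobases.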
22
Yao Y, Ji B, Lv Y, Li L, Xiang J, Liao B, Gao W. Predicting LncRNA-Disease Association by a Random Walk With Restart on Multiplex and Heterogeneous Networks. Front Genet 2021; 12:712170. [PMID: 34490041 PMCID: PMC8417042 DOI: 10.3389/fgene.2021.712170] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2021] [Accepted: 07/23/2021] [Indexed: 02/05/2023]
Abstract
Studies have found that long non-coding RNAs (lncRNAs) play important roles in many human biological processes, and it is critical to explore potential lncRNA–disease associations, especially cancer-associated lncRNAs. However, traditional biological experiments are costly and time-consuming, so developing effective computational models is of great significance. We developed a random walk with restart algorithm on multiplex and heterogeneous networks of lncRNAs and diseases to predict lncRNA–disease associations (MHRWRLDA). First, multiple disease similarity networks are constructed using different approaches to calculate similarity scores between diseases, and multiple lncRNA similarity networks are likewise constructed using different approaches to calculate similarity scores between lncRNAs. Then, a multiplex and heterogeneous network is constructed by integrating the multiple disease similarity networks and multiple lncRNA similarity networks with the known lncRNA–disease associations, and a random walk with restart is performed on this network to predict lncRNA–disease associations. Leave-one-out cross-validation (LOOCV) yielded an area under the curve (AUC) of 0.68736, an improvement over classical algorithms from recent years. Finally, we confirmed several novel predicted lncRNAs associated with specific diseases, such as colon cancer, through literature mining. In summary, MHRWRLDA contributes to predicting lncRNA–disease associations.
Affiliation(s)
- Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Ministry of Education, Hainan Normal University, Haikou, China; Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou, China
- Binbin Ji
- Geneis Beijing Co., Ltd., Beijing, China
- Yaping Lv
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ling Li
- Basic Courses Department, Zhejiang Shuren University, Hangzhou, China
- Ju Xiang
- School of Computer Science and Engineering, Central South University, Changsha, China; Department of Basic Medical Sciences, Changsha Medical University, Changsha, China; Department of Computer Science, Changsha Medical University, Changsha, China
- Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Wei Gao
- Departments of Internal Medicine-Oncology, Fujian Cancer Hospital & Fujian Medical University Cancer Hospital, Fuzhou, China
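The random walk with restart underlying MHRWRLDA can be illustrated on a single heterogeneous network. This minimal sketch assumes one lncRNA similarity layer and one disease similarity layer joined by the known association matrix, a column-normalised transition matrix, and a restart probability of 0.7. The multiplex layers and inter-layer jump probabilities of the published method are not modelled, and both function names are illustrative.

```python
import numpy as np


def heterogeneous_adjacency(lnc_sim, dis_sim, assoc):
    """Block adjacency [[lncRNA-lncRNA, lncRNA-disease], [disease-lncRNA, disease-disease]]."""
    assoc = np.asarray(assoc, dtype=float)
    return np.block([[np.asarray(lnc_sim, dtype=float), assoc],
                     [assoc.T, np.asarray(dis_sim, dtype=float)]])


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=10000):
    """Iterate p <- (1 - r) * W @ p + r * p0 until the L1 change is below tol.

    adj   : (n, n) non-negative adjacency matrix.
    seeds : indices of seed nodes, e.g. lncRNAs already linked to a query disease.
    """
    A = np.asarray(adj, dtype=float)
    col = A.sum(axis=0)
    # Column-normalised transition matrix; columns of isolated nodes stay zero.
    W = np.divide(A, col, out=np.zeros_like(A), where=col > 0)
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart mass spread evenly over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

Ranking the non-seed nodes by their steady-state probability gives the candidate list; a larger restart probability keeps the walk closer to the seeds and favours direct neighbours.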
23
Subramaniam N, Nair R, Marsden PA. Epigenetic Regulation of the Vascular Endothelium by Angiogenic LncRNAs. Front Genet 2021; 12:668313. [PMID: 34512715 PMCID: PMC8427604 DOI: 10.3389/fgene.2021.668313] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/16/2021] [Accepted: 05/17/2021] [Indexed: 12/15/2022]
Abstract
The functional properties of the vascular endothelium are diverse and heterogeneous between vascular beds. This is especially evident when new blood vessels develop from a pre-existing closed cardiovascular system, a process termed angiogenesis. Endothelial cells are key drivers of angiogenesis as they undergo a highly choreographed cascade of events that has both exogenous (e.g., hypoxia and VEGF) and endogenous regulatory inputs. Not surprisingly, angiogenesis is critical in health and disease. Diverse therapeutics target proteins involved in coordinating angiogenesis with varying degrees of efficacy. It is of great interest that recent work on non-coding RNAs, especially long non-coding RNAs (lncRNAs), indicates that they are also important regulators of the gene expression paradigms that underpin this cellular cascade. The protean effects of lncRNAs are dependent, in part, on their subcellular localization. For instance, lncRNAs enriched in the nucleus can act as epigenetic modifiers of gene expression in the vascular endothelium. Of great interest to genetic disease, they are undergoing rapid evolution and show extensive inter- and intra-species heterogeneity. In this review, we describe endothelial-enriched lncRNAs that have robust effects in angiogenesis.
Affiliation(s)
- Noeline Subramaniam
- Marsden Lab, Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Marsden Lab, Keenan Research Centre in the Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON, Canada
- Ranju Nair
- Marsden Lab, Keenan Research Centre in the Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON, Canada
- Marsden Lab, Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Philip A. Marsden
- Marsden Lab, Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Marsden Lab, Keenan Research Centre in the Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON, Canada
- Marsden Lab, Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Medicine, University of Toronto, Toronto, ON, Canada
24
Yao Y, Ji B, Lv Y, Li L, Xiang J, Liao B, Gao W. Predicting LncRNA–Disease Association by a Random Walk With Restart on Multiplex and Heterogeneous Networks. Front Genet 2021. [DOI: 10.3389/fgene.2021.712170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2023]
25
Sun X, Cheng L, Liu J, Xie C, Yang J, Li F. Predicting lncRNA-Protein Interaction With Weighted Graph-Regularized Matrix Factorization. Front Genet 2021; 12:690096. [PMID: 34335693 PMCID: PMC8322775 DOI: 10.3389/fgene.2021.690096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2021] [Accepted: 05/21/2021] [Indexed: 11/13/2022]
Abstract
Long non-coding RNAs (lncRNAs) have attracted wide attention because of their close associations with many key biological activities. Although the precise functions of most lncRNAs are unknown, studies show that lncRNAs usually exert their biological functions by interacting with corresponding proteins. Experimental validation of interactions between lncRNAs and proteins is costly and time-consuming. In this study, we developed a weighted graph-regularized matrix factorization method (LPI-WGRMF) to find unobserved lncRNA-protein interactions (LPIs) based on an lncRNA similarity matrix, a protein similarity matrix, and known LPIs. We compared LPI-WGRMF with five classical LPI prediction methods, namely LPBNI, LPI-IBNRA, LPIHN, RWR, and collaborative filtering (CF). The results demonstrate that LPI-WGRMF achieves high accuracy, with an AUC of 0.9012 and an AUPR of 0.7324. The case study showed that SFPQ, SNHG3, and PRPF31 may be associated with Q9NUL5, Q9NUL5, and Q9UKV8, respectively, with the highest linking probabilities, and these pairs need further experimental validation.
Affiliation(s)
- Xibo Sun
- Yidu Central Hospital of Weifang, Weifang, China
- Jinyang Liu
- Geneis Beijing Co., Ltd., Beijing, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Cuinan Xie
- Geneis Beijing Co., Ltd., Beijing, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Jiasheng Yang
- Academician Workstation, Changsha Medical University, Changsha, China
- Fu Li
- Department of Thoracic Surgery, The Second Affiliated Hospital of Hainan Medical University, Haikou, China
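The idea behind weighted graph-regularized matrix factorization, as summarised in the LPI-WGRMF abstract, can be sketched by assuming the common objective ||W ∘ (Y − ABᵀ)||²_F + λ(||A||²_F + ||B||²_F) + β(tr(AᵀL_lA) + tr(BᵀL_pB)), where L_l and L_p are graph Laplacians of the lncRNA and protein similarity matrices. Plain gradient descent is used here for simplicity. The weighting scheme, any preprocessing, and the solver of LPI-WGRMF itself may differ; `wgrmf`, `laplacian`, and all hyperparameter values are hypothetical.

```python
import numpy as np


def laplacian(sim):
    """Unnormalised graph Laplacian L = D - S of a symmetric similarity matrix."""
    S = np.asarray(sim, dtype=float)
    return np.diag(S.sum(axis=1)) - S


def wgrmf(Y, lnc_sim, prot_sim, weights=None, rank=2, lam=0.1, beta=0.1,
          lr=0.02, n_iter=2000, seed=0):
    """Gradient-descent sketch of weighted graph-regularised matrix factorisation.

    Minimises ||W*(Y - A B^T)||_F^2 + lam*(||A||_F^2 + ||B||_F^2)
              + beta*(tr(A^T L_l A) + tr(B^T L_p B)).
    Returns the score matrix A @ B.T and the loss recorded after every step.
    """
    Y = np.asarray(Y, dtype=float)
    W = np.ones_like(Y) if weights is None else np.asarray(weights, dtype=float)
    Ll, Lp = laplacian(lnc_sim), laplacian(prot_sim)
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((Y.shape[0], rank))  # lncRNA latent factors
    B = 0.1 * rng.standard_normal((Y.shape[1], rank))  # protein latent factors

    def loss():
        R = W * (Y - A @ B.T)
        return (np.sum(R ** 2) + lam * (np.sum(A ** 2) + np.sum(B ** 2))
                + beta * (np.trace(A.T @ Ll @ A) + np.trace(B.T @ Lp @ B)))

    history = [loss()]
    for _ in range(n_iter):
        E = W ** 2 * (Y - A @ B.T)  # weighted residual
        grad_A = -2 * E @ B + 2 * lam * A + 2 * beta * Ll @ A
        grad_B = -2 * E.T @ A + 2 * lam * B + 2 * beta * Lp @ B
        A, B = A - lr * grad_A, B - lr * grad_B  # simultaneous update
        history.append(loss())
    return A @ B.T, history
```

The Laplacian terms penalise latent factors that differ between similar lncRNAs (or proteins), so high scores at positions where Y is 0 can be read as candidate, unobserved interactions.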
26
Decoding LncRNAs. Cancers (Basel) 2021; 13:cancers13112643. [PMID: 34072257 PMCID: PMC8199187 DOI: 10.3390/cancers13112643] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/07/2021] [Revised: 05/23/2021] [Accepted: 05/25/2021] [Indexed: 02/07/2023]
Abstract
Non-coding RNAs (ncRNAs) were long considered unimportant additions to the transcriptome. Yet, in light of numerous studies, it has become clear that ncRNAs play important roles in development, health, and disease. Long non-coding RNAs (lncRNAs), long-ignored ncRNAs of more than 200 nucleotides, have gained attention because of their involvement as drivers or suppressors of a myriad of tumours. The detailed understanding of some of their functions, structures, and interactomes has been the result of interdisciplinary efforts, as in many cases new methods had to be created or adapted to characterise these molecules. Unlike most reviews on lncRNAs, we summarize the achievements of lncRNA studies by considering the approaches used to identify lncRNA functions, interactomes, and structural arrangements. We also provide information on recent data about the involvement of lncRNAs in diseases and present applications of these molecules, especially in medicine.
27
Pinkney HR, Wright BM, Diermeier SD. The lncRNA Toolkit: Databases and In Silico Tools for lncRNA Analysis. Noncoding RNA 2020; 6:E49. [PMID: 33339309 PMCID: PMC7768357 DOI: 10.3390/ncrna6040049] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/29/2020] [Revised: 12/14/2020] [Accepted: 12/15/2020] [Indexed: 02/07/2023]
Abstract
Research on long non-coding RNAs (lncRNAs) is a rapidly expanding field, with many new transcripts identified each year. However, only a small subset of lncRNAs has been functionally characterized so far. Bioinformatic tools and databases are invaluable for investigating the mechanisms by which new lncRNAs act. Here, we review a selection of computational tools and databases for the in silico analysis of lncRNAs, covering tissue-specific expression, protein-coding potential, subcellular localization, structural conformation, and interaction partners. The assembled lncRNA toolkit is aimed primarily at experimental researchers as a useful starting point to guide wet-lab experiments, and it mainly comprises multi-functional tools with user-friendly interfaces. As more new lncRNA analysis tools become available, it will be essential to provide continuous updates and maintain the availability of key software.
Affiliation(s)
- Sarah D. Diermeier
- Department of Biochemistry, University of Otago, Dunedin 9016, New Zealand (H.R.P., B.M.W.)
28
Peng L, Shen L, Liao L, Liu G, Zhou L. RNMFMDA: A Microbe-Disease Association Identification Method Based on Reliable Negative Sample Selection and Logistic Matrix Factorization With Neighborhood Regularization. Front Microbiol 2020; 11:592430. [PMID: 33193260 PMCID: PMC7652725 DOI: 10.3389/fmicb.2020.592430] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/07/2020] [Accepted: 09/17/2020] [Indexed: 12/22/2022]
Abstract
Microbes present at abnormal levels have important effects on the formation and development of various complex diseases. Identifying possible microbe-disease associations (MDAs) helps to understand the mechanisms of complex diseases. However, experimental methods for MDA identification are costly and time-consuming. In this study, a new computational model, RNMFMDA, was developed to find possible MDAs. RNMFMDA consists of two main steps. First, reliable negative MDA samples were selected using positive-unlabeled (PU) learning and random walk with restart on the heterogeneous microbe-disease network. Second, logistic matrix factorization with neighborhood regularization (LMFNR) was developed to compute the association probabilities for all microbe-disease pairs. To evaluate RNMFMDA, we compared it with five state-of-the-art MDA prediction methods under five-fold cross-validation on microbes, diseases, and MDAs. RNMFMDA obtained the best AUCs, 0.6332, 0.8669, and 0.9081, respectively, for the three five-fold cross-validation settings, significantly outperforming the other models. This performance may be attributed to three features: high-quality negative MDA sample selection, the LMFNR-based MDA prediction model, and the integration of various biological information. In addition, a few predicted microbe-disease pairs with high association scores are worthy of further experimental validation.
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Ling Shen
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Longjie Liao
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Guangyi Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
29
Bian W, Chen W, Jiang X, Qu H, Jiang J, Yang J, Liang X, Zhao B, Sun Y, Zhang C. Downregulation of Long Non-coding RNA Nuclear Paraspeckle Assembly Transcript 1 Inhibits MEG-01 Differentiation and Platelet-Like Particles Activity. Front Genet 2020; 11:571467. [PMID: 33193674 PMCID: PMC7596361 DOI: 10.3389/fgene.2020.571467] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/10/2020] [Accepted: 09/22/2020] [Indexed: 01/22/2023]
Abstract
Platelets are derived from megakaryocytes and play an important role in blood coagulation. Using high-throughput sequencing, we found that the long non-coding RNA (lncRNA) nuclear paraspeckle assembly transcript 1 (NEAT1) is abundant in platelets (GEO ID: 200097348). However, little is known about its role in regulating megakaryocyte differentiation and platelet activity. This study aims to clarify the effect of NEAT1 on MEG-01 differentiation and platelet-like particle (PLP) activity. NEAT1 in MEG-01 cells was knocked down by siRNA transfection. The adhesion of MEG-01 cells and PLPs to collagen-coated coverslips was observed under a fluorescence microscope. Flow cytometry was used to investigate apoptosis, the cell cycle, the levels of CD41/CD42b on MEG-01 cells, and the level of CD62P on PLPs. Quantitative real-time polymerase chain reaction was used to detect NEAT1 and IL-8 expression levels. Western blot was used to measure the protein levels of Bcl-2, Bax, cleaved caspase-3, and IL-8. RNA-binding protein immunoprecipitation was used to detect the interaction between NEAT1 and splicing factor proline/glutamine-rich (SFPQ). Results showed that NEAT1 knockdown decreased the adhesion ability of thrombin-stimulated MEG-01 cells and PLPs. The expression of CD62P on PLPs and of CD41/CD42b on MEG-01 cells was inhibited by NEAT1 knockdown. In addition, NEAT1 knockdown inhibited apoptosis, with an increased Bcl-2/Bax ratio and decreased cleaved caspase-3, and reduced the percentage of cells in the G0/G1 phase. Meanwhile, NEAT1 knockdown inhibited the expression of IL-8. A strong interaction between NEAT1 and SFPQ, a transcriptional repressor of IL-8, was identified, and NEAT1 knockdown reduced this interaction. The results suggest that NEAT1 knockdown decreases MEG-01 differentiation, PLP activity, and IL-8 levels. They also indicate that NEAT1 may regulate IL-8 through a direct interaction with SFPQ.
Affiliation(s)
- Weihua Bian
- School of Pharmacy, Binzhou Medical University, Yantai, China
- Wangping Chen
- Department of Cardiovascular Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Xiaoli Jiang
- School of Pharmacy, Binzhou Medical University, Yantai, China
- Huiqing Qu
- Department of Blood Transfusion, Affiliated Hospital of Binzhou Medical University, Binzhou, China
- Jing Jiang
- School of Pharmacy, Binzhou Medical University, Yantai, China
- Jinfu Yang
- Department of Cardiovascular Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Xinyue Liang
- School of Pharmacy, Binzhou Medical University, Yantai, China
- Bingrui Zhao
- School of Pharmacy, Binzhou Medical University, Yantai, China
- Yeying Sun
- School of Pharmacy, Binzhou Medical University, Yantai, China
- Chunxiang Zhang
- School of Pharmacy, Binzhou Medical University, Yantai, China
30
Wekesa JS, Meng J, Luan Y. A deep learning model for plant lncRNA-protein interaction prediction with graph attention. Mol Genet Genomics 2020; 295:1091-1102. [DOI: 10.1007/s00438-020-01682-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/04/2020] [Accepted: 05/01/2020] [Indexed: 02/06/2023]