1
Dunkel H, Wehrmann H, Jensen LR, Kuss AW, Simm S. MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding. Int J Mol Sci 2023; 24:8884. [PMID: 37240230 PMCID: PMC10218863 DOI: 10.3390/ijms24108884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 05/11/2023] [Accepted: 05/13/2023] [Indexed: 05/28/2023] Open
Abstract
Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals novel expressed ncRNAs, and their classification is important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches to using primary sequences and secondary structures, as well as the late integration of both, with machine learning models that include different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. Compared with the current best-performing tool, ncRDense, MncR showed a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) of up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
Affiliation(s)
- Heiko Dunkel
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
- Henning Wehrmann
  - Department of Biosciences, Molecular Cell Biology of Plants, Goethe University, 60438 Frankfurt am Main, Germany
- Lars R. Jensen
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Andreas W. Kuss
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Stefan Simm
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
2
Qian F, Guo J, Jiang Z, Shen B. Translational Bioinformatics for Cholangiocarcinoma: Opportunities and Challenges. Int J Biol Sci 2018; 14:920-929. [PMID: 29989102 PMCID: PMC6036745 DOI: 10.7150/ijbs.24622] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/28/2017] [Accepted: 02/02/2018] [Indexed: 02/07/2023] Open
Abstract
Translational bioinformatics is becoming a driving force and a new scientific paradigm for cancer research in the era of big data. To promote cross-disciplinary communication and research, we take cholangiocarcinoma as an example to review the present status and future perspectives of the bioinformatics models applied in cancer studies. We first summarize the present application of computational methods to the study of cholangiocarcinoma, ranging from pattern recognition in biological data and knowledge-based data annotation to systems-biology-level modeling and clinical translation. We then discuss future opportunities and challenges in building databases and knowledge bases, developing novel models, exploring molecular mechanisms and constructing intelligent decision support systems for the precision diagnosis, prognosis and treatment of cholangiocarcinoma.
Affiliation(s)
- Fuliang Qian
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Junping Guo
  - The Affiliated Yixing Hospital of Jiangsu University, Yixing 214200, China
- Zhi Jiang
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Bairong Shen
  - Center for Systems Biology, Soochow University, Suzhou 215006, China; Guizhou University School of Medicine, Guiyang 550025, China; Institute for Systems Genetics, West China Hospital, Sichuan University, Chengdu 610041, China
3
Genomic Insight into the Role of lncRNA in Cancer Susceptibility. Int J Mol Sci 2017; 18:ijms18061239. [PMID: 28598379 PMCID: PMC5486062 DOI: 10.3390/ijms18061239] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 05/23/2017] [Revised: 06/06/2017] [Accepted: 06/07/2017] [Indexed: 12/13/2022] Open
Abstract
With the development of advanced genomic methods, a large number of long non-coding RNAs (lncRNAs) have been found to be important for cancer initiation and progression. Given that most of the cancer risk SNPs identified by genome-wide association studies (GWAS) are located in noncoding regions, the expression and function of lncRNAs are more likely to be affected by these SNPs. The SNPs may affect the expression of lncRNAs directly, by disrupting the binding of transcription factors, or indirectly, by affecting the expression of regulatory factors. Moreover, SNPs may disrupt the interaction between lncRNAs and other RNAs or proteins. Unveiling the relationships among lncRNAs, protein-coding genes, transcription factors and miRNAs from a genomic perspective will improve the accuracy of disease prediction and help find new therapeutic targets.
4
Identification and analysis of the metacaspase gene family in tomato. Biochem Biophys Res Commun 2016; 479:523-529. [DOI: 10.1016/j.bbrc.2016.09.103] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/19/2016] [Accepted: 09/20/2016] [Indexed: 11/23/2022]
5
Singh NK. microRNAs Databases: Developmental Methodologies, Structural and Functional Annotations. Interdiscip Sci 2016; 9:357-377. [PMID: 27021491 DOI: 10.1007/s12539-016-0166-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/16/2015] [Revised: 02/08/2016] [Accepted: 03/11/2016] [Indexed: 12/31/2022]
Abstract
microRNA (miRNA) is an endogenous, evolutionarily conserved non-coding RNA involved in post-transcriptional regulation, acting as a gene repressor and mediating mRNA cleavage through formation of the RNA-induced silencing complex (RISC). In RISC, the miRNA binds the targeted mRNA by complementary base pairing, together with the Argonaute protein complex, causing gene repression or endonucleolytic cleavage of mRNAs, which results in many diseases and syndromes. After the discovery of the miRNAs lin-4 and let-7, large numbers of miRNAs involved in various biological and metabolic processes were discovered by low-throughput and high-throughput experimental techniques along with computational methods. miRNAs are important non-coding RNAs for understanding the complex biological phenomena of organisms because they control gene regulation. This paper reviews miRNA databases with structural and functional annotations developed by various researchers. These databases contain structural and functional information on animal, plant and virus miRNAs, including miRNA-associated diseases, stress resistance in plants, the participation of miRNAs in various biological processes, the effect of miRNA interactions on drugs and environment, the effect of variance on miRNAs, miRNA gene expression analysis, and the sequences and structures of miRNAs. This review focuses on the development methodology of miRNA databases, such as the computational tools and methods used to extract miRNA annotations from different resources or through experiments. It also discusses the efficiency of the user interface design of each database along with its current entries and miRNA annotations (pathways, gene ontology, disease ontology, etc.). An integrated schematic diagram of the database construction process is also drawn, along with tabular and graphical comparisons of the various types of entries in different databases. The aim of this paper is to present the importance of miRNA-related resources in a single place.
Affiliation(s)
- Nagendra Kumar Singh
  - Department of Biological Science and Engineering, Maulana Azad National Institute of Technology, Bhopal, M.P., 462003, India
6
Pal JK, Ray SS, Pal SK. Identifying relevant group of miRNAs in cancer using fuzzy mutual information. Med Biol Eng Comput 2015; 54:701-10. [PMID: 26264058 DOI: 10.1007/s11517-015-1360-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/04/2015] [Accepted: 07/21/2015] [Indexed: 12/17/2022]
Abstract
MicroRNAs (miRNAs) act as major biomarkers of cancer. Not all miRNAs in the human body are equally important for cancer identification. We propose a methodology, called FMIMS, which automatically selects the most relevant miRNAs for a particular type of cancer. In FMIMS, miRNAs are initially grouped using an SVM-based algorithm; then the group with the highest relevance is determined, and the miRNAs in that group are finally ranked for selection according to their redundancy. Fuzzy mutual information is used to compute the relevance of a group and the redundancy of the miRNAs within it. The superiority of the most relevant group over all others in distinguishing normal from cancer samples is demonstrated on breast, renal, colorectal, lung, melanoma and prostate data. The merit of FMIMS compared with several existing methods is established. While 12 of the 15 miRNAs selected by FMIMS agree with those from biological investigations, three of them, viz. "hsa-miR-519," "hsa-miR-431" and "hsa-miR-320c", are possible novel predictions for renal cancer, lung cancer and melanoma, respectively. The selected miRNAs are found to be involved in disease-specific pathways by targeting various genes. The method is also able to detect the responsible miRNAs even at the primary stage of cancer. The related code is available at http://www.jayanta.droppages.com/FMIMS.html.
Affiliation(s)
- Jayanta Kumar Pal
  - Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India
- Shubhra Sankar Ray
  - Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
- Sankar K Pal
  - Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
7
Zhang C, Gong P, Wei R, Li S, Zhang X, Yu Y, Wang Y. The metacaspase gene family of Vitis vinifera L.: Characterization and differential expression during ovule abortion in stenospermocarpic seedless grapes. Gene 2013; 528:267-76. [DOI: 10.1016/j.gene.2013.06.062] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 03/04/2013] [Revised: 06/06/2013] [Accepted: 06/14/2013] [Indexed: 01/12/2023]
8
Reineke AR, Bornberg-Bauer E, Gu J. Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes. Nucleic Acids Res 2011; 39:6029-43. [PMID: 21470961 PMCID: PMC3152334 DOI: 10.1093/nar/gkr179] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 06/25/2010] [Revised: 02/22/2011] [Accepted: 03/15/2011] [Indexed: 12/17/2022] Open
Abstract
The discovery of regulatory motifs embedded in upstream regions of plants is a particularly challenging bioinformatics task. Previous studies have shown that motifs in plants are short compared with those found in vertebrates. Furthermore, plant genomes have undergone several diversification mechanisms such as genome duplication events which impact the evolution of regulatory motifs. In this article, a systematic phylogenomic comparison of upstream regions is conducted to further identify features of the plant regulatory genomes, the component of genomes regulating gene expression, to enable future de novo discoveries. The findings highlight differences in upstream region properties between major plant groups and the effects of divergence times and duplication events. First, clear differences in upstream region evolution can be detected between monocots and dicots, thus suggesting that a separation of these groups should be made when searching for novel regulatory motifs, particularly since universal motifs such as the TATA box are rare. Second, investigating the decay rate of significantly aligned regions suggests that a divergence time of ~100 mya sets a limit for reliable conserved non-coding sequence (CNS) detection. Insights presented here will set a framework to help identify embedded motifs of functional relevance by understanding the limits of bioinformatics detection for CNSs.
Affiliation(s)
- Jenny Gu
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, 48149 Münster, Germany
9
Tian F, Shah PK, Liu X, Negre N, Chen J, Karpenko O, White KP, Grossman RL. Flynet: a genomic resource for Drosophila melanogaster transcriptional regulatory networks. Bioinformatics 2009; 25:3001-4. [PMID: 19656951 PMCID: PMC2773252 DOI: 10.1093/bioinformatics/btp469] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/23/2023] Open
Abstract
Motivation: The highly coordinated expression of thousands of genes in an organism is regulated by the concerted action of transcription factors, chromatin proteins and epigenetic mechanisms. High-throughput experimental data for genome-wide in vivo protein–DNA interactions and epigenetic marks are becoming available from large projects, such as the model organism ENCyclopedia Of DNA Elements (modENCODE), and from individual labs. Dissemination and visualization of these datasets in an explorable form is an important challenge. Results: To support research on Drosophila melanogaster transcription regulation and make the genome-wide in vivo protein–DNA interaction data available to the scientific community as a whole, we have developed a system called Flynet. Currently, Flynet contains 101 datasets for 38 transcription factors and chromatin regulator proteins in different experimental conditions. These factors exhibit different types of binding profiles, ranging from sharp localized peaks to broad binding regions. The protein–DNA interaction data in Flynet were obtained from the analysis of chromatin immunoprecipitation experiments on one-color and two-color genomic tiling arrays as well as chromatin immunoprecipitation followed by massively parallel sequencing. A web-based interface, integrated with an AJAX-based genome browser, has been built for queries and presenting analysis results. Flynet also makes available the cis-regulatory modules reported in the literature, known and de novo identified sequence motifs across the genome, and other resources to study gene regulation. Contact: grossman@uic.edu Availability: Flynet is available at https://www.cistrack.org/flynet/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Feng Tian
  - School of Medicine, Tsinghua University, Beijing 100084, China
10
Johnson LA, Zhao Y, Golden K, Barolo S. Reverse-engineering a transcriptional enhancer: a case study in Drosophila. Tissue Eng Part A 2009; 14:1549-59. [PMID: 18687053 DOI: 10.1089/ten.tea.2008.0074] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 01/27/2023] Open
Abstract
Enhancers, or cis-regulatory elements, are the principal determinants of spatiotemporal patterning of gene expression. For reasons of clinical and research utility, it is desirable to build customized enhancers that drive novel gene expression patterns, but currently, we largely rely on "found" genomic elements. Synthetic enhancers, assembled from transcription factor binding sites taken from natural signal-regulated enhancers, generally fail to behave like their wild-type counterparts when placed in transgenic animals, suggesting that important aspects of enhancer function are still unexplored. As a step toward the creation of a truly synthetic regulatory element, we have undertaken an extensive structure-function study of an enhancer of the Drosophila decapentaplegic (dpp) gene that drives expression in the developing visceral mesoderm (VM). Although considerable past efforts have been made to dissect the dppVM enhancer, transgenic experiments presented here indicate that its activity cannot be explained by the known regulators alone. dppVM contains multiple, previously uncharacterized, regulatory sites, some of which exhibit functional redundancy. The results presented here suggest that even the best-studied enhancers must be further dissected before they can be fully understood, and before faithful synthetic elements based on them can be created. Implications for developmental genetics, mathematical modeling, and therapeutic applications are discussed.
Affiliation(s)
- Lisa A Johnson
  - Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, Michigan, USA
11