1
Wu W, Pang CNI, Mediati DG, Tree JJ. The functional small RNA interactome reveals targets for the vancomycin-responsive sRNA RsaOI in vancomycin-tolerant Staphylococcus aureus. mSystems 2024; 9:e0097123. [PMID: 38534138 PMCID: PMC11019875 DOI: 10.1128/msystems.00971-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024]
Abstract
Small RNAs have been found to control a broad range of bacterial phenotypes including tolerance to antibiotics. Vancomycin tolerance in multidrug-resistant Staphylococcus aureus is correlated with dysregulation of small RNAs, although their contribution to antibiotic tolerance is poorly understood. RNA-RNA interactome profiling techniques are expanding our understanding of sRNA-mRNA interactions in bacteria; however, determining the function of these interactions for hundreds of sRNA-mRNA pairs is a major challenge. At steady state, protein and mRNA abundances are often highly correlated, and lower-than-expected protein abundance may indicate translational repression of an mRNA. To identify sRNA-mRNA interactions that regulate mRNA translation, we examined the correlation between gene transcript abundance, ribosome occupancy, and protein levels. We used the machine learning technique self-organizing maps (SOMs) to cluster genes with similar transcription and translation patterns and identified a cluster of mRNAs that appeared to be post-transcriptionally repressed. By integrating our clustering with sRNA-mRNA interactome data generated in vancomycin-tolerant S. aureus by RNase III-CLASH, we identified sRNAs that may be mediating translational repression. We have confirmed sRNA-dependent post-transcriptional repression of several mRNAs in this cluster. Two of these interactions are mediated by RsaOI, an sRNA that is highly upregulated by vancomycin. We demonstrate the regulation of HPr and the cell-wall autolysin Atl. These findings suggest that RsaOI coordinates carbon metabolism and cell wall turnover during vancomycin treatment.
IMPORTANCE The emergence of multidrug-resistant Staphylococcus aureus (MRSA) is a major public health concern. Current treatment is dependent on the efficacy of last-line antibiotics like vancomycin. The most common cause of vancomycin treatment failure is strains with intermediate resistance or tolerance that arise through the acquisition of a diverse repertoire of point mutations. These strains have been shown to have altered small RNA (sRNA) expression in response to antibiotic treatment. Here, we have used a technique termed RNase III-CLASH to capture sRNA interactions with their target mRNAs. To understand the function of these interactions, we have looked at RNA and protein abundance for mRNAs targeted by sRNAs. Messenger RNA and protein levels are generally well correlated, and we use deviations from this correlation to infer post-transcriptional regulation and the function of individual sRNA-mRNA interactions. Using this approach, we identify mRNA targets of the vancomycin-induced sRNA, RsaOI, that are repressed at the translational level. We find that RsaOI represses the cell wall autolysin Atl and the carbon transporter HPr, suggesting a link between vancomycin treatment and suppression of cell wall turnover and carbon metabolism.
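The idea of using deviations from the mRNA-protein correlation to flag candidate translational repression can be illustrated with a small sketch. This is not the authors' pipeline (the abstract describes self-organizing maps over transcript, ribosome occupancy, and protein data); it swaps in a simple linear fit on toy numbers, and the function name `flag_low_protein`, the cutoff, and the data are all hypothetical.

```python
import numpy as np

def flag_low_protein(log_mrna, log_protein, z_cutoff=-2.0):
    """Return indices of genes whose protein level sits well below the
    linear mRNA-protein trend (illustrative stand-in, not the paper's SOM)."""
    log_mrna = np.asarray(log_mrna, dtype=float)
    log_protein = np.asarray(log_protein, dtype=float)
    # Least-squares line: expected log protein given log mRNA.
    slope, intercept = np.polyfit(log_mrna, log_protein, 1)
    residuals = log_protein - (slope * log_mrna + intercept)
    # Standardize residuals so the cutoff is in units of standard deviations.
    z = (residuals - residuals.mean()) / residuals.std()
    return np.flatnonzero(z < z_cutoff)

# Toy data (assumed): 50 genes on a linear trend with small deterministic noise.
log_mrna = np.linspace(2.0, 10.0, 50)
log_protein = 0.9 * log_mrna + 1.0 + 0.2 * np.sin(np.arange(50))
# Two hypothetical genes with much less protein than their mRNA level predicts.
log_protein[[3, 17]] -= 3.0

candidates = flag_low_protein(log_mrna, log_protein)
print(candidates)
```

Genes flagged this way would only be candidates; the abstract's approach then intersects such a set with experimentally captured sRNA-mRNA interactions before any validation.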
Affiliation(s)
- Winton Wu
  - School of Biotechnology and Biomolecular Sciences, Sydney, New South Wales, Australia
- Daniel G. Mediati
  - School of Biotechnology and Biomolecular Sciences, Sydney, New South Wales, Australia
- Jai Justin Tree
  - School of Biotechnology and Biomolecular Sciences, Sydney, New South Wales, Australia
2
Stegmayer G, Di Persia LE, Rubiolo M, Gerard M, Pividori M, Yones C, Bugnon LA, Rodriguez T, Raad J, Milone DH. Predicting novel microRNA: a comprehensive comparison of machine learning approaches. Brief Bioinform 2020; 20:1607-1620. [PMID: 29800232 DOI: 10.1093/bib/bby037] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/24/2017] [Revised: 03/26/2018] [Indexed: 12/25/2022]
Abstract
MOTIVATION The importance of microRNAs (miRNAs) is now widely recognized because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier to identify the sequences with the highest chance of being precursors of miRNAs (pre-miRNAs). The main issue with this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers. If it is not properly addressed in the model and the experiments, not only can the reported performance be completely unrealistic, but the classifier will also fail to work properly for pre-miRNA prediction. Another important issue is that most of the machine learning (ML) approaches already used (supervised methods) require both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples because they should be sequences with a hairpin structure that do not contain a pre-miRNA.
RESULTS This review provides a comprehensive study and comparative assessment of methods from two ML approaches to the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the ML proposals that have appeared in the literature during the past 10 years. They have been compared in several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with the same features and data sets, instead of just a revision of published software tools. The results and the discussion can help the community to select the most adequate bioinformatics approach according to the prediction task at hand. The comparative results obtained suggest that from low to mid-imbalance levels between classes, supervised methods can be the best. However, at very high imbalance levels, closer to real case scenarios, models including unsupervised and deep learning can provide better performance.
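The point that reported performance can be unrealistic under high class imbalance can be made concrete with a short sketch. The numbers below are toy values chosen for illustration (10 positives among 10,000 sequences), not figures from the review, and the helper `binary_metrics` is a hypothetical name.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}.
    Ratios with a zero denominator are reported as 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy imbalance (assumed): 10 known pre-miRNAs among 10,000 candidate hairpins.
y_true = [1] * 10 + [0] * 9990
# A useless classifier that rejects every candidate.
always_negative = [0] * len(y_true)

scores = binary_metrics(y_true, always_negative)
print(scores)
```

The always-negative classifier reaches 99.9% accuracy while recovering none of the positives, which is why imbalance-aware measures such as recall, precision, or F1 are usually preferred in this setting.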
Affiliation(s)
- Georgina Stegmayer
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Leandro E Di Persia
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Mariano Rubiolo
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Matias Gerard
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Milton Pividori
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Cristian Yones
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Leandro A Bugnon
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Tadeo Rodriguez
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Jonathan Raad
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
3
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022]
Abstract
MicroRNAs, a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, the review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide valid information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with a full background of pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of the progress of research in this field.
Affiliation(s)
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Hao Li
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
4
Kumar A, Dubey A. Rhizosphere microbiome: Engineering bacterial competitiveness for enhancing crop production. J Adv Res 2020; 24:337-352. [PMID: 32461810 PMCID: PMC7240055 DOI: 10.1016/j.jare.2020.04.014] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 02/11/2020] [Revised: 04/15/2020] [Accepted: 04/25/2020] [Indexed: 12/29/2022]
Abstract
Plants in nature are constantly exposed to a variety of abiotic and biotic stresses, which limit their growth and production. Enhancing crop yield and production to feed an exponentially growing global population in a sustainable manner, with reduced chemical fertilization and agrochemicals, will be a big challenge. Recently, the targeted application of beneficial plant microbiomes and their cocktails to counteract abiotic and biotic stress has been gaining momentum and has become an exciting frontier of research. Advances in next generation sequencing (NGS) platforms, gene editing technologies, metagenomics and bioinformatics approaches allow us to unravel the entangled webs of interactions of holobionts and core microbiomes, so that the microbiome can be deployed efficiently to increase crop nutrient acquisition and resistance to abiotic and biotic stress. In this review, we focus on shaping the rhizosphere microbiome of a susceptible host plant from that of a resistant plant, which comprises a specific type of microbial community with multiple potential benefits, and on targeted CRISPR/Cas9-based strategies for the manipulation of susceptibility genes in crop plants to improve plant health. This review is significant in providing first-hand information to improve fundamental understanding of the processes that help shape the rhizosphere microbiome.
Affiliation(s)
- Ashwani Kumar
  - Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour University (A Central University), Sagar 470003, M.P., India
- Anamika Dubey
  - Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour University (A Central University), Sagar 470003, M.P., India
5
Eicher T, Kinnebrew G, Patt A, Spencer K, Ying K, Ma Q, Machiraju R, Mathé EA. Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources. Metabolites 2020; 10:E202. [PMID: 32429287 PMCID: PMC7281435 DOI: 10.3390/metabo10050202] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 04/15/2020] [Revised: 05/07/2020] [Accepted: 05/13/2020] [Indexed: 02/06/2023]
Abstract
As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. This introduces a need for increased understanding by the metabolomics researcher of computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for the evaluation of model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend that our review may be used as a guide for metabolomics researchers to choose effective techniques for multi-omics analyses relevant to their field of study.
Affiliation(s)
- Tara Eicher
  - Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
- Garrett Kinnebrew
  - Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
  - Bioinformatics Shared Resource Group, The Ohio State University, Columbus, OH 43210, USA
- Andrew Patt
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
  - Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Kyle Spencer
  - Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
  - Nationwide Children’s Research Hospital, Columbus, OH 43210, USA
- Kevin Ying
  - Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
  - Molecular, Cellular and Developmental Biology Program, The Ohio State University, Columbus, OH 43210, USA
- Qin Ma
  - Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Raghu Machiraju
  - Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
  - Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
  - Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210, USA
- Ewy A. Mathé
  - Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
6
Marshall-Colón A, Kliebenstein DJ. Plant Networks as Traits and Hypotheses: Moving Beyond Description. Trends in Plant Science 2019; 24:840-852. [PMID: 31300195 DOI: 10.1016/j.tplants.2019.06.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 03/19/2019] [Revised: 05/31/2019] [Accepted: 06/04/2019] [Indexed: 05/04/2023]
Abstract
Biology relies on the central thesis that the genes in an organism encode molecular mechanisms that combine with stimuli and raw materials from the environment to create a final phenotypic expression representative of the genomic programming. While conceptually simple, the genotype-to-phenotype linkage in a eukaryotic organism relies on the interactions of thousands of genes and an environment with a potentially unknowable level of complexity. Modern biology has moved to the use of networks in systems biology to try to simplify this complexity to decode how an organism's genome works. Previously, biological networks were basic ways to organize, simplify, and analyze data. However, recent advances are allowing networks to move beyond description and become phenotypes or hypotheses in their own right. This review discusses these efforts, like mapping responses across biological scales, including relationships among cellular entities, and the direct use of networks as traits or hypotheses.
Affiliation(s)
- Amy Marshall-Colón
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Daniel J Kliebenstein
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA; DynaMo Center of Excellence, University of Copenhagen, Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark
7
Cortina PR, Santiago AN, Sance MM, Peralta IE, Carrari F, Asis R. Neuronal network analyses reveal novel associations between volatile organic compounds and sensory properties of tomato fruits. Metabolomics 2018; 14:57. [PMID: 30830349 DOI: 10.1007/s11306-018-1355-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/05/2017] [Accepted: 03/22/2018] [Indexed: 01/06/2023]
Abstract
INTRODUCTION The process of tomato (Solanum lycopersicum) breeding has negatively affected the organoleptic properties of the fruit, and this is evident when comparing modern cultivars with heirloom varieties. The flavor of tomato fruit is determined by a complex combination of volatile and nonvolatile metabolites that is not yet understood. OBJECTIVES The aim of this work was to provide an alternative approach to exploring the relationship between tomato odour/taste and volatile organic compounds (VOCs). METHODS VOC composition and organoleptic properties of seven Andean tomato landraces along with an edible wild species (Solanum pimpinellifolium) and four commercial varieties were characterized. Six hedonic traits were analyzed by a semitrained sensory panel to describe the organoleptic properties. Ninety-four VOCs were analyzed by headspace solid phase microextraction/gas chromatography-mass spectrometry (HS/SPME/GC-MS). The relationship between sensory data and VOCs was explored using an Artificial Neural Networks model (Kohonen Self Organizing Maps, omeSOM). RESULTS AND CONCLUSION The results showed a strong preference by panelists for landrace tomatoes over commercial varieties and the wild species. The predictive analysis by omeSOM showed 15 VOCs significantly associated with the typical and atypical tomato odour and taste. Moreover, omeSOM was used to predict the relationship of VOC ratios with sensory data. A total of 108 VOC ratios out of 8837 VOC ratios were predicted to contribute to the typical and atypical tomato odour and taste. The metabolic origin of these flavor-associated VOCs and the metabolic point or target for breeding strategies were discussed.
Affiliation(s)
- Pablo R Cortina
  - INFIQC, Departamento de Química Orgánica, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, 5000, Córdoba, Argentina
- Ana N Santiago
  - INFIQC, Departamento de Química Orgánica, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, 5000, Córdoba, Argentina
- María M Sance
  - IADIZA, CCT-CONICET Mendoza, Parque General San Martín, 5500, Mendoza, Argentina
- Iris E Peralta
  - IADIZA, CCT-CONICET Mendoza, Parque General San Martín, 5500, Mendoza, Argentina
  - Facultad de Ciencias Agrarias, Universidad Nacional de Cuyo y CCT CONICET Mendoza, Chacras de Coria, Lujan de Cuyo, 5505, Mendoza, Argentina
- Fernando Carrari
  - Instituto de Biotecnología, Instituto Nacional de Tecnología Agropecuaria (IB-INTA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), PO Box 25, B1686WAA, Castelar, Argentina
  - Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, Rua do Matão, 277, São Paulo, 05508-090, Brazil
- Ramón Asis
  - CIBICI, Departamento de Bioquímica Clínica, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, 5000, Córdoba, Argentina
|
8
Leale G, Baya AE, Milone DH, Granitto PM, Stegmayer G. Inferring Unknown Biological Function by Integration of GO Annotations and Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:168-180. [PMID: 27723603 DOI: 10.1109/tcbb.2016.2615960] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]
Abstract
Characterizing genes with semantic information is an important step in describing gene products. Although the complete genomes of many organisms have already been sequenced, the biological functions of many of their genes are still unknown. Since experimentally studying the functions of those genes one by one would be unfeasible, new computational methods for inferring gene function are needed. We present here a novel computational approach for inferring biological function for a set of genes with previously unknown function, given a set of genes with well-known information. This approach is based on the premise that genes with similar behaviour should be grouped together. This is known as the guilt-by-association principle. Thus, it is possible to take advantage of clustering techniques to obtain groups of unknown genes that are co-clustered with genes that have well-known semantic information (GO annotations). Meaningful knowledge to infer unknown semantic information can therefore be provided by these well-known genes. We provide a method to explore the potential function of new genes according to those currently annotated. The results obtained indicate that the proposed approach could be a useful and effective tool when used by biologists to guide the inference of biological functions for recently discovered genes. Our work sets an important landmark in the field of identifying unknown gene functions through clustering, using an external source of biological input. A simple web interface to this proposal can be found at http://fich.unl.edu.ar/sinc/webdemo/gamma-am/.
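The guilt-by-association step described in this abstract can be sketched in a few lines. This is an illustrative simplification, not the authors' method: it assumes cluster assignments are already available from some clustering step, and it transfers the most common annotation among the annotated genes of each cluster. The function name `infer_annotations`, the gene names, and the cluster layout are hypothetical.

```python
from collections import Counter, defaultdict

def infer_annotations(clusters, known):
    """clusters: gene -> cluster id; known: gene -> annotation term.
    Returns gene -> (most common term in its cluster, support fraction)
    for genes without an annotation. Clusters containing no annotated
    gene yield no inference."""
    terms_by_cluster = defaultdict(Counter)
    for gene, term in known.items():
        if gene in clusters:
            terms_by_cluster[clusters[gene]][term] += 1

    inferred = {}
    for gene, cluster_id in clusters.items():
        counts = terms_by_cluster.get(cluster_id)
        if gene in known or not counts:
            continue
        term, count = counts.most_common(1)[0]
        inferred[gene] = (term, count / sum(counts.values()))
    return inferred

# Toy input (assumed): seven genes in three clusters, four of them annotated.
clusters = {"g1": 0, "g2": 0, "g3": 0, "u1": 0, "g4": 1, "u2": 1, "u3": 2}
known = {
    "g1": "GO:0006412",
    "g2": "GO:0006412",
    "g3": "GO:0008152",
    "g4": "GO:0006810",
}

inferred = infer_annotations(clusters, known)
print(inferred)
```

The support fraction gives a rough sense of how consistent the annotated neighbours are; a real analysis would also need to account for the GO hierarchy and for cluster quality, which this sketch ignores.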
9
Stegmayer G, Yones C, Kamenetzky L, Milone DH. High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1316-1326. [PMID: 27295687 DOI: 10.1109/tcbb.2016.2576459] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 06/06/2023]
Abstract
The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually called miRNA candidates. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential miRNA candidates that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been to train a binary classifier in a supervised manner, using well-known pre-miRNAs as the positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best miRNA candidates as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as the positive class, while less likely pre-miRNA sequences are filtered out level after level. Our approach has been compared with other methods for pre-miRNA prediction in several species, showing effective prediction of novel miRNAs. Additionally, we show that our approach has a lower training time and allows for better graphical navigability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.
10
MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data. Genomics 2016; 107:274-80. [DOI: 10.1016/j.ygeno.2016.04.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/15/2015] [Revised: 04/06/2016] [Accepted: 04/18/2016] [Indexed: 11/17/2022]
11
Lv D, Wang X, Dong J, Zhuang Y, Huang S, Ma B, Chen P, Li X, Zhang B, Li Z, Jin B. Systematic characterization of lncRNAs' cell-to-cell expression heterogeneity in glioblastoma cells. Oncotarget 2016; 7:18403-14. [PMID: 26918340 PMCID: PMC4951297 DOI: 10.18632/oncotarget.7580] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/11/2015] [Accepted: 02/11/2016] [Indexed: 12/30/2022]
Abstract
Glioblastoma (GBM) is the most common malignant adult brain tumor and is generally associated with a high level of cellular heterogeneity and a dismal prognosis. Long noncoding RNAs (lncRNAs) are emerging as novel mediators of tumorigenesis. Recently developed single-cell RNA-seq provides an unprecedented way to analyze the cell-to-cell variability in lncRNA expression profiles. Here we comprehensively examined the expression patterns of 2,003 lncRNAs in 380 cells from five primary GBMs and two glioblastoma stem-like cell (GSC) lines. Employing self-organizing maps, we displayed the landscape of the lncRNA expression dynamics for individual cells. Further analyses revealed the heterogeneous nature of lncRNAs in abundance and splicing patterns. Moreover, lncRNA expression variation is also ubiquitously present in the established GSC lines composed of seemingly identical cells. Through comparative analysis of GSC and corresponding differentiated cell cultures, we defined a stemness signature as a set of 31 differentially expressed lncRNAs, which can disclose stemness gradients in five tumors. Additionally, based on known classifier lncRNAs for molecular subtypes, each tumor was found to comprise individual cells representing four subtypes. Our systematic characterization of lncRNA expression heterogeneity lays the foundation for future efforts to further understand the function of lncRNAs, develop valuable biomarkers, and enhance knowledge of GBM biology.
Affiliation(s)
- Dekang Lv
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
- Xiang Wang
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
- Jun Dong
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
- Yan Zhuang
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
- Shuyu Huang
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
- Binbin Ma
  - Department of Neurosurgery, The Second Hospital of Dalian Medical University, Dalian, 116023, Liaoning, P.R. China
- Puxiang Chen
  - Department of Obstetrics and Gynecology, The Second Xiangya Hospital of Central South University, Changsha, 410011, Hunan, P.R. China
- Xiaodong Li
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
- Bo Zhang
  - Department of Neurosurgery, The Second Hospital of Dalian Medical University, Dalian, 116023, Liaoning, P.R. China
- Zhiguang Li
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
- Bilian Jin
  - Institute of Cancer Stem Cell, Cancer Center, Dalian Medical University, Dalian, 116044, Liaoning, P.R. China
12
Elbl P, Navarro BV, de Oliveira LF, Almeida J, Mosini AC, dos Santos ALW, Rossi M, Floh EIS. Identification and Evaluation of Reference Genes for Quantitative Analysis of Brazilian Pine (Araucaria angustifolia Bertol. Kuntze) Gene Expression. PLoS One 2015; 10:e0136714. [PMID: 26313945 PMCID: PMC4552031 DOI: 10.1371/journal.pone.0136714] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/22/2015] [Accepted: 08/07/2015] [Indexed: 11/29/2022]
Abstract
Quantitative analysis of gene expression is a fundamental experimental approach in many fields of plant biology, but it requires the use of internal controls representing constitutively expressed genes for reliable transcript quantification. In this study, we identified fifteen putative reference genes from an A. angustifolia transcriptome database. Variation in transcript levels was first evaluated in silico by comparing read counts and then by quantitative real-time PCR (qRT-PCR), resulting in the identification of six candidate genes. The consistency of transcript abundance was also calculated applying geNorm and NormFinder software packages followed by a validation approach using four target genes. The results presented here indicate that a diverse set of samples should ideally be used in order to identify constitutively expressed genes, and that the use of any two reference genes in combination, of the six tested genes, is sufficient for effective expression normalization. Finally, in agreement with the in silico prediction, a comprehensive analysis of the qRT-PCR data combined with validation analysis revealed that AaEIF4B-L and AaPP2A are the most suitable reference genes for comparative studies of A. angustifolia gene expression.
Affiliation(s)
- Paula Elbl
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- Bruno V. Navarro
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- Leandro F. de Oliveira
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- Juliana Almeida
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- Amanda C. Mosini
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- André L. W. dos Santos
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- Magdalena Rossi
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
- Eny I. S. Floh
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
|
13
|
Milone DH, Stegmayer G, López M, Kamenetzky L, Carrari F. Improving clustering with metabolic pathway data. BMC Bioinformatics 2014; 15:101. [PMID: 24717120 PMCID: PMC4002909 DOI: 10.1186/1471-2105-15-101] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/30/2013] [Accepted: 03/25/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. RESULTS A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. CONCLUSIONS Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. 
It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
Affiliation(s)
- Diego H Milone
- Research Center for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, (3000) Santa Fe, Argentina.
|
14
|
Gerard MF, Stegmayer G, Milone DH. An evolutionary approach for searching metabolic pathways. Comput Biol Med 2013; 43:1704-12. [DOI: 10.1016/j.compbiomed.2013.08.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/10/2012] [Revised: 08/18/2013] [Accepted: 08/21/2013] [Indexed: 11/26/2022]
|
15
|
Abstract
We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity.
|
16
|
Quadrana L, Almeida J, Otaiza SN, Duffy T, Corrêa da Silva JV, de Godoy F, Asís R, Bermúdez L, Fernie AR, Carrari F, Rossi M. Transcriptional regulation of tocopherol biosynthesis in tomato. Plant Molecular Biology 2013; 81:309-25. [PMID: 23247837 DOI: 10.1007/s11103-012-0001-4] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 09/11/2012] [Accepted: 12/10/2012] [Indexed: 05/21/2023]
Abstract
Tocopherols, compounds with vitamin E (VTE) activity, are potent lipid-soluble antioxidants synthesized only by photosynthetic organisms. Their biosynthesis requires the condensation of phytyl-diphosphate and homogentisate, derived from the methylerythritol phosphate (MEP) and shikimate (SK) pathways, respectively. These metabolic pathways are central in plant chloroplast metabolism and are involved in the biosynthesis of important molecules such as chlorophyll, carotenoids, aromatic amino acids and prenylquinones. In the last decade, few studies have provided insights into the regulation of VTE biosynthesis and its accumulation. However, the regulatory mechanisms of the pathway at the mRNA level remain unclear. We have recently identified a collection of tomato genes involved in tocopherol biosynthesis. In this work, using a dedicated qPCR array platform, the transcript levels of 47 genes, including paralogs, were determined in leaves and across fruit development. Expression data were analyzed for correlation with tocopherol profiles by coregulation network and neural clustering approaches. The results showed that tocopherol biosynthesis is controlled both temporally and spatially; however, total tocopherol content remains constant. These analyses exposed 18 key genes from the MEP, SK, phytol recycling and VTE-core pathways highly associated with VTE content in leaves and fruits. Moreover, genomic analyses of promoter regions suggested that the expression of the tocopherol-core pathway genes is transcriptionally coregulated with specific genes of the upstream pathways. Whilst the transcriptional profiles of the precursor pathway genes would suggest an increase in VTE content across fruit development, the data indicate that in the M82 cultivar phytyl diphosphate supply limits tocopherol biosynthesis in later fruit stages. This is in part due to the decreasing transcript levels of geranylgeranyl reductase (GGDR), which restricts the isoprenoid precursor availability.
As a proof of concept, by analyzing a collection of Andean landrace tomato genotypes, the role of the pinpointed genes in determining fruit tocopherol content was confirmed. The results uncovered a finely tuned regulation able to shift the precursor pathways controlling substrate influx for VTE biosynthesis and overcoming endogenous competition for intermediates. Taken together, the data allowed us to propose that the genes encoding 1-deoxy-D-xylulose-5-phosphate synthase and GGDR, which determine phytyl-diphosphate availability, together with genes encoding enzymes of chlorophyll-derived phytol metabolism, are the most plausible engineering targets for improving tomato fruit nutritional value.
Affiliation(s)
- Leandro Quadrana
- Instituto de Biotecnología, Instituto Nacional de Tecnología Agropecuaria and Consejo Nacional de Investigaciones Científicas y Técnicas, B1712WAA, Castelar, Argentina.
|
17
|
Stegmayer G, Gerard M, Milone D. Data Mining Over Biological Datasets: An Integrated Approach Based on Computational Intelligence. IEEE COMPUT INTELL M 2012. [DOI: 10.1109/mci.2012.2215122] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
|
18
|
Stegmayer G, Milone DH, Kamenetzky L, López MG, Carrari F. A biologically inspired validity measure for comparison of clustering methods over metabolic data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:706-716. [PMID: 22231623 DOI: 10.1109/tcbb.2012.10] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 05/31/2023]
Abstract
In the biological domain, clustering is based on the assumption that genes or metabolites involved in a common biological process are coexpressed/coaccumulated under the control of the same regulatory network. Thus, a detailed inspection of the grouped patterns to verify their memberships to well-known metabolic pathways could be very useful for the evaluation of clusters from a biological perspective. The aim of this work is to propose a novel approach for the comparison of clustering methods over metabolic data sets, including prior biological knowledge about the relation among elements that constitute the clusters. A way of measuring the biological significance of clustering solutions is proposed. This is addressed from the perspective of the usefulness of the clusters to identify those patterns that change in coordination and belong to common pathways of metabolic regulation. The measure summarizes in a compact way the objective analysis of clustering methods, which respects coherence and clusters distribution. It also evaluates the biological internal connections of such clusters considering common pathways. The proposed measure was tested in two biological databases using three clustering methods.
Affiliation(s)
- Georgina Stegmayer
- CONICET and the Center for Research and Development of Information Systems-CIDISI, UTN-FRSF, Lavaise 610, Santa Fe 3000, Argentina.
|
19
|
Almeida J, Quadrana L, Asís R, Setta N, de Godoy F, Bermúdez L, Otaiza SN, Corrêa da Silva JV, Fernie AR, Carrari F, Rossi M. Genetic dissection of vitamin E biosynthesis in tomato. Journal of Experimental Botany 2011; 62:3781-98. [PMID: 21527625 PMCID: PMC3134339 DOI: 10.1093/jxb/err055] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 10/25/2010] [Revised: 02/07/2011] [Accepted: 02/08/2011] [Indexed: 05/20/2023]
Abstract
Vegetables are critical for human health as they are a source of multiple vitamins including vitamin E (VTE). In plants, the synthesis of VTE compounds, tocopherol and tocotrienol, derives from precursors of the shikimate and methylerythritol phosphate pathways. Quantitative trait loci (QTL) for α-tocopherol content in ripe fruit have previously been determined in a Solanum pennellii tomato introgression line population. In this work, variations of tocopherol isoforms (α, β, γ, and δ) in ripe fruits of these lines were studied. In parallel, all tomato genes structurally associated with VTE biosynthesis were identified and mapped. Previously identified VTE QTL on chromosomes 6 and 9 were confirmed whilst novel ones were identified on chromosomes 7 and 8. Integrated analysis at the metabolic, genetic and genomic levels allowed us to propose 16 candidate loci putatively affecting tocopherol content in tomato. A comparative analysis revealed polymorphisms at nucleotide and amino acid levels between Solanum lycopersicum and S. pennellii candidate alleles. Moreover, evolutionary analyses showed the presence of codons evolving under both neutral and positive selection, which may explain the phenotypic differences between species. These data represent an important step in understanding the genetic determinants of VTE natural variation in tomato fruit and, as such, in the ability to improve the content of this important nutraceutical.
Affiliation(s)
- Juliana Almeida
- Departamento de Botânica-IB-USP, 277, 05508-900, São Paulo, SP, Brazil
- Leandro Quadrana
- Instituto de Biotecnología, Instituto Nacional de Tecnología Agropecuaría (IB-INTA), and Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), PO Box 25, B1712WAA Castelar, Argentina (partner group of the Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany)
- Ramón Asís
- CIBICI, Facultad de Ciencias Químicas Universidad Nacional de Córdoba, CC 5000, Córdoba, Argentina
- Nathalia Setta
- Departamento de Botânica-IB-USP, 277, 05508-900, São Paulo, SP, Brazil
- Fabiana de Godoy
- Departamento de Botânica-IB-USP, 277, 05508-900, São Paulo, SP, Brazil
- Luisa Bermúdez
- Departamento de Botânica-IB-USP, 277, 05508-900, São Paulo, SP, Brazil
- Santiago N. Otaiza
- CIBICI, Facultad de Ciencias Químicas Universidad Nacional de Córdoba, CC 5000, Córdoba, Argentina
- Alisdair R. Fernie
- Max Planck Institute for Molecular Plant Physiology, Wissenschaftspark Golm, Am Mühlenberg 1, Potsdam-Golm, D-14476, Germany
- Fernando Carrari
- Instituto de Biotecnología, Instituto Nacional de Tecnología Agropecuaría (IB-INTA), and Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), PO Box 25, B1712WAA Castelar, Argentina (partner group of the Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany)
- Magdalena Rossi
- Departamento de Botânica-IB-USP, 277, 05508-900, São Paulo, SP, Brazil
|