1
Classification of Hot and Cold Recombination Regions in Saccharomyces cerevisiae: Comparative Analysis of Two Machine Learning Techniques. Proceedings of the National Academy of Sciences, India Section A: Physical Sciences 2019. [DOI: 10.1007/s40010-017-0427-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
2
Mishra B, Kumar N, Mukhtar MS. Systems Biology and Machine Learning in Plant-Pathogen Interactions. Molecular Plant-Microbe Interactions: MPMI 2019; 32:45-55. [PMID: 30418085 DOI: 10.1094/mpmi-08-18-0221-fi] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 05/02/2023]
Abstract
Systems biology is an inclusive approach to study the static and dynamic emergent properties on a global scale by integrating multiomics datasets to establish qualitative and quantitative associations among multiple biological components. With an abundance of improved high throughput -omics datasets, network-based analyses and machine learning technologies are playing a pivotal role in comprehensive understanding of biological systems. Network topological features reveal most important nodes within a network as well as prioritize significant molecular components for diverse biological networks, including coexpression, protein-protein interaction, and gene regulatory networks. Machine learning techniques provide enormous predictive power through specific feature extraction from biological data. Deep learning, a subtype of machine learning, has plausible future applications because a domain expert for feature extraction is not needed in this algorithm. Inspired by diverse domains of biology, we here review classic systems biology techniques applied in plant immunity thus far. We also discuss additional advanced approaches in both graph theory and machine learning, which may provide new insights for understanding plant-microbe interactions. Finally, we propose a hybrid approach in plant immune systems that harnesses the power of both network biology and machine learning, with a potential to be applicable to both model systems and agronomically important crop plants.
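As a minimal illustration of how a network topological feature can rank nodes, the sketch below computes normalized degree centrality on a toy undirected network and picks the most connected node. The gene names, the edge list, and the function name `degree_centrality` are hypothetical and not taken from the review; degree is only one of several topological measures such an analysis might use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality for an undirected network.

    Each node's degree is divided by (n - 1), the maximum possible degree.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical toy interaction network (names are placeholders).
edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
]
centrality = degree_centrality(edges)
hub = max(centrality, key=centrality.get)  # node with the highest centrality
```

In this toy network `geneA` touches every other node, so it is the candidate a degree-based prioritization would flag first.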
Affiliation(s)
- M Shahid Mukhtar
- 1 Department of Biology, and
- 2 Nutrition Obesity Research Center, University of Alabama at Birmingham, 1300 University Blvd., Birmingham 35294, U.S.A
3
Dwivedi AK, Chouhan U. Comparative study of artificial neural network for classification of hot and cold recombination regions in Saccharomyces cerevisiae. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2466-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
4
Junttila S, Laiho A, Gyenesei A, Rudd S. Whole transcriptome characterization of the effects of dehydration and rehydration on Cladonia rangiferina, the grey reindeer lichen. BMC Genomics 2013; 14:870. [PMID: 24325588 PMCID: PMC3878897 DOI: 10.1186/1471-2164-14-870] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 06/27/2013] [Accepted: 11/14/2013] [Indexed: 11/13/2022]
Abstract
BACKGROUND Lichens are symbiotic organisms with a fungal and an algal or a cyanobacterial partner. Lichens inhabit some of the harshest climates on earth and most lichen species are desiccation-tolerant. Lichen desiccation-tolerance has been studied at the biochemical level and through proteomics, but the underlying molecular genetic mechanisms remain largely unexplored. The objective of our study was to examine the effects of dehydration and rehydration on the gene expression of Cladonia rangiferina. RESULTS Samples of C. rangiferina were collected at several time points during both the dehydration and rehydration process and the gene expression intensities were measured using a custom DNA microarray. Several genes that were differentially expressed at one or more time points were identified. The microarray results were validated using qRT-PCR analysis. Enrichment analysis of differentially expressed transcripts was also performed to identify the Gene Ontology terms most associated with the rehydration and dehydration process. CONCLUSIONS Our data identify differential expression patterns for hundreds of genes that are modulated during dehydration and rehydration in Cladonia rangiferina. These dehydration and rehydration events clearly differ from each other at the molecular level and the largest changes to gene expression are observed within minutes following rehydration. Distinct changes are observed during the earliest stage of rehydration and the mechanisms do not appear to be shared with the later stages of wetting or with drying. Several of the most differentially expressed genes are similar to genes identified in previous studies that have investigated the molecular mechanisms of other desiccation-tolerant organisms. We present here the first microarray experiment for any lichen species and have for the first time studied the genetic mechanisms behind lichen desiccation-tolerance at the whole transcriptome level.
Affiliation(s)
- Sini Junttila
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu, Turku, Finland
- The Finnish Microarray and Sequencing Centre, Turku Centre for Biotechnology, Tykistökatu, Turku, Finland
- Asta Laiho
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu, Turku, Finland
- The Finnish Microarray and Sequencing Centre, Turku Centre for Biotechnology, Tykistökatu, Turku, Finland
- Attila Gyenesei
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu, Turku, Finland
- The Finnish Microarray and Sequencing Centre, Turku Centre for Biotechnology, Tykistökatu, Turku, Finland
- Stephen Rudd
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu, Turku, Finland
5
Junttila S, Rudd S. Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data. BMC Genomics 2012; 13:575. [PMID: 23110403 PMCID: PMC3534622 DOI: 10.1186/1471-2164-13-575] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 05/02/2012] [Accepted: 10/11/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Lichens are symbiotic organisms that have a remarkable ability to survive in some of the most extreme terrestrial climates on earth. Lichens can endure frequent desiccation and wetting cycles and are able to survive in a dehydrated molecular dormant state for decades at a time. Genetic resources have been established in lichen species for the study of molecular systematics and their taxonomic classification. No lichen species have been characterised yet using genomics and the molecular mechanisms underlying the lichen symbiosis and the fundamentals of desiccation tolerance remain undescribed. We report the characterisation of a transcriptome of the grey reindeer lichen, Cladonia rangiferina, using high-throughput next-generation transcriptome sequencing and traditional Sanger EST sequencing data. RESULTS Altogether 243,729 high quality sequence reads were de novo assembled into 16,204 contigs and 49,587 singletons. The genome of origin for the sequences produced was predicted using Eclat with sequences derived from the axenically grown symbiotic partners used as training sequences for the classification model. 62.8% of the sequences were classified as being of fungal origin while the remaining 37.2% were predicted as being of algal origin. The assembled sequences were annotated by BLASTX comparison against a non-redundant protein sequence database with 34.4% of the sequences having a BLAST match. 29.3% of the sequences had a Gene Ontology term match and 27.9% of the sequences had a domain or structural match following an InterPro search. 60 KEGG pathways with more than 10 associated sequences were identified. CONCLUSIONS Our results present a first transcriptome sequencing and de novo assembly for a lichen species and describe the ongoing molecular processes and the most active pathways in C. rangiferina. This brings a meaningful contribution to publicly available lichen sequence information. 
These data provide a first glimpse into the molecular nature of the lichen symbiosis and characterise the transcriptional space of this remarkable organism. These data will also enable further studies aimed at deciphering the genetic mechanisms behind lichen desiccation tolerance.
Affiliation(s)
- Sini Junttila
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
- Stephen Rudd
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
6
Next generation transcriptomes for next generation genomes using est2assembly. BMC Bioinformatics 2009; 10:447. [PMID: 20034392 PMCID: PMC3087352 DOI: 10.1186/1471-2105-10-447] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 08/03/2009] [Accepted: 12/24/2009] [Indexed: 11/10/2022]
Abstract
Background The decreasing costs of capillary-based Sanger sequencing and next generation technologies, such as 454 pyrosequencing, have prompted an explosion of transcriptome projects in non-model species, where even shallow sequencing of transcriptomes can now be used to examine a range of research questions. This rapid growth in data has outstripped the ability of researchers working on non-model species to analyze and mine transcriptome data efficiently. Results Here we present a semi-automated platform, 'est2assembly', that processes raw sequence data from Sanger or 454 sequencing into a hybrid de novo assembly, annotates it and produces GMOD-compatible output, including a SeqFeature database suitable for GBrowse. Users are able to parameterize assembler variables, judge assembly quality and determine the optimal assembly for their specific needs. We used est2assembly to process Drosophila and Bicyclus public Sanger EST data and then compared them to published 454 data as well as eight new insect transcriptome collections. Conclusions Analysis of such a wide variety of data allows us to understand how these new technologies can assist EST project design. We determine that assembler parameterization is as essential as standardized methods to judge the output of EST projects. Further, even shallow sequencing using 454 produces sufficient data to be of wide use to the community. est2assembly is an important tool to assist manual curation of gene models, an important resource in their own right but especially for species which are due to acquire a genome project using Next Generation Sequencing.
7
Bowen JK, Mesarich CH, Rees-George J, Cui W, Fitzgerald A, Win J, Plummer KM, Templeton MD. Candidate effector gene identification in the ascomycete fungal phytopathogen Venturia inaequalis by expressed sequence tag analysis. Molecular Plant Pathology 2009; 10:431-48. [PMID: 19400844 PMCID: PMC6640279 DOI: 10.1111/j.1364-3703.2009.00543.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 05/04/2023]
Abstract
The hemi-biotrophic fungus Venturia inaequalis infects members of the Maloideae, causing the economically important apple disease, scab. The plant-pathogen interaction of Malus and V. inaequalis follows the gene-for-gene model. cDNA libraries were constructed, and bioinformatic analysis of the resulting expressed sequence tags (ESTs) was used to characterize potential effector genes. Effectors are small proteins, secreted in planta, that are assumed to facilitate infection. Therefore, a cDNA library was constructed from a compatible interaction. To distinguish pathogen from plant sequences, the library was probed with genomic DNA from V. inaequalis to enrich for pathogen genes, and cDNA libraries were constructed from in vitro-grown material. A suppression subtractive hybridization library enriched for cellophane-induced genes was included, as growth on cellophane may mimic that in planta, with the differentiation of structures resembling those formed during plant colonization. Clustering of ESTs from the in planta and in vitro libraries indicated a fungal origin of the resulting non-redundant sequence. A total of 937 ESTs was classified as putatively fungal, which could be assembled into 633 non-redundant sequences. Sixteen new candidate effector genes were identified from V. inaequalis based on features common to characterized effector genes from filamentous fungi, i.e. they encode a small, novel, cysteine-rich protein, with a putative signal peptide. Three of the 16 candidates, in particular, conformed to most of the protein structural characteristics expected of fungal effectors and showed significant levels of transcriptional up-regulation during in planta growth. In addition to candidate effector genes, this collection of ESTs represents a valuable genomic resource for V. inaequalis.
Affiliation(s)
- Joanna K Bowen
- The New Zealand Institute for Plant and Food Research Limited, Mt. Albert Research Centre, Auckland, New Zealand.
8
Abstract
An associative neural network (ASNN) is an ensemble-based method inspired by the function and structure of neural network correlations in the brain. The method operates by simulating the short- and long-term memory of neural networks. The long-term memory is represented by an ensemble of neural network weights, while the short-term memory is stored as a pool of internal neural network representations of the input pattern. This organization allows the ASNN to incorporate new data cases in short-term memory and provides high generalization ability without the need to retrain the neural network weights. The method can be used to estimate the bias and the applicability domain of models. Applications of the ASNN in QSAR and drug design are exemplified. The developed algorithm is available at http://www.vcclab.org.
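The two-memory idea can be sketched as follows. This is an illustrative reading rather than the published algorithm: it assumes that the "internal representation" of a case is the vector of outputs from each ensemble member, that similarity between cases is measured by Pearson correlation of those vectors, and that the ensemble average is corrected by the mean residual of the k most similar stored cases. The names `pearson` and `asnn_predict` and the parameter `k` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def asnn_predict(
    query_outputs: Sequence[float],
    memory_outputs: List[Sequence[float]],
    memory_targets: List[float],
    k: int = 3,
) -> float:
    """Ensemble mean for the query, corrected by the mean residual of its
    k nearest stored cases (nearest = highest correlation of ensemble outputs).

    query_outputs  : one output per ensemble member for the query case
    memory_outputs : the same vector for every stored case (short-term memory)
    memory_targets : known target value of every stored case
    """
    ensemble_mean = sum(query_outputs) / len(query_outputs)
    ranked = sorted(
        range(len(memory_outputs)),
        key=lambda j: pearson(query_outputs, memory_outputs[j]),
        reverse=True,
    )
    neighbours = ranked[:k]
    if not neighbours:
        return ensemble_mean
    residuals = [
        memory_targets[j] - sum(memory_outputs[j]) / len(memory_outputs[j])
        for j in neighbours
    ]
    return ensemble_mean + sum(residuals) / len(residuals)


# Toy memory with two stored cases and a three-member ensemble (made-up numbers).
memory_outputs = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
memory_targets = [2.5, 2.0]
prediction = asnn_predict([1.1, 2.0, 3.2], memory_outputs, memory_targets, k=1)
```

Under these assumptions, incorporating a new data case only means appending its output vector and target to the memory lists; the ensemble members themselves (the long-term memory) are left untouched.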
Affiliation(s)
- Igor V Tetko
- GSF--Institute for Bioinformatics, Neuherberg, Germany
9
Abstract
Given the growing amount of biological data, data mining methods have become an integral part of bioinformatics research. Unfortunately, standard data mining tools are often not sufficiently equipped for handling raw data such as amino acid sequences. One popular and freely available framework that contains many well-known data mining algorithms is the Waikato Environment for Knowledge Analysis (Weka). In the BioWeka project, we introduce various input formats for bioinformatics data and bioinformatics methods like alignments to Weka. This allows users to easily combine them with Weka's classification, clustering, validation and visualization facilities on a single platform and therefore reduces the overhead of converting data between different data formats as well as the need to write custom evaluation procedures that can deal with many different programs. We encourage users to participate in this project by adding their own components and data formats to BioWeka. AVAILABILITY The software, documentation and tutorial are available at http://www.bioweka.org.
Affiliation(s)
- Jan E Gewehr
- Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University Munich, Amalienstrasse 17, D-80333 Munich, Germany
10
Emmersen J, Rudd S, Mewes HW, Tetko IV. Separation of sequences from host-pathogen interface using triplet nucleotide frequencies. Fungal Genet Biol 2007; 44:231-41. [PMID: 17218127 DOI: 10.1016/j.fgb.2006.11.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 04/20/2006] [Revised: 10/22/2006] [Accepted: 11/27/2006] [Indexed: 11/22/2022]
Abstract
The identification of genes involved in host-pathogen interactions is important for the elucidation of mechanisms of disease resistance and host susceptibility. A traditional way to classify the origin of genes sampled from a pool of mixed cDNA is through sequence similarity to known genes from either the pathogen or host organism or other closely related species. This approach does not work when the identified sequence has no close homologues in the sequence databases. In our previous studies, we classified genes using their codon frequencies. This method, however, explicitly required the prediction of CDS regions and thus could not be applied to sequences composed of the non-coding regions of genes. In this study, we show that the use of sliding-window triplet frequencies extends the application of the algorithm to both coding and non-coding sequences and also increases the prediction accuracy of a Support Vector Machine classifier from 95.6 ± 0.3% to 96.5 ± 0.2%. Thus the use of triplet frequencies reduced the prediction error of the new method by more than 20% (from 4.4% to 3.5%) compared to our previous approach. We also describe a functional analysis of the sequences, which detected gene families with a significantly higher or lower probability of being correctly classified than the average accuracy of the method. The server for classifying EST sequences using triplet frequencies is available at http://mips.gsf.de/proj/est3.
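A minimal sketch of sliding-window triplet frequencies is given below. It assumes a window of width 3 moved one nucleotide at a time, so all three reading frames contribute and no CDS prediction is needed. It also assumes that windows containing ambiguous symbols such as N are skipped and that counts are normalized to relative frequencies. The function name `triplet_frequencies` is hypothetical, and the abstract does not specify these details.

```python
from collections import Counter
from itertools import product
from typing import List

ALPHABET = "ACGT"
# All 64 nucleotide triplets in a fixed order, so every sequence maps to
# a feature vector with the same layout.
TRIPLETS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def triplet_frequencies(seq: str) -> List[float]:
    """Relative frequency of each triplet, counted with a width-3 window
    that advances by one nucleotide (frame-independent)."""
    seq = seq.upper()
    valid = set(ALPHABET)
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= valid  # skip windows containing N etc.
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


# "ATGATG" yields the windows ATG, TGA, GAT, ATG.
features = triplet_frequencies("ATGATG")
```

The resulting 64-element vector could serve as input to any standard classifier, including a support vector machine.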
Affiliation(s)
- Jeppe Emmersen
- Institut for Miljø og Bioteknologi, Aalborg Universitet, Sohngaardsholmsvej 49, 9000 Aalborg, Denmark
11
Zhou T, Weng J, Sun X, Lu Z. Support vector machine for classification of meiotic recombination hotspots and coldspots in Saccharomyces cerevisiae based on codon composition. BMC Bioinformatics 2006; 7:223. [PMID: 16640774 PMCID: PMC1463011 DOI: 10.1186/1471-2105-7-223] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 11/06/2005] [Accepted: 04/26/2006] [Indexed: 11/30/2022]
Abstract
Background Meiotic double-strand breaks occur at relatively high frequencies in some genomic regions (hotspots) and relatively low frequencies in others (coldspots). Hotspots and coldspots are receiving increasing attention in research into the mechanism of meiotic recombination. However, predicting hotspots and coldspots from DNA sequence information is still a challenging task. Results We present a novel method for classification of hot and cold ORFs located in hotspots and coldspots respectively in Saccharomyces cerevisiae, using support vector machine (SVM), which relies on codon composition differences. This method has achieved a high classification accuracy of 85.0%. Since codon composition is a fusion of codon usage bias and amino acid composition signals, the ability of these two kinds of sequence attributes to discriminate hot ORFs from cold ORFs was also investigated separately. Our results indicate that neither codon usage bias nor amino acid composition taken separately performed as well as codon composition. Moreover, our SVM based method was applied to the full genome: We predicted the hot/cold ORFs from the yeast genome by using cutoffs of recombination rate. We found that the performance of our method for predicting cold ORFs is not as good as that for predicting hot ORFs. Besides, we also observed a considerable correlation between meiotic recombination rate and amino acid composition of certain residues, which probably reflects the structural and functional dissimilarity between the hot and cold groups. Conclusion We have introduced a SVM-based novel method to discriminate hot ORFs from cold ones. Applying codon composition as sequence attributes, we have achieved a high classification accuracy, which suggests that codon composition has strong potential to be used as sequence attributes in the prediction of hot and cold ORFs.
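To make the notion of codon composition concrete, the sketch below turns an ORF into a 64-element feature vector of in-frame codon frequencies. It assumes the ORF starts in frame, that a trailing partial codon is ignored, and that codons with ambiguous bases are skipped. The function name `codon_composition` is hypothetical, and the normalization is an assumption rather than the paper's stated procedure.

```python
from itertools import product
from typing import List

# All 64 codons in a fixed order (61 sense codons plus 3 stop codons).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_composition(orf: str) -> List[float]:
    """Relative frequency of each codon, reading the ORF in frame 0
    in non-overlapping steps of three nucleotides."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # drop a trailing partial codon
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(0, usable, 3):
        codon = orf[i:i + 3]
        if codon in counts:  # skip codons with ambiguous bases
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# ATG GCC GCC TAA: four codons, GCC appears twice.
vector = codon_composition("ATGGCCGCCTAA")
```

Because each codon identifies both an amino acid and a synonymous choice, a vector of this kind mixes amino acid composition with codon usage bias, which is the fusion of signals the abstract describes.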
Affiliation(s)
- Tong Zhou
- State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Jianhong Weng
- State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Xiao Sun
- State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Zuhong Lu
- State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
12
Rudd S, Tetko IV. Eclair--a web service for unravelling species origin of sequences sampled from mixed host interfaces. Nucleic Acids Res 2005; 33:W724-7. [PMID: 15980572 PMCID: PMC1160195 DOI: 10.1093/nar/gki434] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
The identification of the genes that participate at the biological interface of two species remains critical to our understanding of the mechanisms of disease resistance, disease susceptibility and symbiosis. The sequencing of complementary DNA (cDNA) libraries prepared from the biological interface between two organisms provides an inexpensive way to identify the novel genes that may be expressed as a cause or consequence of compatible or incompatible interactions. Sequence classification and annotation of species origin typically use an orthology-based approach and require access to large portions of either genome, or a close relative. Novel species- or clade-specific sequences may have no counterpart within existing databases and remain ambiguous features. Here we present a web-service, Eclair, which utilizes support vector machines for the classification of the origin of expressed sequence tags stemming from mixed host cDNA libraries. In addition to providing an interface for the classification of sequences, users are presented with the opportunity to train a model to suit their preferred species pair. Eclair is freely available at http://eclair.btk.fi.
Affiliation(s)
- Stephen Rudd
- Centre for Biotechnology, Tykistökatu 6 FIN-20521, Turku, Finland.
13
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]