Reference Citation Analysis

For: Steinbiss S, Kastens S, Kurtz S. LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons. Mob DNA 2012;3:18. [PMID: 23131050 PMCID: PMC3582472 DOI: 10.1186/1759-8753-3-18] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/03/2012] [Accepted: 08/31/2012] [Indexed: 11/30/2022] Open

Cited by Other Article(s)

Orozco-Arias S, Isaza G, Guyot R. Retrotransposons in Plant Genomes: Structure, Identification, and Classification through Bioinformatics and Machine Learning. Int J Mol Sci 2019;20:E3837. [PMID: 31390781 PMCID: PMC6696364 DOI: 10.3390/ijms20153837] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/21/2019] [Revised: 07/31/2019] [Accepted: 08/02/2019] [Indexed: 01/26/2023] Open

Valencia JD, Girgis HZ. LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo. BMC Genomics 2019;20:450. [PMID: 31159720 PMCID: PMC6547461 DOI: 10.1186/s12864-019-5796-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/26/2018] [Accepted: 05/14/2019] [Indexed: 12/19/2022] Open

Abstract

BACKGROUND

Long terminal repeat retrotransposons are the most abundant transposons in plants. They play important roles in alternative splicing, recombination, gene regulation, and defense mechanisms. Large-scale sequencing projects for plant genomes are currently underway. Software tools are important for annotating long terminal repeat retrotransposons in these newly available genomes. However, the available tools are not very sensitive to known elements and perform inconsistently on different genomes. Some are hard to install or obsolete. They may struggle to process large plant genomes. None can be executed in parallel out of the box, and very few have features to support visual review of new elements. To overcome these limitations, we developed LtrDetector, which uses techniques inspired by signal processing.

RESULTS

We compared LtrDetector to LTR_Finder and LTRharvest, the two most successful predecessor tools, on six plant genomes. For each organism, we constructed a ground truth data set based on queries from a consensus sequence database. According to this evaluation, LtrDetector was the most sensitive tool, achieving 16-23% improvement in sensitivity over LTRharvest and 21% improvement over LTR_Finder. All three tools had low false positive rates, with LtrDetector achieving 98.2% precision, in between its two competitors. Overall, LtrDetector provides the best compromise between high sensitivity and low false positive rate while requiring moderate time and utilizing memory available on personal computers.
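
The two measures named in this abstract, sensitivity and precision, can be illustrated with a minimal Python sketch. The function names and the counts below are hypothetical and chosen only for illustration; they are not taken from the LtrDetector evaluation, and the paper's exact matching criteria against the ground truth are not stated here.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of ground-truth elements that a tool recovered."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of a tool's predictions that match a ground-truth element."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


# Hypothetical counts (assumption, not data from the paper):
# 90 predictions match the ground truth, 10 predictions do not,
# and 30 ground-truth elements were missed.
tp, fp, fn = 90, 10, 30

print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"precision   = {precision(tp, fp):.3f}")
```

A tool can score high on one measure and low on the other, which is why the abstract reports both.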

CONCLUSIONS

LtrDetector uses a novel methodology revolving around k-mer distributions, which allows it to produce high-quality results using relatively lightweight procedures. It is easy to install and use. It is not species specific, performing well using its default parameters on genomes of varying size and repeat content. It is automatically configured for parallel execution and runs efficiently on an ordinary personal computer. It includes a k-mer scores visualization tool to facilitate manual review of the identified elements. These features make LtrDetector an attractive tool for future annotation projects involving long terminal repeat retrotransposons.
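
The abstract does not spell out how the k-mer scores are computed. The toy Python sketch below shows one possible distance-based score, assumed purely for illustration: each k-mer position is scored by the distance to the nearest other exact copy of the same k-mer, so two similar flanking repeats would appear as runs of equal scores. The function name `kmer_distance_scores` is hypothetical, and this is not LtrDetector's implementation.

```python
def kmer_distance_scores(seq: str, k: int) -> list[int]:
    """Score each k-mer start position by the distance to the nearest
    other exact occurrence of the same k-mer (0 if the k-mer is unique).

    Illustrative assumption only; not the scoring used by LtrDetector.
    """
    n_positions = max(len(seq) - k + 1, 0)

    # Collect the start positions of every k-mer, in increasing order.
    positions: dict[str, list[int]] = {}
    for i in range(n_positions):
        positions.setdefault(seq[i:i + k], []).append(i)

    scores = [0] * n_positions
    for occurrences in positions.values():
        for j, pos in enumerate(occurrences):
            gaps = []
            if j > 0:
                gaps.append(pos - occurrences[j - 1])  # copy to the left
            if j + 1 < len(occurrences):
                gaps.append(occurrences[j + 1] - pos)  # copy to the right
            if gaps:
                scores[pos] = min(gaps)
    return scores


# Toy sequence: the 4-mer "AAAC" occurs at positions 0 and 8.
example = "AAACGGTTAAAC"
print(kmer_distance_scores(example, 4))
```

In the toy sequence, only the two copies of `AAAC` receive a nonzero score (their separation, 8); every other 4-mer is unique and scores 0.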

Inpactor, Integrated and Parallel Analyzer and Classifier of LTR Retrotransposons and Its Application for Pineapple LTR Retrotransposons Diversity and Dynamics. Biology 2018;7:biology7020032. [PMID: 29799487 PMCID: PMC6022998 DOI: 10.3390/biology7020032] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/03/2018] [Revised: 05/16/2018] [Accepted: 05/22/2018] [Indexed: 12/22/2022]

Schietgat L, Vens C, Cerri R, Fischer CN, Costa E, Ramon J, Carareto CMA, Blockeel H. A machine learning based framework to identify and classify long terminal repeat retrotransposons. PLoS Comput Biol 2018;14:e1006097. [PMID: 29684010 PMCID: PMC5933816 DOI: 10.1371/journal.pcbi.1006097] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/26/2017] [Revised: 05/03/2018] [Accepted: 03/19/2018] [Indexed: 12/03/2022] Open

Abstract

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework towards LTR retrotransposons, a particular type of TE characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance, while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner’s predictions which did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.

Over the years, as the acquisition of biological data has increased, extracting knowledge from these data has become more important. Understanding how biology works is key to improving the quality of products and services that use biological data. This directly influences companies and governments, which need to remain at the knowledge frontier of an increasingly competitive economy. Transposable elements (TEs) are an important example of such data, and understanding their role in the genomes of organisms is important for developing products based on biological data. One example is the production of biofuels such as those based on sugarcane. Many studies have revealed the presence of active TEs in this plant, which has gained economic importance in many countries. Understanding how TEs influence the plant should help researchers develop more resistant sugarcane varieties and increase production. Thus, developing computational methods that help biologists correctly identify and classify TEs is important from both theoretical and practical perspectives.

Genome-wide analysis of transposable elements in the coffee berry borer Hypothenemus hampei (Coleoptera: Curculionidae): description of novel families. Mol Genet Genomics 2017;292:565-583. [PMID: 28204924 DOI: 10.1007/s00438-017-1291-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/14/2016] [Accepted: 01/12/2017] [Indexed: 10/20/2022]

Abstract

The coffee berry borer (CBB) Hypothenemus hampei is the most limiting pest of coffee production worldwide. The CBB genome has been recently sequenced; however, information regarding the presence and characteristics of transposable elements (TEs) was not provided. Using systematic searching strategies based on both de novo and homology-based approaches, we present a library of TEs from the draft genome of CBB sequenced by the Colombian Coffee Growers Federation. The library consists of 880 sequences classified as 66% Class I (LTRs: 46%, non-LTRs: 20%) and 34% Class II (DNA transposons: 8%, Helitrons: 16% and MITEs: 10%) elements, including families of the three main LTR (Gypsy, Bel-Pao and Copia) and non-LTR (CR1, Daphne, I/Nimb, Jockey, Kiri, R1, R2 and R4) clades and DNA superfamilies (Tc1-mariner, hAT, Merlin, P, PIF-Harbinger, PiggyBac and Helitron). We propose the existence of novel families: Hypo, belonging to the LTR Gypsy superfamily; Hamp, belonging to non-LTRs; and rosa, belonging to Class II or DNA transposons. Although the rosa clade has been previously described, it was considered to be a basal subfamily of the mariner family. Based on our phylogenetic analysis, including Tc1, mariner, pogo, rosa and Lsra elements from other insects, we propose that rosa and Lsra elements are subfamilies of an independent family of Class II elements termed rosa. The annotations obtained indicate that a low percentage of the assembled CBB genome (approximately 8.2%) consists of TEs. Although these TEs display high diversity, most sequences are degenerate, with few full-length copies of LTR and DNA transposons and several complete and putatively active copies of non-LTR elements. MITEs constitute approximately 50% of the total TE content, with a high proportion associated with DNA transposons in the Tc1-mariner superfamily.

Characterization of new transposable element sub-families from white clover (Trifolium repens) using PCR amplification. Genetica 2016;144:577-589. [PMID: 27671023 DOI: 10.1007/s10709-016-9926-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/25/2016] [Accepted: 09/17/2016] [Indexed: 12/15/2022]

Monat C, Tando N, Tranchant-Dubreuil C, Sabot F. LTRclassifier: A website for fast structural LTR retrotransposons classification in plants. Mob Genet Elements 2016;6:e1241050. [PMID: 28090381 DOI: 10.1080/2159256x.2016.1241050] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/23/2016] [Revised: 09/20/2016] [Accepted: 09/20/2016] [Indexed: 10/20/2022] Open

Möller S, Afgan E, Banck M, Bonnal RJP, Booth T, Chilton J, Cock PJA, Gumbel M, Harris N, Holland R, Kalaš M, Kaján L, Kibukawa E, Powel DR, Prins P, Quinn J, Sallou O, Strozzi F, Seemann T, Sloggett C, Soiland-Reyes S, Spooner W, Steinbiss S, Tille A, Travis AJ, Guimera R, Katayama T, Chapman BA. Community-driven development for computational biology at Sprints, Hackathons and Codefests. BMC Bioinformatics 2014;15 Suppl 14:S7. [PMID: 25472764 PMCID: PMC4255748 DOI: 10.1186/1471-2105-15-s14-s7] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 11/23/2022] Open

Abstract

Background

Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches.

Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights into a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large.

Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together.

Results

This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities.

Conclusions

Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

Gremme G, Steinbiss S, Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013;10:645-56. [PMID: 24091398 DOI: 10.1109/tcbb.2013.68] [Citation(s) in RCA: 229] [Impact Index Per Article: 20.8] [Indexed: 05/19/2023]