1
Shityakov S, Bencurova E, Förster C, Dandekar T. Modeling of shotgun sequencing of DNA plasmids using experimental and theoretical approaches. BMC Bioinformatics 2020; 21:132. [PMID: 32245400 PMCID: PMC7126183 DOI: 10.1186/s12859-020-3461-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/22/2019] [Accepted: 03/19/2020] [Indexed: 01/02/2023] [Open Access]
Abstract
Background: Without bioinformatics approaches, it is difficult to correctly predict DNA sequencing outcomes when processing and analyzing DNA sequences obtained from next-generation sequencing (NGS). However, algorithms based on NGS perform inefficiently because of the long DNA fragments generated, the difficulty of assembling them, and the complexity of the genomes used. In contrast, the Sanger DNA sequencing method is still considered the most reliable, which makes it a suitable choice for virtual modeling to build all possible consensus sequences from smaller DNA fragments. Results: In silico and in vitro experiments were conducted (1) to implement and test our novel sequencing algorithm using standard cloning vectors of different lengths and (2) to experimentally validate virtual shotgun sequencing using PCR with 1 to 9 cycles for each reaction. Conclusions: We applied a novel algorithm based on Sanger methodology to correctly predict and emphasize the performance of DNA sequencing techniques, as well as its use in de novo DNA sequencing and its further application in synthetic biology. We demonstrate the statistical significance of our results.
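As a rough illustration of the virtual shotgun idea described in this abstract, the Python sketch below samples fixed-length reads at random positions on a circular plasmid and reports the depth of coverage. It is not the authors' algorithm: the function names, the read length, the read count, and the 2,686 bp plasmid length (about the size of pUC19) are assumptions chosen only for illustration.

```python
import random


def shotgun_reads(length, n_reads, read_len, rng):
    """Pick uniformly random start positions for reads on a circular sequence."""
    return [(rng.randrange(length), read_len) for _ in range(n_reads)]


def coverage(length, reads):
    """Return per-base read depth, wrapping reads around the circular sequence."""
    depth = [0] * length
    for start, read_len in reads:
        for offset in range(read_len):
            depth[(start + offset) % length] += 1
    return depth


rng = random.Random(0)      # fixed seed so the run is repeatable
plasmid_length = 2686       # assumption: roughly the size of pUC19
reads = shotgun_reads(plasmid_length, n_reads=50, read_len=500, rng=rng)
depth = coverage(plasmid_length, reads)

mean_depth = sum(depth) / plasmid_length
covered_fraction = sum(1 for d in depth if d > 0) / plasmid_length
print(f"mean depth: {mean_depth:.2f}x, covered: {covered_fraction:.1%}")
```

The mean depth is fixed by the read count and read length (50 × 500 / 2,686), whereas the covered fraction depends on where the random reads happen to fall.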
Affiliation(s)
- Sergey Shityakov
  - Department of Bioinformatics, University of Würzburg, 97074, Würzburg, Germany; Department of Psychiatry, China Medical University Hospital, 404, Taichung, Taiwan
- Elena Bencurova
  - Department of Bioinformatics, University of Würzburg, 97074, Würzburg, Germany
- Carola Förster
  - Department of Anesthesia and Critical Care, Würzburg University Hospital, 97080, Würzburg, Germany
- Thomas Dandekar
  - Department of Bioinformatics, University of Würzburg, 97074, Würzburg, Germany
2
Carmona R, Zafra A, Seoane P, Castro AJ, Guerrero-Fernández D, Castillo-Castillo T, Medina-García A, Cánovas FM, Aldana-Montes JF, Navas-Delgado I, Alché JDD, Claros MG. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome. FRONTIERS IN PLANT SCIENCE 2015; 6:625. [PMID: 26322066 PMCID: PMC4531244 DOI: 10.3389/fpls.2015.00625] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/21/2015] [Accepted: 07/28/2015] [Indexed: 05/18/2023]
Abstract
Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. The result is a reproductive transcriptome comprising 72,846 contigs with an average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified, and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive includes a semantic conceptualisation by means of a Resource Description Framework (RDF), allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species.
Affiliation(s)
- Rosario Carmona
  - Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
  - Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de Málaga, Málaga, Spain
- Adoración Zafra
  - Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Pedro Seoane
  - Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
- Antonio J. Castro
  - Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Darío Guerrero-Fernández
  - Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de Málaga, Málaga, Spain
- Ana Medina-García
  - Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
- Francisco M. Cánovas
  - Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
- José F. Aldana-Montes
  - Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
  - Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
- Juan de Dios Alché
  - Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- M. Gonzalo Claros
  - Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de Málaga, Málaga, Spain
  - Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
  - *Correspondence: M. Gonzalo Claros, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Campus de Teatinos, 29071 Málaga, Spain
3
Canales J, Bautista R, Label P, Gómez-Maldonado J, Lesur I, Fernández-Pozo N, Rueda-López M, Guerrero-Fernández D, Castro-Rodríguez V, Benzekri H, Cañas RA, Guevara MA, Rodrigues A, Seoane P, Teyssier C, Morel A, Ehrenmann F, Le Provost G, Lalanne C, Noirot C, Klopp C, Reymond I, García-Gutiérrez A, Trontin JF, Lelu-Walter MA, Miguel C, Cervera MT, Cantón FR, Plomion C, Harvengt L, Avila C, Gonzalo Claros M, Cánovas FM. De novo assembly of maritime pine transcriptome: implications for forest breeding and biotechnology. PLANT BIOTECHNOLOGY JOURNAL 2014; 12:286-99. [PMID: 24256179 DOI: 10.1111/pbi.12136] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 07/20/2013] [Revised: 09/24/2013] [Accepted: 09/26/2013] [Indexed: 05/21/2023]
Abstract
Maritime pine (Pinus pinaster Ait.) is a widely distributed conifer species in Southwestern Europe and one of the most advanced models for conifer research. In the current work, comprehensive characterization of the maritime pine transcriptome was performed using a combination of two different next-generation sequencing platforms, 454 and Illumina. De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in maritime pine trees and a collection of 9641 full-length cDNAs. Quality of the transcriptome assembly was validated by RT-PCR amplification of selected transcripts for structural and regulatory genes. Transcription factors and enzyme-encoding transcripts were annotated. Furthermore, the available sequencing data permitted the identification of polymorphisms and the establishment of robust single nucleotide polymorphism (SNP) and simple-sequence repeat (SSR) databases for genotyping applications and integration of translational genomics in maritime pine breeding programmes. All our data are freely available at SustainpineDB, the P. pinaster expressional database. Results reported here on the maritime pine transcriptome represent a valuable resource for future basic and applied studies on this ecologically and economically important pine species.
Affiliation(s)
- Javier Canales
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
4
Lee EJ, Kamli MR, Pokharel S, Malik A, Tareq KMA, Roouf Bhat A, Park HB, Lee YS, Kim S, Yang B, Young Chung K, Choi I. Expressed sequence tags for bovine muscle satellite cells, myotube formed-cells and adipocyte-like cells. PLoS One 2013; 8:e79780. [PMID: 24224006 PMCID: PMC3818215 DOI: 10.1371/journal.pone.0079780] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 06/08/2013] [Accepted: 09/25/2013] [Indexed: 12/25/2022] [Open Access]
Abstract
Background: Muscle satellite cells (MSCs) represent a devoted stem cell population that is responsible for postnatal muscle growth and skeletal muscle regeneration. An important characteristic of MSCs is that they encompass multipotential mesenchymal stem cell activity and are able to differentiate into myocytes and adipocytes. To achieve a global view of the genes differentially expressed in MSCs, myotube formed-cells (MFCs) and adipocyte-like cells (ALCs), we performed large-scale EST sequencing of normalized cDNA libraries developed from bovine MSCs. Results: A total of 24,192 clones were assembled into 3,333 clusters, 5,517 singletons and 3,842 contigs. Functional annotation of these unigenes revealed that a large portion of the differentially expressed genes are involved in cellular and signaling processes. Database for Annotation, Visualization and Integrated Discovery (DAVID) functional analysis of three subsets of highly expressed gene lists (MSC233, MFC258, and ALC248) highlighted some common and unique biological processes among MSC, MFC and ALC. Additionally, genes that may be specific to MSC, MFC and ALC are reported here, and the roles of dimethylarginine dimethylaminohydrolase 2 (DDAH2) during myogenesis and hemoglobin subunit alpha 2 (HBA2) during transdifferentiation in C2C12 were assayed as a case study. DDAH2 was up-regulated during myogenesis, and knockdown of DDAH2 by siRNA significantly decreased myogenin (MYOG) expression, corresponding with a slight change in cell morphology. In contrast, HBA2 was up-regulated during ALC formation, and its knockdown resulted in decreased intracellular lipid accumulation and CD36 mRNA expression. Conclusion: In this study, a large number of EST sequences were generated from the MSC, MFC and ALC. Overall, the collection of ESTs generated in this study provides a starting point for the identification of novel genes involved in MFC and ALC formation, which in turn offers a fundamental resource to enable better understanding of the mechanism of muscle differentiation and transdifferentiation.
Affiliation(s)
- Eun Ju Lee
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
  - Bovine Genome Resources Bank, Yeungnam University, Gyeongsan, Republic of Korea
- Majid Rasool Kamli
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
- Smritee Pokharel
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
- Adeel Malik
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
- K. M. A. Tareq
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
- Abdul Roouf Bhat
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
- Hee-Bok Park
  - Institute of Agriculture and Life Sciences, Gyeongsang National University, Jinju, Republic of Korea
- Yong Seok Lee
  - Bovine Genome Resources Bank, Yeungnam University, Gyeongsan, Republic of Korea
  - Department of Life Science and Biotechnology, College of Natural Sciences, Soonchunhyang University, Asan, Korea
- SangHoon Kim
  - Department of Biology, Kyung Hee University, Seoul, Republic of Korea
- Bohsuk Yang
  - Hanwoo Experiment Station, National Institute of Animal Science, RDA, Pyeongchang, Seoul, Republic of Korea
- Ki Young Chung
  - Hanwoo Experiment Station, National Institute of Animal Science, RDA, Pyeongchang, Seoul, Republic of Korea
  - * E-mail: (IC); (KYC)
- Inho Choi
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
  - Bovine Genome Resources Bank, Yeungnam University, Gyeongsan, Republic of Korea
  - * E-mail: (IC); (KYC)
5
Alnemer LM, Seetan RI, Bassi FM, Chitraranjan C, Helsene A, Loree P, Goshn SB, Gu YQ, Luo MC, Iqbal MJ, Lazo GR, Denton AM, Kianian SF. Wheat Zapper: a flexible online tool for colinearity studies in grass genomes. Funct Integr Genomics 2013; 13:11-7. [PMID: 23474942 DOI: 10.1007/s10142-013-0317-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/12/2012] [Revised: 02/08/2013] [Accepted: 02/12/2013] [Indexed: 10/27/2022]
Abstract
In the course of evolution, the genomes of grasses have maintained an observable degree of gene order conservation. The information available for already sequenced genomes can be used to predict the gene order of nonsequenced species by means of comparative colinearity studies. The "Wheat Zapper" application presented here performs on-demand colinearity analysis between wheat, rice, Sorghum, and Brachypodium in a simple, time efficient, and flexible manner. This application was specifically designed to provide plant scientists with a set of tools, comprising not only synteny inference, but also automated primer design, intron/exon boundaries prediction, visual representation using the graphic tool Circos 0.53, and the possibility of downloading FASTA sequences for downstream applications. Quality of the "Wheat Zapper" prediction was confirmed against the genome of maize, with good correlation (r > 0.83) observed between the gene order predicted on the basis of synteny and their actual position on the genome. Further, the accuracy of "Wheat Zapper" was calculated at 0.65 considering the "Genome Zipper" application as the "gold" standard. The differences between these two tools are amply discussed, making the point that "Wheat Zapper" is an accurate and reliable on-demand tool that is sure to benefit the cereal scientific community. The Wheat Zapper is available at http://wge.ndsu.nodak.edu/wheatzapper/ .
Affiliation(s)
- Loai M Alnemer
- Computer Information Systems Department, The University of Jordan, Amman, 11942, Jordan
6
Dhandapani V, Choi SR, Paul P, Kim YK, Ramchiary N, Hur Y, Lim YP. Development of EST database and transcriptome analysis in the leaves of Brassica rapa using a newly developed pipeline. Genes Genomics 2012. [DOI: 10.1007/s13258-012-0015-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/28/2022]
7
Zhao R, Cao Y, Xu H, Lv L, Qiao D, Cao Y. Analysis of expressed sequence tags from the green alga Dunaliella salina (Chlorophyta)(1). JOURNAL OF PHYCOLOGY 2011; 47:1454-1460. [PMID: 27020369 DOI: 10.1111/j.1529-8817.2011.01071.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
The unicellular green alga Dunaliella salina (Dunal) Teodor. is a novel model photosynthetic eukaryote for studying photosystems, high salinity acclimation, and carotenoid accumulation. In spite of such significance, there have been limited studies on the Dunaliella genome, transcriptome, and proteome. To further investigate D. salina, a cDNA library was constructed and sequenced. Here, we present the analysis of the 2,282 expressed sequence tags (ESTs) generated, together with 3,990 ESTs from dbEST. A total of 4,148 unique sequences (UniSeqs) were identified, of which 56.1% had sequence similarity with Uniprot entries, suggesting that a large number of unique genes may be harbored by Dunaliella. Additionally, protein family domains were identified to further characterize these sequences. We also compared the EST sequences with complete eukaryotic genomes from several animals, plants, and fungi and observed notable differences between D. salina and other organisms. This EST collection and its annotation provide a significant resource for basic and applied research on D. salina and lay the foundation for a systematic analysis of the transcriptome basis of green algae development and diversification.
Affiliation(s)
- Rui Zhao
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Sciences, Sichuan University, Chengdu, China, 610064
- Yu Cao
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Sciences, Sichuan University, Chengdu, China, 610064
- Hui Xu
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Sciences, Sichuan University, Chengdu, China, 610064
- Linfeng Lv
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Sciences, Sichuan University, Chengdu, China, 610064
- Dairong Qiao
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Sciences, Sichuan University, Chengdu, China, 610064
- Yi Cao
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Sciences, Sichuan University, Chengdu, China, 610064
8
Marconi TG, Costa EA, Miranda HR, Mancini MC, Cardoso-Silva CB, Oliveira KM, Pinto LR, Mollinari M, Garcia AA, Souza AP. Functional markers for gene mapping and genetic diversity studies in sugarcane. BMC Res Notes 2011; 4:264. [PMID: 21798036 PMCID: PMC3158763 DOI: 10.1186/1756-0500-4-264] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 03/28/2011] [Accepted: 07/28/2011] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: The database of sugarcane expressed sequence tags (EST) offers a great opportunity for developing molecular markers that are directly associated with important agronomic traits. The development of new EST-SSR markers represents an important tool for genetic analysis. In sugarcane breeding programs, functional markers can be used to accelerate the process and select important agronomic traits, especially in the mapping of quantitative trait loci (QTL) and pathogen resistance loci, or qualitative resistance loci (QRL). The aim of this work was to develop new simple sequence repeat (SSR) markers in sugarcane using the sugarcane expressed sequence tag database (SUCEST). FINDINGS: A total of 365 EST-SSR molecular markers with trinucleotide motifs were developed and evaluated in a collection of 18 genotypes of sugarcane (15 varieties and 3 species). In total, 287 of the EST-SSR markers amplified fragments of the expected size and were polymorphic in the analyzed sugarcane varieties. The number of alleles ranged from 2 to 18, with an average of 6 alleles per locus, while polymorphism information content values ranged from 0.21 to 0.92, with an average of 0.69. The discrimination power was high for the majority of the EST-SSRs, with an average value of 0.80. Among the markers characterized in this study, those related to bacterial defense responses, generation of precursor metabolites and energy, and carbohydrate metabolic processes are of particular interest. CONCLUSIONS: The EST-SSR markers presented in this work can be efficiently used for genetic mapping studies of segregating sugarcane populations. Their high polymorphism information content (PIC) and discriminant power (DP) facilitate QTL identification and marker-assisted selection; because of their association with functional regions of the genome, they have become an important tool for the sugarcane breeding program.
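The polymorphism information content reported in this abstract can be sketched in a few lines of Python. The abstract does not state which PIC formula was used; the sketch assumes the widely used definition PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ², where pᵢ is the frequency of allele i at a locus. The function name `pic` and the example allele counts are hypothetical.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphism information content for one locus.

    Assumed formula: PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    with p_i the relative frequency of allele i.
    """
    total = sum(allele_counts)
    freqs = [count / total for count in allele_counts]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


# Two equally frequent alleles: 1 - 0.5 - 0.125 = 0.375
print(pic([10, 10]))

# Hypothetical locus with six alleles at unequal frequencies
print(round(pic([12, 9, 6, 4, 3, 2]), 3))
```

Under this formula a locus with a single allele scores 0, and the value rises as alleles become more numerous and more evenly distributed.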
Affiliation(s)
- Thiago G Marconi
- Centro de Biologia Molecular e Engenharia Genética (CBMEG) - Universidade Estadual de Campinas (UNICAMP), Cidade Universitária Zeferino Vaz, CP 6010, CEP 13083-970, Campinas, SP, Brazil.
9
Fernández-Pozo N, Canales J, Guerrero-Fernández D, Villalobos DP, Díaz-Moreno SM, Bautista R, Flores-Monterroso A, Guevara MÁ, Perdiguero P, Collada C, Cervera MT, Soto A, Ordás R, Cantón FR, Avila C, Cánovas FM, Claros MG. EuroPineDB: a high-coverage web database for maritime pine transcriptome. BMC Genomics 2011; 12:366. [PMID: 21762488 PMCID: PMC3152544 DOI: 10.1186/1471-2164-12-366] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 05/26/2011] [Accepted: 07/15/2011] [Indexed: 11/30/2022] [Open Access]
Abstract
Background: Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases. Description: EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. It can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information) and will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided. Conclusions: The EuroPineDB with its integrated information can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge on the maritime pine transcriptome.
Affiliation(s)
- Noé Fernández-Pozo
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Campus de Teatinos s/n, Universidad de Málaga, 29071 Málaga, Spain
10
Choi HK, Goes da Silva F, Lim HJ, Iandolino A, Seo YS, Lee SW, Cook DR. Diagnosis of Pierce's disease using biomarkers specific to Xylella fastidiosa rRNA and Vitis vinifera gene expression. PHYTOPATHOLOGY 2010; 100:1089-99. [PMID: 20839944 DOI: 10.1094/phyto-01-10-0014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/21/2023]
Abstract
Pierce's disease (PD), caused by Xylella fastidiosa, represents one of the most damaging diseases of cultivated grape. Management of PD in the vineyard often relies on the removal of infected individuals, which otherwise serve as a source of inoculum for nearby healthy vines. Effective implementation of such control measures requires early diagnosis, which is complicated by the fact that infected vines often harbor high titers of the pathogen in advance of visual symptom development. Here, we report a biomarker system that simultaneously monitors Xylella-induced plant transcripts as well as Xylella ribosomal (r)RNA. Plant biomarker genes were derived from a combination of in silico analysis of grape expressed sequence tags and validation by means of reverse-transcriptase polymerase chain reaction (RT-PCR). Four genes upregulated upon PD infection were individually multiplexed with an X. fastidiosa marker rRNA and scored using either real-time RT-PCR or gel-based conventional RT-PCR techniques. The system was sufficiently sensitive to detect both host gene transcript and pathogen rRNA in asymptomatic infected plants. Moreover, these plant biomarker genes were not induced by water deficit, which is a component of PD development. Such biomarker genes could have utility for disease control by aiding early detection and as a screening tool in breeding programs.
Affiliation(s)
- H-K Choi
- Department of Genetic Engineering, Dong-A University, Busan, Republic of Korea.
11
Antonescu C, Antonescu V, Sultana R, Quackenbush J. Using the DFCI gene index databases for biological discovery. Curr Protoc Bioinformatics 2010; Chapter 1:1.6.1-1.6.36. [PMID: 20205187 DOI: 10.1002/0471250953.bi0106s29] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
12
A gene family-based method for interspecies comparisons of sequencing-based transcriptomes and its use in environmental adaptation analysis. J Genet Genomics 2010; 37:205-18. [PMID: 20347830 DOI: 10.1016/s1673-8527(09)60039-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/09/2009] [Revised: 01/20/2010] [Accepted: 02/03/2010] [Indexed: 11/21/2022]
Abstract
We describe a new method for sequencing-based cross-species transcriptome comparisons and define a new metric for evaluating gene expression across species using protein-coding families as units of comparison. Using this measure, transcriptomes from different species were evaluated by mapping them to gene families and integrating the mapping results with expression data. Statistical tests were applied to the transcriptome evaluation results to identify differentially expressed families. A Perl program named Pro-Diff was written to implement this method. To evaluate the method and provide an example of its use, two liver EST transcriptomes from two closely related fish that live in different temperature zones were compared. One EST library was from a recent sequencing project of Dissostichus mawsoni, a fish that lives in cold Antarctic sea waters, while the other was newly sequenced data (available at: http://www.fishgenome.org/polarbank/) from Notothenia angustata, a species that lives in temperate near-shore water of southern New Zealand. Results from the comparison were consistent with results inferred from phenotype differences and also with our previously published Gene Ontology-based method. The Pro-Diff program and operation manual can be downloaded from: http://www.fishgenome.org/download/Prodiff.rar.
13
O'Neil ST, Dzurisin JDK, Carmichael RD, Lobo NF, Emrich SJ, Hellmann JJ. Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon. BMC Genomics 2010; 11:310. [PMID: 20478048 PMCID: PMC2887415 DOI: 10.1186/1471-2164-11-310] [Citation(s) in RCA: 112] [Impact Index Per Article: 8.0] [Received: 08/24/2009] [Accepted: 05/17/2010] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task. RESULTS Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments. Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average) and highly covered (> 9.6x on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best match silkworm (Bombyx mori) protein--this "ortholog hit ratio" gives a close estimate on the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5. 
CONCLUSIONS Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining climate change tolerance of different natural populations. These studies will benefit from high quality assemblies with few singletons (less than 26% of bases for each assembled transcriptome are present in unassembled singleton ESTs) and effective transcript discovery (over 6,500 of our putative orthologs cover at least 50% of the corresponding model silkworm gene).
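The "ortholog hit ratio" described in this abstract can be illustrated with a minimal sketch. The function names below are hypothetical, and the conversion of the protein length to nucleotides (three bases per residue) is an assumption made here so that a ratio of 1.0 corresponds to a fully covered ortholog; the abstract itself does not spell out the units.

```python
def ortholog_hit_ratio(hit_region_bases: int, ortholog_protein_aa: int) -> float:
    """Fraction of the ortholog's coding length covered by a unigene's hit region."""
    if ortholog_protein_aa <= 0:
        raise ValueError("ortholog protein length must be positive")
    # Assumption: protein length in residues is converted to bases (3 per residue).
    return hit_region_bases / (3 * ortholog_protein_aa)


def count_above(ratios, threshold):
    """Count unigenes whose ratio exceeds a cutoff such as 0.5 or 0.8."""
    return sum(1 for r in ratios if r > threshold)
```

For instance, a hit region of 450 bases against a 300-residue silkworm protein would give a ratio of 0.5 under this assumption.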
Affiliation(s)
- Shawn T O'Neil
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
|
14
|
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read. BMC Bioinformatics 2010. [PMID: 20089148 DOI: 10.1186/1471-2105-11-38] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
|
15
|
Falgueras J, Lara AJ, Fernández-Pozo N, Cantón FR, Pérez-Trabado G, Claros MG. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read. BMC Bioinformatics 2010; 11:38. [PMID: 20089148 PMCID: PMC2832897 DOI: 10.1186/1471-2105-11-38] [Citation(s) in RCA: 142] [Impact Index Per Article: 10.1] [Received: 06/04/2009] [Accepted: 01/20/2010] [Indexed: 12/05/2022]
Abstract
BACKGROUND High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. RESULTS SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. CONCLUSIONS SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
Affiliation(s)
- Juan Falgueras
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
- Antonio J Lara
- Plataforma Andaluza de Bioinformática, Universidad de Málaga, 29071 Málaga, Spain
- Noé Fernández-Pozo
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain
- Francisco R Cantón
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain
- Guillermo Pérez-Trabado
- Plataforma Andaluza de Bioinformática, Universidad de Málaga, 29071 Málaga, Spain
- Departamento de Arquitectura de Computadores, Universidad de Málaga, Málaga, Spain
- M Gonzalo Claros
- Plataforma Andaluza de Bioinformática, Universidad de Málaga, 29071 Málaga, Spain
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain
|
16
|
Mochida K, Yoshida T, Sakurai T, Ogihara Y, Shinozaki K. TriFLDB: a database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics. PLANT PHYSIOLOGY 2009; 150:1135-46. [PMID: 19448038 PMCID: PMC2705016 DOI: 10.1104/pp.109.138214] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/07/2009] [Accepted: 05/08/2009] [Indexed: 05/19/2023]
Abstract
The Triticeae Full-Length CDS Database (TriFLDB) contains available information regarding full-length coding sequences (CDSs) of the Triticeae crops wheat (Triticum aestivum) and barley (Hordeum vulgare) and includes functional annotations and comparative genomics features. TriFLDB provides a search interface using keywords for gene function and related Gene Ontology terms and a similarity search for DNA and deduced translated amino acid sequences to access annotations of Triticeae full-length CDS (TriFLCDS) entries. Annotations consist of similarity search results against several sequence databases and domain structure predictions by InterProScan. The deduced amino acid sequences in TriFLDB are grouped with the proteome datasets for Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sorghum (Sorghum bicolor) by hierarchical clustering in stepwise thresholds of sequence identity, providing hierarchical clustering results based on full-length protein sequences. The database also provides sequence similarity results based on comparative mapping of TriFLCDSs onto the rice and sorghum genome sequences, which together with current annotations can be used to predict gene structures for TriFLCDS entries. To provide the possible genetic locations of full-length CDSs, TriFLCDS entries are also assigned to the genetically mapped cDNA sequences of barley and diploid wheat, which are currently accommodated in the Triticeae Mapped EST Database. These relational data are searchable from the search interfaces of both databases. The current TriFLDB contains 15,871 full-length CDSs from barley and wheat and includes putative full-length cDNAs for barley and wheat, which are publicly accessible. This informative content provides an informatics gateway for Triticeae genomics and grass comparative genomics. TriFLDB is publicly available at http://TriFLDB.psc.riken.jp/.
|
17
|
Bekel T, Henckel K, Küster H, Meyer F, Mittard Runte V, Neuweger H, Paarmann D, Rupp O, Zakrzewski M, Pühler A, Stoye J, Goesmann A. The Sequence Analysis and Management System – SAMS-2.0: Data management and sequence analysis adapted to changing requirements from traditional Sanger sequencing to ultrafast sequencing technologies. J Biotechnol 2009; 140:3-12. [DOI: 10.1016/j.jbiotec.2009.01.006] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
|
18
|
Scheibye-Alsing K, Hoffmann S, Frankel A, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Nygård AB, Cirera S, Jørgensen CB, Fredholm M, Gorodkin J. Sequence assembly. Comput Biol Chem 2008; 33:121-36. [PMID: 19152793 DOI: 10.1016/j.compbiolchem.2008.11.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 10/24/2008] [Revised: 11/28/2008] [Accepted: 11/28/2008] [Indexed: 01/20/2023]
Abstract
Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies, and it plays an important role in processing the information generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly programs. We describe the basic principles of computational assembly along with the main concerns, such as repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html.
Affiliation(s)
- K Scheibye-Alsing
- Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
|
19
|
Argout X, Fouet O, Wincker P, Gramacho K, Legavre T, Sabau X, Risterucci AM, Da Silva C, Cascardo J, Allegre M, Kuhn D, Verica J, Courtois B, Loor G, Babin R, Sounigo O, Ducamp M, Guiltinan MJ, Ruiz M, Alemanno L, Machado R, Phillips W, Schnell R, Gilmour M, Rosenquist E, Butler D, Maximova S, Lanaud C. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions. BMC Genomics 2008; 9:512. [PMID: 18973681 PMCID: PMC2642826 DOI: 10.1186/1471-2164-9-512] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Received: 06/04/2008] [Accepted: 10/30/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Theobroma cacao L. is a tree that originated in the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, which are responsible for more than 40% yield losses, and quality improvement, both nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao. RESULTS Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated corresponding to 48,594 unigenes, 12,692 contigs and 35,902 singletons. A total of 29,849 unigenes shared significant homology with public sequences from other species. Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories. A specific information system (ESTtik) was constructed to process, store and manage this EST collection, allowing the user to query a database. To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection. A large collection of new genetic markers was provided by this EST collection. CONCLUSION This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao. This EST collection represents a unique and important molecular resource for T. cacao study and improvement, facilitating the discovery of candidate genes for important T. cacao trait variation.
Affiliation(s)
- Xavier Argout
- Biological Systems Department, UMR DAP TA 40/03, CIRAD, Montpellier, France.
|
20
|
Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008; 9:386. [PMID: 18803844 PMCID: PMC2563014 DOI: 10.1186/1471-2105-9-386] [Citation(s) in RCA: 2348] [Impact Index Per Article: 146.8] [Received: 02/14/2008] [Accepted: 09/19/2008] [Indexed: 02/01/2023]
Abstract
Background Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. Results A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. Conclusion The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data.
Affiliation(s)
- F Meyer
- Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S Cass Avenue, Argonne, IL 60439, USA.
|
21
|
Chojnowski JL, Braun EL. Turtle isochore structure is intermediate between amphibians and other amniotes. Integr Comp Biol 2008; 48:454-62. [PMID: 21669806 DOI: 10.1093/icb/icn062] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
Vertebrate genomes are comprised of isochores that are relatively long (>100 kb) regions with a relatively homogenous (either GC-rich or AT-rich) base composition and with rather sharp boundaries with neighboring isochores. Mammals and living archosaurs (birds and crocodilians) have heterogeneous genomes that include very GC-rich isochores. In sharp contrast, the genomes of amphibians and fishes are more homogeneous and they have a lower overall GC content. Because DNA with higher GC content is more thermostable, the elevated GC content of mammalian and archosaurian DNA has been hypothesized to be an adaptation to higher body temperatures. This hypothesis can be tested by examining the structure of isochores across the reptilian clade, which includes the archosaurs, testudines (turtles), and lepidosaurs (lizards and snakes), because reptiles exhibit diverse body sizes, metabolic rates, and patterns of thermoregulation. This study focuses on a comparative analysis of a new set of expressed genes of the red-eared slider turtle and orthologs of the turtle genes in mammalian (human, mouse, dog, and opossum), archosaurian (chicken and alligator), and amphibian (western clawed frog) genomes. EST (expressed sequence tag) data from a turtle cDNA library enriched for genes that have specialized functions (developmental genes) revealed that using the GC content of the third-codon-position to examine isochore structure requires careful consideration of the types of genes examined. The more highly expressed genes (e.g., housekeeping genes) are more likely to be GC-rich than are genes with specialized functions. However, the set of highly expressed turtle genes demonstrated that the turtle genome has a GC content that is intermediate between the GC-poor amphibians and the GC-rich mammals and archosaurs.
There was a strong correlation between the GC content of all turtle genes and the GC content of other vertebrate genes, with the slope of the line describing this relationship also indicating that the isochore structure of turtles is intermediate between that of amphibians and other amniotes. These data are consistent with some thermal hypotheses of isochore evolution, but we believe that the credible set of models for isochore evolution still includes a variety of models. These data expand the amount of genomic data available from reptiles upon which future studies of reptilian genomics can build.
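The third-codon-position GC content (often abbreviated GC3) that this abstract relies on can be computed as in the sketch below. It is an illustration only, not the authors' pipeline: the function name is hypothetical, and it assumes the input is a coding sequence already in reading frame, with ambiguous bases skipped.

```python
def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third = [seq[i] for i in range(2, usable, 3)]
    called = [b for b in third if b in "ACGT"]  # skip ambiguous bases such as N
    if not called:
        raise ValueError("no unambiguous third-codon positions")
    return sum(b in "GC" for b in called) / len(called)
```

Comparing this value gene by gene between a turtle sequence and its ortholog in another vertebrate is the kind of input a slope or correlation analysis like the one described above would start from.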
Affiliation(s)
- Jena L Chojnowski
- Department of Zoology, University of Florida, 223 Bartram Hall, PO Box 118525, Gainesville, FL 32611, USA
|
22
|
Laney SJ, Buttaro CJ, Visconti S, Pilotte N, Ramzy RMR, Weil GJ, Williams SA. A reverse transcriptase-PCR assay for detecting filarial infective larvae in mosquitoes. PLoS Negl Trop Dis 2008; 2:e251. [PMID: 18560545 PMCID: PMC2413423 DOI: 10.1371/journal.pntd.0000251] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 03/14/2008] [Accepted: 05/20/2008] [Indexed: 11/19/2022]
Abstract
Background Existing molecular assays for filarial parasite DNA in mosquitoes cannot distinguish between infected mosquitoes that contain any stage of the parasite and infective mosquitoes that harbor third stage larvae (L3) capable of establishing new infections in humans. We now report development of a molecular L3-detection assay for Brugia malayi in vectors based on RT-PCR detection of an L3-activated gene transcript. Methodology/Principal Findings Candidate genes identified by bioinformatics analysis of EST datasets across the B. malayi life cycle were initially screened by PCR using cDNA libraries as templates. Stage-specificity was confirmed using RNA isolated from infected mosquitoes. Mosquitoes were collected daily for 14 days after feeding on microfilaremic cat blood. RT-PCR was performed with primer sets that were specific for individual candidate genes. Many promising candidates with strong expression in the L3 stage were excluded because of low-level transcription in less mature larvae. One transcript (TC8100, which encodes a particular form of collagen) was only detected in mosquitoes that contained L3 larvae. This assay detects a single L3 in a pool of 25 mosquitoes. Conclusions/Significance This L3-activated gene transcript, combined with a control transcript (tph-1, accession # U80971) that is constitutively expressed by all vector-stage filarial larvae, can be used to detect filarial infectivity in pools of mosquito vectors. This general approach (detection of stage-specific gene transcripts from eukaryotic pathogens) may also be useful for detecting infective stages of other vector-borne parasites. The Global Programme for the Elimination of Lymphatic Filariasis (GPELF) was launched in the year 1998 with the goal of eliminating lymphatic filariasis by 2020. 
As the success of mass drug administration (MDA) in the global program drives the rates of infection in endemic populations to very low levels, the development of new, highly sensitive methods is required for monitoring transmission by screening mosquitoes for the presence of L3 infective larvae. The current method of mosquito dissection to identify L3 larvae is laborious and insensitive and is not amenable to screening large numbers of mosquitoes. Existing molecular assays for the detection of filarial parasite DNA in mosquitoes are sensitive and can easily screen large numbers of vectors. However, current PCR-based methods cannot distinguish between infected mosquitoes that contain any stage of the parasite and infective mosquitoes that harbor third stage larvae (L3) capable of establishing new infections in humans. This paper reports the first development of a molecular L3-detection assay for a filarial parasite in mosquitoes based on RT-PCR detection of an L3-activated gene transcript. This strategy of detecting stage-specific messenger RNA from filarial parasites may also prove useful for detecting infective stages of other vector-borne pathogens.
Affiliation(s)
- Sandra J Laney
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA.
|
23
|
Kim CK, Choi JW, Park D, Kang MJ, Seol YJ, Hyun DY, Hahn JH. PlantGI: a database for searching gene indices in agricultural plants developed at NIAB, Korea. Bioinformation 2008; 2:344-5. [PMID: 18685722 PMCID: PMC2478734 DOI: 10.6026/97320630002344] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/02/2008] [Revised: 04/22/2008] [Accepted: 04/28/2008] [Indexed: 11/23/2022]
Abstract
The Plant Gene Index (PlantGI) database is developed as a web-based search system with keyword search capabilities to provide information on gene indices specifically for agricultural plants. The database contains specific Gene Index information for ten agricultural species, namely, rice, Chinese cabbage, wheat, maize, soybean, barley, mushroom, Arabidopsis, hot pepper and tomato. PlantGI differs from other Gene Index databases in being specific to agricultural plant species and thus complements services from other similar developments. The database includes options for interactive mining of EST contigs and assembled EST data for user-specific keyword queries. The current version of PlantGI contains a total of 34,000 EST contig records for rice (8488 records), wheat (8560 records), maize (4570 records), soybean (3726 records), barley (3417 records), Chinese cabbage (3602 records), tomato (1236 records), hot pepper (998 records), mushroom (130 records) and Arabidopsis (8 records). AVAILABILITY The database is available for free at http://www.niab.go.kr/nabic/.
Affiliation(s)
- Chang Kug Kim
- Bioinformatics Division, National Institute of Agricultural Biotechnology (NIAB), Suwon 441-707, Korea
- Ji Weon Choi
- Postharvest Technology Div., National Horticultural Research Institute (NHRI), Suwon 440-706, Korea
- DongSuk Park
- Bioinformatics Division, National Institute of Agricultural Biotechnology (NIAB), Suwon 441-707, Korea
- Man Jung Kang
- Bioinformatics Division, National Institute of Agricultural Biotechnology (NIAB), Suwon 441-707, Korea
- Young-Joo Seol
- Bioinformatics Division, National Institute of Agricultural Biotechnology (NIAB), Suwon 441-707, Korea
- Do Yoon Hyun
- Genetic Resources Div., NIAB, Suwon 441-707, Korea
- Jang Ho Hahn
- Bioinformatics Division, National Institute of Agricultural Biotechnology (NIAB), Suwon 441-707, Korea
|
24
|
Lee Y, Quackenbush J. Using the TIGR gene index databases for biological discovery. Curr Protoc Bioinformatics 2008; Chapter 1:Unit 1.6. [PMID: 18428690 DOI: 10.1002/0471250953.bi0106s03] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
Affiliation(s)
- Yuandan Lee
- The Institute for Genomic Research, Rockville, Maryland, USA
|
25
|
Mukesh M, Kataria RS, Kumar V, Pandey D, Sodhi M, Ahlawat SP, Sobti RC, Mishra BP. Construction and Evaluation of Directionally Cloned cDNA Libraries from Lactating and Non-lactating Mammary Gland of River Buffalo (Bubalus bubalis): A Resource for Gene Identification in Bubaline Genome. JOURNAL OF APPLIED ANIMAL RESEARCH 2008. [DOI: 10.1080/09712119.2008.9706902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
|
26
|
Abstract
The sequencing of the human genome and the ensuing wave of data generation have shed new light on the extent and importance of alternative splicing as an RNA regulatory mechanism. Alternative splicing could potentially explain the complexity of the protein repertoire during evolution, and defects in the splicing mechanism are responsible for diseases as complex as cancer. Among the challenges that arise in light of these discoveries are cataloguing splice variation in the human and other eukaryotic genomes, and identifying and characterizing the splicing regulatory elements that control their expression. Bioinformatics efforts tackling these two questions are just at the beginning. This article is a survey of these methods.
Affiliation(s)
- Liliana Florea
- Department of Computer Science, George Washington University, Academic Center-Rm 714, Washington DC 20052, USA.
|
27
|
Abstract
Data analysis of serial analysis of gene expression (SAGE) tag experiments begins with the extraction of tags from single-pass sequence files of ditag concatemers. When using DNA base quality values generated during base calling, it is possible to control the false-positive discovery rate of unique tags. This chapter describes how to set up a system for generating tag lists from quality associated sequence data.
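The tag-extraction step summarised above can be sketched roughly as follows. This is a simplified illustration rather than the chapter's actual system: it assumes the common SAGE layout in which ditags are separated by CATG anchoring sites, a 10-base tag at each end of a ditag, and a per-base Phred quality list aligned with the sequence. The function names, the `min_q` cutoff of 20 and the length check are hypothetical choices.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def extract_tags(seq, quals, tag_len=10, min_q=20, site="CATG"):
    """Return tags from one concatemer read, dropping tags with any low-quality base."""
    seq = seq.upper()
    starts = [m.start() for m in re.finditer(site, seq)]
    tags = []
    for left, right in zip(starts, starts[1:]):
        lo, hi = left + len(site), right  # the ditag spans seq[lo:hi]
        if hi - lo < 2 * tag_len:
            continue  # too short to hold two full tags
        for a, b, reverse in ((lo, lo + tag_len, False), (hi - tag_len, hi, True)):
            if min(quals[a:b]) < min_q:
                continue  # a doubtful base call could create a false unique tag
            tag = seq[a:b]
            # The second tag is read from the opposite strand.
            tags.append(revcomp(tag) if reverse else tag)
    return tags
```

Raising `min_q` trades tag yield for fewer spurious unique tags, which is the control over false positives that the abstract refers to.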
|
28
|
Abstract
In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
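The exon inclusion level mentioned in this abstract is commonly estimated as the share of informative ESTs that support inclusion. The sketch below assumes exactly that simple definition and uses a hypothetical function name; it is not taken from the chapter.

```python
def exon_inclusion_level(inclusion_ests: int, skipping_ests: int) -> float:
    """Fraction of ESTs covering the exon region that include the exon."""
    total = inclusion_ests + skipping_ests
    if total == 0:
        raise ValueError("no ESTs cover this exon")
    return inclusion_ests / total
```

With 3 ESTs supporting inclusion and 1 supporting skipping, the estimate is 0.75; with so few ESTs such an estimate is of course very noisy.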
|
29
|
Chojnowski JL, Franklin J, Katsu Y, Iguchi T, Guillette LJ, Kimball RT, Braun EL. Patterns of Vertebrate Isochore Evolution Revealed by Comparison of Expressed Mammalian, Avian, and Crocodilian Genes. J Mol Evol 2007; 65:259-66. [PMID: 17674077 DOI: 10.1007/s00239-007-9003-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 10/19/2006] [Accepted: 05/18/2007] [Indexed: 10/23/2022]
Abstract
Vertebrate genomes are mosaics of isochores, defined as long (>100 kb) regions with relatively homogeneous within-region base composition. Birds and mammals have more GC-rich isochores than amphibians and fish, and the GC-rich isochores of birds and mammals have been suggested to be an adaptation to homeothermy. If this hypothesis is correct, all poikilothermic (cold-blooded) vertebrates, including the nonavian reptiles, are expected to lack a GC-rich isochore structure. Previous studies using various methods to examine isochore structure in crocodilians, turtles, and squamates have led to different conclusions. We collected more than 6000 expressed sequence tags (ESTs) from the American alligator to overcome sample size limitations suggested to be the fundamental problem in the previous reptilian studies. The alligator ESTs were assembled and aligned with their human, mouse, chicken, and western clawed frog orthologs, resulting in 366 alignments. Analyses of third-codon-position GC content provided conclusive evidence that the poikilothermic alligator has GC-rich isochores, like homeothermic birds and mammals. We placed these results in a theoretical framework able to unify available models of isochore evolution. The data collected for this study allowed us to reject the models that explain the evolution of GC content using changes in body temperature associated with the transition from poikilothermy to homeothermy. Falsification of these models places fundamental constraints upon the plausible pathways for the evolution of isochores.
Affiliation(s)
- Jena L Chojnowski
- Department of Zoology, University of Florida, Gainesville, FL 32611, USA.
|
30
|
Affiliation(s)
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
31
|
Masoudi-Nejad A, Goto S, Jauregui R, Ito M, Kawashima S, Moriya Y, Endo TR, Kanehisa M. EGENES: transcriptome-based plant database of genes with metabolic pathway information and expressed sequence tag indices in KEGG. PLANT PHYSIOLOGY 2007; 144:857-66. [PMID: 17468225 PMCID: PMC1914165 DOI: 10.1104/pp.106.095059] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 12/19/2006] [Accepted: 04/18/2007] [Indexed: 05/15/2023]
Abstract
EGENES is a knowledge-based database for efficient analysis of plant expressed sequence tags (ESTs) that was recently added to the KEGG suite of databases. It links plant genomic information with higher order functional information in a single database. It also provides gene indices for each genome. The genomic information in EGENES is a collection of EST contigs constructed from assembly of ESTs. Due to the extremely large genomes of plant species, the bulk collection of data such as ESTs is a quick way to capture a complete repertoire of genes expressed in an organism. Using ESTs for reconstructing metabolic pathways is a new expansion in KEGG and provides researchers with a new resource for species in which only EST sequences are available. Functional annotation in EGENES is a process of linking a set of genes/transcripts in each genome with a network of interacting molecules in the cell. EGENES is a multispecies, integrated resource consisting of genomic, chemical, and network information containing a complete set of building blocks (genes and molecules) and wiring diagrams (biological pathways) to represent cellular functions. Using EGENES, genome-based pathway annotation and EST-based annotation can now be compared and mutually validated. The ultimate goals of EGENES will be to: bring new plant species into KEGG by clustering and annotating ESTs; abstract knowledge and principles from large-scale plant EST data; and improve computational prediction of systems of higher complexity. EGENES will be updated at least once a year. EGENES is publicly available and is accessible by the following link or by KEGG's navigation system (http://www.genome.jp/kegg-bin/create_kegg_menu?category=plants_egenes).
Affiliation(s)
- Ali Masoudi-Nejad
- Laboratory of Bioknowledge Systems, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto 611-0011, Japan.
|
32
|
Borges JC, Cagliari TC, Ramos CHI. Expression and variability of molecular chaperones in the sugarcane expressome. JOURNAL OF PLANT PHYSIOLOGY 2007; 164:505-13. [PMID: 16687190 DOI: 10.1016/j.jplph.2006.03.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/05/2005] [Accepted: 03/19/2006] [Indexed: 05/09/2023]
Abstract
Molecular chaperones assist the folding of newly synthesized polypeptides, prevent aggregation, and recover proteins from aggregates, among other important cellular functions. Their study is therefore of great biotechnological importance. The present work discusses the mining for chaperone-related sequences within the sugarcane EST genome project database, which resulted in approximately 300 different sequences. Since molecular chaperones are highly conserved in most organisms studied so far, the number of sequences related to these proteins in sugarcane was very similar to the number found in the Arabidopsis thaliana genome. The Hsp70 family was the main molecular chaperone system present in the sugarcane expressome. However, many other relevant molecular chaperone systems were also present. A digital RNA blot analysis showed that 5' ESTs from all molecular chaperones were found in every sugarcane library, despite their heterogeneous expression profiles. The results presented here suggest the importance of molecular chaperones to polypeptide metabolism in sugarcane cells, based on their abundance and variability. Finally, these data have been used to guide more in-depth analysis, permitting the choice of specific targets for study.
Affiliation(s)
- Júlio C Borges
- Laboratório Nacional de Luz Síncrotron, Caixa Postal 6192, 13084-971 Campinas SP, Brazil.
|
33
|
Longhorn SJ, Foster PG, Vogler AP. The nematode–arthropod clade revisited: phylogenomic analyses from ribosomal protein genes misled by shared evolutionary biases. Cladistics 2007; 23:130-144. [DOI: 10.1111/j.1096-0031.2006.00132.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
|
34
|
Chen FC, Wang SS, Chaw SM, Huang YT, Chuang TJ. Plant Gene and Alternatively Spliced Variant Annotator. A plant genome annotation pipeline for rice gene and alternatively spliced variant identification with cross-species expressed sequence tag conservation from seven plant species. PLANT PHYSIOLOGY 2007; 143:1086-95. [PMID: 17220363 PMCID: PMC1820933 DOI: 10.1104/pp.106.092460] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
The completion of the rice (Oryza sativa) genome draft has brought unprecedented opportunities for genomic studies of the world's most important food crop. Previous rice gene annotations have relied mainly on ab initio methods, which usually yield a high rate of false-positive predictions and give only limited information regarding alternative splicing in rice genes. Comparative approaches based on expressed sequence tags (ESTs) can compensate for the drawbacks of ab initio methods because they can simultaneously identify experimental data-supported genes and alternatively spliced transcripts. Furthermore, cross-species EST information can be used to not only offset the insufficiency of same-species ESTs but also derive evolutionary implications. In this study, we used ESTs from seven plant species, rice, wheat (Triticum aestivum), maize (Zea mays), barley (Hordeum vulgare), sorghum (Sorghum bicolor), soybean (Glycine max), and Arabidopsis (Arabidopsis thaliana), to annotate the rice genome. We developed a plant genome annotation pipeline, Plant Gene and Alternatively Spliced Variant Annotator (PGAA). Using this approach, we identified 852 genes (931 isoforms) not annotated in other widely used databases (i.e. the Institute for Genomic Research, National Center for Biotechnology Information, and Rice Annotation Project) and found 87% of them supported by both rice and nonrice EST evidence. PGAA also identified more than 44,000 alternatively spliced events, of which approximately 20% are not observed in the other three annotations. These novel annotations represent rich opportunities for rice genome research, because the functions of most of our annotated genes are currently unknown. Also, in the PGAA annotation, the isoforms with non-rice-EST-supported exons are significantly enriched in transporter activity but significantly underrepresented in transcription regulator activity. 
We have also identified potential lineage-specific and conserved isoforms, which are important markers in evolutionary studies. The data and the Web-based interface, RiceViewer, are available for public access at http://RiceViewer.genomics.sinica.edu.tw/.
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, National Health Research Institute, Miaoli County 350, Taiwan
|
35
|
|
36
|
Pickart MA, Klee EW, Nielsen AL, Sivasubbu S, Mendenhall EM, Bill BR, Chen E, Eckfeldt CE, Knowlton M, Robu ME, Larson JD, Deng Y, Schimmenti LA, Ellis LB, Verfaillie CM, Hammerschmidt M, Farber SA, Ekker SC. Genome-wide reverse genetics framework to identify novel functions of the vertebrate secretome. PLoS One 2006; 1:e104. [PMID: 17218990 PMCID: PMC1766371 DOI: 10.1371/journal.pone.0000104] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2006] [Accepted: 11/12/2006] [Indexed: 11/18/2022] Open
Abstract
Background Understanding the functional role(s) of the more than 20,000 proteins of the vertebrate genome is a major next step in the post-genome era. The approximately 4,000 co-translationally translocated (CTT) proteins – representing the vertebrate secretome – are important for such vertebrate-critical processes as organogenesis. However, the role(s) for most of these genes is currently unknown. Results We identified 585 putative full-length zebrafish CTT proteins using cross-species genomic and EST-based comparative sequence analyses. We further investigated 150 of these genes (Figure 1) for unique function using morpholino-based analysis in zebrafish embryos. 12% of the CTT protein-deficient embryos resulted in specific developmental defects, a notably higher rate of gene function annotation than the 2%–3% estimate from random gene mutagenesis studies. Conclusion(s) This initial collection includes novel genes required for the development of vascular, hematopoietic, pigmentation, and craniofacial tissues, as well as lipid metabolism, and organogenesis. This study provides a framework utilizing zebrafish for the systematic assignment of biological function in a vertebrate genome.
Affiliation(s)
- Michael A. Pickart
- Department of Oral Sciences and Minnesota Craniofacial Research Training Program MinnCResT, University of Minnesota, Minneapolis, Minnesota, United States of America
- Eric W. Klee
- Laboratory Medicine and Pathology and Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Aubrey L. Nielsen
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Sridhar Sivasubbu
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Eric M. Mendenhall
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Medicine, Division of Hematology, Oncology, and Transplantation, and Stem Cell Institute, University of Minnesota, Minneapolis, Minnesota, United States of America
- Brent R. Bill
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Pediatrics, Genetics and Metabolism and Department of Ophthalmology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Eleanor Chen
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Craig E. Eckfeldt
- Department of Medicine, Division of Hematology, Oncology, and Transplantation, and Stem Cell Institute, University of Minnesota, Minneapolis, Minnesota, United States of America
- Michelle Knowlton
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Mara E. Robu
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Oral Sciences and Minnesota Craniofacial Research Training Program MinnCResT, University of Minnesota, Minneapolis, Minnesota, United States of America
- Jon D. Larson
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- Yun Deng
- Carnegie Institute of Washington, Baltimore, Maryland, United States of America
- Lisa A. Schimmenti
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Pediatrics, Genetics and Metabolism and Department of Ophthalmology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Lynda B.M. Ellis
- Laboratory Medicine and Pathology and Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Catherine M. Verfaillie
- Department of Medicine, Division of Hematology, Oncology, and Transplantation, and Stem Cell Institute, University of Minnesota, Minneapolis, Minnesota, United States of America
- Steven A. Farber
- Carnegie Institute of Washington, Baltimore, Maryland, United States of America
- Stephen C. Ekker
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Arnold and Mabel Beckman Center for Transposon Research, University of Minnesota, Minneapolis, Minnesota, United States of America
- * To whom correspondence should be addressed. E-mail:
|
37
|
Venier P, De Pittà C, Pallavicini A, Marsano F, Varotto L, Romualdi C, Dondero F, Viarengo A, Lanfranchi G. Development of mussel mRNA profiling: Can gene expression trends reveal coastal water pollution? Mutat Res 2006; 602:121-34. [PMID: 17010391 DOI: 10.1016/j.mrfmmm.2006.08.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2006] [Revised: 08/21/2006] [Accepted: 08/21/2006] [Indexed: 05/12/2023]
Abstract
Marine bivalves of the genus Mytilus are intertidal filter-feeders commonly used as biosensors of coastal pollution. Mussels adjust their functions to ordinary environmental changes, e.g. temperature fluctuations and emersion-related hypoxia, and react to various contaminants, accumulated from the surrounding water and defining a potential health risk for sea-food consumers. Despite the increasing use of mussels in environmental monitoring, their genome and gene functions are largely unexplored. Hence, we started the systematic identification of expressed sequence tags and prepared a cDNA microarray of Mytilus galloprovincialis including 1714 mussel probes (76% singletons, approximately 50% putatively identified transcripts) plus unrelated controls. To assess the potential use of the gene set represented in MytArray 1.0, we tested different tissues and groups of mussels. The resulting data highlighted the transcriptional specificity of the mussel tissues. Further testing of the most responsive digestive gland allowed correct classification of mussels treated with mixtures of heavy metals or organic contaminants (expression changes of specific genes discriminated the two pollutant cocktails). Similar analyses made a distinction possible between mussels living in the Venice lagoon (Italy) at the petrochemical district and mussels close to the open sea. The suggestive presence of gene markers tracing organic contaminants more than heavy metals in mussels from the industrial district is consistent with reported trends of chemical contamination. Further study is necessary in order to understand how much gene expression profiles can disclose the signatures of pollutants in mussel cells and tissues. Nevertheless, the gene expression patterns described in this paper support a wider characterization of the mussel transcriptome and point to the development of novel environmental metrics.
Affiliation(s)
- Paola Venier
- Department of Biology and CRIBI Biotechnology Centre, University of Padova, Via Bassi 58/B, 35131 Padova, Italy
|
38
|
Abstract
Genomics and bioinformatics have great potential to help address numerous topics in ecology and evolution. Expressed sequence tags (ESTs) can bridge genomics and molecular ecology because they can provide a means of accessing the gene space of almost any organism. We review how ESTs have been used in molecular ecology research in the last several years by providing sequence data for the design of molecular markers, genome-wide studies of gene expression and selection, the identification of candidate genes underlying adaptation, and the basis for studies of gene family and genome evolution. Given the tremendous recent advances in inexpensive sequencing technologies, we predict that molecular ecologists will increasingly be developing and using EST collections in the years to come. With this in mind, we close our review by discussing aspects of EST resource development of particular relevance for molecular ecologists.
Affiliation(s)
- Amy Bouck
- Department of Biology, Box 90338, Duke University, Durham, NC 27708, USA.
|
39
|
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow. BMC Bioinformatics 2006; 7:513. [PMID: 17123449 PMCID: PMC1676024 DOI: 10.1186/1471-2105-7-513] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2006] [Accepted: 11/23/2006] [Indexed: 11/25/2022] Open
Abstract
Background Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. Results In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. Conclusion JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". 
However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from or .
|
40
|
Arhondakis S, Clay O, Bernardi G. Compositional properties of human cDNA libraries: practical implications. FEBS Lett 2006; 580:5772-8. [PMID: 17022979 DOI: 10.1016/j.febslet.2006.09.034] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2006] [Revised: 09/12/2006] [Accepted: 09/19/2006] [Indexed: 01/28/2023]
Abstract
The strikingly wide and bimodal gene distribution exhibited by the human genome has prompted us to study the correlations between EST-counts (expression levels) and base composition of genes, especially since existing data are contradictory. Here we investigate how cDNA library preparation affects the GC distributions of ESTs and/or genes found in the library, and address consequences for expression studies. We observe that strongly anomalous GC distributions often indicate experimental biases or deficits during their preparation. We propose the use of compositional distributions of raw ESTs from a cDNA library, and/or of the genes they represent, as a simple and effective tool for quality control.
Affiliation(s)
- Stilianos Arhondakis
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
|
41
|
Turner JD, Schote AB, Macedo JA, Pelascini LPL, Muller CP. Tissue specific glucocorticoid receptor expression, a role for alternative first exon usage? Biochem Pharmacol 2006; 72:1529-37. [PMID: 16930562 DOI: 10.1016/j.bcp.2006.07.005] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2006] [Revised: 07/04/2006] [Accepted: 07/11/2006] [Indexed: 01/28/2023]
Abstract
The CpG island upstream of the GR is highly structured and conserved at least in all the animal species that have been investigated. Sequence alignment of these CpG islands shows inter-species homology ranging from 64 to 99%. This 3.1 kb CpG-rich region upstream of the GR exon 2 encodes 5' untranslated mRNA regions. These CpG-rich regions are organised into multiple first exons, each, as we and others have postulated, with its own promoter region. Alternative mRNA transcript variants are obtained by the splicing of these alternative first exons to a common acceptor site in the second exon of the GR. Exon 2 contains an in-frame stop codon immediately upstream of the ATG start codon to ensure that this 5' heterogeneity remains untranslated, and that the sequence and structure of the GR is unaffected. Tissue-specific differential usage of exon 1s has been observed in a range of human tissues, and to a lesser extent in the rat and mouse. The GR expression level is tightly controlled within each tissue or cell type at baseline and upon stimulation. We suggest that no single promoter region may be capable of containing all the necessary promoter elements and yet preserve the necessary proximity to the transcription initiation site to produce such a plethora of responses. Thus we further suggest that alternative first exons, each under the control of specific transcription factors, control the tissue-specific GR expression and are involved in the tissue-specific GR transcriptional response to stimulation. Spreading the necessary promoter elements over multiple promoter regions, each with an associated alternative transcription initiation site, would appear to vastly increase the capacity for transcriptional control of GR.
Affiliation(s)
- Jonathan D Turner
- Institute of Immunology, Laboratoire National de Santé, 20A rue Auguste Lumière, L-1950 Luxembourg, Grand Duchy of Luxembourg
|
42
|
Abstract
MOTIVATION Repeat sequences in ESTs are a source of problems, in particular for clustering. ESTs are therefore commonly masked against a library of known repeats. High quality repeat libraries are available for the widely studied organisms, but for most other organisms the lack of such libraries is likely to compromise the quality of EST analysis. RESULTS We present a fast, flexible and library-less method for masking repeats in EST sequences, based on match statistics within the EST collection. The method is not linked to a particular clustering algorithm. Extensive testing on datasets using different clustering methods and a genomic mapping as reference shows that this method gives results that are better than or as good as those obtained using RepeatMasker with a repeat library. AVAILABILITY The implementation of RBR is available under the terms of the GPL from http://www.ii.uib.no/~ketil/bioinformatics CONTACT ketil.malde@bccs.uib.no SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
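The idea of masking repeats from match statistics within the EST collection itself, without a repeat library, can be sketched roughly as follows. This is an illustrative k-mer counting approach under stated assumptions, not the RBR algorithm: the function names, the word length `k`, and the threshold `max_seqs` are all hypothetical choices for this example.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count, for every k-mer, the number of sequences that contain it."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        # A set is used so each k-mer is counted once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def mask_repeats(seqs, k=4, max_seqs=2):
    """Replace with 'N' every k-mer that occurs in more than max_seqs sequences.

    Assumption for illustration: a word shared by unusually many ESTs is
    treated as repeat-derived. Real data would need a much larger k and a
    threshold derived from the collection's match statistics.
    """
    counts = count_kmers(seqs, k)
    masked = []
    for s in seqs:
        up = s.upper()
        chars = list(up)
        for i in range(len(up) - k + 1):
            if counts[up[i:i + k]] > max_seqs:
                chars[i:i + k] = "N" * k
        masked.append("".join(chars))
    return masked


if __name__ == "__main__":
    ests = ["AAAACGT", "AAAATTG", "AAAAGGC", "CCGTA"]
    print(mask_repeats(ests))
```

In this toy input only the word `AAAA` is shared by more than two sequences, so only that word is masked; the masked sequences could then be passed to any clustering method.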
Affiliation(s)
- Ketil Malde
- Computational Biology Unit, Bergen Centre for Computational Sciences, University of Bergen, Norway.
|
43
|
Lorenzini DM, da Silva PI, Soares MB, Arruda P, Setubal J, Daffre S. Discovery of immune-related genes expressed in hemocytes of the tarantula spider Acanthoscurria gomesiana. DEVELOPMENTAL AND COMPARATIVE IMMUNOLOGY 2006; 30:545-56. [PMID: 16386302 DOI: 10.1016/j.dci.2005.09.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/05/2005] [Revised: 08/28/2005] [Accepted: 09/02/2005] [Indexed: 05/05/2023]
Abstract
The present study reports the identification of immune related transcripts from hemocytes of the spider Acanthoscurria gomesiana by high throughput sequencing of expressed sequence tags (ESTs). To generate ESTs from hemocytes, two cDNA libraries were prepared: one by directional cloning (primary) and the other by the normalization of the first (normalized). A total of 7584 clones were sequenced and the identical ESTs were clustered, resulting in 3723 assembled sequences (AS). At least 20% of these sequences are putative novel genes. The automatic functional annotation of AS based on Gene Ontology revealed several abundant transcripts related to the following functional classes: hemocyanin, lectin, and structural constituents of ribosome and cytoskeleton. From this annotation, 73 transcripts possibly involved in immune response were also identified, suggesting the existence of several molecular processes not previously described for spiders, such as: pathogen recognition, coagulation, complement activation, cell adhesion and intracellular signaling pathway for the activation of cellular defenses.
Affiliation(s)
- Daniel M Lorenzini
- Departamento de Parasitologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, Avenida Prof. Lineu Prestes, 1374, CEP 05508-900 São Paulo, SP, Brazil
|
44
|
Wang JPZ, Lindsay BG, Cui L, Wall PK, Marion J, Zhang J, dePamphilis CW. Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries. BMC Bioinformatics 2005; 6:300. [PMID: 16351717 PMCID: PMC1369009 DOI: 10.1186/1471-2105-6-300] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2004] [Accepted: 12/13/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insights to sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed. RESULTS We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of expressed genes in one cDNA library or co-expressed in two cDNA libraries. The superior performance of the new prediction method over an existing approach is established by a simulation study. Our analysis of four Arabidopsis thaliana EST sets suggests that the number of expressed genes present in four different cDNA libraries of Arabidopsis thaliana varies from 9155 (root) to 12005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%. CONCLUSION The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing.
Affiliation(s)
- Ji-Ping Z Wang
- Department of Statistics, Northwestern University, Evanston, IL 60208, USA
- Bruce G Lindsay
- Department of Statistics, Penn State University, University Park 16802, USA
- Liying Cui
- Department of Biology, Penn State University, University Park 16802, USA
- P Kerr Wall
- Department of Biology, Penn State University, University Park 16802, USA
- Josh Marion
- Department of Computer Science, Penn State University, University Park 16802, USA
- Jiaxuan Zhang
- College of Software, Tsinghua University, Beijing, 100086, PR China
|
45
|
Wu Y, Rozenfeld S, Defferrard A, Ruggiero K, Udall JA, Kim H, Llewellyn DJ, Dennis ES. Cycloheximide treatment of cotton ovules alters the abundance of specific classes of mRNAs and generates novel ESTs for microarray expression profiling. Mol Genet Genomics 2005; 274:477-93. [PMID: 16208490 DOI: 10.1007/s00438-005-0049-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2005] [Accepted: 08/19/2005] [Indexed: 10/25/2022]
Abstract
Fibres of cotton (Gossypium hirsutum L.) are single elongated epidermal cells that start to develop on the outer surface of cotton ovules on the day of anthesis. Little is known about the control of fibre initiation and development. As a first step towards discovering important genes involved in fibre initiation and development using a genomics approach, we report technical advances aimed at reducing redundancy and increasing coverage for anonymous cDNA microarrays in this study. Cotton ovule cDNA libraries (both normalised and un-normalised) from around the time of fibre initial formation have been prepared and partially characterised by sequencing. Re-association-based normalisation partially reduced library redundancy and increased representation of novel sequences. However, another library generated from in vitro cultured cotton ovules treated with the protein synthesis inhibitor, cycloheximide, showed a significantly altered gene representation including a greater proportion of protein phosphorylation genes, transport genes and transcription factors and a much reduced proportion of protein synthesis genes than were identified in the conventional types of libraries. Over 10,000 expressed sequence tag (EST) clones randomly selected from the three libraries were printed on microarray slides and used to assess gene expression in tissue cultured ovules with and without cycloheximide treatment. The microarray results showed that cycloheximide had a dramatic effect in modifying the pattern of the gene expression in cultured ovules, affecting the same types of genes identified in the preliminary analysis on relative EST abundance in the different ovule cDNA libraries. Cycloheximide clearly provided a simple and useful method for enriching novel gene sequences for genomic studies.
Affiliation(s)
- Yingru Wu
- CSIRO Plant Industry, GPO Box 1600, Canberra, ACT, 2601, Australia
46
Firnhaber C, Pühler A, Küster H. EST sequencing and time course microarray hybridizations identify more than 700 Medicago truncatula genes with developmental expression regulation in flowers and pods. Planta 2005; 222:269-83. [PMID: 15968508 DOI: 10.1007/s00425-005-1543-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/11/2004] [Accepted: 02/25/2005] [Indexed: 05/03/2023]
Abstract
To evaluate the molecular mechanisms during pod and seed formation in legumes, starting with the development of reproductive organs, we constructed two cDNA libraries from developing flowers (MtFLOW) and pods including seeds (MtPOSE) of the model plant Medicago truncatula Gaertner. A total of 2,516 expressed sequence tags (ESTs) clustered into 1,776 nonredundant sequences (2k-set), which were annotated and assigned to functional classes. While about 30% of the ESTs encoded proteins of yet unknown function, typical annotations pointed to seed storage proteins, LTPs and lipoxygenases. The 2k-set was used to upgrade Mt6k-RIT microarrays (Küster et al. in J Biotechnol 108: 95, 2004) to Mt8k versions representing approximately 6,300 nonredundant M. truncatula genes. These were used to perform time course expression profiling studies based on hybridizations of samples that covered eight different developmental stages from flower buds to almost mature pods versus leaves as a common reference. About 180 up- and 70 downregulated genes were typically found for each stage and in total, 782 genes were either twofold up- or downregulated in at least one of the eight stages investigated. Based on this set, a combination of self-organizing map and hierarchical clustering revealed genes displaying expression regulation during characteristic stages of M. truncatula flower and pod development. Amongst those, several genes encoded proteins related to seed metabolism and development including novel regulators and proteins involved in signaling.
Affiliation(s)
- Christian Firnhaber
- Lehrstuhl für Genetik, Fakultät für Biologie, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany
47
da Silva FG, Iandolino A, Al-Kayal F, Bohlmann MC, Cushman MA, Lim H, Ergul A, Figueroa R, Kabuloglu EK, Osborne C, Rowe J, Tattersall E, Leslie A, Xu J, Baek J, Cramer GR, Cushman JC, Cook DR. Characterizing the grape transcriptome. Analysis of expressed sequence tags from multiple Vitis species and development of a compendium of gene expression during berry development. Plant Physiol 2005; 139:574-97. [PMID: 16219919 PMCID: PMC1255978 DOI: 10.1104/pp.105.065748] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Received: 05/13/2005] [Revised: 07/28/2005] [Accepted: 08/04/2005] [Indexed: 05/04/2023]
Abstract
We report the analysis and annotation of 146,075 expressed sequence tags from Vitis species. The majority of these sequences were derived from different cultivars of Vitis vinifera, comprising an estimated 25,746 unique contig and singleton sequences that survey transcription in various tissues and developmental stages and during biotic and abiotic stress. Putatively homologous proteins were identified for over 17,752 of the transcripts, with 1,962 transcripts further subdivided into one or more Gene Ontology categories. A simple structured vocabulary, with modules for plant genotype, plant development, and stress, was developed to describe the relationship between individual expressed sequence tags and cDNA libraries; the resulting vocabulary provides query terms to facilitate data mining within the context of a relational database. As a measure of the extent to which characterized metabolic pathways were encompassed by the data set, we searched for homologs of the enzymes leading from glycolysis, through the oxidative/nonoxidative pentose phosphate pathway, and into the general phenylpropanoid pathway. Homologs were identified for 65 of these 77 enzymes, with 86% of enzymatic steps represented by paralogous genes. Differentially expressed transcripts were identified by means of a stringent believability index cutoff of ≥98.4%. Correlation analysis and two-dimensional hierarchical clustering grouped these transcripts according to similarity of expression. In the broadest analysis, 665 differentially expressed transcripts were identified across 29 cDNA libraries, representing a range of developmental and stress conditions. The groupings revealed expected associations between plant developmental stages and tissue types, with the notable exception of abiotic stress treatments.
A more focused analysis of flower and berry development identified 87 differentially expressed transcripts and provides the basis for a compendium that relates gene expression and annotation to previously characterized aspects of berry development and physiology. Comparison with published results for select genes, as well as correlation analysis between independent data sets, suggests that the inferred in silico patterns of expression are likely to be an accurate representation of transcript abundance for the conditions surveyed. Thus, the combined data set reveals the in silico expression patterns for hundreds of genes in V. vinifera, the majority of which have not been previously studied within this species.
48
Sczyrba A, Beckstette M, Brivanlou AH, Giegerich R, Altmann CR. XenDB: full length cDNA prediction and cross species mapping in Xenopus laevis. BMC Genomics 2005; 6:123. [PMID: 16162280 PMCID: PMC1261260 DOI: 10.1186/1471-2164-6-123] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 05/05/2005] [Accepted: 09/14/2005] [Indexed: 11/23/2022] Open
Abstract
Background: Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To provide full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. Description: Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins and these have been matched to publicly available IMAGE clones when available. Each sequence has been compared to the KOG database and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. Conclusion: The results of the analysis have been stored in a publicly available database, XenDB. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at .
Affiliation(s)
- Alexander Sczyrba
- AG Praktische Informatik, Technische Fakultät, Universität Bielefeld, D-33594 Bielefeld, Germany
- Michael Beckstette
- AG Praktische Informatik, Technische Fakultät, Universität Bielefeld, D-33594 Bielefeld, Germany
- Ali H Brivanlou
- The Rockefeller University, Laboratory of Molecular Vertebrate Embryology, 1230 York Avenue, New York, NY 10021, USA
- Robert Giegerich
- AG Praktische Informatik, Technische Fakultät, Universität Bielefeld, D-33594 Bielefeld, Germany
- Curtis R Altmann
- FSU College of Medicine, Department of Biomedical Sciences, 1269 W. Call Street, Tallahassee, FL 32306, USA
49
Abstract
Recent studies showing that most "messenger" RNAs do not encode proteins finally explain the long-standing discrepancy between the small number of protein-coding genes found in vertebrate genomes and the much larger and ever-increasing number of polyadenylated transcripts identified by tag-sampling or microarray-based methods. Exploring the role and diversity of these numerous noncoding RNAs now constitutes a main challenge in transcription research.
Affiliation(s)
- Jean-Michel Claverie
- Structural and Genomics Information Laboratory, CNRS UPR 2589, Institut de Biologie Structurale et Microbiologie, 31 chemin Joseph Aiguier, Marseille 13402, France.
50
Min XJ, Butler G, Storms R, Tsang A. TargetIdentifier: a webserver for identifying full-length cDNAs from EST sequences. Nucleic Acids Res 2005; 33:W669-72. [PMID: 15980559 PMCID: PMC1160197 DOI: 10.1093/nar/gki436] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Indexed: 11/17/2022] Open
Abstract
TargetIdentifier is a webserver that identifies full-length cDNA sequences from expressed sequence tag (EST)-derived contig and singleton data. To accomplish this, TargetIdentifier uses BLASTX alignments as a guide to locate protein coding regions and potential start and stop codons. This information is then used to determine whether the EST-derived sequences include their translation start codons. The algorithm also uses the BLASTX output to assign putative functions to the query sequences. The server is available at .
Affiliation(s)
- Xiang Jia Min
- Centre for Structural and Functional Genomics, Concordia University, Montreal, Quebec H4B 1R6, Canada.