1
|
Kaabi B, Ahmed S, Soli R, Maktouf C. Analysis and Profiling of Leishmania major Expressed Sequence Tags. Ing Rech Biomed 2017. [DOI: 10.1016/j.irbm.2017.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
2
|
Wen JZ, Liao JY, Zheng LL, Xu H, Yang JH, Guan DG, Zhang SM, Zhou H, Qu LH. A contig-based strategy for the genome-wide discovery of microRNAs without complete genome resources. PLoS One 2014; 9:e88179. [PMID: 24516608 PMCID: PMC3917882 DOI: 10.1371/journal.pone.0088179] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/11/2013] [Accepted: 01/04/2014] [Indexed: 11/19/2022] Open
Abstract
MicroRNAs (miRNAs) are important regulators of many cellular processes and exist in a wide range of eukaryotes. High-throughput sequencing is a mainstream method of miRNA identification through which it is possible to obtain the complete small RNA profile of an organism. Currently, most approaches to miRNA identification rely on a reference genome for the prediction of hairpin structures. However, many species of economic and phylogenetic importance are non-model organisms without complete genome sequences, and this limits miRNA discovery. Here, to overcome this limitation, we have developed a contig-based miRNA identification strategy. We applied this method to a triploid species of edible banana (GCTCV-119, Musa spp. AAA group) and identified 180 pre-miRNAs and 314 mature miRNAs, three times more than were predicted by the available dataset-based methods (represented by EST+GSS). Based on the recently published miRNA data set of Musa acuminata, the recall rate and precision of our strategy are estimated to be 70.6% and 92.2%, respectively, significantly better than those of the EST+GSS-based strategy (10.2% and 50.0%, respectively). Our novel, efficient and cost-effective strategy facilitates the study of the functional and evolutionary role of miRNAs, as well as miRNA-based molecular breeding, in non-model species of economic or evolutionary interest.
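The recall and precision figures quoted in this abstract compare a predicted miRNA set against a reference set. The sketch below shows how such figures are conventionally computed; it is not the authors' evaluation code, and the function name and miRNA identifiers are hypothetical placeholders chosen for illustration.

```python
def precision_recall(predicted: set[str], reference: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a predicted set against a reference set."""
    # True positives: predictions that also appear in the reference set.
    true_positives = len(predicted & reference)
    # Precision: fraction of predictions that are confirmed by the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the reference that the predictions recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
predicted = {"miR156", "miR166", "miR396", "novel-1"}
reference = {"miR156", "miR166", "miR396", "miR172", "miR319"}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.1%} recall={recall:.1%}")
```

In practice, matching would likely be done on sequences rather than names, since naming differs between data sets; that choice is left open here.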
Affiliation(s)
- Jun-Zhi Wen
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, and School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China
- Jian-You Liao
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Research Center of Medicine, Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou, P. R. China
- Ling-Ling Zheng
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, and School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China
- Hui Xu
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, and School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China
- Jian-Hua Yang
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, and School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China
- Dao-Gang Guan
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, and School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China
- Si-Min Zhang
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Research Center of Medicine, Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou, P. R. China
- Hui Zhou
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, and School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China
- Liang-Hu Qu
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, and School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China
|
3
|
Shamloo-Dashtpagerdi R, Razi H, Lindlöf A, Niazi A, Dadkhodaie A, Ebrahimie E. Comparative analysis of expressed sequence tags (ESTs) from Triticum monococcum shoot apical meristem at vegetative and reproductive stages. Genes Genomics 2013. [DOI: 10.1007/s13258-013-0091-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
|
4
|
Hayashi Y, Shigenobu S, Watanabe D, Toga K, Saiki R, Shimada K, Bourguignon T, Lo N, Hojo M, Maekawa K, Miura T. Construction and characterization of normalized cDNA libraries by 454 pyrosequencing and estimation of DNA methylation levels in three distantly related termite species. PLoS One 2013; 8:e76678. [PMID: 24098800 PMCID: PMC3787108 DOI: 10.1371/journal.pone.0076678] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/10/2013] [Accepted: 09/01/2013] [Indexed: 11/18/2022] Open
Abstract
In termites, division of labor among castes, categories of individuals that perform specialized tasks, increases colony-level productivity and is the key to their ecological success. Although molecular studies on caste polymorphism have been performed in termites, we are far from a comprehensive understanding of the molecular basis of this phenomenon. To facilitate future molecular studies, we aimed to construct expressed sequence tag (EST) libraries covering wide ranges of gene repertoires in three representative termite species, Hodotermopsis sjostedti, Reticulitermes speratus and Nasutitermes takasagoensis. We generated normalized cDNA libraries from whole bodies, except for guts containing microbes, of almost all castes, sexes and developmental stages and sequenced them with the 454 GS FLX titanium system. We obtained >1.2 million quality-filtered reads yielding >400 million bases for each of the three species. Isotigs, which are analogous to individual transcripts, and singletons were produced by assembling the reads and annotated using public databases. Genes related to juvenile hormone, which plays crucial roles in caste differentiation of termites, were identified from the EST libraries by BLAST search. To explore the potential for DNA methylation, which plays an important role in caste differentiation of honeybees, tBLASTn searches for DNA methyltransferases (dnmt1, dnmt2 and dnmt3) and methyl-CpG binding domain (mbd) were performed against the EST libraries. All four of these genes were found in the H. sjostedti library, while all except dnmt3 were found in R. speratus and N. takasagoensis. The ratio of the observed to the expected CpG content (CpG O/E), which is a proxy for DNA methylation level, was calculated for the coding sequences predicted from the isotigs and singletons. 
In all of the three species, the majority of coding sequences showed depletion of CpG O/E (less than 1), and the distributions of CpG O/E were bimodal, suggesting the presence of DNA methylation.
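The CpG O/E ratio mentioned in this abstract can be sketched as follows. This is a minimal illustration using one common definition, observed CpG frequency divided by the product of the C and G frequencies; the abstract does not state which normalization the authors used, so the exact formula and the function name `cpg_oe` are assumptions.

```python
import math


def cpg_oe(seq: str) -> float:
    """Observed/expected CpG ratio: (n_CpG * L) / (n_C * n_G).

    Assumed definition; some studies apply a slightly different
    length normalization.
    """
    seq = seq.upper()
    length = len(seq)
    n_c = seq.count("C")
    n_g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    n_cpg = seq.count("CG")
    if n_c == 0 or n_g == 0:
        return math.nan  # ratio is undefined without both C and G
    return n_cpg * length / (n_c * n_g)


# Toy sequences, for illustration only.
for s in ("CCGG", "CGCG", "CATGCATG"):
    print(s, cpg_oe(s))
```

Values below 1 indicate fewer CpG dinucleotides than expected from base composition, which is the depletion pattern the abstract interprets as a signature of methylation.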
Affiliation(s)
- Yoshinobu Hayashi
  - Graduate School of Environmental Science, Hokkaido University, Sapporo, Japan
  - School of Biological Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Shuji Shigenobu
  - NIBB Core Research Facilities, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
  - Department of Basic Biology, School of Life Science, Graduate University for Advanced Studies, Okazaki, Japan
- Dai Watanabe
  - Graduate School of Environmental Science, Hokkaido University, Sapporo, Japan
  - Graduate School of Science and Engineering, University of Toyama, Toyama, Japan
- Kouhei Toga
  - Graduate School of Science and Engineering, University of Toyama, Toyama, Japan
  - Graduate School of Bioagricultural Sciences, Nagoya University, Nagoya, Japan
- Ryota Saiki
  - Graduate School of Science and Engineering, University of Toyama, Toyama, Japan
- Keisuke Shimada
  - Graduate School of Science and Engineering, University of Toyama, Toyama, Japan
  - Ishikawa Museum of Natural History, Kanazawa, Japan
- Thomas Bourguignon
  - Graduate School of Environmental Science, Hokkaido University, Sapporo, Japan
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Nathan Lo
  - School of Biological Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Masaru Hojo
  - Tropical Biosphere Research Center, University of the Ryukyus, Okinawa, Japan
- Kiyoto Maekawa
  - Graduate School of Science and Engineering, University of Toyama, Toyama, Japan
- Toru Miura
  - Graduate School of Environmental Science, Hokkaido University, Sapporo, Japan
|
5
|
Abstract
Background The transcriptome of an organism can be studied by analysing expressed sequence tag (EST) data sets, a rapid and cost-effective approach supported by several new and updated bioinformatics tools for assembly and annotation. Such analyses complement genome and proteome analysis in characterizing an organism. With the advent of large-scale sequencing projects and the generation of sequence data at the protein and cDNA levels, an automated analysis pipeline is necessary to store, organize and annotate ESTs. Results TranSeqAnnotator is a workflow for large-scale analysis of transcriptomic data that uses appropriate bioinformatics tools for data management and analysis. The pipeline automatically cleans, clusters and assembles the sequences, generates consensus sequences, conceptually translates these into possible protein products and assigns putative function based on various DNA and protein similarity searches. Excretory/secretory (ES) proteins inferred from ESTs/short reads are also identified. TranSeqAnnotator accepts raw and quality ESTs in FASTA format, along with protein and short-read sequences, and analyses them with user-selected programs. After pre-processing and assembly, the dataset is annotated at the nucleotide, protein and ES protein levels. Conclusion TranSeqAnnotator was developed on a Linux cluster to perform an exhaustive and reliable analysis and provide detailed annotation. It outputs gene ontologies and protein functional identifications in terms of mapping to protein domains and metabolic pathways. The pipeline has been applied to annotate large EST datasets to identify several novel and known genes, with therapeutic experimental validations, that could serve as potential targets for parasite intervention. TranSeqAnnotator is freely available for the scientific community at http://estexplorer.biolinfo.org/TranSeqAnnotator/.
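The "conceptual translation" step described in this abstract can be illustrated with a six-frame translation. The sketch below is not TranSeqAnnotator's code; the abstract does not say which program performs this step. It assumes the standard genetic code, and the function names `translate`, `reverse_complement` and `six_frame` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the organism uses the standard nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate a DNA string from its first base; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frame(seq: str) -> dict[str, str]:
    """Translate all three forward and three reverse reading frames."""
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc[k:]) for k in range(3)})
    return frames


# Toy sequence, for illustration only.
for frame, protein in six_frame("ATGGCCTAA").items():
    print(frame, protein)
```

A real pipeline would typically go on to select the longest open reading frame or the frame supported by similarity searches; that selection is outside this sketch.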
|
6
|
Leu JH, Chen SH, Wang YB, Chen YC, Su SY, Lin CY, Ho JM, Lo CF. A review of the major penaeid shrimp EST studies and the construction of a shrimp transcriptome database based on the ESTs from four penaeid shrimp. Marine Biotechnology (New York, N.Y.) 2011; 13:608-621. [PMID: 20401624 DOI: 10.1007/s10126-010-9286-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 02/24/2010] [Accepted: 03/10/2010] [Indexed: 05/29/2023]
Abstract
By economic value, shrimp is currently the most important seafood commodity worldwide, and these animals are often the subject of scientific research in shrimp farming countries. High throughput methods, such as expressed sequence tags (ESTs), were originally developed to study human genomics, but they are now available for studying other important organisms, including shrimp. ESTs are short sequences generated by sequencing randomly selected cDNA clones from a cDNA library. This is currently the most efficient and powerful method for providing transcriptomic data for organisms with an uncharacterized genome. This review will summarize the sixteen major shrimp EST studies that have been conducted to date. In addition, we analyzed the EST data downloaded from NCBI dbEST for the four major penaeid shrimp species and constructed a database to host all of these EST data as well as our own analysis results. This database provides the shrimp aquaculture research community with an outline of the shrimp transcriptome as well as a tool for shrimp gene identification.
Affiliation(s)
- Jiann-Horng Leu
- Center for Marine Bioenviroment and Biotechnology, National Taiwan Ocean University, No. 2, Pei-Ning Road, Keelung, 20224 Taiwan, Republic of China
|
7
|
Kozlov S, Grishin E. The mining of toxin-like polypeptides from EST database by single residue distribution analysis. BMC Genomics 2011; 12:88. [PMID: 21281459 PMCID: PMC3040730 DOI: 10.1186/1471-2164-12-88] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/16/2010] [Accepted: 01/31/2011] [Indexed: 11/20/2022] Open
Abstract
Background Novel high-throughput sequencing technologies require continual development of bioinformatics data processing methods. Among them, rapid and reliable identification of encoded proteins plays a pivotal role. To search for particular protein families, amino acid sequence motifs suitable for selective screening of nucleotide sequence databases may be used. In this work, we suggest a novel method for simplified representation of protein amino acid sequences named Single Residue Distribution Analysis, which is applicable both for homology search and database screening. Results Using the procedure developed, a search for amino acid sequence motifs in sea anemone polypeptides was performed, and 14 different motifs with broad and low specificity were discriminated. The adequacy of motifs for mining toxin-like sequences was confirmed by their ability to identify 100% of toxin-like anemone polypeptides in the reference polypeptide database. The employment of novel motifs for the search of polypeptide toxins in the Anemonia viridis EST dataset allowed us to identify 89 putative toxin precursors. The translated and modified ESTs were scanned using a special algorithm. In addition to direct comparison with the motifs developed, the putative signal peptides were predicted and homology with known structures was examined. Conclusions The suggested method may be used to retrieve structures of interest from the EST databases using simple amino acid sequence motifs as templates. The efficiency of the procedure for directed search of polypeptides is higher than that of most currently used methods. Analysis of 39,939 ESTs of the sea anemone Anemonia viridis resulted in identification of five protein precursors of earlier described toxins, discovery of 43 novel polypeptide toxins, and prediction of 39 putative polypeptide toxin sequences. In addition, two precursors of novel peptides presumably displaying neuronal function were disclosed.
Affiliation(s)
- Sergey Kozlov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, ul. Miklukho-Maklaya 16/10, 117997 Moscow, Russia
|
8
|
Mochida K, Yoshida T, Sakurai T, Ogihara Y, Shinozaki K. TriFLDB: a database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics. Plant Physiology 2009; 150:1135-46. [PMID: 19448038 PMCID: PMC2705016 DOI: 10.1104/pp.109.138214] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/07/2009] [Accepted: 05/08/2009] [Indexed: 05/19/2023]
Abstract
The Triticeae Full-Length CDS Database (TriFLDB) contains available information regarding full-length coding sequences (CDSs) of the Triticeae crops wheat (Triticum aestivum) and barley (Hordeum vulgare) and includes functional annotations and comparative genomics features. TriFLDB provides a search interface using keywords for gene function and related Gene Ontology terms and a similarity search for DNA and deduced translated amino acid sequences to access annotations of Triticeae full-length CDS (TriFLCDS) entries. Annotations consist of similarity search results against several sequence databases and domain structure predictions by InterProScan. The deduced amino acid sequences in TriFLDB are grouped with the proteome datasets for Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sorghum (Sorghum bicolor) by hierarchical clustering in stepwise thresholds of sequence identity, providing hierarchical clustering results based on full-length protein sequences. The database also provides sequence similarity results based on comparative mapping of TriFLCDSs onto the rice and sorghum genome sequences, which together with current annotations can be used to predict gene structures for TriFLCDS entries. To provide the possible genetic locations of full-length CDSs, TriFLCDS entries are also assigned to the genetically mapped cDNA sequences of barley and diploid wheat, which are currently accommodated in the Triticeae Mapped EST Database. These relational data are searchable from the search interfaces of both databases. The current TriFLDB contains 15,871 full-length CDSs from barley and wheat and includes putative full-length cDNAs for barley and wheat, which are publicly accessible. This informative content provides an informatics gateway for Triticeae genomics and grass comparative genomics. TriFLDB is publicly available at http://TriFLDB.psc.riken.jp/.
|
9
|
Tang Z, Choi JH, Hemmerich C, Sarangi A, Colbourne JK, Dong Q. ESTPiper--a web-based analysis pipeline for expressed sequence tags. BMC Genomics 2009; 10:174. [PMID: 19383159 PMCID: PMC2676306 DOI: 10.1186/1471-2164-10-174] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 09/26/2008] [Accepted: 04/21/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND EST sequencing projects are increasing in scale and scope as genome sequencing technologies migrate from core sequencing centers to individual research laboratories. Effectively, generating EST data is no longer a bottleneck for investigators. However, processing large amounts of EST data remains a non-trivial challenge for many. Web-based EST analysis tools are proving to be the most convenient option for biologists when performing their analysis, so these tools must continuously improve their utility to keep in step with the growing needs of research communities. We have developed a web-based EST analysis pipeline called ESTPiper, which streamlines typical large-scale EST analysis components. RESULTS The intuitive web interface guides users through each step of base calling, data cleaning, assembly, genome alignment, annotation, analysis of gene ontology (GO), and microarray oligonucleotide probe design. Each step is modularized, so a user can execute the steps separately or together in batch mode. In addition, the user has control over the parameters used by the underlying programs. Extensive documentation of ESTPiper's functionality is embedded throughout the web site to facilitate understanding of the required input and interpretation of the computational results. The user can also download intermediate results and port files to separate programs for further analysis. In addition, our server provides a time-stamped description of the run history for reproducibility. The pipeline can also be installed locally, allowing researchers to modify ESTPiper to suit their own needs. CONCLUSION ESTPiper streamlines the typical process of EST analysis. The pipeline was initially designed in part to support the Daphnia pulex cDNA sequencing project. A web server hosting ESTPiper is now provided at http://estpiper.cgb.indiana.edu/ to support projects of all sizes. The software is also freely available from the authors for local installations.
Affiliation(s)
- Zuojian Tang
  - The Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
- Jeong-Hyeon Choi
  - The Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
- Chris Hemmerich
  - The Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
- Ankita Sarangi
  - The Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
- John K Colbourne
  - The Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
- Qunfeng Dong
  - The Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
|
10
|
de Crécy-Lagard V, Hanson AD. Finding novel metabolic genes through plant-prokaryote phylogenomics. Trends Microbiol 2007; 15:563-70. [PMID: 17997099 DOI: 10.1016/j.tim.2007.10.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 10/12/2007] [Revised: 10/12/2007] [Accepted: 10/12/2007] [Indexed: 12/26/2022]
Abstract
Plants and prokaryotes share thousands of genes. Those with known functions mostly encode enzymes of primary metabolism or other key biochemical components, and the same is almost surely true of those whose function is still obscure. The availability of hundreds of sequenced genomes and of rich postgenomic resources now makes possible the use of comparative genomics ('phylogenomics') of plants and prokaryotes to infer, and then verify, functions for such unknown genes. In this type of analysis, plant and prokaryote data each inform the search for function, and do so synergistically. This breaks with the past pattern of gene discovery, in which the information flow was most often unidirectional from prokaryotes to plants.
Affiliation(s)
- Valérie de Crécy-Lagard
- Microbiology and Cell Science Department, University of Florida, Gainesville, FL 32611, USA.
|
11
|
Sanderson MJ, McMahon MM. Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evol Biol 2007; 7 Suppl 1:S3. [PMID: 17288576 PMCID: PMC1796612 DOI: 10.1186/1471-2148-7-s1-s3] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants. RESULTS A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead. CONCLUSION Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.
Affiliation(s)
- Michael J Sanderson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Michelle M McMahon
  - Department of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
|
12
|
Abstract
Genomics and bioinformatics have great potential to help address numerous topics in ecology and evolution. Expressed sequence tags (ESTs) can bridge genomics and molecular ecology because they can provide a means of accessing the gene space of almost any organism. We review how ESTs have been used in molecular ecology research in the last several years by providing sequence data for the design of molecular markers, genome-wide studies of gene expression and selection, the identification of candidate genes underlying adaptation, and the basis for studies of gene family and genome evolution. Given the tremendous recent advances in inexpensive sequencing technologies, we predict that molecular ecologists will increasingly be developing and using EST collections in the years to come. With this in mind, we close our review by discussing aspects of EST resource development of particular relevance for molecular ecologists.
Affiliation(s)
- Amy Bouck
- Department of Biology, Box 90338, Duke University, Durham, NC 27708, USA.
|