1
Laydon DJ, Bangham CRM, Asquith B. Estimating T-cell repertoire diversity: limitations of classical estimators and a new approach. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140291. [PMID: 26150657 PMCID: PMC4528489 DOI: 10.1098/rstb.2014.0291] [Citation(s) in RCA: 118] [Impact Index Per Article: 13.1] [Accepted: 05/03/2015] [Indexed: 12/26/2022]
Abstract
A highly diverse T-cell receptor (TCR) repertoire is a fundamental property of an effective immune system, and is associated with efficient control of viral infections and other pathogens. However, direct measurement of total TCR diversity is impossible. The diversity is high and the frequency distribution of individual TCRs is heavily skewed; the diversity therefore cannot be captured in a blood sample. Consequently, estimators of the total number of TCR clonotypes that are present in the individual, in addition to those observed, are essential. This is analogous to the 'unseen species problem' in ecology. We review the diversity (species richness) estimators that have been applied to T-cell repertoires and the methods used to validate these estimators. We show that existing approaches have significant shortcomings, and frequently underestimate true TCR diversity. We highlight our recently developed estimator, DivE, which can accurately estimate diversity across a range of immunological and biological systems.
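To make the "unseen species" idea concrete, here is a minimal sketch of one widely used classical richness estimator, the bias-corrected Chao1 lower bound. It is not DivE and is not taken from the paper; the function name `chao1` and the example clone counts are hypothetical and serve only as an illustration.

```python
def chao1(counts):
    """Bias-corrected Chao1 lower bound on total richness.

    counts: observed abundance of each clonotype (or species) in the sample.
    Illustrative helper only; not the DivE estimator.
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # clonotypes actually observed
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    # Many singletons relative to doubletons imply many unseen classes.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical, heavily skewed sample: two large clones and a tail of rare ones.
clone_counts = [50, 20, 5, 2, 2, 1, 1, 1, 1]
print(chao1(clone_counts))
```

The estimate can never fall below the observed richness, and it uses only the singleton and doubleton counts. This is one reason such estimators may underestimate diversity when the frequency distribution is heavily skewed.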
MESH Headings
- Animals
- Gene Rearrangement, T-Lymphocyte
- Genetic Variation
- Host-Pathogen Interactions/genetics
- Host-Pathogen Interactions/immunology
- Humans
- Lymphocyte Count
- Models, Genetic
- Models, Immunological
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Statistics, Nonparametric
- T-Lymphocytes/immunology
Affiliation(s)
- Daniel J Laydon
- Section of Immunology, Wright-Fleming Institute, Imperial College School of Medicine, London W2 1PG, UK
- Charles R M Bangham
- Section of Immunology, Wright-Fleming Institute, Imperial College School of Medicine, London W2 1PG, UK
- Becca Asquith
- Section of Immunology, Wright-Fleming Institute, Imperial College School of Medicine, London W2 1PG, UK
2
García-Ortega LF, Martínez O. How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq. PLoS One 2015; 10:e0130262. [PMID: 26107654 PMCID: PMC4479379 DOI: 10.1371/journal.pone.0130262] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/30/2015] [Accepted: 05/19/2015] [Indexed: 01/02/2023]
Abstract
RNA-seq experiments estimate the number of genes expressed in a transcriptome as well as their relative frequencies. However, an undetermined number of genes can remain undetected due to their low expression relative to the sample size (sequence depth). Estimation of the true number of genes expressed in a transcriptome is essential in order to determine which genes are exclusively expressed in specific tissues or under particular conditions. A reliable estimate of the true number of expressed genes is also required to accurately measure transcriptome changes and to predict the sequencing depth needed to increase the proportion of detected genes. This problem is analogous to ecological sampling problems such as estimating the number of species at a given site. Here we present a non-parametric estimator for the number of undetected genes as well as for the extra sample size needed to detect a given proportion of the undetected genes. Our estimators are superior to ones already published by having smaller standard errors and biases. We applied our method to a set of 32 publicly available RNA-seq experiments, including the evaluation of 311 individually sequenced libraries. We found that in the majority of the cases more than one thousand genes are undetected, and that on average approximately 6% of the expressed genes per accession remain undetected. This figure increases to approximately 10% if individual sequencing libraries are analyzed. Our method is also applicable to metagenomic experiments. Using our method, the number of undetected genes as well as the sample size needed to detect them can be calculated, leading to more accurate and complete gene expression studies.
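The link between sequencing depth and undetected genes can be illustrated with the classical Good-Turing sample coverage, sketched below. This is a standard textbook quantity and not the estimator proposed by the authors; the function name `good_turing_coverage` and the read counts are hypothetical.

```python
def good_turing_coverage(counts):
    """Good-Turing estimate of sample coverage, C = 1 - f1 / n.

    counts: number of reads assigned to each detected gene.
    Illustrative only; this is not the estimator from the paper.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)                          # total reads (sample size)
    if n == 0:
        raise ValueError("at least one read is required")
    f1 = sum(1 for c in counts if c == 1)    # genes seen exactly once
    return 1.0 - f1 / n


# Hypothetical library: 20 reads spread over five detected genes.
reads_per_gene = [10, 5, 3, 1, 1]
coverage = good_turing_coverage(reads_per_gene)
print(coverage)        # estimated share of expression captured by detected genes
print(1.0 - coverage)  # estimated chance that the next read hits an undetected gene
```

When singletons are rare relative to the total read count, coverage approaches 1 and additional sequencing is expected to reveal few new genes.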
Affiliation(s)
- Luis Fernando García-Ortega
- Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav-IPN), Irapuato, Guanajuato, México
- Octavio Martínez
- Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav-IPN), Irapuato, Guanajuato, México
3
Rösner S, Brandl R, Segelbacher G, Lorenc T, Müller J. Noninvasive genetic sampling allows estimation of capercaillie numbers and population structure in the Bohemian Forest. Eur J Wildl Res 2014. [DOI: 10.1007/s10344-014-0848-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
4
Altman N, Leebens-Mack J, Zahn L, Chanderbali A, Tian D, Werner L, Ma H, dePamphilis C. Behind the Scenes: Planning a Multispecies Microarray Experiment. ACTA ACUST UNITED AC 2013. [DOI: 10.1080/09332480.2006.10722799] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
5
Nesvizhskii AI. Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments. Proteomics 2012; 12:1639-55. [PMID: 22611043 DOI: 10.1002/pmic.201100537] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
Abstract
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.
6
Li YZ, Pan YH, Sun CB, Dong HT, Luo XL, Wang ZQ, Tang JL, Chen B. An ordered EST catalogue and gene expression profiles of cassava (Manihot esculenta) at key growth stages. Plant Mol Biol 2010; 74:573-90. [PMID: 20957510 DOI: 10.1007/s11103-010-9698-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 04/01/2010] [Accepted: 09/26/2010] [Indexed: 05/04/2023]
Abstract
A cDNA library was constructed from the root tissues of cassava variety Huanan 124 at the root bulking stage. A total of 9,600 cDNA clones from the library were single-pass sequenced from the 5'-terminus to establish a catalogue of expressed sequence tags (ESTs). Assembly of the resulting EST sequences yielded 2,878 putative unigenes. Blastn analysis showed that 62.6% of the unigenes matched known cassava ESTs and the rest had no 'hits' against the cassava database in the integrative PlantGDB database. Blastx analysis showed that 1,715 (59.59%) of the unigenes matched one or more GenBank protein entries and 1,163 (40.41%) had no 'hits'. A cDNA microarray with 2,878 unigenes was developed and used to profile gene expression in Huanan 124 at key growth stages including seedling, formation of root system, root bulking, and starch maturity. Array data analysis revealed that (1) the higher ratio of up-regulated ribosome-related genes was accompanied by a high ratio of up-regulated ubiquitin, proteasome-related and protease genes in cassava roots; (2) starch formation and degradation occur simultaneously at the early stages of root development, but starch degradation declines as roots mature, partly because of a decrease in UDP-glucose dehydrogenase activity; (3) starch may also be synthesized in situ in roots; (4) starch synthesis, translocation, and accumulation are probably also associated with signaling pathways that parallel the Wnt, LAM, TCS and ErbB signaling pathways in animals; (5) constitutive expression of stress-responsive genes may be due to the adaptation of cassava to harsh environments during long-term evolution.
Affiliation(s)
- You-Zhi Li
- Guangxi Key Laboratory of Subtropical Bioresource Conservation and Utilization, College of Life Science and Technology, Guangxi University, 530004, Nanning, Guangxi, People's Republic of China
7
Abstract
We propose a Poisson-compound gamma approach for species richness estimation. Based on the denseness and nesting properties of the gamma mixture, we fix the shape parameter of each gamma component at a unified value, and estimate the mixture using nonparametric maximum likelihood. A least-squares crossvalidation procedure is proposed for the choice of the common shape parameter. The performance of the resulting estimator of N is assessed using numerical studies and genomic data.
Affiliation(s)
- Ji-Ping Wang
- Department of Statistics , Northwestern University , 2006 Sheridan Road, Evanston, Illinois 60208 , U.S.A.
8
High throughput sequencing reveals a complex pattern of dynamic interrelationships among human T cell subsets. Proc Natl Acad Sci U S A 2010; 107:1518-23. [PMID: 20080641 DOI: 10.1073/pnas.0913939107] [Citation(s) in RCA: 213] [Impact Index Per Article: 15.2] [Indexed: 02/07/2023]
Abstract
Developing T cells face a series of cell fate choices in the thymus and in the periphery. The role of the individual T cell receptor (TCR) in determining decisions of cell fate remains unresolved. The stochastic/selection model postulates that the initial fate of the cell is independent of TCR specificity, with survival dependent on additional TCR/coreceptor "rescue" signals. The "instructive" model holds that cell fate is initiated by the interaction of the TCR with a cognate peptide-MHC complex. T cells are then segregated on the basis of TCR specificity with the aid of critical coreceptors and signal modulators [Chan S, Correia-Neves M, Benoist C, Mathis (1998) Immunol Rev 165: 195-207]. The former would predict a random representation of individual TCR across divergent T cell lineages whereas the latter would predict minimal overlap between divergent T cell subsets. To address this issue, we have used high-throughput sequencing to evaluate the TCR distribution among key T cell developmental and effector subsets from a single donor. We found numerous examples of individual subsets sharing identical TCR sequence, supporting a model of a stochastic process of cell fate determination coupled with dynamic patterns of clonal expansion of T cells bearing the same TCR sequence among both CD4+ and CD8+ populations.
9
Gou X, Yuan T, Wei X, Russell SD. Gene expression in the dimorphic sperm cells of Plumbago zeylanica: transcript profiling, diversity, and relationship to cell type. Plant J 2009; 60:33-47. [PMID: 19500307 DOI: 10.1111/j.1365-313x.2009.03934.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 05/19/2023]
Abstract
Plumbago zeylanica produces cytoplasmically dimorphic sperm cells that target the egg and central cell during fertilization. In mature pollen, the larger sperm cell contains numerous mitochondria, is associated with the vegetative nucleus (S(vn)), and fuses preferentially with the central cell, forming endosperm. The other, plastid-enriched sperm cell (S(ua)) fuses with the egg cell, forming the zygote and embryo. Sperm expressed genes were investigated using ESTs produced from each sperm type; differential expression was validated through suppression subtractive hybridization, custom microarrays, real-time RT-PCR and in situ hybridization. The expression profiles of dimorphic sperm cells reflect a diverse and broad complement of genes, including high proportions of conserved and unknown genes, as well as distinct patterns of expression. A number of genes were highly up-regulated in the male germ line, including some genes that were differentially expressed in either the S(ua) or the S(vn). Differentially up-regulated genes in the egg-targeted S(ua) showed increased expression in transcription and translation categories, whereas the central cell-targeted S(vn) displayed expanded expression in the hormone biosynthesis category. Interestingly, the up-regulated genes expressed in the sperm cells appeared to reflect the expected post-fusion profiles of the future embryo and endosperm. As sperm cytoplasm is known to be transmitted during fertilization in this plant, sperm-contributed mRNAs are probably transported during fertilization, which could influence early embryo and endosperm development.
Affiliation(s)
- Xiaoping Gou
- Department of Botany, University of Oklahoma, Norman, OK 73019, USA
10
Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang H, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics 2009; 10:347. [PMID: 19646272 PMCID: PMC2907694 DOI: 10.1186/1471-2164-10-347] [Citation(s) in RCA: 157] [Impact Index Per Article: 10.5] [Received: 08/01/2008] [Accepted: 08/01/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. RESULTS The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online web tool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. CONCLUSION NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.
Affiliation(s)
- P Kerr Wall
- Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA.
11
Expressed sequence tags of the peanut pod nematode Ditylenchus africanus: the first transcriptome analysis of an Anguinid nematode. Mol Biochem Parasitol 2009; 167:32-40. [PMID: 19383517 DOI: 10.1016/j.molbiopara.2009.04.004] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 10/28/2008] [Revised: 04/07/2009] [Accepted: 04/12/2009] [Indexed: 11/20/2022]
Abstract
In this study, 4847 expressed sequence tags (ESTs) from mixed stages of the migratory plant-parasitic nematode Ditylenchus africanus (peanut pod nematode) were investigated. It is the first molecular survey of a nematode which belongs to the family of the Anguinidae (order Rhabditida, superfamily Sphaerularioidea). The sequences were clustered into 2596 unigenes, of which 43% did not show any homology to known protein, nucleotide, nematode EST or plant-parasitic nematode genome sequences. Gene ontology mapping revealed that most putative proteins are involved in developmental and reproductive processes. In addition, unigenes involved in oxidative stress as well as in anhydrobiosis, such as LEA (late embryogenesis abundant protein) and trehalose-6-phosphate synthase, were identified. Other tags showed homology to genes previously described as being involved in parasitism (expansin, SEC-2, calreticulin, 14-3-3b and various allergen proteins). In situ hybridization revealed that the expression of a putative expansin and a venom allergen protein was restricted to the gland cell area of the nematode, in agreement with their presumed role in parasitism. Furthermore, seven putative novel candidate parasitism genes were identified based on the prediction of a signal peptide in the corresponding protein sequence and homologous ESTs exclusively in parasitic nematodes. These genes are interesting for further research and functional characterization. Finally, 34 unigenes were retained as good target candidates for future RNAi experiments, because of their nematode-specific nature and the observed lethal phenotypes of Caenorhabditis elegans homologs.
12
Durden C, Dong Q. RICHEST--a web server for richness estimation in biological data. Bioinformation 2009; 3:296-8. [PMID: 19293995 PMCID: PMC2655047 DOI: 10.6026/97320630003296] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/22/2009] [Accepted: 02/05/2009] [Indexed: 12/02/2022]
Abstract
Richness is defined as the number of distinct species or classes in a sample or population.
Although richness estimation is an important practice, it requires mathematical and computational methods
that are challenging to understand and implement. We have developed a web server, RICHness ESTimator (RICHEST),
which implements three non-parametric statistical methods for richness estimation. Its user-friendly web interface
allows users to analyze and compare their data conveniently over the web.
Affiliation(s)
- Chris Durden
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
13
Susko E, Roger AJ. Statistical analysis of expressed sequence tags. Methods Mol Biol 2009; 533:277-287. [PMID: 19277567 DOI: 10.1007/978-1-60327-136-3_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
Expressed sequence tag (EST) surveys are an efficient way to characterize large numbers of genes from an organism. The rate of gene discovery in an EST survey depends on the degree of redundancy of the cDNA libraries from which sequences are obtained. We consider statistics for the comparison of EST libraries based upon the frequencies with which genes occur in subsamples of reads. These measures are useful in determining which of the libraries, having a large proportion of genes in common, is more likely to yield new genes in future reads. We also present tests, with multiple corrections adjustments, for whether genes are equally represented or expressed in a pair of libraries.
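The dependence of gene discovery on library redundancy can be sketched with classical rarefaction: the expected number of distinct genes in a random subsample of `m` reads drawn without replacement. This is a generic ecological formula offered as an illustration, not the specific statistics developed in the chapter; `rarefied_richness` and both example libraries are hypothetical.

```python
from math import comb


def rarefied_richness(counts, m):
    """Expected number of distinct genes in a random subsample of m reads.

    counts: reads per gene in the full library; m: subsample size.
    Uses the hypergeometric (sampling without replacement) rarefaction formula.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and the total number of reads")
    total = comb(n, m)
    # A gene with c reads is missed only if all m reads come from the other n - c.
    return sum(1 - comb(n - c, m) / total for c in counts)


# Two hypothetical libraries with 12 reads and 4 genes each.
redundant = [9, 1, 1, 1]   # dominated by one highly expressed gene
even = [3, 3, 3, 3]        # reads spread evenly across genes
for m in (2, 6, 12):
    print(m, rarefied_richness(redundant, m), rarefied_richness(even, m))
```

At the same subsample size the even library is expected to reveal more genes than the redundant one, which matches the abstract's point that discovery rate depends on redundancy.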
Affiliation(s)
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
14
Lewers KS, Saski CA, Cuthbertson BJ, Henry DC, Staton ME, Main DS, Dhanaraj AL, Rowland LJ, Tomkins JP. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers. BMC Plant Biol 2008; 8:69. [PMID: 18570660 PMCID: PMC2474608 DOI: 10.1186/1471-2229-8-69] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/25/2008] [Accepted: 06/20/2008] [Indexed: 05/03/2023]
Abstract
BACKGROUND The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. RESULTS A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. CONCLUSION This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to complement existing morphological marker-assisted breeding in blackberry.
Affiliation(s)
- Kim S Lewers
- USDA-ARS, Beltsville Agricultural Research Center, Genetic Improvement of Fruits and Vegetables Lab, Bldg. 010A, BARC-West, 10300 Baltimore Ave., Beltsville, MD 20705-2350, USA
- Chris A Saski
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Brandon J Cuthbertson
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- National Institutes of Health/National Institute of Environmental Health Sciences, Laboratory of Signal Transduction, Peptide Hormone Action Group, 111 TW Alexander Drive, PO Box 12233, MD F3-04 Research Triangle Park, NC 27709-2233, USA
- David C Henry
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Meg E Staton
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Dorrie S Main
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
- Center for Integrated Biotechnology, Dept of Horticulture and Landscape Architecture, Washington State University, 45 Johnson Hall, Pullman, WA 99164-6414, USA
- Anik L Dhanaraj
- USDA-ARS, Beltsville Agricultural Research Center, Genetic Improvement of Fruits and Vegetables Lab, Bldg. 010A, BARC-West, 10300 Baltimore Ave., Beltsville, MD 20705-2350, USA
- Monsanto Research Centre, Biotech Product Support, 44/2A Bellary Road, NH-7, Hebbal, Bangalore 560 092, India
- Lisa J Rowland
- USDA-ARS, Beltsville Agricultural Research Center, Genetic Improvement of Fruits and Vegetables Lab, Bldg. 010A, BARC-West, 10300 Baltimore Ave., Beltsville, MD 20705-2350, USA
- Jeff P Tomkins
- Clemson University Genomics Institute, 51 New Cherry St., 304 Biosystems Research Complex, Clemson University, Clemson, SC 29634, USA
15
Cervigni GDL, Paniego N, Díaz M, Selva JP, Zappacosta D, Zanazzi D, Landerreche I, Martelotto L, Felitti S, Pessino S, Spangenberg G, Echenique V. Expressed sequence tag analysis and development of gene associated markers in a near-isogenic plant system of Eragrostis curvula. Plant Mol Biol 2008; 67:1-10. [PMID: 18196464 DOI: 10.1007/s11103-007-9282-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 02/15/2007] [Accepted: 12/22/2007] [Indexed: 05/05/2023]
Abstract
Eragrostis curvula (Schrad.) Nees is a forage grass native to the semiarid regions of Southern Africa, which reproduces mainly by pseudogamous diplosporous apomixis. A collection of ESTs was generated from four cDNA libraries, three of them obtained from panicles of near-isogenic lines with different ploidy levels and reproductive modes, and one obtained from leaves of 12-day-old plants. A total of 12,295 high-quality ESTs were clustered and assembled, rendering 8,864 unigenes, including 1,490 contigs and 7,394 singletons, with a genome coverage of 22%. A total of 7,029 (79.11%) unigenes were functionally categorized by BLASTX analysis against sequences deposited in public databases, but only 37.80% could be classified according to Gene Ontology. Sequence comparison against the cereal gene indices (GI) revealed 50% significant hits. A total of 254 EST-SSRs were detected, 219 from singletons and 35 from contigs. Di- and tri- motifs were similarly represented with percentages of 38.95 and 40.16%, respectively. In addition, 190 SNPs and Indels were detected in 18 contigs generated from 3 to 4 libraries. The ESTs and the molecular markers obtained in this study will provide valuable resources for a wide range of applications including gene identification, genetic mapping, cultivar identification, analysis of genetic diversity, phenotype mapping and marker assisted selection.
Affiliation(s)
- Gerardo D L Cervigni
- Centro de Recursos Naturales Renovables de la Zona Semiárida-CONICET, Camino de La Carrindanga Km 7.0, Bahia Blanca, Argentina
16
Jacob J, Mitreva M, Vanholme B, Gheysen G. Exploring the transcriptome of the burrowing nematode Radopholus similis. Mol Genet Genomics 2008; 280:1-17. [PMID: 18386064 DOI: 10.1007/s00438-008-0340-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 12/10/2007] [Accepted: 03/19/2008] [Indexed: 01/03/2023]
Abstract
Radopholus similis is an important nematode pest on fruit crops in the tropics. Unraveling the transcriptome of this migratory plant-parasitic nematode can provide insight into the parasitism process and lead to more efficient control measures. For the first high throughput molecular characterization of this devastating nematode, 5,853 expressed sequence tags from a mixed stage population were generated. Adding 1,154 tags from the EST division of GenBank for subsequent analysis resulted in a total of 7,007 ESTs, which represent approximately 3,200 genes. The mean G + C content of the nucleotides at the third codon position (GC3%) was calculated to be as high as 64.8%, the highest for nematodes reported to date. BLAST searches resulted in about 70% of the clustered ESTs having homology to (DNA and protein) sequences from the GenBank database, whereas one-third of them did not match any known sequence. Roughly 40% of these latter sequences are predicted to be coding, representing putative novel protein coding genes. Functional annotation of the sequences by GO annotation revealed the abundance of genes involved in reproduction and development, which reflects the nematode population biology. Genes with a role in the parasitism process are identified, as well as genes essential for nematode survival, providing information useful for parasite control. No evidence was found for the presence of trans-spliced leader sequences commonly occurring in nematodes, despite the use of various approaches. In conclusion, we found three different sources for the EST sequences: the majority has a nuclear origin, approximately 1% of the EST sequences are derived from the mitochondrial transcriptome, and interestingly, 1% of the tags are with high probability derived from Wolbachia, providing the first molecular indication for the presence of this endosymbiont in a plant-parasitic nematode.
Affiliation(s)
- Joachim Jacob
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, 9000 Ghent, Belgium.
17
Sakurai T, Plata G, Rodríguez-Zapata F, Seki M, Salcedo A, Toyoda A, Ishiwata A, Tohme J, Sakaki Y, Shinozaki K, Ishitani M. Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response. BMC Plant Biol 2007; 7:66. [PMID: 18096061 PMCID: PMC2245942 DOI: 10.1186/1471-2229-7-66] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 06/12/2007] [Accepted: 12/20/2007] [Indexed: 05/18/2023]
Abstract
BACKGROUND: Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses, is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and postharvest physiological deterioration conditions was built; 19,968 clones were sequence-characterized using expressed sequence tags (ESTs). RESULTS: The ESTs were assembled into 6,355 contigs and 9,026 singletons that were further grouped into 10,577 scaffolds; we found 4,621 new cassava sequences and 1,521 sequences with no significant similarity to plant protein databases. Transcripts of 7,796 distinct genes were captured, and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories, showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. CONCLUSION: The cassava full-length cDNA library presented here contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome.
Affiliation(s)
- Tetsuya Sakurai
- Metabolomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Germán Plata
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Fausto Rodríguez-Zapata
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Motoaki Seki
- Plant Functional Genomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Andrés Salcedo
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Atsushi Toyoda
- Genome Core Technology Facilities, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Atsushi Ishiwata
- Metabolomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Joe Tohme
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Yoshiyuki Sakaki
- Genome Core Technology Facilities, RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Kazuo Shinozaki
- Plant Functional Genomics Research Group, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Manabu Ishitani
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
|
18
|
Lijoi A, Mena RH, Prünster I. A Bayesian nonparametric method for prediction in EST analysis. BMC Bioinformatics 2007; 8:339. [PMID: 17868445 PMCID: PMC2220008 DOI: 10.1186/1471-2105-8-339] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 02/27/2007] [Accepted: 09/14/2007] [Indexed: 11/30/2022] Open
Abstract
Background: Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of a given size and also to determine the gene discovery rate: these estimates form the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results: In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
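To make quantities (b) and (c) concrete, here is a minimal sketch of how the expected number of new genes and the discovery rate could be computed under one common Bayesian nonparametric prior. The abstract does not name the prior, so the choice of a two-parameter Poisson-Dirichlet (Pitman-Yor) model is an assumption made for this example, as are the function name and the parameter values. Under that model, after n reads containing k distinct genes, the next read is a new gene with probability (theta + sigma * k) / (theta + n); because this is linear in k, the expected counts can be propagated exactly one read at a time.

```python
def predict_new_genes(n, k, m, sigma, theta):
    """Sketch of EST prediction under an assumed Pitman-Yor(sigma, theta) prior.

    n: reads already sequenced; k: distinct genes seen among them;
    m: size of the hypothetical additional sample.
    Returns (expected number of new genes in the next m reads,
             list of discovery rates for each of those m reads).
    """
    if not (0.0 <= sigma < 1.0) or theta <= -sigma:
        raise ValueError("need 0 <= sigma < 1 and theta > -sigma")
    expected_distinct = float(k)
    rates = []
    for i in range(m):
        # Probability that read number n + i + 1 reveals an unseen gene,
        # using the expected number of distinct genes accumulated so far.
        p_new = (theta + sigma * expected_distinct) / (theta + n + i)
        rates.append(p_new)
        expected_distinct += p_new
    return expected_distinct - k, rates


# Hypothetical numbers, chosen only for illustration.
expected_new, discovery_rates = predict_new_genes(
    n=1000, k=400, m=500, sigma=0.5, theta=50.0
)
```

In practice sigma and theta would have to be estimated from the observed sample (or given a prior), which this sketch leaves out; how the paper handles that step is not stated in the abstract.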
Affiliation(s)
- Antonio Lijoi
- Department of Economics and Quantitative Methods, University of Pavia, 27100 Pavia and Institute for Applied Mathematics and Information Technology, National Research Council, 20133 Milan, Italy
- Ramsés H Mena
- Research Institute for Applied Mathematics and Systems, National Autonomous University of Mexico, Mexico City, A.P. 20-726, Mexico
- Igor Prünster
- Department of Statistics and Applied Mathematics and ICER, University of Turin, 10122 Turin and Carlo Alberto College, 10024 Moncalieri, Italy
|
19
|
Gabashvili IS, Sokolowski BHA, Morton CC, Giersch ABS. Ion channel gene expression in the inner ear. J Assoc Res Otolaryngol 2007; 8:305-28. [PMID: 17541769 PMCID: PMC2538437 DOI: 10.1007/s10162-007-0082-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 01/03/2007] [Accepted: 04/23/2007] [Indexed: 12/13/2022] Open
Abstract
The ion channel genome is still being defined despite numerous publications on the subject. The ion channel transcriptome is even more difficult to assess. Using high-throughput computational tools, we surveyed all available inner ear cDNA libraries to identify genes coding for ion channels. We mapped over 100,000 expressed sequence tags (ESTs) derived from human cochlea, mouse organ of Corti, mouse and zebrafish inner ear, and rat vestibular end organs to the Homo sapiens, Mus musculus, Danio rerio, and Rattus norvegicus genomes. A survey of EST data alone reveals that at least a third of the ion channel genome is expressed in the inner ear, with the highest expression occurring in hair cell-enriched mouse organ of Corti and rat vestibule. Our data and comparisons with other experimental techniques that measure gene expression show that every method has its limitations and does not per se provide complete coverage of the inner ear ion channelome. In addition, the data show that most genes produce alternative transcripts with the same spectrum across multiple organisms, no ion channel gene variants are unique to the inner ear, and many splice variants have yet to be annotated. Our high-throughput approach offers a qualitative computational and experimental analysis of ion channel genes in inner ear cDNA collections. A lack of data and incomplete gene annotations prevent both rigorous statistical analyses and comparisons of entire ion channelomes derived from different tissues and organisms.
|