1
Scalabrin S, Toniutti L, Di Gaspero G, Scaglione D, Magris G, Vidotto M, Pinosio S, Cattonaro F, Magni F, Jurman I, Cerutti M, Suggi Liverani F, Navarini L, Del Terra L, Pellegrino G, Ruosi MR, Vitulo N, Valle G, Pallavicini A, Graziosi G, Klein PE, Bentley N, Murray S, Solano W, Al Hakimi A, Schilling T, Montagnon C, Morgante M, Bertrand B. A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm. Sci Rep 2020; 10:4642. [PMID: 32170172 PMCID: PMC7069947 DOI: 10.1038/s41598-020-61216-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 09/18/2019] [Accepted: 01/20/2020] [Indexed: 11/09/2022]
Abstract
The genome of the allotetraploid species Coffea arabica L. was sequenced to assemble independently the two component subgenomes (putatively deriving from C. canephora and C. eugenioides) and to perform a genome-wide analysis of the genetic diversity in cultivated coffee germplasm and in wild populations growing in the center of origin of the species. We assembled a total length of 1.536 Gbp, 444 Mb and 527 Mb of which were assigned to the canephora and eugenioides subgenomes, respectively, and predicted 46,562 gene models, 21,254 and 22,888 of which were assigned to the canephora and to the eugenioides subgenome, respectively. Through a genome-wide SNP genotyping of 736 C. arabica accessions, we analyzed the genetic diversity in the species and its relationship with geographic distribution and historical records. We observed a weak population structure due to low-frequency derived alleles and highly negative values of Tajima’s D, suggesting a recent and severe bottleneck, most likely resulting from a single event of polyploidization, not only for the cultivated germplasm but also for the entire species. This conclusion is strongly supported by forward simulations of mutation accumulation. However, PCA revealed a cline of genetic diversity reflecting a west-to-east geographical distribution from the center of origin in East Africa to the Arabian Peninsula. The extremely low levels of variation observed in the species, as a consequence of the polyploidization event, make the exploitation of diversity within the species for breeding purposes less interesting than in most crop species and stress the need for introgression of new variability from the diploid progenitors.
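The abstract above reads highly negative Tajima's D values as a sign of an excess of low-frequency derived alleles. As a rough illustration only, the statistic can be computed from per-site derived-allele counts with the standard estimator; the function name `tajimas_d` and its input layout are assumptions made for this sketch, not the authors' pipeline.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site derived-allele counts in a sample of n sequences.

    Hypothetical helper for illustration: derived_counts[i] is the number of
    sampled sequences carrying the derived allele at site i.
    """
    if n < 4:
        raise ValueError("the variance term is zero for n < 4")
    # Only segregating sites contribute (0 < count < n).
    counts = [k for k in derived_counts if 0 < k < n]
    s = len(counts)
    if s == 0:
        raise ValueError("no segregating sites")

    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)

    # Constants of the standard estimator.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Difference between the two diversity estimates, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Five singleton sites in ten sequences: rare alleles dominate.
rare = tajimas_d([1, 1, 1, 1, 1], 10)
# Five sites at intermediate frequency in ten sequences.
common = tajimas_d([5, 5, 5, 5, 5], 10)
```

In this toy input the all-singleton sample gives a negative value and the intermediate-frequency sample a positive one, which is the direction of the signal the abstract describes.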
Affiliation(s)
- Simone Scalabrin
- IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy
- Lucile Toniutti
- World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France
- Gabriele Di Gaspero
- Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy
- Davide Scaglione
- IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy
- Gabriele Magris
- Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy; University of Udine, Department of Agricultural, Food, Environmental and Animal Sciences, via delle scienze 206, I-33100, Udine, Italy
- Michele Vidotto
- IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy
- Sara Pinosio
- Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy; Institute of Biosciences and Bioresources, National Research Council, via Madonna del Piano 10, I-50019, Sesto Fiorentino (FI), Italy
- Federica Cattonaro
- IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy
- Federica Magni
- IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy
- Irena Jurman
- Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy
- Mario Cerutti
- Luigi Lavazza S.p.A., Innovation Center, I-10156, Torino, Italy
- Furio Suggi Liverani
- Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy
- Luciano Navarini
- Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy
- Lorenzo Del Terra
- Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy
- Nicola Vitulo
- Department of Biotechnology, University of Verona, Verona, Italy
- Giorgio Valle
- CRIBI, Università degli Studi di Padova, viale G. Colombo 3, I-35121, Padova, Italy
- Giorgio Graziosi
- Department of Life Sciences, University of Trieste, I-34148, Trieste, Italy
- Patricia E Klein
- Department of Horticultural Sciences, Texas A&M University, College Station, TX, USA
- Nolan Bentley
- Department of Horticultural Sciences, Texas A&M University, College Station, TX, USA
- Seth Murray
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, USA
- Amin Al Hakimi
- Faculty of Agriculture, Sana'a University, Sana'a, Yemen
- Timothy Schilling
- World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France
- Christophe Montagnon
- World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France
- Michele Morgante
- Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy; University of Udine, Department of Agricultural, Food, Environmental and Animal Sciences, via delle scienze 206, I-33100, Udine, Italy
- Benoit Bertrand
- CIRAD, IPME, 34 398, Montpellier, France; UMR IPME, Univ. Montpellier, IRD, CIRAD, 34 398, Montpellier, France
2
Jeyaraj A, Zhang X, Hou Y, Shangguan M, Gajjeraman P, Li Y, Wei C. Genome-wide identification of conserved and novel microRNAs in one bud and two tender leaves of tea plant (Camellia sinensis) by small RNA sequencing, microarray-based hybridization and genome survey scaffold sequences. BMC Plant Biol 2017; 17:212. [PMID: 29157210 PMCID: PMC5697157 DOI: 10.1186/s12870-017-1169-1] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 06/03/2017] [Accepted: 11/10/2017] [Indexed: 05/19/2023]
Abstract
BACKGROUND: MicroRNAs (miRNAs) are important for plant growth and responses to environmental stresses via post-transcriptional regulation of gene expression. Tea, which is primarily produced from one bud and two tender leaves of the tea plant (Camellia sinensis), is one of the most popular non-alcoholic beverages worldwide owing to its abundance of secondary metabolites. A large number of miRNAs have been identified in various plants, including non-model species. However, due to the lack of reference genome sequences and/or information on tea plant genome survey scaffold sequences, discovery of miRNAs has been limited in C. sinensis. RESULTS: Using small RNA sequencing, combined with our recently obtained genome survey data, we have identified and analyzed 175 conserved and 83 novel miRNAs mainly in one bud and two tender leaves of the tea plant. Among these, 93 conserved and 18 novel miRNAs were validated using miRNA microarray hybridization. In addition, the expression patterns of 11 conserved and 8 novel miRNAs were validated by stem-loop qRT-PCR. A total of 716 potential target genes of identified miRNAs were predicted. Further, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses revealed that most of the target genes were primarily involved in stress response and encoded enzymes related to phenylpropanoid biosynthesis. The predicted targets of 4 conserved miRNAs were further validated by 5' RLM-RACE. A negative correlation between the expression profiles of 3 out of 4 conserved miRNAs (csn-miR160a-5p, csn-miR164a, csn-miR828 and csn-miR858a) and their targets (ARF17, NAC100, WER and MYB12 transcription factors) was observed. CONCLUSION: In summary, the present study is one of few such studies on miRNA detection and identification in the tea plant. The predicted target genes of the majority of miRNAs encoded enzymes, transcription factors, and functional proteins. The miRNA-target transcription factor gene interactions may provide important clues about the regulatory mechanism of these miRNAs in the tea plant. The data reported in this study will make a huge contribution to knowledge on the potential miRNA regulators of the secondary metabolism pathway and other important biological processes in C. sinensis.
Affiliation(s)
- Anburaj Jeyaraj
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, Anhui Province 230036, People’s Republic of China
- Xiao Zhang
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, Anhui Province 230036, People’s Republic of China
- Yan Hou
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, Anhui Province 230036, People’s Republic of China
- Mingzhu Shangguan
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, Anhui Province 230036, People’s Republic of China
- Prabu Gajjeraman
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, Anhui Province 230036, People’s Republic of China
- Department of Biotechnology, Karpagam University, Coimbatore, India
- Yeyun Li
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, Anhui Province 230036, People’s Republic of China
- Chaoling Wei
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, 130 Changjiang West Road, Hefei, Anhui Province 230036, People’s Republic of China
3
Daneri-Castro SN, Svensson B, Roberts TH. Barley germination: Spatio-temporal considerations for designing and interpreting ‘omics’ experiments. J Cereal Sci 2016. [DOI: 10.1016/j.jcs.2016.05.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
4
Abstract
Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically yield highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing cost and effort by targeting a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
5
Extensive error in the number of genes inferred from draft genome assemblies. PLoS Comput Biol 2014; 10:e1003998. [PMID: 25474019 PMCID: PMC4256071 DOI: 10.1371/journal.pcbi.1003998] [Citation(s) in RCA: 176] [Impact Index Per Article: 16.0] [Received: 01/14/2014] [Accepted: 10/22/2014] [Indexed: 11/19/2022]
Abstract
Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
The initial publication of the genome sequence of many plants, animals, and microbes is often accompanied with great fanfare. However, these genomes are almost always first-drafts, with a lot of missing data, many gaps, and many errors in the published sequences. Compounding this problem, the genes identified in draft genome sequences are also affected by incomplete genome assemblies: the number and exact structure of predicted genes may be incorrect. Here we quantify the extent of such errors, by comparing several draft genomes against completed versions of the same sequences. Surprisingly, we find huge numbers of errors in the number of genes predicted from draft assemblies, with more than half of all genes having the wrong number of copies in the draft genomes examined. Our investigation also reveals the major causes of these errors, and further analyses using additional functional data demonstrate that many of the gene predictions can be corrected. The results presented here suggest that many inferences based on published draft genomes may be erroneous, but offer a way forward for future analyses.
6
Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L. PLoS One 2013; 8:e64799. [PMID: 23734219 PMCID: PMC3667174 DOI: 10.1371/journal.pone.0064799] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 12/12/2012] [Accepted: 04/17/2013] [Indexed: 11/30/2022]
Abstract
Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.
7
Lonardi S, Duma D, Alpert M, Cordero F, Beccuti M, Bhat PR, Wu Y, Ciardo G, Alsaihati B, Ma Y, Wanamaker S, Resnik J, Bozdag S, Luo MC, Close TJ. Combinatorial pooling enables selective sequencing of the barley gene space. PLoS Comput Biol 2013; 9:e1003010. [PMID: 23592960 PMCID: PMC3617026 DOI: 10.1371/journal.pcbi.1003010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 10/29/2012] [Accepted: 02/05/2013] [Indexed: 11/23/2022]
Abstract
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
The problem of obtaining the full genomic sequence of an organism has been solved either via a global brute-force approach (called whole-genome shotgun) or by a divide-and-conquer strategy (called clone-by-clone). Both approaches have advantages and disadvantages in terms of cost, manual labor, and the ability to deal with sequencing errors and highly repetitive regions of the genome. With the advent of second-generation sequencing instruments, the whole-genome shotgun approach has been the preferred choice. The clone-by-clone strategy is, however, still very relevant for large complex genomes. In fact, several research groups and international consortia have produced clone libraries and physical maps for many economically or ecologically important organisms and now are in a position to proceed with sequencing. In this manuscript, we demonstrate the feasibility of this approach on the gene-space of a large, very repetitive plant genome. The novelty of our approach is that, in order to take advantage of the throughput of the current generation of sequencing instruments, we pool hundreds of clones using a special type of “smart” pooling design that allows one to establish with high accuracy the source clone from the sequenced reads in a pool. Extensive simulations and experimental results support our claims.
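The deconvolution idea described above, recovering the source clone from the set of pools in which a read appears, can be sketched with a toy design. This is a simplified stand-in in which each clone gets a unique combination of pools; it is not the pooling design used in the paper, and the names `make_design` and `deconvolve` are hypothetical.

```python
from itertools import combinations


def make_design(clones, n_pools, pools_per_clone):
    """Assign each clone a distinct set of pools (its signature).

    Toy design for illustration: signatures are simply the first
    len(clones) combinations of pool indices.
    """
    signatures = combinations(range(n_pools), pools_per_clone)
    design = {clone: frozenset(sig) for clone, sig in zip(clones, signatures)}
    if len(design) < len(clones):
        raise ValueError("not enough distinct pool signatures for all clones")
    return design


def deconvolve(observed_pools, design):
    """Return the clone whose signature equals the observed pool set, else None."""
    observed = frozenset(observed_pools)
    for clone, signature in design.items():
        if signature == observed:
            return clone
    return None


# Six clones spread over four pools, two pools per clone.
clones = ["bac0", "bac1", "bac2", "bac3", "bac4", "bac5"]
design = make_design(clones, n_pools=4, pools_per_clone=2)

# A read seen only in pools 1 and 3 maps back to a single clone.
source = deconvolve({1, 3}, design)
# A read seen in one pool matches no signature in this exact-match sketch.
unknown = deconvolve({0}, design)
```

Exact matching keeps the sketch short; real reads can be shared between overlapping clones or be missing from a pool through sampling, so a practical scheme would need signatures that tolerate such errors.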
Affiliation(s)
- Stefano Lonardi
- Department of Computer Science and Engineering, University of California, Riverside, California, USA.
8
Yang H, Tao Y, Zheng Z, Zhang Q, Zhou G, Sweetingham MW, Howieson JG, Li C. Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L. PLoS One 2013. [PMID: 23734219 DOI: 10.1371/journal.pone.0064799.t002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Affiliation(s)
- Huaan Yang
- Department of Agriculture and Food Western Australia, South Perth, Australia
9
Tiwari VK, Riera-Lizarazu O, Gunn HL, Lopez K, Iqbal MJ, Kianian SF, Leonard JM. Endosperm tolerance of paternal aneuploidy allows radiation hybrid mapping of the wheat D-genome and a measure of γ ray-induced chromosome breaks. PLoS One 2012; 7:e48815. [PMID: 23144983 PMCID: PMC3492231 DOI: 10.1371/journal.pone.0048815] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 04/26/2012] [Accepted: 10/01/2012] [Indexed: 11/21/2022]
Abstract
Physical mapping and genome sequencing are underway for the ≈17 Gb wheat genome. Physical mapping methods independent of meiotic recombination, such as radiation hybrid (RH) mapping, will aid precise anchoring of BAC contigs in the large regions of suppressed recombination in Triticeae genomes. Reports of endosperm development following pollination with irradiated pollen at dosages that cause embryo abortion prompted us to investigate endosperm as a potential source of RH mapping germplasm. Here, we report a novel approach to construct RH based physical maps of all seven D-genome chromosomes of the hexaploid wheat ‘Chinese Spring’, simultaneously. An 81-member subset of endosperm samples derived from 20-Gy irradiated pollen was genotyped for deletions, and 737 markers were mapped on seven D-genome chromosomes. Analysis of well-defined regions of six chromosomes suggested a map resolution of ∼830 kb could be achieved; this estimate was validated with assays of markers from a sequenced contig. We estimate that the panel contains ∼6,000 deletion bins for D-genome chromosomes and will require ∼18,000 markers for high resolution mapping. Map-based deletion estimates revealed a majority of 1–20 Mb interstitial deletions suggesting mutagenic repair of double-strand breaks in pollen provides a useful resource for RH mapping and map based cloning studies.
Affiliation(s)
- Vijay K. Tiwari
- Department of Crop and Soil Science, Oregon State University, Corvallis, Oregon, United States of America
- Oscar Riera-Lizarazu
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Andhra Pradesh, India
- Hilary L. Gunn
- Department of Crop and Soil Science, Oregon State University, Corvallis, Oregon, United States of America
- KaSandra Lopez
- Department of Crop and Soil Science, Oregon State University, Corvallis, Oregon, United States of America
- M. Javed Iqbal
- Department of Plant Sciences, North Dakota State University, Fargo, North Dakota, United States of America
- Shahryar F. Kianian
- Department of Plant Sciences, North Dakota State University, Fargo, North Dakota, United States of America
- Jeffrey M. Leonard
- Department of Crop and Soil Science, Oregon State University, Corvallis, Oregon, United States of America
10
Feltus FA, Saski CA, Mockaitis K, Haiminen N, Parida L, Smith Z, Ford J, Staton ME, Ficklin SP, Blackmon BP, Cheng CH, Schnell RJ, Kuhn DN, Motamayor JC. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes. BMC Genomics 2011; 12:379. [PMID: 21794110 PMCID: PMC3154204 DOI: 10.1186/1471-2164-12-379] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 04/25/2011] [Accepted: 07/27/2011] [Indexed: 11/25/2022]
Abstract
Background: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results: This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
Affiliation(s)
- Frank A Feltus
- Clemson University Genomics Institute, Clemson University, 51 New Cherry Street, Clemson, SC 29634, USA.