1
Alaux M, Rogers J, Letellier T, Flores R, Alfama F, Pommier C, Mohellibi N, Durand S, Kimmel E, Michotey C, Guerche C, Loaec M, Lainé M, Steinbach D, Choulet F, Rimbert H, Leroy P, Guilhot N, Salse J, Feuillet C, Paux E, Eversole K, Adam-Blondon AF, Quesneville H. Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data. Genome Biol 2018; 19:111. [PMID: 30115101 PMCID: PMC6097284 DOI: 10.1186/s13059-018-1491-4] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Received: 04/13/2018] [Accepted: 07/23/2018] [Indexed: 01/24/2023]
Abstract
The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium. Genome browsers, BLAST, and InterMine tools have been established for in-depth exploration of the genome sequence together with additional linked datasets including physical maps, sequence variations, gene expression, and genetic and phenomic data from other international collaborative projects already stored in the GnpIS information system. The portal provides enhanced search and browser features that will facilitate the deployment of the latest genomics resources in wheat improvement.
Affiliation(s)
- Michael Alaux
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Jane Rogers
  - International Wheat Genome Sequencing Consortium (IWGSC), 18 High Street, Little Eversden, Cambridge, CB23 1HE, UK
- Raphaël Flores
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Cyril Pommier
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Nacer Mohellibi
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Sophie Durand
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Erik Kimmel
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Célia Michotey
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Claire Guerche
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Mikaël Loaec
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Mathilde Lainé
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
- Delphine Steinbach
  - URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
  - Present address: GQE-Le Moulon UMR 320, INRA, Université Paris-Sud, Université Paris-Saclay, CNRS, AgroParisTech, Ferme du Moulon, 91190, Gif-sur-Yvette, France
- Frédéric Choulet
  - GDEC, INRA, Université Clermont Auvergne, 63000, Clermont-Ferrand, France
- Hélène Rimbert
  - GDEC, INRA, Université Clermont Auvergne, 63000, Clermont-Ferrand, France
- Philippe Leroy
  - GDEC, INRA, Université Clermont Auvergne, 63000, Clermont-Ferrand, France
- Nicolas Guilhot
  - GDEC, INRA, Université Clermont Auvergne, 63000, Clermont-Ferrand, France
- Jérôme Salse
  - GDEC, INRA, Université Clermont Auvergne, 63000, Clermont-Ferrand, France
- Catherine Feuillet
  - GDEC, INRA, Université Clermont Auvergne, 63000, Clermont-Ferrand, France
  - Present address: Inari Agriculture, 200 Sydney Street, Cambridge, MA, 02139, USA
- Etienne Paux
  - GDEC, INRA, Université Clermont Auvergne, 63000, Clermont-Ferrand, France
- Kellye Eversole
  - International Wheat Genome Sequencing Consortium (IWGSC), 5207 Wyoming Road, Bethesda, Maryland, 20816, USA
2
Wei X, Xu Z, Wang G, Hou J, Ma X, Liu H, Liu J, Chen B, Luo M, Xie B, Li R, Ruan J, Liu X. pBACode: a random-barcode-based high-throughput approach for BAC paired-end sequencing and physical clone mapping. Nucleic Acids Res 2017; 45:e52. [PMID: 27980066 PMCID: PMC5397170 DOI: 10.1093/nar/gkw1261] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/15/2016] [Accepted: 12/09/2016] [Indexed: 12/14/2022]
Abstract
Applications that use Bacterial Artificial Chromosome (BAC) libraries often require paired-end sequences and knowledge of the physical location of each clone in plates. To facilitate obtaining this information in high-throughput, we generated pBACode vectors: a pool of BAC cloning vectors, each with a pair of random barcodes flanking its cloning site. In a pBACode BAC library, the BAC ends and their linked barcodes can be sequenced in bulk. Barcode pairs are determined by sequencing the empty pBACode vectors, which allows BAC ends to be paired according to their barcodes. For physical clone mapping, the barcodes are used as unique markers for their linked genomic sequence. After multi-dimensional pooling of BAC clones, the barcodes are sequenced and deconvoluted to locate each clone. We generated a pBACode library of 94,464 clones for the flounder Paralichthys olivaceus and obtained paired-end sequence from 95.4% of the clones. Incorporating BAC paired-ends into the genome preassembly improved its continuity by over 10-fold. Furthermore, we were able to use the barcodes to map the physical locations of each clone in just 50 pools, with up to 11,808 clones per pool. Our physical clone mapping located 90.2% of BAC clones, enabling targeted characterization of chromosomal rearrangements.
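The barcode-based pairing described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `pair_bac_ends`, the data layout, and the toy barcodes and sequences are all hypothetical, and real data would also need barcode error correction and collision handling, which are omitted here.

```python
from typing import Dict, Iterable, List, Tuple


def pair_bac_ends(
    barcode_pairs: Iterable[Tuple[str, str]],
    end_reads: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair BAC-end sequences through their linked barcodes.

    barcode_pairs: (left_barcode, right_barcode) tuples, assumed to come
        from sequencing the empty vectors.
    end_reads: maps a barcode to the BAC-end sequence read next to it.
    Returns (left_end, right_end) for every vector whose two ends were read.
    """
    paired = []
    for left_bc, right_bc in barcode_pairs:
        left_end = end_reads.get(left_bc)
        right_end = end_reads.get(right_bc)
        # Keep a clone only when both of its ends were recovered.
        if left_end is not None and right_end is not None:
            paired.append((left_end, right_end))
    return paired


# Hypothetical toy input: two vectors, but only the first has both ends read.
vectors = [("AAAC", "GGTT"), ("CCCA", "TTGA")]
reads = {"AAAC": "ACGTAC", "GGTT": "TTGCAA", "CCCA": "GGGCCC"}
paired_ends = pair_bac_ends(vectors, reads)
```

In this toy input, the second vector is dropped because no read carries its right-hand barcode, which mirrors why only a fraction of clones yield paired-end sequence.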
Affiliation(s)
- Xiaolin Wei
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China; School of Life Sciences, Peking University, Beijing 100084, China
- Zhichao Xu
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China
- Guixing Wang
  - Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
- Jilun Hou
  - Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
- Xiaopeng Ma
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China
- Haijin Liu
  - Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
- Jiadong Liu
  - National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Bo Chen
  - National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Meizhong Luo
  - National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Bingyan Xie
  - Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ruiqiang Li
  - Novogene Bioinformatics Institute, Beijing 100083, China
- Jue Ruan
  - Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Xiao Liu
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
3
Zhang J, Kudrna D, Mu T, Li W, Copetti D, Yu Y, Goicoechea JL, Lei Y, Wing RA. Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences. Bioinformatics 2016; 32:3058-3064. [PMID: 27318200 PMCID: PMC5048067 DOI: 10.1093/bioinformatics/btw370] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/07/2016] [Accepted: 06/06/2016] [Indexed: 12/16/2022]
Abstract
Motivation: Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool, Genome Puzzle Master (GPM), that enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules. Results: With GPM, loaded datasets can be connected to each other via their logical relationships, which accomplishes tasks to ‘group,’ ‘merge,’ ‘order and orient’ sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user’s total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. Availability and Implementation: The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianwei Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Arizona Genomics Institute and BIO5 Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Dave Kudrna
  - Arizona Genomics Institute and BIO5 Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Ting Mu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Weiming Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Dario Copetti
  - Arizona Genomics Institute and BIO5 Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA; International Rice Research Institute, Genetic Resource Center, Los Baños, Laguna, Philippines
- Yeisoo Yu
  - Arizona Genomics Institute and BIO5 Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Jose Luis Goicoechea
  - Arizona Genomics Institute and BIO5 Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Yang Lei
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Rod A Wing
  - Arizona Genomics Institute and BIO5 Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA; International Rice Research Institute, Genetic Resource Center, Los Baños, Laguna, Philippines
4
Akpinar BA, Magni F, Yuce M, Lucas SJ, Šimková H, Šafář J, Vautrin S, Bergès H, Cattonaro F, Doležel J, Budak H. The physical map of wheat chromosome 5DS revealed gene duplications and small rearrangements. BMC Genomics 2015; 16:453. [PMID: 26070810 PMCID: PMC4465308 DOI: 10.1186/s12864-015-1641-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/20/2014] [Accepted: 05/19/2015] [Indexed: 11/24/2022]
Abstract
BACKGROUND The substantially large bread wheat genome, organized into three highly similar sub-genomes, renders genomic research challenging. The construction of BAC-based physical maps of individual chromosomes reduces the complexity of this allohexaploid genome, enables elucidation of gene space and evolutionary relationships, provides tools for map-based cloning, and serves as a framework for reference sequencing efforts. In this study, we constructed the first comprehensive physical map of wheat chromosome arm 5DS, thereby exploring its gene space organization and evolution. RESULTS The physical map of 5DS was comprised of 164 contigs, of which 45 were organized into 21 supercontigs, covering 176 Mb with an N50 value of 2,173 kb. Fifty-eight of the contigs were larger than 1 Mb, with the largest contig spanning 6,649 kb. A total of 1,864 molecular markers were assigned to the map at a density of 10.5 markers/Mb, anchoring 100 of the 120 contigs (>5 clones) that constitute ~95% of the cumulative length of the map. Ordering of 80 contigs along the deletion bins of chromosome arm 5DS revealed small-scale breaks in syntenic blocks. Analysis of the gene space of 5DS suggested an increasing gradient of genes organized in islands towards the telomere, with the highest gene density of 5.17 genes/Mb in the 0.67-0.78 deletion bin, 1.4 to 1.6 times that of all other bins. CONCLUSIONS Here, we provide a chromosome-specific view into the organization and evolution of the D genome of bread wheat, in comparison to one of its ancestors, revealing recent genome rearrangements. The high-quality physical map constructed in this study paves the way for the assembly of a reference sequence, from which breeding efforts will greatly benefit.
Affiliation(s)
- Bala Ani Akpinar
  - Sabanci University Nanotechnology Research and Application Centre (SUNUM), Sabanci University, Universite Cad. Orta Mah. No: 27, Tuzla, 34956, Istanbul, Turkey.
- Federica Magni
  - Instituto di Genomica Applicata, Via J.Linussio 51, Udine, 33100, Italy.
- Meral Yuce
  - Sabanci University Nanotechnology Research and Application Centre (SUNUM), Sabanci University, Universite Cad. Orta Mah. No: 27, Tuzla, 34956, Istanbul, Turkey.
- Stuart J Lucas
  - Sabanci University Nanotechnology Research and Application Centre (SUNUM), Sabanci University, Universite Cad. Orta Mah. No: 27, Tuzla, 34956, Istanbul, Turkey.
- Hana Šimková
  - Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany, CZ-78371, Olomouc, Czech Republic.
- Jan Šafář
  - Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany, CZ-78371, Olomouc, Czech Republic.
- Sonia Vautrin
  - Centre Nationales Ressources Génomiques Végétales, INRA UPR 1258, 24 Chemin de Borde Rouge - Auzeville 31326, Castanet-Tolosan, France.
- Hélène Bergès
  - Centre Nationales Ressources Génomiques Végétales, INRA UPR 1258, 24 Chemin de Borde Rouge - Auzeville 31326, Castanet-Tolosan, France.
- Federica Cattonaro
  - Instituto di Genomica Applicata, Via J.Linussio 51, Udine, 33100, Italy.
- Jaroslav Doležel
  - Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany, CZ-78371, Olomouc, Czech Republic.
- Hikmet Budak
  - Sabanci University Nanotechnology Research and Application Centre (SUNUM), Sabanci University, Universite Cad. Orta Mah. No: 27, Tuzla, 34956, Istanbul, Turkey.
  - Molecular Biology, Genetics and Bioengineering Program, Sabanci University, 34956, Istanbul, Turkey.
5
Zhang J, Shao C, Zhang L, Liu K, Gao F, Dong Z, Xu P, Chen S. A first generation BAC-based physical map of the half-smooth tongue sole (Cynoglossus semilaevis) genome. BMC Genomics 2014; 15:215. [PMID: 24650389 PMCID: PMC3998196 DOI: 10.1186/1471-2164-15-215] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/15/2013] [Accepted: 03/10/2014] [Indexed: 02/06/2023]
Abstract
Background Half-smooth tongue sole (Cynoglossus semilaevis Günther) has been exploited as a commercially important cultured marine flatfish, and females grow 2–3 times faster than males. Genetic studies, especially on the chromosomal sex-determining system of this species, have been carried out in the last decade. Although the genome of half-smooth tongue sole is relatively small (626.9 Mb), there are still some difficulties in the high-quality assembly of the next generation genome sequencing reads without the assistance of a physical map, especially for the W chromosome of this fish due to the abundance of repetitive sequences. The objective of this study is to construct a bacterial artificial chromosome (BAC)-based physical map for half-smooth tongue sole with the method of high information content fingerprinting (HICF). Results A physical map of half-smooth tongue sole was constructed with 30,294 valid fingerprints (7.5× genome coverage) with a tolerance of 4 and an initial cutoff of 1e-60. A total of 29,709 clones were assembled into 1,485 contigs with an average length of 539 kb and an N50 length of 664 kb. There were 394 contigs longer than the N50 length, and these contigs will be a useful resource for future integration with linkage map and whole genome sequence assembly. The estimated physical length of the assembled contigs was 797 Mb, representing approximately 1.27-fold coverage of the half-smooth tongue sole genome. The largest contig contained 410 BAC clones with a physical length of 3.48 Mb. Almost all of the 676 BAC clones (99.9%) in the 21 randomly selected contigs were positively validated by PCR assays, thereby confirming the reliability of the assembly. Conclusions A first generation BAC-based physical map of half-smooth tongue sole was constructed with high reliability. The map will promote genetic improvement programs of this fish, especially integration of physical and genetic maps, fine-mapping of important genes and/or QTL, comparative and evolutionary genomics studies, as well as whole genome sequence assembly.
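Several abstracts in this list report an N50 length and a fold coverage of the genome. As a reminder of how these two summary statistics are usually computed, here is a minimal Python sketch. The helper names `n50` and `fold_coverage` and the toy contig lengths are illustrative assumptions; only the 797 Mb and 626.9 Mb figures come from the abstract above.

```python
from typing import Iterable


def n50(lengths: Iterable[float]) -> float:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0.0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def fold_coverage(assembled_length: float, genome_size: float) -> float:
    """Ratio of assembled physical length to estimated genome size."""
    return assembled_length / genome_size


# Hypothetical toy contig lengths in kb (not data from the paper).
toy_n50 = n50([100, 200, 300, 400, 500])

# Figures quoted in the abstract: 797 Mb of contigs, 626.9 Mb genome.
sole_coverage = round(fold_coverage(797, 626.9), 2)
```

With the toy lengths, the two largest contigs (500 and 400 kb) already exceed half of the 1,500 kb total, so the N50 is 400 kb; the abstract's figures give a coverage of about 1.27.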
Affiliation(s)
- Peng Xu
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao 266071, China.
6
Varshney RK, Mir RR, Bhatia S, Thudi M, Hu Y, Azam S, Zhang Y, Jaganathan D, You FM, Gao J, Riera-Lizarazu O, Luo MC. Integrated physical, genetic and genome map of chickpea (Cicer arietinum L.). Funct Integr Genomics 2014; 14:59-73. [PMID: 24610029 PMCID: PMC4273598 DOI: 10.1007/s10142-014-0363-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 11/13/2013] [Revised: 01/27/2014] [Accepted: 01/31/2014] [Indexed: 10/25/2022]
Abstract
A physical map of chickpea was developed for the reference chickpea genotype (ICC 4958) using bacterial artificial chromosome (BAC) libraries targeting 71,094 clones (~12× coverage). High information content fingerprinting (HICF) of these clones gave high-quality fingerprinting data for 67,483 clones, and 1,174 contigs comprising 46,112 clones and 3,256 singletons were defined. In brief, 574 Mb genome size was assembled in 1,174 contigs with an average of 0.49 Mb per contig, and 3,256 singletons represent 407 Mb of the genome. The physical map was linked with two genetic maps with the help of 245 BAC-end sequence (BES)-derived simple sequence repeat (SSR) markers. This allowed locating some of the BACs in the vicinity of some important quantitative trait loci (QTLs) for drought tolerance and resistance to Fusarium wilt and Ascochyta blight. In addition, the fingerprinted contig (FPC) assembly was also integrated with the draft genome sequence of chickpea. As a result, ~965 BACs including 163 minimum tiling path (MTP) clones could be mapped on eight pseudo-molecules of chickpea, forming 491 hypothetical contigs representing 54,013,992 bp (~54 Mb) of the draft genome. Comprehensive analysis of markers in abiotic and biotic stress tolerance QTL regions led to identification of 654, 306 and 23 genes in the drought tolerance "QTL-hotspot" region, the Ascochyta blight resistance QTL region and the Fusarium wilt resistance QTL region, respectively. The integrated physical, genetic and genome map should provide a foundation for cloning and isolation of QTLs/genes for molecular dissection of traits as well as markers for molecular breeding for chickpea improvement.
Affiliation(s)
- Rajeev K. Varshney
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Reyazul Rouf Mir
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Sabhyata Bhatia
  - National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Mahendar Thudi
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Yuqin Hu
  - University of California, Davis, USA
- Sarwar Azam
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Deepa Jaganathan
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Frank M. You
  - Cereal Research Centre, Agriculture and Agri-Food Canada, Winnipeg, Canada
- Oscar Riera-Lizarazu
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
  - Dow AgroSciences, Pullman, USA
7
Wang X, Liu Q, Wang H, Luo CX, Wang G, Luo M. A BAC based physical map and genome survey of the rice false smut fungus Villosiclava virens. BMC Genomics 2013; 14:883. [PMID: 24341590 PMCID: PMC3878662 DOI: 10.1186/1471-2164-14-883] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/02/2013] [Accepted: 12/04/2013] [Indexed: 01/08/2023]
Abstract
Background Rice false smut caused by Villosiclava virens is a devastating fungal disease that spreads in major rice-growing regions throughout the world. However, the genomic information for this fungal pathogen is limited and the pathogenic mechanism of this disease is still not clear. To facilitate genetic, molecular and genomic studies of this fungal pathogen, we constructed the first BAC-based physical map and performed the first genome survey for this species. Results High molecular weight genomic DNA was isolated from young mycelia of the Villosiclava virens strain UV-8b and a high-quality, large-insert and deep-coverage Bacterial Artificial Chromosome (BAC) library was constructed with the restriction enzyme HindIII. The BAC library consisted of 5,760 clones, covering the UV-8b genome 22.7-fold, with an average insert size of 140 kb and an empty clone rate below 1%. BAC fingerprinting generated successful fingerprints for 2,290 BAC clones. Using the fingerprints, a genome-wide BAC physical map was constructed that contained 194 contigs (2,035 clones) spanning 51.2 Mb in physical length. Bidirectional-end sequencing of 4,512 BAC clones generated 6,560 high quality BAC end sequences (BESs), with a total length of 3,030,658 bp, representing 8.54% of the genome sequence. Analysis of the BESs revealed general genome information, including 51.52% GC content, 22.51% repetitive sequences, 376.12/Mb simple sequence repeat (SSR) density and approximately 36.01% coding regions. Sequence comparisons to other available fungal genome sequences through BESs showed high similarities to Metarhizium anisopliae, Trichoderma reesei, Nectria haematococca and Cordyceps militaris, which were generally in agreement with the 18S rRNA gene analysis results. Conclusion This study provides the first BAC-based physical map and genome information for the important rice fungal pathogen Villosiclava virens. The BAC clones, physical map and genome information will serve as fundamental resources to accelerate the genetic, molecular and genomic studies of this pathogen, including positional cloning, comparative genomic analysis and whole genome sequencing. The BAC library and physical map have been opened to researchers as public genomic resources (http://gresource.hzau.edu.cn/resource/resource.html).
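The library statistics quoted in this abstract can be cross-checked with simple arithmetic: clone count times average insert size divided by fold coverage gives an implied genome size, and so does the BES total length divided by the fraction of the genome it represents. The Python sketch below does both; the function names are hypothetical helpers for illustration, and the genome sizes it produces are back-calculations from the abstract's figures, not values stated by the authors.

```python
def genome_mb_from_library(clones: int, avg_insert_kb: float, fold: float) -> float:
    """Genome size (Mb) implied by a library's clone count, insert size and fold coverage."""
    library_mb = clones * avg_insert_kb / 1000  # total cloned DNA in Mb
    return library_mb / fold


def genome_mb_from_bes(bes_total_bp: int, genome_fraction: float) -> float:
    """Genome size (Mb) implied by total BES length and the genome fraction it covers."""
    return bes_total_bp / genome_fraction / 1e6


# Figures quoted in the abstract.
from_library = genome_mb_from_library(clones=5760, avg_insert_kb=140, fold=22.7)
from_bes = genome_mb_from_bes(bes_total_bp=3_030_658, genome_fraction=0.0854)

print(f"implied by library coverage: {from_library:.1f} Mb")
print(f"implied by BES fraction:     {from_bes:.1f} Mb")
```

Both routes imply a genome of roughly 35.5 Mb, so the coverage and BES percentages in the abstract are mutually consistent.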
Affiliation(s)
- Meizhong Luo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei, 430070, PR China.
8
Breen J, Wicker T, Shatalina M, Frenkel Z, Bertin I, Philippe R, Spielmeyer W, Šimková H, Šafář J, Cattonaro F, Scalabrin S, Magni F, Vautrin S, Bergès H, Paux E, Fahima T, Doležel J, Korol A, Feuillet C, Keller B. A physical map of the short arm of wheat chromosome 1A. PLoS One 2013; 8:e80272. [PMID: 24278269 PMCID: PMC3836966 DOI: 10.1371/journal.pone.0080272] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 05/08/2013] [Accepted: 10/11/2013] [Indexed: 12/31/2022]
Abstract
Bread wheat (Triticum aestivum) has a large and highly repetitive genome which poses major technical challenges for its study. To aid map-based cloning and future genome sequencing projects, we constructed a BAC-based physical map of the short arm of wheat chromosome 1A (1AS). From the assembly of 25,918 high information content (HICF) fingerprints from a 1AS-specific BAC library, 715 physical contigs were produced that cover almost 99% of the estimated size of the chromosome arm. The 3,414 BAC clones constituting the minimum tiling path were end-sequenced. Using a gene microarray containing ∼40 K NCBI UniGene EST clusters, PCR marker screening and BAC end sequences, we arranged 160 physical contigs (97 Mb or 35.3% of the chromosome arm) in a virtual order based on synteny with Brachypodium, rice and sorghum. BAC end sequences and information from microarray hybridisation were used to anchor 3.8 Mbp of Illumina sequences from flow-sorted chromosome 1AS to BAC contigs. Comparison of genetic and synteny-based physical maps indicated that ∼50% of all genetic recombination is confined to 14% of the physical length of the chromosome arm in the distal region. The 1AS physical map provides a framework for future genetic mapping projects as well as the basis for complete sequencing of chromosome arm 1AS.
Affiliation(s)
- James Breen
  - Institute of Plant Biology, University of Zurich, Zurich, Switzerland
- Thomas Wicker
  - Institute of Plant Biology, University of Zurich, Zurich, Switzerland
- Zeev Frenkel
  - Institute of Evolution, University of Haifa, Haifa, Israel
- Isabelle Bertin
  - INRA UMR 1095, Genetique Diversite et Ecophysiologie des Cereales, Clermont-Ferrand, France
- Romain Philippe
  - INRA UMR 1095, Genetique Diversite et Ecophysiologie des Cereales, Clermont-Ferrand, France
- Hana Šimková
  - Centre of the Region Hana for Biotechnological and Agricultural Research, Institute of Experimental Botany, Olomouc, Czech Republic
- Jan Šafář
  - Centre of the Region Hana for Biotechnological and Agricultural Research, Institute of Experimental Botany, Olomouc, Czech Republic
- Etienne Paux
  - INRA UMR 1095, Genetique Diversite et Ecophysiologie des Cereales, Clermont-Ferrand, France
- Tzion Fahima
  - Institute of Evolution, University of Haifa, Haifa, Israel
- Jaroslav Doležel
  - Centre of the Region Hana for Biotechnological and Agricultural Research, Institute of Experimental Botany, Olomouc, Czech Republic
- Abraham Korol
  - Institute of Evolution, University of Haifa, Haifa, Israel
- Catherine Feuillet
  - INRA UMR 1095, Genetique Diversite et Ecophysiologie des Cereales, Clermont-Ferrand, France
- Beat Keller
  - Institute of Plant Biology, University of Zurich, Zurich, Switzerland
9
Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives. Genetics 2013; 195:723-37. [PMID: 24037269 DOI: 10.1534/genetics.113.157115] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
10
A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor. Proc Natl Acad Sci U S A 2013; 110:7940-5. [PMID: 23610408 DOI: 10.1073/pnas.1219082110] [Citation(s) in RCA: 178] [Impact Index Per Article: 16.2] [Indexed: 11/18/2022]
Abstract
The current limitations in genome sequencing technology require the construction of physical maps for high-quality draft sequences of large plant genomes, such as that of Aegilops tauschii, the wheat D-genome progenitor. To construct a physical map of the Ae. tauschii genome, we fingerprinted 461,706 bacterial artificial chromosome clones, assembled contigs, designed a 10K Ae. tauschii Infinium SNP array, constructed a 7,185-marker genetic map, and anchored on the map contigs totaling 4.03 Gb. Using whole genome shotgun reads, we extended the SNP marker sequences and found 17,093 genes and gene fragments. We showed that collinearity of the Ae. tauschii genes with Brachypodium distachyon, rice, and sorghum decreased with phylogenetic distance and that structural genome evolution rates have been high across all investigated lineages in subfamily Pooideae, including that of Brachypodieae. We obtained additional information about the evolution of the seven Triticeae chromosomes from 12 ancestral chromosomes and uncovered a pattern of centromere inactivation accompanying nested chromosome insertions in grasses. We showed that the density of noncollinear genes along the Ae. tauschii chromosomes positively correlates with recombination rates, suggested a cause, and showed that new genes, exemplified by disease resistance genes, are preferentially located in high-recombination chromosome regions.
|
11
|
Physical mapping integrated with syntenic analysis to characterize the gene space of the long arm of wheat chromosome 1A. PLoS One 2013; 8:e59542. [PMID: 23613713 PMCID: PMC3628912 DOI: 10.1371/journal.pone.0059542] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 12/20/2012] [Accepted: 02/15/2013] [Indexed: 12/02/2022] Open
Abstract
Background Bread wheat (Triticum aestivum L.) is one of the most important crops worldwide and its production faces pressing challenges, the solution of which demands genome information. However, the large, highly repetitive hexaploid wheat genome has been considered intractable to standard sequencing approaches. Therefore the International Wheat Genome Sequencing Consortium (IWGSC) proposes to map and sequence the genome on a chromosome-by-chromosome basis. Methodology/Principal Findings We have constructed a physical map of the long arm of bread wheat chromosome 1A using chromosome-specific BAC libraries by High Information Content Fingerprinting (HICF). Two alternative methods (FPC and LTC) were used to assemble the fingerprints into a high-resolution physical map of the chromosome arm. A total of 365 molecular markers were added to the map, in addition to 1122 putative unique transcripts that were identified by microarray hybridization. The final map consists of 1180 FPC-based or 583 LTC-based contigs. Conclusions/Significance The physical map presented here marks an important step forward in mapping of hexaploid bread wheat. The map is orders of magnitude more detailed than previously available maps of this chromosome, and the assignment of over a thousand putative expressed gene sequences to specific map locations will greatly assist future functional studies. This map will be an essential tool for future sequencing of and positional cloning within chromosome 1A.
|
12
|
Navabi ZK, Huebert T, Sharpe AG, O’Neill CM, Bancroft I, Parkin IAP. Conserved microstructure of the Brassica B Genome of Brassica nigra in relation to homologous regions of Arabidopsis thaliana, B. rapa and B. oleracea. BMC Genomics 2013; 14:250. [PMID: 23586706 PMCID: PMC3765694 DOI: 10.1186/1471-2164-14-250] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 01/09/2013] [Accepted: 04/04/2013] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND The Brassica B genome is known to carry several important traits, yet there have been limited analyses of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb region of A. thaliana that had been well characterised in both the Brassica A and C genomes. RESULTS Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding regions in B. nigra. The target region is duplicated in A. thaliana and six homologous contigs were found in B. nigra resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide. CONCLUSIONS Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement differentiated the B genome, suggesting such events could contribute to the lack of recombination observed between B genome species and those of the closely related A/C lineage.
Affiliation(s)
- Zahra-Katy Navabi
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
- Terry Huebert
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
- Andrew G Sharpe
- DNA Technologies Laboratory, 110 Gymnasium Place, Saskatoon, SK S7N 0W9, Canada
- Carmel M O’Neill
- John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK
- Ian Bancroft
- John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK
- Isobel AP Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
|
13
|
Bozdag S, Close TJ, Lonardi S. A graph-theoretical approach to the selection of the minimum tiling path from a physical map. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:352-360. [PMID: 23929859 DOI: 10.1109/tcbb.2013.26] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/02/2023]
Abstract
The problem of computing the minimum tiling path (MTP) from a set of clones arranged in a physical map is a cornerstone of hierarchical (clone-by-clone) genome sequencing projects. We formulate this problem in a graph theoretical framework, and then solve it by a combination of minimum hitting set and minimum spanning tree algorithms. The tool implementing this strategy, called FMTP, shows improved performance compared to the widely used software FPC. When we execute FMTP and FPC on the same physical map, the MTP produced by FMTP covers a higher portion of the genome, and uses a smaller number of clones. For instance, on the rice genome the MTP produced by our tool would reduce by about 11 percent the cost of a clone-by-clone sequencing project. Source code, benchmark data sets, and documentation of FMTP are freely available at http://code.google.com/p/fingerprint-based-minimal-tiling-path/ under the MIT license.
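To make the idea of a minimum tiling path concrete, the following is a minimal sketch and not the FMTP method described in the abstract. It swaps in the classic greedy interval-cover technique and assumes that each clone already has known start and end coordinates, which a fingerprint-based map does not provide directly. The clone names, the coordinates, and the function name `greedy_tiling_path` are all hypothetical.

```python
from typing import List, Tuple

# (name, start, end) in arbitrary map units; hypothetical illustration data
Clone = Tuple[str, int, int]


def greedy_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[str]:
    """Choose few overlapping clones that together cover [region_start, region_end).

    Greedy interval cover: at each step, among clones that start at or before
    the current coverage frontier, take the one that reaches furthest.
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    path: List[str] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Scan every not-yet-seen clone that starts inside the covered part.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


example_clones = [
    ("A", 0, 100),
    ("B", 40, 150),
    ("C", 90, 220),
    ("D", 140, 230),
    ("E", 210, 300),
]
print(greedy_tiling_path(example_clones, 0, 300))  # ['A', 'C', 'E']
```

In this toy case three of the five clones suffice. A real tiling path must also guarantee enough overlap between neighbouring clones and cope with uncertain clone order, which is where the graph formulation in the abstract comes in.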
Affiliation(s)
- Serdar Bozdag
- Department of Mathematics, Statistics and Computer Science, Marquette University, PO Box 1881, Milwaukee, WI 53201-1881, USA.
|
14
|
Palti Y, Genet C, Gao G, Hu Y, You FM, Boussaha M, Rexroad CE, Luo MC. A second generation integrated map of the rainbow trout (Oncorhynchus mykiss) genome: analysis of conserved synteny with model fish genomes. Mar Biotechnol (NY) 2012; 14:343-357. [PMID: 22101344 DOI: 10.1007/s10126-011-9418-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 08/09/2011] [Accepted: 10/18/2011] [Indexed: 05/31/2023]
Abstract
DNA fingerprints and end sequences from bacterial artificial chromosomes (BACs) from two new libraries were generated to improve the first generation integrated physical and genetic map of the rainbow trout (Oncorhynchus mykiss) genome. The current version of the physical map is composed of 167,989 clones of which 158,670 are assembled into contigs and 9,319 are singletons. The number of contigs was reduced from 4,173 to 3,220. End sequencing of clones from the new libraries generated a total of 11,958 high quality sequence reads. The end sequences were used to develop 238 new microsatellites of which 42 were added to the genetic map. Conserved synteny between the rainbow trout genome and model fish genomes was analyzed using 188,443 BAC end sequence (BES) reads. The fractions of BES reads with significant BLASTN hits against the zebrafish, medaka, and stickleback genomes were 8.8%, 9.7%, and 10.5%, respectively, while the fractions of significant BLASTX hits against the zebrafish, medaka, and stickleback protein databases were 6.2%, 5.8%, and 5.5%, respectively. The overall number of unique regions of conserved synteny identified through grouping of the rainbow trout BES into fingerprinting contigs was 2,259, 2,229, and 2,203 for stickleback, medaka, and zebrafish, respectively. These numbers are approximately three to five times greater than those we have previously identified using BAC paired ends. Clustering of the conserved synteny analysis results by linkage groups as derived from the integrated physical and genetic map revealed that despite the low sequence homology, large blocks of macrosynteny are conserved between chromosome arms of rainbow trout and the model fish species.
Affiliation(s)
- Yniv Palti
- National Center for Cool and Cold Water Aquaculture, ARS-USDA, 11861 Leetown Road, Kearneysville, WV 25430, USA.
|
15
|
Advances in BAC-based physical mapping and map integration strategies in plants. J Biomed Biotechnol 2012; 2012:184854. [PMID: 22500080 PMCID: PMC3303678 DOI: 10.1155/2012/184854] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 07/31/2011] [Revised: 10/26/2011] [Accepted: 11/11/2011] [Indexed: 12/29/2022] Open
Abstract
With the advent of next-generation sequencing (NGS) platforms, the map-based sequencing strategy has recently been sidelined as too expensive and laborious. Detailed studies of NGS drafts alone, however, indicate that these assemblies remain far from gold-standard reference quality, especially for complex genomes. In this context, conventional BAC-based physical mapping has been identified as an important intermediate layer in current hybrid sequencing strategies. BAC-based physical map construction and its integration with high-density genetic maps have benefited from NGS and high-throughput array platforms. This paper addresses current advances in BAC-based physical mapping and high-throughput map integration strategies for obtaining densely anchored, well-ordered physical maps. The resulting maps are of immediate utility while providing a template to harness the maximum benefits of current NGS platforms.
|
16
|
de Boer JM, Borm TJA, Jesse T, Brugmans B, Wiggers-Perebolte L, de Leeuw L, Tang X, Bryan GJ, Bakker J, van Eck HJ, Visser RGF. A hybrid BAC physical map of potato: a framework for sequencing a heterozygous genome. BMC Genomics 2011; 12:594. [PMID: 22142254 PMCID: PMC3261212 DOI: 10.1186/1471-2164-12-594] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/18/2011] [Accepted: 12/05/2011] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Potato is the world's third most important food crop, yet cultivar improvement and genomic research in general remain difficult because of the heterozygous and tetraploid nature of its genome. The development of physical map resources that can facilitate genomic analyses in potato has so far been very limited. Here we present the methods of construction and the general statistics of the first two genome-wide BAC physical maps of potato, which were made from the heterozygous diploid clone RH89-039-16 (RH). RESULTS First, a gel electrophoresis-based physical map was made by AFLP fingerprinting of 64478 BAC clones, which were aligned into 4150 contigs with an estimated total length of 1361 Mb. Screening of BAC pools, followed by the KeyMaps in silico anchoring procedure, identified 1725 AFLP markers in the physical map, and 1252 BAC contigs were anchored to the ultradense potato genetic map. A second, sequence-tag-based physical map was constructed from 65919 whole genome profiling (WGP) BAC fingerprints and these were aligned into 3601 BAC contigs spanning 1396 Mb. The 39733 BAC clones that overlap between both physical maps provided anchors to 1127 contigs in the WGP physical map, and reduced the number of contigs to around 2800 in each map separately. Both physical maps were 1.64 times longer than the 850 Mb potato genome. Genome heterozygosity and incomplete merging of BAC contigs are two factors that can explain this map inflation. The contig information of both physical maps was united in a single table that describes the hybrid potato physical map. CONCLUSIONS The AFLP physical map has already been used by the Potato Genome Sequencing Consortium for sequencing 10% of the heterozygous genome of clone RH on a BAC-by-BAC basis. By layering a new WGP physical map on top of the AFLP physical map, a genetically anchored genome-wide framework of 322434 sequence tags has been created.
This reference framework can be used for anchoring and ordering of genomic sequences of clone RH (and other potato genotypes), and opens the possibility to finish sequencing of the RH genome in a more efficient way via high throughput next generation approaches.
Affiliation(s)
- Jan M de Boer
- Wageningen UR Plant Breeding, Wageningen University and Research Centre, Droevendaalstesteeg 1, 6708 PD Wageningen, The Netherlands.
|
17
|
Zuccolo A, Bowers JE, Estill JC, Xiong Z, Luo M, Sebastian A, Goicoechea JL, Collura K, Yu Y, Jiao Y, Duarte J, Tang H, Ayyampalayam S, Rounsley S, Kudrna D, Paterson AH, Pires JC, Chanderbali A, Soltis DE, Chamala S, Barbazuk B, Soltis PS, Albert VA, Ma H, Mandoli D, Banks J, Carlson JE, Tomkins J, dePamphilis CW, Wing RA, Leebens-Mack J. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure. Genome Biol 2011; 12:R48. [PMID: 21619600 PMCID: PMC3219971 DOI: 10.1186/gb-2011-12-5-r48] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 11/21/2010] [Revised: 05/19/2011] [Accepted: 05/27/2011] [Indexed: 01/19/2023] Open
Abstract
Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
Affiliation(s)
- Andrea Zuccolo
- Arizona Genomics Institute, School of Plant Sciences and BIO5 Institute for Collaborative Research, University of Arizona, 1657 East Helen Street, Tucson, AZ 85721, USA
|
18
|
Palti Y, Genet C, Luo MC, Charlet A, Gao G, Hu Y, Castaño-Sánchez C, Tabet-Canale K, Krieg F, Yao J, Vallejo RL, Rexroad CE. A first generation integrated map of the rainbow trout genome. BMC Genomics 2011; 12:180. [PMID: 21473775 PMCID: PMC3079668 DOI: 10.1186/1471-2164-12-180] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 12/22/2010] [Accepted: 04/07/2011] [Indexed: 01/13/2023] Open
Abstract
Background Rainbow trout (Oncorhynchus mykiss) are the most-widely cultivated cold freshwater fish in the world and an important model species for many research areas. Coupling great interest in this species as a research model with the need for genetic improvement of aquaculture production efficiency traits justifies the continued development of genomics research resources. Many quantitative trait loci (QTL) have been identified for production and life-history traits in rainbow trout. An integrated physical and genetic map is needed to facilitate fine mapping of QTL and the selection of positional candidate genes for incorporation in marker-assisted selection (MAS) programs for improving rainbow trout aquaculture production. Results The first generation integrated map of the rainbow trout genome is composed of 238 BAC contigs anchored to chromosomes of the genetic map. It covers more than 10% of the genome across segments from all 29 chromosomes. Anchoring of 203 contigs to chromosomes of the National Center for Cool and Cold Water Aquaculture (NCCCWA) genetic map was achieved through mapping of 288 genetic markers derived from BAC end sequences (BES), screening of the BAC library with previously mapped markers and matching of SNPs with BES reads. In addition, 35 contigs were anchored to linkage groups of the INRA (French National Institute of Agricultural Research) genetic map through markers that were not informative for linkage analysis in the NCCCWA mapping panel. The ratio of physical to genetic linkage distances varied substantially among chromosomes and BAC contigs with an average of 3,033 Kb/cM. Conclusions The integrated map described here provides a framework for a robust composite genome map for rainbow trout. This resource is needed for genomic analyses in this research model and economically important species and will facilitate comparative genome mapping with other salmonids and with model fish species. 
This resource will also facilitate efforts to assemble a whole-genome reference sequence for rainbow trout.
Affiliation(s)
- Yniv Palti
- National Center for Cool and Cold Water Aquaculture, ARS-USDA, Kearneysville, WV 25430, USA.
|
19
|
van Oeveren J, de Ruiter M, Jesse T, van der Poel H, Tang J, Yalcin F, Janssen A, Volpin H, Stormo KE, Bogden R, van Eijk MJT, Prins M. Sequence-based physical mapping of complex genomes by whole genome profiling. Genome Res 2011; 21:618-25. [PMID: 21324881 DOI: 10.1101/gr.112094.110] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Indexed: 12/14/2022]
Abstract
We present whole genome profiling (WGP), a novel next-generation sequencing-based physical mapping technology for construction of bacterial artificial chromosome (BAC) contigs of complex genomes, using Arabidopsis thaliana as an example. WGP leverages short read sequences derived from restriction fragments of two-dimensionally pooled BAC clones to generate sequence tags. These sequence tags are assigned to individual BAC clones, followed by assembly of BAC contigs based on shared regions containing identical sequence tags. Following in silico analysis of WGP sequence tags and simulation of a map of Arabidopsis chromosome 4 and maize, a WGP map of Arabidopsis thaliana ecotype Columbia was constructed de novo using a six-genome equivalent BAC library. Validation of the WGP map using the Columbia reference sequence confirmed that 350 BAC contigs (98%) were assembled correctly, spanning 97% of the 102-Mb calculated genome coverage. We demonstrate that WGP maps can also be generated for more complex plant genomes and will serve as excellent scaffolds to anchor genetic linkage maps and integrate whole genome sequence data.
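The contig-building step described above, grouping BAC clones that share identical sequence tags, can be illustrated with a toy example. This is only a sketch under simplifying assumptions, not the WGP pipeline: it uses plain single-linkage clustering with a fixed shared-tag threshold, whereas real assemblies rely on statistical overlap cutoffs and also order the clones. The clone names, tag labels, the `min_shared` parameter, and the function name `assemble_contigs` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def assemble_contigs(clone_tags: Dict[str, Set[str]], min_shared: int = 2) -> List[List[str]]:
    """Group clones into contigs when they share at least `min_shared` tags.

    Uses a small union-find structure so that overlaps chain transitively
    (clone 1 joins clone 3 if both overlap clone 2).
    """
    parent = {clone: clone for clone in clone_tags}

    def find(clone: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in combinations(clone_tags, 2):
        if len(clone_tags[a] & clone_tags[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for clone in clone_tags:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(members) for members in groups.values())


example_tags = {
    "bac1": {"t1", "t2", "t3"},
    "bac2": {"t2", "t3", "t4"},
    "bac3": {"t4", "t5", "t6"},
    "bac4": {"t8", "t9"},
}
print(assemble_contigs(example_tags))                # [['bac1', 'bac2'], ['bac3'], ['bac4']]
print(assemble_contigs(example_tags, min_shared=1))  # [['bac1', 'bac2', 'bac3'], ['bac4']]
```

Lowering the threshold merges more clones but, in a repetitive genome, would also admit more false joins, which is why the choice of cutoff matters so much in practice.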
|
20
|
Frenkel Z, Paux E, Mester D, Feuillet C, Korol A. LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes. BMC Bioinformatics 2010; 11:584. [PMID: 21118513 PMCID: PMC3098104 DOI: 10.1186/1471-2105-11-584] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 09/07/2010] [Accepted: 11/30/2010] [Indexed: 11/25/2022] Open
Abstract
Background Physical maps are the substrate of genome sequencing and map-based cloning and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contig lengths (of 3-5 clones before merging) as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs). Results To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework called Linear Topological Contig (LTC) were applied on datasets used previously for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was compared also on the simulated BAC libraries based on the known genome sequences for chromosome 1 of rice and chromosome 1 of maize. 
Conclusions The results show that compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
Affiliation(s)
- Zeev Frenkel
- University of Haifa, Institute of Evolution, Haifa 31905, Israel.
|
21
|
A first generation BAC-based physical map of the Asian seabass (Lates calcarifer). PLoS One 2010; 5:e11974. [PMID: 20700486 PMCID: PMC2916840 DOI: 10.1371/journal.pone.0011974] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 04/20/2010] [Accepted: 07/12/2010] [Indexed: 11/19/2022] Open
Abstract
Background The Asian seabass (Lates calcarifer) is an important marine foodfish species in Southeast Asia and Australia. Genetic improvement of this species has been achieved to some extent through selective breeding programs since the 1990s. Several genomic tools such as DNA markers, a linkage map, cDNA and BAC libraries have been developed to assist selective breeding. A physical map is still lacking, although it is essential for positional cloning of genes located in quantitative trait loci (QTL) and assembly of whole genome sequences. Methodology/Principal Findings A genome-wide physical map of the Asian seabass was constructed by restriction fingerprinting of 38,208 BAC clones with the SNaPshot HICF FPC technique. A total of 30,454 clones were assembled into 2,865 contigs. The physical length of the assembled contigs summed up to 665 Mb. Analyses of some contigs using different methods demonstrated the reliability of the assembly. Conclusions/Significance The present physical map is the first physical map for Asian seabass. This physical map will facilitate the fine mapping of QTL for economically important traits and the positional cloning of genes located in QTL. It will also be useful for whole genome sequencing and assembly. Detailed information about BAC contigs and BAC clones is available upon request.
|
22
|
Scalabrin S, Troggio M, Moroldo M, Pindo M, Felice N, Coppola G, Prete G, Malacarne G, Marconi R, Faes G, Jurman I, Grando S, Jesse T, Segala C, Valle G, Policriti A, Fontana P, Morgante M, Velasco R. Physical mapping in highly heterozygous genomes: a physical contig map of the Pinot Noir grapevine cultivar. BMC Genomics 2010; 11:204. [PMID: 20346114 PMCID: PMC2865496 DOI: 10.1186/1471-2164-11-204] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 09/15/2008] [Accepted: 03/26/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Most of the grapevine (Vitis vinifera L.) cultivars grown today are those selected centuries ago, even though grapevine is one of the most important fruit crops in the world. Grapevine has therefore not benefited from the advances in modern plant breeding nor more recently from those in molecular genetics and genomics: genes controlling important agronomic traits are practically unknown. A physical map is essential to positionally clone such genes and instrumental in a genome sequencing project. RESULTS We report on the first whole genome physical map of grapevine built using high information content fingerprinting of 49,104 BAC clones from the cultivar Pinot Noir. Pinot Noir, as most grape varieties, is highly heterozygous at the sequence level. This resulted in the two allelic haplotypes sometimes assembling into separate contigs that had to be accommodated in the map framework or in local expansions of contig maps. We performed computer simulations to assess the effects of increasing levels of sequence heterozygosity on BAC fingerprint assembly and showed that the experimental assembly results are in full agreement with the theoretical expectations, given the heterozygosity levels reported for grape. The map is anchored to a dense linkage map consisting of 994 markers. 436 contigs are anchored to the genetic map, covering 342 of the 475 Mb that make up the grape haploid genome. CONCLUSIONS We have developed a resource that makes it possible to access the grapevine genome, opening the way to a new era both in grape genetics and breeding and in wine making. The effects of heterozygosity on the assembly have been analyzed and characterized by using several complementary approaches which could be easily transferred to the study of other genomes which present the same features.
Affiliation(s)
- Simone Scalabrin
- Istituto di Genomica Applicata, Parco Scientifico e Tecnologico di Udine Luigi Danieli, Via J Linussio 51, 33100 Udine, Italy
|
23
|
Luo MC, Ma Y, You FM, Anderson OD, Kopecký D, Simková H, Safár J, Dolezel J, Gill B, McGuire PE, Dvorak J. Feasibility of physical map construction from fingerprinted bacterial artificial chromosome libraries of polyploid plant species. BMC Genomics 2010; 11:122. [PMID: 20170511 PMCID: PMC2836288 DOI: 10.1186/1471-2164-11-122] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 11/10/2009] [Accepted: 02/19/2010] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND The presence of closely related genomes in polyploid species makes the assembly of total genomic sequence from shotgun sequence reads produced by the current sequencing platforms exceedingly difficult, if not impossible. Genomes of polyploid species could be sequenced following the ordered-clone sequencing approach employing contigs of bacterial artificial chromosome (BAC) clones and BAC-based physical maps. Although BAC contigs can currently be constructed for virtually any diploid organism with the SNaPshot high-information-content-fingerprinting (HICF) technology, it is currently unknown if this is also true for polyploid species. It is possible that BAC clones from orthologous regions of homoeologous chromosomes would share numerous restriction fragments and be therefore included into common contigs. Because of this and other concerns, physical mapping utilizing the SNaPshot HICF of BAC libraries of polyploid species has not been pursued and the possibility of doing so has not been assessed. The sole exception has been in common wheat, an allohexaploid in which it is possible to construct single-chromosome or single-chromosome-arm BAC libraries from DNA of flow-sorted chromosomes and bypass the obstacles created by polyploidy. RESULTS The potential of the SNaPshot HICF technology for physical mapping of polyploid plants utilizing global BAC libraries was evaluated by assembling contigs of fingerprinted clones in an in silico merged BAC library composed of single-chromosome libraries of two wheat homoeologous chromosome arms, 3AS and 3DS, and complete chromosome 3B. Because the chromosome arm origin of each clone was known, it was possible to estimate the fidelity of contig assembly. On average 97.78% or more clones, depending on the library, were from a single chromosome arm. 
A large portion of the remaining clones was shown to be library contamination from other chromosomes, a feature that is unavoidable during the construction of single-chromosome BAC libraries. CONCLUSIONS The negligibly low level of incorporation of clones from homoeologous chromosome arms into a contig during contig assembly suggested that it is feasible to construct contigs and physical maps using global BAC libraries of wheat and almost certainly also of other plant polyploid species with genome sizes comparable to that of wheat. Because of the high purity of the resulting assembled contigs, they can be directly used for genome sequencing. It is currently unknown but possible that equally good BAC contigs can be also constructed for polyploid species containing smaller, more gene-rich genomes.
Affiliation(s)
- Ming-Cheng Luo
- Department of Plant Sciences, University of California, Davis, CA 95616, USA.
|
24
|
Zhou S, Wei F, Nguyen J, Bechner M, Potamousis K, Goldstein S, Pape L, Mehan MR, Churas C, Pasternak S, Forrest DK, Wise R, Ware D, Wing RA, Waterman MS, Livny M, Schwartz DC. A single molecule scaffold for the maize genome. PLoS Genet 2009; 5:e1000711. [PMID: 19936062 PMCID: PMC2774507 DOI: 10.1371/journal.pgen.1000711] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.7] [Received: 07/01/2009] [Accepted: 10/05/2009] [Indexed: 11/18/2022] Open
Abstract
About 85% of the maize genome consists of highly repetitive sequences that are interspersed by low-copy, gene-coding sequences. The maize community has dealt with this genomic complexity by the construction of an integrated genetic and physical map (iMap), but this resource alone was not sufficient for ensuring the quality of the current sequence build. For this purpose, we constructed a genome-wide, high-resolution optical map of the maize inbred line B73 genome containing >91,000 restriction sites (averaging 1 site/∼23 kb) accrued from mapping genomic DNA molecules. Our optical map comprises 66 contigs, averaging 31.88 Mb in size and spanning 91.5% (2,103.93 Mb/∼2,300 Mb) of the maize genome. A new algorithm was created that considered both optical map and unfinished BAC sequence data for placing 60/66 (2,032.42 Mb) optical map contigs onto the maize iMap. The alignment of optical maps against numerous data sources yielded comprehensive results that proved revealing and productive. For example, gaps were uncovered and characterized within the iMap, the FPC (fingerprinted contigs) map, and the chromosome-wide pseudomolecules. Such alignments also suggested amended placements of FPC contigs on the maize genetic map and proactively guided the assembly of chromosome-wide pseudomolecules, especially within complex genomic regions. Lastly, we think that the full integration of B73 optical maps with the maize iMap would greatly facilitate maize sequence finishing efforts that would make it a valuable reference for comparative studies among cereals, or other maize inbred lines and cultivars. The maize genome contains abundant repeats interspersed by low-copy, gene-coding sequences that make it a challenge to sequence; consequently, current BAC sequence assemblies average 11 contigs per clone. 
The iMap deals with such complexity by the judicious integration of IBM genetic and B73 physical maps, but the B73 genome structure could differ from the IBM population because of genetic recombination and subsequent rearrangements. Accordingly, we report a genome-wide, high-resolution optical map of maize B73 genome that was constructed from the direct analysis of genomic DNA molecules without using genetic markers. The integration of optical and iMap resources with comparisons to FPC maps enabled a uniquely comprehensive and scalable assessment of a given BAC's sequence assembly, its placement within a FPC contig, and the location of this FPC contig within a chromosome-wide pseudomolecule. As such, the overall utility of the maize optical map for the validation of sequence assemblies has been significant and demonstrates the inherent advantages of single molecule platforms. Construction of the maize optical map represents the first physical map of a eukaryotic genome larger than 400 Mb that was created de novo from individual genomic DNA molecules.
Affiliation(s)
- Shiguo Zhou
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Fusheng Wei
- Department of Plant Sciences, Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- John Nguyen
- Departments of Mathematics, Biology, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Mike Bechner
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Konstantinos Potamousis
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Steve Goldstein
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Louise Pape
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Michael R. Mehan
- Departments of Mathematics, Biology, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Chris Churas
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Shiran Pasternak
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Dan K. Forrest
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Roger Wise
- Corn Insects and Crop Genetics Research, United States Department of Agriculture–Agricultural Research Service and Department of Plant Pathology, Iowa State University, Ames, Iowa, United States of America
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Plant, Soil, and Nutrition Research, United States Department of Agriculture–Agricultural Research Service, Ithaca, New York, United States of America
- Rod A. Wing
- Department of Plant Sciences, Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Michael S. Waterman
- Departments of Mathematics, Biology, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Miron Livny
- Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- David C. Schwartz
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
|
25
|
The physical and genetic framework of the maize B73 genome. PLoS Genet 2009; 5:e1000715. [PMID: 19936061 PMCID: PMC2774505 DOI: 10.1371/journal.pgen.1000715] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Received: 07/14/2009] [Accepted: 10/12/2009] [Indexed: 11/19/2022] Open
Abstract
Maize is a major cereal crop and an important model system for basic biological research. Knowledge gained from maize research can also be used to genetically improve its grass relatives such as sorghum, wheat, and rice. The primary objective of the Maize Genome Sequencing Consortium (MGSC) was to generate a reference genome sequence that was integrated with both the physical and genetic maps. Using a previously published integrated genetic and physical map, combined with in-coming maize genomic sequence, new sequence-based genetic markers, and an optical map, we dynamically picked a minimum tiling path (MTP) of 16,910 bacterial artificial chromosome (BAC) and fosmid clones that were used by the MGSC to sequence the maize genome. The final MTP resulted in a significantly improved physical map that reduced the number of contigs from 721 to 435, incorporated a total of 8,315 mapped markers, and ordered and oriented the majority of FPC contigs. The new integrated physical and genetic map covered 2,120 Mb (93%) of the 2,300-Mb genome, of which 405 contigs were anchored to the genetic map, totaling 2,103.4 Mb (99.2% of the 2,120 Mb physical map). More importantly, 336 contigs, comprising 94.0% of the physical map (∼1,993 Mb), were ordered and oriented. Finally we used all available physical, sequence, genetic, and optical data to generate a golden path (AGP) of chromosome-based pseudomolecules, herein referred to as the B73 Reference Genome Sequence version 1 (B73 RefGen_v1). Maize has been a cultural icon and staple food crop of Americans since the discovery of the new world in 1492. Contemporary society is now faced with growing demands for food and fuel in the face of global climate change and the potential for increased disease pressure. To provide a comprehensive foundation to systematically understand maize biology with the goal of breeding higher yielding, disease-resistant, and drought-tolerant cultivars, our consortium sequenced the B73 genome of maize. 
In this study, we used a comprehensive physical and genetic framework map to develop a minimum tiling path (MTP) of over 16,000 BAC clones across the genome. The MTP was generated dynamically and integrated numerous data types, such as in-coming genome sequence, over 8,000 sequence-based genetic markers, and the maize optical map. This allowed us to genetically anchor, order, and orient the majority of the maize physical map and genome sequence to the genetic map. Post-genome sequencing, we constructed a golden path (AGP) of sequence-based pseudomolecules representing the ten chromosomes of the maize B73 genome (B73 RefGen_v1). This unprecedented integration of genetic, physical, and genomic sequence into one framework will greatly facilitate all aspects of plant biological research.
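The idea of a minimum tiling path can be sketched as a greedy interval cover. This is only an illustration under simplifying assumptions: clones are given as `(name, start, end)` intervals with known coordinates on one contig, whereas the MTP described above was picked dynamically from fingerprint, sequence, marker, and optical-map data. The function name and the toy coordinates are hypothetical.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover over clones given as (name, start, end) tuples.

    At each step, among the clones that start at or before the position
    covered so far, pick the one that extends furthest to the right.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Scan every not-yet-examined clone that overlaps the covered position.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Toy coordinates in kb (hypothetical).
toy_clones = [
    ("a", 0, 100),
    ("b", 20, 150),
    ("c", 90, 220),
    ("d", 140, 260),
    ("e", 210, 300),
]
toy_path = minimum_tiling_path(toy_clones, 0, 300)
```

A practical MTP picker would also need to enforce a minimum overlap between neighbouring clones so that the sequences can be joined; the sketch ignores that constraint.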
|
26
|
Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs. PLoS Genet 2009; 5:e1000740. [PMID: 19936069 PMCID: PMC2774520 DOI: 10.1371/journal.pgen.1000740] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Received: 07/03/2009] [Accepted: 10/24/2009] [Indexed: 11/29/2022] Open
Abstract
Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5′ and 3′ UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org). To complement the completion of sequencing the maize B73 genome, we sequenced 27,455 full-length cDNAs (FLcDNA) from two maize B73 libraries representing the gene transcripts from most tissues and common abiotic stress conditions. The FLcDNAs are beneficial in determining the exon/intron structure of genes by aligning them to the sequenced genome; 94% of our FLcDNAs aligned to the maize genome. 
The 27,455 FLcDNAs were compared to gene sequences for rice, sorghum, Arabidopsis, and poplar; 22,874 were found in all four sets, and 1,737 were unique to maize. Two-thirds of the maize genome is composed of a type of repetitive sequence called “transposable elements”; only 5.6% of the FLcDNA sequence contained any segment homologous to these repeats. In addition to our set, there are three other sets of maize FLcDNAs for a total of 69,306 gene transcripts, where many of them are from different maize lines (i.e. FLcDNAs often have only slight differences reflecting divergence). We assembled these together using parameters that would allow most alleles and recently diverged gene transcripts to align together, resulting in 46,739 unique gene transcripts.
|
27
|
Gu YQ, Ma Y, Huo N, Vogel JP, You FM, Lazo GR, Nelson WM, Soderlund C, Dvorak J, Anderson OD, Luo MC. A BAC-based physical map of Brachypodium distachyon and its comparative analysis with rice and wheat. BMC Genomics 2009; 10:496. [PMID: 19860896 PMCID: PMC2774330 DOI: 10.1186/1471-2164-10-496] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Received: 04/28/2009] [Accepted: 10/27/2009] [Indexed: 11/13/2022] Open
Abstract
Background Brachypodium distachyon (Brachypodium) has been recognized as a new model species for comparative and functional genomics of cereal and bioenergy crops because it possesses many biological attributes desirable in a model, such as a small genome size, short stature, self-pollinating habit, and short generation cycle. To maximize the utility of Brachypodium as a model for basic and applied research it is necessary to develop genomic resources for it. A BAC-based physical map is one of them. A physical map will facilitate analysis of genome structure, comparative genomics, and assembly of the entire genome sequence. Results A total of 67,151 Brachypodium BAC clones were fingerprinted with the SNaPshot HICF fingerprinting method and a genome-wide physical map of the Brachypodium genome was constructed. The map consisted of 671 contigs and 2,161 clones remained as singletons. The contigs and singletons spanned 414 Mb. A total of 13,970 gene-related sequences were detected in the BAC end sequences (BES). These gene tags aligned 345 contigs with 336 Mb of rice genome sequence, showing that Brachypodium and rice genomes are generally highly colinear. Divergent regions were mainly in the rice centromeric regions. A dot-plot of Brachypodium contigs against the rice genome sequences revealed remnants of the whole-genome duplication caused by paleotetraploidy, which were previously found in rice and sorghum. Brachypodium contigs were anchored to the wheat deletion bin maps with the BES gene-tags, opening the door to Brachypodium-Triticeae comparative genomics. Conclusion The construction of the Brachypodium physical map, and its comparison with the rice genome sequence demonstrated the utility of the SNaPshot-HICF method in the construction of BAC-based physical maps. The map represents an important genomic resource for the completion of Brachypodium genome sequence and grass comparative genomics. 
A draft of the physical map and its comparisons with rice and wheat are available at .
Affiliation(s)
- Yong Q Gu
- Genomics and Gene Discovery Research Unit, USDA-ARS, Western Regional Research Center, 800 Buchanan Street, Albany, CA 94710, USA.
|
28
|
Palti Y, Luo MC, Hu Y, Genet C, You FM, Vallejo RL, Thorgaard GH, Wheeler PA, Rexroad CE. A first generation BAC-based physical map of the rainbow trout genome. BMC Genomics 2009; 10:462. [PMID: 19814815 PMCID: PMC2763887 DOI: 10.1186/1471-2164-10-462] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 07/17/2009] [Accepted: 10/08/2009] [Indexed: 01/09/2023] Open
Abstract
Background Rainbow trout (Oncorhynchus mykiss) are the most-widely cultivated cold freshwater fish in the world and an important model species for many research areas. Coupling great interest in this species as a research model with the need for genetic improvement of aquaculture production efficiency traits justifies the continued development of genomics research resources. Many quantitative trait loci (QTL) have been identified for production and life-history traits in rainbow trout. A bacterial artificial chromosome (BAC) physical map is needed to facilitate fine mapping of QTL and the selection of positional candidate genes for incorporation in marker-assisted selection (MAS) for improving rainbow trout aquaculture production. This resource will also facilitate efforts to obtain and assemble a whole-genome reference sequence for this species. Results The physical map was constructed from DNA fingerprinting of 192,096 BAC clones using the 4-color high-information content fingerprinting (HICF) method. The clones were assembled into physical map contigs using the finger-printing contig (FPC) program. The map is composed of 4,173 contigs and 9,379 singletons. The total number of unique fingerprinting fragments (consensus bands) in contigs is 1,185,157, which corresponds to an estimated physical length of 2.0 Gb. The map assembly was validated by 1) comparison with probe hybridization results and agarose gel fingerprinting contigs; and 2) anchoring large contigs to the microsatellite-based genetic linkage map. Conclusion The production and validation of the first BAC physical map of the rainbow trout genome is described in this paper. We are currently integrating this map with the NCCCWA genetic map using more than 200 microsatellites isolated from BAC end sequences and by identifying BACs that harbor more than 300 previously mapped markers. 
The availability of an integrated physical and genetic map will enable detailed comparative genome analyses, fine mapping of QTL, positional cloning, selection of positional candidate genes for economically important traits and the incorporation of MAS into rainbow trout breeding programs.
Affiliation(s)
- Yniv Palti
- National Center for Cool and Cold Water Aquaculture, ARS-USDA, Kearneysville, WV 25430, USA.
|
29
|
Bozdag S, Close TJ, Lonardi S. A compartmentalized approach to the assembly of physical maps. BMC Bioinformatics 2009; 10:217. [PMID: 19604400 PMCID: PMC2717093 DOI: 10.1186/1471-2105-10-217] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/30/2008] [Accepted: 07/15/2009] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Physical maps have been historically one of the cornerstones of genome sequencing and map-based cloning strategies. They also support marker assisted breeding and EST mapping. The problem of building a high quality physical map is computationally challenging due to unavoidable noise in the input fingerprint data. RESULTS We propose a novel compartmentalized method for the assembly of high quality physical maps from fingerprinted clones. The knowledge of genetic markers enables us to group clones into clusters so that clones in the same cluster are more likely to overlap. For each cluster of clones, a local physical map is first constructed using FingerPrinted Contigs (FPC). Then, all the individual maps are carefully merged into the final physical map. Experimental results on the genomes of rice and barley demonstrate that the compartmentalized assembly produces significantly more accurate maps, and that it can detect and isolate clones that would induce "chimeric" contigs if used in the final assembly. CONCLUSION The software is available for download at http://www.cs.ucr.edu/~sbozdag/assembler/
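The marker-based grouping step can be pictured with a short sketch. It assumes a mapping from each clone to the set of genetic markers it carries and groups clones that share a marker, directly or through a chain of other clones, using a union-find structure. This is one simple way to form such clusters and may differ from the authors' actual procedure; the names and toy data are hypothetical.

```python
def cluster_clones_by_marker(clone_markers):
    """Group clones that share at least one marker, directly or transitively.

    `clone_markers` maps a clone name to a set of marker names.
    Returns a sorted list of clusters, each a sorted list of clone names.
    """
    parent = {clone: clone for clone in clone_markers}

    def find(x):
        # Follow parent links to the cluster root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_clone_for_marker = {}
    for clone, markers in clone_markers.items():
        for marker in markers:
            if marker in first_clone_for_marker:
                # Merge this clone's cluster with the cluster already holding the marker.
                parent[find(clone)] = find(first_clone_for_marker[marker])
            else:
                first_clone_for_marker[marker] = clone

    clusters = {}
    for clone in clone_markers:
        clusters.setdefault(find(clone), set()).add(clone)
    return sorted(sorted(members) for members in clusters.values())


# Toy input (hypothetical): c5 carries no marker and stays on its own.
toy_markers = {
    "c1": {"m1"},
    "c2": {"m1", "m2"},
    "c3": {"m2"},
    "c4": {"m9"},
    "c5": set(),
}
toy_clusters = cluster_clones_by_marker(toy_markers)
```

Each resulting cluster would then be assembled separately, for example with FPC, before the local maps are merged.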
Affiliation(s)
- Serdar Bozdag
- National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
|
30
|
Aggarwal R, Benatti TR, Gill N, Zhao C, Chen MS, Fellers JP, Schemerhorn BJ, Stuart JJ. A BAC-based physical map of the Hessian fly genome anchored to polytene chromosomes. BMC Genomics 2009; 10:293. [PMID: 19573234 PMCID: PMC2709663 DOI: 10.1186/1471-2164-10-293] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 01/13/2009] [Accepted: 07/02/2009] [Indexed: 11/27/2022] Open
Abstract
Background The Hessian fly (Mayetiola destructor) is an important insect pest of wheat. It has tractable genetics, polytene chromosomes, and a small genome (158 Mb). Investigation of the Hessian fly presents excellent opportunities to study plant-insect interactions and the molecular mechanisms underlying genome imprinting and chromosome elimination. A physical map is needed to improve the ability to perform both positional cloning and comparative genomic analyses with the fully sequenced genomes of other dipteran species. Results An FPC-based genome wide physical map of the Hessian fly was constructed and anchored to the insect's polytene chromosomes. Bacterial artificial chromosome (BAC) clones corresponding to 12-fold coverage of the Hessian fly genome were fingerprinted, using high information content fingerprinting (HICF) methodology, and end-sequenced. Fluorescence in situ hybridization (FISH) co-localized two BAC clones from each of the 196 longest contigs on the polytene chromosomes. An additional 70 contigs were positioned using a single FISH probe. The 266 FISH mapped contigs were evenly distributed and covered 60% of the genome (95,668 kb). The ends of the fingerprinted BACs were then sequenced to develop the capacity to create sequence-tagged site (STS) markers on the BACs in the map. Only 3.64% of the BAC-end sequence was composed of transposable elements, helicases, ribosomal repeats, simple sequence repeats, and sequences of low complexity. A relatively large fraction (14.27%) of the BES was comprised of multi-copy gene sequences. Nearly 1% of the end sequence was composed of simple sequence repeats (SSRs). Conclusion This physical map provides the foundation for high-resolution genetic mapping, map-based cloning, and assembly of complete genome sequencing data. 
The results indicate that restriction fragment length heterogeneity in BAC libraries used to construct physical maps lowers the length and the depth of the contigs, but is not an absolute barrier to the successful application of the technology. This map will serve as a genomic resource for accelerating gene discovery, genome sequencing, and the assembly of BAC sequences. The Hessian fly BAC-clone assembly, and the names and positions of the BAC clones used in the FISH experiments are publicly available at .
Affiliation(s)
- Rajat Aggarwal
- Department of Entomology, Purdue University, West Lafayette, IN 47907, USA.
|
31
|
Scalabrin S, Morgante M, Policriti A. Automated FingerPrint Background removal: FPB. BMC Bioinformatics 2009; 10:127. [PMID: 19405935 PMCID: PMC2689866 DOI: 10.1186/1471-2105-10-127] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/29/2008] [Accepted: 04/30/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The construction of a whole-genome physical map has been an essential component of numerous genome projects initiated since the inception of the Human Genome Project. Its usefulness has been proved for whole-genome shotgun projects as a post-assembly validation and recently it has also been used in the assembly step to constrain on BACs positions. Fingerprinting is usually the method of choice for construction of physical maps. A clone fingerprint is composed of true peaks representing real fragments and background peaks, mainly composed of E. coli genomic DNA, partial digestions, star activity by-products, and machine background. High-throughput fingerprinting leads to the production of thousands of BAC clone fingerprints per day. That is why background peaks removal has become an important issue and needs to be automatized, especially in capillary electrophoresis based fingerprints. RESULTS At the moment, the only tools available for such a task are GenoProfiler and its descendant FPMiner. The large variation in the quality of fingerprints that is usually present in large fingerprinting projects represents a major difficulty in the correct removal of background peaks that has only been partially addressed by the methods so far adopted that all require a long manual optimization of parameters. Thus, we implemented a new data-independent tool, FPB (FingerPrint Background removal), suitable for large scale projects as well as mapping of few clones. CONCLUSION FPB is freely available at http://www.appliedgenomics.org/tools.php. FPB was used to remove the background from all fingerprints of three grapevine physical map projects. The first project consists of about 50,000 fingerprints, the second one consists of about 70,000 fingerprints, and the third one consists of about 45,000 fingerprints. In all cases a successful assembly was built.
Affiliation(s)
- Simone Scalabrin
- Istituto di Genomica Applicata (IGA), via J. Linussio 51, I-33100 Udine, Italy.
|
32
|
Abstract
Recent advances in both clone fingerprinting and draft sequencing technology have made it increasingly common for species to have a bacterial artificial chromosome (BAC) fingerprint map, BAC end sequences (BESs) and draft genomic sequence. The FPC (fingerprinted contigs) software package contains three modules that maximize the value of these resources. The BSS (blast some sequence) module provides a way to easily view the results of aligning draft sequence to the BESs, and integrates the results with the following two modules. The MTP (minimal tiling path) module uses sequence and fingerprints to determine a minimal tiling path of clones. The DSI (draft sequence integration) module aligns draft sequences to FPC contigs, displays them alongside the contigs and identifies potential discrepancies; the alignment can be based on either individual BES alignments to the draft, or on the locations of BESs that have been assembled into the draft. FPC also supports high-throughput fingerprint map generation as its time-intensive functions have been parallelized for Unix-based desktops or servers with multiple CPUs. Simulation results are provided for the MTP, DSI and parallelization. These features are in the FPC V9.3 software package, which is freely available.
Affiliation(s)
- William Nelson
- Arizona Genomics Computational Laboratory, BIO5 Institute, University of Arizona, Tucson, AZ, USA
|
33
|
Messing J. Synergy of two reference genomes for the grass family. PLANT PHYSIOLOGY 2009; 149:117-24. [PMID: 19126702 PMCID: PMC2613724 DOI: 10.1104/pp.108.128520] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/28/2008] [Accepted: 10/10/2008] [Indexed: 05/19/2023]
Affiliation(s)
- Joachim Messing
- Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey 08854-8020, USA.
|
34
|
Nelson W, Luo M, Ma J, Estep M, Estill J, He R, Talag J, Sisneros N, Kudrna D, Kim H, Ammiraju JSS, Collura K, Bharti AK, Messing J, Wing RA, SanMiguel P, Bennetzen JL, Soderlund C. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains. BMC Genomics 2008; 9:621. [PMID: 19099592 PMCID: PMC2628917 DOI: 10.1186/1471-2164-9-621] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/03/2008] [Accepted: 12/19/2008] [Indexed: 11/30/2022] Open
Abstract
Background Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. 
Although the types and natures of epigenetic boundaries are barely understood at this time, MSLL technology flags both approximate boundaries and methylated genes that deserve additional investigation. MSLL and HMPR sequences provide a valuable resource for maize genome annotation, and are a uniquely valuable complement to any plant genome sequencing project. In order to make these results fully accessible to the community, a web display was developed that shows the alignment of MSLL, HMPR, and other gene-rich sequences to the BACs; this display is continually updated with the latest ESTs and BAC sequences.
Affiliation(s)
- William Nelson
- Arizona Genomics Computational Laboratory, BIO5 Institute, University of Arizona, Tucson, Arizona, USA.
|
35
|
Holding DR, Larkins BA. Zein Storage Proteins. MOLECULAR GENETIC APPROACHES TO MAIZE IMPROVEMENT 2008. [DOI: 10.1007/978-3-540-68922-5_19] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
|
36
|
Kurtz S, Narechania A, Stein JC, Ware D. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics 2008; 9:517. [PMID: 18976482 PMCID: PMC2613927 DOI: 10.1186/1471-2164-9-517] [Citation(s) in RCA: 168] [Impact Index Per Article: 10.5] [Received: 07/21/2008] [Accepted: 10/31/2008] [Indexed: 12/02/2022] Open
Abstract
Background The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of k-mers, has been previously used to distinguish TEs from low-copy genic regions; but currently available software solutions are impractical due to high memory requirements or specialization for specific user-tasks. Results Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the k-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10⁹ bp). We analyzed k-mer frequencies for a wide range of k. At this low genome coverage (≈ 0.45×) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible k-mers. Similar low-complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-C0t derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy k-mers, suggesting that transposons are still active in maize. 
Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across cultivars. Among one hundred annotated bacterial artificial chromosomes (BACs), k-mer frequency could be used to detect transposon-encoded genes with 92% sensitivity, compared to 96% using alignment-based repeat masking, while both methods showed 92% specificity. Conclusion The Tallymer software was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available. For more information on the software, see .
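The k-mer counting idea behind this abstract can be illustrated with a minimal Python sketch. It is not Tallymer and does not use enhanced suffix arrays; it substitutes a plain in-memory hash counter, which only suits small inputs. The function names, the toy sequence, and the `min_copies` threshold are assumptions made for illustration. The sketch reports the two kinds of fraction the abstract contrasts: the share of distinct k-mers that are high-copy, and the share of all k-mer occurrences they account for.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq (naive hash-based counting)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeat_summary(counts, min_copies):
    """Return (fraction of distinct k-mers, fraction of occurrences)
    contributed by k-mers seen at least min_copies times."""
    total_occurrences = sum(counts.values())
    repeated = {kmer: c for kmer, c in counts.items() if c >= min_copies}
    distinct_fraction = len(repeated) / len(counts)
    occurrence_fraction = sum(repeated.values()) / total_occurrences
    return distinct_fraction, occurrence_fraction


# Hypothetical toy input: a short tandem repeat followed by unique sequence.
toy = "ACGT" * 5 + "TTGACCA"
counts = count_kmers(toy, k=4)
distinct_frac, occurrence_frac = repeat_summary(counts, min_copies=4)
print(f"high-copy share of distinct 4-mers: {distinct_frac:.2f}")
print(f"high-copy share of 4-mer occurrences: {occurrence_frac:.2f}")
```

The maize result in the abstract has the same shape: a small share of k-mers (1%) accounts for a large share of the sequence (44%).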
Affiliation(s)
- Stefan Kurtz
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany.
|
37
|
Moroldo M, Paillard S, Marconi R, Legeai F, Canaguier A, Cruaud C, De Berardinis V, Guichard C, Brunaud V, Le Clainche I, Scalabrin S, Testolin R, Di Gaspero G, Morgante M, Adam-Blondon AF. A physical map of the heterozygous grapevine 'Cabernet Sauvignon' allows mapping candidate genes for disease resistance. BMC Plant Biology 2008; 8:66. [PMID: 18554400 PMCID: PMC2442077 DOI: 10.1186/1471-2229-8-66] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 02/26/2008] [Accepted: 06/13/2008] [Indexed: 05/18/2023]
Abstract
BACKGROUND: Whole-genome physical maps facilitate genome sequencing, sequence assembly, mapping of candidate genes, and the design of targeted genetic markers. An automated protocol was used to construct a Vitis vinifera 'Cabernet Sauvignon' physical map. The quality of the result was addressed with regard to the effect of high heterozygosity on the accuracy of contig assembly. Its usefulness for the genome-wide mapping of genes for disease resistance, which is an important trait for grapevine, was then assessed.
RESULTS: The physical map included 29,727 BAC clones assembled into 1,770 contigs, spanning 715,684 kbp, and corresponding to 1.5-fold the genome size. Map inflation was due to high heterozygosity, which caused either the separation of allelic BACs in two different contigs, or local mis-assembly in contigs containing BACs from the two haplotypes. Genetic markers anchored 395 contigs or 255,476 kbp to chromosomes. The fully automated assembly and anchorage procedures were validated by BAC-by-BAC BLAST of the end sequences against the grape genome sequence, unveiling 7.3% of chimerical contigs. The distribution across the physical map of candidate genes for non-host and host resistance, and for defence signalling pathways was then studied. NBS-LRR and RLK genes for host resistance were found in 424 contigs, 133 of them (32%) were assigned to chromosomes, on which they are mostly organised in clusters. Non-host and defence signalling genes were found in 99 contigs dispersed without a discernible pattern across the genome.
CONCLUSION: Despite some limitations that interfere with the correct assembly of heterozygous clones into contigs, the 'Cabernet Sauvignon' physical map is a useful and reliable intermediary step between a genetic map and the genome sequence. This tool was successfully exploited for a quick mapping of complex families of genes, and it strengthened previous clues of co-localisation of major NBS-LRR clusters and disease resistance loci in grapevine.
Affiliation(s)
- Marco Moroldo
- UMR de Génomique Végétale, INRA-CNRS-UEVE, 2, Rue Gaston Crémieux, CP5708, 91057 Evry Cedex, France
| | - Sophie Paillard
- UMR de Génomique Végétale, INRA-CNRS-UEVE, 2, Rue Gaston Crémieux, CP5708, 91057 Evry Cedex, France
- UMR118, INRA-Agrocampus, University of Rennes, Amélioration des Plantes et Biotechnologies Végétales, F-35650 Le Rheu, France
| | - Raffaella Marconi
- Dipartimento di Scienze Agrarie e Ambientali, University of Udine, via delle Scienze 208, 33100 Udine, Italy
| | - Fabrice Legeai
- Unité de Recherche Génomique-Info, URGI, Tour Evry 2, 523, Place des Terrasses de l'Agora, 91034 Evry Cedex, France
| | - Aurelie Canaguier
- UMR de Génomique Végétale, INRA-CNRS-UEVE, 2, Rue Gaston Crémieux, CP5708, 91057 Evry Cedex, France
| | - Corinne Cruaud
- Genoscope, 2, rue Gaston Crémieux, CP5706, 91057 Evry Cedex, France
| | | | - Cecile Guichard
- UMR de Génomique Végétale, INRA-CNRS-UEVE, 2, Rue Gaston Crémieux, CP5708, 91057 Evry Cedex, France
| | - Veronique Brunaud
- UMR de Génomique Végétale, INRA-CNRS-UEVE, 2, Rue Gaston Crémieux, CP5708, 91057 Evry Cedex, France
| | - Isabelle Le Clainche
- UMR de Génomique Végétale, INRA-CNRS-UEVE, 2, Rue Gaston Crémieux, CP5708, 91057 Evry Cedex, France
| | - Simone Scalabrin
- Dipartimento di Scienze Matematiche, University of Udine, via delle Scienze 208, 33100 Udine, Italy
- Istituto di Genomica Applicata, Parco Scientifico e Tecnologico Luigi Danieli, via Jacopo Linussio 51, 33100 Udine, Italy
| | - Raffaele Testolin
- Dipartimento di Scienze Agrarie e Ambientali, University of Udine, via delle Scienze 208, 33100 Udine, Italy
- Istituto di Genomica Applicata, Parco Scientifico e Tecnologico Luigi Danieli, via Jacopo Linussio 51, 33100 Udine, Italy
| | - Gabriele Di Gaspero
- Dipartimento di Scienze Agrarie e Ambientali, University of Udine, via delle Scienze 208, 33100 Udine, Italy
- Istituto di Genomica Applicata, Parco Scientifico e Tecnologico Luigi Danieli, via Jacopo Linussio 51, 33100 Udine, Italy
| | - Michele Morgante
- Dipartimento di Scienze Agrarie e Ambientali, University of Udine, via delle Scienze 208, 33100 Udine, Italy
- Istituto di Genomica Applicata, Parco Scientifico e Tecnologico Luigi Danieli, via Jacopo Linussio 51, 33100 Udine, Italy
|
38
|
Mun JH, Kwon SJ, Yang TJ, Kim HS, Choi BS, Baek S, Kim JS, Jin M, Kim JA, Lim MH, Lee SI, Kim HI, Kim H, Lim YP, Park BS. The first generation of a BAC-based physical map of Brassica rapa. BMC Genomics 2008; 9:280. [PMID: 18549474 PMCID: PMC2432078 DOI: 10.1186/1471-2164-9-280] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Received: 11/06/2007] [Accepted: 06/12/2008] [Indexed: 11/30/2022] Open
Abstract
Background: The genus Brassica includes the most extensively cultivated vegetable crops worldwide. Investigation of the Brassica genome presents excellent challenges to study plant genome evolution and divergence of gene function associated with polyploidy and genome hybridization. A physical map of the B. rapa genome is a fundamental tool for analysis of Brassica "A" genome structure. Integration of a physical map with an existing genetic map by linking genetic markers and BAC clones in the sequencing pipeline provides a crucial resource for the ongoing genome sequencing effort and assembly of whole genome sequences.
Results: A genome-wide physical map of the B. rapa genome was constructed by the capillary electrophoresis-based fingerprinting of 67,468 Bacterial Artificial Chromosome (BAC) clones using the five restriction enzyme SNaPshot technique. The clones were assembled into contigs by means of FPC v8.5.3. After contig validation and manual editing, the resulting contig assembly consists of 1,428 contigs and is estimated to span 717 Mb in physical length. This map provides 242 anchored contigs on 10 linkage groups to serve as seed points from which to continue bidirectional chromosome extension for genome sequencing.
Conclusion: The map reported here is the first physical map for Brassica "A" genome based on the High Information Content Fingerprinting (HICF) technique. This physical map will serve as a fundamental genomic resource for accelerating genome sequencing, assembly of BAC sequences, and comparative genomics between Brassica genomes. The current build of the B. rapa physical map is available at the B. rapa Genome Project website for the user community.
Affiliation(s)
- Jeong-Hwan Mun
- Brassica Genomics Team, National Institute of Agricultural Biotechnology, Rural Development Administration, 225 Seodun-dong, Gwonseon-gu, Suwon 441-707, South Korea.
|
39
|
Wei F, Coe E, Nelson W, Bharti AK, Engler F, Butler E, Kim H, Goicoechea JL, Chen M, Lee S, Fuks G, Sanchez-Villeda H, Schroeder S, Fang Z, McMullen M, Davis G, Bowers JE, Paterson AH, Schaeffer M, Gardiner J, Cone K, Messing J, Soderlund C, Wing RA. Physical and genetic structure of the maize genome reflects its complex evolutionary history. PLoS Genet 2007; 3:e123. [PMID: 17658954 PMCID: PMC1934398 DOI: 10.1371/journal.pgen.0030123] [Citation(s) in RCA: 228] [Impact Index Per Article: 14.3] [Received: 02/22/2007] [Accepted: 06/11/2007] [Indexed: 11/21/2022] Open
Abstract
Maize (Zea mays L.) is one of the most important cereal crops and a model for the study of genetics, evolution, and domestication. To better understand maize genome organization and to build a framework for genome sequencing, we constructed a sequence-ready fingerprinted contig-based physical map that covers 93.5% of the genome, of which 86.1% is aligned to the genetic map. The fingerprinted contig map contains 25,908 genic markers that enabled us to align nearly 73% of the anchored maize genome to the rice genome. The distribution pattern of expressed sequence tags correlates to that of recombination. In collinear regions, 1 kb in rice corresponds to an average of 3.2 kb in maize, yet maize has a 6-fold genome size expansion. This can be explained by the fact that most rice regions correspond to two regions in maize as a result of its recent polyploid origin. Inversions account for the majority of chromosome structural variations during subsequent maize diploidization. We also find clear evidence of ancient genome duplication predating the divergence of the progenitors of maize and rice. Reconstructing the paleoethnobotany of the maize genome indicates that the progenitors of modern maize contained ten chromosomes.
As a cash crop and a model biological system, maize is of great public interest. To facilitate maize molecular breeding and its basic biology research, we built a high-resolution physical map with two different fingerprinting methods on the same set of bacterial artificial chromosome clones. The physical map was integrated to a high-density genetic map and further serves as a framework for the maize genome-sequencing project. Comparative genomics showed that the euchromatic regions between rice and maize are very conserved. Physically we delimited these conserved regions and thus detected many genome rearrangements. We defined extensively the duplication blocks within the maize genome. These blocks allowed us to reconstruct the chromosomes of the maize progenitor. We detected that maize genome has experienced two rounds of genome duplications, an ancient one before maize–rice divergence and a recent one after tetraploidization.
Affiliation(s)
- Fusheng Wei
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - Ed Coe
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
- Plant Genetics Research Unit, Agricultural Research Service, United States Department of Agriculture, Columbia, Missouri, United States of America
| | - William Nelson
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Arizona Genomics Computational Laboratory, University of Arizona, Tucson, Arizona, United States of America
| | - Arvind K Bharti
- Plant Genome Initiative at Rutgers, Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
| | - Fred Engler
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Arizona Genomics Computational Laboratory, University of Arizona, Tucson, Arizona, United States of America
| | - Ed Butler
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - HyeRan Kim
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - Jose Luis Goicoechea
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - Mingsheng Chen
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - Seunghee Lee
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - Galina Fuks
- Plant Genome Initiative at Rutgers, Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
| | - Hector Sanchez-Villeda
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
| | - Steven Schroeder
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
| | - Zhiwei Fang
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
| | - Michael McMullen
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
- Plant Genetics Research Unit, Agricultural Research Service, United States Department of Agriculture, Columbia, Missouri, United States of America
| | - Georgia Davis
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
| | - John E Bowers
- Plant Genome Mapping Laboratory, Departments of Crop and Soil Science, Plant Biology, and Genetics, University of Georgia, Athens, Georgia, United States of America
| | - Andrew H Paterson
- Plant Genome Mapping Laboratory, Departments of Crop and Soil Science, Plant Biology, and Genetics, University of Georgia, Athens, Georgia, United States of America
| | - Mary Schaeffer
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
- Plant Genetics Research Unit, Agricultural Research Service, United States Department of Agriculture, Columbia, Missouri, United States of America
| | - Jack Gardiner
- Division of Plant Sciences, University of Missouri, Columbia, Missouri, United States of America
| | - Karen Cone
- Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America
| | - Joachim Messing
- Plant Genome Initiative at Rutgers, Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
| | - Carol Soderlund
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Arizona Genomics Computational Laboratory, University of Arizona, Tucson, Arizona, United States of America
- * To whom correspondence should be addressed (CS, RAW).
| | - Rod A Wing
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Department of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- * To whom correspondence should be addressed (CS, RAW).
|
40
|
Mathewson CA, Schein JE, Marra MA. Large-scale BAC clone restriction digest fingerprinting. Curr Protoc Hum Genet 2008; Chapter 5:Unit 5.19. [PMID: 18428413 DOI: 10.1002/0471142905.hg0519s53] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Restriction digest fingerprinting is a common method for characterizing large insert genomic clones, e.g., bacterial artificial chromosome (BAC), P1 artificial chromosome (PAC) and Fosmid clones. This clone fingerprinting method has been widely applied in the construction of clone-based physical maps, which have been used as positional cloning resources as well as to support directed and genome-wide sequencing efforts. This unit describes a robust, large-scale procedure for generation of agarose gel-based clone fingerprints from BAC clones.
Affiliation(s)
- Carrie A Mathewson
- Canada's Michael Smith Genome Sciences Center, Vancouver, British Columbia, Canada
|
41
|
Chanderbali AS, Albert VA, Ashworth VETM, Clegg MT, Litz RE, Soltis DE, Soltis PS. Persea americana (avocado): bringing ancient flowers to fruit in the genomics era. Bioessays 2008; 30:386-96. [PMID: 18348249 DOI: 10.1002/bies.20721] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 11/07/2022]
Abstract
The avocado (Persea americana) is a major crop commodity worldwide. Moreover, avocado, a paleopolyploid, is an evolutionary "outpost" among flowering plants, representing a basal lineage (the magnoliid clade) near the origin of the flowering plants themselves. Following centuries of selective breeding, avocado germplasm has been characterized at the level of microsatellite and RFLP markers. Nonetheless, little is known beyond these general diversity estimates, and much work remains to be done to develop avocado as a major subtropical-zone crop. Among the goals of avocado improvement are to develop varieties with fruit that will "store" better on the tree, show uniform ripening and have better post-harvest storage. Avocado transcriptome sequencing, genome mapping and partial genomic sequencing will represent a major step toward the goal of sequencing the entire avocado genome, which is expected to aid in improving avocado varieties and production, as well as understanding the evolution of flowers from non-flowering seed plants (gymnosperms). Additionally, continued evolutionary and other comparative studies of flower and fruit development in different avocado strains can be accomplished at the gene expression level, including in comparison with avocado relatives, and these should provide important insights into the genetic regulation of fruit development in basal angiosperms.
|
42
|
Choi JH, Kim S, Tang H, Andrews J, Gilbert DG, Colbourne JK. A machine-learning approach to combined evidence validation of genome assemblies. Bioinformatics 2008; 24:744-50. [PMID: 18204064 DOI: 10.1093/bioinformatics/btm608] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 01/25/2023]
Abstract
MOTIVATION: While it is common to refer to 'the genome sequence' as if it were a single, complete and contiguous DNA string, it is in fact an assembly of millions of small, partially overlapping DNA fragments. Sophisticated computer algorithms (assemblers and scaffolders) merge these DNA fragments into contigs, and place these contigs into sequence scaffolds using the paired-end sequences derived from large-insert DNA libraries. Each step in this automated process is susceptible to producing errors; hence, the resulting draft assembly represents (in practice) only a likely assembly that requires further validation. Knowing which parts of the draft assembly are likely free of errors is critical if researchers are to draw reliable conclusions from the assembled sequence data.
RESULTS: We develop a machine-learning method to detect assembly errors in sequence assemblies. Several in silico measures for assembly validation have been proposed by various researchers. Using three benchmarking Drosophila draft genomes, we evaluate these techniques along with some new measures that we propose, including the good-minus-bad coverage (GMB), the good-to-bad-ratio (RGB), the average Z-score (AZ) and the average absolute Z-score (ASZ). Our results show that the GMB measure performs better than the others in both its sensitivity and its specificity for assembly error detection. Nevertheless, no single method performs sufficiently well to reliably detect genomic regions requiring attention for further experimental verification. To utilize the advantages of all these measures, we develop a novel machine learning approach that combines these individual measures to achieve a higher prediction accuracy (i.e. greater than 90%). Our combined evidence approach avoids the difficult and often ad hoc selection of many parameters the individual measures require, and significantly improves the overall precisions on the benchmarking data sets.
Affiliation(s)
- Jeong-Hyeon Choi
- The Center for Genomics and Bioinformatics, School of Informatics and Department of Biology, Indiana University, IN 47405, USA
|
43
|
Han Y, Gasic K, Korban SS. Multiple-copy cluster-type organization and evolution of genes encoding O-methyltransferases in the apple. Genetics 2007; 176:2625-35. [PMID: 17717198 PMCID: PMC1950660 DOI: 10.1534/genetics.107.073650] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022] Open
Abstract
Plant O-methyltransferases (OMTs) play important roles in secondary metabolism. Two clusters of genes coding for caffeic acid OMT (COMT) have been identified in the apple genome. Three genes from one cluster and two genes from another cluster were isolated. These five genes encoding COMT, designated Mdomt1-Mdomt5 (GenBank accession nos. DQ886018-DQ886022), were distinguished by a (CT)n microsatellite in the 5'-UTR and two transposon-like sequences present in the promoter region and intron 1, respectively. The transposon-like sequence in intron 1 unambiguously traced the five Mdomt genes in the apple to a common ancestor. The ancestor must have undergone an initial duplication generating two progenitors, and this was followed by further duplication of these progenitors resulting in the two clusters identified in this study. The distal regions of the transposon-like sequences in promoter regions of Mdomt genes are capable of forming palindromic hairpin-like structures. The hairpin formation is likely responsible for nucleotide sequence differences observed in the promoter regions of these genes as it plays a destabilizing role in eukaryotic chromosomes. In addition, the possible mechanism of amplification of Mdomt genes in the apple genome is also discussed.
Affiliation(s)
- Yuepeng Han
- Department of Natural Resources and Environmental Sciences, University of Illinois, Urbana, IL 61801, USA
|
44
|
A transgenomic cytogenetic sorghum (Sorghum propinquum) bacterial artificial chromosome fluorescence in situ hybridization map of maize (Zea mays L.) pachytene chromosome 9, evidence for regions of genome hyperexpansion. Genetics 2007; 177:1509-26. [PMID: 17947405 DOI: 10.1534/genetics.107.080846] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 12/30/2022] Open
Abstract
A cytogenetic FISH map of maize pachytene-stage chromosome 9 was produced with 32 maize marker-selected sorghum BACs as probes. The genetically mapped markers used are distributed along the linkage maps at an average spacing of 5 cM. Each locus was mapped by means of multicolor direct FISH with a fluorescently labeled probe mix containing a whole-chromosome paint, a single sorghum BAC clone, and the centromeric sequence, CentC. A maize-chromosome-addition line of oat was used for bright unambiguous identification of the maize 9 fiber within pachytene chromosome spreads. The locations of the sorghum BAC-FISH signals were determined, and each new cytogenetic locus was assigned a centiMcClintock position on the short (9S) or long (9L) arm. Nearly all of the markers appeared in the same order on linkage and cytogenetic maps but at different relative positions on the two. The CentC FISH signal was localized between cdo17 (at 9L.03) and tda66 (at 9S.03). Several regions of genome hyperexpansion on maize chromosome 9 were found by comparative analysis of relative marker spacing in maize and sorghum. This transgenomic cytogenetic FISH map creates anchors between various maps of maize and sorghum and creates additional tools and information for understanding the structure and evolution of the maize genome.
|
45
|
Li Y, Uhm T, Ren C, Wu C, Santos TS, Lee MK, Yan B, Santos F, Zhang A, Scheuring C, Sanchez A, Millena AC, Nguyen HT, Kou H, Liu D, Zhang HB. A plant-transformation-competent BIBAC/BAC-based map of rice for functional analysis and genetic engineering of its genomic sequence. Genome 2007; 50:278-88. [PMID: 17502901 DOI: 10.1139/g07-006] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
Sequencing of the rice genome has provided a platform for functional genomics research of rice and other cereal species. However, multiple approaches are needed to determine the functions of its genes and sequences and to use the genome sequencing results for genetic improvement of cereal crops. Here, we report a plant-transformation-competent, binary bacterial artificial chromosome (BIBAC) and bacterial artificial chromosome (BAC) based map of rice to facilitate these studies. The map was constructed from 20 835 BIBAC and BAC clones, and consisted of 579 overlapping BIBAC/BAC contigs. To facilitate functional analysis of chromosome 8 genomic sequence and cloning of the genes and QTLs mapped to the chromosome, we anchored the chromosomal contigs to the existing rice genetic maps. The chromosomal map consists of 11 contigs, 59 genetic markers, and 36 sequence tagged sites, spanning a total of ca. 38 Mb in physical length. Comparative analysis between the genetic and physical maps of chromosome 8 showed that there are 3 "hot" and 2 "cold" spots of genetic recombination along the chromosomal arms in addition to the "cold spot" in the centromeric region, suggesting that the sequence component contents of a chromosome may affect its local genetic recombination frequencies. Because of its plant transformability, the BIBAC/BAC map could provide a platform for functional analysis of the rice genome sequence and effective use of the sequencing results for gene and QTL cloning and molecular breeding.
Affiliation(s)
- Yaning Li
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843-2474, USA
|
46
|
Troggio M, Malacarne G, Coppola G, Segala C, Cartwright DA, Pindo M, Stefanini M, Mank R, Moroldo M, Morgante M, Grando MS, Velasco R. A dense single-nucleotide polymorphism-based genetic linkage map of grapevine (Vitis vinifera L.) anchoring Pinot Noir bacterial artificial chromosome contigs. Genetics 2007; 176:2637-50. [PMID: 17603124 PMCID: PMC1950661 DOI: 10.1534/genetics.106.067462] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.5] [Received: 10/30/2006] [Accepted: 06/14/2007] [Indexed: 11/18/2022] Open
Abstract
The construction of a dense genetic map for Vitis vinifera and its anchoring to a BAC-based physical map is described: it includes 994 loci mapped onto 19 linkage groups, corresponding to the basic chromosome number of Vitis. Spanning 1245 cM with an average distance of 1.3 cM between adjacent markers, the map was generated from the segregation of 483 single-nucleotide polymorphism (SNP)-based genetic markers, 132 simple sequence repeats (SSRs), and 379 AFLP markers in a mapping population of 94 F1 individuals derived from a V. vinifera cross of the cultivars Syrah and Pinot Noir. Of these markers, 623 were anchored to 367 contigs that are included in a physical map produced from the same clone of Pinot Noir and covering 352 Mbp. On the basis of contigs containing two or more genetically mapped markers, region-dependent estimations of physical and recombinational distances are presented. The markers used in this study include 118 SSRs common to an integrated map derived from five segregating populations of V. vinifera. The positions of these SSR markers in the two maps are conserved across all Vitis linkage groups. The addition of SNP-based markers introduces polymorphisms that are easy to database, are useful for evolutionary studies, and significantly increase the density of the map. The map provides the most comprehensive view of the Vitis genome reported to date and will be relevant for future studies on structural and functional genomics and genetic improvement.
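The map statistics quoted in this abstract can be cross-checked with a few lines of Python. All numbers come from the abstract; the variable names, and the choice to count marker intervals as loci minus linkage groups, are assumptions made for this sketch rather than the authors' stated method.

```python
# Marker counts by type, as given in the abstract.
snp_markers = 483
ssr_markers = 132
aflp_markers = 379
total_loci = snp_markers + ssr_markers + aflp_markers

linkage_groups = 19
map_length_cm = 1245

# Assumption: each linkage group with n markers has n - 1 intervals,
# so the whole map has (total loci - number of linkage groups) intervals.
intervals = total_loci - linkage_groups
mean_spacing_cm = map_length_cm / intervals

print(f"total loci: {total_loci}")
print(f"mean spacing between adjacent markers: {mean_spacing_cm:.2f} cM")
```

Under that assumption the three marker classes sum to the 994 loci reported, and the mean spacing rounds to the 1.3 cM given in the abstract.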
Affiliation(s)
- Michela Troggio
- IASMA Research Center, Via E. Mach 1, 38010 San Michele all'Adige (TN), Italy.
|
47
|
Xu P, Wang S, Liu L, Thorsen J, Kucuktas H, Liu Z. A BAC-based physical map of the channel catfish genome. Genomics 2007; 90:380-8. [PMID: 17582737 DOI: 10.1016/j.ygeno.2007.05.008] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Received: 01/23/2007] [Revised: 05/11/2007] [Accepted: 05/16/2007] [Indexed: 01/12/2023]
Abstract
Catfish is the major aquaculture species in the United States. To enhance its genome studies involving genetic linkage and comparative mapping, a bacterial artificial chromosome (BAC) contig-based physical map of the channel catfish (Ictalurus punctatus) genome was generated using four-color fluorescence-based fingerprints. Fingerprints of 34,580 BAC clones (5.6× genome coverage) were generated for the FPC assembly of the BAC contigs. A total of 3307 contigs were assembled using a cutoff value of 1 × 10^-20. Each contig contains an average of 9.25 clones with an average size of 292 kb. The combined contig size for all contigs was 0.965 Gb, approximately the genome size of the channel catfish. The reliability of the contig assembly was assessed by both hybridization of gene probes to BAC clones contained in the fingerprinted assembly and validation of randomly selected contigs using overgo probes designed from BAC end sequences. The presented physical map should greatly enhance genome research in the catfish, particularly aiding in the identification of genomic regions containing genes underlying important performance traits.
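The contig figures in this abstract are internally consistent, which a short Python check makes explicit. The inputs are the numbers stated in the abstract; the variable names are illustrative, and the abstract does not say how clones outside contigs were handled, so the sketch only reports the difference.

```python
# Figures stated in the abstract.
fingerprinted_clones = 34_580
contig_count = 3307
mean_clones_per_contig = 9.25
mean_contig_size_kb = 292

# Combined contig length: number of contigs times mean contig size.
combined_size_gb = contig_count * mean_contig_size_kb / 1e6  # kb -> Gb

# Clones placed in contigs, implied by the per-contig average.
clones_in_contigs = contig_count * mean_clones_per_contig
clones_not_in_contigs = fingerprinted_clones - clones_in_contigs

print(f"combined contig size: {combined_size_gb:.3f} Gb")
print(f"clones in contigs (implied): {clones_in_contigs:.0f}")
print(f"fingerprinted clones not in contigs (implied): {clones_not_in_contigs:.0f}")
```

The product of contig count and mean contig size reproduces the 0.965 Gb combined size quoted above, while the per-contig average implies that roughly 30,600 of the 34,580 fingerprinted clones were placed in contigs.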
Affiliation(s)
- Peng Xu
- The Fish Molecular Genetics and Biotechnology Laboratory, Department of Fisheries and Allied Aquacultures, and Program of Cell and Molecular Biosciences, Aquatic Genomics Unit, Auburn University, Auburn, AL 36849, USA
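The contig statistics quoted in the abstract above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the cited work: the variable names are illustrative, and because the published averages are rounded, the derived values are approximate.

```python
# Figures taken from the abstract (Xu et al. 2007); names are illustrative.
n_contigs = 3307            # contigs assembled by FPC
avg_contig_kb = 292         # average contig size in kb
avg_clones_per_contig = 9.25
n_fingerprinted = 34_580    # BAC clones fingerprinted

# Combined contig length: contigs x average size, converted from kb to Gb.
total_gb = n_contigs * avg_contig_kb / 1e6

# Approximate number of clones placed in contigs (averages are rounded).
clones_in_contigs = n_contigs * avg_clones_per_contig
fraction_in_contigs = clones_in_contigs / n_fingerprinted

print(f"combined contig size: {total_gb:.3f} Gb")
print(f"clones in contigs: about {clones_in_contigs:.0f} "
      f"({fraction_in_contigs:.0%} of fingerprinted clones)")
```

The product of contig count and average contig size reproduces the reported combined size of roughly 0.965 Gb; the share of clones in contigs is only an estimate derived from the rounded average.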
|
48
|
Kelleher CT, Chiu R, Shin H, Bosdet IE, Krzywinski MI, Fjell CD, Wilkin J, Yin T, DiFazio SP, Ali J, Asano JK, Chan S, Cloutier A, Girn N, Leach S, Lee D, Mathewson CA, Olson T, O'Connor K, Prabhu AL, Smailus DE, Stott JM, Tsai M, Wye NH, Yang GS, Zhuang J, Holt RA, Putnam NH, Vrebalov J, Giovannoni JJ, Grimwood J, Schmutz J, Rokhsar D, Jones SJM, Marra MA, Tuskan GA, Bohlmann J, Ellis BE, Ritland K, Douglas CJ, Schein JE. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation. The Plant Journal: For Cell and Molecular Biology 2007; 50:1063-78. [PMID: 17488239 DOI: 10.1111/j.1365-313x.2007.03112.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 05/15/2023]
Abstract
As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
Affiliation(s)
- Colin T Kelleher
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
|
49
|
Kim H, San Miguel P, Nelson W, Collura K, Wissotski M, Walling JG, Kim JP, Jackson SA, Soderlund C, Wing RA. Comparative physical mapping between Oryza sativa (AA genome type) and O. punctata (BB genome type). Genetics 2007; 176:379-90. [PMID: 17339227 PMCID: PMC1893071 DOI: 10.1534/genetics.106.068783] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 11/22/2006] [Accepted: 02/09/2007] [Indexed: 11/18/2022] Open
Abstract
A comparative physical map of the AA genome (Oryza sativa) and the BB genome (O. punctata) was constructed by aligning a physical map of O. punctata, deduced from 63,942 BAC end sequences (BESs) and 34,224 fingerprints, onto the O. sativa genome sequence. The level of conservation of each chromosome between the two species was determined by calculating a ratio of BES alignments. The alignment result suggests more divergence of intergenic and repeat regions in comparison to gene-rich regions. Further, this characteristic enabled localization of heterochromatic and euchromatic regions for each chromosome of both species. The alignment identified 16 locations containing expansions, contractions, inversions, and transpositions. By aligning 40% of the punctata BES on the map, 87% of the punctata FPC map covered 98% of the O. sativa genome sequence. The genome size of O. punctata was estimated to be 8% larger than that of O. sativa with individual chromosome differences of 1.5-16.5%. The sum of expansions and contractions observed in regions >500 kb were similar, suggesting that most of the contractions/expansions contributing to the genome size difference between the two species are small, thus preserving the macro-collinearity between these species, which diverged approximately 2 million years ago.
Affiliation(s)
- HyeRan Kim
- Arizona Genomics Institute, University of Arizona, Tucson, Arizona 85721, USA
|
50
|
Wendl MC. Algebraic correction methods for computational assessment of clone overlaps in DNA fingerprint mapping. BMC Bioinformatics 2007; 8:127. [PMID: 17442113 PMCID: PMC1868038 DOI: 10.1186/1471-2105-8-127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/02/2007] [Accepted: 04/18/2007] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: The Sulston score is a well-established, though approximate metric for probabilistically evaluating postulated clone overlaps in DNA fingerprint mapping. It is known to systematically over-predict match probabilities by various orders of magnitude, depending upon project-specific parameters. Although the exact probability distribution is also available for the comparison problem, it is rather difficult to compute and cannot be used directly in most cases. A methodology providing both improved accuracy and computational economy is required. RESULTS: We propose a straightforward algebraic correction procedure, which takes the Sulston score as a provisional value and applies a power-law equation to obtain an improved result. Numerical comparisons indicate dramatically increased accuracy over the range of parameters typical of traditional agarose fingerprint mapping. Issues with extrapolating the method into parameter ranges characteristic of newer capillary electrophoresis-based projects are also discussed. CONCLUSION: Although only marginally more expensive to compute than the raw Sulston score, the correction provides a vastly improved probabilistic description of hypothesized clone overlaps. This will clearly be important in overlap assessment and perhaps for other tasks as well, for example in using the ranking of overlap probabilities to assist in clone ordering.
Affiliation(s)
- Michael C Wendl
- Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA.
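For readers unfamiliar with the metric named in the abstract above, the raw (uncorrected) Sulston score can be sketched as a binomial tail probability, in the form commonly used by fingerprint assembly software. This is an illustrative sketch only: the function and parameter names are assumptions, and the power-law correction proposed in the paper is not reproduced here because the abstract does not give its coefficients.

```python
import math


def sulston_score(n_low, n_high, matches, tolerance, gel_length):
    """Raw Sulston score: chance probability of >= `matches` shared bands.

    n_low, n_high -- band counts of the two clones (n_low <= n_high)
    matches       -- number of bands observed to match
    tolerance     -- band-matching tolerance, in the same units as gel_length
    gel_length    -- number of resolvable band positions on the gel
    (All names are hypothetical; they are not taken from the cited paper.)
    """
    if not 0 <= matches <= n_low <= n_high:
        raise ValueError("require 0 <= matches <= n_low <= n_high")
    # Probability that one band of the smaller clone coincides by chance
    # with at least one of the n_high bands of the larger clone.
    p_match = 1.0 - (1.0 - 2.0 * tolerance / gel_length) ** n_high
    # Upper tail of a binomial distribution over the n_low bands.
    return sum(
        math.comb(n_low, k) * p_match**k * (1.0 - p_match) ** (n_low - k)
        for k in range(matches, n_low + 1)
    )


# Example with made-up parameters: 20 and 25 bands, 15 matching bands.
score = sulston_score(n_low=20, n_high=25, matches=15,
                      tolerance=7, gel_length=3300)
print(f"Sulston score: {score:.3e}")
```

A smaller score indicates that the observed number of shared bands is less likely to be coincidental; assemblers typically accept an overlap when the score falls below a user-chosen cutoff.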
|