1
Meng A, Li X, Li Z, Miao F, Ma L, Li S, Sun W, Huang J, Yang G. Genome assembly of Melilotus officinalis provides a new reference genome for functional genomics. BMC Genom Data 2024; 25:37. [PMID: 38637749 PMCID: PMC11025269 DOI: 10.1186/s12863-024-01224-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 04/10/2024] [Indexed: 04/20/2024] Open
Abstract
BACKGROUND Sweet yellow clover (Melilotus officinalis) is a diploid plant (2n = 16) that is native to Europe. It is an excellent legume forage. It can both fix nitrogen and serve as a medicine. A genome assembly of Melilotus officinalis that was collected from Best corporation in Beijing is available based on Nanopore sequencing. The genome of Melilotus officinalis was sequenced, assembled, and annotated. RESULTS The latest PacBio third generation HiFi assembly and sequencing strategies were used to produce a Melilotus officinalis genome assembly size of 1,066 Mbp, contig N50 = 5 Mbp, scaffold N50 = 130 Mbp, and complete benchmarking universal single-copy orthologs (BUSCOs) = 96.4%. This annotation produced 47,873 high-confidence gene models, which will substantially aid in our research on molecular breeding. A collinear analysis showed that Melilotus officinalis and Medicago truncatula shared conserved synteny. The expansion and contraction of gene families showed that Melilotus officinalis expanded by 565 gene families and shrank by 56 gene families. The contracted gene families were associated with response to stimulus, nucleotide binding, and small molecule binding. Thus, it is related to a family of genes associated with peptidase activity, which could lead to better stress tolerance in plants. CONCLUSIONS In this study, the latest PacBio technology was used to assemble and sequence the genome of Melilotus officinalis and annotate its protein-coding genes. These results will expand the genomic resources available for Melilotus officinalis and should assist in subsequent research on sweet yellow clover plants.
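The contig and scaffold N50 values quoted in this abstract are contiguity statistics: N50 is the largest length L such that sequences of length at least L together cover at least half of the assembled bases. A minimal Python sketch of that definition follows; the helper name `n50` and the toy lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence down, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which we reach half of the assembly is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mbp (toy values, not from the assembly).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted length distribution, a few very long scaffolds can raise it sharply, which may be why scaffold N50 (130 Mbp) is so much larger than contig N50 (5 Mbp) here.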
Affiliation(s)
- Aoran Meng
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Xinru Li
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Zhiguang Li
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Fuhong Miao
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Lichao Ma
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Shuo Li
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Wenfei Sun
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Guofeng Yang
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China.
2
Guiglielmoni N, Villegas LI, Kirangwa J, Schiffer PH. Revisiting genomes of non-model species with long reads yields new insights into their biology and evolution. Front Genet 2024; 15:1308527. [PMID: 38384712 PMCID: PMC10879605 DOI: 10.3389/fgene.2024.1308527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 01/04/2024] [Indexed: 02/23/2024] Open
Abstract
High-quality genomes obtained using long-read data allow not only for a better understanding of heterozygosity levels and repeat content, and for more accurate gene annotation and prediction than those obtained with short-read technologies, but also make it possible to understand haplotype divergence. Advances in long-read sequencing technologies in recent years have made it possible to produce such high-quality assemblies for non-model organisms. This allows us to revisit genomes that have been problematic to scaffold to chromosome scale with previous generations of data and assembly software. Nematoda, one of the most diverse and speciose animal phyla within metazoans, remains poorly studied, and many previously assembled genomes are fragmented. Using long reads obtained with Nanopore R10.4.1 and PacBio HiFi, we generated highly contiguous assemblies of a diploid nematode of the Mermithidae family, for which no closely related genomes are available to date, as well as a collapsed assembly and a phased assembly for a triploid nematode from the Panagrolaimidae family. Both genomes had been analysed before, but the fragmented assemblies had scaffold sizes comparable to the length of long reads prior to assembly. Our new assemblies illustrate how long-read technologies allow for a much better representation of species genomes. We are now able to conduct more accurate downstream assays based on more complete gene and transposable element predictions.
3
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia.
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia.
- Philipp E Bayer
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Cassandria G Tay Fernandez
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- David Edwards
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Patrick M Finnegan
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
4
Chen X, Wang Z, Zhang C, Hu J, Lu Y, Zhou H, Mei Y, Cong Y, Guo F, Wang Y, He K, Liu Y, Li F. Unraveling the complex evolutionary history of lepidopteran chromosomes through ancestral chromosome reconstruction and novel chromosome nomenclature. BMC Biol 2023; 21:265. [PMID: 37981687 PMCID: PMC10658929 DOI: 10.1186/s12915-023-01762-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 11/06/2023] [Indexed: 11/21/2023] Open
Abstract
BACKGROUND Lepidoptera is one of the most species-rich animal groups, with substantial karyotype variations among species due to chromosomal rearrangements. Knowledge of the evolutionary patterns of lepidopteran chromosomes still needs to be improved. RESULTS Here, we used chromosome-level genome assemblies of 185 lepidopteran insects to reconstruct an ancestral reference genome and proposed a new chromosome nomenclature. Thus, we renamed over 5000 extant chromosomes with this system, revealing the historical events of chromosomal rearrangements and their features. Additionally, our findings indicate that, compared with autosomes, the Z chromosome in Lepidoptera underwent a fast loss of conserved genes, rapid acquisition of lineage-specific genes, and a low rate of gene duplication. Moreover, we presented evidence that all available 67 W chromosomes originated from a common ancestor chromosome, with four neo-W chromosomes identified, including one generated by fusion with an autosome and three derived through horizontal gene transfer. We also detected nearly 4000 inter-chromosomal gene movement events. Notably, Geminin is transferred from the autosome to the Z chromosome. When located on the autosome, Geminin shows female-biased expression, but on the Z chromosome, it exhibits male-biased expression. This contributes to the sexual dimorphism of body size in silkworms. CONCLUSIONS Our study sheds light on the complex evolutionary history of lepidopteran chromosomes based on ancestral chromosome reconstruction and novel chromosome nomenclature.
Affiliation(s)
- Xi Chen
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Zuoqi Wang
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Chaowei Zhang
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Jingheng Hu
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Yueqi Lu
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Hang Zhou
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Yang Mei
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Yuyang Cong
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Fangyuan Guo
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Yaqin Wang
- State Key Laboratory of Rice Biology, Institute of Biotechnology, Zhejiang University, Hangzhou, China
- Kang He
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Ying Liu
- Key Laboratory of Green Prevention and Control of Agricultural Transboundary Pests of Yunnan Province and Agricultural Environment/ Agriculture Environment and Resources Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
- Fei Li
- State Key Laboratory of Rice Biology & Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China.
5
Sato MP, Iwakami S, Fukunishi K, Sugiura K, Yasuda K, Isobe S, Shirasawa K. Telomere-to-telomere genome assembly of an allotetraploid pernicious weed, Echinochloa phyllopogon. DNA Res 2023; 30:dsad023. [PMID: 37943179 PMCID: PMC10634394 DOI: 10.1093/dnares/dsad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 09/27/2023] [Accepted: 10/25/2023] [Indexed: 11/10/2023] Open
Abstract
Echinochloa phyllopogon is an allotetraploid pernicious weed species found in rice fields worldwide that often exhibits resistance to multiple herbicides. An accurate genome sequence is essential to comprehensively understand the genetic basis underlying the traits of this species. Here, the telomere-to-telomere genome sequence of E. phyllopogon is presented. Eighteen chromosome sequences spanning 1.0 Gb were constructed using PacBio high-fidelity (HiFi) long-read technology. Of the 18 chromosomes, 12 sequences were entirely assembled into telomere-to-telomere and gap-free contigs, whereas the remaining six sequences were constructed at the chromosomal level with only eight gaps. The sequences were assigned to the A and B genomes, with total lengths of 453 and 520 Mb, respectively. Repetitive sequences occupied 42.93% of the A genome and 48.47% of the B genome, although 32,337 and 30,889 high-confidence genes were predicted in the A and B genomes, respectively. This suggested that genome extensions and gene disruptions caused by repeated sequence accumulation often occurred in the B genome before polyploidization to establish a tetraploid genome. The highly accurate and comprehensive genome sequence could be a milestone in understanding the molecular mechanisms of the pernicious traits and in developing effective weed control strategies to avoid yield loss in rice production.
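The abstract's inference that the B genome was extended by repeat accumulation can be checked with simple arithmetic on the figures it quotes. The short Python sketch below computes gene density per subgenome; the variable and function names are illustrative assumptions, and only the four input numbers come from the abstract.

```python
# Subgenome figures quoted in the abstract:
# (total length in Mb, number of high-confidence genes).
SUBGENOMES = {"A": (453, 32337), "B": (520, 30889)}


def genes_per_mb(length_mb, gene_count):
    """Gene density as predicted genes per megabase of sequence."""
    return gene_count / length_mb


density = {
    name: genes_per_mb(length_mb, genes)
    for name, (length_mb, genes) in SUBGENOMES.items()
}
total_length_mb = sum(length_mb for length_mb, _ in SUBGENOMES.values())

for name in sorted(density):
    print(f"{name} genome: {density[name]:.1f} genes per Mb")
print(f"Combined length: {total_length_mb} Mb")
```

The B genome is longer yet carries fewer predicted genes, so its density is lower (roughly 59 versus 71 genes per Mb), and the combined 973 Mb agrees with the reported total of about 1.0 Gb.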
Affiliation(s)
- Mitsuhiko P Sato
- Department of Frontier Research and Development, Kazusa DNA Research Institute, Chiba 292-0818, Japan
- Satoshi Iwakami
- Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
- Kanade Fukunishi
- Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
- Kai Sugiura
- Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
- Kentaro Yasuda
- Agri-Innovation Education and Research Center, Akita Prefectural University, Akita 010-0451, Japan
- Sachiko Isobe
- Department of Frontier Research and Development, Kazusa DNA Research Institute, Chiba 292-0818, Japan
- Kenta Shirasawa
- Department of Frontier Research and Development, Kazusa DNA Research Institute, Chiba 292-0818, Japan
6
Wang J, Veldsman WP, Fang X, Huang Y, Xie X, Lyu A, Zhang L. Benchmarking multi-platform sequencing technologies for human genome assembly. Brief Bioinform 2023; 24:bbad300. [PMID: 37594299 DOI: 10.1093/bib/bbad300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 07/12/2023] [Accepted: 07/26/2023] [Indexed: 08/19/2023] Open
Abstract
Genome assembly is a computational technique that involves piecing together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for comprehending human biology, and it is also vital for downstream genomic variation analysis. Many efforts have been made over the past few decades to create a complete and gapless reference genome for humans by using a diverse range of advanced sequencing technologies. Several available tools are aimed at enhancing the quality of haploid and diploid human genome assemblies, which include contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task even though several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We then compared their performances in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long-reads are the optimal choice for generating an assembly with low base errors. On the other hand, we were able to produce the most continuous contigs with Oxford Nanopore long-reads, but they may require further polishing to improve on quality. We recommend using short-reads rather than long-reads themselves to improve the base accuracy of contigs from Oxford Nanopore long-reads. Hi-C is the best choice for chromosome-level scaffolding because it can capture the longest-range DNA connectedness compared to 10× linked-reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of genome assembly. For diploid assembly, hifiasm is the best tool for human diploid genome assembly using PacBio HiFi and Hi-C data. Looking to the future, we expect that further advancements in human diploid assemblers will leverage the power of PacBio HiFi reads and other technologies with long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.
Affiliation(s)
- Jingjing Wang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Werner Pieter Veldsman
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
7
Chen J, Wang Z, Tan K, Huang W, Shi J, Li T, Hu J, Wang K, Wang C, Xin B, Zhao H, Song W, Hufford MB, Schnable JC, Jin W, Lai J. A complete telomere-to-telomere assembly of the maize genome. Nat Genet 2023:10.1038/s41588-023-01419-6. [PMID: 37322109 DOI: 10.1038/s41588-023-01419-6] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Received: 04/06/2022] [Accepted: 05/05/2023] [Indexed: 06/17/2023]
Abstract
A complete telomere-to-telomere (T2T) finished genome has been the long pursuit of genomic research. Through generating deep coverage ultralong Oxford Nanopore Technology (ONT) and PacBio HiFi reads, we report here a complete genome assembly of maize with each chromosome entirely traversed in a single contig. The 2,178.6 Mb T2T Mo17 genome with a base accuracy of over 99.99% unveiled the structural features of all repetitive regions of the genome. There were several super-long simple-sequence-repeat arrays having consecutive thymine-adenine-guanine (TAG) tri-nucleotide repeats up to 235 kb. The assembly of the entire nucleolar organizer region of the 26.8 Mb array with 2,974 45S rDNA copies revealed the enormously complex patterns of rDNA duplications and transposon insertions. Additionally, complete assemblies of all ten centromeres enabled us to precisely dissect the repeat compositions of both CentC-rich and CentC-poor centromeres. The complete Mo17 genome represents a major step forward in understanding the complexity of the highly recalcitrant repetitive regions of higher plant genomes.
Affiliation(s)
- Jian Chen
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Zijian Wang
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Kaiwen Tan
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Wei Huang
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Junpeng Shi
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Tong Li
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Jiang Hu
- Grandomics Biosciences, Wuhan, P. R. China
- Kai Wang
- Grandomics Biosciences, Wuhan, P. R. China
- Chao Wang
- Grandomics Biosciences, Wuhan, P. R. China
- Beibei Xin
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Haiming Zhao
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Weibin Song
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
- James C Schnable
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA
- Weiwei Jin
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China
- Jinsheng Lai
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, P. R. China.
- Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing, P. R. China.
- Sanya Institute of China Agricultural University, Sanya, P. R. China.
- Hainan Yazhou Bay Seed Laboratory, Sanya, P. R. China.
8
Shi X, Cao S, Wang X, Huang S, Wang Y, Liu Z, Liu W, Leng X, Peng Y, Wang N, Wang Y, Ma Z, Xu X, Zhang F, Xue H, Zhong H, Wang Y, Zhang K, Velt A, Avia K, Holtgräwe D, Grimplet J, Matus JT, Ware D, Wu X, Wang H, Liu C, Fang Y, Rustenholz C, Cheng Z, Xiao H, Zhou Y. The complete reference genome for grapevine (Vitis vinifera L.) genetics and breeding. HORTICULTURE RESEARCH 2023; 10:uhad061. [PMID: 37213686 PMCID: PMC10199708 DOI: 10.1093/hr/uhad061] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 12/06/2022] [Accepted: 04/02/2023] [Indexed: 05/23/2023]
Abstract
Grapevine is one of the most economically important crops worldwide. However, the previous versions of the grapevine reference genome typically consist of thousands of fragments with missing centromeres and telomeres, limiting the accessibility of the repetitive sequences, the centromeric and telomeric regions, and the study of inheritance of important agronomic traits in these regions. Here, we assembled a telomere-to-telomere (T2T) gap-free reference genome for the cultivar PN40024 using PacBio HiFi long reads. The T2T reference genome (PN_T2T) is 69 Mb longer with 9018 more genes identified than the 12X.v0 version. We annotated 67% repetitive sequences, 19 centromeres and 36 telomeres, and incorporated gene annotations of previous versions into the PN_T2T assembly. We detected a total of 377 gene clusters, which showed associations with complex traits, such as aroma and disease resistance. Even though PN40024 derives from nine generations of selfing, we still found nine genomic hotspots of heterozygous sites associated with biological processes, such as the oxidation-reduction process and protein phosphorylation. The fully annotated complete reference genome therefore constitutes an important resource for grapevine genetic studies and breeding programs.
Affiliation(s)
- Siyang Huang
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
- Yue Wang
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400715, China
- Zhongjie Liu
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Wenwen Liu
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Xiangpeng Leng
- College of Horticulture, Qingdao Agricultural University, Qingdao 266109, China
- Yanling Peng
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Nan Wang
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Yiwen Wang
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Zhiyao Ma
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Xiaodong Xu
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Fan Zhang
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Hui Xue
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Haixia Zhong
- Institute of Horticulture Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China
- Yi Wang
- Beijing Key Laboratory of Grape Science and Enology, Institute of Botany, Chinese Academy of Sciences, Xiangshan, Beijing 100093, China
- Kekun Zhang
- College of Enology, Northwest A&F University, Yangling 712100, China
- Amandine Velt
- SVQV, INRAE - University of Strasbourg, 68000 Colmar, France
| | - Komlan Avia
- SVQV, INRAE - University of Strasbourg, 68000 Colmar, France
| | - Daniela Holtgräwe
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
| | - Jérôme Grimplet
- Unidad de Hortofruticultura, Centro de Investigación y Tecnología Agroalimentaria de Aragón (CITA), 50059 Zaragoza, Spain
| | - José Tomás Matus
- Institute for Integrative Systems Biology (I2SysBio), Systems Biotech Program, Universitat de València-CSIC, Paterna, 46908, Valencia, Spain
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- USDA ARS NEA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
- Xinyu Wu
- Institute of Horticulture Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China
- Haibo Wang
- Fruit Research Institute, Chinese Academy of Agricultural Sciences/Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (Germplasm Resources Utilization), Ministry of Agriculture/Key Laboratory of Mineral Nutrition and Fertilizers Efficient Utilization of Deciduous Fruit Tree, Liaoning Province, Xingcheng 125100, China
- Chonghuai Liu
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou 450004, China
- Yuling Fang
- College of Enology, Northwest A&F University, Yangling 712100, China
- Hua Xiao
- Corresponding author

9
Nowoshilow S, Tanaka EM. Navigation and Use of Custom Tracks within the Axolotl Genome Browser. Methods Mol Biol 2023; 2562:273-289. [PMID: 36272083 DOI: 10.1007/978-1-0716-2659-7_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The availability of chromosome-scale axolotl genome sequences has made it possible to explore genome evolution, perform cross-species comparisons, and use additional sequencing data to analyze both genome-wide features and individual genes. Here, we focus on the UCSC genome browser and demonstrate, step by step, how to use it to integrate different data types in order to address the broad question of Fgf8 locus evolution and to analyze the neighborhood of Pax3, a gene that was reported missing in axolotl.
Affiliation(s)
- Elly M Tanaka
- Research Institute of Molecular Pathology, Vienna, Austria.

10
Blackman C, Subramaniam R. A Bioinformatic Guide to Identify Protein Effectors from Phytopathogens. Methods Mol Biol 2023; 2659:95-101. [PMID: 37249888 DOI: 10.1007/978-1-0716-3159-1_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Phytopathogenic fungi are a diverse and widespread group that has a significant detrimental impact on crops, with an estimated annual average loss of 15% worldwide. Understanding the interaction between host plants and pathogenic fungi is critical to delineate the underlying mechanisms of plant defense and mitigate agricultural losses. Fungal pathogens utilize suites of secreted molecules, called effectors, to modulate plant metabolism and immune response, overcome host defenses, and promote colonization. Effectors come in many flavors, including proteinaceous products, small RNAs, and metabolites such as mycotoxins. This review focuses on methods for identifying protein effectors from fungi. Excellent reviews already cover the identification of secondary metabolites and small RNAs from fungi, so these topics are not part of this review.
Affiliation(s)
- Christopher Blackman
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Rajagopal Subramaniam
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada.

11
Papa Y, Wellenreuther M, Morrison MA, Ritchie PA. Genome assembly and isoform analysis of a highly heterozygous New Zealand fisheries species, the tarakihi (Nemadactylus macropterus). G3 (Bethesda) 2022; 13:6883520. [PMID: 36477875 PMCID: PMC9911067 DOI: 10.1093/g3journal/jkac315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 11/01/2022] [Accepted: 11/08/2022] [Indexed: 12/14/2022]
Abstract
Although they are among the most valuable and heavily exploited wild organisms, few fisheries species have been studied at the whole-genome level. This is especially the case in New Zealand, where genomics resources are urgently needed to assist fisheries management. Here, we generated 55 Gb of short Illumina reads (92× coverage) and 73 Gb of long Nanopore reads (122×) to produce the first genome assembly of the marine teleost tarakihi [Nemadactylus macropterus (Forster, 1801)], a highly valuable fisheries species in New Zealand. An additional 300 Mb of Iso-Seq reads were obtained to assist in gene annotation. The final genome assembly was 568 Mb long with an N50 of 3.37 Mb. The genome completeness was high, with 97.8% of complete Actinopterygii Benchmarking Universal Single-Copy Orthologs. Heterozygosity values estimated through k-mer counting (1.00%) and bi-allelic SNPs (0.64%) were high compared with the same values reported for other fishes. Iso-Seq analysis recovered 91,313 unique transcripts from 15,515 genes (mean ratio of 5.89 transcripts per gene), and the most common alternative splicing event was intron retention. This highly contiguous genome assembly and the isoform-resolved transcriptome will provide a useful resource to assist the study of population genomics and comparative eco-evolutionary studies in teleosts and related organisms.
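The sequencing figures quoted in this abstract can be cross-checked with simple arithmetic: depth of coverage is total sequenced bases divided by genome size, so each yield and coverage pair implies a genome size. The following is a minimal sketch, assuming 1 Gb = 10^9 bases; the function name is illustrative and does not come from the paper.

```python
def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size implied by a read yield and its reported depth of coverage."""
    return total_bases / coverage


GB = 1e9  # assumption: 1 Gb = 10^9 bases

# Figures taken from the abstract above.
illumina_mb = implied_genome_size(55 * GB, 92) / 1e6    # short reads
nanopore_mb = implied_genome_size(73 * GB, 122) / 1e6   # long reads
transcripts_per_gene = 91_313 / 15_515

print(f"Illumina-implied genome size: {illumina_mb:.0f} Mb")
print(f"Nanopore-implied genome size: {nanopore_mb:.0f} Mb")
print(f"Transcripts per gene: {transcripts_per_gene:.2f}")
```

Both read sets imply a genome of roughly 598 Mb, somewhat above the 568 Mb assembly length, which may indicate that coverage was reported against an estimated genome size rather than the final assembly. The transcripts-per-gene ratio reproduces the reported 5.89.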
Affiliation(s)
- Yvan Papa
- School of Biological Sciences, Victoria University of Wellington, Wellington 6012, New Zealand
- Maren Wellenreuther
- Seafood Production Group, The New Zealand Institute for Plant and Food Research Limited, Nelson 7010, New Zealand; School of Biological Sciences, The University of Auckland, Auckland 1010, New Zealand
- Mark A Morrison
- National Institute of Water and Atmospheric Research, Auckland 1010, New Zealand
- Peter A Ritchie
- Corresponding author: Te Toki A Rata, Gate 7, Kelburn Parade, Wellington 6012, New Zealand.

12
Guo L, Yao H, Chen W, Wang X, Ye P, Xu Z, Zhang S, Wu H. Natural products of medicinal plants: biosynthesis and bioengineering in post-genomic era. Hortic Res 2022; 9:uhac223. [PMID: 36479585 PMCID: PMC9720450 DOI: 10.1093/hr/uhac223] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/14/2022] [Accepted: 09/22/2022] [Indexed: 06/01/2023]
Abstract
Globally, medicinal plant natural products (PNPs) are a major source of substances used in traditional and modern medicine. As humanity faces tremendous public health challenges posed by emerging infectious diseases, antibiotic resistance, and surging drug prices, harnessing the healing power of medicinal plants is more urgent than ever for meeting future challenges in a sustainable way. PNP research efforts in the pre-genomic era focused on discovering bioactive molecules with pharmaceutical activities and identifying individual genes responsible for their biosynthesis. Critically, systems biology and multi- and inter-disciplinary approaches that integrate and interrogate all accessible data from genomics, metabolomics, structural biology, and chemical informatics are necessary to accelerate the full characterization of the biosynthetic and regulatory circuitry for producing PNPs in medicinal plants. In this review, we provide a brief update on current PNP research in medicinal plants, focusing on how state-of-the-art biotechnologies facilitate the discovery of PNPs, the elucidation of the molecular basis of their biosynthesis, and synthetic biology. Finally, we offer an outlook on research trends for understanding the biology of medicinal plants in the coming decades.
Affiliation(s)
- Li Guo
- Corresponding author
- Xumei Wang
- School of Pharmacy, Xi’an Jiaotong University, Xi’an 710061, China
- Peng Ye
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Zhichao Xu
- College of Life Science, Northeast Forestry University, Harbin 150040, China
- Sisheng Zhang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Hong Wu
- Corresponding author

13
Ko BJ, Lee C, Kim J, Rhie A, Yoo DA, Howe K, Wood J, Cho S, Brown S, Formenti G, Jarvis ED, Kim H. Widespread false gene gains caused by duplication errors in genome assemblies. Genome Biol 2022; 23:205. [PMID: 36167596 PMCID: PMC9516828 DOI: 10.1186/s13059-022-02764-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/16/2021] [Accepted: 09/02/2022] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND False duplications in genome assemblies lead to false biological conclusions. We quantified false duplications in widely used previous genome assemblies for platypus, zebra finch, and Anna’s Hummingbird, and in their new counterparts for the same species generated by the Vertebrate Genomes Project, whose pipeline attempts to eliminate false duplications through haplotype phasing and purging. These assemblies are among the first generated by the Vertebrate Genomes Project for which a prior chromosome-level reference assembly was available for comparison. RESULTS Whole-genome alignments revealed that 4 to 16% of the sequences are falsely duplicated in the previous assemblies, impacting hundreds to thousands of genes. These false duplications lead to overestimated gene family expansions. The main source of the false duplications is heterotype duplications, where the haplotype sequences were more divergent than other parts of the genome, leading the assembly algorithms to classify them as separate genes or genomic regions. A minor source is sequencing errors. Ancient ATP nucleotide binding gene families have a higher prevalence of false duplications than other gene families. Although they are present in a smaller proportion, false duplications remain in the Vertebrate Genomes Project assemblies and can be identified and purged. CONCLUSIONS This study highlights the need for more advanced assembly methods that better separate haplotypes and sequence errors, and the need for cautious analyses of gene gains.
Affiliation(s)
- Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Juwan Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, USA
- Dong Ahn Yoo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Seoae Cho
- eGnome, Inc, Seoul, Republic of Korea
- Samara Brown
- Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Giulio Formenti
- Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Erich D Jarvis
- Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA.
- Heebal Kim
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; eGnome, Inc, Seoul, Republic of Korea.

14
Shi Y, Chen B, Kong S, Zeng Q, Li L, Liu B, Pu F, Xu P. Comparative genomics analysis and genome assembly integration with the recombination landscape contribute to Takifugu bimaculatus assembly refinement. Gene 2022; 849:146910. [PMID: 36167181 DOI: 10.1016/j.gene.2022.146910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Revised: 09/13/2022] [Accepted: 09/19/2022] [Indexed: 11/28/2022]
Abstract
The genus Takifugu has come to the fore in scientific and practical research owing to its compact genome, explosive speciation, and economic value. Here we updated the chromosome-level genome of Takifugu bimaculatus using an ultra-high-density linkage map, a classic and accurate approach to chromosome assembly. The map constituted a robust assembly frame, with 92.2% (372.77 Mb) of the draft genome cumulatively placed. Using intraspecies and interspecies comparative genomic analysis, we developed a criterion to quantify the differences between assemblies and established a novel way to integrate information from multiple assemblies. The integrated assembly rectified potential mis-assemblies, greatly improving genome contiguity and correctness. Our results provide detailed information on genetic recombination in T. bimaculatus and new insights into effective genome assembly. The consolidated assembly offers a reliable reference for genomic research on T. bimaculatus and, more broadly, across Takifugu.
Affiliation(s)
- Yue Shi
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China; Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Baohua Chen
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China; Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Shengnan Kong
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China; Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Qingmin Zeng
- Fisheries Research Institute of Fujian, Xiamen 361000, China
- Leibin Li
- Fisheries Research Institute of Fujian, Xiamen 361000, China
- Bo Liu
- Fisheries Research Institute of Fujian, Xiamen 361000, China
- Fei Pu
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China; Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Peng Xu
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China; Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China.

15
Drown MK, DeLiberto AN, Flack N, Doyle M, Westover AG, Proefrock JC, Heilshorn S, D’Alessandro E, Crawford DL, Faulk C, Oleksiak MF. Sequencing Bait: Nuclear and Mitogenome Assembly of an Abundant Coastal Tropical and Subtropical Fish, Atherinomorus stipes. Genome Biol Evol 2022; 14:6648392. [PMID: 35866575 PMCID: PMC9348626 DOI: 10.1093/gbe/evac111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/13/2022] [Indexed: 02/01/2023] Open
Abstract
Genetic data from nonmodel species can inform ecology and physiology, giving insight into a species' distribution and abundance as well as their responses to changing environments, all of which are important for species conservation and management. Moreover, reduced sequencing costs and improved long-read sequencing technology allow researchers to readily generate genomic resources for nonmodel species. Here, we apply Oxford Nanopore long-read sequencing and low-coverage (∼1x) whole genome short-read sequencing technology (Illumina) to assemble a genome and examine population genetics of an abundant tropical and subtropical fish, the hardhead silverside (Atherinomorus stipes). These fish are found in shallow coastal waters and are frequently included in ecological models because they serve as abundant prey for commercially and ecologically important species. Despite their importance in sub-tropical and tropical ecosystems, little is known about their population connectivity and genetic diversity. Our A. stipes genome assembly is about 1.2 Gb with comparable repetitive element content (∼47%), number of protein duplication events, and DNA methylation patterns to other teleost fish species. Among five sampled populations spanning 43 km of South Florida and the Florida Keys, we find little population structure, suggesting high population connectivity.
Affiliation(s)
- Nicole Flack
- Department of Veterinary and Biomedical Sciences, University of Minnesota, Minnesota, USA
- Meghan Doyle
- The Rosenstiel School, University of Miami, Florida, USA

16
Liu SC, Ju YR, Lu CL. Multi-CSAR: a web server for scaffolding contigs using multiple reference genomes. Nucleic Acids Res 2022; 50:W500-W509. [PMID: 35524553 PMCID: PMC9252826 DOI: 10.1093/nar/gkac301] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/22/2022] [Revised: 04/09/2022] [Accepted: 04/15/2022] [Indexed: 11/12/2022] Open
Abstract
Multi-CSAR is a web server that can efficiently and more accurately order and orient the contigs in the assembly of a target genome into larger scaffolds based on multiple reference genomes. Given a target genome and multiple reference genomes, Multi-CSAR first identifies sequence markers shared between the target genome and each reference genome, then utilizes these sequence markers to compute a scaffold for the target genome based on each single reference genome, and finally combines all the single reference-derived scaffolds into a multiple reference-derived scaffold. To run Multi-CSAR, the users need to upload a target genome to be scaffolded and one or more reference genomes in multi-FASTA format. The users can also choose to use the ‘weighting scheme of reference genomes’ for Multi-CSAR to automatically calculate different weights for the reference genomes and choose either ‘NUCmer on nucleotides’ or ‘PROmer on translated amino acids’ for Multi-CSAR to identify sequence markers. In the output page, Multi-CSAR displays its multiple reference-derived scaffold in two graphical representations (i.e. Circos plot and dotplot) for the users to visually validate the correctness of scaffolded contigs and in a tabular representation to further validate the scaffold in detail. Multi-CSAR is available online at http://genome.cs.nthu.edu.tw/Multi-CSAR/.
Affiliation(s)
- Shu-Cheng Liu
- Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
- Yan-Ru Ju
- Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
- Chin Lung Lu
- Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan

17
Walve R, Salmela L. HGGA: hierarchical guided genome assembler. BMC Bioinformatics 2022; 23:167. [PMID: 35525918 PMCID: PMC9077837 DOI: 10.1186/s12859-022-04701-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Accepted: 04/25/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND De novo genome assembly typically produces a set of contigs instead of the complete genome. Thus additional data such as genetic linkage maps, optical maps, or Hi-C data are needed to resolve the complete structure of the genome. Most previous work uses the additional data to order and orient contigs. RESULTS Here we introduce a framework to guide genome assembly with additional data. Our approach is based on clustering the reads such that the reads in each cluster originate from nearby positions in the genome according to the additional data. These clusters are then assembled independently and the resulting contigs are further assembled in a hierarchical manner. We implemented our approach for genetic linkage maps in a tool called HGGA. CONCLUSIONS Our experiments on simulated and real Pacific Biosciences long reads and genetic linkage maps show that HGGA produces a more contiguous assembly with fewer contigs and 1.2 to 9.8 times higher NGA50 or N50 than a plain assembly of the reads, and 1.03 to 6.5 times higher NGA50 or N50 than a previous approach integrating genetic linkage maps with contig assembly. Furthermore, the correctness of the assembly remains similar or improves compared to an assembly using only the read data.
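The contiguity comparisons in this abstract are expressed as fold changes in N50. As a reminder of what that statistic measures, here is a minimal sketch of the standard N50 definition applied to two hypothetical sets of contig lengths. The numbers are invented for illustration and are not HGGA results. NGA50 additionally requires aligning contigs to a reference genome and is not computed here.

```python
def n50(lengths):
    """Return N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
plain = [2, 2, 2, 3, 3, 4, 8, 8]
guided = [2, 2, 2, 3, 3, 4, 16]  # same total length, two contigs joined

fold_change = n50(guided) / n50(plain)
print(n50(plain), n50(guided), fold_change)
```

In this toy case, joining the two longest contigs doubles N50 while the total assembly length stays the same, which is the kind of improvement a fold change in N50 reports.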
Affiliation(s)
- Riku Walve
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Leena Salmela
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland.

18
Anopheles mosquitoes reveal new principles of 3D genome organization in insects. Nat Commun 2022; 13:1960. [PMID: 35413948 PMCID: PMC9005712 DOI: 10.1038/s41467-022-29599-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/14/2021] [Accepted: 03/24/2022] [Indexed: 11/24/2022] Open
Abstract
Chromosomes are hierarchically folded within cell nuclei into territories, domains and subdomains, but the functional importance and evolutionary dynamics of these hierarchies are poorly defined. Here, we comprehensively profile genome organizations of five Anopheles mosquito species and show how different levels of chromatin architecture influence each other. Patterns observed on Hi-C maps are associated with known cytological structures, epigenetic profiles, and gene expression levels. Evolutionary analysis reveals conservation of chromatin architecture within synteny blocks for tens of millions of years and enrichment of synteny breakpoints in regions with increased genomic insulation. However, in-depth analysis shows a confounding effect of gene density on both insulation and distribution of synteny breakpoints, suggesting limited causal relationship between breakpoints and regions with increased genomic insulation. At the level of individual loci, we identify specific, extremely long-ranged looping interactions, conserved for ~100 million years. We demonstrate that the mechanisms underlying these looping contacts differ from previously described Polycomb-dependent interactions and clustering of active chromatin. Anopheles mosquitoes are vectors of human malaria, and better understanding of them has implications for public health. Here, the authors apply Hi-C, FISH, RNA-seq, and ChIP-seq techniques to comprehensively characterize chromatin architecture and its evolutionary dynamics in five Anopheles species.

19
Oba Y, Schultz DT. Firefly genomes illuminate the evolution of beetle bioluminescent systems. Curr Opin Insect Sci 2022; 50:100879. [PMID: 35091104 DOI: 10.1016/j.cois.2022.100879] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/17/2021] [Revised: 12/30/2021] [Accepted: 01/20/2022] [Indexed: 06/14/2023]
Abstract
Fireflies are among the best-known bioluminescent organisms, and the reaction mechanism and ecological utility of their bioluminescence have been well studied. Genome assemblies of six species of bioluminescent beetles have recently been published. These studies have focused on the evolution of novelties: luciferase, and the biosynthesis of luciferin and defensive chemicals. For example, clustering of the luciferase gene with acyl-CoA synthetase genes on a chromosome in luminous beetle genomes suggests the involvement of tandem gene duplications and neofunctionalization during the evolution of beetle bioluminescence. Several candidate genes for critical roles in beetle bioluminescence have been identified, but their functional analyses are still ongoing. The establishment of a long-term mass-rearing system and strain will be key for post-genome research on bioluminescent beetles. Lastly, the application of contemporary chromosome-scale genome assembly techniques to luminous beetles will help resolve outstanding evolutionary questions, such as how many times bioluminescence evolved in this clade.
Affiliation(s)
- Yuichi Oba
- Department of Environmental Biology, Chubu University, Kasugai 487-8501, Japan.
- Darrin T Schultz
- Monterey Bay Aquarium Research Institute, Moss Landing, CA 95039, United States

20
Baud A, McPeek S, Chen N, Hughes KA. Indirect Genetic Effects: A Cross-disciplinary Perspective on Empirical Studies. J Hered 2022; 113:1-15. [PMID: 34643239 PMCID: PMC8851665 DOI: 10.1093/jhered/esab059] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022] Open
Abstract
Indirect genetic effects (IGE) occur when an individual's phenotype is influenced by genetic variation in conspecifics. Opportunities for IGE are ubiquitous, and, when present, IGE have profound implications for behavioral, evolutionary, agricultural, and biomedical genetics. Despite their importance, the empirical study of IGE lags behind the development of theory. In large part, this lag can be attributed to the fact that measuring IGE, and deconvoluting them from the direct genetic effects of an individual's own genotype, is subject to many potential pitfalls. In this Perspective, we describe current challenges that empiricists across all disciplines will encounter in measuring and understanding IGE. Using ideas and examples spanning evolutionary, agricultural, and biomedical genetics, we also describe potential solutions to these challenges, focusing on opportunities provided by recent advances in genomic, monitoring, and phenotyping technologies. We hope that this cross-disciplinary assessment will advance the goal of understanding the pervasive effects of conspecific interactions in biology.
Affiliation(s)
- Amelie Baud
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Sarah McPeek
- Department of Biology, University of Virginia, Charlottesville, VA 22904, USA
- Nancy Chen
- Department of Biology, University of Rochester, Rochester, NY 14627, USA
- Kimberly A Hughes
- Department of Biological Science, Florida State University, Tallahassee, FL 32303, USA

21
Discordant Genome Assemblies Drastically Alter the Interpretation of Single-Cell RNA Sequencing Data Which Can Be Mitigated by a Novel Integration Method. Cells 2022; 11:cells11040608. [PMID: 35203259 PMCID: PMC8870202 DOI: 10.3390/cells11040608] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/31/2021] [Revised: 01/27/2022] [Accepted: 02/07/2022] [Indexed: 02/04/2023] Open
Abstract
Advances in sequencing and assembly technology have led to the creation of genome assemblies for a wide variety of non-model organisms. The rapid production and proliferation of updated, novel assembly versions can create vexing problems for researchers when multiple genome assembly versions are available at once, requiring researchers to work with more than one reference genome. Multiple genome assemblies are especially problematic for researchers studying the genetic makeup of individual cells, as single-cell RNA sequencing (scRNAseq) requires sequenced reads to be mapped and aligned to a single reference genome. Using Astyanax mexicanus, this study highlights how the interpretation of a single-cell dataset from the same sample changes when aligned to its two different available genome assemblies. We found that the number of cells and expressed genes detected were drastically different when aligning to the different assemblies. When the genome assemblies were used in isolation with their respective annotations, cell-type identification was confounded, as some classic cell-type markers were assembly-specific, whilst other genes showed differential patterns of expression between the two assemblies. To overcome the problems posed by multiple genome assemblies, we propose that researchers align to each available assembly and then integrate the resultant datasets to produce a final dataset in which all genome alignments can be used simultaneously. We found that this approach increased the accuracy of cell-type identification and maximised the amount of data that could be extracted from our single-cell sample by capturing all possible cells and transcripts. As scRNAseq becomes more widely available, it is imperative that the single-cell community is aware of how genome assembly alignment can alter single-cell data and their interpretation, especially when reviewing studies on non-model organisms.

22
Vidal-Limon A, Aguilar-Toalá JE, Liceaga AM. Integration of Molecular Docking Analysis and Molecular Dynamics Simulations for Studying Food Proteins and Bioactive Peptides. J Agric Food Chem 2022; 70:934-943. [PMID: 34990125 DOI: 10.1021/acs.jafc.1c06110] [Citation(s) in RCA: 95] [Impact Index Per Article: 47.5] [Indexed: 06/14/2023]
Abstract
In silico tools, such as molecular docking, are widely applied to study the interactions, binding affinity, and biological activity of proteins and peptides. However, restricted sampling of both ligand and receptor conformations and the use of approximated scoring functions can produce results that do not correlate with actual experimental binding affinities. Molecular dynamics simulations (MDS) can provide valuable information in deciphering functional mechanisms of proteins/peptides and other biomolecules, overcoming the rigid sampling limitations in docking analysis. This review discusses the traditional use of in silico models, such as molecular docking, and their application for studying food proteins and bioactive peptides, followed by an in-depth introduction to the theory of MDS and a description of why these molecular simulation techniques are important in the theoretical prediction of structural and functional dynamics of food proteins and bioactive peptides. Applications, limitations, and future prospects of MDS are also discussed.
Affiliation(s)
- Abraham Vidal-Limon
  - Red de Estudios Moleculares Avanzados, Clúster Científico y Tecnológico BioMimic, Instituto de Ecología A.C. (INECOL), Carretera Antigua a Coatepec 351, El Haya, Xalapa, Veracruz 91073, Mexico
- José E Aguilar-Toalá
  - Departamento de Ciencias de la Alimentación, División de Ciencias Biológicas y de la Salud, Universidad Autónoma Metropolitana Unidad Lerma, Avenida de las Garzas 10, Colonia El Panteón, Lerma de Villada, Estado de México 52005, Mexico
- Andrea M Liceaga
  - Protein Chemistry and Bioactive Peptides Laboratory, Department of Food Science, Purdue University, 745 Agriculture Mall Drive, West Lafayette, Indiana 47907, United States
|
23
|
Ludwig A, Pippel M, Myers G, Hiller M. DENTIST-using long reads for closing assembly gaps at high accuracy. Gigascience 2022; 11:6514926. [PMID: 35077539 PMCID: PMC8848313 DOI: 10.1093/gigascience/giab100] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/21/2021] [Revised: 12/07/2021] [Accepted: 12/15/2021] [Indexed: 12/15/2022] Open
Abstract
Background Long sequencing reads allow increasing contiguity and completeness of fragmented, short-read–based genome assemblies by closing assembly gaps, ideally at high accuracy. While several gap-closing methods have been developed, these methods often close an assembly gap with sequence that does not accurately represent the true sequence. Findings Here, we present DENTIST, a sensitive, highly accurate, and automated pipeline method to close gaps in short-read assemblies with long error-prone reads. DENTIST comprehensively determines repetitive assembly regions to identify reliable and unambiguous alignments of long reads to the correct loci, integrates a consensus sequence computation step to obtain a high base accuracy for the inserted sequence, and validates the accuracy of closed gaps. Unlike previous benchmarks, we generated test assemblies that have gaps at the exact positions where real short-read assemblies have gaps. Generating such realistic benchmarks for Drosophila (134 Mb genome), Arabidopsis (119 Mb), hummingbird (1 Gb), and human (3 Gb) and using simulated or real PacBio continuous long reads, we show that DENTIST consistently achieves a substantially higher accuracy compared to previous methods, while having a similar sensitivity. Conclusion DENTIST provides an accurate approach to improve the contiguity and completeness of fragmented assemblies with long reads. DENTIST's source code including a Snakemake workflow, conda package, and Docker container is available at https://github.com/a-ludi/dentist. All test assemblies as a resource for future benchmarking are at https://bds.mpi-cbg.de/hillerlab/DENTIST/.
Affiliation(s)
- Arne Ludwig
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Martin Pippel
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Gene Myers
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, Senckenberganlage 25, 60325 Frankfurt, Germany
  - Goethe-University, Faculty of Biosciences, Max-von-Laue-Str. 9, 60438 Frankfurt, Germany
|
24
|
Wierzbicki F, Schwarz F, Cannalonga O, Kofler R. Novel quality metrics allow identifying and generating high-quality assemblies of piRNA clusters. Mol Ecol Resour 2022; 22:102-121. [PMID: 34181811 DOI: 10.1111/1755-0998.13455] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/15/2020] [Revised: 04/30/2021] [Accepted: 06/14/2021] [Indexed: 12/30/2022]
Abstract
In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics are ignorant of TEs. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effect of assembly algorithm, polishing, read length, coverage, residual polymorphisms and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
Affiliation(s)
- Filip Wierzbicki
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Florian Schwarz
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Robert Kofler
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
|
25
|
Delorme Q, Costa R, Mansour Y, Fiston-Lavier AS, Chateau A. Involving repetitive regions in scaffolding improvement. J Bioinform Comput Biol 2021; 19:2140016. [PMID: 34923926 DOI: 10.1142/s0219720021400163] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
In this paper, we investigate through a preliminary study the influence of repeat elements during the assembly process. We analyze the link between the presence and nature of one type of repeat element, the transposable element (TE), and misassembly events in genome assemblies. We propose to improve assemblies by taking into account the presence of repeat elements, including TEs, during the scaffolding step. We analyze the results and relate the misassemblies to TEs before and after correction.
Affiliation(s)
- Quentin Delorme
  - LIRMM, Univ Montpellier, CNRS, Montpellier, France
  - Laboratoire MIVEGEC (Université de Montpellier, CNRS 5290, IRD 229), Centre de Recherche en Écologie et Évolution de la Santé (CREES), Institut de Recherche pour le Développement (IRD), F-34394, Montpellier, France
- Rémy Costa
  - LIRMM, Univ Montpellier, CNRS, Montpellier, France
  - IGH-UMR9002, Univ Montpellier, CNRS, Montpellier, France
- Yasmine Mansour
  - LIRMM, Univ Montpellier, CNRS, Montpellier, France
  - ISEM, Univ Montpellier, CNRS, IRD, Montpellier, France
- Anna-Sophie Fiston-Lavier
  - ISEM, Univ Montpellier, CNRS, IRD, Montpellier, France
  - Institut Universitaire de France (IUF), France
|
26
|
CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure. PLoS Comput Biol 2021; 17:e1009631. [PMID: 34813594 PMCID: PMC8651127 DOI: 10.1371/journal.pcbi.1009631] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Revised: 12/07/2021] [Accepted: 11/11/2021] [Indexed: 11/19/2022] Open
Abstract
With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species, Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes; the latter two being high-quality, well established, de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs, representing two adult D. melanogaster whole-body samples were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools. Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/.
Within transcriptome reference sets, non-chimeric sequences are representations of transcribed genes, while artificially generated chimeric ones are mosaics of two or more pieces of DNA incorrectly pieced together. One area where such sets are utilized is in the quantification of gene expression patterns, where RNA-Seq reads are mapped to the sequences within, and subsequent count values reflect expression levels. Artificial chimeras can have a negative impact on count values by erroneously increasing variation in relation to the reads being mapped. Reference sets can be created from de novo assembled contigs, but chimeras can be introduced during the assembly process via the required traversal of graphs, representing gene families, constructed from the RNA-Seq data. Graph complexity determines how likely chimeras are to arise. We have created CStone, a de novo assembler that utilizes a classification system to describe such complexity. Contigs created by CStone are labelled in a manner that indicates whether or not they are non-chimeric. This encourages contig dependent results to be presented with increased objectivity by maintaining the context of ambiguity associated with the assembly process. CStone has been tested extensively. Additionally, we have quantified the relationship between chimeras within reference sets and the identification of differentially expressed genes.
|
27
|
Schultz DT, Francis WR, McBroome JD, Christianson LM, Haddock SHD, Green RE. A chromosome-scale genome assembly and karyotype of the ctenophore Hormiphora californensis. G3 (BETHESDA, MD.) 2021; 11:jkab302. [PMID: 34545398 PMCID: PMC8527503 DOI: 10.1093/g3journal/jkab302] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/08/2021] [Accepted: 08/18/2021] [Indexed: 11/12/2022]
Abstract
Here, we present a karyotype, a chromosome-scale genome assembly, and a genome annotation from the ctenophore Hormiphora californensis (Ctenophora: Cydippida: Pleurobrachiidae). The assembly spans 110 Mb in 44 scaffolds and 99.47% of the bases are contained in 13 scaffolds. Chromosome micrographs and Hi-C heatmaps support a karyotype of 13 diploid chromosomes. Hi-C data reveal three large heterozygous inversions on chromosome 1, and one heterozygous inversion shares the same gene order found in the genome of the ctenophore Pleurobrachia bachei. We find evidence that H. californensis and P. bachei share thirteen homologous chromosomes, and the same karyotype of 1n = 13. The manually curated PacBio Iso-Seq-based genome annotation reveals complex gene structures, including nested genes and trans-spliced leader sequences. This chromosome-scale assembly is a useful resource for ctenophore biology and will aid future studies of metazoan evolution and phylogenetics.
Affiliation(s)
- Darrin T Schultz
  - Department of Biomolecular Engineering and Bioinformatics, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Monterey Bay Aquarium Research Institute, Moss Landing, CA 95039, USA
- Warren R Francis
  - Department of Biology, University of Southern Denmark, Odense 5230, Denmark
- Jakob D McBroome
  - Department of Biomolecular Engineering and Bioinformatics, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Steven H D Haddock
  - Monterey Bay Aquarium Research Institute, Moss Landing, CA 95039, USA
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Richard E Green
  - Department of Biomolecular Engineering and Bioinformatics, University of California Santa Cruz, Santa Cruz, CA 95064, USA
|
28
|
Tsai H, Kippes N, Firl A, Lieberman M, Comai L, Henry IM. Efficient construction of a linkage map and haplotypes for Mentha suaveolens using sequence capture. G3-GENES GENOMES GENETICS 2021; 11:6321234. [PMID: 34544134 PMCID: PMC8496254 DOI: 10.1093/g3journal/jkab232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Accepted: 06/25/2021] [Indexed: 11/12/2022]
Abstract
The sustainability of many crops is hindered by the lack of genomic resources and a poor understanding of natural genetic diversity. Particularly, application of modern breeding requires high-density linkage maps that are integrated into a highly contiguous reference genome. Here, we present a rapid method for deriving haplotypes and developing linkage maps, and its application to Mentha suaveolens, one of the diploid progenitors of cultivated mints. Using sequence-capture via DNA hybridization to target single nucleotide polymorphisms (SNPs), we successfully genotyped ∼5000 SNPs within the genome of >400 individuals derived from a self cross. After stringent quality control, and identification of nonredundant SNPs, 1919 informative SNPs were retained for linkage map construction. The resulting linkage map defined a total genetic space of 942.17 cM divided among 12 linkage groups, ranging from 56.32 to 122.61 cM in length. The linkage map is in good agreement with pseudomolecules from our preliminary genome assembly, proving this resource effective for the correction and validation of the reference genome. We discuss the advantages of this method for the rapid creation of linkage maps.
Affiliation(s)
- Helen Tsai
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Nestor Kippes
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Alana Firl
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Meric Lieberman
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Luca Comai
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Isabelle M Henry
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
|
29
|
Mitchell LJ, Cheney KL, Luehrmann M, Marshall NJ, Michie K, Cortesi F. Molecular evolution of ultraviolet visual opsins and spectral tuning of photoreceptors in anemonefishes (Amphiprioninae). Genome Biol Evol 2021; 13:6347585. [PMID: 34375382 PMCID: PMC8511661 DOI: 10.1093/gbe/evab184] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 08/05/2021] [Indexed: 11/29/2022] Open
Abstract
Many animals including birds, reptiles, insects, and teleost fishes can see ultraviolet (UV) light (shorter than 400 nm), which has functional importance for foraging and communication. For coral reef fishes, shallow reef environments transmit a broad spectrum of light, rich in UV, driving the evolution of diverse spectral sensitivities. However, the identities and sites of the specific visual genes that underlie vision in reef fishes remain elusive and are useful in determining how evolution has tuned vision to suit life on the reef. We investigated the visual systems of 11 anemonefish (Amphiprioninae) species, specifically probing for the molecular pathways that facilitate UV-sensitivity. Searching the genomes of anemonefishes, we identified a total of eight functional opsin genes from all five vertebrate visual opsin subfamilies. We found rare instances of teleost UV-sensitive SWS1 opsin gene duplications that produced two functionally coding paralogs (SWS1α and SWS1β) and a pseudogene. We also found separate green sensitive RH2A opsin gene duplicates not yet reported in the family Pomacentridae. Transcriptome analysis revealed false clown anemonefish (Amphiprion ocellaris) expressed one rod opsin (RH1) and six cone opsins (SWS1β, SWS2B, RH2B, RH2A-1, RH2A-2, LWS) in the retina. Fluorescent in situ hybridization highlighted the (co-)expression of SWS1β with SWS2B in single cones, and either RH2B, RH2A, or RH2A together with LWS in different members of double cone photoreceptors (two single cones fused together). Our study provides the first in-depth characterization of visual opsin genes found in anemonefishes and provides a useful basis for the further study of UV-vision in reef fishes.
Affiliation(s)
- Laurie J Mitchell
  - School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Karen L Cheney
  - School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Martin Luehrmann
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
- N Justin Marshall
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
- Kyle Michie
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
  - King's College, Cambridge, CB2 1ST, UK
- Fabio Cortesi
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
|
30
|
The genomics of ecological flexibility, large brains, and long lives in capuchin monkeys revealed with fecalFACS. Proc Natl Acad Sci U S A 2021; 118:2010632118. [PMID: 33574059 PMCID: PMC7896301 DOI: 10.1073/pnas.2010632118] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 02/07/2023] Open
Abstract
Surviving challenging environments, living long lives, and engaging in complex cognitive processes are hallmark human characteristics. Similar traits have evolved in parallel in capuchin monkeys, but their genetic underpinnings remain unexplored. We developed and annotated a reference assembly for white-faced capuchin monkeys to explore the evolution of these phenotypes. By comparing populations of capuchins inhabiting rainforest versus dry forests with seasonal droughts, we detected selection in genes associated with kidney function, muscular wasting, and metabolism, suggesting adaptation to periodic resource scarcity. When comparing capuchins to other mammals, we identified evidence of selection in multiple genes implicated in longevity and brain development. Our research was facilitated by our method to generate high- and low-coverage genomes from noninvasive biomaterials. Ecological flexibility, extended lifespans, and large brains have long intrigued evolutionary biologists, and comparative genomics offers an efficient and effective tool for generating new insights into the evolution of such traits. Studies of capuchin monkeys are particularly well situated to shed light on the selective pressures and genetic underpinnings of local adaptation to diverse habitats, longevity, and brain development. Distributed widely across Central and South America, they are inventive and extractive foragers, known for their sensorimotor intelligence. Capuchins have among the largest relative brain size of any monkey and a lifespan that exceeds 50 y, despite their small (3 to 5 kg) body size. We assemble and annotate a de novo reference genome for Cebus imitator. 
Through high-depth sequencing of DNA derived from blood, various tissues, and feces via fluorescence-activated cell sorting (fecalFACS) to isolate monkey epithelial cells, we compared genomes of capuchin populations from tropical dry forests and lowland rainforests and identified population divergence in genes involved in water balance, kidney function, and metabolism. Through a comparative genomics approach spanning a wide diversity of mammals, we identified genes under positive selection associated with longevity and brain development. Additionally, we provide a technological advancement in the use of noninvasive genomics for studies of free-ranging mammals. Our intra- and interspecific comparative study of capuchin genomics provides insights into processes underlying local adaptation to diverse and physiologically challenging environments, as well as the molecular basis of brain evolution and longevity.
|
31
|
Considerations for Initiating a Wildlife Genomics Research Project in South and South-East Asia. J Indian Inst Sci 2021. [DOI: 10.1007/s41745-021-00243-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
|
32
|
Bai S, Wu H, Zhang J, Pan Z, Zhao W, Li Z, Tong C. Genome Assembly of Salicaceae Populus deltoides (Eastern Cottonwood) I-69 Based on Nanopore Sequencing and Hi-C Technologies. J Hered 2021; 112:303-310. [PMID: 33730157 PMCID: PMC8141683 DOI: 10.1093/jhered/esab010] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/22/2021] [Accepted: 03/16/2021] [Indexed: 12/30/2022] Open
Abstract
Populus deltoides has important ecological and economic value and is widely used in poplar breeding programs due to its superior characteristics such as rapid growth and resistance to disease. Although the genome sequence of P. deltoides WV94 is available, the assembly is fragmented. Here, we report an improved chromosome-level assembly of the P. deltoides cultivar I-69 by combining Nanopore sequencing and chromosome conformation capture (Hi-C) technologies. The assembly was 429.3 Mb in size and contained 657 contigs with a contig N50 length of 2.62 Mb. Hi-C scaffolding of the contigs generated 19 chromosome-level sequences, which covered 97.4% (418 Mb) of the total assembly size. Moreover, repetitive sequence annotation showed that 39.28% of the P. deltoides genome was composed of interspersed elements, including retroelements (23.66%), DNA transposons (6.83%), and unclassified elements (8.79%). We also identified a total of 44 362 protein-coding genes in the current P. deltoides assembly. Compared with the previous genome assembly of P. deltoides WV94, the current assembly was significantly improved: the contig N50 increased 3.5-fold and the proportion of gaps decreased from 3.2% to 0.08%. This high-quality, well-annotated genome assembly provides a reliable genomic resource for identifying genome variants among individuals, mining candidate genes that control growth and wood quality traits, and facilitating further application of genomics-assisted breeding in populations related to P. deltoides.
Affiliation(s)
- Shengjun Bai
  - Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Hainan Wu
  - Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Jinpeng Zhang
  - Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Zhiliang Pan
  - Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Wei Zhao
  - Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Zhiting Li
  - Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Chunfa Tong
  - Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
|
33
|
Kivikoski M, Rastas P, Löytynoja A, Merilä J. Automated improvement of stickleback reference genome assemblies with Lep-Anchor software. Mol Ecol Resour 2021; 21:2166-2176. [PMID: 33955177 DOI: 10.1111/1755-0998.13404] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/16/2020] [Revised: 04/12/2021] [Accepted: 04/13/2021] [Indexed: 01/06/2023]
Abstract
We describe an integrative approach to improve contiguity and haploidy of a reference genome assembly and demonstrate its impact with practical examples. With two novel features of Lep-Anchor software and a combination of dense linkage maps, overlap detection and bridging long reads, we generated an improved assembly of the nine-spined stickleback (Pungitius pungitius) reference genome. We were able to remove a significant number of haplotypic contigs, detect more genetic variation and improve the contiguity of the genome, especially that of the X chromosome. However, improved scaffolding cannot correct for mosaicism of erroneously assembled contigs, as demonstrated by a de novo assembly of a 1.6-Mbp inversion. Qualitatively similar gains were obtained with the genome of the three-spined stickleback (Gasterosteus aculeatus). Since the utility of genome-wide sequencing data in biological research depends heavily on the quality of the reference genome, the improved and fully automated approach described here should be helpful in refining reference genome assemblies.
Affiliation(s)
- Mikko Kivikoski
  - Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Pasi Rastas
  - Institute of Biotechnology, HiLIFE, University of Helsinki, Helsinki, Finland
- Ari Löytynoja
  - Institute of Biotechnology, HiLIFE, University of Helsinki, Helsinki, Finland
- Juha Merilä
  - Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
  - Division of Ecology and Biodiversity, The University of Hong Kong, Hong Kong, Hong Kong, SAR
|
34
|
Seixas FA, Edelman NB, Mallet J. Synteny-Based Genome Assembly for 16 Species of Heliconius Butterflies, and an Assessment of Structural Variation across the Genus. Genome Biol Evol 2021; 13:6207971. [PMID: 33792688 PMCID: PMC8290116 DOI: 10.1093/gbe/evab069] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Accepted: 03/29/2021] [Indexed: 12/11/2022] Open
Abstract
Heliconius butterflies (Lepidoptera: Nymphalidae) are a group of 48 neotropical species widely studied in evolutionary research. Despite the wealth of genomic data generated in past years, chromosomal level genome assemblies currently exist for only two species, Heliconius melpomene and Heliconius erato, each a representative of one of the two major clades of the genus. Here, we use these reference genomes to improve the contiguity of previously published draft genome assemblies of 16 Heliconius species. Using a reference-assisted scaffolding approach, we place and order the scaffolds of these genomes onto chromosomes, resulting in 95.7-99.9% of their genomes anchored to chromosomes. Genome sizes are somewhat variable among species (270-422 Mb) and in one small group of species (Heliconius hecale, Heliconius elevatus, and Heliconius pardalinus) expansions in genome size are driven mainly by repetitive sequences that map to four small regions in the H. melpomene reference genome. Genes from these repeat regions show an increase in exon copy number, an absence of internal stop codons, evidence of constraint on nonsynonymous changes, and increased expression, all of which suggest that at least some of the extra copies are functional. Finally, we conducted a systematic search for inversions and identified five moderately large inversions fixed between the two major Heliconius clades. We infer that one of these inversions was transferred by introgression between the lineages leading to the erato/sara and burneyi/doris clades. These reference-guided assemblies represent a major improvement in Heliconius genomic resources that enable further genetic and evolutionary discoveries in this genus.
Affiliation(s)
- Fernando A Seixas
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- Nathaniel B Edelman
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
  - Yale Institute for Biospheric Studies, Yale University, New Haven, Connecticut, USA
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
|
35
|
Patro R, Salmela L. Algorithms meet sequencing technologies - 10th edition of the RECOMB-Seq workshop. iScience 2021; 24:101956. [PMID: 33437938 PMCID: PMC7788091 DOI: 10.1016/j.isci.2020.101956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
DNA and RNA sequencing is a core technology in biological and medical research. The high throughput of these technologies and the consistent development of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but due to the ongoing COVID-19 pandemic, the meeting was held virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented in the workshop are featured in this edition of iScience, and many of the talks are available online on the RECOMB-Seq 2020 YouTube channel.
Affiliation(s)
- Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
| | - Leena Salmela
- Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
| |
|
36
|
Whibley A, Kelley JL, Narum SR. The changing face of genome assemblies: Guidance on achieving high-quality reference genomes. Mol Ecol Resour 2021; 21:641-652. [PMID: 33326691 DOI: 10.1111/1755-0998.13312] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/17/2020] [Revised: 12/08/2020] [Accepted: 12/11/2020] [Indexed: 12/20/2022]
Abstract
The quality of genome assemblies has improved rapidly in recent years due to continual advances in sequencing technology, assembly approaches, and quality control. In the field of molecular ecology, this has led to the development of exceptional quality genome assemblies that will be important long-term resources for broader studies into ecological, conservation, evolutionary, and population genomics of naturally occurring species. Moreover, the extent to which a single reference genome represents the diversity within a species varies: pan-genomes will become increasingly important ecological genomics resources, particularly in systems found to have considerable presence-absence variation in their functional content. Here, we highlight advances in technology that have raised the bar for genome assembly and provide guidance on standards to achieve exceptional quality reference genomes. Key recommendations include the following: (a) Genome assemblies should include long-read sequencing except in rare cases where it is effectively impossible to acquire adequately preserved samples needed for high molecular weight DNA standards. (b) At least one scaffolding approach, such as Hi-C or optical mapping, should be included with genome assembly. (c) Genome assemblies should be carefully evaluated; this may involve utilising short read data for genome polishing, error correction, k-mer analyses, and estimating the percent of reads that map back to an assembly. Finally, a genome assembly is most valuable if all data and methods are made publicly available and the utility of a genome for further studies is verified through examples. While these recommendations are based on current technology, we anticipate that future advances will push the field further and the molecular ecology community should continue to adopt new approaches that attain the highest quality genome assemblies.
Affiliation(s)
| | | | - Shawn R Narum
- University of Idaho, Moscow, ID, USA; Columbia River Inter-Tribal Fish Commission, Hagerman, ID, USA
| |
|
37
|
Yáñez Feliú G, Earle Gómez B, Codoceo Berrocal V, Muñoz Silva M, Nuñez IN, Matute TF, Arce Medina A, Vidal G, Vitalis C, Dahlin J, Federici F, Rudge TJ. Flapjack: Data Management and Analysis for Genetic Circuit Characterization. ACS Synth Biol 2021; 10:183-191. [PMID: 33382586 DOI: 10.1021/acssynbio.0c00554] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Characterization is fundamental to the design, build, test, learn (DBTL) cycle for engineering synthetic genetic circuits. Components must be described in such a way as to account for their behavior in a range of contexts. Measurements and associated metadata, including part composition, constitute the test phase of the DBTL cycle. These data may consist of measurements of thousands of circuits, measured in hundreds of conditions, in multiple assays potentially performed in different laboratories and using different techniques. In order to inform the learn phase this large volume of data must be filtered, collated, and analyzed. Characterization consists of using this data to parametrize models of component function in different contexts, and combining them to predict behaviors of novel circuits. Tools to store, organize, share, and analyze large volumes of measurement and metadata are therefore essential to linking the test phase to the build and learn phases, closing the loop of the DBTL cycle. Here we present such a system, implemented as a web app with a backend data registry and analysis engine. An interactive frontend provides powerful querying, plotting, and analysis tools, and we provide a REST API and Python package for full integration with external build and learn software. All measurements are associated with circuit part composition via SBOL (Synthetic Biology Open Language). We demonstrate our tool by characterizing a range of genetic components and circuits according to composition and context.
Affiliation(s)
- Guillermo Yáñez Feliú
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
| | - Benjamín Earle Gómez
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
| | - Verner Codoceo Berrocal
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
| | - Macarena Muñoz Silva
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
| | - Isaac N Nuñez
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Pontificia Universidad Católica de Chile, Santiago 8330005, Chile
| | - Tamara F Matute
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Pontificia Universidad Católica de Chile, Santiago 8330005, Chile
| | - Anibal Arce Medina
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Pontificia Universidad Católica de Chile, Santiago 8330005, Chile
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago 8330005, Chile
| | - Gonzalo Vidal
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
| | - Carlos Vitalis
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
| | - Jonathan Dahlin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | - Fernán Federici
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Pontificia Universidad Católica de Chile, Santiago 8330005, Chile
- FONDAP, Center for Genome Regulation, Pontificia Universidad Católica de Chile, Santiago 8330005, Chile
| | - Timothy J Rudge
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
- Institute for Biological and Medical Engineering, Schools of Engineering, Biology and Medicine, Pontificia Universidad Católica de Chile, Santiago 7820244, Chile
| |
|
38
|
Du H, Diao C, Zhao P, Zhou L, Liu JF. Integrated hybrid de novo assembly technologies to obtain high-quality pig genome using short and long reads. Brief Bioinform 2021; 22:6082823. [PMID: 33429431 DOI: 10.1093/bib/bbaa399] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/22/2020] [Revised: 11/20/2020] [Accepted: 12/08/2020] [Indexed: 11/12/2022] Open
Abstract
With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to set the correct combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations (9 software programs with different short and long reads) for Duroc pig. Based on the results of the optimization process for genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into pan-genome analysis studies of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.
Affiliation(s)
- Heng Du
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Chenguang Diao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Pengju Zhao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Lei Zhou
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Jian-Feng Liu
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| |
|
39
|
Yamaguchi K, Koyanagi M, Kuraku S. Visual and nonvisual opsin genes of sharks and other nonosteichthyan vertebrates: Genomic exploration of underwater photoreception. J Evol Biol 2020; 34:968-976. [DOI: 10.1111/jeb.13730] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/03/2020] [Revised: 10/21/2020] [Accepted: 10/21/2020] [Indexed: 12/16/2022]
Affiliation(s)
- Kazuaki Yamaguchi
- Laboratory for Phyloinformatics RIKEN Center for Biosystems Dynamics Research (BDR) Kobe Japan
| | - Mitsumasa Koyanagi
- Department of Biology and Geosciences Graduate School of Science Osaka City University Osaka Japan
| | - Shigehiro Kuraku
- Laboratory for Phyloinformatics RIKEN Center for Biosystems Dynamics Research (BDR) Kobe Japan
| |
|
40
|
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
| | - J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
| | - Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
| | - Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
| | - Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
| |
|
41
|
Tümmler B. Molecular epidemiology in current times. Environ Microbiol 2020; 22:4909-4918. [PMID: 32945108 DOI: 10.1111/1462-2920.15238] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/21/2020] [Revised: 09/10/2020] [Accepted: 09/15/2020] [Indexed: 01/04/2023]
Abstract
Motivated to find options for prevention or intervention, molecular epidemiology aims to identify the host and microbial factors that determine the transmission, manifestation and progression of infectious disease. The genotyping of cultivatable bacterial strains is performed by either anonymous fingerprinting techniques or sequence-based exploration of variable genomic sites. Multilocus sequence typing of housekeeping genes and allele profiling of the core genome have become standard techniques of bacterial strain typing that may be supplemented by whole genome sequencing to explore all single nucleotide variants and/or the composition of the accessory genome. Next, novel protocols to investigate host and microbiome based upon smart third generation sequencing technologies are being developed for effective surveillance, rapid diagnosis and real-time tracking of infectious diseases.
Affiliation(s)
- Burkhard Tümmler
- Clinical Research Group, Clinic for Paediatric Pneumology, Allergology and Neonatology, Hannover Medical School, Hannover, Germany
| |
|
42
|
He C, Lin G, Wei H, Tang H, White FF, Valent B, Liu S. Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences. NAR Genom Bioinform 2020; 2:lqaa075. [PMID: 33575622 PMCID: PMC7671381 DOI: 10.1093/nargab/lqaa075] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/18/2020] [Revised: 08/02/2020] [Accepted: 09/01/2020] [Indexed: 12/25/2022] Open
Abstract
Genome sequences provide genomic maps with a single-base resolution for exploring genetic contents. Sequencing technologies, particularly long reads, have revolutionized genome assemblies for producing highly continuous genome sequences. However, current long-read sequencing technologies generate inaccurate reads that contain many errors. Some errors are retained in assembled sequences, which are typically not completely corrected by using either long reads or more accurate short reads. The issue commonly exists, but few tools are dedicated to computing error rates or determining error locations. In this study, we developed a novel approach, referred to as k-mer abundance difference (KAD), to compare the inferred copy number of each k-mer indicated by short reads and the observed copy number in the assembly. Simple KAD metrics make it possible to classify k-mers into categories that reflect the quality of the assembly. Specifically, the KAD method can be used to identify base errors and estimate the overall error rate. In addition, sequence insertion and deletion as well as sequence redundancy can also be detected. Collectively, KAD is valuable for quality evaluation of genome assemblies and, potentially, provides a diagnostic tool to aid in precise error correction. KAD software has been developed to facilitate public use.
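The comparison this abstract describes, read-inferred copy number versus copy number observed in the assembly, can be sketched in a few lines of Python. This is a minimal illustration and not the published KAD software. The function names, the forward-strand-only counting, and the scoring formula log2((c + m) / (m (n + 1))), with c the read count of a k-mer, m the modal read count taken as single-copy depth, and n the k-mer's count in the assembly, are all assumptions made here, since the abstract does not state the formula.

```python
from collections import Counter
from math import log2


def count_kmers(sequences, k):
    """Count every k-mer (forward strand only) across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kad_scores(reads, assembly, k):
    """Score each k-mer by comparing read abundance with assembly copy number.

    Assumed formula (not given in the abstract):
        score = log2((c + m) / (m * (n + 1)))
    where c = count in reads, n = count in assembly, and
    m = modal read count, used as a stand-in for single-copy depth.
    """
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers(assembly, k)
    if not read_counts:
        raise ValueError("no k-mers found in reads")
    # Most frequent count value among read k-mers.
    mode = Counter(read_counts.values()).most_common(1)[0][0]
    scores = {}
    for kmer in set(read_counts) | set(asm_counts):
        c = read_counts.get(kmer, 0)
        n = asm_counts.get(kmer, 0)
        scores[kmer] = log2((c + mode) / (mode * (n + 1)))
    return scores


# Toy data (hypothetical): depth 4 for single-copy k-mers.
reads = ["ACGT"] * 4 + ["TTTT"] * 4 + ["GGGG"] * 8
assembly = ["ACGT", "GGGG", "CCCC"]
scores = kad_scores(reads, assembly, k=4)
```

Under this assumed formula, a score near zero means reads and assembly agree, a score of -1 marks an assembly k-mer with no read support (a candidate base error), and a positive score marks a k-mer that the reads support at a higher copy number than the assembly contains.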
Affiliation(s)
- Cheng He
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
| | - Guifang Lin
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
| | - Hairong Wei
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
| | - Haibao Tang
- Center for Genomics and Biotechnology and Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fujian 350002, China
| | - Frank F White
- Department of Plant Pathology, University of Florida, Gainesville, FL 32611-0680, USA
| | - Barbara Valent
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
| | - Sanzhen Liu
- Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS 66506-5502, USA
| |
|
43
|
Adams M, McBroome J, Maurer N, Pepper-Tunick E, Saremi N, Green RE, Vollmers C, Corbett-Detig R. One fly-one genome: chromosome-scale genome assembly of a single outbred Drosophila melanogaster. Nucleic Acids Res 2020; 48:e75. [PMID: 32491177 PMCID: PMC7367183 DOI: 10.1093/nar/gkaa450] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/17/2020] [Revised: 04/16/2020] [Accepted: 05/18/2020] [Indexed: 02/02/2023] Open
Abstract
A high quality genome assembly is a vital first step for the study of an organism. Recent advances in technology have made the creation of high quality chromosome scale assemblies feasible and low cost. However, the amount of input DNA needed for an assembly project can be a limiting factor for small organisms or precious samples. Here we demonstrate the feasibility of creating a chromosome scale assembly using a hybrid method for a low input sample, a single outbred Drosophila melanogaster. Our approach combines an Illumina shotgun library, Oxford nanopore long reads, and chromosome conformation capture for long range scaffolding. This single fly genome assembly has an N50 of 26 Mb, a length that encompasses entire chromosome arms, contains 95% of expected single copy orthologs, and a nearly complete assembly of this individual's Wolbachia endosymbiont. The methods described here enable the accurate and complete assembly of genomes from small, field collected organisms as well as precious clinical samples.
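For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal Python sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the example lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Hypothetical scaffold lengths in Mb; the total is 54, so half is 27.
example = [10, 9, 8, 7, 6, 5, 4, 3, 2]
result = n50(example)  # 10 + 9 + 8 = 27 reaches half, so N50 is 8
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why abstracts like this one report it alongside ortholog completeness.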
Affiliation(s)
- Matthew Adams
- Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Jakob McBroome
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Nicholas Maurer
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Evan Pepper-Tunick
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Nedda F Saremi
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Richard E Green
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Dovetail Genomics, Scotts Valley, CA 95066, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Russell B Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| |
|
44
|
instaGRAAL: chromosome-level quality scaffolding of genomes using a proximity ligation-based scaffolder. Genome Biol 2020; 21:148. [PMID: 32552806 PMCID: PMC7386250 DOI: 10.1186/s13059-020-02041-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/31/2019] [Accepted: 05/11/2020] [Indexed: 02/06/2023] Open
Abstract
Hi-C exploits contact frequencies between pairs of loci to bridge and order contigs during genome assembly, resulting in chromosome-level assemblies. Because few robust programs are available for this type of data, we developed instaGRAAL, a complete overhaul of the GRAAL program, which has adapted the latter to allow efficient assembly of large genomes. instaGRAAL features a number of improvements over GRAAL, including a modular correction approach that optionally integrates independent data. We validate the program using data for two brown algae, and human, to generate near-complete assemblies with minimal human intervention.
|
45
|
Coombe L, Nikolić V, Chu J, Birol I, Warren RL. ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs. Bioinformatics 2020; 36:3885-3887. [PMID: 32311025 PMCID: PMC7320612 DOI: 10.1093/bioinformatics/btaa253] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/13/2020] [Revised: 03/23/2020] [Accepted: 04/14/2020] [Indexed: 11/17/2022] Open
Abstract
SUMMARY The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies still fall short of reference-grade quality. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of different applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and using less memory. AVAILABILITY AND IMPLEMENTATION ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
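The "ordered minimizer sketch" idea mentioned in this abstract can be illustrated with a short Python sketch. It is a generic textbook-style minimizer computation and not ntJoin's implementation: the function name is hypothetical, k-mers are ranked lexicographically instead of by hash value, and only the forward strand is considered, all of which are simplifying assumptions.

```python
def minimizer_sketch(seq, k, w):
    """Return the ordered list of (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, the smallest k-mer is kept
    (leftmost on ties). Consecutive windows that select the same position
    contribute a single entry, so the result is ordered along the sequence.
    Lexicographic ranking is an assumption made for readability.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost smallest k-mer.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        if not sketch or sketch[-1][0] != pos:
            sketch.append((pos, kmers[pos]))
    return sketch


draft = minimizer_sketch("GATTACA", k=3, w=3)
reference = minimizer_sketch("CCGATTACA", k=3, w=3)
# k-mers selected in both sketches can serve as anchors between the sequences.
shared = {kmer for _, kmer in draft} & {kmer for _, kmer in reference}
```

In this toy case the two sequences share the region GATTACA, and both sketches select ATT and ACA from it in the same order. That preserved ordering of shared minimizers is the kind of signal a minimizer-based mapper can use in place of full alignments.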
Affiliation(s)
- Lauren Coombe
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| | - Vladimir Nikolić
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| | - Justin Chu
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| | - Inanc Birol
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| | - René L Warren
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| |
|
46
|
Orteu A, Jiggins CD. The genomics of coloration provides insights into adaptive evolution. Nat Rev Genet 2020; 21:461-475. [PMID: 32382123 DOI: 10.1038/s41576-020-0234-z] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Accepted: 03/30/2020] [Indexed: 01/31/2023]
Abstract
Coloration is an easily quantifiable visual trait that has proven to be a highly tractable system for genetic analysis and for studying adaptive evolution. The application of genomic approaches to evolutionary studies of coloration is providing new insight into the genetic architectures underlying colour traits, including the importance of large-effect mutations and supergenes, the role of development in shaping genetic variation and the origins of adaptive variation, which often involves adaptive introgression. Improved knowledge of the genetic basis of traits can facilitate field studies of natural selection and sexual selection, making it possible for strong selection and its influence on the genome to be demonstrated in wild populations.
Affiliation(s)
- Anna Orteu
- Department of Zoology, University of Cambridge, Cambridge, UK.
| | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK.
| |
|
47
|
Exposito-Alonso M, Drost HG, Burbano HA, Weigel D. The Earth BioGenome project: opportunities and challenges for plant genomics and conservation. Plant J 2020; 102:222-229. [PMID: 31788877 DOI: 10.1111/tpj.14631] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/12/2019] [Revised: 11/03/2019] [Accepted: 11/18/2019] [Indexed: 05/28/2023]
Abstract
Sequencing them all. That is the ambitious goal of the recently launched Earth BioGenome project (Proceedings of the National Academy of Sciences of the United States of America, 115, 4325-4333), which aims to produce reference genomes for all eukaryotic species within the next decade. In this perspective, we discuss the opportunities of this project with a plant focus, but highlight also potential limitations. This includes the question of how to best capture all plant diversity, as the green taxon is one of the most complex clades in the tree of life, with over 300 000 species. For this, we highlight four key points: (i) the unique biological insights that could be gained from studying plants, (ii) their apparent underrepresentation in sequencing efforts given the number of threatened species, (iii) the necessity of phylogenomic methods that are aware of differences in genome complexity and quality, and (iv) the accounting for within-species genetic diversity and the historical aspect of conservation genetics.
Affiliation(s)
| | - Hajk-Georg Drost
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany
- The Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, CB2 1LR, Cambridge, UK
| | - Hernán A Burbano
- Centre for Life's Origins and Evolution, Department of Genetics Evolution and Environment, University College London, London, WC1H 0AG, UK
| | - Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany
| |
|
48
|
Rice ES, Koren S, Rhie A, Heaton MP, Kalbfleisch TS, Hardy T, Hackett PH, Bickhart DM, Rosen BD, Ley BV, Maurer NW, Green RE, Phillippy AM, Petersen JL, Smith TPL. Continuous chromosome-scale haplotypes assembled from a single interspecies F1 hybrid of yak and cattle. Gigascience 2020; 9:giaa029. [PMID: 32242610 PMCID: PMC7118895 DOI: 10.1093/gigascience/giaa029] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 10/01/2019] [Revised: 01/08/2020] [Accepted: 03/10/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND The development of trio binning as an approach for assembling diploid genomes has enabled the creation of fully haplotype-resolved reference genomes. Unlike other methods of assembly for diploid genomes, this approach is enhanced, rather than hindered, by the heterozygosity of the individual sequenced. To maximize heterozygosity and simultaneously assemble reference genomes for 2 species, we applied trio binning to an interspecies F1 hybrid of yak (Bos grunniens) and cattle (Bos taurus), 2 species that diverged nearly 5 million years ago. The genomes of both of these species are composed of acrocentric autosomes. RESULTS We produced the most continuous haplotype-resolved assemblies for a diploid animal yet reported. Both the maternal (yak) and paternal (cattle) assemblies have the largest 2 chromosomes in single haplotigs, and more than one-third of the autosomes similarly lack gaps. The maximum length haplotig produced was 153 Mb without any scaffolding or gap-filling steps and represents the longest haplotig reported for any species. The assemblies are also more complete and accurate than those reported for most other vertebrates, with 97% of mammalian universal single-copy orthologs present. CONCLUSIONS The high heterozygosity inherent to interspecies crosses maximizes the effectiveness of the trio binning method. The interspecies trio binning approach we describe is likely to provide the highest-quality assemblies for any pair of species that can interbreed to produce hybrid offspring that develop to sufficient cell numbers for DNA extraction.
Affiliation(s)
- Edward S Rice
  - Department of Animal Science, University of Nebraska–Lincoln, C203 ANSC, Lincoln, NE 68583, USA
  - Bond Life Sciences Center, University of Missouri, 1201 Rollins Street, Columbia, MO 65201, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Michael P Heaton
  - US Meat Animal Research Center, US Department of Agriculture, State Spur 18D, Clay Center, NE 68933, USA
- Theodore S Kalbfleisch
  - Gluck Equine Research Center, University of Kentucky, 1400 Nicholasville Rd., Lexington, KY 40546, USA
- Derek M Bickhart
  - Dairy Forage Research Center, 1925 Linden Drive, ARS USDA, Madison, WI 53706, USA
- Benjamin D Rosen
  - Animal Genomics and Improvement Laboratory, 10300 Baltimore Ave., ARS USDA, Beltsville, MD 20705, USA
- Brian Vander Ley
  - Great Plains Veterinary Educational Center, School of Veterinary Medicine and Biomedical Sciences, University of Nebraska–Lincoln, 820 Road 313, Clay Center, NE 68933, USA
- Nicholas W Maurer
  - Department of Biomolecular Engineering, University of California, 1156 High St., Santa Cruz, CA 95064, USA
- Richard E Green
  - Department of Biomolecular Engineering, University of California, 1156 High St., Santa Cruz, CA 95064, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Jessica L Petersen
  - Department of Animal Science, University of Nebraska–Lincoln, C203 ANSC, Lincoln, NE 68583, USA
- Timothy P L Smith
  - US Meat Animal Research Center, US Department of Agriculture, State Spur 18D, Clay Center, NE 68933, USA
49
Choo LQ, Bal TMP, Choquet M, Smolina I, Ramos-Silva P, Marlétaz F, Kopp M, Hoarau G, Peijnenburg KTCA. Novel genomic resources for shelled pteropods: a draft genome and target capture probes for Limacina bulimoides, tested for cross-species relevance. BMC Genomics 2020; 21:11. [PMID: 31900119 PMCID: PMC6942316 DOI: 10.1186/s12864-019-6372-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/24/2019] [Accepted: 12/05/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND Pteropods are planktonic gastropods that are considered bio-indicators for monitoring the impacts of ocean acidification on marine ecosystems. In order to gain insight into their adaptive potential to future environmental changes, it is critical to use adequate molecular tools to delimit species and population boundaries and to assess their genetic connectivity. We developed a set of target capture probes to investigate genetic variation across their large-sized genome using a population genomics approach. Target capture is less limited by DNA amount and quality than other genome-reduced representation protocols, and has the potential for application on closely related species based on probes designed from one species. RESULTS We generated the first draft genome of a pteropod, Limacina bulimoides, resulting in a fragmented assembly of 2.9 Gbp. Using this assembly and a transcriptome as a reference, we designed a set of 2899 genome-wide target capture probes for L. bulimoides. The set of probes includes 2812 single copy nuclear targets, the 28S rDNA sequence, ten mitochondrial genes, 35 candidate biomineralisation genes, and 41 non-coding regions. The capture reaction performed with these probes was highly efficient with 97% of the targets recovered on the focal species. A total of 137,938 single nucleotide polymorphism markers were obtained from the captured sequences across a test panel of nine individuals. The probe set was also tested on four related species: L. trochiformis, L. lesueurii, L. helicina, and Heliconoides inflatus, showing an exponential decrease in capture efficiency with increased genetic distance from the focal species. Sixty-two targets were sufficiently conserved to be recovered consistently across all five species. CONCLUSION The target capture protocol used in this study was effective in capturing genome-wide variation in the focal species L. bulimoides, suitable for population genomic analyses, while providing insights into conserved genomic regions in related species. The present study provides new genomic resources for pteropods and supports the use of target capture-based protocols to efficiently characterise genomic variation in small non-model organisms with large genomes.
Affiliation(s)
- Le Qin Choo
  - Marine Biodiversity, Naturalis Biodiversity Center, Leiden, The Netherlands
  - Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
- Thijs M P Bal
  - Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
- Marvin Choquet
  - Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
- Irina Smolina
  - Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
- Paula Ramos-Silva
  - Marine Biodiversity, Naturalis Biodiversity Center, Leiden, The Netherlands
- Ferdinand Marlétaz
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology, Onna-son, Japan
- Martina Kopp
  - Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
- Galice Hoarau
  - Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
- Katja T C A Peijnenburg
  - Marine Biodiversity, Naturalis Biodiversity Center, Leiden, The Netherlands
  - Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
50
Dhar R, Seethy A, Pethusamy K, Singh S, Rohil V, Purkayastha K, Mukherjee I, Goswami S, Singh R, Raj A, Srivastava T, Acharya S, Rajashekhar B, Karmakar S. De novo assembly of the Indian blue peacock (Pavo cristatus) genome using Oxford Nanopore technology and Illumina sequencing. Gigascience 2019; 8:5488106. [PMID: 31077316 PMCID: PMC6511069 DOI: 10.1093/gigascience/giz038] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/29/2018] [Revised: 09/30/2018] [Accepted: 03/18/2019] [Indexed: 01/23/2023]
Abstract
Background The Indian peafowl (Pavo cristatus) is native to South Asia and is the national bird of India. Here we present a draft genome sequence of the male blue peacock using Illumina and Oxford Nanopore technology (ONT). Results ONT sequencing gave ∼2.3-fold sequencing coverage, whereas Illumina generated 150–base pair paired-end sequence data at 284.6-fold coverage from 5 libraries. Subsequently, we generated a 0.915-gigabase pair de novo assembly of the peacock genome with a scaffold N50 of 0.23 megabase pairs (Mb). We predict that the peacock genome contains 23,153 protein-coding genes and 75.3 Mb (7.33%) of repetitive sequences. Conclusions We report a high-quality assembly of the peacock genome using a hybrid approach of sequences generated by both Illumina and ONT. The long-read chemistry generated by ONT was useful for addressing challenges related to de novo assembly, particularly at regions containing repetitive sequences spanning longer than the read length, and which could not be resolved with only short-read–based assembly. Contig assembly of Illumina short reads gave an N50 of 1,639 bases, whereas with ONT, the N50 increased by >9-fold to 14,749 bases. The initial contig assembly based on Illumina sequencing reads alone gave 685,241 contigs. Further scaffolding on assembled contigs using both Illumina and ONT sequencing reads resulted in a final assembly of 15,025 super-scaffolds, with an N50 of ∼0.23 Mb. Ninety-five percent of proteins predicted by homology matched with those in a public repository, verifying the completeness of our assembly. Like other phylogenetic studies of avian conserved genes, we found P. cristatus to be most closely related to Gallus gallus, followed by Meleagris gallopavo and Anas platyrhynchos. Compared with the recently published peacock genome assembly, the current, superior, hybrid assembly has greater sequencing depth, fewer non-ATGC sequences, and fewer scaffolds.
Affiliation(s)
- Ruby Dhar
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Ashikh Seethy
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Karthikeyan Pethusamy
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Sunil Singh
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Vishwajeet Rohil
  - Vallabhbhai Patel Chest Institute (VPCI), Delhi University, New Delhi 110007, India
- Kakali Purkayastha
  - Vallabhbhai Patel Chest Institute (VPCI), Delhi University, New Delhi 110007, India
- Indrani Mukherjee
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Sandeep Goswami
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Rakesh Singh
  - Kanpur Zoo, Hastings Ave, Azad Nagar, Nawabganj, Kanpur, Uttar Pradesh 208002, India
- Ankita Raj
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Tryambak Srivastava
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Sovon Acharya
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Balaji Rajashekhar
  - Institute of Computer Science, University of Tartu, J. Liivi, Tartu 50409, Estonia
  - Celixa, 19/1 Sankey Road, Bangalore 560020, India
- Subhradip Karmakar
  - Department of Biochemistry, Room 3020, AIIMS - All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India