1
Madrigal G, Minhas BF, Catchen J. Klumpy: A tool to evaluate the integrity of long-read genome assemblies and illusive sequence motifs. Mol Ecol Resour 2025; 25:e13982. [PMID: 38800997 PMCID: PMC11646305 DOI: 10.1111/1755-0998.13982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 05/13/2024] [Indexed: 05/29/2024]
Abstract
The improvement and decreasing costs of third-generation sequencing technologies have widened the scope of biological questions researchers can address with de novo genome assemblies. With the increasing number of reference genomes, validating their integrity with minimal overhead is vital for establishing confident results in their applications. Here, we present Klumpy, a tool for detecting and visualizing both misassembled regions in a genome assembly and genetic elements (e.g. genes) of interest in a set of sequences. By leveraging the initial raw reads in combination with their respective genome assembly, we illustrate Klumpy's utility by investigating antifreeze glycoprotein (afgp) loci across two icefishes, by searching for a reported absent gene in the northern snakehead fish, and by scanning the reference genomes of a mudskipper and bumblebee for misassembled regions. In the two former cases, we were able to provide support for the noncanonical placement of an afgp locus in the icefishes and locate the missing snakehead gene. Furthermore, our genome scans were able to identify an unmappable locus in the mudskipper reference genome and a putative repetitive element shared among several species of bees.
Affiliation(s)
- Giovanni Madrigal
- Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana‐Champaign, Urbana, Illinois, USA
- Bushra Fazal Minhas
- Informatics Program, University of Illinois at Urbana‐Champaign, Urbana, Illinois, USA
- Julian Catchen
- Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana‐Champaign, Urbana, Illinois, USA
- Informatics Program, University of Illinois at Urbana‐Champaign, Urbana, Illinois, USA
2
Li A, Zhao J, Dai H, Zhao M, Zhang M, Wang W, Zhang G, Li L. Chromosome-level genome assembly of the Suminoe oyster Crassostrea ariakensis in south China. Sci Data 2024; 11:1296. [PMID: 39604404 PMCID: PMC11603178 DOI: 10.1038/s41597-024-04145-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 11/13/2024] [Indexed: 11/29/2024] Open
Abstract
The Suminoe oyster Crassostrea ariakensis (Fujita, 1913) is one of the most important ecological and fishery bivalve mollusks with a worldwide distribution. Here, we report an improved high-quality chromosomal-level genome assembly of C. ariakensis inhabiting the South China Sea, using Nanopore technology, Illumina sequencing, and high-throughput chromosomal conformation capture analysis. The assembled genome size is 631.73 Mb, with a contig N50 length of 5.36 Mb and a scaffold N50 length of 61.15 Mb, and is assigned to 10 chromosomes. A total of 29,357 protein-coding genes are predicted, 96.68% of which are functionally annotated. The genome contains 347.11 Mb (54.94%) of repetitive elements and 1130 noncoding RNAs. This improved genome assembly of southern C. ariakensis is an important resource for understanding oyster diversity and evolution, and provides insights into the genetic improvement, protection and management of oyster resources.
Affiliation(s)
- Ao Li
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266100, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Jinlong Zhao
- Qingdao Agricultural University, Qingdao, 266109, China
- He Dai
- Biomarker Technologies Corporation, Beijing, 101301, China
- Mingjie Zhao
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Mengshi Zhang
- Qingdao Agricultural University, Qingdao, 266109, China
- Wei Wang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Shandong Center of Technology Innovation for Oyster Seed Industry, Qingdao, 266000, China
- Guofan Zhang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266100, China.
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.
- National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.
- Shandong Center of Technology Innovation for Oyster Seed Industry, Qingdao, 266000, China.
- Li Li
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.
- Shandong Center of Technology Innovation for Oyster Seed Industry, Qingdao, 266000, China.
3
Zhou C, Li J, Duan Y, Fu S, Li H, Zhou Y, Gao H, Zhou X, Liu H, Lei L, Chen J, Yuan D. Genome sequencing and transcriptome analysis provide an insight into the hypoxia resistance of Channa asiatica. Int J Biol Macromol 2024; 282:137306. [PMID: 39515710 DOI: 10.1016/j.ijbiomac.2024.137306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 11/02/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024]
Abstract
Channa asiatica is an economically valuable fish species and an excellent model for studying hypoxic tolerance. However, the underlying genetic and molecular mechanisms are poorly understood. In this study, we assembled a high-quality C. asiatica genome (23 chromosomes, totaling 722 Mb) using a combination of Illumina short-read, PacBio long-read, and Hi-C sequencing. Repetitive elements accounted for 28.39% of the C. asiatica genome, and 23,949 protein-coding genes were predicted, with 96.63% of these functionally annotated. Moreover, a comparative genomic analysis of 12 fish genomes showed that gene families associated with oxygen binding and transport were expanded in C. asiatica. In addition, transcriptome analysis revealed that multiple oxidative stress pathways were activated when C. asiatica was exposed to air. In conclusion, this study provided a high-quality genome assembly and transcriptome data, both serving as critical resources for researching the genetic basis of hypoxic tolerance in C. asiatica.
Affiliation(s)
- Chaowei Zhou
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China
- Junting Li
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China
- Yuting Duan
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China
- Suxing Fu
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China
- Hejiao Li
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China
- Yinhua Zhou
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China
- He Gao
- Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China; Key Laboratory of Aquatic Science of Chongqing, College of Life Sciences, Southwest University, Chongqing 400715, China
- Xinghua Zhou
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China
- Haiping Liu
- Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China; Key Laboratory of Aquatic Science of Chongqing, College of Life Sciences, Southwest University, Chongqing 400715, China
- Luo Lei
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China.
- Jie Chen
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China.
- Dengyue Yuan
- College of Fisheries, Southwest University, Chongqing 402460, China; Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education, Key Laboratory of Aquatics Science of Chongqing, Chongqing 400700, China.
4
Bruna P, Núñez-Montero K, Contreras MJ, Leal K, García M, Abanto M, Barrientos L. Biosynthetic gene clusters with biotechnological applications in novel Antarctic isolates from Actinomycetota. Appl Microbiol Biotechnol 2024; 108:325. [PMID: 38717668 PMCID: PMC11078813 DOI: 10.1007/s00253-024-13154-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/11/2024] [Accepted: 04/19/2024] [Indexed: 05/12/2024]
Abstract
Actinomycetota have been widely described as valuable sources for the acquisition of secondary metabolites. Most microbial metabolites are produced via metabolic pathways encoded by biosynthetic gene clusters (BGCs). Although many secondary metabolites are not essential for the survival of bacteria, they play an important role in their adaptation and interactions within microbial communities. Bacteria isolated from extreme environments such as Antarctica could therefore facilitate the discovery of new BGCs with biotechnological potential. This study aimed to isolate rare Actinomycetota strains from Antarctic soil and sediment samples and identify their metabolic potential based on genome mining and exploration of biosynthetic gene clusters. To this end, the strains were sequenced using Illumina and Oxford Nanopore Technologies platforms. The assemblies were annotated and subjected to phylogenetic analysis. Finally, the BGCs present in each genome were identified using the antiSMASH tool, and the biosynthetic diversity of the Micrococcaceae family was evaluated. Taxonomic annotation revealed that seven strains were new and two were previously reported in the NCBI database. Additionally, BGCs encoding type III polyketide synthases (T3PKS), beta-lactones, siderophores, and non-ribosomal peptide synthetases (NRPS) were identified, among others. In addition, the sequence similarity network showed a predominant type of BGCs in the family Micrococcaceae, and some genera were distinctly grouped. The BGCs identified in the isolated strains could be associated with applications such as antimicrobials, anticancer agents, and plant growth promoters, among others, positioning them as excellent candidates for future biotechnological applications and innovations.
KEY POINTS: • Novel Antarctic rare Actinomycetota strains were isolated from soil and sediments • Genome-based taxonomic affiliation revealed seven potentially novel species • Genome mining showed metabolic potential for novel natural products.
Affiliation(s)
- Pablo Bruna
- Programa de Doctorado en Ciencias mención Biología Celular y Molecular Aplicada, Universidad de La Frontera, Temuco, Chile
- Núcleo Científico y Tecnológico en Biorecursos (BIOREN), Universidad de La Frontera, Avenida Francisco Salazar, 01145, Temuco, Chile
- Kattia Núñez-Montero
- Facultad de Ciencias de la Salud, Instituto de Ciencias Aplicadas, Universidad Autónoma de Chile, Avenida Alemania 1090, Temuco, Chile
- Centro de Investigación en Biotecnología, Departamento de Biología, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica
- María José Contreras
- Facultad de Ingeniería, Instituto de Ciencias Aplicadas, Universidad Autónoma de Chile, Avenida Alemania 1090, Temuco, Chile
- Karla Leal
- Facultad de Ingeniería, Instituto de Ciencias Aplicadas, Universidad Autónoma de Chile, Avenida Alemania 1090, Temuco, Chile
- Matías García
- Programa de Doctorado en Ciencias mención Biología Celular y Molecular Aplicada, Universidad de La Frontera, Temuco, Chile
- Núcleo Científico y Tecnológico en Biorecursos (BIOREN), Universidad de La Frontera, Avenida Francisco Salazar, 01145, Temuco, Chile
- Biocontrol Research Laboratory, Facultad de Ciencias Agropecuarias y Medioambiente, Universidad de La Frontera, Temuco, Chile
- Michel Abanto
- Núcleo Científico y Tecnológico en Biorecursos (BIOREN), Universidad de La Frontera, Avenida Francisco Salazar, 01145, Temuco, Chile.
- Leticia Barrientos
- Facultad de Ciencias de la Salud, Instituto de Ciencias Aplicadas, Universidad Autónoma de Chile, Avenida Alemania 1090, Temuco, Chile.
5
Hu J, Wang Z, Sun Z, Hu B, Ayoola AO, Liang F, Li J, Sandoval JR, Cooper DN, Ye K, Ruan J, Xiao CL, Wang D, Wu DD, Wang S. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol 2024; 25:107. [PMID: 38671502 PMCID: PMC11046930 DOI: 10.1186/s13059-024-03252-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 04/17/2024] [Indexed: 04/28/2024] Open
Abstract
Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.
Affiliation(s)
- Jiang Hu
- GrandOmics Biosciences, Beijing, 102206, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Zhuo Wang
- GrandOmics Biosciences, Beijing, 102206, China
- Zongyi Sun
- GrandOmics Biosciences, Beijing, 102206, China
- Benxia Hu
- Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Adeola Oluwakemi Ayoola
- Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Fan Liang
- GrandOmics Biosciences, Beijing, 102206, China
- Jingjing Li
- GrandOmics Biosciences, Beijing, 102206, China
- José R Sandoval
- Centro de Investigación de Genética y Biología Molecular (CIGBM), Instituto de Investigación, Facultad de Medicina, Universidad de San Martín de Porres, Lima, 15102, Peru
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Chuan-Le Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, #7 Jinsui Road, Tianhe District, Guangzhou, China
- Depeng Wang
- GrandOmics Biosciences, Beijing, 102206, China.
- Dong-Dong Wu
- Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
- Kunming Primate Research Center, and National Research Facility for Phenotypic and Genetic Analysis of Model Animals (Primate Facility), National Resource Center for Non-Human Primates, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650107, China.
- Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
- Sheng Wang
- Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
- Yunnan Key Laboratory of Biodiversity Information, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
6
Huang Z, Liu Q, Zeng X, Ni G. High-quality chromosome-level genome assembly of the Northern Pacific sea star Asterias amurensis. DNA Res 2024; 31:dsae007. [PMID: 38416146 PMCID: PMC11090083 DOI: 10.1093/dnares/dsae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 02/17/2024] [Accepted: 02/26/2024] [Indexed: 02/29/2024] Open
Abstract
Asterias amurensis, a starfish species that is native to countries such as China and Japan and has been introduced to non-native regions such as Australia, has raised serious concerns in terms of its impact on ecology and economy. To gain a better understanding of its population genomics and dynamics, we assembled a high-quality chromosome-level genome of A. amurensis using PacBio and Hi-C sequencing technologies. The assembly comprised 87 scaffolds, with a contig N50 length of 10.85 Mb and a scaffold N50 length of 23.34 Mb, and over 98.80% (0.48 Gb) of it was anchored to 22 pseudochromosomes. We predicted 16,673 protein-coding genes, 95.19% of which were functionally annotated. Our phylogenetic analysis revealed that A. amurensis and Asterias rubens formed a clade, and their divergence was estimated at ~28 million years ago (Mya). The significantly enriched pathways and Gene Ontology terms related to the amplified gene family were mainly associated with immune response and energy metabolism, suggesting that these factors might have contributed to the adaptability of A. amurensis to its environment. This study provides valuable genomic resources for comprehending the genetics, dynamics, and evolution of A. amurensis, especially when population outbreaks or invasions occur.
Affiliation(s)
- Zhichao Huang
- Ministry of Education Key Laboratory of Mariculture, Ocean University of China, Qingdao 266003, China
- Qi Liu
- Wuhan Onemore-tech Co., Ltd, Wuhan 430000, China
- Xiaoqi Zeng
- Ministry of Education Key Laboratory of Mariculture, Ocean University of China, Qingdao 266003, China
- Institute of Evolution and Marine Biodiversity, Ocean University of China, Qingdao 266003, China
- Gang Ni
- Ministry of Education Key Laboratory of Mariculture, Ocean University of China, Qingdao 266003, China
7
Mochizuki T, Sakamoto M, Tanizawa Y, Nakayama T, Tanifuji G, Kamikawa R, Nakamura Y. A practical assembly guideline for genomes with various levels of heterozygosity. Brief Bioinform 2023; 24:bbad337. [PMID: 37798248 PMCID: PMC10555665 DOI: 10.1093/bib/bbad337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/06/2023] [Accepted: 09/03/2023] [Indexed: 10/07/2023] Open
Abstract
Although current long-read sequencing technologies produce long reads that facilitate genome assembly, they have high sequencing error rates. While various assemblers with different perspectives have been developed, no systematic evaluation of assemblers with long reads for diploid genomes with varying heterozygosity has been performed. Here, we evaluated a series of processes, including the estimation of genome characteristics such as genome size and heterozygosity, de novo assembly, polishing, and removal of allelic contigs, using six genomes with various heterozygosity levels. We evaluated five long-read-only assemblers (Canu, Flye, miniasm, NextDenovo and Redbean) and five hybrid assemblers that combine short and long reads (HASLR, MaSuRCA, Platanus-allee, SPAdes and WENGAN) and proposed a concrete guideline for the construction of haplotype representation according to the degree of heterozygosity, followed by polishing and purging haplotigs, using stable and high-performance assemblers: Redbean, Flye and MaSuRCA.
Affiliation(s)
- Mika Sakamoto
- Genome Informatics Laboratory, National Institute of Genetics
- Takuro Nakayama
- Division of Life Sciences Center for Computational Sciences, University of Tsukuba, Japan
- Goro Tanifuji
- Department of Zoology, National Museum of Nature and Science
8
Lu N, Qiao Y, An P, Luo J, Bi C, Li M, Lu Z, Tu J. Exploration of whole genome amplification generated chimeric sequences in long-read sequencing data. Brief Bioinform 2023; 24:bbad275. [PMID: 37529913 DOI: 10.1093/bib/bbad275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 06/21/2023] [Accepted: 07/10/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION Multiple displacement amplification (MDA) has become the most commonly used method of whole genome amplification, generating a vast amount of DNA with higher molecular weight and greater genome coverage. Coupled with long-read sequencing, it makes it possible to sequence amplicons of over 20 kb in length. However, the formation of chimeric sequences (chimeras, expressed as structural errors in sequencing data) in MDA seriously interferes with bioinformatics analysis, and its influence on long-read sequencing data is unknown. RESULTS We sequenced the phi29 DNA polymerase-mediated MDA amplicons on the PacBio platform and analyzed chimeras within the generated data. The 3rd-ChimeraMiner has been constructed as a pipeline for recognizing and restoring chimeras into the original structures in long-read sequencing data, improving the efficiency of using TGS data. Five long-read datasets and one high-fidelity long-read dataset with various amplification folds were analyzed. The results reveal that mis-priming events in amplification occur more frequently than widely perceived, and the proportion gradually accumulates from 42% to over 78% as the amplification continues. In total, 99.92% of recognized chimeric sequences were demonstrated to be artifacts, whose structures were wrongly formed in MDA instead of existing in original genomes. By restoring chimeras to their original structures, the vast majority of supplementary alignments that introduce false-positive structural variants are recycled, removing 97% of inversions on average and contributing to the analysis of structural variation in MDA-amplified samples. The impact of chimeras in long-read sequencing data analysis should be emphasized, and the 3rd-ChimeraMiner can help to quantify and reduce the influence of chimeras. AVAILABILITY AND IMPLEMENTATION The 3rd-ChimeraMiner is available on GitHub, https://github.com/dulunar/3rdChimeraMiner.
Affiliation(s)
- Na Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Yi Qiao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Pengfei An
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Monash University-Southeast University Joint Research Institute, Suzhou 215123, China
- Jiajian Luo
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Changwei Bi
- College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
- Musheng Li
- Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, NV 89511, USA
- Zuhong Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Jing Tu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
9
Espinosa E, Bautista R, Fernandez I, Larrosa R, Zapata EL, Plata O. Comparing assembly strategies for third-generation sequencing technologies across different genomes. Genomics 2023; 115:110700. [PMID: 37598732 DOI: 10.1016/j.ygeno.2023.110700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 08/07/2023] [Accepted: 08/16/2023] [Indexed: 08/22/2023]
Abstract
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore technology (ONT), has led to substantial accuracy and computational cost improvements. However, de novo whole-genome assembly still presents significant challenges related to the computational cost and the quality of the results. Accordingly, sequencing accuracy and throughput continue to improve, and many tools are constantly emerging. Therefore, selecting the correct sequencing platform, the proper sequencing depth and the assembly tools are necessary to perform high-quality assembly. This paper evaluates the primary assembly reconstruction from recent hybrid and non-hybrid pipelines on different genomes. We find that using PacBio high-fidelity long-read (HiFi) plays an essential role in haplotype construction with respect to ONT reads. However, we observe a substantial improvement in the correctness of the assembly from high-fidelity ONT datasets and combining it with HiFi or short-reads.
Affiliation(s)
- Elena Espinosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
- Rocio Bautista
- Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
- Ivan Fernandez
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, C. Jordi Girona, 1-3, Barcelona 08034, Spain.
- Rafael Larrosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
- Emilio L Zapata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
- Oscar Plata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
10
Glick L, Mayrose I. The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes. Genome Biol Evol 2023; 15:evad121. [PMID: 37401440 PMCID: PMC10340445 DOI: 10.1093/gbe/evad121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Revised: 06/21/2023] [Accepted: 06/28/2023] [Indexed: 07/05/2023] Open
Abstract
Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence-absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies.
Affiliation(s)
- Lior Glick
- Department of Life Sciences, School of Plant Sciences and Food Security, Tel-Aviv University, Tel Aviv, Israel
- Itay Mayrose
- Department of Life Sciences, School of Plant Sciences and Food Security, Tel-Aviv University, Tel Aviv, Israel
|
11
|
Mi X, Yang C, Qiao D, Tang M, Guo Y, Liang S, Li Y, Chen Z, Chen J. De novo full length transcriptome analysis of a naturally caffeine-free tea plant reveals specificity in secondary metabolic regulation. Sci Rep 2023; 13:6015. [PMID: 37045909 PMCID: PMC10097665 DOI: 10.1038/s41598-023-32435-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 03/28/2023] [Indexed: 04/14/2023] Open
Abstract
Tea plants are crops with economic, health and cultural value. Catechin, caffeine and theanine are the main taste-related secondary metabolites. In the process of germplasm collection, we found a resource in the Sandu Aquatic Autonomous County of Guizhou (SDT) that possessed significantly different characteristic metabolites compared with the cultivar 'Qiancha 1'. SDT is rich in theobromine and theophylline, has low levels of (-)-epicatechin-3-gallate, (-)-epigallocatechin-3-gallate, and theanine, and is almost free of caffeine. However, research on this tea resource is limited. Full-length transcriptome analysis was performed to investigate the transcriptome and gene expression of these metabolites. In total, 78,809 unique transcripts were obtained, of which 65,263 were complete coding sequences. RNA-seq revealed 3415 differentially expressed transcripts in the tender leaves of 'Qiancha 1' and 'SDT'. Furthermore, 2665, 6231, and 2687 differentially expressed transcripts were found in different SDT tissues. These differentially expressed transcripts were enriched in flavonoid and amino acid metabolism processes. Co-expression network analysis identified five modules associated with metabolites and found that genes of caffeine synthase (TCS) may be responsible for the low caffeine content in SDT. Phenylalanine ammonia lyase (PAL), glutamine synthetase (GS), glutamate synthase (GOGAT), and arginine decarboxylase (ADC) play important roles in the synthesis of catechin and theanine. In addition, we identified that ethylene responsive factor (ERF) and WRKY transcription factors may be involved in theanine biosynthesis. Overall, our study provides candidate genes to improve understanding of the synthesis mechanisms of these metabolites and provides a basis for molecular breeding of tea plant.
Affiliation(s)
- Xiaozeng Mi
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Chun Yang
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Dahe Qiao
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Mengsha Tang
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Yan Guo
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Sihui Liang
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Yan Li
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Zhengwu Chen
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
- Juan Chen
- Tea Research Institute, Guizhou Academy of Agricultural Sciences, 1 Jin'nong Road, Guiyang, 550006, Guizhou, China
|
12
|
Stuart KC, Edwards RJ, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, Brandley MC, Buchanan KL, Cassey P, Clayton DF, De Meyer T, Meddle SL, Rollins LA. Transcript- and annotation-guided genome assembly of the European starling. Mol Ecol Resour 2022; 22:3141-3160. [PMID: 35763352 PMCID: PMC9796300 DOI: 10.1111/1755-0998.13679] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/11/2021] [Accepted: 06/10/2022] [Indexed: 01/01/2023]
Abstract
The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second, North American, short-read genome assembly (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterization. S. vulgaris vAU combined 10× genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 6222 scaffolds (7.6 Mb scaffold N50, 94.6% busco completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Species-specific transcript mapping and gene annotation revealed good gene-level assembly and high functional completeness. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, saaga) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counterintuitive behaviour in traditional busco metrics, and present buscomp, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. This work expands our knowledge of avian genomes and the available toolkit for assessing and improving genome quality. The new genomic resources presented will facilitate further global genomic and transcriptomic analysis on this ecologically important species.
Affiliation(s)
- Katarina C. Stuart
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, New South Wales, Australia
- Richard J. Edwards
- Evolution & Ecology Research Centre, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, Australia
- Yuanyuan Cheng
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Wesley C. Warren
- Department of Animal Sciences, Institute for Data Science and Informatics, The University of Missouri, Columbia, Missouri, USA
- David W. Burt
- Office of the Deputy Vice‐Chancellor (Research and Innovation), The University of Queensland, Brisbane, Australia
- William B. Sherwin
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, New South Wales, Australia
- Natalie R. Hofmeister
- Department of Ecology and Evolutionary Biology, Cornell University, New York, USA; Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, New York, USA
- Scott J. Werner
- United States Department of Agriculture, Animal and Plant Health Inspection Service, Wildlife Services, National Wildlife Research Center, Fort Collins, Colorado, USA
- Melissa Bateson
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Matthew C. Brandley
- Section of Amphibians and Reptiles, Carnegie Museum of Natural History, Pittsburgh, Pennsylvania, USA
- Katherine L. Buchanan
- School of Life and Environmental Sciences, Deakin University, Waurn Ponds, Victoria, Australia
- Phillip Cassey
- Invasion Science & Wildlife Ecology Lab, University of Adelaide, Adelaide, Australia
- David F. Clayton
- Department of Genetics & Biochemistry, Clemson University, South Carolina, USA
- Tim De Meyer
- Department of Data Analysis & Mathematical Modelling, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Simone L. Meddle
- The Roslin Institute, The Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
- Lee A. Rollins
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, New South Wales, Australia; School of Life and Environmental Sciences, Deakin University, Waurn Ponds, Victoria, Australia
|
13
|
Zhao L, Shi Y, Lau HCH, Liu W, Luo G, Wang G, Liu C, Pan Y, Zhou Q, Ding Y, Sung JJY, Yu J. Uncovering 1058 Novel Human Enteric DNA Viruses Through Deep Long-Read Third-Generation Sequencing and Their Clinical Impact. Gastroenterology 2022; 163:699-711. [PMID: 35679948 DOI: 10.1053/j.gastro.2022.05.048] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/18/2021] [Revised: 05/17/2022] [Accepted: 05/31/2022] [Indexed: 01/26/2023]
Abstract
BACKGROUND & AIMS: The lack of viral reference genomes poses a challenge to virome studies. We investigated the human gut virome and its clinical implications by ultra-deep metagenomic sequencing. METHODS: We extracted sufficient viral DNA from human feces for ultra-deep PacBio sequencing (>10 μg) and Illumina sequencing (>1 μg). Upon de novo assembly and 6 stages of strict filtering, viral genomes were generated and validated in 3 cohorts of 2819 published fecal metagenomes. The diagnostic performance of assembled viruses for colorectal cancer was tested in a training cohort and 2 independent validation cohorts. Virus mapping ratio, evolutionary history, and virus status (lytic or temperate) were also examined. RESULTS: The mean amount of extracted viral DNA increased by 14-fold compared with previous protocols. We obtained PacBio long reads and Illumina short reads with 290-fold higher depth than previous studies. We assembled and validated 1178 contigs as complete viral genomes, of which 1058 were newly identified. Thirteen viral genomes (398-839 kb) that are longer than the largest bacteriophage found in humans (393 kb) were discovered. A phylogenetic tree was constructed based on hidden Markov model alignment scores of 4 conserved viral proteins. Incorporating our assembled genomes into the National Center for Biotechnology Information database improved the mapping ratio of published metagenomes by up to 18 times. Lytic viruses (75.9% ± 12.2% of total) were predominant in our sample. A biomarker panel of 14 novel viruses could discriminate patients with colorectal cancer from controls with an area under the receiver operating characteristic curve of 0.87 in the training cohort, which was validated with areas under the receiver operating characteristic curve of 0.85 and 0.73 in 2 independent cohorts. CONCLUSIONS: We uncovered 1058 novel human gut viruses. These findings can contribute to clinical diagnosis, current viral reference genomes, and future virome investigation.
Affiliation(s)
- Liuyang Zhao
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Yu Shi
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Harry Cheuk-Hay Lau
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Weixin Liu
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Guangwen Luo
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Guoping Wang
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Changan Liu
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Yasi Pan
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Qiming Zhou
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Yanqiang Ding
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
- Joseph Jao-Yiu Sung
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Jun Yu
- Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong
|
14
|
Patra AK, Kwon YM, Yang Y. Complete gammaproteobacterial endosymbiont genome assembly from a seep tubeworm Lamellibrachia satsuma. J Microbiol 2022; 60:916-927. [DOI: 10.1007/s12275-022-2057-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 05/09/2022] [Accepted: 05/24/2022] [Indexed: 11/27/2022]
|
15
|
Dmitriev AA, Pushkova EN, Melnikova NV. Plant Genome Sequencing: Modern Technologies and Novel Opportunities for Breeding. Mol Biol 2022. [DOI: 10.1134/s0026893322040045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
|
16
|
Zhou K, Chen Z, Du X, Huang Y, Qin J, Wen L, Pan X, Lin Y. SMRT Sequencing Reveals Candidate Genes and Pathways With Medicinal Value in Cipangopaludina chinensis. Front Genet 2022; 13:881952. [PMID: 35783279 PMCID: PMC9243326 DOI: 10.3389/fgene.2022.881952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 05/26/2022] [Indexed: 12/03/2022] Open
Abstract
Cipangopaludina chinensis is an economically important aquatic snail with high medicinal value. However, molecular biology research on C. chinensis is limited by the lack of a reference genome, so the analysis of its transcripts is an important step to study the regulatory genes of various substances in C. chinensis. Herein, we conducted the first full-length transcriptome analysis of C. chinensis using PacBio single-molecule real-time (SMRT) sequencing technology. We identified a total of 26,312 unigenes with an average length of 2,572 bp, among which the zf-c2h2 transcription factor family was the most abundant (120, 18.24%). We also observed that the majority of the 8,058 SSRs contained 4-7 repeat units, which provides data for subsequent work on snail genetics. Subsequently, 91.86% (24,169) of the genes were successfully annotated to the four major databases, while the highest homology was observed with Pomacea canaliculata. Functional annotation revealed that the majority of transcripts were enriched in metabolism, signal transduction and immune-related pathways, and several candidate genes involved in drug metabolism and immune response were identified (e.g., CYP1A1, CYP2J, CYP2U1, GST, PIK3, PDE3A, PRKAG). This study lays a foundation for future molecular biology research and provides a reference for studying genes associated with the medicinal value of C. chinensis.
Affiliation(s)
- Yong Lin
- *Correspondence: Xianhui Pan; Yong Lin
|
17
|
Analysis of secondary metabolite gene clusters and chitin biosynthesis pathways of Monascus purpureus with high production of pigment and citrinin based on whole-genome sequencing. PLoS One 2022; 17:e0263905. [PMID: 35648754 PMCID: PMC9159588 DOI: 10.1371/journal.pone.0263905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 04/25/2022] [Indexed: 11/19/2022] Open
Abstract
Monascus is a filamentous fungus that is widely used for producing Monascus pigments in the food industry in Southeast Asia. While the development of bioinformatics has helped elucidate the molecular mechanism underlying metabolic engineering of secondary metabolite biosynthesis, the biological information on the metabolic engineering of the morphology of Monascus remains unclear. In this study, the whole genome of M. purpureus CSU-M183 strain was sequenced using combined single-molecule real-time DNA sequencing and next-generation sequencing platforms. The genome assembly was 23.75 Mb in length, with a GC content of 49.13%, and comprised 69 genomic contigs encoding 7305 putative predicted genes. In addition, we identified the secondary metabolite biosynthetic gene clusters and the chitin synthesis pathway in the genome of the high pigment-producing M. purpureus CSU-M183 strain. Furthermore, it is shown that the expression levels of most genes located in the Monascus pigment and citrinin clusters were significantly enhanced via atmospheric room temperature plasma mutagenesis. The results provide a basis for understanding secondary metabolite biosynthesis and for the metabolic engineering of the morphology of Monascus.
|
18
|
Miao X, Yu Y, Zhao Z, Wang Y, Qian X, Wang Y, Li S, Wang C. Chromosome-Level Haplotype Assembly for Equus asinu. Front Genet 2022; 13:738105. [PMID: 35692816 PMCID: PMC9186339 DOI: 10.3389/fgene.2022.738105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Haplotype provides significant insights into understanding genomes at both individual and population levels. However, research on many non-model organisms is still based on independent genetic variations due to the lack of haplotype. Results: We conducted haplotype assembling for Equus asinu, a non-model organism that plays a vital role in human civilization. We described the hybrid single individual assembled haplotype of the Dezhou donkey based on the high-depth sequencing data from single-molecule real-time sequencing (×30), Illumina short-read sequencing (×211), and high-throughput chromosome conformation capture (×56). We assembled a near-complete haplotype for the high-depth sequenced Dezhou donkey individual and a phased cohort for the resequencing data of the donkey population. Conclusion: Here, we described the complete chromosome-scale haplotype of the Dezhou donkey with more than a 99.7% phase rate. We further phased a cohort of 156 donkeys to form a donkey haplotype dataset with more than 39 million genetic variations.
Affiliation(s)
- Xinyao Miao
- Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR, China
- College of Forensic & Medicine, Xi’an Jiaotong University, Xi’an, China
- Yonghan Yu
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Zicheng Zhao
- Shenzhen Byoryn Technology Co., Ltd., Shenzhen, China
- Yinan Wang
- College of Forensic & Medicine, Xi’an Jiaotong University, Xi’an, China
- Xiaobo Qian
- Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Yonghui Wang
- Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Shengbin Li
- College of Forensic & Medicine, Xi’an Jiaotong University, Xi’an, China
- *Correspondence: Shengbin Li; Changfa Wang
- Changfa Wang
- Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- *Correspondence: Shengbin Li; Changfa Wang
|
19
|
Wierzbicki F, Schwarz F, Cannalonga O, Kofler R. Novel quality metrics allow identifying and generating high-quality assemblies of piRNA clusters. Mol Ecol Resour 2022; 22:102-121. [PMID: 34181811 DOI: 10.1111/1755-0998.13455] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/15/2020] [Revised: 04/30/2021] [Accepted: 06/14/2021] [Indexed: 12/30/2022]
Abstract
In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics are ignorant of TEs. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effect of assembly algorithm, polishing, read length, coverage and residual polymorphisms, and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
Affiliation(s)
- Filip Wierzbicki
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Florian Schwarz
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Robert Kofler
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
|
20
|
Chahine Z, Le Roch KG. Decrypting the complexity of the human malaria parasite biology through systems biology approaches. Front Syst Biol 2022; 2:940321. [PMID: 37200864 PMCID: PMC10191146 DOI: 10.3389/fsysb.2022.940321] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/20/2023]
Abstract
The human malaria parasite, Plasmodium falciparum, is a unicellular protozoan responsible for over half a million deaths annually. With a complex life cycle alternating between human and invertebrate hosts, this apicomplexan is notoriously adept at evading host immune responses and developing resistance to all clinically administered treatments. Advances in omics-based technologies, increased sensitivity of sequencing platforms and enhanced CRISPR based gene editing tools, have given researchers access to more in-depth and untapped information about this enigmatic micro-organism, a feat thought to be infeasible in the past decade. Here we discuss some of the most important scientific achievements made over the past few years with a focus on novel technologies and platforms that set the stage for subsequent discoveries. We also describe some of the systems-based methods applied to uncover gaps of knowledge left through single-omics applications with the hope that we will soon be able to overcome the spread of this life-threatening disease.
|
21
|
Li B, Zhang X, Liu Z, Wang L, Song L, Liang X, Dou S, Tu J, Shen J, Yi B, Wen J, Fu T, Dai C, Gao C, Wang A, Ma C. Genetic and Molecular Characterization of a Self-Compatible Brassica rapa Line Possessing a New Class II S Haplotype. Plants (Basel) 2021; 10:2815. [PMID: 34961286 PMCID: PMC8709392 DOI: 10.3390/plants10122815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Revised: 12/01/2021] [Accepted: 12/03/2021] [Indexed: 05/20/2023]
Abstract
Most flowering plants have evolved a self-incompatibility (SI) system to maintain genetic diversity by preventing self-pollination. Brassica species possess sporophytic self-incompatibility (SSI), which is controlled by the pollen- and stigma-determinant factors SP11/SCR and SRK. However, the molecular mechanism of SI remains largely unknown. Here, a new class II S haplotype, named BrS-325, was identified in a pak choi line '325', which was responsible for the completely self-compatible phenotype. To obtain the entire S locus sequences, a complete pak choi genome was obtained through Nanopore sequencing and de novo assembly, which provided a good reference genome for breeding and molecular research in B. rapa. S locus comparative analysis showed that the closest relative of BrS-325 was BrS-60, and high sequence polymorphism existed in the S locus. Meanwhile, two duplicated SRKs (BrSRK-325a and BrSRK-325b) were distributed in the BrS-325 locus with opposite transcription directions. BrSRK-325b and BrSCR-325 were expressed normally at the transcriptional level. The multiple sequence alignment of SCRs and SRKs in class II S haplotypes showed that a number of amino acid variations were present in the contact regions (CR II and CR III) of BrSCR-325 and the hypervariable regions (HV I and HV II) of BrSRK-325s, which may influence the binding and interaction between the ligand and the receptor. Thus, these results suggested that amino acid variations in contact sites may lead to the breakdown of SI in the new class II S haplotype BrS-325 in B. rapa. The complete SC phenotype of '325' showed potential value for practical breeding applications in B. rapa.
Collapse
Affiliation(s)
- Bing Li
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Xueli Zhang
- Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
| | - Zhiquan Liu
- Hunan Vegetable Research Institute, Hunan Academy of Agricultural Science, Changsha 410125, China;
| | - Lulin Wang
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Liping Song
- Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
| | - Xiaomei Liang
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Shengwei Dou
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Jinxing Tu
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Jinxiong Shen
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Bin Yi
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Jing Wen
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Tingdong Fu
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Cheng Dai
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
| | - Changbin Gao
- Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
- Correspondence: (C.G.); (A.W.); (C.M.); Tel.: +86-27-8728-18-07 (C.M.)
| | - Aihua Wang
- Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
- Correspondence: (C.G.); (A.W.); (C.M.); Tel.: +86-27-8728-18-07 (C.M.)
| | - Chaozhi Ma
- National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Correspondence: (C.G.); (A.W.); (C.M.); Tel.: +86-27-8728-18-07 (C.M.)
| |
|
22
|
Liu J, Wei H, Zhang X, He H, Cheng Y, Wang D. Chromosome-Level Genome Assembly and HazelOmics Database Construction Provides Insights Into Unsaturated Fatty Acid Synthesis and Cold Resistance in Hazelnut (Corylus heterophylla). Front Plant Sci 2021; 12:766548. [PMID: 34956265 PMCID: PMC8695561 DOI: 10.3389/fpls.2021.766548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2021] [Accepted: 11/18/2021] [Indexed: 06/14/2023]
Abstract
Corylus heterophylla (2n = 22) is the most widely distributed, unique, and economically important nut species in China. Chromosome-level genomes of C. avellana, C. heterophylla, and C. mandshurica were published in 2021, but a satisfactory hazelnut genome database is still lacking. Northeast China is the main distribution and cultivation area of C. heterophylla, and the mechanism underlying the adaptation of C. heterophylla to the extremely low temperatures in this area remains unclear. Using single-molecule real-time sequencing and a chromosome conformation capture (Hi-C)-assisted genome assembly strategy, we obtained a high-quality chromosome-scale genome sequence of C. heterophylla, with a total length of 343 Mb and a scaffold N50 of 32.88 Mb. A total of 94.72% of the test genes from the assembled genome could be aligned to the Embryophyta_odb9 database. In total, 22,319 protein-coding genes were predicted, and 21,056 (94.34%) were annotated in the assembled genome. A HazelOmics online database (HOD) containing the assembled genome, gene-coding sequences, protein sequences, and various types of annotation information was constructed. This database has a user-friendly and straightforward interface. In total, 439 contracted genes and 3,810 expanded genes were identified through genome evolution analysis, and 17 expanded genes were significantly enriched in the unsaturated fatty acid biosynthesis pathway (ko01040). Transcriptome analysis results showed that FAD (Cor0058010.1), SAD (Cor0141290.1), and KAT (Cor0122500.1) with high expression abundance were upregulated at the ovule maturity stage. We deduced that the expansion of these genes may promote high unsaturated fatty acid content in the kernels and improve the adaptability of C. heterophylla to the cold climate of Northeast China. The reference genome and database will be beneficial for future molecular breeding and gene function studies in this nut species, as well as for evolutionary research on species of the order Fagales.
Affiliation(s)
- Jianfeng Liu
- Jilin Provincial Key Laboratory of Plant Resource Science and Green Production, Jilin Normal University, Siping, China
| | - Heng Wei
- Jilin Provincial Key Laboratory of Plant Resource Science and Green Production, Jilin Normal University, Siping, China
| | - Xingzheng Zhang
- Jilin Provincial Key Laboratory of Plant Resource Science and Green Production, Jilin Normal University, Siping, China
| | - Hongli He
- Jilin Provincial Key Laboratory of Plant Resource Science and Green Production, Jilin Normal University, Siping, China
| | - Yunqing Cheng
- Jilin Provincial Key Laboratory of Plant Resource Science and Green Production, Jilin Normal University, Siping, China
| | - Daoming Wang
- Liaoning Economic Forest Research Institute, Dalian, China
| |
|
23
|
Li A, Dai H, Guo X, Zhang Z, Zhang K, Wang C, Wang X, Wang W, Chen H, Li X, Zheng H, Li L, Zhang G. Genome of the estuarine oyster provides insights into climate impact and adaptive plasticity. Commun Biol 2021; 4:1287. [PMID: 34773106 PMCID: PMC8590024 DOI: 10.1038/s42003-021-02823-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 07/04/2021] [Accepted: 10/28/2021] [Indexed: 12/27/2022] Open
Abstract
Understanding the roles of genetic divergence and phenotypic plasticity in adaptation is central to evolutionary biology and important for assessing the adaptive potential of species under climate change. Analysis of a chromosome-level assembly and resequencing of individuals across a wide latitudinal distribution in the estuarine oyster (Crassostrea ariakensis) revealed unexpectedly low genomic diversity and population structures shaped by historical glaciation, geological events and oceanographic forces. Strong selection signals were detected in genes responding to temperature and salinity stress, especially those of the expanded solute carrier families, highlighting the importance of gene expansion in environmental adaptation. Genes exhibiting high plasticity showed strong selection in upstream regulatory regions that modulate transcription, indicating selection favoring plasticity. Our findings suggest that genomic variation and population structure in marine bivalves are heavily influenced by climate history and physical forces, and that gene expansion and selection may enhance the phenotypic plasticity that is critical for adaptation to rapidly changing environments.
Affiliation(s)
- Ao Li
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Biology and Biotechnology, Pilot National Laboratory for Marine Science and Technology, Qingdao, China
| | - He Dai
- Biomarker Technologies Corporation, Beijing, China
| | - Ximing Guo
- Haskin Shellfish Research Laboratory, Department of Marine and Coastal Sciences, Rutgers University, Port Norris, NJ, USA
| | - Ziyan Zhang
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; University of Chinese Academy of Sciences, Beijing, China
| | - Kexin Zhang
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; University of Chinese Academy of Sciences, Beijing, China
| | - Chaogang Wang
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; University of Chinese Academy of Sciences, Beijing, China
| | - Xinxing Wang
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; University of Chinese Academy of Sciences, Beijing, China
| | - Wei Wang
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Hongju Chen
- Biomarker Technologies Corporation, Beijing, China
| | - Xumin Li
- Biomarker Technologies Corporation, Beijing, China
| | - Hongkun Zheng
- Biomarker Technologies Corporation, Beijing, China
| | - Li Li
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; University of Chinese Academy of Sciences, Beijing, China; National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Guofan Zhang
- CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Biology and Biotechnology, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; National and Local Joint Engineering Key Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| |
|
24
|
Albuquerque P, Ribeiro I, Correia S, Mucha AP, Tamagnini P, Braga-Henriques A, Carvalho MDF, Mendes MV. Complete Genome Sequence of Two Deep-Sea Streptomyces Isolates from Madeira Archipelago and Evaluation of Their Biosynthetic Potential. Mar Drugs 2021; 19:md19110621. [PMID: 34822492 PMCID: PMC8622039 DOI: 10.3390/md19110621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 10/28/2021] [Accepted: 10/28/2021] [Indexed: 11/22/2022] Open
Abstract
The deep sea constitutes a true unexplored frontier and a potential source of innovative drug scaffolds. Here, we present the genome sequences of two novel marine actinobacterial strains, MA3_2.13 and S07_1.15, isolated from deep-sea samples (sediments and sponge) collected at the Madeira archipelago (NE Atlantic Ocean; Portugal). The de novo assembly of both genomes was achieved using a hybrid strategy that combines short-read (Illumina) and long-read (PacBio) sequencing data. Phylogenetic analyses showed that strain MA3_2.13 is a new species of the Streptomyces genus, whereas strain S07_1.15 is closely related to the type strain of Streptomyces xinghaiensis. In silico analysis revealed that the total length of predicted biosynthetic gene clusters (BGCs) accounted for a high percentage of the MA3_2.13 genome, with several potential new metabolites identified. Strain S07_1.15 had, with a few exceptions, a predicted metabolic profile similar to S. xinghaiensis. In this work, we implemented a straightforward approach for generating high-quality genomes of new bacterial isolates and analysing in silico their potential to produce novel natural products (NPs). The inclusion of these in silico dereplication steps makes it possible to minimize the rediscovery rates of traditional natural product screening methodologies and expedite the drug discovery process.
Affiliation(s)
- Pedro Albuquerque
- i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal; (P.A.); (P.T.)
- IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
| | - Inês Ribeiro
- CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal; (I.R.); (S.C.); (A.P.M.); (M.d.F.C.)
- ICBAS—Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Rua de Jorge Viterbo Ferreira 228, 4050-313 Porto, Portugal
| | - Sofia Correia
- CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal; (I.R.); (S.C.); (A.P.M.); (M.d.F.C.)
| | - Ana Paula Mucha
- CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal; (I.R.); (S.C.); (A.P.M.); (M.d.F.C.)
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
| | - Paula Tamagnini
- i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal; (P.A.); (P.T.)
- IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
| | - Andreia Braga-Henriques
- OOM—Oceanic Observatory of Madeira & MARE—Marine and Environmental Sciences Centre, ARDITI—Agência Regional para o Desenvolvimento da Investigação Tecnologia e Inovação, Caminho da Penteada, 9020-105 Funchal, Portugal
- Regional Directorate for Fisheries, Regional Secretariat for the Sea and Fisheries, Government of the Azores, Rua Cônsul Dabney—Colónia Alemã, 9900-014 Horta, Portugal
| | - Maria de Fátima Carvalho
- CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal; (I.R.); (S.C.); (A.P.M.); (M.d.F.C.)
- ICBAS—Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Rua de Jorge Viterbo Ferreira 228, 4050-313 Porto, Portugal
| | - Marta V. Mendes
- i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal; (P.A.); (P.T.)
- IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
- Correspondence:
| |
|
25
|
Tang M, He S, Gong X, Lü P, Taha RH, Chen K. High-Quality de novo Chromosome-Level Genome Assembly of a Single Bombyx mori With BmNPV Resistance by a Combination of PacBio Long-Read Sequencing, Illumina Short-Read Sequencing, and Hi-C Sequencing. Front Genet 2021; 12:718266. [PMID: 34603381 PMCID: PMC8481875 DOI: 10.3389/fgene.2021.718266] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/14/2021] [Accepted: 08/05/2021] [Indexed: 12/17/2022] Open
Abstract
The reference genomes of Bombyx mori (B. mori), Silkworm Knowledge-based database (SilkDB) and SilkBase, have served as the gold standard for nearly two decades. Their use has fundamentally shaped model organisms and accelerated relevant studies on lepidoptera. However, the current reference genomes of B. mori do not accurately represent the full set of genes for any single strain. As new genome-wide sequencing technologies have emerged and the cost of high-throughput sequencing technology has fallen, it is now possible for standard laboratories to perform full-genome assembly for specific strains. Here we present a high-quality de novo chromosome-level genome assembly of a single B. mori with nuclear polyhedrosis virus (BmNPV) resistance through the integration of PacBio long-read sequencing, Illumina short-read sequencing, and Hi-C sequencing. In addition, regular bioinformatics analyses, such as gene family, phylogenetic, and divergence analyses, were performed. The sample was from our unique B. mori species (NB), which has strong inborn resistance to BmNPV. Our genome assembly showed good collinearity with SilkDB and SilkBase and particular regions. To the best of our knowledge, this is the first genome assembly with BmNPV resistance, which should be a more accurate insect model for resistance studies.
Affiliation(s)
- Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, China
| | - Suqun He
- School of Life Sciences, Jiangsu University, Zhenjiang, China
| | - Xun Gong
- Institute of Clinical Pharmacology, Anhui Medical University, Hefei, China; Department of Medical Rheumatology, Columbia University, New York, NY, United States
| | - Peng Lü
- School of Life Sciences, Jiangsu University, Zhenjiang, China
| | - Rehab H Taha
- Department of Sericulture, Plant Protection Research Institute, Agricultural Research Center, Giza, Egypt
| | - Keping Chen
- School of Life Sciences, Jiangsu University, Zhenjiang, China
| |
|
26
|
Melnikova NV, Pushkova EN, Dvorianinova EM, Beniaminov AD, Novakovskiy RO, Povkhova LV, Bolsheva NL, Snezhkina AV, Kudryavtseva AV, Krasnov GS, Dmitriev AA. Genome Assembly and Sex-Determining Region of Male and Female Populus × sibirica. Front Plant Sci 2021; 12:625416. [PMID: 34567016 PMCID: PMC8455832 DOI: 10.3389/fpls.2021.625416] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/03/2020] [Accepted: 07/30/2021] [Indexed: 06/13/2023]
Abstract
The genus Populus is represented by dioecious species and has become a promising system for studying the genetics of sex in plants. In this work, genomes of male and female Populus × sibirica individuals were sequenced for the first time. To achieve high-quality genome assemblies, we used Oxford Nanopore Technologies and Illumina platforms. A protocol for the isolation of long and pure DNA from young poplar leaves was developed, which enabled us to obtain 31 Gb (N50 = 21 kb) for the male poplar and 23 Gb (N50 = 24 kb) for the female one using the MinION sequencer. Genome assembly was performed with different tools, and Canu provided the most complete and accurate assemblies with a length of 818 Mb (N50 = 1.5 Mb) for the male poplar and 816 Mb (N50 = 0.5 Mb) for the female one. After polishing with Racon and Medaka (Nanopore reads) and then with POLCA (Illumina reads), assembly completeness was 98.45% (87.48% duplicated) for the male and 98.20% (76.77% duplicated) for the female according to BUSCO (benchmarking universal single-copy orthologs). The high proportion of duplicated BUSCOs and the increased genome size (about 300 Mb above the expected size) pointed to the separation of haplotypes in a large part of the male and female genomes of P. × sibirica. Owing to this, we were able to identify two haplotypes of the sex-determining region (SDR) in both assemblies; one of these four SDR haplotypes, in the male genome, contained partial repeats of the ARR17 gene (Y haplotype), while the remaining three did not (X haplotypes). The analysis of the male P. × sibirica SDR suggested that the Y haplotype originated from P. nigra, while the X haplotype is close to the P. trichocarpa and P. balsamifera species. Moreover, we revealed a Populus-specific repeat that could be involved in translocation of the ARR17 gene or a part of it to the SDR of P. × sibirica and other Populus species. The obtained results expand our knowledge of SDR features in the genus Populus and of poplar phylogeny.
Affiliation(s)
- Nataliya V. Melnikova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Elena N. Pushkova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Ekaterina M. Dvorianinova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Moscow, Russia
| | - Artemy D. Beniaminov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Roman O. Novakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Liubov V. Povkhova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Moscow, Russia
| | - Nadezhda L. Bolsheva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | | | - Anna V. Kudryavtseva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - George S. Krasnov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Alexey A. Dmitriev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| |
|
27
|
Voelker J, Shepherd M, Mauleon R. A high-quality draft genome for Melaleuca alternifolia (tea tree): a new platform for evolutionary genomics of myrtaceous terpene-rich species. GigaByte 2021; 2021:gigabyte28. [PMID: 36824337 PMCID: PMC9650293 DOI: 10.46471/gigabyte.28] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/26/2021] [Accepted: 08/05/2021] [Indexed: 11/09/2022] Open
Abstract
The economically important Melaleuca alternifolia (tea tree) is the source of a terpene-rich essential oil with therapeutic and cosmetic uses around the world. Tea tree has been cultivated and bred in Australia since the 1990s. It has been extensively studied for the genetics and biochemistry of terpene biosynthesis. Here, we report a high-quality de novo genome assembly using Pacific Biosciences and Illumina sequencing. The genome was assembled into 3128 scaffolds with a total length of 362 Mb (N50 = 1.9 Mb), with significantly higher contiguity than a previous assembly (N50 = 8.7 kb). Using a homology-based, RNA-seq evidence-based and ab initio prediction approach, 37,226 protein-coding genes were predicted. Genome assembly and annotation exhibited high completeness scores of 98.1% and 89.4%, respectively. Sequence contiguity was sufficient to reveal extensive gene order conservation and chromosomal rearrangements in alignments with Eucalyptus grandis and Corymbia citriodora genomes. This new genome advances currently available resources to investigate the genome structure and gene family evolution of M. alternifolia. It will enable further comparative genomic studies in Myrtaceae to elucidate the genetic foundations of economically valuable traits in this crop.
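Several abstracts in this list compare assemblies by their N50 values. As a reading aid, the following is a minimal sketch of how N50 is conventionally computed from a list of scaffold or contig lengths. The function name `n50` and the toy lengths are assumptions made for illustration; they are not taken from any of the cited papers or from their software.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

A higher N50 at a similar total length indicates that the assembly is made of fewer, longer pieces, which is the sense in which the abstracts use it as a contiguity measure.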
Affiliation(s)
- Julia Voelker
- Faculty of Science and Engineering, Southern Cross University, Military Road, East Lismore NSW 2480, Australia
| | - Mervyn Shepherd
- Faculty of Science and Engineering, Southern Cross University, Military Road, East Lismore NSW 2480, Australia
| | - Ramil Mauleon
- Faculty of Science and Engineering, Southern Cross University, Military Road, East Lismore NSW 2480, Australia
| |
|
28
|
Oliver A, Podell S, Pinowska A, Traller JC, Smith SR, McClure R, Beliaev A, Bohutskyi P, Hill EA, Rabines A, Zheng H, Allen LZ, Kuo A, Grigoriev IV, Allen AE, Hazlebeck D, Allen EE. Diploid genomic architecture of Nitzschia inconspicua, an elite biomass production diatom. Sci Rep 2021; 11:15592. [PMID: 34341414 PMCID: PMC8329260 DOI: 10.1038/s41598-021-95106-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/05/2021] [Accepted: 07/14/2021] [Indexed: 01/13/2023] Open
Abstract
A near-complete diploid nuclear genome and accompanying circular mitochondrial and chloroplast genomes have been assembled from the elite commercial diatom species Nitzschia inconspicua. The 50 Mbp haploid size of the nuclear genome is nearly double that of model diatom Phaeodactylum tricornutum, but 30% smaller than closer relative Fragilariopsis cylindrus. Diploid assembly, which was facilitated by low levels of allelic heterozygosity (2.7%), included 14 candidate chromosome pairs composed of long, syntenic contigs, covering 93% of the total assembly. Telomeric ends were capped with an unusual 12-mer, G-rich, degenerate repeat sequence. Predicted proteins were highly enriched in strain-specific marker domains associated with cell-surface adhesion, biofilm formation, and raphe system gliding motility. Expanded species-specific families of carbonic anhydrases suggest potential enhancement of carbon concentration efficiency, and duplicated glycolysis and fatty acid synthesis pathways across cytosolic and organellar compartments may enhance peak metabolic output, contributing to competitive success over other organisms in mixed cultures. The N. inconspicua genome delivers a robust new reference for future functional and transcriptomic studies to illuminate the physiology of benthic pennate diatoms and harness their unique adaptations to support commercial algae biomass and bioproduct production.
Affiliation(s)
- Aaron Oliver
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
| | - Sheila Podell
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
| | | | | | - Sarah R Smith
- Microbial and Environmental Genomics Group, J. Craig Venter Institute, La Jolla, CA, USA
| | - Ryan McClure
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Alex Beliaev
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Pavlo Bohutskyi
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Eric A Hill
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Ariel Rabines
- Microbial and Environmental Genomics Group, J. Craig Venter Institute, La Jolla, CA, USA
| | - Hong Zheng
- Microbial and Environmental Genomics Group, J. Craig Venter Institute, La Jolla, CA, USA
| | - Lisa Zeigler Allen
- Microbial and Environmental Genomics Group, J. Craig Venter Institute, La Jolla, CA, USA
| | - Alan Kuo
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, USA
| | - Igor V Grigoriev
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, USA; Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
| | - Andrew E Allen
- Microbial and Environmental Genomics Group, J. Craig Venter Institute, La Jolla, CA, USA
| | | | - Eric E Allen
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA; Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
| |
|
29
|
Peng C, Mei Y, Ding L, Wang X, Chen X, Wang J, Xu J. Using Combined Methods of Genetic Mapping and Nanopore-Based Sequencing Technology to Analyze the Insertion Positions of G10evo-EPSPS and Cry1Ab/Cry2Aj Transgenes in Maize. Front Plant Sci 2021; 12:690951. [PMID: 34394143 PMCID: PMC8358107 DOI: 10.3389/fpls.2021.690951] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/05/2021] [Accepted: 06/29/2021] [Indexed: 06/13/2023]
Abstract
The insertion position of the exogenous fragment sequence in a genetically modified organism (GMO) is important for the safety assessment and labeling of GMOs. SK12-5 is a newly developed transgenic maize line transformed with two trait genes [i.e., G10evo-5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) and Cry1Ab/Cry2Aj] that was recently approved for commercial use in China. In this study, we sought to determine the insertion position of the exogenous fragment in SK12-5. The transgene-host left border and right border integration junctions were obtained from SK12-5 genomic DNA by using thermal asymmetric interlaced polymerase chain reaction (TAIL-PCR) and next-generation Illumina sequencing technology. However, a Basic Local Alignment Search Tool (BLAST) analysis revealed that the flanking sequences are nonspecific in the maize genome and that the insertion position is located in a repetitive sequence area. To locate the fine-scale insertion position in SK12-5, we combined genetic mapping with nanopore-based sequencing technology. Using a classical bulked-segregant analysis (BSA), the insertion position in SK12-5 was mapped onto Bin9.03 of chromosome 9 between the simple sequence repeat (SSR) markers umc2337 and umc1743 (26,822,048-100,724,531 bp). The nanopore sequencing results uncovered 10 reads for which one end was mapped onto the vector and the other end was mapped onto the maize genome. These observations indicated that the exogenous T-DNA fragments were putatively integrated at the position from 82,329,568 to 82,379,296 bp of chromosome 9 in the transgenic maize SK12-5. This study is helpful for the safety assessment of the novel transgenic maize SK12-5 and shows that combining genetic mapping with nanopore-based sequencing technology will be a useful approach for identifying the insertion positions of transgenic sequences in other GM plants with relatively large and complex genomes.
Affiliation(s)
- Cheng Peng
- State Key Laboratory Breeding Base for Zhejiang Sustainable Pest and Disease Control, Institute of Agro-Product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Yingting Mei
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lin Ding
- State Key Laboratory Breeding Base for Zhejiang Sustainable Pest and Disease Control, Institute of Agro-Product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Xiaofu Wang
- State Key Laboratory Breeding Base for Zhejiang Sustainable Pest and Disease Control, Institute of Agro-Product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Xiaoyun Chen
- State Key Laboratory Breeding Base for Zhejiang Sustainable Pest and Disease Control, Institute of Agro-Product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Junmin Wang
- Institute of Crops and Nuclear Technology Utilization, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Junfeng Xu
- State Key Laboratory Breeding Base for Zhejiang Sustainable Pest and Disease Control, Institute of Agro-Product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
30
Bouchemousse S, Falquet L, Müller-Schärer H. Genome Assembly of the Ragweed Leaf Beetle: A Step Forward to Better Predict Rapid Evolution of a Weed Biocontrol Agent to Environmental Novelties. Genome Biol Evol 2021; 12:1167-1173. [PMID: 32428241 PMCID: PMC7486951 DOI: 10.1093/gbe/evaa102] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Accepted: 05/14/2020] [Indexed: 12/21/2022]
Abstract
Rapid evolution of weed biological control agents (BCAs) to new biotic and abiotic conditions is poorly understood and has so far received little consideration in both pre-release and post-release studies, despite potential major negative or positive implications for risks of nontargeted attacks or for colonizing yet unsuitable habitats, respectively. Provision of genetic resources, such as assembled and annotated genomes, is essential to assess potential adaptive processes by identifying underlying genetic mechanisms. Here, we provide the first sequenced genome of a phytophagous insect used as a BCA, that is, the leaf beetle Ophraella communa, a promising BCA of common ragweed, recently and accidentally introduced into Europe. A total of 33.98 Gb of raw DNA sequences, representing ∼43-fold coverage, were obtained using the PacBio SMRT-Cell sequencing approach. Among the five different assemblers tested, the SMARTdenovo assembly displaying the best scores was then corrected with Illumina short reads. A final genome of 774 Mb containing 7,003 scaffolds was obtained. The reliability of the final assembly was then assessed by benchmarking universal single-copy orthologous genes (>96.0% of the 1,658 expected insect genes) and by remapping tests of Illumina short reads (average of 98.6 ± 0.7% without filtering). The number of protein-coding genes of 75,642, representing 82% of the published antennal transcriptome, and the phylogenetic analyses based on 825 orthologous genes placing O. communa in the monophyletic group of Chrysomelidae, confirm the relevance of our genome assembly. Overall, the genome provides a valuable resource for studying potential risks and benefits of this BCA facing environmental novelties.
Affiliation(s)
- Laurent Falquet
- Department of Biology, University of Fribourg, Switzerland; Swiss Institute of Bioinformatics, Fribourg, Switzerland
31
Draft genome sequence of the pulse crop blackgram [Vigna mungo (L.) Hepper] reveals potential R-genes. Sci Rep 2021; 11:11247. [PMID: 34045617 PMCID: PMC8160138 DOI: 10.1038/s41598-021-90683-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/21/2020] [Accepted: 05/17/2021] [Indexed: 12/29/2022]
Abstract
Blackgram [Vigna mungo (L.) Hepper] (2n = 2x = 22), an important Asiatic legume crop, is a major source of dietary protein for the predominantly vegetarian population. Here we construct a draft genome sequence of blackgram, for the first time, by employing hybrid genome assembly with Illumina reads and third generation Oxford Nanopore sequencing technology. The final de novo whole genome of blackgram is ~475 Mb (82% of the genome) and has a maximum scaffold length of 6.3 Mb with a scaffold N50 of 1.42 Mb. Genome analysis identified 42,115 genes with a mean coding sequence length of 1131 bp. Around 80.6% of predicted genes were annotated. Nearly half of the assembled sequence is composed of repetitive elements, with retrotransposons as the major (47.3% of genome) transposable elements, whereas DNA transposons made up only 2.29% of the genome. A total of 166,014 SSRs, including 65,180 compound SSRs, were identified and primer pairs for 34,816 SSRs were designed. Out of the 33,959 proteins, 1659 proteins showed the presence of R-gene related domains. The KIN class was found in the majority of the proteins (905), followed by RLK (239) and RLP (188). The genome sequence of blackgram will facilitate identification of agronomically important genes and accelerate the genetic improvement of blackgram.
32
Xie L, Wong L. PDR: a new genome assembly evaluation metric based on genetics concerns. Bioinformatics 2021; 37:289-295. [PMID: 32761066 DOI: 10.1093/bioinformatics/btaa704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2019] [Revised: 06/30/2020] [Accepted: 07/30/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Existing genome assembly evaluation metrics provide only limited insight on specific aspects of genome assembly quality, and sometimes even disagree with each other. For better integrative comparison between assemblies, we propose, here, a new genome assembly evaluation metric, Pairwise Distance Reconstruction (PDR). It derives from a common concern in genetic studies, and takes completeness, contiguity, and correctness into consideration. We also propose an approximation implementation to accelerate PDR computation. RESULTS Our results on publicly available datasets affirm PDR's ability to integratively assess the quality of a genome assembly. In fact, this is guaranteed by its definition. The results also indicated that the error introduced by approximation is extremely small and thus negligible. AVAILABILITY AND IMPLEMENTATION https://github.com/XLuyu/PDR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luyu Xie
- Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Limsoon Wong
- Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
33
Chen Y, Nie F, Xie SQ, Zheng YF, Dai Q, Bray T, Wang YX, Xing JF, Huang ZJ, Wang DP, He LJ, Luo F, Wang JX, Liu YZ, Xiao CL. Efficient assembly of nanopore reads via highly accurate and intact error correction. Nat Commun 2021; 12:60. [PMID: 33397900 PMCID: PMC7782737 DOI: 10.1038/s41467-020-20236-7] [Citation(s) in RCA: 176] [Impact Index Per Article: 44.0] [Received: 02/25/2020] [Accepted: 11/19/2020] [Indexed: 12/21/2022]
Abstract
Long nanopore reads are advantageous in de novo genome assembly. However, nanopore reads usually have broad error distribution and high-error-rate subsequences. Existing error correction tools cannot correct nanopore reads efficiently and effectively. Most methods trim high-error-rate subsequences during error correction, which reduces both the length of the reads and the contiguity of the final assembly. Here, we develop an error correction and de novo assembly tool designed to overcome complex errors in nanopore reads. We propose an adaptive read selection and two-step progressive method to quickly correct nanopore reads to high accuracy. We introduce a two-stage assembler to utilize the full length of nanopore reads. Our tool achieves superior performance in both error correction and de novo assembly of nanopore reads. It requires only 8122 hours to assemble a 35X coverage human genome and achieves a 2.47-fold improvement in NG50. Furthermore, our assembly of the human WERI cell line shows an NG50 of 22 Mbp. The high-quality assembly of nanopore reads can significantly reduce false positives in structure variation detection.
Affiliation(s)
- Ying Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, People's Republic of China
- Fan Nie
- School of Information Science and Engineering, Central South University, Changsha, 410083, People's Republic of China
- Shang-Qian Xie
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants, Ministry of Education, Hainan University, Haikou, 570228, People's Republic of China
- Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, 570228, People's Republic of China
- Ying-Feng Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, People's Republic of China
- Qi Dai
- College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou, 310018, People's Republic of China
- Thomas Bray
- Oxford Nanopore Technologies, Gosling Building, Edmund Halley Road, Oxford Science Park, Oxford, OX4 4DQ, UK
- Yao-Xin Wang
- College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou, 310018, People's Republic of China
- Jian-Feng Xing
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants, Ministry of Education, Hainan University, Haikou, 570228, People's Republic of China
- Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, 570228, People's Republic of China
- Zhi-Jian Huang
- School of Marine Sciences, Sun Yat-sen University, Guangzhou, Guangdong, People's Republic of China
- State Key Laboratory of Biocontrol, Sun Yat-sen University, Guangzhou, Guangdong, People's Republic of China
- Southern Marine Sciences and Engineering Guangdong Laboratory (Zhuhai), Sun Yat-sen University, Guangzhou, Guangdong, People's Republic of China
- De-Peng Wang
- Nextomics Biosciences Co., Ltd, Wuhan, People's Republic of China
- Li-Juan He
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, People's Republic of China
- Feng Luo
- School of Computing, Clemson University, Clemson, SC, 29634-0974, USA.
- Jian-Xin Wang
- School of Information Science and Engineering, Central South University, Changsha, 410083, People's Republic of China.
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China.
- Yi-Zhi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, People's Republic of China.
- Research Units of Ocular Development and Regeneration, Chinese Academy of Medical Sciences, Beijing, People's Republic of China.
- Chuan-Le Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, People's Republic of China.
34
Zaccaron AZ, Stergiopoulos I. First Draft Genome Resource for the Tomato Black Leaf Mold Pathogen Pseudocercospora fuligena. Molecular Plant-Microbe Interactions: MPMI 2020; 33:1441-1445. [PMID: 33044124 DOI: 10.1094/mpmi-06-20-0139-a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Pseudocercospora fuligena is a fungus that causes black leaf mold, an important disease of tomato in tropical and subtropical regions of the world. Despite its economic importance, genomic resources for this pathogen are scarce and no reference genome was available thus far. Here, we report a 50.6-Mb genome assembly for P. fuligena, consisting of 348 contigs with an N50 value of 0.407 Mb. In total, 13,764 protein-coding genes were predicted with an estimated BUSCO completeness of 98%. Among the predicted genes there were 179 candidate effectors, 445 carbohydrate-active enzymes, and 30 secondary metabolite gene clusters. The resources presented in this study will allow genome-wide comparative analyses and population genomic studies of this pathogen, ultimately improving management strategies for black leaf mold of tomato.
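Several abstracts in this list, including the one above, summarize contiguity with an N50 value. As a minimal sketch of how that statistic is usually defined (the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly size), the following uses made-up contig lengths; the function name `n50` and the example values are illustrative assumptions, not data from any of the cited assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least half
    of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total length 32, half is 16.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
```

NG50, which appears in another abstract in this list, follows the same procedure but uses half of an estimated genome size rather than half of the assembly length as the threshold.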
Affiliation(s)
- Alex Z Zaccaron
- Department of Plant Pathology, University of California Davis, One Shields Avenue, Davis, CA 95616-8751, U.S.A
- Ioannis Stergiopoulos
- Department of Plant Pathology, University of California Davis, One Shields Avenue, Davis, CA 95616-8751, U.S.A
35
Hoang PTN, Fiebig A, Novák P, Macas J, Cao HX, Stepanenko A, Chen G, Borisjuk N, Scholz U, Schubert I. Chromosome-scale genome assembly for the duckweed Spirodela intermedia, integrating cytogenetic maps, PacBio and Oxford Nanopore libraries. Sci Rep 2020; 10:19230. [PMID: 33154426 PMCID: PMC7645714 DOI: 10.1038/s41598-020-75728-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/02/2020] [Accepted: 10/13/2020] [Indexed: 11/16/2022]
Abstract
Duckweeds are small, free-floating, morphologically highly reduced organisms belonging to the monocot order Alismatales. They display the most rapid growth among flowering plants, vary ~ 14-fold in genome size and comprise five genera. Spirodela is the phylogenetically oldest genus with only two mainly asexually propagating species: S. polyrhiza (2n = 40; 160 Mbp/1C) and S. intermedia (2n = 36; 160 Mbp/1C). This study combined comparative cytogenetics and de novo genome assembly based on PacBio, Illumina and Oxford Nanopore (ON) reads to obtain the first genome reference for S. intermedia and to compare its genomic features with those of the sister species S. polyrhiza. Both species' genomes revealed little more than 20,000 putative protein-coding genes, very low rDNA copy numbers and a low amount of repetitive sequences, mainly Ty3/gypsy retroelements. The detection of a few new small chromosome rearrangements between both Spirodela species refined the karyotype and the chromosomal sequence assignment for S. intermedia.
Affiliation(s)
- Phuong T N Hoang
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Stadt Seeland, Germany
- Biology Faculty, Dalat University, District 8, Dalat City, Lamdong Province, Vietnam
- Anne Fiebig
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Stadt Seeland, Germany
- Petr Novák
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- Jiří Macas
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- Hieu X Cao
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Stadt Seeland, Germany
- Institute of Biology, Martin-Luther-University Halle-Wittenberg, 06120, Halle, Germany
- Anton Stepanenko
- Jiangsu Key Laboratory for Eco-Agricultural Biotechnology Around Hongze Lake, School of Life Sciences, Huaiyin Normal University, Huai'an, 223300, China
- Jiangsu Collaborative Innovation Centre of Regional Modern Agriculture and Environmental Protection, Huaiyin Normal University, Huai'an, 223300, China
- Guimin Chen
- Jiangsu Key Laboratory for Eco-Agricultural Biotechnology Around Hongze Lake, School of Life Sciences, Huaiyin Normal University, Huai'an, 223300, China
- Jiangsu Collaborative Innovation Centre of Regional Modern Agriculture and Environmental Protection, Huaiyin Normal University, Huai'an, 223300, China
- Nikolai Borisjuk
- Jiangsu Key Laboratory for Eco-Agricultural Biotechnology Around Hongze Lake, School of Life Sciences, Huaiyin Normal University, Huai'an, 223300, China
- Jiangsu Collaborative Innovation Centre of Regional Modern Agriculture and Environmental Protection, Huaiyin Normal University, Huai'an, 223300, China
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Stadt Seeland, Germany
- Ingo Schubert
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Stadt Seeland, Germany.
36
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022]
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
- Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
- J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
- Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
- Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
- Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
37
Sullivan AR, Eldfjell Y, Schiffthaler B, Delhomme N, Asp T, Hebelstrup KH, Keech O, Öberg L, Møller IM, Arvestad L, Street NR, Wang XR. The Mitogenome of Norway Spruce and a Reappraisal of Mitochondrial Recombination in Plants. Genome Biol Evol 2020; 12:3586-3598. [PMID: 31774499 PMCID: PMC6944214 DOI: 10.1093/gbe/evz263] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Accepted: 11/25/2019] [Indexed: 02/07/2023]
Abstract
Plant mitogenomes can be difficult to assemble because they are structurally dynamic and prone to intergenomic DNA transfers, leading to the unusual situation where an organelle genome is far outnumbered by its nuclear counterparts. As a result, comparative mitogenome studies are in their infancy and some key aspects of genome evolution are still known mainly from pregenomic, qualitative methods. To help address these limitations, we combined machine learning and in silico enrichment of mitochondrial-like long reads to assemble the bacterial-sized mitogenome of Norway spruce (Pinaceae: Picea abies). We conducted comparative analyses of repeat abundance, intergenomic transfers, substitution and rearrangement rates, and estimated repeat-by-repeat homologous recombination rates. Prompted by our discovery of highly recombinogenic small repeats in P. abies, we assessed the genomic support for the prevailing hypothesis that intramolecular recombination is predominantly driven by repeat length, with larger repeats facilitating DNA exchange more readily. Overall, we found mixed support for this view: recombination dynamics were heterogeneous across vascular plants and highly active small repeats (ca. 200 bp) were present in about one-third of studied mitogenomes. As in previous studies, we did not observe any robust relationships among commonly studied genome attributes, but we identify variation in recombination rates as an underinvestigated source of plant mitogenome diversity.
Affiliation(s)
- Alexis R Sullivan
- Department of Ecology and Environmental Science, Umeå Plant Science Center, Umeå University, Sweden
- Yrin Eldfjell
- Science for Life Laboratory, Department of Mathematics, Swedish e-Science Research Centre, Stockholm University, Sweden
- Bastian Schiffthaler
- Department of Plant Physiology, Umeå Plant Science Center, Umeå University, Sweden
- Nicolas Delhomme
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Center, Swedish University of Agricultural Sciences, Umeå, Sweden
- Torben Asp
- Department of Molecular Biology and Genetics, Aarhus University, Slagelse, Denmark
- Olivier Keech
- Department of Plant Physiology, Umeå Plant Science Center, Umeå University, Sweden
- Lisa Öberg
- Oldtjikko Photo Art & Science, Duved, Sweden
- Ian Max Møller
- Department of Molecular Biology and Genetics, Aarhus University, Slagelse, Denmark
- Lars Arvestad
- Science for Life Laboratory, Department of Mathematics, Swedish e-Science Research Centre, Stockholm University, Sweden
- Nathaniel R Street
- Department of Plant Physiology, Umeå Plant Science Center, Umeå University, Sweden
- Xiao-Ru Wang
- Department of Ecology and Environmental Science, Umeå Plant Science Center, Umeå University, Sweden
38
Telomere-to-telomere assembled and centromere annotated genomes of the two main subspecies of the button mushroom Agaricus bisporus reveal especially polymorphic chromosome ends. Sci Rep 2020; 10:14653. [PMID: 32887908 PMCID: PMC7473861 DOI: 10.1038/s41598-020-71043-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/13/2020] [Accepted: 07/22/2020] [Indexed: 11/09/2022]
Abstract
Agaricus bisporus, the most cultivated edible mushroom worldwide, is represented mainly by the subspecies var. bisporus and var. burnettii. var. bisporus has a secondarily homothallic life cycle with recombination restricted to chromosome ends, while var. burnettii is heterothallic with recombination seemingly equally distributed over the chromosomes. To better understand the relationship between genomic make-up and different lifestyles, we have de novo sequenced a burnettii homokaryon and synchronised gene annotations with updated versions of the published genomes of var. bisporus. The genomes were assembled into telomere-to-telomere chromosomes and a consistent set of gene predictions was generated. The genomes of both subspecies were largely co-linear, and especially the chromosome ends differed in gene model content between the two subspecies. A single large cluster of repeats was found on each chromosome at the same respective position in all strains, harbouring nearly 50% of all repeats and likely representing centromeres. Repeats were all heavily methylated. Finally, a mapping population of var. burnettii confirmed an even distribution of crossovers in meiosis, contrasting the recombination landscape of var. bisporus. The new findings using the exceptionally complete and well annotated genomes of this basidiomycete demonstrate the importance for unravelling genetic components underlying the different life cycles.
39
Krasnov GS, Pushkova EN, Novakovskiy RO, Kudryavtseva LP, Rozhmina TA, Dvorianinova EM, Povkhova LV, Kudryavtseva AV, Dmitriev AA, Melnikova NV. High-Quality Genome Assembly of Fusarium oxysporum f. sp. lini. Front Genet 2020; 11:959. [PMID: 33193577 PMCID: PMC7481384 DOI: 10.3389/fgene.2020.00959] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/30/2019] [Accepted: 07/30/2020] [Indexed: 12/31/2022]
Abstract
In the present work, a highly pathogenic isolate of Fusarium oxysporum f. sp. lini, which is the most harmful pathogen affecting flax (Linum usitatissimum L.), was sequenced for the first time. To achieve a high-quality genome assembly, we used the combination of two sequencing platforms: Oxford Nanopore Technologies (MinION system) with long noisy reads and Illumina (HiSeq 2500 instrument) with short accurate reads. Given that the quality of DNA is crucial for Nanopore sequencing, we developed a protocol for extraction of pure high-molecular-weight DNA from fungi. Sequencing of DNA extracted using this protocol allowed us to obtain about 85x genome coverage with long (N50 = 29 kb) MinION reads and 30x coverage with 2 × 250 bp HiSeq reads. Several tools have been developed for genome assembly; however, they provide different results depending on genome complexity, sequencing data volume, read length and quality. We benchmarked the most requested assemblers (Canu, Flye, Shasta, wtdbg2, and MaSuRCA), Nanopore polishers (Medaka and Racon), and Illumina polishers (Pilon and POLCA) on our sequencing data. The assembly performed with Canu and polished with Medaka and POLCA was considered the most complete and accurate. After further elimination of redundant contigs using Purge Haplotigs, we achieved a high-quality genome of F. oxysporum f. sp. lini with a total length of 59 Mb, N50 of 3.3 Mb, and 99.5% completeness according to BUSCO. We also obtained a complete circular mitochondrial genome with a length of 38.7 kb. The achieved assembly expands studies on F. oxysporum and plant-pathogen interaction in flax.
Affiliation(s)
- George S. Krasnov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Elena N. Pushkova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Roman O. Novakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Tatiana A. Rozhmina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Federal Research Center for Bast Fiber Crops, Torzhok, Russia
- Ekaterina M. Dvorianinova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Liubov V. Povkhova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Anna V. Kudryavtseva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Alexey A. Dmitriev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Nataliya V. Melnikova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
40
Latorre-Pérez A, Villalba-Bermell P, Pascual J, Vilanova C. Assembly methods for nanopore-based metagenomic sequencing: a comparative study. Sci Rep 2020; 10:13588. [PMID: 32788623 PMCID: PMC7423617 DOI: 10.1038/s41598-020-70491-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/10/2020] [Accepted: 07/22/2020] [Indexed: 02/08/2023]
Abstract
Metagenomic sequencing has allowed for the recovery of previously unexplored microbial genomes. Whereas short-read sequencing platforms often result in highly fragmented metagenomes, nanopore-based sequencers could lead to more contiguous assemblies due to their potential to generate long reads. Nevertheless, there is a lack of updated and systematic studies evaluating the performance of different assembly tools on nanopore data. In this study, we have benchmarked the ability of different assemblers to reconstruct two different commercially available mock communities that have been sequenced using Oxford Nanopore Technologies platforms. Among the tested tools, only metaFlye, Raven, and Canu performed well in all the datasets. These tools retrieved highly contiguous genomes (or even complete genomes) directly from the metagenomic data. Despite the intrinsic high error of nanopore sequencing, final assemblies reached high accuracy (~99.5 to 99.8% of consensus accuracy). Polishing strategies proved necessary for reducing the number of indels, and this had an impact on the prediction of biosynthetic gene clusters. Correction with high quality short reads did not always result in higher quality draft assemblies. Overall, nanopore metagenomic sequencing data, adapted to MinION's current output, proved sufficient for assembling and characterizing low-complexity microbial communities.
41
Henry RJ. Innovations in plant genetics adapting agriculture to climate change. Current Opinion in Plant Biology 2020; 56:168-173. [PMID: 31836470 DOI: 10.1016/j.pbi.2019.11.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 09/05/2019] [Revised: 11/01/2019] [Accepted: 11/20/2019] [Indexed: 05/25/2023]
Abstract
Developing new genotypes of plants is one of the key options for adaptation of agriculture to climate change. Plants may be required to provide resilience in changed climates or support the migration of agriculture to new regions. Very different genotypes may be required to perform in the modified environments of protected agriculture. Consumers will continue to demand taste, convenience, healthy and safe food and sustainably and ethically produced food, despite the greater challenges of climate in the future. Improving the nutritional value of foods in response to climate change is a significant challenge. Genomic sequences of relevant germplasm and an understanding of the functional role of alleles controlling key traits will be an enabling platform for this innovation.
Affiliation(s)
- Robert J Henry
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Qld 4072 Australia.
|
42
|
Yen EC, McCarthy SA, Galarza JA, Generalovic TN, Pelan S, Nguyen P, Meier JI, Warren IA, Mappes J, Durbin R, Jiggins CD. A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning. Gigascience 2020; 9:giaa088. [PMID: 32808665 PMCID: PMC7433188 DOI: 10.1093/gigascience/giaa088] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/28/2020] [Revised: 07/03/2020] [Accepted: 07/27/2020] [Indexed: 12/25/2022]
Abstract
BACKGROUND Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes, which are traditionally problematic, such as insect genomes. This includes the wood tiger moth (Arctia plantaginis), which is an evolutionary study system for warning colour polymorphism. FINDINGS We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin, before assembling each haplotype separately and scaffolding with 10X linked reads. Both assemblies are contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with annotations and 31 chromosomes identified through karyotyping. We used the assembly to analyse genome-wide population structure and relationships between 40 wild resequenced individuals from 5 populations across Europe, revealing the Georgian population as the most genetically differentiated with the lowest genetic diversity. CONCLUSIONS We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes. Using our assembly, we provide genomic insights into the geographic population structure of A. plantaginis.
Affiliation(s)
- Eugenie C Yen
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
| | - Shane A McCarthy
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
| | - Juan A Galarza
- Department of Biological and Environmental Science, University of Jyväskylä, FI-40014 Jyväskylä, Finland
| | - Tomas N Generalovic
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
| | - Sarah Pelan
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
| | - Petr Nguyen
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Branišovská 1160/31, 370 05 České Budějovice, Czech Republic
- University of South Bohemia, Faculty of Science, Branišovská 1645/31A, 370 05 České Budějovice, Czech Republic
| | - Joana I Meier
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- St John's College, University of Cambridge, St John's Street, Cambridge CB2 1TP, UK
| | - Ian A Warren
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
| | - Johanna Mappes
- Department of Biological and Environmental Science, University of Jyväskylä, FI-40014 Jyväskylä, Finland
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
| | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- St John's College, University of Cambridge, St John's Street, Cambridge CB2 1TP, UK
|
43
|
Compton A, Sharakhov IV, Tu Z. Recent advances and future perspectives in vector-omics. CURRENT OPINION IN INSECT SCIENCE 2020; 40:94-103. [PMID: 32650287 PMCID: PMC8041138 DOI: 10.1016/j.cois.2020.05.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/04/2020] [Revised: 05/07/2020] [Accepted: 05/18/2020] [Indexed: 06/11/2023]
Abstract
We have reviewed recent progress and the remaining challenges in vector-omics. We have highlighted several technologies and applications that facilitate novel biological insights beyond achieving a reference-quality genome assembly. Among other topics, we have discussed the applications of chromatin conformation capture, chromatin accessibility assays, optical mapping, full-length RNA sequencing, single cell RNA analysis, proteomics, and population genomics. We anticipate that we will witness a great expansion in vector-omics research not only in its application in a broad range of species, but also its ability to uncover novel genetic elements and tackle previously inaccessible regions of the genome. It is our hope that the continued innovation in device portability, cost reduction, and informatics support will in the foreseeable future bring vector-omics to every vector laboratory and field station in the world, which will have an unparalleled impact on basic research and the control of vector-borne infectious diseases.
Affiliation(s)
- Austin Compton
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, United States; Fralin Life Science Institute, Virginia Tech, Blacksburg, VA 24061, United States
| | - Igor V Sharakhov
- Fralin Life Science Institute, Virginia Tech, Blacksburg, VA 24061, United States; Department of Entomology, Virginia Tech, Blacksburg, VA 24061, United States; The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, United States; Department of Genetics and Cell Biology, Tomsk State University, Tomsk 634050, Russia
| | - Zhijian Tu
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, United States; Department of Entomology, Virginia Tech, Blacksburg, VA 24061, United States; The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, United States.
|
44
|
Orton LM, Fitzek E, Feng X, Grayburn WS, Mower JP, Liu K, Zhang C, Duvall MR, Yin Y. Zygnema circumcarinatum UTEX 1559 chloroplast and mitochondrial genomes provide insight into land plant evolution. JOURNAL OF EXPERIMENTAL BOTANY 2020; 71:3361-3373. [PMID: 32206790 DOI: 10.1093/jxb/eraa149] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/30/2019] [Accepted: 03/19/2020] [Indexed: 05/22/2023]
Abstract
The complete chloroplast and mitochondrial genomes of Charophyta have shed new light on land plant terrestrialization. Here, we report the organellar genomes of the Zygnema circumcarinatum strain UTEX 1559, and a comparative genomics investigation of 33 plastomes and 18 mitogenomes of Chlorophyta, Charophyta (including UTEX 1559 and its conspecific relative SAG 698-1a), and Embryophyta. Gene presence/absence was determined across these plastomes and mitogenomes. A comparison between the plastomes of UTEX 1559 (157 548 bp) and SAG 698-1a (165 372 bp) revealed very similar gene contents, but substantial genome rearrangements. Surprisingly, the two plastomes share only 85.69% nucleotide sequence identity. The UTEX 1559 mitogenome size is 215 954 bp, the largest among all sequenced Charophyta. Interestingly, this large mitogenome contains a 50 kb region without homology to any other organellar genomes, which is flanked by two 86 bp direct repeats and contains 15 ORFs. These ORFs have significant homology to proteins from bacteria and plants with functions such as primase, RNA polymerase, and DNA polymerase. We conclude that (i) the previously published SAG 698-1a plastome is probably from a different Zygnema species, and (ii) the 50 kb region in the UTEX 1559 mitogenome might be recently acquired as a mobile element.
Affiliation(s)
- Lauren M Orton
- Biological Sciences, Northern Illinois University, DeKalb, IL, USA
| | - Elisabeth Fitzek
- Biology/Computational Biology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology-CeBiTec, Bielefeld, Germany
| | - Xuehuan Feng
- Department of Food Science and Technology, Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - W Scott Grayburn
- Biological Sciences, Northern Illinois University, DeKalb, IL, USA
| | - Jeffrey P Mower
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Kan Liu
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Chi Zhang
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Melvin R Duvall
- Biological Sciences, Northern Illinois University, DeKalb, IL, USA
| | - Yanbin Yin
- Department of Food Science and Technology, Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, NE, USA
|
45
|
Ou S, Liu J, Chougule KM, Fungtammasan A, Seetharam AS, Stein JC, Llaca V, Manchanda N, Gilbert AM, Wei S, Chin CS, Hufnagel DE, Pedersen S, Snodgrass SJ, Fengler K, Woodhouse M, Walenz BP, Koren S, Phillippy AM, Hannigan BT, Dawe RK, Hirsch CN, Hufford MB, Ware D. Effect of sequence depth and length in long-read assembly of the maize inbred NC358. Nat Commun 2020; 11:2288. [PMID: 32385271 PMCID: PMC7211024 DOI: 10.1038/s41467-020-16037-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/23/2019] [Accepted: 04/09/2020] [Indexed: 01/23/2023]
Abstract
Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11-21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
Affiliation(s)
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA
| | - Jianing Liu
- Department of Genetics, University of Georgia, Athens, Georgia, 30602, USA
| | - Kapeel M Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
| | | | - Arun S Seetharam
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA
- Genome Informatics Facility, Iowa State University, Ames, Iowa, 50011, USA
| | - Joshua C Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
| | - Victor Llaca
- Genomics Technologies, Applied Science and Technology, Corteva Agriscience™, Johnston, Iowa, 50131, USA
| | - Nancy Manchanda
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA
| | - Amanda M Gilbert
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota, 55108, USA
| | - Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
| | - Chen-Shan Chin
- DNAnexus, Inc., Mountain View, San Francisco, California, 94040, USA
| | - David E Hufnagel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA
| | - Sarah Pedersen
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA
| | - Samantha J Snodgrass
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA
| | - Kevin Fengler
- Genomics Technologies, Applied Science and Technology, Corteva Agriscience™, Johnston, Iowa, 50131, USA
| | - Margaret Woodhouse
- USDA ARS Corn Insects and Crop Genetics Research Unit, Ames, Iowa, 50011, USA
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, 20892, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, 20892, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, 20892, USA
| | - Brett T Hannigan
- DNAnexus, Inc., Mountain View, San Francisco, California, 94040, USA
| | - R Kelly Dawe
- Department of Genetics, University of Georgia, Athens, Georgia, 30602, USA.
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota, 55108, USA.
| | - Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA.
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA.
- USDA ARS Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, New York, 14853, USA.
|
46
|
Jayakumar V, Ishii H, Seki M, Kumita W, Inoue T, Hase S, Sato K, Okano H, Sasaki E, Sakakibara Y. An improved de novo genome assembly of the common marmoset genome yields improved contiguity and increased mapping rates of sequence data. BMC Genomics 2020; 21:243. [PMID: 32241258 PMCID: PMC7114785 DOI: 10.1186/s12864-020-6657-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 12/31/2022]
Abstract
BACKGROUND The common marmoset (Callithrix jacchus) is one of the most studied primate model organisms. However, the marmoset genomes available in the public databases are highly fragmented and filled with sequence gaps, hindering research advances related to marmoset genomics and transcriptomics. RESULTS Here we utilize single-molecule, long-read sequence data to improve and update the existing genome assembly and report a near-complete genome of the common marmoset. The assembly is 2.79 Gb in size, with a contig N50 length of 6.37 Mb and a chromosomal scaffold N50 length of 143.91 Mb, representing the most contiguous and high-quality marmoset genome to date. Approximately 90% of the assembled genome was represented in contigs longer than 1 Mb, with approximately 104-fold improvement in contiguity over the previously published marmoset genome. More than 98% of the gaps from the previously published genomes were filled successfully, which improved the mapping rates of genomic and transcriptomic data onto the assembled genome. CONCLUSIONS Altogether, the updated, high-quality common marmoset genome assembly provides improvements at various levels over the previous versions of the marmoset genome assemblies. This will allow researchers working on primate genomics to apply the genome more efficiently for their genomic and transcriptomic sequence data.
Affiliation(s)
- Vasanthan Jayakumar
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Hiromi Ishii
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Misato Seki
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Wakako Kumita
- Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa 210-0821 Japan
| | - Takashi Inoue
- Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa 210-0821 Japan
| | - Sumitaka Hase
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Kengo Sato
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Hideyuki Okano
- Department of Physiology, Keio University School of Medicine, Shinjuku, Tokyo, 160-8582 Japan
- Laboratory for Marmoset Neural Architecture, RIKEN Center for Brain Science, Wako-shi, Saitama, 351-0198 Japan
| | - Erika Sasaki
- Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa 210-0821 Japan
| | - Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
|
47
|
Molina-Mora JA, Campos-Sánchez R, Rodríguez C, Shi L, García F. High quality 3C de novo assembly and annotation of a multidrug resistant ST-111 Pseudomonas aeruginosa genome: Benchmark of hybrid and non-hybrid assemblers. Sci Rep 2020; 10:1392. [PMID: 31996747 PMCID: PMC6989561 DOI: 10.1038/s41598-020-58319-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/14/2019] [Accepted: 01/06/2020] [Indexed: 12/14/2022]
Abstract
Genotyping methods and genome sequencing are indispensable to reveal the genomic structure of bacterial species displaying a high level of genome plasticity. However, genome reconstruction, or assembly, is not straightforward due to data complexity, including repeats and the mobile and accessory genetic elements of bacterial genomes. Moreover, since the solution to this problem is strongly influenced by sequencing technology, bioinformatics pipelines, and selection criteria to assess assemblers, there is no systematic way to select a priori the optimal assembler and parameter settings. To assemble the genome of Pseudomonas aeruginosa strain AG1 (PaeAG1), short-read (Illumina) and long-read (Oxford Nanopore) sequencing data were used in 13 different non-hybrid and hybrid approaches. PaeAG1 is a multiresistant high-risk sequence type 111 (ST-111) clone that was isolated from a Costa Rican hospital and was the first reported isolate of P. aeruginosa carrying both blaVIM-2 and blaIMP-18 genes encoding metallo-β-lactamase (MBL) enzymes. To assess the assemblies, multiple metrics regarding contiguity, correctness and completeness (3C criterion, as we define here) were used to benchmark the 13 approaches and to select a definitive assembly. In addition, annotation was done to identify genes (coding and RNA regions) and to describe the genomic content of PaeAG1. Whereas long-read and hybrid approaches showed better performance in terms of contiguity, higher correctness and completeness metrics were obtained for short-read-only and hybrid approaches. A manually curated and polished hybrid assembly gave rise to a single circular sequence with 100% of core genes and known regions identified, >98% of reads mapped back, no gaps, and uniform coverage. The strategy followed to obtain this high-quality 3C assembly is detailed in the manuscript and we provide readers with an all-in-one script to replicate our results or to apply it to other troublesome cases.
The final 3C assembly revealed that the PaeAG1 genome has 7,190,208 bp, a 65.7% GC content and 6,709 genes (6,620 coding sequences), many of which are included in multiple mobile genomic elements, such as 57 genomic islands, six prophages, and two complete integrons with blaVIM-2 and blaIMP-18 MBL genes. Up to 250 and 60 of the predicted genes are anticipated to play a role in virulence (adherence, quorum sensing and secretion) and antibiotic resistance (β-lactamases, efflux pumps, etc.), respectively. Altogether, the assembly and annotation of the PaeAG1 genome provide new perspectives to continue studying the genomic diversity and gene content of this important human pathogen.
Affiliation(s)
- José Arturo Molina-Mora
- Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica.
| | - Rebeca Campos-Sánchez
- Centro de Investigación en Biología Celular y Molecular, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
| | - César Rodríguez
- Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
| | - Leming Shi
- Human Phenome Institute of Fudan University, Shanghai, China
| | - Fernando García
- Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
|
48
|
Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. QUANTITATIVE BIOLOGY 2019. [DOI: 10.1007/s40484-019-0181-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/25/2022]
|
49
|
Midha MK, Wu M, Chiu KP. Long-read sequencing in deciphering human genetics to a greater depth. Hum Genet 2019; 138:1201-1215. [PMID: 31538236 DOI: 10.1007/s00439-019-02064-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 05/22/2019] [Accepted: 09/13/2019] [Indexed: 12/12/2022]
Abstract
Over four decades of development, DNA sequencing has inched into the era of single-molecule sequencing (SMS), or third-generation sequencing (TGS), as represented by two distinct technical approaches developed independently by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). Historically, each generation of sequencing technologies was marked by innovative technological achievements and novel applications. Long reads (LRs) are considered the most advantageous feature of SMS, shared by both PacBio and ONT, distinguishing SMS from next-generation sequencing (NGS, or second-generation sequencing) and Sanger sequencing (first-generation sequencing). Long reads overcome the limitations of NGS and drastically improve the quality of genome assembly. In addition, ONT contributes several unique features, including ultra-long reads (ULRs) with read lengths above 300 kb and some close to 1 million bp, direct RNA sequencing, and superior portability made possible by the pocket-sized MinION sequencer. Here, we review the history of DNA sequencing technologies and associated applications, with a special focus on the advantages as well as the limitations of ULR sequencing in genome assembly.
Affiliation(s)
- Mohit K Midha
- Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
| | - Mengchu Wu
- Health GeneTech, 22F No. 99, Xin Pu 6th St., Taoyuan, Taiwan
| | - Kuo-Ping Chiu
- Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan
- Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
- Department of Life Sciences, College of Life Sciences, National Taiwan University, Taipei, Taiwan
|
50
|
Constructing a Reference Genome in a Single Lab: The Possibility to Use Oxford Nanopore Technology. PLANTS 2019; 8:plants8080270. [PMID: 31390788 PMCID: PMC6724115 DOI: 10.3390/plants8080270] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/12/2019] [Revised: 07/29/2019] [Accepted: 08/04/2019] [Indexed: 12/19/2022]
Abstract
Whole genome sequencing (WGS) has become a crucial tool in understanding genome structure and genetic variation. MinION sequencing from Oxford Nanopore Technologies (ONT) is an excellent approach for performing WGS and has advantages in comparison with other Next-Generation Sequencing (NGS) methods: it is relatively inexpensive, portable, has simple library preparation, can be monitored in real time, and has no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence information is released in the Phytozome database. Therefore, sorghum can be used as a good reference. However, plant species have complex and large genomes when compared to animals or microorganisms. As a result, complete genome sequencing is difficult for plant species. MinION sequencing, which produces long reads, can be an excellent tool for overcoming the weak assembly of short reads generated from NGS by minimizing the generation of gaps or covering the repetitive sequences that appear in plant genomes. Here, we conducted genome sequencing for S. bicolor cv. BTx623 using the MinION platform and obtained 895,678 reads and 17.9 gigabases (Gb) (ca. 25× coverage of the reference) of long-read sequence data. We performed de novo assembly with two different tools and mapped the assembled contigs against the sorghum reference genome: Canu generated a total of 6124 contigs (covering 45.9%), and Minimap and Miniasm with Racon generated a total of 2661 contigs (covering 50%). Our results provide an optimal series of long-read sequencing analyses for plant species using the MinION platform and a clue to determine the total sequencing scale for optimal coverage based on various genome sizes.
|