51
Eleftheriou E, Aury JM, Vacherie B, Istace B, Belser C, Noel B, Moret Y, Rigaud T, Berro F, Gasparian S, Labadie-Bretheau K, Lefebvre T, Madoui MA. Chromosome-scale assembly of the yellow mealworm genome. Open Research Europe 2022; 1:94. [PMID: 37645128 PMCID: PMC10445852 DOI: 10.12688/openreseurope.13987.3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 08/30/2022] [Indexed: 08/31/2023]
Abstract
Background: The yellow mealworm beetle, Tenebrio molitor, is a promising alternative protein source for animal and human nutrition, and its farming involves relatively low environmental costs. For these reasons, its industrial-scale production started this century. However, access to its genome remains essential to optimize and breed sustainable new T. molitor lines. Methods: By combining Oxford Nanopore and Illumina Hi-C data, we constructed a high-quality chromosome-scale assembly of T. molitor. Then, we combined RNA-seq data and available Coleoptera proteomes for gene prediction with GMOVE. Results: We produced a high-quality genome with an N50 of 21.9 Mb and a completeness of 99.5%, and predicted 21,435 genes with a median size of 1,780 bp. Gene orthology between T. molitor and Tribolium castaneum showed highly conserved synteny between the two Coleoptera, and a paralog search revealed an expansion of histones in the T. molitor genome. Conclusions: The present genome will greatly help fundamental and applied research such as genetic breeding and will contribute to the sustainable production of the yellow mealworm.
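The N50 reported in this abstract is a standard contiguity statistic. As a minimal sketch (not the authors' pipeline; the function name and the toy scaffold lengths are assumptions made for illustration), it can be computed from a list of sequence lengths as follows:

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the paper.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

With real data, the lengths would come from the scaffold or contig records of the assembly FASTA file.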
Affiliation(s)
- Evangelia Eleftheriou
  - Génomique Métabolique, Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Jean-Marc Aury
  - Génomique Métabolique, Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Benoît Vacherie
  - Genoscope, Institut de biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91057, France
- Benjamin Istace
  - Génomique Métabolique, Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Caroline Belser
  - Génomique Métabolique, Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Benjamin Noel
  - Génomique Métabolique, Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Yannick Moret
  - Équipe Écologie Évolutive, UMR CNRS 6282 BioGéoSciences, Université de Bourgogne Franche-Comté, Dijon, 21000, France
- Thierry Rigaud
  - Équipe Écologie Évolutive, UMR CNRS 6282 BioGéoSciences, Université de Bourgogne Franche-Comté, Dijon, 21000, France
- Karine Labadie-Bretheau
  - Genoscope, Institut de biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91057, France
- Mohammed-Amin Madoui
  - Génomique Métabolique, Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
  - Équipe Écologie Évolutive, UMR CNRS 6282 BioGéoSciences, Université de Bourgogne Franche-Comté, Dijon, 21000, France
  - Service d’Etude des Prions et des Infections Atypiques (SEPIA), Institut François Jacob, Commissariat à l’Energie Atomique et aux Energies Alternatives (CEA), Université Paris Saclay, Fontenay-aux-Roses, France

52
Wen M, Pan Q, Jouanno E, Montfort J, Zahm M, Cabau C, Klopp C, Iampietro C, Roques C, Bouchez O, Castinel A, Donnadieu C, Parrinello H, Poncet C, Belmonte E, Gautier V, Avarre JC, Dugue R, Gustiano R, Hà TTT, Campet M, Sriphairoj K, Ribolli J, de Almeida FL, Desvignes T, Postlethwait JH, Bucao CF, Robinson-Rechavi M, Bobe J, Herpin A, Guiguen Y. An ancient truncated duplication of the anti-Müllerian hormone receptor type 2 gene is a potential conserved master sex determinant in the Pangasiidae catfish family. Mol Ecol Resour 2022; 22:2411-2428. [PMID: 35429227 PMCID: PMC9555307 DOI: 10.1111/1755-0998.13620] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/17/2022] [Revised: 03/29/2022] [Accepted: 04/11/2022] [Indexed: 11/30/2022]
Abstract
The evolution of sex determination (SD) in teleosts is amazingly dynamic, as reflected by the variety of different master sex-determining genes identified. Pangasiids are economically important catfishes in South Asian countries, but little is known about their SD system. Here, we generated novel genomic resources for 12 Pangasiids and characterized their SD system. Based on a Pangasianodon hypophthalmus chromosome-scale genome assembly, we identified an anti-Müllerian hormone receptor type II gene (amhr2) duplication, which was further characterized as being sex-linked in males and expressed only in testes. These results point to a Y chromosome male-specific duplication (amhr2by) of the autosomal amhr2a. Sequence annotation revealed that the P. hypophthalmus Amhr2by is truncated in its N-terminal domain, lacking the cysteine-rich extracellular part of the receptor that is crucial for ligand binding, suggesting a potential route for its neofunctionalization. Reference-guided assembly of 11 additional Pangasiids, along with sex-linkage studies, revealed that this truncated amhr2by duplication is a male-specific conserved gene in Pangasiids. Reconstructions of the amhr2 phylogeny suggested that amhr2by arose from an ancient duplication/insertion event at the root of the Siluroidei radiation that is dated to ~100 million years ago. Together these results bring multiple lines of evidence supporting that amhr2by is an ancient and conserved master sex-determining gene in Pangasiids, a finding that highlights the recurrent use of the transforming growth factor β pathway, which is often used for the recruitment of teleost master SD genes, and provides another empirical case towards further understanding of the dynamics of SD systems.
Affiliation(s)
- Ming Wen
  - State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha, China
  - INRAE, LPGP, Rennes, France
- Qiaowei Pan
  - INRAE, LPGP, Rennes, France
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Margot Zahm
  - Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRAE, Castanet Tolosan, France
- Cédric Cabau
  - SIGENAE, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Christophe Klopp
  - Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRAE, Castanet Tolosan, France
  - SIGENAE, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Céline Roques
  - INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Olivier Bouchez
  - INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Adrien Castinel
  - INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Hugues Parrinello
  - Montpellier GenomiX (MGX), C/O Institut de Génomique Fonctionnelle, Montpellier, France
- Charles Poncet
  - GDEC Gentyane, INRAE, Université Clermont Auvergne, Clermont-Ferrand, France
- Elodie Belmonte
  - GDEC Gentyane, INRAE, Université Clermont Auvergne, Clermont-Ferrand, France
- Véronique Gautier
  - GDEC Gentyane, INRAE, Université Clermont Auvergne, Clermont-Ferrand, France
- Remi Dugue
  - ISEM, CNRS, IRD, Univ Montpellier, Montpellier, France
- Rudhy Gustiano
  - Research Institute of Freshwater Fisheries (CRIFI-RIFF), Instalasi Penelitian Perikanan Air Tawar, Jakarta, Indonesia
- Trần Thị Thúy Hà
  - Research Institute for Aquaculture No.1. Dinh Bang, Tu Son, Bac Ninh, Viet Nam
- Kednapat Sriphairoj
  - Faculty of Natural Resources and Agro-Industry, Kasetsart University Chalermphrakiat Sakon Nakhon Province Campus, Sakon Nakhon, Thailand
- Josiane Ribolli
  - Laboratório de Biologia e Cultivo de Peixes de Água Doce, Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil
- Thomas Desvignes
  - Institute of Neuroscience, University of Oregon, Eugene, Oregon, USA
- Christabel Floi Bucao
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland

53
Ferchiou S, Caza F, de Boissel PGJ, Villemur R, St-Pierre Y. Applying the concept of liquid biopsy to monitor the microbial biodiversity of marine coastal ecosystems. ISME Communications 2022; 2:61. [PMID: 37938655 PMCID: PMC9723566 DOI: 10.1038/s43705-022-00145-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2022] [Revised: 06/28/2022] [Accepted: 07/08/2022] [Indexed: 10/04/2023]
Abstract
Liquid biopsy (LB) is a concept that is rapidly gaining ground in the biomedical field. Its concept is largely based on the detection of circulating cell-free DNA (ccfDNA) fragments that are mostly released as small fragments following cell death in various tissues. A small percentage of these fragments are from foreign (nonself) tissues or organisms. In the present work, we applied this concept to mussels, a sentinel species known for its high filtration capacity of seawater. We exploited the capacity of mussels to be used as natural filters to capture environmental DNA fragments of different origins to provide information on the biodiversity of marine coastal ecosystems. Our results showed that hemolymph of mussels contains DNA fragments that varied considerably in size, ranging from 1 to 5 kb. Shotgun sequencing revealed that a significant amount of DNA fragments had a nonself microbial origin. Among these, we found DNA fragments derived from bacteria, archaea, and viruses, including viruses known to infect a variety of hosts that commonly populate coastal marine ecosystems. Taken together, our study shows that the concept of LB applied to mussels provides a rich and yet unexplored source of knowledge regarding the microbial biodiversity of a marine coastal ecosystem.
Affiliation(s)
- Sophia Ferchiou
  - INRS-Centre Armand-Frappier Santé Biotechnologie, Laval, Québec, H7V 1B7, Canada
- France Caza
  - INRS-Centre Armand-Frappier Santé Biotechnologie, Laval, Québec, H7V 1B7, Canada
- Richard Villemur
  - INRS-Centre Armand-Frappier Santé Biotechnologie, Laval, Québec, H7V 1B7, Canada
- Yves St-Pierre
  - INRS-Centre Armand-Frappier Santé Biotechnologie, Laval, Québec, H7V 1B7, Canada

54
Pich O, Reyes-Salazar I, Gonzalez-Perez A, Lopez-Bigas N. Discovering the drivers of clonal hematopoiesis. Nat Commun 2022; 13:4267. [PMID: 35871184 PMCID: PMC9308779 DOI: 10.1038/s41467-022-31878-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 12/27/2021] [Accepted: 07/06/2022] [Indexed: 12/28/2022]
Abstract
Mutations in genes that confer a selective advantage to hematopoietic stem cells (HSCs) drive clonal hematopoiesis (CH). While some CH drivers have been identified, the compendium of all genes able to drive CH upon mutations in HSCs remains incomplete. Exploiting signals of positive selection in blood somatic mutations may be an effective way to identify CH driver genes, analogously to cancer. Using the tumor sample in blood/tumor pairs as reference, we identify blood somatic mutations across more than 12,000 donors from two large cancer genomics cohorts. The application of IntOGen, a driver discovery pipeline, to both cohorts, and more than 24,000 targeted sequenced samples yields a list of close to 70 genes with signals of positive selection in CH, available at http://www.intogen.org/ch. This approach recovers known CH genes, and discovers other candidates.
Affiliation(s)
- Oriol Pich
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028, Barcelona, Spain
  - Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
- Iker Reyes-Salazar
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028, Barcelona, Spain
- Abel Gonzalez-Perez
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028, Barcelona, Spain
  - Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Nuria Lopez-Bigas
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028, Barcelona, Spain
  - Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

55
Athanasouli M, Rödelsperger C. Analysis of repeat elements in the Pristionchus pacificus genome reveals an ancient invasion by horizontally transferred transposons. BMC Genomics 2022; 23:523. [PMID: 35854227 PMCID: PMC9297572 DOI: 10.1186/s12864-022-08731-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 07/01/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Repetitive sequences and mobile elements make up considerable fractions of individual genomes. While transposition events can be detrimental for organismal fitness, repetitive sequences form an enormous reservoir for molecular innovation. In this study, we aim to add repetitive elements to the annotation of the Pristionchus pacificus genome and assess their impact on novel gene formation. RESULTS: Different computational approaches define up to 24% of the P. pacificus genome as repetitive sequences. While retroelements are more frequently found at the chromosome arms, DNA transposons are distributed more evenly. We found multiple DNA transposons, as well as LTR and LINE elements with abundant evidence of expression as single-exon transcripts. When testing whether transposons disproportionately contribute towards new gene formation, we found that roughly 10-20% of genes across all age classes overlap transposable elements with the strongest trend being an enrichment of low complexity regions among the oldest genes. Finally, we characterized a horizontal gene transfer of Zisupton elements into diplogastrid nematodes. These DNA transposons invaded nematodes from eukaryotic donor species and experienced a recent burst of activity in the P. pacificus lineage. CONCLUSIONS: The comprehensive annotation of repetitive elements in the P. pacificus genome builds a resource for future functional genomic analyses as well as for more detailed investigations of molecular innovations.
Affiliation(s)
- Marina Athanasouli
  - Max Planck Institute for Biology, Department for Integrative Evolutionary Biology, Max-Planck-Ring 9, 72076, Tübingen, Germany
- Christian Rödelsperger
  - Max Planck Institute for Biology, Department for Integrative Evolutionary Biology, Max-Planck-Ring 9, 72076, Tübingen, Germany

56
Privitera GF, Alaimo S, Ferro A, Pulvirenti A. Virus finding tools: current solutions and limitations. Brief Bioinform 2022; 23:6618234. [PMID: 35753694 DOI: 10.1093/bib/bbac235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 05/02/2022] [Accepted: 05/20/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The study of the Human Virome remains challenging nowadays. Viral metagenomics, through high-throughput sequencing data, is the best choice for virus discovery. The metagenomics approach is culture-independent and sequence-independent, helping search for either known or novel viruses. Though it is estimated that more than 40% of the viruses found in metagenomics analysis are not recognizable, we decided to analyze several tools to identify and discover viruses in RNA-seq samples. RESULTS: We have analyzed eight Virus Tools for the identification of viruses in RNA-seq data. These tools were compared using a synthetic dataset of 30 viruses and a real one. Our analysis shows that no tool succeeds in recognizing all the viruses in the datasets. So we can conclude that each of these tools has pros and cons, and their choice depends on the application domain. AVAILABILITY: Synthetic data used through the review and raw results of their analysis can be found at https://zenodo.org/record/6426147. FASTQ files of real data can be found in GEO (https://www.ncbi.nlm.nih.gov/gds) or ENA (https://www.ebi.ac.uk/ena/browser/home). Raw results of their analysis can be downloaded from https://zenodo.org/record/6425917.
Affiliation(s)
- Grete Francesca Privitera
  - Department of Physics and Astronomy, University of Catania, Viale A. Doria, 6, 95125, Catania, Italy
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy

57
Krishnamoorthy M, Ranjan P, Erb-Downward JR, Dickson RP, Wiens J. AMAISE: a machine learning approach to index-free sequence enrichment. Commun Biol 2022; 5:568. [PMID: 35681015 PMCID: PMC9184628 DOI: 10.1038/s42003-022-03498-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Accepted: 05/18/2022] [Indexed: 11/21/2022]
Abstract
Metagenomics holds potential to improve clinical diagnostics of infectious diseases, but DNA from clinical specimens is often dominated by host-derived sequences. To address this, researchers employ host-depletion methods. Laboratory-based host-depletion methods, however, are costly in terms of time and effort, while computational host-depletion methods rely on memory-intensive reference index databases and struggle to accurately classify noisy sequence data. To solve these challenges, we propose an index-free tool, AMAISE (A Machine Learning Approach to Index-Free Sequence Enrichment). Applied to the task of separating host from microbial reads, AMAISE achieves over 98% accuracy. Applied prior to metagenomic classification, AMAISE results in a 14-18% decrease in memory usage compared to using metagenomic classification alone. Our results show that a reference-independent machine learning approach to host depletion allows for accurate and efficient sequence detection.
Affiliation(s)
- Meera Krishnamoorthy
  - Division of Computer Science and Engineering, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Piyush Ranjan
  - Division of Pulmonary & Critical Care Medicine, Department of Medicine, University of Michigan, Ann Arbor, MI, USA
- John R Erb-Downward
  - Division of Pulmonary & Critical Care Medicine, Department of Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Robert P Dickson
  - Division of Pulmonary & Critical Care Medicine, Department of Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
  - Max Harry Weil Institute for Critical Care Research and Innovation, University of Michigan, Ann Arbor, MI, USA
- Jenna Wiens
  - Division of Computer Science and Engineering, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA

58
Pietsch GM, Gazis R, Klingeman WE, Huff ML, Staton ME, Kolarik M, Hadziabdic D. Characterization and microsatellite marker development for a common bark and ambrosia beetle associate, Geosmithia obscura. Microbiologyopen 2022; 11:e1286. [PMID: 35765178 PMCID: PMC9108439 DOI: 10.1002/mbo3.1286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 04/27/2022] [Indexed: 11/12/2022]
Abstract
Symbioses between Geosmithia fungi and wood-boring and bark beetles seldom result in disease induction within the plant host. Yet, exceptions exist such as Geosmithia morbida, the causal agent of Thousand Cankers Disease (TCD) of walnuts and wingnuts, and Geosmithia sp. 41, the causal agent of Foamy Bark Canker disease of oaks. Isolates of G. obscura were recovered from black walnut trees in eastern Tennessee and at least one isolate induced cankers following artificial inoculation. Due to the putative pathogenicity and lack of recovery of G. obscura from natural lesions, a molecular diagnostic screening tool was developed using microsatellite markers mined from the G. obscura genome. A total of 3256 candidate microsatellite markers were identified (2236, 789, 137 di-, tri-, and tetranucleotide motifs, respectively), with 2011, 703, 101 di-, tri-, and tetranucleotide motifs, respectively, containing markers with primers. From these, 75 microsatellite markers were randomly selected, screened, and optimized, resulting in 28 polymorphic markers that yielded single, consistently recovered bands, which were used in downstream analyses. Five of these microsatellite markers were found to be specific to G. obscura and did not cross-amplify into other, closely related species. Although the remaining tested markers could be useful, they cross-amplified within different Geosmithia species, making them not reliable for G. obscura detection. Five novel microsatellite markers (GOBS9, GOBS10, GOBS41, GOBS43, and GOBS50) were developed based on the G. obscura genome. These species-specific microsatellite markers are available as a tool for use in molecular diagnostics and can assist future surveillance studies.
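The abstract does not name the software used to mine microsatellites from the genome. As a rough illustration of the idea only, the sketch below scans a sequence for perfect di-, tri-, and tetranucleotide repeats with a regular expression; the function name, the `min_repeats` threshold, and the toy sequences are assumptions, not details from the study.

```python
import re


def find_ssrs(seq, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect di-, tri-, and
    tetranucleotide repeats with at least `min_repeats` copies."""
    # Lazy {2,4}? tries the shortest motif first, so ACACAC... is
    # reported as an AC repeat rather than an ACAC repeat.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Hypothetical toy sequence, not data from the paper.
print(find_ssrs("TTG" + "AC" * 5 + "GGA" + "ATG" * 6))
```

Dedicated SSR tools additionally handle imperfect repeats and design flanking primers, which this sketch does not attempt.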
Affiliation(s)
- Grace M. Pietsch
  - Department of Plant Sciences, The University of Tennessee, Knoxville, Tennessee, USA
- Romina Gazis
  - Department of Plant Pathology, University of Florida, Homestead, Florida, USA
- Matthew L. Huff
  - Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, Tennessee, USA
- Margaret E. Staton
  - Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, Tennessee, USA
- Miroslav Kolarik
  - Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Denita Hadziabdic
  - Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, Tennessee, USA

59
Hamm TP, Boggess SL, Kandel JS, Staton ME, Huff ML, Hadziabdic D, Shoemaker D, Adamczyk Jr. JJ, Nowicki M, Trigiano RN. Development and Characterization of 20 Genomic SSR Markers for Ornamental Cultivars of Weigela. Plants 2022; 11:plants11111444. [PMID: 35684218 PMCID: PMC9182808 DOI: 10.3390/plants11111444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/11/2022] [Revised: 05/20/2022] [Accepted: 05/27/2022] [Indexed: 12/03/2022]
Abstract
Weigela (Caprifoliaceae) is a genus of ornamental plants popular for its phenotypic variation and hardiness, that includes species hybridized to produce the commercially available cultivars. Despite its popularity, limited genetic resources exist for the genus. Twenty genomic simple sequence repeat (gSSR) markers distributed across the genome were developed using low coverage whole-genome sequencing data of Weigela Spilled Wine®. A cross-amplification evaluation with these 20 gSSR markers on a collection of 18 Weigela cultivars revealed a total of 111 unique alleles, including 36 private alleles. A diagrammatic key was constructed to identify cultivars using only six of the gSSR markers, demonstrating the newly developed gSSR markers are immediately useful for cultivar identification. Future uses could include breeding with marker-assisted selection, determining the history of hybridization of the current cultivated lines, aiding in the construction of genetic maps, and assessing the patterns of population genetic structure of Weigela spp.
Affiliation(s)
- Trinity P. Hamm
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
  - Correspondence: (T.P.H.); (S.L.B.); (R.N.T.)
- Sarah L. Boggess
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
  - Correspondence: (T.P.H.); (S.L.B.); (R.N.T.)
- Jinita Sthapit Kandel
  - Thad Cochran Southern Horticultural Research Laboratory, USDA ARS, Poplarville, MS 39470, USA
- Margaret E. Staton
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Matthew L. Huff
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Denita Hadziabdic
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- DeWayne Shoemaker
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- John J. Adamczyk Jr.
  - Thad Cochran Southern Horticultural Research Laboratory, USDA ARS, Poplarville, MS 39470, USA
- Marcin Nowicki
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Robert N. Trigiano
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
  - Correspondence: (T.P.H.); (S.L.B.); (R.N.T.)

60
Kołomański M, Szyda J, Frąszczak M, Mielczarek M. DNA sequence features underlying large-scale duplications and deletions in human. J Appl Genet 2022; 63:527-533. [PMID: 35590085 PMCID: PMC9365719 DOI: 10.1007/s13353-022-00704-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Revised: 03/22/2022] [Accepted: 05/05/2022] [Indexed: 11/25/2022]
Abstract
Copy number variants (CNVs) may cover up to 12% of the whole genome and have substantial impact on phenotypes. We used 5867 duplications and 33,181 deletions available from the 1000 Genomes Project to characterise genomic regions vulnerable to CNV formation and to identify sequence features characteristic for those regions. The GC content for deletions was lower and for duplications was higher than for randomly selected regions. In regions flanking deletions and downstream of duplications, content was higher than in the random sequences, but upstream of duplication content was lower. In duplications and downstream of deletion regions, the percentage of low-complexity sequences was not different from the randomised data. In deletions and upstream of CNVs, it was higher, while for downstream of duplications, it was lower as compared to random sequences. The majority of CNVs intersected with genic regions — mainly with introns. GC content may be associated with CNV formation and CNVs, especially duplications are initiated in low-complexity regions. Moreover, CNVs located or overlapped with introns indicate their role in shaping intron variability. Genic CNV regions were enriched in many essential biological processes such as cell adhesion, synaptic transmission, transport, cytoskeleton organization, immune response and metabolic mechanisms, which indicates that these large-scaled variants play important biological roles.
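The comparison described in this abstract rests on computing GC content inside a variant and in its upstream and downstream flanks. The following is a minimal sketch of that calculation, not the authors' code; the function names, the 0-based half-open coordinates, the flank width, and the toy sequence are all assumptions.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return (seq.count("G") + seq.count("C")) / counted


def flanks(seq, start, end, width):
    """Return the upstream and downstream flanks of seq[start:end].

    Coordinates are 0-based and half-open; flanks are clipped at the
    sequence boundaries.
    """
    upstream = seq[max(0, start - width):start]
    downstream = seq[end:end + width]
    return upstream, downstream


# Hypothetical toy region, not data from the paper.
chrom = "AAAAGGGGCCCCTTTT"
up, down = flanks(chrom, 4, 12, 4)
print(gc_content(chrom[4:12]), gc_content(up), gc_content(down))
```

In a study like this one, the same values would be computed for randomly selected regions of matching length to serve as the comparison baseline.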
Affiliation(s)
- Mateusz Kołomański
  - Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw, Poland
- Joanna Szyda
  - Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw, Poland
- Magdalena Frąszczak
  - Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw, Poland
- Magda Mielczarek
  - Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw, Poland

61
Jasonowicz AJ, Simeon A, Zahm M, Cabau C, Klopp C, Roques C, Iampietro C, Lluch J, Donnadieu C, Parrinello H, Drinan DP, Hauser L, Guiguen Y, Planas JV. Generation of a chromosome-level genome assembly for Pacific halibut (Hippoglossus stenolepis) and characterization of its sex-determining genomic region. Mol Ecol Resour 2022; 22:2685-2700. [PMID: 35569134 PMCID: PMC9541706 DOI: 10.1111/1755-0998.13641] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/10/2021] [Revised: 04/22/2022] [Accepted: 05/11/2022] [Indexed: 12/01/2022]
Abstract
The Pacific halibut (Hippoglossus stenolepis) is a key species in the North Pacific Ocean and Bering Sea ecosystems, where it also supports important fisheries. However, the lack of genomic resources limits our understanding of evolutionary, environmental and anthropogenic forces affecting key life history characteristics of Pacific halibut and prevents the application of genomic tools in fisheries management and conservation efforts. In the present study, we report on the first generation of a high‐quality chromosome‐level assembly of the Pacific halibut genome, with an estimated size of 602 Mb, 24 chromosome‐length scaffolds that contain 99.8% of the assembly and a N50 scaffold length of 27.3 Mb. In the first application of this important resource, we conducted genome‐wide analyses of sex‐specific genetic variation by pool sequencing and characterized a potential sex‐determining region in chromosome 9 with a high density of female‐specific SNPs. Within this region, we identified the bmpr1ba gene as a potential candidate for master sex‐determining (MSD) gene. bmpr1ba is a member of the TGF‐β family that in teleosts has provided the largest number of MSD genes, including a paralogue of this gene in Atlantic herring. The genome assembly constitutes an essential resource for future studies on Pacific halibut population structure and dynamics, evolutionary history and responses to environmental and anthropogenic influences. Furthermore, the genomic location of the sex‐determining region in Pacific halibut has been identified and a putative candidate MSD gene has been proposed, providing further support for the rapid evolution of sex‐determining mechanisms in teleost fish.
Affiliation(s)
- Anna Simeon
  - International Pacific Halibut Commission Seattle, WA 98199 USA
  - Present address: School of Aquatic and Fishery Science University of Washington Seattle WA
- Margot Zahm
  - SIGENAE, Bioinfo Genotoul, UMIAT, INRAE Castanet‐Tolosan France
- Cédric Cabau
  - SIGENAE, GenPhySE Université de Toulouse INRAE, ENVT, 31326 Castanet‐Tolosan France
- Céline Roques
  - INRAE, GeT‐PlaGe, Genotoul, 31326 Castanet‐Tolosan France
- Jérôme Lluch
  - INRAE, GeT‐PlaGe, Genotoul, 31326 Castanet‐Tolosan France
- Hugues Parrinello
  - MGX‐Montpellier GenomiX, Univ. Montpellier, CNRS, INSERM Montpellier France
- Daniel P. Drinan
  - School of Aquatic and Fishery Science University of Washington Seattle, WA 98105 USA
- Lorenz Hauser
  - School of Aquatic and Fishery Science University of Washington Seattle, WA 98105 USA
- Josep V. Planas
  - International Pacific Halibut Commission Seattle, WA 98199 USA
62
Kyndt JA, Aviles FA, Imhoff JF, Künzel S, Neulinger SC, Meyer TE. Comparative Genome Analysis of the Photosynthetic Betaproteobacteria of the Genus Rhodocyclus: Heterogeneity within Strains Assigned to Rhodocyclus tenuis and Description of Rhodocyclus gracilis sp. nov. as a New Species. Microorganisms 2022; 10:649. [PMID: 35336224 PMCID: PMC8954225 DOI: 10.3390/microorganisms10030649] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/17/2022] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 01/09/2023] Open
Abstract
The genome sequences for Rhodocyclus purpureus DSM 168T and four strains assigned to Rhodocyclus tenuis (DSM 110, DSM 111, DSM 112, and IM 230) have been determined. One of the strains studied (IM 230) has an average nucleotide identity (ANI) of 97% to the recently reported genome of the type strain DSM 109 of Rcy. tenuis and is regarded as virtually identical at the species level. The ANI of 80% for three other strains (DSM 110, DSM 111, DSM 112) to the type strain of Rcy. tenuis points to a differentiation of these at the species level. Rcy. purpureus is equidistant from Rcy. tenuis and the new species, based on both ANI (78–80%) and complete proteome comparisons (70% AAI). Strains DSM 110, DSM 111, and DSM 112 are very closely related to each other based on ANI, whole genome, and proteome comparisons but clearly distinct from the Rcy. tenuis type strain DSM 109. In addition to the whole genome differentiation, these three strains also contain unique genetic differences in cytochrome genes and contain genes for an anaerobic cobalamin synthesis pathway that is lacking from both Rcy. tenuis and Rcy. purpureus. Based on genomic and genetic differences, these three strains should be considered to represent a new species, which is distinctly different from both Rcy. purpureus and Rcy. tenuis, for which the new name Rhodocyclus gracilis sp. nov. is proposed.
Affiliation(s)
- John A. Kyndt
  - College of Science and Technology, Bellevue University, Bellevue, NE 68005, USA
- Fabiola A. Aviles
  - College of Science and Technology, Bellevue University, Bellevue, NE 68005, USA
- Johannes F. Imhoff
  - GEOMAR Helmholtz Centre for Ocean Research Kiel, RD3 Marine Symbioses, Düsternbrooker Weg 20, 24105 Kiel, Germany
- Sven Künzel
  - Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Terrance E. Meyer
  - Department of Biochemistry, University of Arizona, Tucson, AZ 85721, USA
63
Talenti A, Powell J, Hemmink JD, Cook EAJ, Wragg D, Jayaraman S, Paxton E, Ezeasor C, Obishakin ET, Agusi ER, Tijjani A, Amanyire W, Muhanguzi D, Marshall K, Fisch A, Ferreira BR, Qasim A, Chaudhry U, Wiener P, Toye P, Morrison LJ, Connelley T, Prendergast JGD. A cattle graph genome incorporating global breed diversity. Nat Commun 2022; 13:910. [PMID: 35177600 PMCID: PMC8854726 DOI: 10.1038/s41467-022-28605-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 10/29/2020] [Accepted: 01/20/2022] [Indexed: 11/28/2022] Open
Abstract
Despite only 8% of cattle being found in Europe, European breeds dominate current genetic resources. This adversely impacts cattle research in other important global cattle breeds, especially those from Africa for which genomic resources are particularly limited, despite their disproportionate importance to the continent's economies. To mitigate this issue, we have generated assemblies of African breeds, which have been integrated with genomic data for 294 diverse cattle into a graph genome that incorporates global cattle diversity. We illustrate how this more representative reference assembly contains an extra 116.1 Mb (4.2%) of sequence absent from the current Hereford sequence and consequently inaccessible to current studies. We further demonstrate how using this graph genome increases read mapping rates, reduces allelic biases and improves the agreement of structural variant calling with independent optical mapping data. Consequently, we present an improved, more representative, reference assembly that will improve global cattle research.
Affiliation(s)
- A Talenti
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- J Powell
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- J D Hemmink
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
  - Centre for Tropical Livestock Genetics and Health, ILRI Kenya, Nairobi, 30709-00100, Kenya
- E A J Cook
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
  - Centre for Tropical Livestock Genetics and Health, ILRI Kenya, Nairobi, 30709-00100, Kenya
- D Wragg
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
- S Jayaraman
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- E Paxton
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- C Ezeasor
  - Department of Veterinary Pathology and Microbiology, University of Nigeria, Nsukka, Enugu State, Nigeria
- E T Obishakin
  - Biotechnology Division, National Veterinary Research Institute, Vom, Plateau State, Nigeria
  - Biomedical Research Centre, Ghent University Global Campus, Songdo, Incheon, South Korea
- E R Agusi
  - Biotechnology Division, National Veterinary Research Institute, Vom, Plateau State, Nigeria
  - Biomedical Research Centre, Ghent University Global Campus, Songdo, Incheon, South Korea
- A Tijjani
  - International Livestock Research Institute (ILRI) PO, 5689, Addis Ababa, Ethiopia
  - Centre for Tropical Livestock Genetics and Health (CTLGH), ILRI Ethiopia, PO Box 5689, Addis Ababa, Ethiopia
- W Amanyire
  - School of Biosecurity, Biotechnology and Laboratory Sciences (SBLS), College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, P.O Box 7062, Kampala, Uganda
- D Muhanguzi
  - School of Biosecurity, Biotechnology and Laboratory Sciences (SBLS), College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, P.O Box 7062, Kampala, Uganda
- K Marshall
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
  - Centre for Tropical Livestock Genetics and Health, ILRI Kenya, Nairobi, 30709-00100, Kenya
- A Fisch
  - Ribeirão Preto College of Nursing, University of Sao Paulo, Ribeirão Preto, SP, Brazil
- B R Ferreira
  - Ribeirão Preto College of Nursing, University of Sao Paulo, Ribeirão Preto, SP, Brazil
- A Qasim
  - Faculty of Veterinary and Animal Sciences, Gomal University, Dera Ismail Khan, Pakistan
- U Chaudhry
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- P Wiener
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- P Toye
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
  - Centre for Tropical Livestock Genetics and Health, ILRI Kenya, Nairobi, 30709-00100, Kenya
- L J Morrison
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
- T Connelley
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
- J G D Prendergast
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
64
Edgar RC, Taylor B, Lin V, Altman T, Barbera P, Meleshko D, Lohr D, Novakovsky G, Buchfink B, Al-Shayeb B, Banfield JF, de la Peña M, Korobeynikov A, Chikhi R, Babaian A. Petabase-scale sequence alignment catalyses viral discovery. Nature 2022; 602:142-147. [PMID: 35082445 DOI: 10.1038/s41586-021-04332-2] [Citation(s) in RCA: 171] [Impact Index Per Article: 85.5] [Received: 08/10/2020] [Accepted: 12/10/2021] [Indexed: 01/20/2023]
Abstract
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
Affiliation(s)
- Brie Taylor
  - Independent researcher, Vancouver, British Columbia, Canada
- Victor Lin
  - Independent researcher, Seattle, WA, USA
- Pierre Barbera
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Dmitry Meleshko
  - Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY, USA
- Gherman Novakovsky
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
- Benjamin Buchfink
  - Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany
- Basem Al-Shayeb
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, USA
- Jillian F Banfield
  - Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, CA, USA
- Marcos de la Peña
  - Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia
  - Department of Statistical Modelling, St Petersburg State University, St Petersburg, Russia
- Rayan Chikhi
  - G5 Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
- Artem Babaian
  - Independent researcher, Vancouver, British Columbia, Canada
65
Palevich N, Maclean PH. Sequencing and Reconstructing Helminth Mitochondrial Genomes Directly from Genomic Next-Generation Sequencing Data. Methods Mol Biol 2022; 2369:27-40. [PMID: 34313982 DOI: 10.1007/978-1-0716-1681-9_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/09/2023]
Abstract
We present a detailed method for extraction of high-molecular weight genomic DNA suitable for numerous DNA sequencing applications, and a straightforward in silico approach for reconstructing novel mitochondrial (mt) genomes directly from total genomic DNA extracts derived from next-generation sequencing (NGS) data sets. The in silico post-sequencing pipeline described is fast, accurate, and highly efficient, with modest memory requirements that can be performed using a standard desktop computer. The approach is particularly effective for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information currently available and overcomes many of the limitations of traditional strategies. The described methodologies are also applicable for metagenomics sequencing from mixed or pooled samples containing multiple species and subsequent specific assembly of specific mitochondrial genomes.
Affiliation(s)
- Nikola Palevich
  - AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand
- Paul Haydon Maclean
  - AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand
66
Katju V, Konrad A, Deiss TC, Bergthorsson U. Mutation rate and spectrum in obligately outcrossing Caenorhabditis elegans mutation accumulation lines subjected to RNAi-induced knockdown of the mismatch repair gene msh-2. G3 GENES|GENOMES|GENETICS 2022; 12:6407146. [PMID: 34849777 PMCID: PMC8727991 DOI: 10.1093/g3journal/jkab364] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/11/2021] [Accepted: 10/13/2021] [Indexed: 01/09/2023]
Abstract
DNA mismatch repair (MMR), an evolutionarily conserved repair pathway shared by prokaryotic and eukaryotic species alike, influences molecular evolution by detecting and correcting mismatches, thereby protecting genetic fidelity, reducing the mutational load, and preventing lethality. Herein we conduct the first genome-wide evaluation of the alterations to the mutation rate and spectrum under impaired activity of the MutSα homolog, msh-2, in Caenorhabditis elegans male–female fog-2(lf) lines. We performed mutation accumulation (MA) under RNAi-induced knockdown of msh-2 for up to 50 generations, followed by next-generation sequencing of 19 MA lines and the ancestral control. msh-2 impairment in the male–female background substantially increased the frequency of nuclear base substitutions (∼23×) and small indels (∼328×) relative to wildtype hermaphrodites. However, we observed no increase in the mutation rates of mtDNA, and copy-number changes of single-copy genes. There was a marked increase in copy-number variation of rDNA genes under MMR impairment. In C. elegans, msh-2 repairs transitions more efficiently than transversions and increases the AT mutational bias relative to wildtype. The local sequence context, including sequence complexity, G + C-content, and flanking bases influenced the mutation rate. The X chromosome exhibited lower substitution and higher indel rates than autosomes, which can either result from sex-specific mutation rates or a nonrandom distribution of mutable sites between chromosomes. Provided the observed difference in mutational pattern is mostly due to MMR impairment, our results indicate that the specificity of MMR varies between taxa, and is more efficient in detecting and repairing small indels in eukaryotes relative to prokaryotes.
Affiliation(s)
- Vaishali Katju
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77845, USA
- Anke Konrad
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77845, USA
  - Faculdade de Ciência da Universidade de Lisboa (FCUL), CE3C—Centre for Ecology, Evolution and Environmental Changes, 1749-016 Lisboa, Portugal
- Thaddeus C Deiss
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77845, USA
- Ulfar Bergthorsson
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77845, USA
67
Stritt C, Gimmi EL, Wyler M, Bakali AH, Skalska A, Hasterok R, Mur LAJ, Pecchioni N, Roulin AC. Migration without interbreeding: Evolutionary history of a highly selfing Mediterranean grass inferred from whole genomes. Mol Ecol 2022; 31:70-85. [PMID: 34601787 PMCID: PMC9298040 DOI: 10.1111/mec.16207] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2021] [Revised: 09/07/2021] [Accepted: 09/28/2021] [Indexed: 11/30/2022]
Abstract
Wild plant populations show extensive genetic subdivision and are far from the ideal of panmixia which permeates population genetic theory. Understanding the spatial and temporal scale of population structure is therefore fundamental for empirical population genetics - and of interest in itself, as it yields insights into the history and biology of a species. In this study we extend the genomic resources for the wild Mediterranean grass Brachypodium distachyon to investigate the scale of population structure and its underlying history at whole-genome resolution. A total of 86 accessions were sampled at local and regional scales in Italy and France, which closes a conspicuous gap in the collection for this model organism. The analysis of 196 accessions, spanning the Mediterranean from Spain to Iraq, suggests that the interplay of high selfing and seed dispersal rates has shaped genetic structure in B. distachyon. At the continental scale, the evolution in B. distachyon is characterized by the independent expansion of three lineages during the Upper Pleistocene. Today, these lineages may occur on the same meadow yet do not interbreed. At the regional scale, dispersal and selfing interact and maintain high genotypic diversity, thus challenging the textbook notion that selfing in finite populations implies reduced diversity. Our study extends the population genomic resources for B. distachyon and suggests that an important use of this wild plant model is to investigate how selfing and dispersal, two processes typically studied separately, interact in colonizing plant species.
Affiliation(s)
- Christoph Stritt
  - Institute for Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Elena L Gimmi
  - Institute for Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Michele Wyler
  - Institute for Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Abdelmonaim H Bakali
  - National Institute of Agronomy, Regional Center of Errachidia, Errachidia, Morocco
- Aleksandra Skalska
  - Institute of Biology, Biotechnology and Environmental Protection, Faculty of Natural Sciences, University of Silesia in Katowice, Katowice, Poland
- Robert Hasterok
  - Institute of Biology, Biotechnology and Environmental Protection, Faculty of Natural Sciences, University of Silesia in Katowice, Katowice, Poland
- Luis A J Mur
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Wales, UK
- Nicola Pecchioni
  - Research Centre for Cereal and Industrial Crops, CREA - Council for Agricultural Research and Economics, Foggia, Italy
- Anne C Roulin
  - Institute for Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
68
Gudmundsson S, Singer-Berk M, Watts NA, Phu W, Goodrich JK, Solomonson M, Rehm HL, MacArthur DG, O'Donnell-Luria A. Variant interpretation using population databases: Lessons from gnomAD. Hum Mutat 2021; 43:1012-1030. [PMID: 34859531 PMCID: PMC9160216 DOI: 10.1002/humu.24309] [Citation(s) in RCA: 197] [Impact Index Per Article: 65.7] [Received: 06/18/2021] [Revised: 11/02/2021] [Accepted: 11/28/2021] [Indexed: 01/22/2023]
Abstract
Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser, and its usage for variant and gene interpretation. We introduce key features including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.
Affiliation(s)
- Sanna Gudmundsson
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Moriel Singer-Berk
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nicholas A Watts
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- William Phu
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Julia K Goodrich
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Matthew Solomonson
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Daniel G MacArthur
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Centre for Population Genomics, Garvan Institute of Medical Research, University of New South Wales Sydney, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
69
Aberrant integration of Hepatitis B virus DNA promotes major restructuring of human hepatocellular carcinoma genome architecture. Nat Commun 2021; 12:6910. [PMID: 34824211 PMCID: PMC8617174 DOI: 10.1038/s41467-021-26805-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 04/08/2021] [Accepted: 10/22/2021] [Indexed: 12/24/2022] Open
Abstract
Most cancers are characterized by the somatic acquisition of genomic rearrangements during tumour evolution that eventually drive the oncogenesis. Here, using multiplatform sequencing technologies, we identify and characterize a remarkable mutational mechanism in human hepatocellular carcinoma caused by Hepatitis B virus, by which DNA molecules from the virus are inserted into the tumour genome causing dramatic changes in its configuration, including non-homologous chromosomal fusions, dicentric chromosomes and megabase-size telomeric deletions. This aberrant mutational mechanism, present in at least 8% of all HCC tumours, can provide the driver rearrangements that a cancer clone requires to survive and grow, including loss of relevant tumour suppressor genes. Most of these events are clonal and occur early during liver cancer evolution. Real-time timing estimation reveals some HBV-mediated rearrangements occur as early as two decades before cancer diagnosis. Overall, these data underscore the importance of characterising liver cancer genomes for patterns of HBV integration.
70
Harrison PM. fLPS 2.0: rapid annotation of compositionally-biased regions in biological sequences. PeerJ 2021; 9:e12363. [PMID: 34760378 PMCID: PMC8557692 DOI: 10.7717/peerj.12363] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/23/2021] [Accepted: 09/30/2021] [Indexed: 12/12/2022] Open
Abstract
Compositionally-biased (CB) regions in biological sequences are enriched for a subset of sequence residue types. These can be shorter regions with a concentrated bias (i.e., those termed ‘low-complexity’), or longer regions that have a compositional skew. These regions comprise a prominent class of the uncharacterized ‘dark matter’ of the protein universe. Here, I report the latest version of the fLPS package for the annotation of CB regions, which includes added consideration of DNA sequences, to label the eight possible biased regions of DNA. In this version, the user is now able to restrict analysis to a specified subset of residue types, and also to filter for previously annotated domains to enable detection of discontinuous CB regions. A ‘thorough’ option has been added which enables the labelling of subtler biases, typically made from a skew for several residue types. In the output, protein CB regions are now labelled with bias classes reflecting the physico-chemical character of the biasing residues. The fLPS 2.0 package is available from: https://github.com/pmharrison/flps2 or in a Supplemental File of this paper.
Affiliation(s)
- Paul M Harrison
  - Department of Biology, McGill University, Montreal, QC, Canada
71
Contreras-Moreira B, Filippi CV, Naamati G, Girón CG, Allen JE, Flicek P. K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes. THE PLANT GENOME 2021; 14:e20143. [PMID: 34562304 PMCID: PMC7614178 DOI: 10.1002/tpg2.20143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 07/06/2021] [Indexed: 06/13/2023]
Abstract
The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis, or pangenome exploration. Although homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here, we benchmarked a two-step approach, where repeats were first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, with the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat, last updated in 2013, and nrTEplants, curated for this work). Custom libraries produced by RepeatModeler were also tested. We obtained repeated genome fractions that matched those reported in the literature but with shorter repeated elements than those produced directly by sequence homology. Inspection of the masked regions that overlapped genes revealed no preference for specific protein domains. Most Red-masked sequences could be successfully classified by sequence similarity, with the complete protocol taking less than 2 h on a desktop Linux box. A guide to curating your own repeat libraries and the scripts for masking and annotating plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.
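The first step of the two-step approach described above, calling repeats by k-mer counting, can be illustrated with a deliberately simplified sketch. This is not the Red algorithm, which uses a more elaborate scoring and segmentation scheme; it only shows the underlying idea that k-mers seen many times in a sequence mark candidate repeats, which are then soft-masked in lowercase. The function names, the toy sequence, and the thresholds `k=4` and `min_count=3` are assumptions chosen for illustration.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def soft_mask_repeats(seq, k=4, min_count=3):
    """Lowercase every base covered by a k-mer seen >= min_count times.

    Toy illustration only: real repeat detectors smooth and segment
    the k-mer counts rather than applying a fixed threshold.
    """
    counts = count_kmers(seq, k)
    upper = seq.upper()
    masked = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[upper[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(
        base.lower() if is_masked else base.upper()
        for base, is_masked in zip(seq, masked)
    )


# Hypothetical sequence: three tandem copies of ACGT, then unique bases.
print(soft_mask_repeats("ACGTACGTACGTTTGACCA"))  # prints acgtacgtacgtTTGACCA
```

The second step in the abstract, annotating the masked regions against curated repeat libraries, would take the lowercase intervals produced here and compare them by sequence similarity; that part is not sketched.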
Affiliation(s)
- Bruno Contreras-Moreira
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carla V Filippi
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Instituto de Biotecnología, Centro de Investigaciones en Ciencias Veterinarias y Agronómicas (CICVyA), Instituto Nacional de Tecnología Agropecuaria (INTA); Instituto de Agrobiotecnología y Biología Molecular (IABIMO), INTA-Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) Nicolas Repetto y Los Reseros s/n (1686), Hurlingham, Buenos Aires, Argentina
  - CONICET, Av Rivadavia 1917, C1033AAJ Ciudad de Buenos Aires, Argentina
- Guy Naamati
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carlos García Girón
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- James E Allen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
72
|
Dickson ZW, Hackenberger D, Kuch M, Marzok A, Banerjee A, Rossi L, Klowak JA, Fox-Robichaud A, Mossmann K, Miller MS, Surette MG, Golding GB, Poinar H. Probe design for simultaneous, targeted capture of diverse metagenomic targets. CELL REPORTS METHODS 2021; 1:100069. [PMID: 35474894 PMCID: PMC9017208 DOI: 10.1016/j.crmeth.2021.100069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 06/10/2021] [Accepted: 08/05/2021] [Indexed: 11/20/2022]
Abstract
The compounding challenges of low signal, high background, and uncertain targets plague many metagenomic sequencing efforts. One solution has been DNA capture, wherein probes are designed to hybridize with target sequences, enriching them in relation to their background. However, balancing probe depth with breadth of capture is challenging for diverse targets. To find this balance, we have developed the HUBDesign pipeline, which makes use of sequence homology to design probes at multiple taxonomic levels. This creates an efficient probe set capable of simultaneously and specifically capturing known and related sequences. We validated HUBDesign by generating probe sets targeting the breadth of coronavirus diversity, as well as a suite of bacterial pathogens often underlying sepsis. In separate experiments demonstrating significant, simultaneous enrichment, we captured SARS-CoV-2 and HCoV-NL63 in a human RNA background and seven bacterial strains in human blood. HUBDesign (https://github.com/zacherydickson/HUBDesign) has broad applicability wherever there are multiple organisms of interest.
Affiliation(s)
- Zachery W. Dickson
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - Dirk Hackenberger
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - Melanie Kuch
- McMaster aDNA Center, Department of Anthropology, McMaster University, Hamilton, ON L8S 4L9, Canada
| | - Art Marzok
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - Arinjay Banerjee
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
- Vaccine and Infectious Disease Organization, Department of Veterinary Microbiology, University of Saskatchewan, Saskatoon, SK S7N 5E3, Canada
| | - Laura Rossi
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
| | | | | | - Karen Mossmann
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - Matthew S. Miller
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - Michael G. Surette
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
| | | | - Hendrik Poinar
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster aDNA Center, Department of Anthropology, McMaster University, Hamilton, ON L8S 4L9, Canada
| |
|
73
|
Stephens Z, O’Brien D, Dehankar M, Roberts LR, Iyer RK, Kocher JP. Exogene: A performant workflow for detecting viral integrations from paired-end next-generation sequencing data. PLoS One 2021; 16:e0250915. [PMID: 34550971 PMCID: PMC8457494 DOI: 10.1371/journal.pone.0250915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/14/2021] [Accepted: 07/08/2021] [Indexed: 01/14/2023]
Abstract
The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next generation sequencing data. Exogene's read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with long read validation. We demonstrate this concordance across 6 TCGA Hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are also supported by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq and targeted capture.
Affiliation(s)
- Zachary Stephens
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States of America
| | - Daniel O’Brien
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America
| | - Mrunal Dehankar
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America
| | - Lewis R. Roberts
- Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States of America
| | - Ravishankar K. Iyer
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States of America
| | - Jean-Pierre Kocher
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America
| |
|
74
|
Katz KS, Shutov O, Lapoint R, Kimelman M, Brister JR, O’Sullivan C. STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions. Genome Biol 2021; 22:270. [PMID: 34544477 PMCID: PMC8450716 DOI: 10.1186/s13059-021-02490-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 04/26/2021] [Accepted: 09/08/2021] [Indexed: 02/03/2024]
Abstract
Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms.
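For readers unfamiliar with MinHash, the idea behind a MinHash-based k-mer comparison can be sketched as follows. This is a generic bottom-k MinHash written for illustration, not STAT's implementation; the function names, the choice of BLAKE2b as the hash, and the default `k` and `size` values are all assumptions.

```python
import hashlib


def kmer_hashes(seq: str, k: int) -> set[int]:
    """Hash every overlapping k-mer of seq to a 64-bit integer."""
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def minhash_sketch(seq: str, k: int = 8, size: int = 64) -> list[int]:
    """Bottom-k sketch: the `size` smallest k-mer hashes of the sequence."""
    return sorted(kmer_hashes(seq.upper(), k))[:size]


def jaccard_estimate(a: list[int], b: list[int], size: int = 64) -> float:
    """Estimate Jaccard similarity of two k-mer sets from their sketches."""
    # Take the `size` smallest hashes of the merged sketches, then count
    # how many of them are present in both sketches.
    union = sorted(set(a) | set(b))[:size]
    shared = set(a) & set(b)
    if not union:
        return 0.0
    return sum(1 for h in union if h in shared) / len(union)
```

Because only a fixed number of hashes is kept per sequence, comparison cost is independent of the input size, which is what makes this family of methods attractive for screening very large archives.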
Affiliation(s)
- Kenneth S. Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
| | - Oleg Shutov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
| | - Richard Lapoint
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
| | - Michael Kimelman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
| | - J. Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
| | - Christopher O’Sullivan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
| |
|
75
|
Li R, Yang P, Dai X, Asadollahpour Nanaei H, Fang W, Yang Z, Cai Y, Zheng Z, Wang X, Jiang Y. A near complete genome for goat genetic and genomic research. Genet Sel Evol 2021; 53:74. [PMID: 34507524 PMCID: PMC8434745 DOI: 10.1186/s12711-021-00668-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/06/2020] [Accepted: 09/01/2021] [Indexed: 01/29/2023]
Abstract
Background: The goat, one of the first domesticated livestock species, is culturally and economically important worldwide. The current goat reference genome, known as ARS1, is reported as the first nonhuman genome assembly using 69× PacBio sequencing. However, ARS1 suffers from an incomplete X chromosome and highly fragmented Y chromosome scaffolds. Results: Here, we present a very high-quality de novo genome assembly, Saanen_v1, from a male Saanen dairy goat, with the first goat Y chromosome scaffold based on 117× PacBio long-read sequencing and 118× Hi-C data. Saanen_v1 displays a high level of completeness thanks to the presence of centromeric and telomeric repeats at the proximal and distal ends of two-thirds of the autosomes, and a much reduced number of gaps (169 vs. 773). The completeness and accuracy of the Saanen_v1 genome assembly are also evidenced by more assembled sequences on the chromosomes (2.63 Gb for Saanen_v1 vs. 2.58 Gb for ARS1), a slightly increased mapping ratio for transcriptomic data, and more genes anchored to chromosomes. The eight putative large assembly errors (1 to ~7 Mb each) found in ARS1 were amended, and for the first time, the substitution rate of this ruminant Y chromosome was estimated. Furthermore, sequence improvement in Saanen_v1, compared with ARS1, enables us to assign the likely correct positions for 4.4% of the single nucleotide polymorphism (SNP) probes in the widely used GoatSNP50 chip. Conclusions: The updated goat genome assembly, which includes both sex chromosomes (X and Y) and the autosomes at high resolution, will serve as a valuable resource for goat genetic research and applications. Supplementary Information: The online version contains supplementary material available at 10.1186/s12711-021-00668-5.
Affiliation(s)
- Ran Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Peng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Xuelei Dai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Hojjat Asadollahpour Nanaei
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Wenwen Fang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Zhirui Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Yudong Cai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Zhuqing Zheng
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Xinong Rd 22, Yangling, 712100, Shaanxi, China.
| |
|
76
|
Urantówka AD, Kroczak A, Strzała T, Zaniewicz G, Kurkowski M, Mackiewicz P. Mitogenomes of Accipitriformes and Cathartiformes Were Subjected to Ancestral and Recent Duplications Followed by Gradual Degeneration. Genome Biol Evol 2021; 13:evab193. [PMID: 34432018 PMCID: PMC8435663 DOI: 10.1093/gbe/evab193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/15/2021] [Indexed: 11/25/2022]
Abstract
The rearrangement of 37 genes with one control region, first identified in the Gallus gallus mitogenome, is believed to be ancestral for all Aves. However, mitogenomic sequences obtained in recent years revealed that many avian mitogenomes contain duplicated regions that were omitted in previous genomic versions. Their evolution and mechanism of duplication are still poorly understood. The order Accipitriformes is especially interesting in this context because its representatives contain a duplicated control region in various stages of degeneration. Therefore, we applied an appropriate PCR strategy to look for duplications within the mitogenomes of the early diverged species Sagittarius serpentarius and of Cathartiformes, which is a sister order to Accipitriformes. The analyses revealed the same duplicated gene order in all examined taxa and in the common ancestor of these groups. The duplicated regions were subjected to gradual degeneration and homogenization during concerted evolution. The latter process occurred recently in the species of Cathartiformes as well as in the early diverged lineages of Accipitriformes, that is, Sagittarius serpentarius and Pandion haliaetus. However, in other lineages, that is, Pernis ptilorhynchus, as well as representatives of Aegypiinae, Aquilinae, and five related subfamilies of Accipitriformes (Accipitrinae, Circinae, Buteoninae, Haliaeetinae, and Milvinae), the duplications have been evolving independently for at least 14-47 Myr. Different portions of control regions in Cathartiformes showed conflicting phylogenetic signals, indicating that some sections of these regions were homogenized at a frequency higher than the rate of speciation, whereas others have still evolved separately.
Affiliation(s)
- Adam Dawid Urantówka
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
| | - Aleksandra Kroczak
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, Wrocław University, Poland
| | - Tomasz Strzała
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
| | - Grzegorz Zaniewicz
- Department of Vertebrate Ecology and Zoology, Avian Ecophysiology Unit, University of Gdańsk, Poland
| | - Marcin Kurkowski
- Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
| | - Paweł Mackiewicz
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, Wrocław University, Poland
| |
|
77
|
Istace B, Belser C, Falentin C, Labadie K, Boideau F, Deniot G, Maillet L, Cruaud C, Bertrand L, Chèvre AM, Wincker P, Rousseau-Gueutin M, Aury JM. Sequencing and Chromosome-Scale Assembly of Plant Genomes, Brassica rapa as a Use Case. BIOLOGY 2021; 10:732. [PMID: 34439964 PMCID: PMC8389630 DOI: 10.3390/biology10080732] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/12/2021] [Revised: 07/27/2021] [Accepted: 07/28/2021] [Indexed: 11/29/2022]
Abstract
With the rise of long-read sequencers and long-range technologies, delivering high-quality plant genome assemblies is no longer reserved for large consortia. Not only sequencing techniques but also computer algorithms have reached a point where the reconstruction of assemblies at the chromosome scale is now feasible at the laboratory scale. Current technologies, in particular long-range technologies, are numerous, and selecting the most promising one for the genome of interest is crucial to obtain optimal results. In this study, we resequenced the genome of the yellow sarson, Brassica rapa cv. Z1, using the Oxford Nanopore PromethION sequencer and assembled the sequencing data using current assemblers. To reconstruct complete chromosomes, we used and compared three long-range scaffolding techniques (optical mapping, Omni-C, and Pore-C sequencing libraries, commercialized by Bionano Genomics, Dovetail Genomics, and Oxford Nanopore Technologies, respectively), or a combination of the three, in order to evaluate the capability of each technology.
Affiliation(s)
- Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Caroline Belser
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Cyril Falentin
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Karine Labadie
- Genoscope, Institut François Jacob, Commissariat à l’Energie Atomique (CEA), Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (K.L.); (C.C.)
| | - Franz Boideau
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Gwenaëlle Deniot
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Loeiz Maillet
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Corinne Cruaud
- Genoscope, Institut François Jacob, Commissariat à l’Energie Atomique (CEA), Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (K.L.); (C.C.)
| | - Laurie Bertrand
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Anne-Marie Chèvre
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Mathieu Rousseau-Gueutin
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| |
|
78
|
Garrido-Sanz L, Senar MÀ, Piñol J. Relative species abundance estimation in artificial mixtures of insects using mito-metagenomics and a correction factor for the mitochondrial DNA copy number. Mol Ecol Resour 2021; 22:153-167. [PMID: 34251746 DOI: 10.1111/1755-0998.13464] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/23/2020] [Revised: 06/21/2021] [Accepted: 07/07/2021] [Indexed: 11/27/2022]
Abstract
Mito-metagenomics (MMG) is becoming an alternative to amplicon metabarcoding for the assessment of biodiversity in complex biological samples using high-throughput sequencing. Whereas MMG overcomes the biases introduced by the PCR step in the generation of amplicons, it is not yet a technique free of shortcomings. First, as the reads are obtained from shotgun sequencing, a very low proportion of reads map into the mitogenomes, so a high sequencing effort is needed. Second, as the number of mitogenomes per cell can vary among species, the relative species abundance (RSA) in a mixture could be wrongly estimated. Here, we challenge the MMG method to estimate the RSA using artificial libraries of 17 insect species whose complete genomes are available on public repositories. With fresh specimens of these species, we created single-species libraries to calibrate the bioinformatic pipeline and mixed-species libraries to estimate the RSA. Our results showed that the MMG approach confidently recovers the species list of the mixtures, even when they contain congeneric species. The method was also able to estimate the abundance of a species across different samples (within-species estimation) but failed to estimate the RSA within a single sample (across-species estimation) unless a correction factor accounting for the variable number of mitogenomes per cell was used. To estimate this correction factor, we used the proportion of reads mapping into mitogenomes in the single-species libraries and the lengths of the whole genomes and mitogenomes.
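The abstract names the inputs to the correction factor (the proportion of reads mapping to the mitogenome in a single-species library, plus the whole-genome and mitogenome lengths) but not the formula. One plausible way to combine them, shown here purely as an assumption and not as the authors' method, is to suppose that shotgun reads sample DNA uniformly, so that the mitochondrial read fraction p satisfies p = nM / (nM + G), where n is the number of mitogenome copies per nuclear genome copy, M the mitogenome length, and G the nuclear genome length. Solving for n gives the function below; its name and parameters are hypothetical.

```python
def mito_copies_per_genome(p_mito: float, genome_len: int, mito_len: int) -> float:
    """Estimate mitogenome copies per nuclear genome copy.

    Assumes uniform shotgun sampling, so p_mito = n*M / (n*M + G),
    which rearranges to n = p_mito * G / ((1 - p_mito) * M).
    """
    if not 0.0 <= p_mito < 1.0:
        raise ValueError("p_mito must be in [0, 1)")
    if genome_len <= 0 or mito_len <= 0:
        raise ValueError("lengths must be positive")
    return p_mito * genome_len / ((1.0 - p_mito) * mito_len)
```

Under this assumption, dividing each species' mitochondrial read count by its estimated n (and by its mitogenome length) before normalising would compensate for species that carry more mitogenomes per cell.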
Affiliation(s)
| | | | - Josep Piñol
- Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain; CREAF, Cerdanyola del Vallès, Spain
| |
|
79
|
Udagawa H, Ichida H, Takeuchi T, Abe T, Takakura Y. Highly Efficient and Comprehensive Identification of Ethyl Methanesulfonate-Induced Mutations in Nicotiana tabacum L. by Whole-Genome and Whole-Exome Sequencing. FRONTIERS IN PLANT SCIENCE 2021; 12:671598. [PMID: 34140964 PMCID: PMC8204250 DOI: 10.3389/fpls.2021.671598] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/24/2021] [Accepted: 04/26/2021] [Indexed: 06/12/2023]
Abstract
Tobacco (Nicotiana tabacum L.) is a complex allotetraploid species with a large 4.5-Gb genome that carries duplicated gene copies. In this study, we describe the development of a whole-exome sequencing (WES) procedure in tobacco and its application to characterize a test population of ethyl methanesulfonate (EMS)-induced mutations. A probe set covering 50.3-Mb protein coding regions was designed from a reference tobacco genome. The EMS-induced mutations in 19 individual M2 lines were analyzed using our mutation analysis pipeline optimized to minimize false positives/negatives. In the target regions, the on-target rate of WES was approximately 75%, and 61,146 mutations were detected in the 19 M2 lines. Most of the mutations (98.8%) were single nucleotide variants, and 95.6% of them were C/G to T/A transitions. The number of mutations detected in the target coding sequences by WES was 93.5% of the mutations detected by whole-genome sequencing (WGS). The amount of sequencing data necessary for efficient mutation detection was significantly lower in WES (11.2 Gb), which is only 6.2% of the required amount in WGS (180 Gb). Thus, WES was almost comparable to WGS in performance but is more cost effective. Therefore, the developed target exome sequencing, which could become a fundamental tool in high-throughput mutation identification, renders the genome-wide analysis of tobacco highly efficient.
Affiliation(s)
- Hisashi Udagawa
- Leaf Tobacco Research Center, Japan Tobacco Inc., Oyama, Japan
| | - Hiroyuki Ichida
- RIKEN Nishina Center for Accelerator-Based Science, Wako, Japan
| | | | - Tomoko Abe
- RIKEN Nishina Center for Accelerator-Based Science, Wako, Japan
| | | |
|
80
|
Evidence from oyster suggests an ancient role for Pdx in regulating insulin gene expression in animals. Nat Commun 2021; 12:3117. [PMID: 34035261 PMCID: PMC8149454 DOI: 10.1038/s41467-021-23216-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/27/2018] [Accepted: 04/19/2021] [Indexed: 11/17/2022]
Abstract
Hox and ParaHox genes encode transcription factors with similar expression patterns in divergent animals. The Pdx (Xlox) homeobox gene, for example, is expressed in a sharp spatial domain in the endodermal cell layer of the gut in chordates, echinoderms, annelids and molluscs. The significance of comparable gene expression patterns is unclear because it is not known if downstream transcriptional targets are also conserved. Here, we report evidence indicating that a classic transcriptional target of Pdx1 in vertebrates, the insulin gene, is a likely direct target of Pdx in Pacific oyster adults. We show that one insulin-related gene, cgILP, is co-expressed with cgPdx in oyster digestive tissue. Transcriptomic comparison suggests that this tissue plays a similar role to the vertebrate pancreas. Using ATAC-seq and ChIP, we identify an upstream regulatory element of the cgILP gene which shows binding interaction with cgPdx protein in oyster hepatopancreas and demonstrate, using a cell culture assay, that the oyster Pdx can act as a transcriptional activator through this site, possibly in synergy with NeuroD. These data argue that a classic homeodomain-target gene interaction dates back to the origin of Bilateria. In vertebrates insulin is a direct transcriptional target of Pdx: the same is true in Pacific oysters and the authors show insulin-related gene, cgILP, is co-expressed with cgPdx in oyster digestive tissue, showing this gene interaction dates back to the origin of Bilateria.
|
81
|
Grumaz C, Hoffmann A, Vainshtein Y, Kopp M, Grumaz S, Stevens P, Decker SO, Weigand MA, Hofer S, Brenner T, Sohn K. Rapid Next-Generation Sequencing-Based Diagnostics of Bacteremia in Septic Patients. J Mol Diagn 2021; 22:405-418. [PMID: 32146977 DOI: 10.1016/j.jmoldx.2019.12.006] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 04/08/2019] [Revised: 11/18/2019] [Accepted: 12/11/2019] [Indexed: 01/23/2023]
Abstract
The increasing incidence of bloodstream infections including sepsis is a major challenge in intensive care units worldwide. However, current diagnostics for pathogen identification mainly depend on culture- and molecular-based approaches, which are not satisfactory regarding specificity, sensitivity, and time to diagnosis. Herein, we established a complete diagnostic workflow for real-time high-throughput sequencing of cell-free DNA from plasma based on nanopore sequencing for the detection of the causative agents, which was applied to the analyses of eight samples from four septic patients and three healthy controls, and subsequently validated against standard next-generation sequencing results. By optimization of library preparation protocols for short fragments with low input amounts, a 3.5-fold increase in sequencing throughput could be achieved. With tailored bioinformatics workflows, all eight septic patient samples were found to be positive for relevant pathogens. When considering time to diagnosis, pathogens were identified within minutes after start of sequencing. Moreover, an extrapolation of real-time sequencing performance on a cohort of 239 septic patient samples revealed that more than 90% of pathogen hits would have also been detected using the optimized MinION workflow. Reliable identification of pathogens based on circulating cell-free DNA sequencing using optimized workflows and real-time nanopore-based sequencing can be accomplished within 5 to 6 hours following blood draw. Therefore, this approach might provide therapy-relevant results in a clinically critical timeframe.
Affiliation(s)
- Christian Grumaz
- Department of in-Vitro Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Stuttgart, Germany
| | - Anne Hoffmann
- Department of in-Vitro Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Stuttgart, Germany
| | - Yevhen Vainshtein
- Department of in-Vitro Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Stuttgart, Germany
| | - Maria Kopp
- Department of in-Vitro Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Stuttgart, Germany
| | - Silke Grumaz
- Department of in-Vitro Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Stuttgart, Germany
| | | | - Sebastian O Decker
- Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Markus A Weigand
- Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Stefan Hofer
- Westpfalz-Klinikum GmbH, Kaiserslautern, Germany
| | - Thorsten Brenner
- Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Kai Sohn
- Department of in-Vitro Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Stuttgart, Germany.
| |
|
82
|
Wilson GW, Derouet M, Darling GE, Yeung JC. scSNV: accurate dscRNA-seq SNV co-expression analysis using duplicate tag collapsing. Genome Biol 2021; 22:144. [PMID: 33962667 PMCID: PMC8103760 DOI: 10.1186/s13059-021-02364-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/08/2020] [Accepted: 04/23/2021] [Indexed: 12/21/2022]
Abstract
Identifying single nucleotide variants has become common practice for droplet-based single-cell RNA-seq experiments; however, presently, a pipeline does not exist to maximize variant calling accuracy. Furthermore, molecular duplicates generated in these experiments have not been utilized to optimally detect variant co-expression. Herein, we introduce scSNV designed from the ground up to "collapse" molecular duplicates and accurately identify variants and their co-expression. We demonstrate that scSNV is fast, with a reduced false-positive variant call rate, and enables the co-detection of genetic variants and A>G RNA edits across twenty-two samples.
Affiliation(s)
- Gavin W Wilson
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada.
| | - Mathieu Derouet
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada
| | - Gail E Darling
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada.,Division of Thoracic Surgery, Department of Surgery, University of Toronto, Toronto, M5G 2C4, Canada
| | - Jonathan C Yeung
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada.,Division of Thoracic Surgery, Department of Surgery, University of Toronto, Toronto, M5G 2C4, Canada.,Toronto General Hospital, 200 Elizabeth St, 9N-983, Toronto, ON, M5G 2C4, Canada.
|
83
|
Song YJ, Cho DH. Local Alignment of DNA Sequence Based on Deep Reinforcement Learning. IEEE Open Journal of Engineering in Medicine and Biology 2021; 2:170-178. [PMID: 35402982 PMCID: PMC8975175 DOI: 10.1109/ojemb.2021.3076156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2021] [Revised: 04/20/2021] [Accepted: 04/23/2021] [Indexed: 11/06/2022] Open
Abstract
Goal: Over the decades, sequence alignment algorithms have improved significantly in aspects such as complexity and accuracy. However, human-defined algorithms have an explicit limitation in view of developmental completeness. This paper introduces a novel local alignment method to obtain optimal sequence alignment based on reinforcement learning. Methods: The DQNalign algorithm learns and performs sequence alignment through deep reinforcement learning. This paper proposes a DQN x-drop algorithm that performs local alignment without human intervention by combining the x-drop algorithm with the DQNalign algorithm. The proposed algorithm performs local alignment by repeatedly observing the subsequences and selecting the next alignment direction until the x-drop algorithm terminates the DQNalign algorithm. The proposed algorithm has the advantage of linear computational complexity compared to conventional local alignment algorithms. Results: This paper compares alignment performance (coverage and identity) and complexity for a fair comparison between the proposed DQN x-drop algorithm and the conventional greedy x-drop algorithm. First, we prove the proposed algorithm's superiority by comparing the two algorithms' computational complexity through numerical analysis. We then tested the alignment performance on actual HEV and E. coli sequence datasets. The proposed method shows identity and coverage performance comparable to the conventional alignment method while having linear complexity for the [Formula: see text] parameter. Conclusions: Through this study, it was possible to confirm the possibility of a new local alignment algorithm that minimizes computational complexity without human intervention.
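The x-drop termination rule mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DQN x-drop method of the paper: it is a plain ungapped, left-to-right extension with hypothetical scoring values (`match=1`, `mismatch=-1`) and a hypothetical function name, shown only to make the stopping criterion concrete. Extension stops once the running score falls more than `x` below the best score seen so far, so the work is linear in the extended length.

```python
def xdrop_extend(a: str, b: str, x: int, match: int = 1, mismatch: int = -1) -> tuple[int, int]:
    """Ungapped x-drop extension (illustrative simplification).

    Walks both sequences from their first characters and returns
    (best_score, best_length), where best_length is the number of
    aligned positions at which the best score was reached.
    """
    score = 0
    best_score = 0
    best_length = 0
    for i, (ca, cb) in enumerate(zip(a, b), start=1):
        score += match if ca == cb else mismatch
        if score > best_score:
            # New best: remember how far the alignment extends.
            best_score, best_length = score, i
        elif best_score - score > x:
            # Score dropped more than x below the best: terminate.
            break
    return best_score, best_length


if __name__ == "__main__":
    # Four matches, then mismatches until the drop exceeds x = 2.
    print(xdrop_extend("ACGTACGT", "ACGTTTTT", x=2))
```

In the paper's setting the next alignment direction is chosen by a learned agent rather than by a fixed diagonal walk; only the drop-based stopping rule is shared with this sketch.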
Affiliation(s)
- Yong-Joon Song
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, South Korea
| | - Dong-Ho Cho
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, South Korea
|
84
|
Nagy NA, Rácz R, Rimington O, Póliska S, Orozco-terWengel P, Bruford MW, Barta Z. Draft genome of a biparental beetle species, Lethrus apterus. BMC Genomics 2021; 22:301. [PMID: 33902445 PMCID: PMC8074431 DOI: 10.1186/s12864-021-07627-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2020] [Accepted: 04/13/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND The lack of an understanding of the genomic architecture underpinning parental behaviour in subsocial insects displaying simple parental behaviours prevents the development of a full understanding of the evolutionary origin of sociality. Lethrus apterus is one of the few insect species that has biparental care. Division of labour can be observed between parents during the reproductive period in order to provide food and protection for their offspring. RESULTS Here, we report the draft genome of L. apterus, the first genome in the family Geotrupidae. The final assembly consisted of 286.93 Mbp in 66,933 scaffolds. Completeness analysis found the assembly contained 93.5% of the Endopterygota core BUSCO gene set. Ab initio gene prediction resulted in 25,385 coding genes, whereas homology-based analyses predicted 22,551 protein coding genes. After merging, 20,734 were found during functional annotation. Compared to other publicly available beetle genomes, 23,528 genes among the predicted genes were assigned to orthogroups, of which 1664 were in species-specific groups. Additionally, reproduction-related genes were found among the predicted genes, based on which a reduction in the number of odorant- and pheromone-binding proteins was detected. CONCLUSIONS These genes can be used in further comparative and functional genomic research, which can advance our understanding of the genetic basis and hence the evolution of parental behaviour.
Affiliation(s)
- Nikoletta A Nagy
- MTA-DE Behavioural Ecology Research Group, Department of Evolutionary Zoology, University of Debrecen, Egyetem tér 1, Debrecen, H-4032, Hungary.
- Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary.
| | - Rita Rácz
- MTA-DE Behavioural Ecology Research Group, Department of Evolutionary Zoology, University of Debrecen, Egyetem tér 1, Debrecen, H-4032, Hungary
- Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
| | | | - Szilárd Póliska
- Genomic Medicine and Bioinformatic Core Facility, Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | | | | | - Zoltán Barta
- MTA-DE Behavioural Ecology Research Group, Department of Evolutionary Zoology, University of Debrecen, Egyetem tér 1, Debrecen, H-4032, Hungary
- Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
|
85
|
Ramaprasad A, Klaus S, Douvropoulou O, Culleton R, Pain A. Plasmodium vinckei genomes provide insights into the pan-genome and evolution of rodent malaria parasites. BMC Biol 2021; 19:69. [PMID: 33888092 PMCID: PMC8063448 DOI: 10.1186/s12915-021-00995-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/28/2020] [Accepted: 02/25/2021] [Indexed: 01/27/2023] Open
Abstract
Background Rodent malaria parasites (RMPs) serve as tractable tools to study malaria parasite biology and host-parasite-vector interactions. Among the four RMPs originally collected from wild thicket rats in sub-Saharan Central Africa and adapted to laboratory mice, Plasmodium vinckei is the most geographically widespread with isolates collected from five separate locations. However, there is a lack of extensive phenotype and genotype data associated with this species, thus hindering its use in experimental studies. Results We have generated a comprehensive genetic resource for P. vinckei comprising five reference-quality genomes, one for each of its subspecies, blood-stage RNA sequencing data for five P. vinckei isolates, and genotypes and growth phenotypes for ten isolates. Additionally, we sequenced seven isolates of the RMP species Plasmodium chabaudi and Plasmodium yoelii, thus extending genotypic information for four additional subspecies enabling a re-evaluation of the genotypic diversity and evolutionary history of RMPs. The five subspecies of P. vinckei have diverged widely from their common ancestor and have undergone large-scale genome rearrangements. Comparing P. vinckei genotypes reveals region-specific selection pressures particularly on genes involved in mosquito transmission. Using phylogenetic analyses, we show that RMP multigene families have evolved differently across the vinckei and berghei groups of RMPs and that family-specific expansions in P. chabaudi and P. vinckei occurred in the common vinckei group ancestor prior to speciation. The erythrocyte membrane antigen 1 and fam-c families in particular show considerable expansions among the lowland forest-dwelling P. vinckei parasites. The subspecies from the highland forests of Katanga, P. v. vinckei, has a uniquely smaller genome, a reduced multigene family repertoire and is also amenable to transfection making it an ideal parasite for reverse genetics. We also show that P. vinckei parasites are amenable to genetic crosses. Conclusions Plasmodium vinckei isolates display a large degree of phenotypic and genotypic diversity and could serve as a resource to study parasite virulence and immunogenicity. Inclusion of P. vinckei genomes provides new insights into the evolution of RMPs and their multigene families. Amenability to genetic crossing and transfection also makes them suitable for classical and functional genetics to study Plasmodium biology. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-021-00995-5.
Affiliation(s)
- Abhinay Ramaprasad
- Pathogen Genomics Group, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia.,Malaria Unit, Department of Pathology, Institute of Tropical Medicine (NEKKEN), Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan.,Present address: Malaria Biochemistry Laboratory, Francis Crick Institute, London, NW1 1AT, UK
| | - Severina Klaus
- Malaria Unit, Department of Pathology, Institute of Tropical Medicine (NEKKEN), Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan.,Biomedical Sciences, University of Heidelberg, Heidelberg, Germany
| | - Olga Douvropoulou
- Pathogen Genomics Group, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
| | - Richard Culleton
- Malaria Unit, Department of Pathology, Institute of Tropical Medicine (NEKKEN), Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan.,Division of Molecular Parasitology, Proteo-Science Center, Ehime University, 454 Shitsukawa, Toon, Ehime, 791-0295, Japan.,Department of Protozoology, Institute of Tropical Medicine (NEKKEN), Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan.
| | - Arnab Pain
- Pathogen Genomics Group, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia.,Center for Zoonosis Control, Global Institution for Collaborative Research and Education (GI-CoRE), Hokkaido University, N20 W10 Kita-ku, Sapporo, 001-0020, Japan.
|
86
|
Jones W, Gong B, Novoradovskaya N, Li D, Kusko R, Richmond TA, Johann DJ, Bisgin H, Sahraeian SME, Bushel PR, Pirooznia M, Wilkins K, Chierici M, Bao W, Basehore LS, Lucas AB, Burgess D, Butler DJ, Cawley S, Chang CJ, Chen G, Chen T, Chen YC, Craig DJ, Del Pozo A, Foox J, Francescatto M, Fu Y, Furlanello C, Giorda K, Grist KP, Guan M, Hao Y, Happe S, Hariani G, Haseley N, Jasper J, Jurman G, Kreil DP, Łabaj P, Lai K, Li J, Li QZ, Li Y, Li Z, Liu Z, López MS, Miclaus K, Miller R, Mittal VK, Mohiyuddin M, Pabón-Peña C, Parsons BL, Qiu F, Scherer A, Shi T, Stiegelmeyer S, Suo C, Tom N, Wang D, Wen Z, Wu L, Xiao W, Xu C, Yu Y, Zhang J, Zhang Y, Zhang Z, Zheng Y, Mason CE, Willey JC, Tong W, Shi L, Xu J. A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency. Genome Biol 2021; 22:111. [PMID: 33863366 PMCID: PMC8051128 DOI: 10.1186/s13059-021-02316-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/27/2020] [Accepted: 03/18/2021] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, with more than 25,000 variants having less than 20% allele frequency and 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
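The dilution arithmetic behind admixing Sample A with Sample B can be sketched as follows. The function name and the mixing fractions are hypothetical (the abstract does not state the ratios used), and the sketch assumes that Sample B carries only the reference allele at the position and that both samples contribute DNA in proportion to their mixing fraction. Under those assumptions the expected allele frequency in the mixture is the Sample A frequency multiplied by the Sample A fraction.

```python
import math


def admixed_allele_frequency(vaf_in_a: float, fraction_a: float) -> float:
    """Expected variant allele frequency after diluting Sample A into Sample B.

    Assumes Sample B is homozygous reference at the position, so the variant
    allele comes only from the Sample A portion of the mixture.
    """
    if not 0.0 <= vaf_in_a <= 1.0:
        raise ValueError("vaf_in_a must be between 0 and 1")
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return vaf_in_a * fraction_a


if __name__ == "__main__":
    # Hypothetical example: a 1% variant in Sample A, mixed so that
    # Sample A makes up 2% of the DNA, is expected at about 0.02%.
    vaf = admixed_allele_frequency(vaf_in_a=0.01, fraction_a=0.02)
    print(f"{vaf:.4%}")
    assert math.isclose(vaf, 0.0002)
```

Real admixtures can deviate from this expectation, for example through copy-number differences between cell lines or variants that are also present in Sample B.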
Affiliation(s)
- Wendell Jones
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA.
| | - Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | | | - Dan Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Rebecca Kusko
- Immuneering Corporation, One Broadway, 14th Floor, Cambridge, MA, 02142, USA
| | - Todd A Richmond
- Market & Application Development Bioinformatics, Roche Sequencing Solutions Inc., 4300 Hacienda Dr., Pleasanton, CA, 94588, USA
| | - Donald J Johann
- Winthrop P Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, 4301 W Markham St., Little Rock, AR, 72205, USA
| | - Halil Bisgin
- Department of Computer Science, Engineering and Physics, University of Michigan-Flint, Flint, MI, 48502, USA
| | - Sayed Mohammad Ebrahim Sahraeian
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Rd., Suite 7 #300, Belmont, CA, 94002, USA
| | - Pierre R Bushel
- National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
| | - Mehdi Pirooznia
- Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Katherine Wilkins
- Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
| | | | - Wenjun Bao
- JMP Life Sciences, SAS Institute Inc., Cary, NC, 27519, USA
| | - Lee Scott Basehore
- Agilent Technologies, 11011 N Torrey Pines Rd., La Jolla, CA, 92037, USA
| | | | - Daniel Burgess
- (formerly) Research and Development, Roche Sequencing Solutions Inc., 500 South Rosa Rd., Madison, WI, 53719, USA
| | - Daniel J Butler
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
| | - Simon Cawley
- (formerly) Clinical Sequencing Division, Thermo Fisher Scientific, 180 Oyster Point Blvd., South San Francisco, CA, 94080, USA
| | - Chia-Jung Chang
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
| | - Guangchun Chen
- Department of Immunology, Genomics and Microarray Core Facility, University of Texas Southwestern Medical Center, 5323 Harry Hine Blvd., Dallas, TX, 75390, USA
| | - Tao Chen
- University of Texas Southwestern Medical Center, 2330 Inwood Rd., Dallas, TX, 75390, USA
| | - Yun-Ching Chen
- Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Daniel J Craig
- Department of Medicine, College of Medicine and Life Sciences, The University of Toledo, Toledo, OH, 43614, USA
| | - Angela Del Pozo
- Institute of Medical and Molecular Genetics (INGEMM), Hospital Universitario La Paz, CIBERER Instituto de Salud Carlos III, 28046, Madrid, Spain
| | - Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
| | | | - Yutao Fu
- Thermo Fisher Scientific, 110 Miller Ave., Ann Arbor, MI, 48104, USA
| | | | - Kristina Giorda
- Marketing, Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, IA, 52241, USA
| | - Kira P Grist
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA
| | - Meijian Guan
- JMP Life Sciences, SAS Institute Inc., Cary, NC, 27519, USA
| | - Yingyi Hao
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
| | - Scott Happe
- Agilent Technologies, 1834 State Hwy 71 West, Cedar Creek, TX, 78612, USA
| | - Gunjan Hariani
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA
| | - Nathan Haseley
- Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
| | - Jeff Jasper
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA
| | | | - David Philip Kreil
- Bioinformatics Research, Institute of Molecular Biotechnology, Boku University Vienna, Vienna, Austria
| | - Paweł Łabaj
- Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Department of Biotechnology, Boku University, Vienna, Austria
| | - Kevin Lai
- Bioinformatics, Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, IA, 52241, USA
| | - Jianying Li
- Kelly Government Solutions, Inc., Research Triangle Park, NC, 27709, USA
| | - Quan-Zhen Li
- Department of Immunology, Genomics and Microarray Core Facility, University of Texas Southwestern Medical Center, 5323 Harry Hine Blvd., Dallas, TX, 75390, USA
| | - Yulong Li
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, Liaoning, China
| | - Zhiguang Li
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, Liaoning, China
| | - Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Mario Solís López
- Institute of Medical and Molecular Genetics (INGEMM), Hospital Universitario La Paz, CIBERER Instituto de Salud Carlos III, 28046, Madrid, Spain
- EATRIS ERIC- European Infrastructure for Translational Medicine, De Boelelaan 1118, 1081, HZ, Amsterdam, The Netherlands
| | - Kelci Miclaus
- JMP Life Sciences, SAS Institute Inc., Cary, NC, 27519, USA
| | - Raymond Miller
- Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
| | - Vinay K Mittal
- Thermo Fisher Scientific, 110 Miller Ave., Ann Arbor, MI, 48104, USA
| | - Marghoob Mohiyuddin
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Rd., Suite 7 #300, Belmont, CA, 94002, USA
| | - Carlos Pabón-Peña
- Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
| | - Barbara L Parsons
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Fujun Qiu
- Research and Development, Burning Rock Biotech, Shanghai, 201114, China
| | - Andreas Scherer
- EATRIS ERIC- European Infrastructure for Translational Medicine, De Boelelaan 1118, 1081, HZ, Amsterdam, The Netherlands
- Institute for Molecular Medicine Finland (FIMM), Nordic EMBL Partnership for Molecular Medicine, HiLIFE Unit, Biomedicum Helsinki 2U (D302b), FI-00014 University of Helsinki, P.O. Box 20 (Tukholmankatu 8), Helsinki, Finland
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, 500 Dongchuan Rd, Shanghai, 200241, China
| | - Suzy Stiegelmeyer
- University of North Carolina Health, 101 Manning Drive, Chapel Hill, NC, 27514, USA
| | - Chen Suo
- Department of Epidemiology, School of Public Health, Fudan University, Shanghai, China
| | - Nikola Tom
- EATRIS ERIC- European Infrastructure for Translational Medicine, De Boelelaan 1118, 1081, HZ, Amsterdam, The Netherlands
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
| | - Dong Wang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
| | - Leihong Wu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Wenzhong Xiao
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
- Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
| | - Chang Xu
- Research and Development, QIAGEN Sciences Inc., Frederick, MD, 21703, USA
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
| | - Jiyang Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
| | - Yifan Zhang
- University of Arkansas at Little Rock, Little Rock, AR, 72204, USA
| | - Zhihong Zhang
- Research and Development, Burning Rock Biotech, Shanghai, 201114, China
| | - Yuanting Zheng
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
| | - James C Willey
- Departments of Medicine, Pathology, and Cancer Biology, College of Medicine and Life Sciences, University of Toledo Health Sciences Campus, 3000 Arlington Ave, Toledo, OH, 43614, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Fudan-Gospel Joint Research Center for Precision Medicine, Fudan University, Shanghai, 200438, China
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA.
|
87
|
Vernot B, Zavala EI, Gómez-Olivencia A, Jacobs Z, Slon V, Mafessoni F, Romagné F, Pearson A, Petr M, Sala N, Pablos A, Aranburu A, de Castro JMB, Carbonell E, Li B, Krajcarz MT, Krivoshapkin AI, Kolobova KA, Kozlikin MB, Shunkov MV, Derevianko AP, Viola B, Grote S, Essel E, Herráez DL, Nagel S, Nickel B, Richter J, Schmidt A, Peter B, Kelso J, Roberts RG, Arsuaga JL, Meyer M. Unearthing Neanderthal population history using nuclear and mitochondrial DNA from cave sediments. Science 2021; 372:science.abf1667. [PMID: 33858989 DOI: 10.1126/science.abf1667] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 10/09/2020] [Accepted: 03/31/2021] [Indexed: 12/15/2022]
Abstract
Bones and teeth are important sources of Pleistocene hominin DNA, but are rarely recovered at archaeological sites. Mitochondrial DNA (mtDNA) has been retrieved from cave sediments but provides limited value for studying population relationships. We therefore developed methods for the enrichment and analysis of nuclear DNA from sediments and applied them to cave deposits in western Europe and southern Siberia dated to between 200,000 and 50,000 years ago. We detected a population replacement in northern Spain about 100,000 years ago, which was accompanied by a turnover of mtDNA. We also identified two radiation events in Neanderthal history during the early part of the Late Pleistocene. Our work lays the ground for studying the population history of ancient hominins from trace amounts of nuclear DNA in sediments.
Affiliation(s)
- Benjamin Vernot
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany.
| | - Elena I Zavala
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Asier Gómez-Olivencia
- Departamento de Geología, Facultad de Ciencia y Tecnología, Universidad del País Vasco-Euskal Herriko Unibertsitatea (UPV/EHU), Leioa, Spain.,Sociedad de Ciencias Aranzadi, Donostia-San Sebastián, Spain.,Centro Mixto UCM-ISCIII de Evolución y Comportamiento Humanos, Madrid, Spain
| | - Zenobia Jacobs
- Centre for Archaeological Science, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Wollongong, New South Wales, Australia.,Australian Research Council (ARC) Centre of Excellence for Australian Biodiversity and Heritage, University of Wollongong, Wollongong, New South Wales, Australia
| | - Viviane Slon
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany.,Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.,The Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, 6997801 Tel Aviv, Israel
| | - Fabrizio Mafessoni
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Frédéric Romagné
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Alice Pearson
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Martin Petr
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Nohemi Sala
- Centro Mixto UCM-ISCIII de Evolución y Comportamiento Humanos, Madrid, Spain.,Centro Nacional de Investigación Sobre la Evolución Humana (CENIEH), Burgos, Spain
| | - Adrián Pablos
- Centro Mixto UCM-ISCIII de Evolución y Comportamiento Humanos, Madrid, Spain.,Centro Nacional de Investigación Sobre la Evolución Humana (CENIEH), Burgos, Spain
| | - Arantza Aranburu
- Departamento de Geología, Facultad de Ciencia y Tecnología, Universidad del País Vasco-Euskal Herriko Unibertsitatea (UPV/EHU), Leioa, Spain.,Sociedad de Ciencias Aranzadi, Donostia-San Sebastián, Spain
| | | | - Eudald Carbonell
- Institut Català de Paleoecologia Humana i Evolució Social (IPHES), Universitat Rovira i Virgili, Tarragona, Spain.,Àrea de Prehistòria, Universitat Rovira i Virgili, Tarragona, Spain
| | - Bo Li
- Centre for Archaeological Science, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Wollongong, New South Wales, Australia.,Australian Research Council (ARC) Centre of Excellence for Australian Biodiversity and Heritage, University of Wollongong, Wollongong, New South Wales, Australia
| | - Maciej T Krajcarz
- Institute of Geological Sciences, Polish Academy of Sciences, Warszawa, Poland
| | - Andrey I Krivoshapkin
- Institute of Archaeology and Ethnography, Russian Academy of Sciences, Novosibirsk, Russia.,Novosibirsk State University, Novosibirsk, Russia
| | - Kseniya A Kolobova
- Institute of Archaeology and Ethnography, Russian Academy of Sciences, Novosibirsk, Russia
| | - Maxim B Kozlikin
- Institute of Archaeology and Ethnography, Russian Academy of Sciences, Novosibirsk, Russia
| | - Michael V Shunkov
- Institute of Archaeology and Ethnography, Russian Academy of Sciences, Novosibirsk, Russia
| | - Anatoly P Derevianko
- Institute of Archaeology and Ethnography, Russian Academy of Sciences, Novosibirsk, Russia
| | - Bence Viola
- Department of Anthropology, University of Toronto, Toronto, Ontario, Canada
| | - Steffi Grote
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Elena Essel
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - David López Herráez
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Sarah Nagel
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Birgit Nickel
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Julia Richter
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Anna Schmidt
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Benjamin Peter
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Janet Kelso
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Richard G Roberts
- Centre for Archaeological Science, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Wollongong, New South Wales, Australia.,Australian Research Council (ARC) Centre of Excellence for Australian Biodiversity and Heritage, University of Wollongong, Wollongong, New South Wales, Australia
| | - Juan-Luis Arsuaga
- Centro Mixto UCM-ISCIII de Evolución y Comportamiento Humanos, Madrid, Spain.,Departamento de Paleontología, Facultad Ciencias Geológicas, Universidad Complutense de Madrid, Madrid, Spain
| | - Matthias Meyer
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Germany.
|
88
|
Brennan CA, Nakatsu G, Gallini Comeau CA, Drew DA, Glickman JN, Schoen RE, Chan AT, Garrett WS. Aspirin Modulation of the Colorectal Cancer-Associated Microbe Fusobacterium nucleatum. mBio 2021; 12:e00547-21. [PMID: 33824205 PMCID: PMC8092249 DOI: 10.1128/mbio.00547-21] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/26/2021] [Accepted: 03/04/2021] [Indexed: 12/14/2022] Open
Abstract
Aspirin is a chemopreventive agent for colorectal adenoma and cancer (CRC) that, like many drugs inclusive of chemotherapeutics, has been investigated for its effects on bacterial growth and virulence gene expression. Given the evolving recognition of the roles for bacteria in CRC, in this work, we investigate the effects of aspirin with a focus on one oncomicrobe, Fusobacterium nucleatum. We show that aspirin and its primary metabolite salicylic acid alter F. nucleatum strain Fn7-1 growth in culture and that aspirin can effectively kill both actively growing and stationary Fn7-1. We also demonstrate that, at levels that do not inhibit growth, aspirin influences Fn7-1 gene expression. To assess whether aspirin modulation of F. nucleatum may be relevant in vivo, we use the ApcMin/+ mouse intestinal tumor model in which Fn7-1 is orally inoculated daily to reveal that aspirin-supplemented chow is sufficient to inhibit F. nucleatum-potentiated colonic tumorigenesis. We expand our characterization of aspirin sensitivity across other F. nucleatum strains, including those isolated from human CRC tissues, as well as other CRC-associated microbes, enterotoxigenic Bacteroides fragilis, and colibactin-producing Escherichia coli. Finally, we determine that individuals who use aspirin daily have lower fusobacterial abundance in colon adenoma tissues, as determined by quantitative PCR performed on adenoma DNA. Together, our data support that aspirin has direct antibiotic activity against F. nucleatum strains and suggest that consideration of the potential effects of aspirin on the microbiome holds promise in optimizing risk-benefit assessments for use of aspirin in CRC prevention and management.
IMPORTANCE: There is an increasing understanding of the clinical correlations and potential mechanistic roles of specific members of the gut and tumoral microbiota in colorectal cancer (CRC) initiation, progression, and survival. However, we have yet to parlay this knowledge into better CRC outcomes through microbially informed diagnostic, preventive, or therapeutic approaches. Here, we demonstrate that aspirin, an established CRC chemopreventive, exhibits specific effects on the CRC-associated Fusobacterium nucleatum in culture, an animal model of intestinal tumorigenesis, and in human colonic adenoma tissues. Our work proposes a potential role for aspirin in influencing CRC-associated bacteria to prevent colorectal adenomas and cancer, beyond aspirin's canonical anti-inflammatory role targeting host tissues. Future research, such as studies investigating the effects of aspirin on fusobacterial load in patients, will help further elucidate the prospect of using aspirin to modulate F. nucleatum in vivo for improving CRC outcomes.
Affiliation(s)
- Caitlin A Brennan
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Harvard T. H. Chan Microbiome in Public Health Center, Boston, Massachusetts, USA
| | - Geicho Nakatsu
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Harvard T. H. Chan Microbiome in Public Health Center, Boston, Massachusetts, USA
| | - Carey Ann Gallini Comeau
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
| | - David A Drew
- Clinical and Translational Epidemiology Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Division of Gastroenterology, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Jonathan N Glickman
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, USA
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
| | - Robert E Schoen
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Andrew T Chan
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Harvard T. H. Chan Microbiome in Public Health Center, Boston, Massachusetts, USA
- Clinical and Translational Epidemiology Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Division of Gastroenterology, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Wendy S Garrett
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Harvard T. H. Chan Microbiome in Public Health Center, Boston, Massachusetts, USA
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Department and Division of Medical Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA
- Department of Molecular Metabolism, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
| |
|
89
|
Ferguson KB, Visser S, Dalíková M, Provazníková I, Urbaneja A, Pérez‐Hedo M, Marec F, Werren JH, Zwaan BJ, Pannebakker BA, Verhulst EC. Jekyll or Hyde? The genome (and more) of Nesidiocoris tenuis, a zoophytophagous predatory bug that is both a biological control agent and a pest. INSECT MOLECULAR BIOLOGY 2021; 30:188-209. [PMID: 33305885 PMCID: PMC8048687 DOI: 10.1111/imb.12688] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/16/2020] [Revised: 11/25/2020] [Accepted: 12/07/2020] [Indexed: 05/14/2023]
Abstract
Nesidiocoris tenuis (Reuter) is an efficient predatory biological control agent used throughout the Mediterranean Basin in tomato crops but regarded as a pest in northern European countries. From the family Miridae, it is an economically important insect, yet very little is known in terms of genetic information and no genomic or transcriptomic studies have been published. Here, we use a linked-read sequencing strategy on a single female N. tenuis. From this, we assembled the 355 Mbp genome and delivered an ab initio, homology-based and evidence-based annotation. Along the way, the bacterial "contamination" was removed from the assembly. In addition, bacterial lateral gene transfer (LGT) candidates were detected in the N. tenuis genome. The complete gene set is composed of 24 688 genes; the associated proteins were compared to other hemipterans (Cimex lectularius, Halyomorpha halys and Acyrthosiphon pisum). We visualized the genome using various cytogenetic techniques, such as karyotyping, CGH and GISH, indicating a karyotype of 2n = 32. Additional analyses include the localization of 18S rDNA and unique satellite probes as well as pooled sequencing to assess nucleotide diversity and neutrality of the commercial population. This is one of the first mirid genomes to be released and the first of a mirid biological control agent.
Affiliation(s)
- K. B. Ferguson
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands
| | - S. Visser
- Biology Centre CAS, Institute of Entomology, České Budějovice, Czech Republic
- Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
| | - M. Dalíková
- Biology Centre CAS, Institute of Entomology, České Budějovice, Czech Republic
- Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
| | - I. Provazníková
- Biology Centre CAS, Institute of Entomology, České Budějovice, Czech Republic
- Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - A. Urbaneja
- Centro de Protección Vegetal y Biotecnología, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, Spain
| | - M. Pérez‐Hedo
- Centro de Protección Vegetal y Biotecnología, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, Spain
| | - F. Marec
- Biology Centre CAS, Institute of Entomology, České Budějovice, Czech Republic
| | - J. H. Werren
- Department of Biology, University of Rochester, Rochester, New York, USA
| | - B. J. Zwaan
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands
| | - B. A. Pannebakker
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands
| | - E. C. Verhulst
- Laboratory of Entomology, Wageningen University, Wageningen, The Netherlands
| |
|
90
|
Cornetti L, Fields PD, Ebert D. Genomic characterization of selfing in the cyclic parthenogen Daphnia magna. J Evol Biol 2021; 34:792-802. [PMID: 33704857 DOI: 10.1111/jeb.13780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2020] [Revised: 02/23/2021] [Accepted: 03/06/2021] [Indexed: 11/29/2022]
Abstract
Inbreeding refers to the fusion of related individuals' gametes, with self-fertilization (selfing) being an extreme form of inbreeding that involves gametes produced by the same individual. Selfing is expected to reduce heterozygosity by an average of 50% in one generation; however, little is known about the empirical variation on a genome level surrounding this figure and the factors that affect variation. We selfed genotypes of the cyclic parthenogen Daphnia magna and analysed whole genomes of mothers and selfed offspring, observing the predicted 50% heterozygosity reduction on average. We also saw substantial variation around this value and significant differences among mother-offspring pairs. Crossover analysis confirmed the known trend of recombination occurring more often towards the telomeres. This effect was shown, through simulations, to increase the variance of heterozygosity reduction compared to when a uniform distribution of crossovers was used. Similarly, we simulated inbred line production after several generations of selfing and observed higher variance in achieved homozygosity when we considered a higher recombination rate towards the telomeres. Our empirical and simulation study highlights that the expected mean values of heterozygosity reduction show remarkable variation, which can help understand, for example, differences among inbred individuals.
Affiliation(s)
- Luca Cornetti
- Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland
| | - Peter D Fields
- Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland
| | - Dieter Ebert
- Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland
| |
|
91
|
Santos CA, Sonoda GG, Cortez T, Coutinho LL, Andrade SCS. Transcriptome Expression of Biomineralization Genes in Littoraria flava Gastropod in Brazilian Rocky Shore Reveals Evidence of Local Adaptation. Genome Biol Evol 2021; 13:6171147. [PMID: 33720344 PMCID: PMC8070887 DOI: 10.1093/gbe/evab050] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/04/2020] [Revised: 02/09/2021] [Accepted: 03/11/2021] [Indexed: 12/11/2022]
Abstract
Understanding how selection shapes population differentiation and local adaptation in marine species remains one of the greatest challenges in the field of evolutionary biology. The selection of genes in response to environment-specific factors and microenvironmental variation often results in chaotic genetic patchiness, which is commonly observed in rocky shore organisms. To identify these genes, the expression profile of the marine gastropod Littoraria flava collected from four Southeast Brazilian locations in ten rocky shore sites was analyzed. In this first L. flava transcriptome, 250,641 unigenes were generated, and 24% returned hits after functional annotation. Independent paired comparisons between 1) transects, 2) sites within transects, and 3) sites from different transects were performed for differential expression, detecting 8,622 unique differentially expressed genes. Araçá (AR) and São João (SJ) transect comparisons showed the most divergent gene products. For local adaptation, fitness-related differentially expressed genes were chosen for selection tests. Nine and 24 genes under adaptive and purifying selection, respectively, were most related to biomineralization in AR and chaperones in SJ. The biomineralization genes perlucin and gigasin-6 were positively selected exclusively in the site toward the open ocean in AR, with sequence variants leading to pronounced protein structure changes. Despite an intense gene flow among L. flava populations due to its planktonic larva, gene expression patterns within transects may be the result of selective pressures. Our findings represent the first step in understanding how microenvironmental genetic variation is maintained in rocky shore populations and the mechanisms underlying local adaptation in marine species.
Affiliation(s)
- Camilla A Santos
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
| | - Gabriel G Sonoda
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
| | - Thainá Cortez
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
| | - Luiz L Coutinho
- Departamento de Ciência Animal, Escola Superior de Agricultura Luiz de Queiroz (ESALQ), Universidade de São Paulo, Piracicaba, São Paulo, SP, Brazil
| | - Sónia C S Andrade
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
| |
|
92
|
Development and Characterization of 15 Novel Genomic SSRs for Viburnum farreri. PLANTS 2021; 10:plants10030487. [PMID: 33807587 PMCID: PMC8000228 DOI: 10.3390/plants10030487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/08/2021] [Revised: 03/04/2021] [Accepted: 03/04/2021] [Indexed: 11/17/2022]
Abstract
The Viburnum genus is of particular interest to horticulturalists, phylogeneticists, and biogeographers. Despite its popularity, there are few existing molecular markers to investigate genetic diversity in this large genus, which includes over 160 species. There are also few polymorphic molecular tools that can delineate closely related species within the genus. Viburnum farreri, a member of the Solenotinus subclade and one of the centers of diversity for Viburnum, was selected for DNA sequencing and development of genomic simple sequence repeats (gSSRs). In this study, 15 polymorphic gSSRs were developed and characterized for a collection of 19 V. farreri samples. The number of alleles per locus ranged from two to eight, and nine loci had four or more alleles. Observed heterozygosity ranged from 0 to 0.84 and expected heterozygosity ranged from 0.10 to 0.80 for the 15 loci. Shannon diversity index values across these loci ranged from 0.21 to 1.62. The markers developed in this study add to the existing molecular toolkit for the genus and will be used in future studies investigating cross-transferability, genetic variation, and species and cultivar delimitation in the Viburnum genus and closely allied genera in the Adoxaceae and Caprifoliaceae.
|
93
|
Beck KL, Haiminen N, Chambliss D, Edlund S, Kunitomi M, Huang BC, Kong N, Ganesan B, Baker R, Markwell P, Kawas B, Davis M, Prill RJ, Krishnareddy H, Seabolt E, Marlowe CH, Pierre S, Quintanar A, Parida L, Dubois G, Kaufman J, Weimer BC. Monitoring the microbiome for food safety and quality using deep shotgun sequencing. NPJ Sci Food 2021; 5:3. [PMID: 33558514 PMCID: PMC7870667 DOI: 10.1038/s41538-020-00083-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/14/2020] [Accepted: 11/24/2020] [Indexed: 01/30/2023]
Abstract
In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.
Affiliation(s)
- Kristen L. Beck
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Niina Haiminen
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM T.J. Watson Research Center, Yorktown Heights, Ossining, NY, USA
| | - David Chambliss
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Stefan Edlund
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Mark Kunitomi
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - B. Carol Huang
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; University of California Davis, School of Veterinary Medicine, 100K Pathogen Genome Project, Davis, CA 95616, USA
| | - Nguyet Kong
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; University of California Davis, School of Veterinary Medicine, 100K Pathogen Genome Project, Davis, CA 95616, USA
| | - Balasubramanian Ganesan
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; Mars Global Food Safety Center, Beijing, China; Wisdom Health, A Division of Mars Petcare, Vancouver, WA, USA
| | - Robert Baker
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; Mars Global Food Safety Center, Beijing, China
| | - Peter Markwell
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; Mars Global Food Safety Center, Beijing, China
| | - Ban Kawas
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Matthew Davis
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Robert J. Prill
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Harsha Krishnareddy
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Ed Seabolt
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Carl H. Marlowe
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; Bio-Rad Laboratories, Hercules, CA, USA
| | - Sophie Pierre
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; Bio-Rad, Food Science Division, Marnes-La-Coquette, France
| | - André Quintanar
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; Bio-Rad, Food Science Division, Marnes-La-Coquette, France
| | - Laxmi Parida
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM T.J. Watson Research Center, Yorktown Heights, Ossining, NY, USA
| | - Geraud Dubois
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - James Kaufman
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
| | - Bart C. Weimer
- Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA; University of California Davis, School of Veterinary Medicine, 100K Pathogen Genome Project, Davis, CA 95616, USA
| |
|
94
|
Baeza M, Zúñiga S, Peragallo V, Barahona S, Alcaino J, Cifuentes V. Identification of Stress-Related Genes and a Comparative Analysis of the Amino Acid Compositions of Translated Coding Sequences Based on Draft Genome Sequences of Antarctic Yeasts. Front Microbiol 2021; 12:623171. [PMID: 33633709 PMCID: PMC7902016 DOI: 10.3389/fmicb.2021.623171] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/29/2020] [Accepted: 01/15/2021] [Indexed: 12/16/2022]
Abstract
Microorganisms inhabiting cold environments have evolved strategies to tolerate and thrive in those extreme conditions, mainly the low temperatures that slow down reaction rates. Among the described molecular and metabolic adaptations that enable functioning in the cold is the synthesis of cold-active proteins/enzymes. In bacterial cold-active proteins, reduced proline content and highly flexible and larger catalytic active sites than their mesophilic counterparts have been described. However, beyond the low temperature, microorganisms' physiological requirements may differ according to their growth velocities, influencing their global protein compositions. This hypothesis was tested in this work using eight cold-adapted yeasts isolated from Antarctica, for which their growth parameters were measured and their draft genomes determined and bioinformatically analyzed. The optimal temperature for yeasts' growth ranged from 10 to 22°C, and yeasts having similar or identical optimal temperatures for growth displayed significantly different growth rates. The sizes of the draft genomes ranged from 10.7 (Tetracladium sp.) to 30.7 Mb (Leucosporidium creatinivorum), and the GC contents from 37 (Candida sake) to 60% (L. creatinivorum). Putative genes related to various kinds of stress were identified and were especially numerous for oxidative and cold stress responses. The putative proteins were classified according to predicted cellular function and subcellular localization. The amino acid composition was compared among yeasts considering their optimal temperature for growth and growth rates. In several groups of predicted proteins, correlations were observed between their contents of flexible amino acids and both the yeasts' optimal temperatures for growth and their growth rates. In general, the contents of flexible amino acids were higher in yeasts growing more rapidly as their optimal temperature for growth was lower. The contents of flexible amino acids became lower among yeasts with higher optimal temperatures for growth as their growth rates increased.
Affiliation(s)
- Marcelo Baeza
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
| | - Sergio Zúñiga
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
| | - Vicente Peragallo
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
| | - Salvador Barahona
- Centro de Biotecnología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
| | - Jennifer Alcaino
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
| | - Víctor Cifuentes
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
| |
|
95
|
Suvorova YM, Kamionskaya AM, Korotkov EV. Search for SINE repeats in the rice genome using correlation-based position weight matrices. BMC Bioinformatics 2021; 22:42. [PMID: 33530928 PMCID: PMC7852121 DOI: 10.1186/s12859-021-03977-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2020] [Accepted: 01/21/2021] [Indexed: 11/21/2022]
Abstract
Background: Transposable elements (TEs) constitute a significant part of eukaryotic genomes. Short interspersed nuclear elements (SINEs) are non-autonomous TEs, which are widely represented in mammalian genomes and also found in plants. After insertion in a new position in the genome, TEs quickly accumulate mutations, which complicate their identification and annotation by modern bioinformatics methods. In this study, we searched for highly divergent SINE copies in the genome of rice (Oryza sativa subsp. japonica) using the Highly Divergent Repeat Search Method (HDRSM). Results: The HDRSM considers correlations of neighboring symbols to construct a position weight matrix (PWM) for a SINE family, which is then used to perform a search for new copies. In order to evaluate the accuracy of the method and compare it with the RepeatMasker program, we generated a set of SINE copies containing nucleotide substitutions and indels and inserted them into an artificial chromosome for analysis. The HDRSM showed better results both in terms of the number of identified inserted repeats and the accuracy of determining their boundaries. A search for the copies of 39 SINE families in the rice genome produced 14,030 hits; among them, 5704 were not detected by RepeatMasker. Conclusions: The HDRSM could find divergent SINE copies, correctly determine their boundaries, and offer a high level of statistical significance. We also found that RepeatMasker is able to find relatively short copies of the SINE families with a higher level of similarity, while HDRSM is able to find more diverged copies. To obtain a comprehensive profile of SINE distribution in the genome, combined application of the HDRSM and RepeatMasker is recommended.
Affiliation(s)
- Yulia M Suvorova
- Research Center of Biotechnology of the Russian Academy of Sciences, 60 let Oktjabrja pr-t, 7, bld. 1, Moscow, Russia.
| | - Anastasia M Kamionskaya
- Research Center of Biotechnology of the Russian Academy of Sciences, 60 let Oktjabrja pr-t, 7, bld. 1, Moscow, Russia
| | - Eugene V Korotkov
- Research Center of Biotechnology of the Russian Academy of Sciences, 60 let Oktjabrja pr-t, 7, bld. 1, Moscow, Russia
| |
|
96
|
Investigating the Diversity and Host Range of Novel Parvoviruses from North American Ducks Using Epidemiology, Phylogenetics, Genome Structure, and Codon Usage Analysis. Viruses 2021; 13:v13020193. [PMID: 33525386 PMCID: PMC7912424 DOI: 10.3390/v13020193] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/24/2020] [Revised: 01/22/2021] [Accepted: 01/26/2021] [Indexed: 01/03/2023]
Abstract
Parvoviruses are small single-stranded DNA viruses that can infect both vertebrates and invertebrates. We report here the full characterization of novel viruses we identified in ducks, including two viral species within the subfamily Hamaparvovirinae (duck-associated chapparvovirus, DAC) and a novel species within the subfamily Densovirinae (duck-associated ambidensovirus, DAAD). Overall, 5.7% and 21.1% of the 123 screened ducks (American black ducks, mallards, northern pintail) were positive for DAC and DAAD, respectively, and both viruses were more frequently detected in autumn than in winter. Genome organization and predicted transcription profiles of DAC and DAAD were similar to viruses of the genera Chaphamaparvovirus and Protoambidensovirus, respectively. Their association to these genera was also demonstrated by subfamily-wide phylogenetic and distance analyses of non-structural protein NS1 sequences. While DACs were included in a highly supported clade of avian viruses, no definitive conclusions could be drawn about the host type of DAAD because it was phylogenetically close to viruses found in vertebrates and invertebrates and analyses of codon usage bias and nucleotide frequencies of viruses within the family Parvoviridae showed no clear host-based viral segregation. This study highlights the high parvoviral diversity in the avian reservoir with many avian-associated parvoviruses likely yet to be discovered.
|
97
|
Farhat S, Le P, Kayal E, Noel B, Bigeard E, Corre E, Maumus F, Florent I, Alberti A, Aury JM, Barbeyron T, Cai R, Da Silva C, Istace B, Labadie K, Marie D, Mercier J, Rukwavu T, Szymczak J, Tonon T, Alves-de-Souza C, Rouzé P, Van de Peer Y, Wincker P, Rombauts S, Porcel BM, Guillou L. Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp. BMC Biol 2021; 19:1. [PMID: 33407428 PMCID: PMC7789003 DOI: 10.1186/s12915-020-00927-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 05/24/2020] [Accepted: 11/12/2020] [Indexed: 12/28/2022]
Abstract
BACKGROUND: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. RESULTS: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, along with short introns and intergenic regions, and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes encoded by unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid, potential loss of a mitochondrial genome and functions. CONCLUSION: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.
Affiliation(s)
- Sarah Farhat
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, New York, 11794, USA
| | - Phuong Le
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Ehsan Kayal
- Sorbonne Université, CNRS, FR2424, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
| | - Benjamin Noel
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Estelle Bigeard
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Erwan Corre
- Sorbonne Université, CNRS, FR2424, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
| | - Florian Maumus
- URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
| | - Isabelle Florent
- Unité Molécules de Communication et Adaptation des Microorganismes (MCAM, UMR7245), Muséum national d'Histoire naturelle, CNRS, CP 52, 57 rue Cuvier, 75005, Paris, France
| | - Adriana Alberti
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Tristan Barbeyron
- Sorbonne Université, CNRS, UMR 8227, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
| | - Ruibo Cai
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Corinne Da Silva
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Karine Labadie
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Dominique Marie
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Jonathan Mercier
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Tsinda Rukwavu
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Jeremy Szymczak
- Sorbonne Université, CNRS, FR2424, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Thierry Tonon
- Centre for Novel Agricultural Products, Department of Biology, University of York, Heslington, York, YO10 5DD, UK
| | - Catharina Alves-de-Souza
- Algal Resources Collection, MARBIONC, Center for Marine Sciences, University of North Carolina Wilmington, 5600 Marvin K. Moss Lane, Wilmington, NC, 28409, USA
| | - Pierre Rouzé
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Stephane Rombauts
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Betina M Porcel
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Laure Guillou
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
|
98
|
Chen X, Li D. Sequencing facility and DNA source associated patterns of virus-mappable reads in whole-genome sequencing data. Genomics 2021; 113:1189-1198. [PMID: 33301893 PMCID: PMC7856238 DOI: 10.1016/j.ygeno.2020.12.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/25/2020] [Revised: 11/25/2020] [Accepted: 12/04/2020] [Indexed: 12/12/2022]
Abstract
Numerous viral sequences have been reported in the whole-genome sequencing (WGS) data of human blood. However, it is not clear to what degree the virus-mappable reads represent true viral sequences rather than random-mapping or noise originating from sample preparation, sequencing processes, or other sources. Identification of patterns of virus-mappable reads may generate novel indicators for evaluating the origins of these viral sequences. We characterized paired-end unmapped reads and reads aligned to viral references in human WGS datasets, then compared patterns of the virus-mappable reads among DNA sources and sequencing facilities which produced these datasets. We then examined potential origins of the source- and facility-associated viral reads. The proportions of clean unmapped reads among the seven sequencing facilities were significantly different (P < 2 × 10⁻¹⁶). We identified 260,339 reads that were mappable to a total of 99 viral references in 2535 samples. The majority (86.7%) of these virus-mappable reads (corresponding to 47 viral references), which can be classified into four groups based on their distinct patterns, were strongly associated with sequencing facility or DNA source (adjusted P value < 0.01). Possible origins of these reads include artificial sequences in library preparation, recombinant vectors in cell culture, and phages co-contaminated with their host bacteria. The sequencing facility-associated virus-mappable reads and patterns were repeatedly observed in other datasets produced in the same facilities. We have constructed an analytic framework and profiled the unmapped reads mappable to viral references. The results provide a new understanding of sequencing facility- and DNA source-associated batch effects in deep sequencing data and may facilitate improved bioinformatics filtering of reads.
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA; Neuroscience, Behavior, Health Initiative, University of Vermont, Burlington, VT 05405, USA
|
99
|
Imhoff JF, Rahn T, Künzel S, Keller A, Neulinger SC. Osmotic Adaptation and Compatible Solute Biosynthesis of Phototrophic Bacteria as Revealed from Genome Analyses. Microorganisms 2020; 9:E46. [PMID: 33375353 PMCID: PMC7824335 DOI: 10.3390/microorganisms9010046] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/03/2020] [Revised: 12/21/2020] [Accepted: 12/23/2020] [Indexed: 11/21/2022] Open
Abstract
Osmotic adaptation and accumulation of compatible solutes are key processes for life at high osmotic pressure and elevated salt concentrations. The most important solutes that can protect cell structures and metabolic processes at high salt concentrations are glycine betaine and ectoine. The genome analysis of more than 130 phototrophic bacteria shows that biosynthesis of glycine betaine is common among marine and halophilic phototrophic Proteobacteria and their chemotrophic relatives, as well as in representatives of Pirellulaceae and Actinobacteria, and is also found in halophilic Cyanobacteria and Chloroherpeton thalassium. This ability correlates well with the successful toleration of extreme salt concentrations. Freshwater bacteria in general lack the ability to synthesize, and often also to take up, these compounds. The biosynthesis of ectoine is found in the phylogenetic lines of phototrophic Alpha- and Gammaproteobacteria, most prominently in the Halorhodospira species and a number of Rhodobacteraceae. It is also common among Streptomycetes and Bacilli. The phylogeny of glycine-sarcosine methyltransferase (GMT) and diaminobutyrate-pyruvate aminotransferase (EctB) sequences correlates well with otherwise established phylogenetic groups. Most significantly, GMT sequences of cyanobacteria form two major phylogenetic branches, and the branch of Halorhodospira species is distinct from all other Ectothiorhodospiraceae. A variety of transport systems for osmolytes are present in the studied bacteria.
Affiliation(s)
| | - Tanja Rahn
- GEOMAR Helmholtz Centre for Ocean Research, 24105 Kiel, Germany
| | - Sven Künzel
- Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
| | - Alexander Keller
- Center for Computational and Theoretical Biology, University Würzburg, 97074 Würzburg, Germany
|
100
|
Rousseau-Gueutin M, Belser C, Da Silva C, Richard G, Istace B, Cruaud C, Falentin C, Boideau F, Boutte J, Delourme R, Deniot G, Engelen S, de Carvalho JF, Lemainque A, Maillet L, Morice J, Wincker P, Denoeud F, Chèvre AM, Aury JM. Long-read assembly of the Brassica napus reference genome Darmor-bzh. Gigascience 2020; 9:giaa137. [PMID: 33319912 PMCID: PMC7736779 DOI: 10.1093/gigascience/giaa137] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 07/22/2020] [Revised: 09/18/2020] [Accepted: 11/09/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND The combination of long reads and long-range information to produce genome assemblies is now accepted as a common standard. This strategy not only allows access to the gene catalogue of a given species but also reveals the architecture and organization of chromosomes, including complex regions such as telomeres and centromeres. The Brassica genus is no exception, and many assemblies based on long reads are now available. The reference genome for Brassica napus, Darmor-bzh, which was published in 2014, was produced using short reads, and its contiguity was extremely low compared with current assemblies of the Brassica genus. FINDINGS Herein, we report the new long-read assembly of the Darmor-bzh genome (Brassica napus) generated by combining long-read sequencing data with optical and genetic maps. Using the PromethION device and 6 flowcells, we generated ∼16 million long reads representing 93× coverage and, more importantly, 6× coverage with reads longer than 100 kb. This ultralong-read dataset allowed us to generate one of the most contiguous and complete assemblies of a Brassica genome to date (contig N50 > 10 Mb). In addition, we exploited all the advantages of the nanopore technology to detect modified bases and to sequence transcriptomic data using direct RNA sequencing, in order to annotate the genome and focus on resistance genes. CONCLUSION Using these cutting-edge technologies, and in particular by relying on all the advantages of the nanopore technology, we provide the most contiguous Brassica napus assembly, a resource that will be valuable to the Brassica community for crop improvement and will facilitate the rapid selection of agronomically important traits.
Affiliation(s)
| | - Caroline Belser
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Corinne Da Silva
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Gautier Richard
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Corinne Cruaud
- Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Cyril Falentin
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Franz Boideau
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Julien Boutte
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Regine Delourme
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Gwenaëlle Deniot
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Stefan Engelen
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Arnaud Lemainque
- Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Loeiz Maillet
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Jérôme Morice
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - France Denoeud
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Anne-Marie Chèvre
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
|