1
Recuerda M, Campagna L. How structural variants shape avian phenotypes: Lessons from model systems. Mol Ecol 2024; 33:e17364. [PMID: 38651830 DOI: 10.1111/mec.17364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/04/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024]
Abstract
Despite receiving significant recent attention, the relevance of structural variation (SV) in driving phenotypic diversity remains understudied, although recent advances in long-read sequencing, bioinformatics and pangenomic approaches have enhanced SV detection. We review the role of SVs in shaping phenotypes in avian model systems, and identify some general patterns in SV type, length and their associated traits. We found that most of the avian SVs so far identified are short indels in chickens, which are frequently associated with changes in body weight and plumage colouration. Overall, we found that relatively short SVs are more frequently detected, likely due to a combination of their prevalence compared to large SVs, and a detection bias, stemming primarily from the widespread use of short-read sequencing and associated analytical methods. SVs most commonly involve non-coding regions, especially introns, and when patterns of inheritance were reported, SVs associated primarily with dominant discrete traits. We summarise several examples of phenotypic convergence across different species, mediated by different SVs in the same or different genes and different types of changes in the same gene that can lead to various phenotypes. Complex rearrangements and supergenes, which can simultaneously affect and link several genes, tend to have pleiotropic phenotypic effects. Additionally, SVs commonly co-occur with single-nucleotide polymorphisms, highlighting the need to consider all types of genetic changes to understand the basis of phenotypic traits. We end by summarising expectations for when long-read technologies become commonly implemented in non-model birds, likely leading to an increase in SV discovery and characterisation. The growing interest in this subject suggests an increase in our understanding of the phenotypic effects of SVs in upcoming years.
Affiliation(s)
- María Recuerda
  - Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, New York, USA
- Leonardo Campagna
  - Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, New York, USA
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, USA
2
Goussarov G, Mysara M, Cleenwerck I, Claesen J, Leys N, Vandamme P, Van Houdt R. Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities. Microbiology (Reading, England) 2024; 170. [PMID: 38916949 DOI: 10.1099/mic.0.001469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Metagenome community analysis, driven by the continued development of sequencing technology, is rapidly providing insights into many aspects of microbiology and becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but they are too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Bioinformatics group, Information Technology & Computer Science, Nile University, Giza, Egypt
- Ilse Cleenwerck
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Jürgen Claesen
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Present address: Department of Epidemiology & Biostatistics, Amsterdam UMC, VU University, Amsterdam, Netherlands
- Natalie Leys
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
3
de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. Cell Genomics 2024; 4:100527. [PMID: 38537634 PMCID: PMC11019364 DOI: 10.1016/j.xgen.2024.100527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 12/26/2023] [Accepted: 02/29/2024] [Indexed: 04/09/2024]
Abstract
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ∼20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.
Affiliation(s)
- Tristan V de Jong
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Yanchao Pan
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Pasi Rastas
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Daniel Munro
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
- Monika Tutaj
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Huda Akil
  - Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Chris Benner
  - Department of Medicine, University of California San Diego, San Diego, CA, USA
- Denghui Chen
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Apurva S Chitre
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- William Chow
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Vincenza Colonna
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Clifton L Dalgard
  - Department of Anatomy, Physiology & Genetics, The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Wendy M Demos
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Peter A Doris
  - The Brown Foundation Institute of Molecular Medicine, Center for Human Genetics, University of Texas Health Science Center, Houston, TX, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Aron M Geurts
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Hakan M Gunturkun
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Victor Guryev
  - Genome Structure and Ageing, University of Groningen, UMC, Groningen, the Netherlands
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Jun Huang
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Ted Kalbfleisch
  - Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
- Panjun Kim
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ling Li
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Spencer Mahaffey
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Pejman Mohammadi
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA; Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Ayse Bilge Ozel
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Oksana Polesskaya
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Michal Pravenec
  - Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jonathan Sebat
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Jennifer R Smith
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Leah C Solberg Woods
  - Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Boris Tabakoff
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Flavia Villani
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Hongyang Wang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Burt M Sharp
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Francesca Telese
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Zhihua Jiang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Laura Saba
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Xusheng Wang
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Abraham A Palmer
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Anne E Kwitek
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Melinda R Dwinell
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Robert W Williams
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jun Z Li
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Hao Chen
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
4
Sánchez‐Costa M, Gola S, Rodríguez‐Sáiz M, Barredo J, Hidalgo A, Berenguer J. From accurate genome sequence to biotechnological application: The thermophile Mycolicibacterium hassiacum as experimental model. Microb Biotechnol 2024; 17:e14290. [PMID: 37498289 PMCID: PMC10832570 DOI: 10.1111/1751-7915.14290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 05/18/2023] [Accepted: 05/19/2023] [Indexed: 07/28/2023] Open
Abstract
Mycobacteria constitute a large group of microorganisms belonging to the phylum Actinobacteria, encompassing some of the most relevant pathogenic bacteria and many saprophytic isolates that share a unique and complex cell envelope. Also unique to this group is the extensive capability to use and synthesize sterols, a class of molecules that includes active signalling compounds of pharmaceutical use. However, few mycobacterial species and strains have been established as laboratory models to date, Mycolicibacterium smegmatis mc² 155 being the most common one. In this work, we focus on the use of a thermophilic mycobacterium, Mycolicibacterium hassiacum, which grows optimally above 50°C, as an emerging experimental model valid to extend our general knowledge of mycobacterial biology as well as for application purposes. To that end, accurate genomic sequences are key for gene mining, the study of pathogenicity or lack thereof and the potential for gene transfer. The combination of long- and short-read sequencing technologies is strictly necessary to remove biases caused by errors specific to long-read technology. By doing so in M. hassiacum, we obtained from the curated genome clues regarding the genetic manipulation potential of this microorganism from the presence of insertion sequences, CRISPR-Cas, type VII ESX secretion systems, as well as lack of plasmids. Finally, as a proof of concept of the applicability of M. hassiacum as a laboratory and industrial model, we used this high-quality genome of M. hassiacum to successfully knock out a gene involved in the use of phytosterols as a source of carbon and energy, using an improved gene cassette for thermostable selection and a transformation protocol at high temperature.
Affiliation(s)
- Mercedes Sánchez‐Costa
  - Department of Molecular Biology, Faculty of Sciences, Centre for Molecular Biology (UAM‐CSIC), University Institute for Molecular Biology, Madrid, Spain
- Susanne Gola
  - Department of Medical Technology and Biotechnology, Ernst‐Abbe‐Hochschule Jena, University of Applied Sciences, Jena, Germany
- José‐Luis Barredo
  - Department of Biotechnology, Curia, Parque Tecnológico de León, León, Spain
- Aurelio Hidalgo
  - Department of Molecular Biology, Faculty of Sciences, Centre for Molecular Biology (UAM‐CSIC), University Institute for Molecular Biology, Madrid, Spain
- José Berenguer
  - Department of Molecular Biology, Faculty of Sciences, Centre for Molecular Biology (UAM‐CSIC), University Institute for Molecular Biology, Madrid, Spain
5
de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. bioRxiv: the preprint server for biology 2023:2023.04.13.536694. [PMID: 37214860 PMCID: PMC10197727 DOI: 10.1101/2023.04.13.536694] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/24/2023]
Abstract
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared to its predecessor. Gene annotations are now more complete, significantly improving the mapping precision of genomic, transcriptomic, and proteomics data sets. We jointly analyzed 163 short-read whole genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ~20.0 million sequence variations, of which 18.7 thousand are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.
Affiliation(s)
- Tristan V de Jong
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Yanchao Pan
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Pasi Rastas
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Daniel Munro
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
  - Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
- Monika Tutaj
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
  - Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Huda Akil
  - Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Chris Benner
  - Department of Medicine, University of California San Diego, San Diego, CA, USA
- Denghui Chen
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Apurva S Chitre
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- William Chow
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Vincenza Colonna
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Clifton L Dalgard
  - Department of Anatomy, Physiology & Genetics; The American Genome Center, Uniformed Services University of the Health Sciences, Washington DC, USA
- Wendy M Demos
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
  - Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Peter A Doris
  - The Brown Foundation Institute of Molecular Medicine, Center For Human Genetics, University of Texas Health Science Center, Houston, TX, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Aron M Geurts
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Hakan M Gunturkun
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Victor Guryev
  - Genome Structure and Ageing, University of Groningen, UMC Groningen, The Netherlands
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Jun Huang
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Ted Kalbfleisch
  - Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
- Panjun Kim
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ling Li
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Spencer Mahaffey
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Pejman Mohammadi
  - Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
  - Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Ayse Bilge Ozel
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Oksana Polesskaya
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Michal Pravenec
  - Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jonathan Sebat
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Jennifer R Smith
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
  - Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Leah C Solberg Woods
  - Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Boris Tabakoff
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Flavia Villani
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Hongyang Wang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Burt M Sharp
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Francesca Telese
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Zhihua Jiang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Laura Saba
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Xusheng Wang
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Abraham A Palmer
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
  - Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Anne E Kwitek
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
  - Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Melinda R Dwinell
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
  - Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Robert W Williams
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jun Z Li
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Hao Chen
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
6
Lu N, Qiao Y, An P, Luo J, Bi C, Li M, Lu Z, Tu J. Exploration of whole genome amplification generated chimeric sequences in long-read sequencing data. Brief Bioinform 2023; 24:bbad275. [PMID: 37529913 DOI: 10.1093/bib/bbad275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 06/21/2023] [Accepted: 07/10/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION: Multiple displacement amplification (MDA) has become the most commonly used method of whole genome amplification, generating a vast amount of DNA with higher molecular weight and greater genome coverage. Coupled with long-read sequencing, it makes it possible to sequence amplicons of over 20 kb in length. However, the formation of chimeric sequences (chimeras, expressed as structural errors in sequencing data) in MDA seriously interferes with bioinformatics analysis, but its influence on long-read sequencing data is unknown. RESULTS: We sequenced the phi29 DNA polymerase-mediated MDA amplicons on the PacBio platform and analyzed chimeras within the generated data. The 3rd-ChimeraMiner has been constructed as a pipeline for recognizing and restoring chimeras into the original structures in long-read sequencing data, improving the efficiency of using TGS data. Five long-read datasets and one high-fidelity long-read dataset with various amplification folds were analyzed. The result reveals that mis-priming events in amplification occur more frequently than widely perceived, and the proportion gradually accumulates from 42% to over 78% as the amplification continues. In total, 99.92% of recognized chimeric sequences were demonstrated to be artifacts, whose structures were wrongly formed in MDA instead of existing in original genomes. By restoring chimeras to their original structures, the vast majority of supplementary alignments that introduce false-positive structural variants are recycled, removing 97% of inversions on average and contributing to the analysis of structural variation in MDA-amplified samples. The impact of chimeras in long-read sequencing data analysis should be emphasized, and the 3rd-ChimeraMiner can help to quantify and reduce the influence of chimeras. AVAILABILITY AND IMPLEMENTATION: The 3rd-ChimeraMiner is available on GitHub, https://github.com/dulunar/3rdChimeraMiner.
Affiliation(s)
- Na Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Yi Qiao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Pengfei An
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
  - Monash University-Southeast University Joint Research Institute, Suzhou 215123, China
- Jiajian Luo
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Changwei Bi
  - College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
- Musheng Li
  - Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, NV 89511, USA
- Zuhong Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Jing Tu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
7
Balarezo-Cisneros LN, Timouma S, Hanak A, Currin A, Valle F, Delneri D. High quality de novo genome assembly of the non-conventional yeast Kazachstania bulderi describes a potential low pH production host for biorefineries. Commun Biol 2023; 6:918. [PMID: 37679437 PMCID: PMC10484914 DOI: 10.1038/s42003-023-05285-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 08/25/2023] [Indexed: 09/09/2023] Open
Abstract
Kazachstania bulderi is a non-conventional yeast species able to grow efficiently on glucose and δ-gluconolactone at low pH. These unique traits make K. bulderi an ideal candidate for use in sustainable biotechnology processes including low pH fermentations and the production of green chemicals including organic acids. To accelerate strain development with this species, detailed information of its genetics is needed. Here, by employing long read sequencing we report a high-quality phased genome assembly for three strains of K. bulderi species, including the type strain. The sequences were assembled into 12 chromosomes with a total length of 14 Mb, and the genome was fully annotated at structural and functional levels, including allelic and structural variants, ribosomal array and mating type locus. This high-quality reference genome provides a resource to advance our fundamental knowledge of biotechnologically relevant non-conventional yeasts and to support the development of genetic tools for manipulating such strains towards their use as production hosts in biotechnological processes.
Affiliation(s)
- Soukaina Timouma
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Alistair Hanak
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Andrew Currin
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Daniela Delneri
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
|
8
|
Debelli A, Kienzle L, Khorami HH, Angers A, Breton S. Validation of the male-specific ORF of the paternally-transmitted mtDNA in Mytilus edulis as a protein-coding gene. Gene 2023; 879:147586. [PMID: 37356740 DOI: 10.1016/j.gene.2023.147586] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Revised: 06/14/2023] [Accepted: 06/20/2023] [Indexed: 06/27/2023]
Abstract
There appears to be an additional set of sex-specific mtDNA-encoded proteins in bivalve species with doubly uniparental mitochondrial inheritance that may be involved in the transmission of the female and male mitogenomes. In the marine mussel Mytilus edulis, the translation of the female-specific open reading frame (F-ORF) was demonstrated but the translation of the male-specific ORF (M-ORF) remains to be shown. Here we validate the male-specific ORF of the paternal mitogenome in M. edulis as a protein-coding gene. The M-ORF protein was detected only in male gonads and localized in sperm mitochondria and acrosome, suggesting that it is involved in a key sperm function in Mytilus edulis.
Affiliation(s)
- Alizée Debelli
- Département de Sciences Biologiques, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Laura Kienzle
- Département de Sciences Biologiques, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Hajar Hosseini Khorami
- Département de Sciences Biologiques, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Annie Angers
- Département de Sciences Biologiques, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Sophie Breton
- Département de Sciences Biologiques, Université de Montréal, Montréal, QC H3C 3J7, Canada
|
9
|
Ruiz JL, Reimering S, Escobar-Prieto JD, Brancucci NMB, Echeverry DF, Abdi AI, Marti M, Gómez-Díaz E, Otto TD. From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA). Brief Bioinform 2023; 24:bbad248. [PMID: 37406192 PMCID: PMC10359078 DOI: 10.1093/bib/bbad248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023] Open
Abstract
Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.
Affiliation(s)
- José Luis Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Susanne Reimering
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Nicolas M B Brancucci
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
- University of Basel, 4001 Basel, Switzerland
- Diego F Echeverry
- Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM), Cali, Colombia
- Departamento de Microbiología, Facultad de Salud, Universidad del Valle, Cali, Colombia
- Matthias Marti
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Thomas D Otto
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
|
10
|
Goussarov G, Mysara M, Vandamme P, Van Houdt R. Introduction to the principles and methods underlying the recovery of metagenome-assembled genomes from metagenomic data. Microbiologyopen 2022; 11:e1298. [PMID: 35765182 PMCID: PMC9179125 DOI: 10.1002/mbo3.1298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/18/2022] Open
Abstract
The rise of metagenomics offers a leap forward for understanding the genetic diversity of microorganisms in many different complex environments by providing a platform that can identify potentially unlimited numbers of known and novel microorganisms. As such, it is impossible to imagine new major initiatives without metagenomics. Nevertheless, it represents a relatively new discipline with various levels of complexity and demands on bioinformatics. The underlying principles and methods used in metagenomics are often seen as common knowledge and often not detailed or fragmented. Therefore, we reviewed these to guide microbiologists in taking the first steps into metagenomics. We specifically focus on a workflow aimed at reconstructing individual genomes, that is, metagenome‐assembled genomes, integrating DNA sequencing, assembly, binning, identification and annotation.
Affiliation(s)
- Gleb Goussarov
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
|
11
|
Camacho E, González-de la Fuente S, Solana JC, Rastrojo A, Carrasco-Ramiro F, Requena JM, Aguado B. Gene Annotation and Transcriptome Delineation on a De Novo Genome Assembly for the Reference Leishmania major Friedlin Strain. Genes (Basel) 2021; 12:genes12091359. [PMID: 34573340 PMCID: PMC8468144 DOI: 10.3390/genes12091359] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/03/2021] [Revised: 08/20/2021] [Accepted: 08/27/2021] [Indexed: 01/05/2023] Open
Abstract
Leishmania major is the main causative agent of cutaneous leishmaniasis in humans. The Friedlin strain of this species (LmjF) was chosen when a multi-laboratory consortium undertook the objective of deciphering the first genome sequence for a parasite of the genus Leishmania. The objective was successfully attained in 2005, and this represented a milestone for Leishmania molecular biology studies around the world. Although the LmjF genome sequence was done following a shotgun strategy and using classical Sanger sequencing, the results were excellent, and this genome assembly served as the reference for subsequent genome assemblies in other Leishmania species. Here, we present a new assembly for the genome of this strain (named LMJFC for clarity), generated by the combination of two high throughput sequencing platforms, Illumina short-read sequencing and PacBio Single Molecule Real-Time (SMRT) sequencing, which provides long-read sequences. Apart from resolving uncertain nucleotide positions, several genomic regions were reorganized and a more precise composition of tandemly repeated gene loci was attained. Additionally, the genome annotation was improved by adding 542 genes and more accurate coding-sequences defined for around two hundred genes, based on the transcriptome delimitation also carried out in this work. As a result, we are providing gene models (including untranslated regions and introns) for 11,238 genes. Genomic information ultimately determines the biology of every organism; therefore, our understanding of molecular mechanisms will depend on the availability of precise genome sequences and accurate gene annotations. In this regard, this work is providing an improved genome sequence and updated transcriptome annotations for the reference L. major Friedlin strain.
|