1
Chain FJJ, Assis R. BLAST from the Past: Impacts of Evolving Approaches on Studies of Evolution by Gene Duplication. Genome Biol Evol 2021; 13:evab149. [PMID: 34164667 PMCID: PMC8325566 DOI: 10.1093/gbe/evab149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/21/2021] [Indexed: 11/14/2022]
Abstract
In 1970, Susumu Ohno hypothesized that gene duplication was a major reservoir of adaptive innovation. However, it was not until over two decades later that DNA sequencing studies uncovered the ubiquity of gene duplication across all domains of life, highlighting its global importance in the evolution of phenotypic complexity and species diversification. Today, it seems that there are no limits to the study of evolution by gene duplication, as it has rapidly coevolved with numerous experimental and computational advances in genomics. In this perspective, we examine word stem usage in PubMed abstracts to infer how evolving discoveries and technologies have shaped the landscape of studying evolution by gene duplication, leading to a more refined understanding of its role in the emergence of novel phenotypes.
Affiliation(s)
- Frédéric J J Chain
  - Department of Biological Sciences, University of Massachusetts Lowell, Massachusetts, USA
- Raquel Assis
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
  - Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, Florida, USA
2
Rodriguez OL, Gibson WS, Parks T, Emery M, Powell J, Strahl M, Deikus G, Auckland K, Eichler EE, Marasco WA, Sebra R, Sharp AJ, Smith ML, Bashir A, Watson CT. A Novel Framework for Characterizing Genomic Haplotype Diversity in the Human Immunoglobulin Heavy Chain Locus. Front Immunol 2020; 11:2136. [PMID: 33072076 PMCID: PMC7539625 DOI: 10.3389/fimmu.2020.02136] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 06/10/2020] [Accepted: 08/06/2020] [Indexed: 02/06/2023]
Abstract
An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false-positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk, and responses to vaccines and therapeutics.
Affiliation(s)
- Oscar L Rodriguez
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- William S Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Tom Parks
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Matthew Emery
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- James Powell
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Maya Strahl
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Gintaras Deikus
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Kathryn Auckland
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, United States
- Wayne A Marasco
  - Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute, Department of Medicine, Harvard Medical School, Boston, MA, United States
- Robert Sebra
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
  - Icahn Institute of Data Science and Genomic Technology, New York, NY, United States
- Andrew J Sharp
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Melissa L Smith
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
  - Icahn Institute of Data Science and Genomic Technology, New York, NY, United States
- Ali Bashir
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
3
Yohe LR, Davies KTJ, Simmons NB, Sears KE, Dumont ER, Rossiter SJ, Dávalos LM. Evaluating the performance of targeted sequence capture, RNA-Seq, and degenerate-primer PCR cloning for sequencing the largest mammalian multigene family. Mol Ecol Resour 2019; 20:140-153. [PMID: 31523924 DOI: 10.1111/1755-0998.13093] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/20/2019] [Revised: 08/27/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022]
Abstract
Multigene families evolve from single-copy ancestral genes via duplication, and typically encode proteins critical to key biological processes. Molecular analyses of these gene families require high-confidence sequences, but the high sequence similarity of the members can create challenges for sequencing and downstream analyses. Focusing on the common vampire bat, Desmodus rotundus, we evaluated how different sequencing approaches performed in recovering the largest mammalian protein-coding multigene family: olfactory receptors (OR). Using the genome as a reference, we determined the proportion of intact protein-coding receptors recovered by: (a) amplicons from degenerate primers sequenced via Sanger technology, (b) RNA-Seq of the main olfactory epithelium, and (c) those genes captured with probes designed from transcriptomes of closely-related species. Our initial re-annotation of the high-quality vampire bat genome resulted in >400 intact OR genes, more than doubling the original estimate. Sanger-sequenced amplicons performed the poorest among the three approaches, detecting <33% of receptors in the genome. In contrast, the transcriptome reliably recovered >50% of the annotated genomic ORs, and targeted sequence capture recovered nearly 75% of annotated genes. Each sequencing approach assembled high-quality sequences, even if it did not recover all receptors in the genome. While some variation may be due to limitations of the study design (e.g., different individuals), variation among approaches was mostly caused by low coverage of some receptors rather than high rates of assembly error. Given this variability, we caution against using the counts of intact receptors per species to model the birth-death process of multigene families. Instead, our results support the use of orthologous sequences to explore and model the evolutionary processes shaping these genes.
Affiliation(s)
- Laurel R Yohe
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA
  - Department of Geology and Geophysics, Yale University, Stony Brook, NY, USA
- Kalina T J Davies
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Nancy B Simmons
  - Department of Mammalogy, Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Karen E Sears
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, CA, USA
- Elizabeth R Dumont
  - School of Natural Sciences, University of California Merced, Merced, CA, USA
- Stephen J Rossiter
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Liliana M Dávalos
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA
  - Consortium for Inter-Disciplinary Environmental Research, Stony Brook University, Stony Brook, NY, USA
4
Hoff SNK, Baalsrud HT, Tooming-Klunderud A, Skage M, Richmond T, Obernosterer G, Shirzadi R, Tørresen OK, Jakobsen KS, Jentoft S. Long-read sequence capture of the haemoglobin gene clusters across codfish species. Mol Ecol Resour 2018; 19:245-259. [PMID: 30329222 PMCID: PMC7379720 DOI: 10.1111/1755-0998.12955] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/03/2018] [Revised: 10/05/2018] [Accepted: 10/09/2018] [Indexed: 11/30/2022]
Abstract
Combining high-throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short-read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long-read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom-made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several lineage-specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which have previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long-read capture as a versatile approach for comparative genomic studies by generation of a cross-species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes.
Affiliation(s)
- Siv Nam Khang Hoff
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Helle T Baalsrud
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Ave Tooming-Klunderud
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Morten Skage
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Ole Kristian Tørresen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Sissel Jentoft
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway