1
Clifton BD, Jimenez J, Kimura A, Chahine Z, Librado P, Sánchez-Gracia A, Abbassi M, Carranza F, Chan C, Marchetti M, Zhang W, Shi M, Vu C, Yeh S, Fanti L, Xia XQ, Rozas J, Ranz JM. Understanding the Early Evolutionary Stages of a Tandem Drosophila melanogaster-Specific Gene Family: A Structural and Functional Population Study. Mol Biol Evol 2020; 37:2584-2600. [PMID: 32359138 PMCID: PMC7475035 DOI: 10.1093/molbev/msaa109]
Abstract
Gene families underlie genetic innovation and phenotypic diversification. However, our understanding of the early genomic and functional evolution of tandemly arranged gene families remains incomplete as paralog sequence similarity hinders their accurate characterization. The Drosophila melanogaster-specific gene family Sdic is tandemly repeated and impacts sperm competition. We scrutinized Sdic in 20 geographically diverse populations using reference-quality genome assemblies, read-depth methodologies, and qPCR, finding that ∼90% of the individuals harbor 3-7 copies as well as evidence of population differentiation. In strains with reliable gene annotations, copy number variation (CNV) and differential transposable element insertions distinguish one structurally distinct version of the Sdic region per strain. All 31 annotated copies featured protein-coding potential and, based on the protein variant encoded, were categorized into 13 paratypes differing in their 3' ends, with 3-5 paratypes coexisting in any strain examined. Despite widespread gene conversion, the only copy present in all strains has functionally diverged at both coding and regulatory levels under positive selection. Contrary to artificial tandem duplications of the Sdic region that resulted in increased male expression, CNV in cosmopolitan strains did not correlate with expression levels, likely as a result of differential genome modifier composition. Duplicating the region did not enhance sperm competitiveness, suggesting a fitness cost at high expression levels or a plateau effect. Beyond facilitating a minimally optimal expression level, Sdic CNV acts as a catalyst of protein and regulatory diversity, showcasing a possible evolutionary path recently formed tandem multigene families can follow toward long-term consolidation in eukaryotic genomes.
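The read-depth approach mentioned in this abstract rests on a simple ratio. If reads from every paralogous Sdic copy pile up on a single reference copy of the repeat unit, the depth over that unit divided by the depth over single-copy regions approximates the number of copies. The sketch below shows only that idea. The function name `estimate_copy_number`, the use of medians, and the depth values are illustrative assumptions, not the authors' pipeline or data.

```python
from statistics import median


def estimate_copy_number(unit_depths, control_depths):
    """Hypothetical helper: estimate copies of a tandem repeat unit from read depth.

    Assumes reads from all paralogous copies map onto one reference copy of the
    unit, and that the control windows are single-copy in the same genome.
    """
    if not unit_depths or not control_depths:
        raise ValueError("need at least one window in each set")
    baseline = median(control_depths)  # robust estimate of single-copy depth
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    # Collapsed depth over the unit scales with the number of copies present
    return median(unit_depths) / baseline


# Made-up per-window depths for illustration only (not data from the study)
unit = [148.0, 152.0, 150.0, 149.5, 151.0]
controls = [30.0, 29.5, 30.5, 31.0, 29.0]
cn = estimate_copy_number(unit, controls)
print(round(cn, 2), round(cn))  # prints "5.0 5"
```

Medians are used rather than means so that a few windows with mapping artifacts do not dominate the estimate. Real pipelines also correct for GC bias and mappability, which this sketch ignores.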
Affiliation(s)
- Bryan D Clifton
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Jamie Jimenez
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Ashlyn Kimura
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Zeinab Chahine
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Pablo Librado
- Laboratoire AMIS CNRS UMR 5288, Faculté de Médecine de Purpan, Université Paul Sabatier, Toulouse, France
- Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadistica, Universitat de Barcelona, Barcelona, Spain; Institut de Recerca de la Biodiversitat, Universitat de Barcelona, Barcelona, Spain
- Mashya Abbassi
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Francisco Carranza
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Carolus Chan
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Marcella Marchetti
- Istituto Pasteur Italia, Fondazione Cenci-Bolognetti, Rome, Italy; Department of Biology and Biotechnology "C. Darwin", Sapienza University of Rome, Rome, Italy
- Wanting Zhang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Mijuan Shi
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Christine Vu
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
- Shudan Yeh
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA; Department of Life Sciences, National Central University, Taoyuan City, Zhongli District, Taiwan
- Laura Fanti
- Istituto Pasteur Italia, Fondazione Cenci-Bolognetti, Rome, Italy; Department of Biology and Biotechnology "C. Darwin", Sapienza University of Rome, Rome, Italy
- Xiao-Qin Xia
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Julio Rozas
- Departament de Genètica, Microbiologia i Estadistica, Universitat de Barcelona, Barcelona, Spain; Institut de Recerca de la Biodiversitat, Universitat de Barcelona, Barcelona, Spain
- José M Ranz
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
2
Ranz J, Clifton B. Characterization and evolutionary dynamics of complex regions in eukaryotic genomes. Sci China Life Sci 2019; 62:467-488. [PMID: 30810961 DOI: 10.1007/s11427-018-9458-0] [Received: 10/01/2018] [Accepted: 11/05/2018]
Abstract
Complex regions in eukaryotic genomes are typically characterized by duplications of chromosomal stretches that often include one or more genes repeated in a tandem array or in relatively close proximity. However, the repetitive nature of these regions, together with the often high sequence identity among repeats, has made complex regions particularly recalcitrant to proper molecular characterization: they are often misassembled or completely absent in genome assemblies. This limitation has prevented accurate functional and evolutionary analyses of these regions, analyses that are becoming increasingly relevant as evidence continues to support a central role for complex genomic regions in explaining human disease, developmental innovations, and ecological adaptations across phyla. With the advent of long-read sequencing technologies and suitable assemblers, the development of algorithms that can accommodate sample heterozygosity, and the adoption of a pangenomic-like view of these regions, accurate reconstructions of complex regions are now within reach. These reconstructions will finally allow for accurate functional and evolutionary studies of complex genomic regions, underpinning the generation of genotype-phenotype maps of unprecedented resolution.
3
Borràs DM, Vossen RHAM, Liem M, Buermans HPJ, Dauwerse H, van Heusden D, Gansevoort RT, den Dunnen JT, Janssen B, Peters DJM, Losekoot M, Anvar SY. Detecting PKD1 variants in polycystic kidney disease patients by single-molecule long-read sequencing. Hum Mutat 2017; 38:870-879. [PMID: 28378423 PMCID: PMC5488171 DOI: 10.1002/humu.23223] [Received: 11/16/2016] [Revised: 03/28/2017] [Accepted: 03/29/2017]
Abstract
A genetic diagnosis of autosomal-dominant polycystic kidney disease (ADPKD) is challenging due to allelic heterogeneity, high GC content, and homology of the PKD1 gene with six pseudogenes. Short-read next-generation sequencing approaches, such as whole-genome sequencing and whole-exome sequencing, often fail to reliably characterize complex regions such as PKD1. However, long-read single-molecule sequencing has been shown to be an alternative strategy that could overcome PKD1 complexities and discriminate between homologous regions of PKD1 and its pseudogenes. In this study, we demonstrate the increased resolving power of long-read sequencing for complex regions by characterizing a cohort of 19 patients with ADPKD. Our approach provided high sensitivity in identifying PKD1 pathogenic variants, diagnosing 94.7% of the patients. We show that reliable screening of ADPKD patients in a single test by direct long-read sequencing is now possible, without interference from PKD1 homologous sequences, which are commonly introduced by residual amplification of PKD1 pseudogenes. This strategy can be implemented in diagnostics and is highly suitable to sequence and resolve complex genomic regions that are of clinical relevance.
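Because the cohort has 19 patients, the reported 94.7% diagnostic yield corresponds to 18 diagnosed patients. This count is inferred here from the percentage (17 or 19 patients would give about 89.5% or 100%) and is not stated in the abstract:

```latex
\frac{18}{19} \approx 0.947 = 94.7\%
```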
Affiliation(s)
- Daniel M Borràs
- GenomeScan B.V, Leiden, The Netherlands; Institut National de la Santé et de la Recherche Médicale (INSERM), Institut of Cardiovascular and Metabolic Disease, Toulouse, France; Université Toulouse III Paul-Sabatier, Toulouse, France
- Rolf H A M Vossen
- Leiden Genome Technology Center (LGTC), Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Michael Liem
- Leiden Genome Technology Center (LGTC), Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Henk P J Buermans
- Leiden Genome Technology Center (LGTC), Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Hans Dauwerse
- Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Dave van Heusden
- Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Ron T Gansevoort
- Department of Nephrology, University Hospital Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Johan T den Dunnen
- Leiden Genome Technology Center (LGTC), Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands; Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands; Department of Clinical Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Dorien J M Peters
- Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Monique Losekoot
- Department of Clinical Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Seyed Yahya Anvar
- Leiden Genome Technology Center (LGTC), Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands; Department of Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands