1
Srivastav SP, Feschotte C, Clark AG. Rapid evolution of piRNA clusters in the Drosophila melanogaster ovary. Genome Res 2024; 34:711-724. [PMID: 38749655 PMCID: PMC11216404 DOI: 10.1101/gr.278062.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 05/07/2024] [Indexed: 05/28/2024]
Abstract
The piRNA pathway is a highly conserved mechanism to repress transposable element (TE) activity in the animal germline via a specialized class of small RNAs called piwi-interacting RNAs (piRNAs). piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). Although the molecular processes by which piCs function are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC origin and evolution, we use a population genomic approach to compare piC activity and sequence composition across eight geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We perform annotations of ovary piCs and genome-wide TE content in each strain. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs. Most TEs inferred to be recently active show an enrichment of insertions into old and large piCs, consistent with the previously proposed "trap" model of piC evolution. In contrast, a small subset of active LTR families is enriched for the formation of new piCs, suggesting that these TEs have higher proclivity to form piCs. Thus, our findings uncover processes leading to the origin of piCs. We propose that piC evolution begins with the emergence of piRNAs from individual insertions of a few select TE families prone to seed new piCs that subsequently expand by accretion of insertions from most other TE families during evolution to form larger "trap" clusters. Our study shows that TEs themselves are the major force driving the rapid evolution of piCs.
Affiliation(s)
- Satyam P Srivastav
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Cédric Feschotte
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Andrew G Clark
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
2
Bylino OV, Ogienko AA, Batin MA, Georgiev PG, Omelina ES. Genetic, Environmental, and Stochastic Components of Lifespan Variability: The Drosophila Paradigm. Int J Mol Sci 2024; 25:4482. [PMID: 38674068 PMCID: PMC11050664 DOI: 10.3390/ijms25084482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 03/25/2024] [Accepted: 04/05/2024] [Indexed: 04/28/2024] [Open Access]
Abstract
Lifespan is a complex quantitative trait involving genetic and non-genetic factors as well as the peculiarities of ontogenesis. As with all quantitative traits, lifespan shows considerable variation within populations and between individuals. Drosophila, a favourite object of geneticists, has greatly advanced our understanding of how different forms of variability affect lifespan. This review considers the role of heritable genetic variability, phenotypic plasticity and stochastic variability in controlling lifespan in Drosophila melanogaster. We discuss the major historical milestones in the development of the genetic approach to study lifespan, the breeding of long-lived lines, advances in lifespan QTL mapping, the environmental factors that have the greatest influence on lifespan in laboratory-maintained flies, and the mechanisms by which individual development affects longevity. The interplay between approaches to study ageing and lifespan limitation will also be discussed. Particular attention will be paid to the interaction of different types of variability in the control of lifespan.
Affiliation(s)
- Oleg V. Bylino
  - Department of Regulation of Genetic Processes, Laboratory of Molecular Organization of the Genome, Institute of Gene Biology RAS, 119334 Moscow, Russia
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Institute of Gene Biology, Russian Academy of Sciences, 119334 Moscow, Russia
- Anna A. Ogienko
  - Department of Regulation of Genetic Processes, Institute of Molecular and Cellular Biology SB RAS, 630090 Novosibirsk, Russia
- Mikhail A. Batin
  - Open Longevity, 15260 Ventura Blvd., Sherman Oaks, Los Angeles, CA 91403, USA
- Pavel G. Georgiev
  - Department of Regulation of Genetic Processes, Laboratory of Molecular Organization of the Genome, Institute of Gene Biology RAS, 119334 Moscow, Russia
- Evgeniya S. Omelina
  - Department of Regulation of Genetic Processes, Institute of Molecular and Cellular Biology SB RAS, 630090 Novosibirsk, Russia
3
Carpinteyro-Ponce J, Machado CA. The Complex Landscape of Structural Divergence Between the Drosophila pseudoobscura and D. persimilis Genomes. Genome Biol Evol 2024; 16:evae047. [PMID: 38482945 PMCID: PMC10980976 DOI: 10.1093/gbe/evae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/07/2024] [Indexed: 04/01/2024] [Open Access]
Abstract
Structural genomic variants are key drivers of phenotypic evolution. They can span hundreds to millions of base pairs and can thus affect large numbers of genetic elements. Although structural variation is quite common within and between species, its characterization depends upon the quality of genome assemblies and the proportion of repetitive elements. Using new high-quality genome assemblies, we report a complex and previously hidden landscape of structural divergence between the genomes of Drosophila persimilis and D. pseudoobscura, two classic species in speciation research, and study the relationships among structural variants, transposable elements, and gene expression divergence. The new assemblies confirm the already known fixed inversion differences between these species. Consistent with previous studies showing higher levels of nucleotide divergence between fixed inversions relative to collinear regions of the genome, we also find a significant overrepresentation of INDELs inside the inversions. We find that transposable elements accumulate in regions with low levels of recombination, and spatial correlation analyses reveal a strong association between transposable elements and structural variants. We also report a strong association between differentially expressed (DE) genes and structural variants and an overrepresentation of DE genes inside the fixed chromosomal inversions that separate this species pair. Interestingly, species-specific structural variants are overrepresented in DE genes involved in neural development, spermatogenesis, and oocyte-to-embryo transition. Overall, our results highlight the association of transposable elements with structural variants and their importance in driving evolutionary divergence.
Affiliation(s)
- Carlos A Machado
  - Department of Biology, University of Maryland, College Park, MD, USA
4
Zheng Y, Shang X. SVvalidation: A long-read-based validation method for genomic structural variation. PLoS One 2024; 19:e0291741. [PMID: 38181020 PMCID: PMC10769053 DOI: 10.1371/journal.pone.0291741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 09/05/2023] [Indexed: 01/07/2024] [Open Access]
Abstract
Although various methods have been developed to detect structural variations (SVs) in genomic sequences, few are used to validate these results. Several commonly used SV callers produce many false positive SVs, and existing validation methods are not accurate enough. Therefore, a highly efficient and accurate validation method is essential. In response, we propose SVvalidation, a new method that uses long-read sequencing data to validate SVs with higher accuracy and efficiency. Compared to existing methods, SVvalidation performs better in validating SVs in repeat regions and can determine the homozygosity or heterozygosity of an SV. Additionally, SVvalidation offers the highest recall, precision, and F1-score (improving by 7-16%) across all datasets. Moreover, SVvalidation is suitable for different types of SVs. The program is available at https://github.com/nwpuzhengyan/SVvalidation.
Affiliation(s)
- Yan Zheng
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, China
5
Han R, Han L, Zhao X, Wang Q, Xia Y, Li H. Haplotype-resolved Genome of Sika Deer Reveals Allele-specific Gene Expression and Chromosome Evolution. Genomics Proteomics Bioinformatics 2023; 21:470-482. [PMID: 36395998 PMCID: PMC10787017 DOI: 10.1016/j.gpb.2022.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 10/24/2022] [Accepted: 11/07/2022] [Indexed: 11/16/2022]
Abstract
Despite the scientific and medicinal importance of diploid sika deer (Cervus nippon), its genome resources are limited and haplotype-resolved chromosome-scale assembly is urgently needed. To explore mechanisms underlying the expression patterns of the allele-specific genes in antlers and the chromosome evolution in Cervidae, we report, for the first time, a high-quality haplotype-resolved chromosome-scale genome of sika deer by integrating multiple sequencing strategies, which was anchored to 32 homologous groups with a pair of sex chromosomes (XY). Several expanded genes (RET, PPP2R1A, PPP2R1B, YWHAB, YWHAZ, and RPS6) and positively selected genes (eIF4E, Wnt8A, Wnt9B, BMP4, and TP53) were identified, which could contribute to rapid antler growth without carcinogenesis. A comprehensive and systematic genome-wide analysis of allele expression patterns revealed that most alleles were functionally equivalent in regulating rapid antler growth and inhibiting oncogenesis. Comparative genomic analysis revealed that chromosome fission might occur during the divergence of sika deer and red deer (Cervus elaphus), and the olfactory sensation of sika deer might be more powerful than that of red deer. Obvious inversion regions containing olfactory receptor genes were also identified, which arose since the divergence. In conclusion, the high-quality allele-aware reference genome provides valuable resources for further illustration of the unique biological characteristics of antler, chromosome evolution, and multi-omics research of cervid animals.
Affiliation(s)
- Ruobing Han
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin 150040, China
- Lei Han
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin 150040, China
- Xunwu Zhao
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin 150040, China
- Qianghui Wang
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin 150040, China
- Yanling Xia
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin 150040, China
- Heping Li
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin 150040, China
6
Srivastav S, Feschotte C, Clark AG. Rapid evolution of piRNA clusters in the Drosophila melanogaster ovary. bioRxiv 2023:2023.05.08.539910. [PMID: 37214865 PMCID: PMC10197564 DOI: 10.1101/2023.05.08.539910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Animal genomes are parasitized by a horde of transposable elements (TEs) whose mutagenic activity can have catastrophic consequences. The piRNA pathway is a conserved mechanism to repress TE activity in the germline via a specialized class of small RNAs associated with effector Piwi proteins called piwi-associated RNAs (piRNAs). piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). While piCs are generally enriched for TE sequences and the molecular processes by which they are transcribed and regulated are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC evolution, we use a population genomics approach to compare piC activity and sequence composition across 8 geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We perform extensive annotations of ovary piCs and TE content in each strain and test predictions of two proposed models of piC evolution. The 'de novo' model posits that individual TE insertions can spontaneously attain the status of a small piC to generate piRNAs silencing the entire TE family. The 'trap' model envisions large and evolutionarily stable genomic clusters where TEs tend to accumulate and which serve as a long-term "memory" of ancient TE invasions, producing a great variety of piRNAs that protect against related TEs entering the genome. It remains unclear which model best describes the evolution of piCs. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs in natural populations. Most TE families inferred to be recently or currently active show an enrichment of strain-specific insertions into large piCs, consistent with the trap model. By contrast, only a small subset of active LTR retrotransposon families is enriched for the formation of strain-specific piCs, suggesting that these families have an inherent proclivity to form de novo piCs. Thus, our findings support aspects of both 'de novo' and 'trap' models of piC evolution. We propose that these two models represent two extreme stages along an evolutionary continuum, which begins with the emergence of piCs de novo from a few specific LTR retrotransposon insertions that subsequently expand by accretion of other TE insertions during evolution to form larger 'trap' clusters. Our study shows that piCs are evolutionarily labile and that TEs themselves are the major force driving the formation and evolution of piCs.
Affiliation(s)
- Satyam Srivastav
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
- Cédric Feschotte
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
- Andrew G. Clark
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
7
Zheng Y, Shang X. SVcnn: an accurate deep learning-based method for detecting structural variation based on long-read data. BMC Bioinformatics 2023; 24:213. [PMID: 37221476 DOI: 10.1186/s12859-023-05324-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/22/2023] [Accepted: 05/06/2023] [Indexed: 05/25/2023] [Open Access]
Abstract
BACKGROUND: Structural variations (SVs) refer to variations in an organism's chromosome structure that exceed a length of 50 base pairs. They play a significant role in genetic diseases and evolutionary mechanisms. While long-read sequencing technology has led to the development of numerous SV callers, their performance has been suboptimal. Researchers have observed that current SV callers often miss true SVs and generate many false SVs, especially in repetitive regions and areas with multi-allelic SVs. These errors are due to the messy alignments of long-read data, which are affected by their high error rate. Therefore, there is a need for a more accurate SV caller. RESULTS: We propose SVcnn, a more accurate deep learning-based method for detecting SVs using long-read sequencing data. We run SVcnn and other SV callers on three real datasets and find that SVcnn improves the F1-score by 2-8% compared with the second-best method when the read depth is greater than 5×. More importantly, SVcnn performs better at detecting multi-allelic SVs. CONCLUSIONS: SVcnn is an accurate deep learning-based method to detect SVs. The program is available at https://github.com/nwpuzhengyan/SVcnn.
Affiliation(s)
- Yan Zheng
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China
8
Zheng Y, Shang X, Sung WK. SVsearcher: A more accurate structural variation detection method in long read data. Comput Biol Med 2023; 158:106843. [PMID: 37019014 DOI: 10.1016/j.compbiomed.2023.106843] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/24/2023] [Revised: 03/03/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
Structural variations (SVs) represent genomic rearrangements (such as deletions, insertions, and inversions) whose sizes are larger than 50 bp. They play important roles in genetic diseases and evolutionary mechanisms. Owing to advances in long-read sequencing (i.e., PacBio long-read sequencing and Oxford Nanopore (ONT) long-read sequencing), we can call SVs accurately. However, for ONT long reads, we observe that existing long-read SV callers miss many true SVs and call many false SVs in repetitive regions and in regions with multi-allelic SVs. Those errors are caused by messy alignments of ONT reads due to their high error rate. Hence, we propose a novel method, SVsearcher, to solve these issues. We run SVsearcher and other callers on three real datasets and find that SVsearcher improves the F1 score by approximately 10% for high coverage (50×) datasets and more than 25% for low coverage (10×) datasets. More importantly, SVsearcher can identify 81.7%-91.8% of multi-allelic SVs while existing methods only identify 13.2% (Sniffles)-54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.
Affiliation(s)
- Yan Zheng
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, 710072 Xi'an, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, 710072 Xi'an, China
- Wing-Kin Sung
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China; Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China; Laboratory of Computational Genomics, Li Ka Shing Institute of Health Science, The Chinese University of Hong Kong, Hong Kong, China
9
Jiang T, Liu S, Cao S, Wang Y. Structural Variant Detection from Long-Read Sequencing Data with cuteSV. Methods Mol Biol 2022; 2493:137-151. [PMID: 35751813 DOI: 10.1007/978-1-0716-2293-3_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Structural Variation (SV) represents genomic rearrangements and is strongly associated with human health and disease. Recently, long-read sequencing technologies have provided the opportunity for more comprehensive identification of SVs at ever-higher resolution. However, given high sequencing error rates and the complexity of SVs, many technical issues remain to be settled. Hence, we propose cuteSV, a sensitive, fast, and scalable alignment-based SV detection approach for comprehensive discovery of diverse SVs. The benchmarking results indicate that cuteSV is suitable for large-scale genome projects owing to its excellent SV yields and ultra-fast speed. Here, we explain the overall framework and provide a detailed outline for users to apply cuteSV correctly and comprehensively. More details are available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
  - Harbin Institute of Technology, Harbin, Heilongjiang, China
- Shiqi Liu
  - Harbin Institute of Technology, Harbin, Heilongjiang, China
- Shuqi Cao
  - Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yadong Wang
  - Harbin Institute of Technology, Harbin, Heilongjiang, China
10
Dosage sensitivity and exon shuffling shape the landscape of polymorphic duplicates in Drosophila and humans. Nat Ecol Evol 2021; 6:273-287. [PMID: 34969986 DOI: 10.1038/s41559-021-01614-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/04/2021] [Accepted: 11/10/2021] [Indexed: 11/08/2022]
Abstract
Despite polymorphic duplicate genes' importance for the early stages of duplicate gene evolution, they are less studied than old gene duplicates. Two essential questions thus remain poorly addressed: how does dosage sensitivity, imposed by stoichiometry in protein complexes or by X chromosome dosage compensation, affect the emergence of complete duplicate genes? Do introns facilitate intergenic and intragenic chimaerism as predicted by the theory of exon shuffling? Here, we analysed new data for Drosophila and public data for humans, to characterize polymorphic duplicate genes with respect to dosage, exon-intron structures and allele frequencies. We found that complete duplicate genes are under dosage constraint induced by protein stoichiometry but potentially tolerated by X chromosome dosage compensation. We also found that in the intron-rich human genome, gene fusions and intragenic duplications extensively use intronic breakpoints generating in-frame proteins, in accordance with the theory of exon shuffling. Finally, we found that only a small proportion of complete or partial duplicates are at high frequencies, indicating the deleterious nature of dosage or gene structural changes. Altogether, we demonstrate how mechanistic factors including dosage sensitivity and exon-intron structure shape the short-term functional consequences of gene duplication.
11
Xu X, Wang BS, Yu H. Intraspecies Genomic Divergence of a Fig Wasp Species Is Due to Geographical Barrier and Adaptation. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.764828] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022] [Open Access]
Abstract
Understanding how intraspecies divergence results in speciation has great importance for our knowledge of evolutionary biology. Here we applied population genomics approaches to a fig wasp species (Valisia javana complex sp 1) to reveal its intraspecies differentiation and the underlying evolutionary dynamics. With re-sequencing data, we prove that the Hainan Island population (DA) of sp1 differs genetically from the continental ones, and then reveal the different divergence patterns. DA has reduced SNP diversity but a higher proportion of population-specific structural variations (SVs), implying restricted gene exchange. Based on SNPs, 32 differentiated islands containing 204 genes were detected, along with 1,532 population-specific SVs of DA overlapping 4,141 genes. The gene ontology (GO) enrichment analysis performed on differentiated islands linked to three significant GO terms on a basic metabolism process, with most of the genes failing to enrich. In contrast, population-specific SVs contributed more to the adaptation than the SNPs by linking to 59 terms that are crucial for wasp speciation, such as host reorganization and development regulation. In addition, the generalized dissimilarity modeling confirms the importance of environmental differences for the genetic divergence within sp1. Hence, we assume that the genetic divergence between DA and the continent is due not only to the strait as a geographic barrier but also to adaptation. We reconstruct the demographic history within sp1. DA shares a similar population history with the nearby continental population, suggesting an incomplete divergence. In summary, our results reveal how geographic barriers and adaptation both influence genetic divergence at the population level, thereby increasing our knowledge of the potential speciation of non-model organisms.
12
Liu DX, Rajaby R, Wei LL, Zhang L, Yang ZQ, Yang QY, Sung WK. Calling large indels in 1047 Arabidopsis with IndelEnsembler. Nucleic Acids Res 2021; 49:10879-10894. [PMID: 34643730 PMCID: PMC8565333 DOI: 10.1093/nar/gkab904] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/24/2021] [Revised: 09/01/2021] [Accepted: 09/28/2021] [Indexed: 01/23/2023] [Open Access]
Abstract
Large indels greatly impact the observable phenotypes in different organisms including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in 1047 Arabidopsis whole-genome sequencing data. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate compared with the previous dataset of AthCNV (1). We captured nearly twice as many ground truth deletions and on average 27% more ground truth duplications compared with AthCNV, though our dataset contains fewer large indels than AthCNV. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The Neighbor joining (NJ) tree constructed based on IndelEnsembler's deletions clearly divided the geographic subgroups of 1047 Arabidopsis. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms. Some of them could affect trait performance. For instance, using deletion-based genome-wide association study (DEL-GWAS), the accessions containing a 182-bp deletion in AT1G11520 had delayed flowering time and all accessions in north Sweden had the 182-bp deletion. We also found that the accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. By SNP-GWAS, surrounding SNPs of these two deletions do not correlate with flowering time. This example demonstrated that existing large indel datasets miss phenotypic variations and that our large indel dataset fills the gap.
Affiliation(s)
- Dong-Xu Liu
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ramesh Rajaby
  - School of Computing, National University of Singapore, 117417 Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 117456, Singapore
- Lu-Lu Wei
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Lei Zhang
  - Precision Medical Laboratory, Wuhan Children's Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science & Technology, Wuhan 430016, China
- Zhi-Quan Yang
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Qing-Yong Yang
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; School of Computing, National University of Singapore, 117417 Singapore
- Wing-Kin Sung
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; School of Computing, National University of Singapore, 117417 Singapore; Genome Institute of Singapore, Genome, 138672 Singapore
13
Ma X, Fan J, Wu Y, Zhao S, Zheng X, Sun C, Tan L. Whole-genome de novo assemblies reveal extensive structural variations and dynamic organelle-to-nucleus DNA transfers in African and Asian rice. Plant J 2020; 104:596-612. [PMID: 32748498 PMCID: PMC7693357 DOI: 10.1111/tpj.14946] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/10/2020] [Revised: 07/17/2020] [Accepted: 07/22/2020] [Indexed: 05/05/2023]
Abstract
Asian cultivated rice (Oryza sativa) and African cultivated rice (Oryza glaberrima) originated from the wild rice species Oryza rufipogon and Oryza barthii, respectively. The genomes of both cultivated species have undergone profound changes during domestication. Whole-genome de novo assemblies of O. barthii, O. glaberrima, O. rufipogon and Oryza nivara, produced using PacBio single-molecule real-time (SMRT) and next-generation sequencing (NGS) technologies, showed that Gypsy-like retrotransposons are the major contributors to genome size variation in African and Asian rice. Through the detection of genome-wide structural variations (SVs), we observed that besides 28 shared SV hot spots, another 67 hot spots existed in either the Asian or African rice genomes. Based on gene annotation information of the SVs, we established that organelle-to-nucleus DNA transfers resulted in numerous SVs that participated in the nuclear genome divergence of rice species and subspecies. We detected 52 giant nuclear integrants of organelle DNA (NORGs, defined as >10 kb) in six Oryza AA genomes. In addition, we developed an effective method to genotype giant NORGs, based on genome assembly, and first showed the dynamic change in the distribution of giant NORGs in rice natural population. Interestingly, 16 highly differentiated giant NORGs tended to accumulate in natural populations of Asian rice from higher latitude regions, grown at lower temperatures and light intensities. Our study provides new insight into the genome divergence of African and Asian rice, and establishes that organelle-to-nucleus DNA transfers, as potentially powerful contributors to environmental adaptation during rice evolution, play a major role in producing SVs in rice genomes.
Affiliation(s)
- Xin Ma
- MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, China
| | - Jinjian Fan
- MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, China
| | - Yongzhen Wu
- MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
| | - Shuangshuang Zhao
- MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
| | - Xu Zheng
- MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
| | - Chuanqing Sun
- MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- State Key Laboratory of Plant Physiology and Biochemistry, China Agricultural University, Beijing 100193, China
| | - Lubin Tan
- MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, China
| |
|
14
|
Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 2020; 21:189. [PMID: 32746918 PMCID: PMC7477834 DOI: 10.1186/s13059-020-02107-y] [Citation(s) in RCA: 147] [Impact Index Per Article: 36.8] [Received: 12/11/2019] [Accepted: 07/14/2020] [Indexed: 01/01/2023] Open
Abstract
Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yongzhuang Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yue Jiang
- Nebula Genomics, Harbin, 150030, Heilongjiang, China
| | - Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China
| | - Yan Gao
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Zhe Cui
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yadong Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
| |
|
15
|
Wen Y, He H, Liu H, An Q, Wang D, Ding X, Shi Q, Feng Y, Wang E, Lei C, Zhang Z, Huang Y. Copy number variation of the USP16 gene and its association with milk traits in Chinese Holstein cattle. Anim Biotechnol 2020; 33:98-103. [PMID: 32646283 DOI: 10.1080/10495398.2020.1777148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
Copy number variations (CNVs), like single nucleotide polymorphisms (SNPs) and insertions-deletions (InDels), are regarded as genetic variations in many species. A CNV is defined as a change in the length of a DNA segment relative to the reference genome, including gains or losses ranging from 50 bp to several megabases. The USP16 gene has diverse functions, such as regulating the cell cycle, DNA damage, histone H2A deubiquitination and mitotic nuclear division. To analyze the relationship between CNV of the USP16 gene and milk traits in Chinese Holstein cattle, we used qPCR to detect the CNV in Chinese Holstein individuals (n = 180). The results showed that USP16 CNV had a significant effect on daily milk yield and fat percentage (p < 0.05). The gain type was advantageous for daily milk yield and the loss type was advantageous for fat percentage. Therefore, CNV of the USP16 gene is an important factor in milk traits in Chinese Holstein cattle. It may also be used as a molecular marker for assisted selection of milk traits in Chinese Holstein cattle, which provides a theoretical basis for the genetic improvement of cow breeds in China.
Affiliation(s)
- Yifan Wen
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, People's Republic of China
| | - Hua He
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, People's Republic of China
| | - Hongbing Liu
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, People's Republic of China
| | - Qingming An
- College of Agriculture and Forestry Engineering, Tongren University, Tongren, Guizhou, People's Republic of China
| | - Dahui Wang
- College of Agriculture and Forestry Engineering, Tongren University, Tongren, Guizhou, People's Republic of China
| | - Xiaoting Ding
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, People's Republic of China
| | - Qiaoting Shi
- Henan Academy of Agricultural Sciences, Institute of Animal Husbandry and Veterinary Science, Zhengzhou, Henan, People's Republic of China
| | - Yajie Feng
- Henan Academy of Agricultural Sciences, Institute of Animal Husbandry and Veterinary Science, Zhengzhou, Henan, People's Republic of China
| | - Eryao Wang
- Henan Academy of Agricultural Sciences, Institute of Animal Husbandry and Veterinary Science, Zhengzhou, Henan, People's Republic of China
| | - Chuzhao Lei
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, People's Republic of China
| | - Zijing Zhang
- Henan Academy of Agricultural Sciences, Institute of Animal Husbandry and Veterinary Science, Zhengzhou, Henan, People's Republic of China
| | - Yongzhen Huang
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, People's Republic of China
| |
|
16
|
Luan MW, Zhang XM, Zhu ZB, Chen Y, Xie SQ. Evaluating Structural Variation Detection Tools for Long-Read Sequencing Datasets in Saccharomyces cerevisiae. Front Genet 2020; 11:159. [PMID: 32211024 PMCID: PMC7075250 DOI: 10.3389/fgene.2020.00159] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/07/2019] [Accepted: 02/11/2020] [Indexed: 01/01/2023] Open
Abstract
Structural variation (SV) represents a major form of genetic variation that contributes to polymorphic variation, human diseases, and phenotypes in many organisms. Long-read sequencing has been successfully used to identify novel and complex SVs. However, a comparison of SV detection tools for long-read sequencing datasets has not been reported. Therefore, we developed an analysis workflow that combined two alignment tools (NGMLR and minimap2) and five callers (Sniffles, Picky, smartie-sv, PBHoney, and NanoSV) to evaluate SV detection in six datasets of Saccharomyces cerevisiae. The accuracy of SV regions was validated by re-aligning raw reads across diverse alignment tools, SV callers, experimental conditions, and sequencing platforms. The results showed that the difference in SV detection between NGMLR and minimap2 was not significant when using the same caller. PBHoney had the highest average accuracy (89.04%) and Picky had the lowest (35.85%). The accuracy of NanoSV, Sniffles, and smartie-sv was 68.67%, 60.47%, and 57.67%, respectively. In addition, smartie-sv and NanoSV detected the most and the fewest SVs, respectively, and significantly more SVs were detected from the PacBio sequencing platform than from ONT (p = 0.000173).
Affiliation(s)
- Mei-Wei Luan
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Xiao-Ming Zhang
- College of Grassland, Resources and Environment, Inner Mongolia Agricultural University, Huhhot, China
| | - Zi-Bin Zhu
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Ying Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Shang-Qian Xie
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| |
|
17
|
Frochaux MV, Bou Sleiman M, Gardeux V, Dainese R, Hollis B, Litovchenko M, Braman VS, Andreani T, Osman D, Deplancke B. cis-regulatory variation modulates susceptibility to enteric infection in the Drosophila genetic reference panel. Genome Biol 2020; 21:6. [PMID: 31948474 PMCID: PMC6966807 DOI: 10.1186/s13059-019-1912-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/15/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Resistance to enteric pathogens is a complex trait at the crossroads of multiple biological processes. We have previously shown in the Drosophila Genetic Reference Panel (DGRP) that resistance to infection is highly heritable, but our understanding of how the effects of genetic variants affect different molecular mechanisms to determine gut immunocompetence is still limited. RESULTS To address this, we perform a systems genetics analysis of the gut transcriptomes from 38 DGRP lines that were orally infected with Pseudomonas entomophila. We identify a large number of condition-specific, expression quantitative trait loci (local-eQTLs) with infection-specific ones located in regions enriched for FOX transcription factor motifs. By assessing the allelic imbalance in the transcriptomes of 19 F1 hybrid lines from a large round robin design, we independently attribute a robust cis-regulatory effect to only 10% of these detected local-eQTLs. However, additional analyses indicate that many local-eQTLs may act in trans instead. Comparison of the transcriptomes of DGRP lines that were either susceptible or resistant to Pseudomonas entomophila infection reveals nutcracker as the only differentially expressed gene. Interestingly, we find that nutcracker is linked to infection-specific eQTLs that correlate with its expression level and to enteric infection susceptibility. Further regulatory analysis reveals one particular eQTL that significantly decreases the binding affinity for the repressor Broad, driving differential allele-specific nutcracker expression. CONCLUSIONS Our collective findings point to a large number of infection-specific cis- and trans-acting eQTLs in the DGRP, including one common non-coding variant that lowers enteric infection susceptibility.
Affiliation(s)
- Michael V. Frochaux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Maroun Bou Sleiman
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Laboratory of Integrative Systems Physiology, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Vincent Gardeux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Riccardo Dainese
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Brian Hollis
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Department of Biological Sciences, University of South Carolina, Columbia, South Carolina USA
| | - Maria Litovchenko
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Virginie S. Braman
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Tommaso Andreani
- Computational Biology and Data Mining Group, Institute of Molecular Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany
| | - Dani Osman
- Faculty of Sciences III and Azm Center for Research in Biotechnology and its Applications, LBA3B, EDST, Lebanese University, Tripoli, 1300 Lebanon
| | - Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| |
|
18
|
A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data. G3: Genes, Genomes, Genetics 2019; 9:3575-3582. [PMID: 31455677 PMCID: PMC6829143 DOI: 10.1534/g3.119.400596] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/13/2023]
Abstract
Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods of coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data.
|
19
|
Ghavi-Helm Y, Jankowski A, Meiers S, Viales RR, Korbel JO, Furlong EEM. Highly rearranged chromosomes reveal uncoupling between genome topology and gene expression. Nat Genet 2019; 51:1272-1282. [PMID: 31308546 PMCID: PMC7116017 DOI: 10.1038/s41588-019-0462-3] [Citation(s) in RCA: 202] [Impact Index Per Article: 40.4] [Received: 12/05/2018] [Accepted: 06/05/2019] [Indexed: 12/21/2022]
Abstract
Chromatin topology is intricately linked to gene expression, yet its functional requirement remains unclear. Here, we comprehensively assessed the interplay between genome topology and gene expression using highly rearranged chromosomes (balancers) spanning ~75% of the Drosophila genome. Using transheterozygote (balancer/wild-type) embryos, we measured allele-specific changes in topology and gene expression in cis, whilst minimizing trans effects. Through genome sequencing, we resolved eight large nested inversions, smaller inversions, duplications, and thousands of deletions. These extensive rearrangements caused many changes to chromatin topology, including long-range loops, TADs and promoter interactions, yet these are not predictive of changes in expression. Gene expression is generally not altered around inversion breakpoints, indicating that mis-appropriate enhancer-promoter activation is a rare event. Similarly, shuffling or fusing TADs, changing intra-TAD connections and disrupting long-range inter-TAD loops does not alter expression for the majority of genes. Our results suggest that properties other than chromatin topology ensure productive enhancer-promoter interactions.
Affiliation(s)
- Yad Ghavi-Helm
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany; Institut de Génomique Fonctionnelle de Lyon, Univ Lyon, CNRS UMR 5242, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
| | | | - Sascha Meiers
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Rebecca R Viales
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
| | - Eileen E M Furlong
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
| |
|
20
|
Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 2019; 20:117. [PMID: 31159850 PMCID: PMC6547561 DOI: 10.1186/s13059-019-1720-5] [Citation(s) in RCA: 236] [Impact Index Per Article: 47.2] [Received: 10/15/2018] [Accepted: 05/20/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SV with high precision and high recall. RESULTS We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potentially good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham perform better in the deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms. CONCLUSION These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Xiaoxi Liu
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Chikashi Terao
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Michiaki Kubo
- RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| |
|
21
|
Lin YL, Gokcumen O. Fine-Scale Characterization of Genomic Structural Variation in the Human Genome Reveals Adaptive and Biomedically Relevant Hotspots. Genome Biol Evol 2019; 11:1136-1151. [PMID: 30887040 PMCID: PMC6475128 DOI: 10.1093/gbe/evz058] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Accepted: 03/16/2019] [Indexed: 12/25/2022] Open
Abstract
Genomic structural variants (SVs) are distributed nonrandomly across the human genome. The "hotspots" of SVs have been implicated in evolutionary innovations, as well as medical conditions. However, the evolutionary and biomedical features of these hotspots remain incompletely understood. Here, we analyzed data from 2,504 genomes to construct a refined map of 1,148 SV hotspots in human genomes. We confirmed that segmental duplication-related nonallelic homologous recombination is an important mechanistic driver of SV hotspot formation. However, to our surprise, we also found that a majority of SVs in hotspots do not form through such recombination-based mechanisms, suggesting diverse mechanistic and selective forces shaping hotspots. Indeed, our evolutionary analyses showed that the majority of SV hotspots are within gene-poor regions and evolve under relaxed negative selection or neutrality. However, we still found a small subset of SV hotspots harboring genes that are enriched for anthropologically crucial functions and evolve under geography-specific and balancing adaptive forces. These include two independent hotspots on different chromosomes affecting alpha and beta hemoglobin gene clusters. Biomedically, we found that the SV hotspots coincide with breakpoints of clinically relevant, large de novo SVs, significantly more often than genome-wide expectations. For example, we showed that the breakpoints of multiple large SVs, which lead to idiopathic short stature, coincide with SV hotspots. Therefore, the mutational instability in SV hotspots likely enables chromosomal breaks that lead to pathogenic structural variation formations. Overall, our study contributes to a better understanding of the mutational and adaptive landscape of the genome.
Affiliation(s)
- Yen-Lung Lin
- Department of Biological Sciences, University at Buffalo
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo
| |
|
22
|
Li J, Jiang L, Wu CI, Lu X, Fang S, Ting CT. Small Segmental Duplications in Drosophila-High Rate of Emergence and Elimination. Genome Biol Evol 2019; 11:486-496. [PMID: 30689862 PMCID: PMC6380325 DOI: 10.1093/gbe/evz011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 01/19/2019] [Indexed: 12/12/2022] Open
Abstract
Segmental duplications are an important class of mutations. Because a large proportion of segmental duplications may often be strongly deleterious, high frequency or fixed segmental duplications may represent only a tiny fraction of the mutational input. To understand the emergence and elimination of segmental duplications, we survey polymorphic duplications, including tandem and interspersed duplications, in natural populations of Drosophila using haploid embryo genomes. As haploid embryo genomes are not expected to be heterozygous, sites of heterozygosity (referred to as pseudoheterozygous sites [PHS]) likely represent recent duplications that have acquired new mutations. Among the 29 genomes of Drosophila melanogaster, we identify 2,282 polymorphic PHS duplications (linked PHS regions) in total or 154 PHS duplications per genome. Most PHS duplications are small (83.4% < 500 bp), Drosophila melanogaster lineage specific, and strain specific (72.6% singletons). The excess of the observed singleton PHS duplications deviates significantly from the neutral expectation, suggesting that most PHS duplications are strongly deleterious. In addition, these small segmental duplications are not evenly distributed across genomic regions and are less common in noncoding functional element regions. The underrepresentation in RNA polymerase II binding sites and regions with active histone modifications is correlated with the ages of duplications. In conclusion, small segmental duplications occur frequently in Drosophila but are rapidly eliminated by natural selection.
Affiliation(s)
- Juan Li
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China; Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan
| | - Lan Jiang
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China
| | - Chung-I Wu
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; Department of Ecology and Evolution, University of Chicago; School of Life Science, Sun Yat-Sen University, Guangzhou, China
| | - Xuemei Lu
- Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China
| | - Shu Fang
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
| | - Chau-Ti Ting
- Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan; Department of Life Science, Center for Biotechnology, Center for Developmental Biology and Regenerative Medicine, National Taiwan University; Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
| |
|
23
|
Chain FJJ, Flynn JM, Bull JK, Cristescu ME. Accelerated rates of large-scale mutations in the presence of copper and nickel. Genome Res 2019; 29:64-73. [PMID: 30487211 PMCID: PMC6314161 DOI: 10.1101/gr.234724.118] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/15/2018] [Accepted: 11/22/2018] [Indexed: 12/13/2022]
Abstract
Mutation rate variation has been under intense investigation for decades. Despite these efforts, little is known about the extent to which environmental stressors accelerate mutation rates and influence the genetic load of populations. Moreover, most studies on stressors have focused on unicellular organisms and point mutations rather than large-scale deletions and duplications (copy number variations [CNVs]). We estimated mutation rates in Daphnia pulex exposed to low levels of environmental stressors as well as the effect of selection on de novo mutations. We conducted a mutation accumulation (MA) experiment in which selection was minimized, coupled with an experiment in which a population was propagated under competitive conditions in a benign environment. After an average of 103 generations of MA propagation, we sequenced 60 genomes and found significantly accelerated rates of deletions and duplications in MA lines exposed to ecologically relevant concentrations of metals. Whereas control lines had gene deletion and duplication rates comparable to other multicellular eukaryotes (1.8 × 10⁻⁶ per gene per generation), the presence of nickel and copper increased these rates fourfold. The realized mutation rate under selection was reduced to 0.4× that of control MA lines, providing evidence that CNVs contribute to mutational load. Our CNV breakpoint analysis revealed that nonhomologous recombination associated with regions of DNA fragility is the primary source of CNVs, plausibly linking metal-induced DNA strand breaks with higher CNV rates. Our findings suggest that environmental stress, in particular multiple stressors, can have profound effects on large-scale mutation rates and mutational load of multicellular organisms.
Affiliation(s)
- Frédéric J J Chain
- Department of Biology, McGill University, Montréal, Québec H3A 1B1, Canada
| | - Jullien M Flynn
- Department of Biology, McGill University, Montréal, Québec H3A 1B1, Canada
| | - James K Bull
- Department of Biology, McGill University, Montréal, Québec H3A 1B1, Canada
| | | |
|
24
|
Wang GD, Shao XJ, Bai B, Wang J, Wang X, Cao X, Liu YH, Wang X, Yin TT, Zhang SJ, Lu Y, Wang Z, Wang L, Zhao W, Zhang B, Ruan J, Zhang YP. Structural variation during dog domestication: insights from gray wolf and dhole genomes. Natl Sci Rev 2019; 6:110-122. [PMID: 34694297 PMCID: PMC8291444 DOI: 10.1093/nsr/nwy076] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 03/21/2018] [Revised: 06/27/2018] [Accepted: 07/17/2018] [Indexed: 12/11/2022] Open
Abstract
Several processes like phenotypic evolution, disease susceptibility and environmental adaptations, which fashion the domestication of animals, are largely attributable to structural variations (SVs) in the genome. Here, we present high-quality draft genomes of the gray wolf (Canis lupus) and dhole (Cuon alpinus) with scaffold N50 of 6.04 Mb and 3.96 Mb, respectively. Sequence alignment comprising genomes of three canid species reveals SVs specific to the dog, particularly 16 315 insertions, 2565 deletions, 443 repeats, 16 inversions and 15 translocations. Functional annotation of the dog SVs associated with genes indicates their enrichments in energy metabolisms, neurological processes and immune systems. Interestingly, we identify and verify at population level an insertion fully covering a copy of the AKR1B1 (Aldo-Keto Reductase Family 1 Member B) transcript. Transcriptome analysis reveals a high level of expression of the new AKR1B1 copy in the small intestine and liver, implying an increase in de novo fatty acid synthesis and antioxidant ability in dog compared to gray wolf, likely in response to dietary shifts during the agricultural revolution. For the first time, we report a comprehensive analysis of the evolutionary dynamics of SVs during the domestication step of dogs. Our findings demonstrate that retroposition can birth new genes to facilitate domestication, and affirm the importance of large-scale genomic variants in domestication studies.
Affiliation(s)
- Guo-Dong Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Xiu-Juan Shao
- Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Bing Bai
- Medical Faculty, Kunming University of Science and Technology, Kunming 650504, China
- Department of Pediatrics, the First People's Hospital of Yunnan Province, Kunming 650032, China
- Junlong Wang
- College of Pharmacology, Soochow University, Suzhou 215123, China
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Xiaobo Wang
- Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Xue Cao
- Department of Laboratory Animal Science, Kunming Medical University, Kunming 650500, China
- Yan-Hu Liu
- Laboratory for Conservation and Utilization of Bio-Resources and Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming 650091, China
- Xuan Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China
- Ting-Ting Yin
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China
- Shao-Jie Zhang
- Laboratory for Conservation and Utilization of Bio-Resources and Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming 650091, China
- Yan Lu
- Beijing Zoo, Beijing 100044, China
- Lu Wang
- Laboratory for Conservation and Utilization of Bio-Resources and Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming 650091, China
- Wenming Zhao
- Core Genomic Facility, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Bing Zhang
- Core Genomic Facility, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Jue Ruan
- Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Ya-Ping Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
|
25
|
Guggisberg A, Liu X, Suter L, Mansion G, Fischer MC, Fior S, Roumet M, Kretzschmar R, Koch MA, Widmer A. The genomic basis of adaptation to calcareous and siliceous soils in Arabidopsis lyrata. Mol Ecol 2018; 27:5088-5103. [PMID: 30411828 DOI: 10.1111/mec.14930] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/11/2018] [Revised: 10/03/2018] [Accepted: 10/04/2018] [Indexed: 12/27/2022]
Abstract
Edaphic conditions are important determinants of plant fitness. While much has been learnt in recent years about plant adaptation to heavy metal contaminated soils, the genomic basis underlying adaptation to calcareous and siliceous substrates remains largely unknown. We performed a reciprocal germination experiment and whole-genome resequencing in natural calcareous and siliceous populations of diploid Arabidopsis lyrata to test for edaphic adaptation and detect signatures of selection at loci associated with soil-mediated divergence. In parallel, genome scans on respective diploid ecotypes from the Arabidopsis arenosa species complex were undertaken, to search for shared patterns of adaptive genetic divergence. Soil ecotypes of A. lyrata display significant genotype-by-treatment responses for seed germination. Sequence (SNPs) and copy-number variants (CNVs) point towards loci involved in ion transport as the main targets of adaptive genetic divergence. Two genes exhibiting high differentiation among soil types in A. lyrata further share trans-specific single nucleotide polymorphisms with A. arenosa. This work applies experimental and genomic approaches to study edaphic adaptation in A. lyrata and suggests that physiological response to elemental toxicity and deficiency underlies the evolution of calcareous and siliceous ecotypes. The discovery of shared adaptive variation between sister species indicates that ancient polymorphisms contribute to adaptive evolution.
Affiliation(s)
- Xuanyu Liu
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Léonie Suter
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Guilhem Mansion
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Martin C Fischer
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Simone Fior
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Marie Roumet
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Ruben Kretzschmar
- Institute of Biogeochemistry and Pollutant Dynamics, ETH Zurich, Zurich, Switzerland
- Marcus A Koch
- Centre for Organismal Studies Heidelberg, Heidelberg University, Heidelberg, Germany
- Alex Widmer
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
|
26
|
Structural Variants and Selective Sweep Foci Contribute to Insecticide Resistance in the Drosophila Genetic Reference Panel. G3-GENES GENOMES GENETICS 2018; 8:3489-3497. [PMID: 30190421 PMCID: PMC6222580 DOI: 10.1534/g3.118.200619] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 02/05/2023]
Abstract
Patterns of nucleotide polymorphism within populations of Drosophila melanogaster suggest that insecticides have been the selective agents driving the strongest recent bouts of positive selection. However, there is a need to explicitly link selective sweeps to the particular insecticide phenotypes that could plausibly account for the drastic selective responses that are observed in these non-target insects. Here, we screen the Drosophila Genetic Reference Panel with two common insecticides: malathion (an organophosphate) and permethrin (a pyrethroid). Genome-wide association studies map survival on malathion to two of the largest sweeps in the D. melanogaster genome: Ace and Cyp6g1. Malathion survivorship also correlates with lines that have high levels of Cyp12d1, Jheh1 and Jheh2 transcript abundance. Permethrin phenotypes map to the largest cluster of P450 genes in the Drosophila genome; however, in contrast to a selective sweep driven by insecticide use, the derived allele seems to be associated with susceptibility. These results underscore previous findings that highlight the importance of structural variation to insecticide phenotypes: Cyp6g1 exhibits copy number variation and transposable element insertions, Cyp12d1 is tandemly duplicated, the Jheh loci are associated with a Bari1 transposable element insertion, and a Cyp6a17 deletion is associated with susceptibility.
|
27
|
Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods 2018; 15:461-468. [PMID: 29713083 PMCID: PMC5990442 DOI: 10.1038/s41592-018-0001-7] [Citation(s) in RCA: 855] [Impact Index Per Article: 142.5] [Received: 08/11/2017] [Accepted: 03/16/2018] [Indexed: 02/08/2023]
Abstract
Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr ) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles ) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.
Affiliation(s)
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Philipp Rescheneder
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
- Moritz Smolka
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
- Han Fang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Maria Nattestad
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
- Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
- Michael C Schatz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
|
28
|
Levin TC, Malik HS. Rapidly Evolving Toll-3/4 Genes Encode Male-Specific Toll-Like Receptors in Drosophila. Mol Biol Evol 2017; 34:2307-2323. [PMID: 28541576 PMCID: PMC5850136 DOI: 10.1093/molbev/msx168] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022] Open
Abstract
Animal Toll-like receptors (TLRs) have evolved through a pattern of duplication and divergence. Whereas mammalian TLRs directly recognize microbial ligands, Drosophila Tolls bind endogenous ligands downstream of both developmental and immune signaling cascades. Here, we find that most Toll genes in Drosophila evolve slowly with little gene turnover (gains/losses), consistent with their important roles in development and indirect roles in microbial recognition. In contrast, we find that the Toll-3/4 genes have experienced an unusually rapid rate of gene gains and losses, resulting in lineage-specific Toll-3/4s and vastly different gene repertoires among Drosophila species, from zero copies (e.g., D. mojavensis) to nineteen copies (e.g., D. willistoni). In D. willistoni, we find strong evidence for positive selection in Toll-3/4 genes, localized specifically to an extracellular region predicted to overlap with the binding site of Spätzle, the only known ligand of insect Tolls. However, because Spätzle genes are not experiencing similar selective pressures, we hypothesize that Toll-3/4s may be rapidly evolving because they bind to a different ligand, akin to TLRs outside of insects. We further find that most Drosophila Toll-3/4 genes are either weakly expressed or expressed exclusively in males, specifically in the germline. Unlike other Toll genes in D. melanogaster, Toll-3, and Toll-4 have apparently escaped from essential developmental roles, as knockdowns have no substantial effects on viability or male fertility. Based on these findings, we propose that the Toll-3/4 genes represent an exceptionally rapidly evolving lineage of Drosophila Toll genes, which play an unusual, as-yet-undiscovered role in the male germline.
Affiliation(s)
- Tera C Levin
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
- Harmit S Malik
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
- Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA
|
29
|
Graves JL, Hertweck KL, Phillips MA, Han MV, Cabral LG, Barter TT, Greer LF, Burke MK, Mueller LD, Rose MR. Genomics of Parallel Experimental Evolution in Drosophila. Mol Biol Evol 2017; 34:831-842. [PMID: 28087779 PMCID: PMC5400383 DOI: 10.1093/molbev/msw282] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Indexed: 11/14/2022] Open
Abstract
What are the genomic foundations of adaptation in sexual populations? We address this question using fitness–character and whole-genome sequence data from 30 Drosophila laboratory populations. These 30 populations are part of a nearly 40-year laboratory radiation featuring 3 selection regimes, each shared by 10 populations for up to 837 generations, with moderately large effective population sizes. Each of 3 sets of the 10 populations that shared a selection regime consists of 5 populations that have long been maintained under that selection regime, paired with 5 populations that had only recently been subjected to that selection regime. We find a high degree of evolutionary parallelism in fitness phenotypes when most-recent selection regimes are shared, as in previous studies from our laboratory. We also find genomic parallelism with respect to the frequencies of single-nucleotide polymorphisms, transposable elements, insertions, and structural variants, which was expected. Entirely unexpected was a high degree of parallelism for linkage disequilibrium. The evolutionary genetic changes among these sexual populations are rapid and genomically extensive. This pattern may be due to segregating functional genetic variation that is abundantly maintained genome-wide by selection, variation that responds immediately to changes of selection regime.
Affiliation(s)
- J L Graves
- Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University and UNC Greensboro, Greensboro, NC
- K L Hertweck
- Department of Biology, The University of Texas at Tyler, Tyler, TX
- M A Phillips
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA
- M V Han
- School of Life Sciences, University of Nevada, Las Vegas, NV
- L G Cabral
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA
- T T Barter
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA
- L F Greer
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA
- M K Burke
- Department of Integrative Biology, Oregon State University, Corvallis, OR
- L D Mueller
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA
- M R Rose
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA
|
30
|
Reexamining the P-Element Invasion of Drosophila melanogaster Through the Lens of piRNA Silencing. Genetics 2017; 203:1513-31. [PMID: 27516614 DOI: 10.1534/genetics.115.184119] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 10/28/2015] [Accepted: 05/25/2016] [Indexed: 11/18/2022] Open
Abstract
Transposable elements (TEs) are both important drivers of genome evolution and genetic parasites with potentially dramatic consequences for host fitness. The recent explosion of research on regulatory RNAs reveals that small RNA-mediated silencing is a conserved genetic mechanism through which hosts repress TE activity. The invasion of the Drosophila melanogaster genome by P elements, which happened on a historical timescale, represents an incomparable opportunity to understand how small RNA-mediated silencing of TEs evolves. Repression of P-element transposition emerged almost concurrently with its invasion. Recent studies suggest that this repression is implemented in part, and perhaps predominantly, by the Piwi-interacting RNA (piRNA) pathway, a small RNA-mediated silencing pathway that regulates TE activity in many metazoan germlines. In this review, I consider the P-element invasion from both a molecular and evolutionary genetic perspective, reconciling classic studies of P-element regulation with the new mechanistic framework provided by the piRNA pathway. I further explore the utility of the P-element invasion as an exemplar of the evolution of piRNA-mediated silencing. In light of the highly-conserved role for piRNAs in regulating TEs, discoveries from this system have taxonomically broad implications for the evolution of repression.
|
31
|
Tandem duplications lead to novel expression patterns through exon shuffling in Drosophila yakuba. PLoS Genet 2017; 13:e1006795. [PMID: 28531189 PMCID: PMC5460883 DOI: 10.1371/journal.pgen.1006795] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/08/2015] [Revised: 06/06/2017] [Accepted: 05/03/2017] [Indexed: 01/06/2023] Open
Abstract
One common hypothesis to explain the impacts of tandem duplications is that whole gene duplications commonly produce additive changes in gene expression due to copy number changes. Here, we use genome wide RNA-seq data from a population sample of Drosophila yakuba to test this ‘gene dosage’ hypothesis. We observe little evidence of expression changes in response to whole transcript duplication capturing 5′ and 3′ UTRs. Among whole gene duplications, we observe evidence that dosage sharing across copies is likely to be common. The lack of expression changes after whole gene duplication suggests that the majority of genes are subject to tight regulatory control and therefore not sensitive to changes in gene copy number. Rather, we observe changes in expression level due to both shuffling of regulatory elements and the creation of chimeric structures via tandem duplication. Additionally, we observe 30 de novo gene structures arising from tandem duplications, 23 of which form with expression in the testes. Thus, the value of tandem duplications is likely to be more intricate than simple changes in gene dosage. The common regulatory effects from chimeric gene formation after tandem duplication may explain their contribution to genome evolution. The enclosed work shows that whole gene duplications rarely affect gene expression, in contrast to widely held views that the adaptive value of duplicate genes is related to additive changes in gene expression due to gene copy number. We further explain how tandem duplications that create shuffled gene structures can force upregulation of gene sequences, de novo gene creation, and multifold changes in transcript levels. These results show that tandem duplications can produce new genes that are a source of immediate novelty associated with more extreme expression changes than previously suggested by theory. 
Further, these gene expression changes are a potential source of both beneficial and pathogenic mutations, immediately relevant to clinical and medical genetics in humans and other metazoans.
|
32
|
Du H, Yu Y, Ma Y, Gao Q, Cao Y, Chen Z, Ma B, Qi M, Li Y, Zhao X, Wang J, Liu K, Qin P, Yang X, Zhu L, Li S, Liang C. Sequencing and de novo assembly of a near complete indica rice genome. Nat Commun 2017; 8:15324. [PMID: 28469237 PMCID: PMC5418594 DOI: 10.1038/ncomms15324] [Citation(s) in RCA: 173] [Impact Index Per Article: 24.7] [Received: 05/15/2016] [Accepted: 03/17/2017] [Indexed: 01/03/2023] Open
Abstract
A high-quality reference genome is critical for understanding genome structure, genetic variation and evolution of an organism. Here we report the de novo assembly of an indica rice genome Shuhui498 (R498) through the integration of single-molecule sequencing and mapping data, genetic map and fosmid sequence tags. The 390.3 Mb assembly is estimated to cover more than 99% of the R498 genome and is more continuous than the current reference genomes of japonica rice Nipponbare (MSU7) and Arabidopsis thaliana (TAIR10). We annotate high-quality protein-coding genes in R498 and identify genetic variations between R498 and Nipponbare and presence/absence variations by comparing them to 17 draft genomes in cultivated rice and its closest wild relatives. Our results demonstrate how to de novo assemble a highly contiguous and near-complete plant genome through an integrative strategy. The R498 genome will serve as a reference for the discovery of genes and structural variations in rice. High-quality reference genomes facilitate analysis of genome structure and variation. Here Du et al. create a near-complete assembly of the indica rice genome by combining single molecule sequencing with mapping data and fosmid sequences and identify genetic variants by comparison with other rice genomes.
Affiliation(s)
- Huilong Du
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ying Yu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Yanfei Ma
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Qiang Gao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Yinghao Cao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Zhuo Chen
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Bin Ma
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Ming Qi
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Yan Li
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Xianfeng Zhao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Jing Wang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Kunfan Liu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Peng Qin
- Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Xin Yang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Lihuang Zhu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- Shigui Li
- Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Chengzhi Liang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|
33
|
Pinharanda A, Martin SH, Barker SL, Davey JW, Jiggins CD. The comparative landscape of duplications in Heliconius melpomene and Heliconius cydno. Heredity (Edinb) 2017; 118:78-87. [PMID: 27925618 PMCID: PMC5176112 DOI: 10.1038/hdy.2016.107] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/16/2016] [Revised: 08/24/2016] [Accepted: 08/25/2016] [Indexed: 01/01/2023] Open
Abstract
Gene duplications can facilitate adaptation and may lead to interpopulation divergence, causing reproductive isolation. We used whole-genome resequencing data from 34 butterflies to detect duplications in two Heliconius species, Heliconius cydno and Heliconius melpomene. Taking advantage of three distinctive signals of duplication in short-read sequencing data, we identified 744 duplicated loci in H. cydno and H. melpomene and evaluated the accuracy of our approach using single-molecule sequencing. We have found that duplications overlap genes significantly less than expected at random in H. melpomene, consistent with the action of background selection against duplicates in functional regions of the genome. Duplicate loci that are highly differentiated between H. melpomene and H. cydno map to four different chromosomes. Four duplications were identified with a strong signal of divergent selection, including an odorant binding protein and another in close proximity with a known wing colour pattern locus that differs between the two species.
Affiliation(s)
- A Pinharanda
- Department of Zoology, University of Cambridge, Cambridge, UK
- S H Martin
- Department of Zoology, University of Cambridge, Cambridge, UK
- S L Barker
- Department of Zoology, University of Cambridge, Cambridge, UK
- J W Davey
- Department of Zoology, University of Cambridge, Cambridge, UK
- C D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK
|
34
|
Genetic variants regulating expression levels and isoform diversity during embryogenesis. Nature 2016; 541:402-406. [PMID: 28024300 DOI: 10.1038/nature20802] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 05/07/2016] [Accepted: 11/16/2016] [Indexed: 12/17/2022]
Abstract
Embryonic development is driven by tightly regulated patterns of gene expression, despite extensive genetic variation among individuals. Studies of expression quantitative trait loci (eQTL) indicate that genetic variation frequently alters gene expression in cell-culture models and differentiated tissues. However, the extent and types of genetic variation impacting embryonic gene expression, and their interactions with developmental programs, remain largely unknown. Here we assessed the effect of genetic variation on transcriptional (expression levels) and post-transcriptional (3' RNA processing) regulation across multiple stages of metazoan development, using 80 inbred Drosophila wild isolates, identifying thousands of developmental-stage-specific and shared QTL. Given the small blocks of linkage disequilibrium in Drosophila, we obtain near base-pair resolution, resolving causal mutations in developmental enhancers, validated transcription-factor-binding sites and RNA motifs. This fine-grain mapping uncovered extensive allelic interactions within enhancers that have opposite effects, thereby buffering their impact on enhancer activity. QTL affecting 3' RNA processing identify new functional motifs leading to transcript isoform diversity and changes in the lengths of 3' untranslated regions. These results highlight how developmental stage influences the effects of genetic variation and uncover multiple mechanisms that regulate and buffer expression variation during embryogenesis.
|
35
|
Evidence for the fixation of gene duplications by positive selection in Drosophila. Genome Res 2016; 26:787-98. [PMID: 27197209 PMCID: PMC4889967 DOI: 10.1101/gr.199323.115] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 09/09/2015] [Accepted: 04/11/2016] [Indexed: 11/30/2022]
Abstract
Gene duplications play a key role in the emergence of novel traits and in adaptation. But despite their centrality to evolutionary processes, it is still largely unknown how new gene duplicates are initially fixed within populations and later maintained in genomes. Long-standing debates on the evolution of gene duplications could be settled by determining the relative importance of genetic drift vs. positive selection in the fixation of new gene duplicates. Using the Drosophila Global Diversity Lines (GDL), we have combined genome-wide SNP polymorphism data with a novel set of copy number variant calls and gene expression profiles to characterize the polymorphic phase of new genes. We found that approximately half of the roughly 500 new complete gene duplications segregating in the GDL lead to significant increases in the expression levels of the duplicated genes and that these duplications are more likely to be found at lower frequencies, suggesting a negative impact on fitness. However, we also found that six of the nine gene duplications that are fixed or close to fixation in at least one of the five populations in our study show signs of being under positive selection, and that these duplications are likely beneficial because of dosage effects, with a possible role for additional mutations in two duplications. Our work suggests that in Drosophila, theoretical models that posit that gene duplications are immediately beneficial and fixed by positive selection are most relevant to explain the long-term evolution of gene duplications in this species.
|
36
|
Bai Z, Chen J, Liao Y, Wang M, Liu R, Ge S, Wing RA, Chen M. The impact and origin of copy number variations in the Oryza species. BMC Genomics 2016; 17:261. [PMID: 27025496 PMCID: PMC4812662 DOI: 10.1186/s12864-016-2589-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 07/25/2015] [Accepted: 03/15/2016] [Indexed: 02/16/2023]
Abstract
Background: Copy number variation (CNV), a complex genomic rearrangement, has been extensively studied in humans and other organisms. In plants, CNVs of several genes were found to be responsible for various important traits; however, the causes and consequences of CNVs remain largely unknown. Recently released next-generation sequencing (NGS) data provide an opportunity for a genome-wide study of CNVs in rice. Results: Here, using an NGS-based approach, we generated a CNV map comprising 9,196 deletions compared to the reference genome ‘Nipponbare’. Using Oryza glaberrima as the outgroup, 80% of the CNV events turned out to be insertions in Nipponbare. There were 2,806 annotated genes affected by these CNV events. We experimentally validated 28 functional CNV genes including OsMADS56, BPH14, OsDCL2b and OsMADS30, implying that CNVs might have contributed to phenotypic variations in rice. Most CNV genes were found to be located in non-co-linear positions by comparison to O. glaberrima. One of the origins of these non-co-linear genes was genomic duplications caused by transposon activity or double-strand break repair. Comprehensive analysis of mutation mechanisms suggested an abundance of CNVs formed by non-homologous end-joining and mobile element insertion. Conclusions: This study showed the impact and origin of copy number variations in rice on a genomic scale.
Affiliation(s)
- Zetao Bai: State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Jinfeng Chen: State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Yi Liao: State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Meijiao Wang: State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Rong Liu: State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Song Ge: State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Rod A Wing: Arizona Genomics Institute, School of Plant Science, University of Arizona, Tucson, AZ, 85721, USA
- Mingsheng Chen: State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
|
37
|
Structural Variant Detection by Large-scale Sequencing Reveals New Evolutionary Evidence on Breed Divergence between Chinese and European Pigs. Sci Rep 2016; 6:18501. [PMID: 26729041 PMCID: PMC4700453 DOI: 10.1038/srep18501] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 06/26/2015] [Accepted: 11/19/2015] [Indexed: 01/28/2023]
Abstract
In this study, we performed genome-wide SV detection among the genomes of thirteen pigs from diverse breeds of Chinese and European origin by next-generation sequencing, and constructed a single-nucleotide resolution map involving 56,930 putative SVs. We first identified an SV hotspot spanning a 35-Mb region on the X chromosome specifically in the genomes of individuals of Chinese origin. Further scrutiny of this region using large-scale sequencing data from an additional 111 individuals confirmed our initial finding. Moreover, thirty-five SV-related genes within the hotspot region, which are important for reproductive ability, showed significantly different evolutionary rates between breeds of Chinese and European origin. The SV hotspot identified here offers novel evidence for assessing phylogenetic relationships, and likely explains genetic differences in the corresponding phenotypes and features among Chinese and European pig breeds. Furthermore, we used various SVs to infer the genetic structure of the individuals surveyed. We found that SVs can clearly detect differences in genetic background among individuals. This suggests that genome-wide SVs capture the majority of genetic variation and can be applied in cladistic analyses. Characterizing whole-genome SVs demonstrated that SVs are significantly enriched or depleted in various genomic features.
|
38
|
Shadow Enhancers Are Pervasive Features of Developmental Regulatory Networks. Curr Biol 2015; 26:38-51. [PMID: 26687625 PMCID: PMC4712172 DOI: 10.1016/j.cub.2015.11.034] [Citation(s) in RCA: 151] [Impact Index Per Article: 16.8] [Received: 09/20/2015] [Revised: 11/16/2015] [Accepted: 11/17/2015] [Indexed: 11/22/2022]
Abstract
Embryogenesis is remarkably robust to segregating mutations and environmental variation; under a range of conditions, embryos of a given species develop into stereotypically patterned organisms. Such robustness is thought to be conferred, in part, through elements within regulatory networks that perform similar, redundant tasks. Redundant enhancers (or "shadow" enhancers), for example, can confer precision and robustness to gene expression, at least at individual, well-studied loci. However, the extent to which enhancer redundancy exists and can thereby have a major impact on developmental robustness remains unknown. Here, we systematically assessed this, identifying over 1,000 predicted shadow enhancers during Drosophila mesoderm development. The activity of 23 elements, associated with five genes, was examined in transgenic embryos, while natural structural variation among individuals was used to assess their ability to buffer against genetic variation. Our results reveal three clear properties of enhancer redundancy within developmental systems. First, it is much more pervasive than previously anticipated, with 64% of loci examined having shadow enhancers. Their spatial redundancy is often partial in nature, while the non-overlapping function may explain why these enhancers are maintained within a population. Second, over 70% of loci do not follow the simple situation of having only two shadow enhancers; often there are three (rols), four (CadN and ade5), or five (Traf1), at least one of which can be deleted with no obvious phenotypic effects. Third, although shadow enhancers can buffer variation, patterns of segregating variation suggest that they play a more complex role in development than generally considered.
|
39
|
Najarro MA, Hackett JL, Smith BR, Highfill CA, King EG, Long AD, Macdonald SJ. Identifying Loci Contributing to Natural Variation in Xenobiotic Resistance in Drosophila. PLoS Genet 2015; 11:e1005663. [PMID: 26619284 PMCID: PMC4664282 DOI: 10.1371/journal.pgen.1005663] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/28/2015] [Accepted: 10/21/2015] [Indexed: 12/12/2022]
Abstract
Natural populations exhibit a great deal of interindividual genetic variation in the response to toxins, exemplified by the variable clinical efficacy of pharmaceutical drugs in humans, and the evolution of pesticide resistant insects. Such variation can result from several phenomena, including variable metabolic detoxification of the xenobiotic, and differential sensitivity of the molecular target of the toxin. Our goal is to genetically dissect variation in the response to xenobiotics, and characterize naturally-segregating polymorphisms that modulate toxicity. Here, we use the Drosophila Synthetic Population Resource (DSPR), a multiparent advanced intercross panel of recombinant inbred lines, to identify QTL (Quantitative Trait Loci) underlying xenobiotic resistance, and employ caffeine as a model toxic compound. Phenotyping over 1,700 genotypes led to the identification of ten QTL, each explaining 4.5-14.4% of the broad-sense heritability for caffeine resistance. Four QTL harbor members of the cytochrome P450 family of detoxification enzymes, which represent strong a priori candidate genes. The case is especially strong for Cyp12d1, with multiple lines of evidence indicating the gene causally impacts caffeine resistance. Cyp12d1 is implicated by QTL mapped in both panels of DSPR RILs, is significantly upregulated in the presence of caffeine, and RNAi knockdown robustly decreases caffeine tolerance. Furthermore, copy number variation at Cyp12d1 is strongly associated with phenotype in the DSPR, with a trend in the same direction observed in the DGRP (Drosophila Genetic Reference Panel). No additional plausible causative polymorphisms were observed in a full genomewide association study in the DGRP, or in analyses restricted to QTL regions mapped in the DSPR. Just as in human populations, replicating modest-effect, naturally-segregating causative variants in an association study framework in flies will likely require very large sample sizes.
Affiliation(s)
- Michael A. Najarro: Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Jennifer L. Hackett: Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Brittny R. Smith: Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Chad A. Highfill: Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Elizabeth G. King: Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America
- Anthony D. Long: Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Stuart J. Macdonald: Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America; Center for Computational Biology, University of Kansas, Lawrence, Kansas, United States of America
|
40
|
Rahman R, Chirn GW, Kanodia A, Sytnikova YA, Brembs B, Bergman CM, Lau NC. Unique transposon landscapes are pervasive across Drosophila melanogaster genomes. Nucleic Acids Res 2015; 43:10655-72. [PMID: 26578579 PMCID: PMC4678822 DOI: 10.1093/nar/gkv1193] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 09/23/2015] [Accepted: 10/24/2015] [Indexed: 01/01/2023]
Abstract
To understand how transposon landscapes (TLs) vary across animal genomes, we describe a new method called the Transposon Insertion and Depletion AnaLyzer (TIDAL) and a database of >300 TLs in Drosophila melanogaster (TIDAL-Fly). Our analysis reveals pervasive TL diversity across cell lines and fly strains, even for identically named sub-strains from different laboratories such as the ISO1 strain used for the reference genome sequence. On average, >500 novel insertions exist in every lab strain, inbred strains of the Drosophila Genetic Reference Panel (DGRP), and fly isolates in the Drosophila Genome Nexus (DGN). A minority (<25%) of transposon families comprise the majority (>70%) of TL diversity across fly strains. A sharp contrast between insertion and depletion patterns indicates that many transposons are unique to the ISO1 reference genome sequence. Although TL diversity from fly strains reaches asymptotic limits with increasing sequencing depth, rampant TL diversity causes unsaturated detection of TLs in pools of flies. Finally, we show novel transposon insertions negatively correlate with Piwi-interacting RNA (piRNA) levels for most transposon families, except for the highly-abundant roo retrotransposon. Our study provides a useful resource for Drosophila geneticists to understand how transposons create extensive genomic diversity in fly cell lines and strains.
Affiliation(s)
- Reazur Rahman: Department of Biology and Rosenstiel Basic Medical Science Research Center, Brandeis University, Waltham, MA 02454, USA
- Gung-wei Chirn: Department of Biology and Rosenstiel Basic Medical Science Research Center, Brandeis University, Waltham, MA 02454, USA
- Abhay Kanodia: Department of Biology and Rosenstiel Basic Medical Science Research Center, Brandeis University, Waltham, MA 02454, USA
- Yuliya A Sytnikova: Department of Biology and Rosenstiel Basic Medical Science Research Center, Brandeis University, Waltham, MA 02454, USA
- Björn Brembs: Institute of Zoology, Universität Regensburg, Regensburg, Germany
- Casey M Bergman: Faculty of Life Sciences, University of Manchester, Manchester M21 0RG, UK
- Nelson C Lau: Department of Biology and Rosenstiel Basic Medical Science Research Center, Brandeis University, Waltham, MA 02454, USA
|
41
|
Schröder J, Girirajan S, Papenfuss AT, Medvedev P. Improving the Power of Structural Variation Detection by Augmenting the Reference. PLoS One 2015; 10:e0136771. [PMID: 26322511 PMCID: PMC4556445 DOI: 10.1371/journal.pone.0136771] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/28/2015] [Accepted: 08/07/2015] [Indexed: 11/18/2022]
Abstract
The uses of the Genome Reference Consortium’s human reference sequence can be roughly categorized into three related but distinct categories: as a representative species genome, as a coordinate system for identifying variants, and as an alignment reference for variation detection algorithms. However, the use of this reference sequence as simultaneously a representative species genome and as an alignment reference leads to unnecessary artifacts for structural variation detection algorithms and limits their accuracy. We show how decoupling these two references and developing a separate alignment reference can significantly improve the accuracy of structural variation detection, lead to improved genotyping of disease related genes, and decrease the cost of studying polymorphism in a population.
Affiliation(s)
- Jan Schröder: The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia; Peter MacCallum Cancer Centre, Melbourne, Australia
- Santhosh Girirajan: Genomic Sciences Institute of the Huck, The Pennsylvania State University, State College, United States of America; Department of Computer Science and Engineering, The Pennsylvania State University, State College, United States of America
- Anthony T Papenfuss: The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Medical Biology, The University of Melbourne, Melbourne, Australia; Peter MacCallum Cancer Centre, Melbourne, Australia
- Paul Medvedev: Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, United States of America; Genomic Sciences Institute of the Huck, The Pennsylvania State University, State College, United States of America; Department of Computer Science and Engineering, The Pennsylvania State University, State College, United States of America
|
42
|
Abstract
Gut immunocompetence involves immune, stress and regenerative processes. To investigate the determinants underlying inter-individual variation in gut immunocompetence, we perform enteric infection of 140 Drosophila lines with the entomopathogenic bacterium Pseudomonas entomophila and observe extensive variation in survival. Using genome-wide association analysis, we identify several novel immune modulators. Transcriptional profiling further shows that the intestinal molecular state differs between resistant and susceptible lines, already before infection, with one transcriptional module involving genes linked to reactive oxygen species (ROS) metabolism contributing to this difference. This genetic and molecular variation is physiologically manifested in lower ROS activity, lower susceptibility to a ROS-inducing agent, faster pathogen clearance and higher stem cell activity in resistant versus susceptible lines. This study provides novel insights into the determinants underlying population-level variability in gut immunocompetence, revealing how relatively minor, but systematic genetic and transcriptional variation can mediate overt physiological differences that determine enteric infection susceptibility. Animals rely on a multitude of resistance and tolerance mechanisms to resist harmful gut microbes. Here, the authors explore the genetic, molecular and physiological basis underlying the remarkable phenotypic variation in resistance to enteric bacterial infection in Drosophila melanogaster.
|
43
|
Rogers RL, Cridland JM, Shao L, Hu TT, Andolfatto P, Thornton KR. Tandem Duplications and the Limits of Natural Selection in Drosophila yakuba and Drosophila simulans. PLoS One 2015; 10:e0132184. [PMID: 26176952 PMCID: PMC4503668 DOI: 10.1371/journal.pone.0132184] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/11/2015] [Accepted: 06/10/2015] [Indexed: 11/30/2022]
Abstract
Tandem duplications are an essential source of genetic novelty, and their variation in natural populations is expected to influence adaptive walks. Here, we describe evolutionary impacts of recently-derived, segregating tandem duplications in Drosophila yakuba and Drosophila simulans. We observe an excess of duplicated genes involved in defense against pathogens, insecticide resistance, chorion development, cuticular peptides, and lipases or endopeptidases associated with the accessory glands across both species. The observed agreement is greater than expectations on chance alone, suggesting large amounts of convergence across functional categories. We document evidence of widespread selection on the D. simulans X, suggesting adaptation through duplication is common on the X. Despite the evidence for positive selection, duplicates display an excess of low frequency variants consistent with largely detrimental impacts, limiting the variation that can effectively facilitate adaptation. Standing variation for tandem duplications spans less than 25% of the genome in D. yakuba and D. simulans, indicating that evolution will be strictly limited by mutation, even in organisms with large population sizes. Effective whole gene duplication rates are low at 1.17 × 10⁻⁹ per gene per generation in D. yakuba and 6.03 × 10⁻¹⁰ per gene per generation in D. simulans, suggesting long wait times for new mutations on the order of thousands of years for the establishment of sweeps. Hence, in cases where adaptation depends on individual tandem duplications, evolution will be severely limited by mutation. We observe low levels of parallel recruitment of the same duplicated gene in different species, suggesting that the span of standing variation will define evolutionary outcomes in spite of convergence across gene ontologies consistent with rapidly evolving phenotypes.
Affiliation(s)
- Rebekah L. Rogers: Ecology and Evolutionary Biology, University of California, Berkeley, California, United States of America
- Julie M. Cridland: Ecology and Evolutionary Biology, University of California, Davis, Davis, California, United States of America
- Ling Shao: Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California, United States of America
- Tina T. Hu: Ecology and Evolutionary Biology and the Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Peter Andolfatto: Ecology and Evolutionary Biology and the Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Kevin R. Thornton: Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California, United States of America
|
44
|
Zhang Z, Mao L, Chen H, Bu F, Li G, Sun J, Li S, Sun H, Jiao C, Blakely R, Pan J, Cai R, Luo R, Van de Peer Y, Jacobsen E, Fei Z, Huang S. Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber. Plant Cell 2015; 27:1595-604. [PMID: 26002866 PMCID: PMC4498199 DOI: 10.1105/tpc.114.135848] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 12/22/2014] [Revised: 03/26/2015] [Accepted: 04/30/2015] [Indexed: 05/18/2023]
Abstract
Structural variations (SVs) represent a major source of genetic diversity. However, the functional impact and formation mechanisms of SVs in plant genomes remain largely unexplored. Here, we report a nucleotide-resolution SV map of cucumber (Cucumis sativus) that comprises 26,788 SVs based on deep resequencing of 115 diverse accessions. The largest proportion of cucumber SVs was formed through nonhomologous end-joining rearrangements, and the occurrence of SVs is closely associated with regions of high nucleotide diversity. These SVs affect the coding regions of 1676 genes, some of which are associated with cucumber domestication. Based on the map, we discovered a copy number variation (CNV) involving four genes that defines the Female (F) locus and gives rise to gynoecious cucumber plants, which bear only female flowers and set fruit at almost every node. The CNV arose from a recent 30.2-kb duplication at a meiotically unstable region, likely via microhomology-mediated break-induced replication. The SV set provides a snapshot of structural variations in plants and will serve as an important resource for exploring genes underlying key traits and for facilitating practical breeding in cucumber.
Affiliation(s)
- Zhonghua Zhang: Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China
- Linyong Mao: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Huiming Chen: Hunan Vegetable Research Institute, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Fengjiao Bu: Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China; Agricultural Genomic Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Guangcun Li: Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China; Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Jinjing Sun: Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China
- Shuai Li: Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China
- Honghe Sun: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Chen Jiao: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Rachel Blakely: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Junsong Pan: Shanghai Jiaotong University, Shanghai 200240, China
- Run Cai: Shanghai Jiaotong University, Shanghai 200240, China
- Ruibang Luo: Department of Computer Science, University of Hong Kong, Hong Kong 999077, China
- Yves Van de Peer: Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
- Evert Jacobsen: Department of Plant Sciences, Laboratory of Plant Breeding, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
- Zhangjun Fei: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853; USDA-ARS Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853
- Sanwen Huang: Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China; Agricultural Genomic Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
|
45
|
Lin K, Smit S, Bonnema G, Sanchez-Perez G, de Ridder D. Making the difference: integrating structural variation detection tools. Brief Bioinform 2014; 16:852-64. [PMID: 25504367 DOI: 10.1093/bib/bbu047] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 08/20/2014] [Indexed: 01/01/2023]
Abstract
From prokaryotes to eukaryotes, phenotypic variation, adaptation and speciation have been associated with structural variation between genomes of individuals within the same species. Many computer algorithms detecting such variations (callers) have recently been developed, spurred by the advent of next-generation sequencing technology. Such callers mainly exploit split-read mapping or paired-end read mapping. However, as different callers are geared towards different types of structural variation, there is still no single caller that can be considered a community standard; instead, increasingly the various callers are combined in integrated pipelines. In this article, we review a wide range of callers, discuss challenges in the integration step and present a survey of pipelines used in population genomics studies. Based on our findings, we provide general recommendations on how to set up such pipelines. Finally, we present an outlook on future challenges in structural variation detection.
|
46
|
Duvaux L, Geissmann Q, Gharbi K, Zhou JJ, Ferrari J, Smadja CM, Butlin RK. Dynamics of copy number variation in host races of the pea aphid. Mol Biol Evol 2014; 32:63-80. [PMID: 25234705 PMCID: PMC4271520 DOI: 10.1093/molbev/msu266] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022]
Abstract
Copy number variation (CNV) makes a major contribution to overall genetic variation and is suspected to play an important role in adaptation. However, aside from a few model species, the extent of CNV in natural populations has seldom been investigated. Here, we report on CNV in the pea aphid Acyrthosiphon pisum, a powerful system for studying the genetic architecture of host-plant adaptation and speciation thanks to multiple host races forming a continuum of genetic divergence. Recent studies have highlighted the potential importance of chemosensory genes, including the gustatory and olfactory receptor gene families (Gr and Or, respectively), in the process of host race formation. We used targeted resequencing to achieve a very high depth of coverage, and thereby revealed the extent of CNV of 434 genes, including 150 chemosensory genes, in 104 individuals distributed across eight host races of the pea aphid. We found that CNV was widespread in our global sample, with a significantly higher occurrence in multigene families, especially in Ors. We also observed that the probability of a gene being completely duplicated or deleted (CDD) decreased with increasing coding sequence length. Genes with CDD variants were usually more polymorphic for copy number, especially in the P450 gene family where toxin resistance may be related to gene dosage. We found that Gr genes were overrepresented among genes discriminating host races, as were CDD genes and pseudogenes. Our observations shed new light on CNV dynamics and are consistent with CNV playing a role in both local adaptation and speciation.
Affiliation(s)
- Ludovic Duvaux: Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Quentin Geissmann: Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Karim Gharbi: Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, United Kingdom
- Jing-Jiang Zhou: Department of Biological Chemistry and Crop Protection, Rothamsted Research, Harpenden, United Kingdom
- Julia Ferrari: Department of Biology, University of York, York, United Kingdom
- Carole M Smadja: Institut des Sciences de l'Evolution (UMR 5554), CNRS, IRD, Université Montpellier 2, Montpellier, France
- Roger K Butlin: Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom; Sven Lovén Centre for Marine Sciences-Tjärnö, University of Gothenburg, Strömstad, Sweden
|
47
|
Fan S, Meyer A. Evolution of genomic structural variation and genomic architecture in the adaptive radiations of African cichlid fishes. Front Genet 2014; 5:163. [PMID: 24917883 PMCID: PMC4042683 DOI: 10.3389/fgene.2014.00163] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 02/07/2014] [Accepted: 05/15/2014] [Indexed: 12/30/2022]
Abstract
African cichlid fishes are an ideal system for studying explosive rates of speciation and the origin of diversity in adaptive radiation. Within the last few million years, more than 2000 species have evolved in the Great Lakes of East Africa, the largest adaptive radiation in vertebrates. These young species show spectacular diversity in their coloration, morphology and behavior. However, little is known about the genomic basis of this astonishing diversity. Recently, five African cichlid genomes were sequenced, including that of the Nile Tilapia (Oreochromis niloticus), a basal and only moderately diversified lineage, and the genomes of four representative endemic species of the adaptive radiations, Neolamprologus brichardi, Astatotilapia burtoni, Metriaclima zebra, and Pundamilia nyererei. Using the Tilapia genome as a reference genome, we generated a high-resolution genomic variation map, consisting of single nucleotide polymorphisms (SNPs), short insertions and deletions (indels), inversions and deletions. In total, around 18.8, 17.7, 17.0, and 17.0 million SNPs, 2.3, 2.2, 1.4, and 1.9 million indels, 262, 306, 162, and 154 inversions, and 3509, 2705, 2710, and 2634 deletions were inferred to have evolved in N. brichardi, A. burtoni, P. nyererei, and M. zebra, respectively. Many of these variations affected the annotated gene regions in the genome. Different patterns of genetic variation were detected during the adaptive radiation of African cichlid fishes. For SNPs, the highest rate of evolution was detected in the common ancestor of N. brichardi, A. burtoni, P. nyererei, and M. zebra. However, for the evolution of inversions and deletions, we found that the rates at the terminal taxa are substantially higher than the rates at the ancestral lineages. The high-resolution map provides an ideal opportunity to understand the genomic bases of the adaptive radiation of African cichlid fishes.
Affiliation(s)
- Shaohua Fan
- Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Konstanz, Germany
- Axel Meyer
- Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Konstanz, Germany
|
48
|
Lucas-Lledó JI, Vicente-Salvador D, Aguado C, Cáceres M. Population genetic analysis of bi-allelic structural variants from low-coverage sequence data with an expectation-maximization algorithm. BMC Bioinformatics 2014; 15:163. [PMID: 24884587 PMCID: PMC4055234 DOI: 10.1186/1471-2105-15-163] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/30/2013] [Accepted: 05/14/2014] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Population genetics and association studies usually rely on a set of known variable sites that are then genotyped in subsequent samples, because it is easier to genotype than to discover the variation. This is also true for structural variation detected from sequence data. However, the genotypes at known variable sites can only be inferred with uncertainty from low coverage data. Thus, statistical approaches that infer genotype likelihoods, test hypotheses, and estimate population parameters without requiring accurate genotypes are becoming popular. Unfortunately, the current implementations of these methods are intended to analyse only single nucleotide and short indel variation, and they usually assume that the two alleles in a heterozygous individual are sampled with equal probability. This is generally false for structural variants detected with paired ends or split reads. Therefore, the population genetics of structural variants cannot be studied, unless a painstaking and potentially biased genotyping is performed first. RESULTS: We present svgem, an expectation-maximization implementation to estimate allele and genotype frequencies, calculate genotype posterior probabilities, and test for Hardy-Weinberg equilibrium and for population differences, from the numbers of times the alleles are observed in each individual. Although applicable to single nucleotide variation, it aims at bi-allelic structural variation of any type, observed by either split reads or paired ends, with arbitrarily high allele sampling bias. We test svgem with simulated and real data from the 1000 Genomes Project. CONCLUSIONS: svgem makes it possible to use low-coverage sequencing data to study the population distribution of structural variants without having to know their genotypes. Furthermore, this advance allows the combined analysis of structural and nucleotide variation within the same genotype-free statistical framework, thus preventing biases introduced by genotype imputation.
Affiliation(s)
- José Ignacio Lucas-Lledó
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain.
|
49
|
Vicens A, Tourmente M, Roldan ERS. Structural evolution of CatSper1 in rodents is influenced by sperm competition, with effects on sperm swimming velocity. BMC Evol Biol 2014; 14:106. [PMID: 24884901 PMCID: PMC4041144 DOI: 10.1186/1471-2148-14-106] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/15/2014] [Accepted: 04/28/2014] [Indexed: 01/22/2023]
Abstract
BACKGROUND: Competition between spermatozoa from rival males for success in fertilization (i.e., sperm competition) is an important selective force driving the evolution of male reproductive traits and promoting positive selection in genes related to reproductive function. Positive selection has been identified in reproductive proteins showing rapid divergence at nucleotide level. Other mutations, such as insertions and deletions (indels), also occur in protein-coding sequences. These structural changes, which exist in reproductive genes and result in length variation in coded proteins, could also be subjected to positive selection and be under the influence of sperm competition. Catsper1 is one such reproductive gene coding for a germ-line specific voltage-gated calcium channel essential for sperm motility and fertilization. Positive selection appears to promote fixation of indels in the N-terminal region of CatSper1 in mammalian species. However, it is not known which selective forces underlie these changes and their implications for sperm function. RESULTS: We tested if length variation in the N-terminal region of CatSper1 is influenced by sperm competition intensity in a group of closely related rodent species of the subfamily Murinae. Our results revealed a negative correlation between sequence length of CatSper1 and relative testes mass, a very good proxy of sperm competition levels. Since CatSper1 is important for sperm flagellar motility, we examined if length variation in the N-terminus of CatSper1 is linked to changes in sperm swimming velocity. We found a negative correlation between CatSper1 length and several sperm velocity parameters. CONCLUSIONS: Altogether, our results suggest that sperm competition selects for a shortening of the intracellular region of CatSper1 which, in turn, enhances sperm swimming velocity, an essential and adaptive trait for fertilization success.
Affiliation(s)
- Eduardo R S Roldan
- Reproductive Ecology and Biology Group, Museo Nacional de Ciencias Naturales (CSIC), c/Jose Gutierrez Abascal 2, 28006 Madrid, Spain.
|
50
|
Yang HC, Lin CW, Chen CW, Chen JJ. Applying genome-wide gene-based expression quantitative trait locus mapping to study population ancestry and pharmacogenetics. BMC Genomics 2014; 15:319. [PMID: 24779372 PMCID: PMC4236814 DOI: 10.1186/1471-2164-15-319] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/19/2013] [Accepted: 04/15/2014] [Indexed: 01/13/2023]
Abstract
BACKGROUND: Gene-based analysis has become popular in genomic research because of its appealing biological and statistical properties compared with those of a single-locus analysis. However, only a few, if any, studies have discussed a mapping of expression quantitative trait loci (eQTL) in a gene-based framework. No study has discussed ancestry-informative eQTL or investigated their roles in pharmacogenetics by integrating single nucleotide polymorphism (SNP)-based eQTL (s-eQTL) and gene-based eQTL (g-eQTL). RESULTS: In this g-eQTL mapping study, the transcript expression levels of genes (transcript-level genes; T-genes) were correlated with the SNPs of genes (sequence-level genes; S-genes) by using a method of gene-based partial least squares (PLS). Ancestry-informative transcripts were identified using a rank-score-based multivariate association test, and ancestry-informative eQTL were identified using Fisher's exact test. Furthermore, key ancestry-predictive eQTL were selected in a flexible discriminant analysis. We analyzed SNPs and gene expression of 210 independent people of African, Asian, and European descent. We identified numerous cis- and trans-acting g-eQTL and s-eQTL for each population by using PLS. We observed ancestry information enriched in eQTL. Furthermore, we identified 2 ancestry-informative eQTL associated with adverse drug reactions and/or drug response. Rs1045642, located in MDR1, is an ancestry-informative eQTL (P = 2.13E-13, using Fisher's exact test) associated with adverse drug reactions to amitriptyline and nortriptyline and drug responses to morphine. Rs20455, located in KIF6, is an ancestry-informative eQTL (P = 2.76E-23, using Fisher's exact test) associated with the response to statin drugs (e.g., pravastatin and atorvastatin). The ancestry-informative eQTL of drug biotransformation genes were also observed; cross-population cis-acting expression regulators included SPG7, TAP2, SLC7A7, and CYP4F2. Finally, we also identified key ancestry-predictive eQTL and established classification models with promising training and testing accuracies in separating samples from close populations. CONCLUSIONS: In summary, we developed a gene-based PLS procedure and a SAS macro for identifying g-eQTL and s-eQTL. We established data archives of eQTL for global populations. The program and data archives are accessible at http://www.stat.sinica.edu.tw/hsinchou/genetics/eQTL/HapMapII.htm. Finally, the results from our investigations regarding the interrelationship between eQTL, ancestry information, and pharmacodynamics provide rich resources for future eQTL studies and practical applications in population genetics and medical genetics.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, No 128, Academia Road, Section 2, Nankang, Taipei, Taiwan.
|