1
Luo C, Liu YH, Zhou XM. VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2024; 15:6956. [PMID: 39138168 DOI: 10.1038/s41467-024-51282-0] [Received: 01/18/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024]
Abstract
Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.
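The abstract lists unique k-mer similarity analysis as one ingredient of haplotype-resolved SV discovery. The Python sketch below illustrates only the general idea under stated assumptions and is not VolcanoSV's implementation. It assumes that "unique" k-mers are those present in one haplotype sequence but absent from the other, and that a read is assigned to whichever haplotype shares more of them. The function names and toy sequences are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def haplotype_unique_kmers(hap1: str, hap2: str, k: int) -> tuple[set[str], set[str]]:
    """Return the k-mers found in only one of the two haplotype sequences."""
    k1, k2 = kmers(hap1, k), kmers(hap2, k)
    return k1 - k2, k2 - k1


def assign_read(read: str, uniq1: set[str], uniq2: set[str], k: int) -> int:
    """Assign a read to haplotype 1 or 2 by counting shared haplotype-unique k-mers.

    Returns 0 when the counts tie (e.g. the read covers no distinguishing k-mer).
    """
    read_kmers = kmers(read, k)
    score1, score2 = len(read_kmers & uniq1), len(read_kmers & uniq2)
    if score1 == score2:
        return 0
    return 1 if score1 > score2 else 2


# Toy haplotypes differing by one base (G vs C at 0-based position 6).
u1, u2 = haplotype_unique_kmers("ACGTACGTTGCA", "ACGTACCTTGCA", k=4)
print(assign_read("TACGTTG", u1, u2, k=4))  # 1
print(assign_read("TACCTTG", u1, u2, k=4))  # 2
print(assign_read("TTGCA", u1, u2, k=4))    # 0
```

Real reads also need reverse-complement handling and tolerance for sequencing errors, and neither is shown here. The tie value of 0 leaves uninformative reads unassigned rather than forcing a guess.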
Affiliation(s)
- Can Luo
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Xin Maizie Zhou
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
2
Zhang J, Shi Y, Yang Y, Oakeshott JG, Wu Y. Differentiation in detoxification gene complements, including neofunctionalization of duplicated cytochrome P450 genes, between lineages of cotton bollworm, Helicoverpa armigera. Mol Ecol 2024; 33:e17463. [PMID: 38984610 DOI: 10.1111/mec.17463] [Received: 01/22/2024] [Revised: 06/23/2024] [Accepted: 06/26/2024] [Indexed: 07/11/2024]
Abstract
Here we investigate the evolutionary dynamics of five enzyme superfamilies (CYPs, GSTs, UGTs, CCEs and ABCs) involved in detoxification in Helicoverpa armigera. The reference assembly for an African isolate of the major lineages, H. a. armigera, has 373 genes in the five superfamilies. Most of its CYPs, GSTs, UGTs and CCEs and a few of its ABCs occur in blocks and most of the clustered genes are in subfamilies specifically implicated in detoxification. Most of the genes have orthologues in the reference genome for the Oceania lineage, H. a. conferta. However, clustered orthologues and subfamilies specifically implicated in detoxification show greater sequence divergence and less constraint on non-synonymous differences between the two assemblies than do other members of the five superfamilies. Two duplicated CYPs, which were found in the H. a. armigera but not H. a. conferta reference genome, were also missing in 16 Chinese populations spanning two different lineages of H. a. armigera. The enzyme produced by one of these duplicates has higher activity against esfenvalerate than a previously described chimeric CYP mutant conferring pyrethroid resistance. Various transposable elements were found in the introns of most detoxification genes, generating diverse gene structures. Extensive resequencing data for the Chinese H. a. armigera and H. a. conferta lineages also revealed complex copy number polymorphisms in 17 CCE001s in a cluster also implicated in pyrethroid metabolism, with substantial haplotype differences between all three lineages. Our results suggest that cotton bollworm has a versatile complement of detoxification genes which are evolving in diverse ways across its range.
Affiliation(s)
- Jianpeng Zhang
- College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- School of Wetlands, Yancheng Teachers University, Yancheng, China
- Yu Shi
- College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- Yihua Yang
- College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- John G Oakeshott
- Applied Biosciences, Macquarie University, Sydney, New South Wales, Australia
- Yidong Wu
- College of Plant Protection, Nanjing Agricultural University, Nanjing, China
3
Li H, Marin M, Farhat MR. Exploring gene content with pangene graphs. Bioinformatics 2024; 40:btae456. [PMID: 39041615 DOI: 10.1093/bioinformatics/btae456] [Received: 02/25/2024] [Revised: 05/28/2024] [Accepted: 07/22/2024] [Indexed: 07/24/2024]
Abstract
MOTIVATION The gene content regulates the biology of an organism. It varies between species and between individuals of the same species. Although tools have been developed to identify gene content changes in bacterial genomes, none is applicable to collections of large eukaryotic genomes such as the human pangenome. RESULTS We developed pangene, a computational tool to identify gene orientation, gene order and gene copy-number changes in a collection of genomes. Pangene aligns a set of input protein sequences to the genomes, resolves redundancies between protein sequences and constructs a gene graph with each genome represented as a walk in the graph. It additionally finds subgraphs, which we call bibubbles, that capture gene content changes. Applied to the human pangenome, pangene identifies known gene-level variations and reveals complex haplotypes that have not been well studied before. Pangene also works with high-quality bacterial pangenomes and reports numbers of core and accessory genes similar to those reported by existing tools. AVAILABILITY AND IMPLEMENTATION Source code at https://github.com/lh3/pangene; pre-built pangene graphs can be downloaded from https://zenodo.org/records/8118576 and visualized at https://pangene.bioinweb.org.
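pangene represents each genome as a walk through a gene graph. As a rough illustration, and not pangene's code or output format, the sketch below builds such a graph from hypothetical walks of oriented gene labels. The labelling convention, in which a leading '+' or '-' marks the strand, is an assumption, as are all function names. A walk can run along either strand, so each adjacency a -> b is stored in one canonical form shared with its reverse traversal flip(b) -> flip(a).

```python
from collections import Counter


def flip(gene: str) -> str:
    """Reverse the strand of an oriented gene label, e.g. '+AMY1' -> '-AMY1'."""
    return ("-" if gene[0] == "+" else "+") + gene[1:]


def canonical_edge(a: str, b: str) -> tuple[str, str]:
    """Walking a -> b is the same adjacency as walking flip(b) -> flip(a)."""
    return min((a, b), (flip(b), flip(a)))


def build_gene_graph(walks: dict[str, list[str]]) -> tuple[set[str], Counter]:
    """Collect gene nodes and count traversals of each adjacency across all walks."""
    nodes: set[str] = set()
    edges: Counter = Counter()
    for walk in walks.values():
        nodes.update(gene[1:] for gene in walk)  # strip the strand sign
        for a, b in zip(walk, walk[1:]):
            edges[canonical_edge(a, b)] += 1
    return nodes, edges


# Hypothetical walks: hap2 is hap1 read on the opposite strand; hap3 lacks gene B.
walks = {
    "hap1": ["+A", "+B", "+C"],
    "hap2": ["-C", "-B", "-A"],
    "hap3": ["+A", "+C"],
}
nodes, edges = build_gene_graph(walks)
print(sorted(nodes))  # ['A', 'B', 'C']
print(edges[("+A", "+B")], edges[("+B", "+C")], edges[("+A", "+C")])  # 2 2 1
```

The +A/+C edge used only by hap3 is an alternative path around gene B. This is the kind of structure that bubble detection (bibubbles, in pangene's terms) would examine, but bubble finding itself is not shown.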
Affiliation(s)
- Heng Li
- Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Harvard Medical School, Boston, MA 02215, USA
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Maha Reda Farhat
- Harvard Medical School, Boston, MA 02215, USA
- Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, USA
4
Bolognini D, Halgren A, Lou RN, Raveane A, Rocha JL, Guarracino A, Soranzo N, Chin J, Garrison E, Sudmant PH. Global diversity, recurrent evolution, and recent selection on amylase structural haplotypes in humans. bioRxiv 2024:2024.02.07.579378. [PMID: 38370750 PMCID: PMC10871346 DOI: 10.1101/2024.02.07.579378] [Indexed: 02/20/2024]
Abstract
The adoption of agriculture, first documented ~12,000 years ago in the Fertile Crescent, triggered a rapid shift toward starch-rich diets in human populations. Amylase genes facilitate starch digestion, and increased salivary amylase copy number has been observed in some modern human populations with high starch intake, though evidence of recent selection is lacking. Here, using 52 long-read diploid assemblies and short-read data from ~5,600 contemporary and ancient humans, we resolve the diversity, evolutionary history, and selective impact of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in populations with agricultural subsistence compared to fishing, hunting, and pastoral groups. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each exhibit multiple duplications/deletions with mutation rates >10,000-fold the SNP mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome graph-based approach to infer structural haplotypes across thousands of humans, we identify extensively duplicated haplotypes present at higher frequencies in modern-day populations with traditionally agricultural diets. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (i.e., haplotypes with more amylase gene copies than the ancestral haplotype) have increased in frequency more than seven-fold over the last 12,000 years, providing evidence for recent selection in West Eurasians. Together, our study highlights the potential impacts of the agricultural revolution on human genomes and the importance of long-read sequencing in identifying signatures of selection at structurally complex loci.
Affiliation(s)
- Alma Halgren
- Department of Integrative Biology, University of California Berkeley, Berkeley, USA
- Runyang Nicolas Lou
- Department of Integrative Biology, University of California Berkeley, Berkeley, USA
- Joana L Rocha
- Department of Integrative Biology, University of California Berkeley, Berkeley, USA
- Andrea Guarracino
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, USA
- Jason Chin
- Foundation for Biological Data Science, Belmont, USA
- Erik Garrison
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, USA
- Peter H Sudmant
- Department of Integrative Biology, University of California Berkeley, Berkeley, USA
- Center for Computational Biology, University of California Berkeley, Berkeley, USA
5
Canas JJ, Arregui SW, Zhang S, Knox T, Calvert C, Saxena V, Schwaderer AL, Hains DS. DEFA1A3 DNA gene-dosage regulates the kidney innate immune response during upper urinary tract infection. Life Sci Alliance 2024; 7:e202302462. [PMID: 38580392 PMCID: PMC10997819 DOI: 10.26508/lsa.202302462] [Received: 10/27/2023] [Revised: 03/26/2024] [Accepted: 03/27/2024] [Indexed: 04/07/2024]
Abstract
Antimicrobial peptides (AMPs) are host defense effectors with potent neutralizing and immunomodulatory functions against invasive pathogens. The AMPs α-Defensin 1-3/DEFA1A3 participate in innate immune responses and influence patient outcomes in various diseases. DNA copy-number variations in DEFA1A3 have been associated with severity and outcomes in infectious diseases, including urinary tract infections (UTIs). Specifically, children with lower DNA copy numbers were more susceptible to UTIs. The mechanism of action by which α-Defensin 1-3/DEFA1A3 copy-number variations lead to UTI susceptibility remains to be explored. In this study, we use a previously characterized transgenic mouse carrying a knock-in of the human DEFA1A3 gene to dissect the gene dose-dependent antimicrobial and immunomodulatory roles of α-Defensin 1-3 during uropathogenic Escherichia coli (UPEC) UTI. We elucidate the relationship between UTI and α-Defensin 1-3/DEFA1A3 expression derived from kidney neutrophils and collecting duct intercalated cells. We further describe cooperative effects between α-Defensin 1-3 and other AMPs that potentiate the neutralizing activity against UPEC. Cumulatively, we demonstrate that DEFA1A3 directly protects against UPEC while also affecting pro-inflammatory innate immune responses in a gene dosage-dependent manner.
Affiliation(s)
- Jorge J Canas
- Division of Pediatric Nephrology, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Microbiology and Immunology, Indiana University School of Medicine, Indianapolis, IN, USA
- Samuel W Arregui
- Division of Pediatric Nephrology, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Kidney and Urology Translational Research Center, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA
- Shaobo Zhang
- Division of Pediatric Nephrology, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Kidney and Urology Translational Research Center, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA
- Taylor Knox
- Kidney and Urology Translational Research Center, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA
- Christi Calvert
- Kidney and Urology Translational Research Center, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA
- Vijay Saxena
- Division of Pediatric Nephrology, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Kidney and Urology Translational Research Center, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA
- Andrew L Schwaderer
- Division of Pediatric Nephrology, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Riley Hospital for Children, Indiana University Health, Indianapolis, IN, USA
- Kidney and Urology Translational Research Center, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA
- David S Hains
- Division of Pediatric Nephrology, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Microbiology and Immunology, Indiana University School of Medicine, Indianapolis, IN, USA
- Riley Hospital for Children, Indiana University Health, Indianapolis, IN, USA
- Kidney and Urology Translational Research Center, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, USA
6
Arefnejad B, Zeinalabedini M, Talebi R, Mardi M, Ghaffari MR, Vahidi MF, Nekouei MK, Szmatoła T, Salekdeh GH. Unveiling the population genetic structure of Iranian horses breeds by whole-genome resequencing analysis. Mamm Genome 2024; 35:201-227. [PMID: 38520527 DOI: 10.1007/s00335-024-10035-6] [Received: 12/03/2023] [Accepted: 02/14/2024] [Indexed: 03/25/2024]
Abstract
Preserving genetic diversity is pivotal for enhancing genetic improvement and facilitating adaptive responses to selection. This study focuses on identifying key genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs), while exploring the genomic evolutionary connectedness among seven Iranian horses representing five indigenous breeds: Caspian, Turkoman, DareShuri, Kurdish, and Asil. Using whole-genome resequencing, we generated 2.7 Gb of sequence data, with raw reads ranging from 1.2 Gb for Caspian horses to 0.38 Gb for Turkoman horses. Post-filtering, approximately 1.9 Gb of reads remained, with ~1.5 Gb successfully mapped to the horse reference genome (EquCab3.0), achieving mapping rates between 76.4% (Caspian) and 98.35% (Turkoman). We identified 2,909,816 SNPs in Caspian horses, constituting around 0.1% of the genome. Notably, 71% of these SNPs were situated in intergenic regions, while 8.5% and 6.8% were located upstream and downstream of genes, respectively. A comparative analysis of SNPs between Iranian and non-Iranian horse breeds showed that Caspian horses had the lowest number of shared SNPs with Turkoman horses. Instead, they showed a closer genetic relationship with DareShuri, Quarter, Arabian, Standardbred, and Asil breeds. Hierarchical clustering highlighted Caspian horses as a distinct cluster, underscoring their distinctive genomic signature. Caspian horses exhibit a unique genetic profile marked by an enrichment of private mutations in neurological genes, influencing sensory perception and awareness. This distinct genetic makeup shapes mating preferences and signifies a separate evolutionary trajectory. Additionally, significant non-synonymous single nucleotide polymorphisms (nsSNPs) in reproductive genes offer intervention opportunities for managing Caspian horses. These findings reveal the population genetic structure of Iranian horse breeds, contributing to the advancement of knowledge in areas such as conservation, performance traits, climate adaptation, reproduction, and resistance to diseases in equine science.
Affiliation(s)
- Babak Arefnejad
- Department of Animal Science, University of Tehran, Karaj, Iran
- Mehrshad Zeinalabedini
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Reza Talebi
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Mohsen Mardi
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Mohammad Reza Ghaffari
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Mohammad Farhad Vahidi
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Tomasz Szmatoła
- Centre of Experimental and Innovative Medicine, University of Agriculture in Kraków, Al. Mickiewicza 24/28, 30-059, Kraków, Poland
- Department of Animal Molecular Biology, National Research Institute of Animal Production, Krakowska 1, 32-083, Balice, Poland
7
Li H, Marin M, Farhat MR. Exploring gene content with pangene graphs. arXiv 2024:arXiv:2402.16185v3. [PMID: 38463499 PMCID: PMC10925376] [Indexed: 03/12/2024]
Abstract
Motivation The gene content regulates the biology of an organism. It varies between species and between individuals of the same species. Although tools have been developed to identify gene content changes in bacterial genomes, none is applicable to collections of large eukaryotic genomes such as the human pangenome. Results We developed pangene, a computational tool to identify gene orientation, gene order and gene copy-number changes in a collection of genomes. Pangene aligns a set of input protein sequences to the genomes, resolves redundancies between protein sequences and constructs a gene graph with each genome represented as a walk in the graph. It additionally finds subgraphs, which we call bibubbles, that capture gene content changes. Applied to the human pangenome, pangene identifies known gene-level variations and reveals complex haplotypes that have not been well studied before. Pangene also works with high-quality bacterial pangenomes and reports numbers of core and accessory genes similar to those reported by existing tools. Availability and implementation Source code at https://github.com/lh3/pangene; pre-built pangene graphs can be downloaded from https://zenodo.org/records/8118576 and visualized at https://pangene.bioinweb.org.
Affiliation(s)
- Heng Li
- Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
- Harvard Medical School, 10 Shattuck St, Boston, MA 02215, USA
- Broad Institute of Harvard and MIT, 415 Main St, Cambridge, MA 02142, USA
- Maha Reda Farhat
- Harvard Medical School, 10 Shattuck St, Boston, MA 02215, USA
- Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, USA
8
Yang L, Yin H, Bai L, Yao W, Tao T, Zhao Q, Gao Y, Teng J, Xu Z, Lin Q, Diao S, Pan Z, Guan D, Li B, Zhou H, Zhou Z, Zhao F, Wang Q, Pan Y, Zhang Z, Li K, Fang L, Liu GE. Mapping and functional characterization of structural variation in 1060 pig genomes. Genome Biol 2024; 25:116. [PMID: 38715020 PMCID: PMC11075355 DOI: 10.1186/s13059-024-03253-3] [Received: 08/06/2022] [Accepted: 04/19/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND Structural variations (SVs) have significant impacts on complex phenotypes by rearranging large amounts of DNA sequence. RESULTS We present a comprehensive SV catalog based on the whole-genome sequences of 1060 pigs (Sus scrofa) representing 101 breeds, covering 9.6% of the pig genome. This catalog includes 42,487 deletions, 37,913 mobile element insertions, 3308 duplications, 1664 inversions, and 45,184 break ends. Estimates of breed ancestry and hybridization using genotyped SVs align well with those from single nucleotide polymorphisms. Geographically stratified deletions are observed, along with known duplications of the KIT gene, responsible for white coat color in European pigs. Additionally, we identify a recent SINE element insertion in MYO5A transcripts of European pigs, potentially influencing alternative splicing patterns and coat color alterations. Furthermore, a Yorkshire-specific copy number gain within ABCG2 is found, impacting chromatin interactions and gene expression across multiple tissues over a genomic region of ~200 kb. Preliminary investigations into the impact of SVs on gene expression and traits using the Pig Genotype-Tissue Expression (PigGTEx) data reveal SV associations with regulatory variants and gene-trait pairs. For instance, a 51-bp deletion is linked to the lead eQTL of the lipid-metabolism-regulating gene FADS3, whose expression in embryo may affect loin muscle area, as revealed by our transcriptome-wide association studies. CONCLUSIONS This SV catalog serves as a valuable resource for studying diversity, evolutionary history, and functional shaping of the pig genome by processes like domestication, trait-based breeding, and adaptive evolution.
Affiliation(s)
- Liu Yang
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Hongwei Yin
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lijing Bai
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Wenye Yao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Tan Tao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Qianyi Zhao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Yahui Gao
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
- Jinyan Teng
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhiting Xu
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Qing Lin
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Shuqi Diao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhangyuan Pan
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Dailu Guan
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Bingjie Li
- Animal and Veterinary Sciences, Scotland's Rural College (SRUC), Roslin Institute Building, Easter Bush, Midlothian, EH25 9RG, United Kingdom
- Huaijun Zhou
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Zhongyin Zhou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Fuping Zhao
- Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Kui Li
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
9
Pokrovac I, Rohner N, Pezer Ž. The prevalence of copy number increase at multiallelic copy number variants associated with cave colonization. Mol Ecol 2024; 33:e17339. [PMID: 38556927 DOI: 10.1111/mec.17339] [Received: 01/09/2024] [Revised: 03/16/2024] [Accepted: 03/22/2024] [Indexed: 04/02/2024]
Abstract
Copy number variation is a common contributor to phenotypic diversity, yet its involvement in ecological adaptation is not easily discerned. Instances of populations of the same species evolving in parallel in a similar environment marked by strong selective pressures present opportunities to study the role of copy number variants (CNVs) in adaptation. By identifying CNVs that repeatedly occur in multiple populations of the derived ecotype and are not (or are rarely) present in the populations of the ancestral ecotype, one can infer the association of such CNVs with adaptation to the novel environment. We used this paradigm to identify CNVs associated with recurrent adaptation of the Mexican tetra (Astyanax mexicanus) to the cave environment. Using a read-depth approach, we detected CNVs from previously re-sequenced genomes of 44 individuals belonging to two ancestral surface populations and three derived cave populations. We identified 102 genes and 292 genomic regions that repeatedly diverge in copy number between the two ecotypes and occupy 0.8% of the reference genome. Functional analysis revealed their association with processes previously recognized to be relevant for adaptation, such as vision, immunity, oxygen consumption, metabolism, and neural function, and we propose that these variants have been selected for in the cave or surface waters. The majority of the ecotype-divergent CNVs are multiallelic and display copy number increases in cavefish compared to surface fish. Our findings suggest that multiallelic CNVs, including gene duplications, and divergence in copy number provide a fast route to produce novel phenotypes associated with adaptation to subterranean life.
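The read-depth approach mentioned in this abstract rests on a simple normalization. In a diploid genome, a region whose depth is k times the genome-wide median is estimated to carry about 2k copies. The sketch below shows only that core step. It assumes a ploidy of 2 and pre-computed per-bin mean depths, and the depth values are hypothetical. Practical callers also correct for GC content and mappability and segment adjacent bins, none of which is shown. This is not the pipeline used in the study.

```python
from statistics import median


def estimate_copy_number(depths: list[float], ploidy: int = 2) -> list[int]:
    """Estimate integer copy number per bin from mean read depth.

    Assumes the genome-wide median depth corresponds to `ploidy` copies.
    """
    med = median(depths)
    if med == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [round(ploidy * d / med) for d in depths]


def classify(copy_numbers: list[int], ploidy: int = 2) -> list[str]:
    """Label each bin as a copy-number gain, loss, or normal relative to ploidy."""
    labels = []
    for cn in copy_numbers:
        if cn > ploidy:
            labels.append("gain")
        elif cn < ploidy:
            labels.append("loss")
        else:
            labels.append("normal")
    return labels


# Hypothetical per-bin mean depths for one individual (median = 30).
depths = [30, 31, 29, 60, 15, 30, 90]
cn = estimate_copy_number(depths)
print(cn)            # [2, 2, 2, 4, 1, 2, 6]
print(classify(cn))  # ['normal', 'normal', 'normal', 'gain', 'loss', 'normal', 'gain']
```

Bins estimated above 3 copies, such as the 4 and 6 here, are the kind of signal that distinguishes multiallelic CNVs from simple biallelic gains and losses. Comparing such per-bin estimates across individuals from different populations would be a separate step.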
Affiliation(s)
- Nicolas Rohner
- Stowers Institute for Medical Research, Kansas City, Missouri, USA
10
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate-coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3' or 5' end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest that a continuum of homology-mediated rearrangement processes is integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.

11
Danaeifar M, Najafi A. Artificial Intelligence and Computational Biology in Gene Therapy: A Review. Biochem Genet 2024:10.1007/s10528-024-10799-1. [PMID: 38635012 DOI: 10.1007/s10528-024-10799-1] [Received: 08/16/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
Artificial intelligence is a trending field in almost all areas of science and technology. Computational biology and artificial intelligence can support gene therapy at many steps, including gene identification, gene editing, vector design, development of new macromolecules, and modeling of gene delivery. The tools that computational biology and artificial intelligence bring to this field include genomic, transcriptomic, and proteomic data analysis, machine learning algorithms, and molecular interaction studies. These tools can identify new gene targets, propose novel vectors and optimized experimental conditions, predict outcomes, and suggest the best strategies for avoiding undesired immune responses after gene therapy treatment.
Affiliation(s)
- Mohsen Danaeifar
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Science, P.O. Box 19395-5487, Tehran, Iran
- Ali Najafi
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Science, P.O. Box 19395-5487, Tehran, Iran.

12
Paus T. Population Neuroscience: Principles and Advances. Curr Top Behav Neurosci 2024. [PMID: 38589637 DOI: 10.1007/7854_2024_474] [Indexed: 04/10/2024]
Abstract
In population neuroscience, three disciplines come together to advance our knowledge of factors that shape the human brain: neuroscience, genetics, and epidemiology (Paus, Human Brain Mapping 31:891-903, 2010). Here, I will come back to some of the background material reviewed in more detail in our previous book (Paus, Population Neuroscience, 2013), followed by a brief overview of current advances and challenges faced by this integrative approach.
Affiliation(s)
- Tomáš Paus
- Department of Psychiatry and Neuroscience, Faculty of Medicine, University of Montreal, Montreal, QC, Canada

13
Kalnapenkis A, Jõeloo M, Lepik K, Kukuškina V, Kals M, Alasoo K, Mägi R, Esko T, Võsa U. Genetic determinants of plasma protein levels in the Estonian population. Sci Rep 2024; 14:7694. [PMID: 38565889 PMCID: PMC10987560 DOI: 10.1038/s41598-024-57966-3] [Received: 06/22/2023] [Accepted: 03/23/2024] [Indexed: 04/04/2024]
Abstract
The proteome holds great potential as an intermediate layer between the genome and the phenome. Previous protein quantitative trait locus studies have focused mainly on describing the effects of common genetic variation on the proteome. Here, we assessed the impact of common and rare genetic variation, as well as copy number variants (CNVs), on 326 plasma proteins measured in up to 500 individuals. We identified 184 cis and 94 trans signals for 157 protein traits, which were further fine-mapped to credible sets for 101 cis and 87 trans signals for 151 proteins. Rare genetic variation contributed to the levels of 7 proteins, with 5 cis and 14 trans associations. CNVs were associated with the levels of 11 proteins (7 cis and 5 trans); examples include a 3q12.1 deletion acting as a hub for multiple trans associations, and a CNV overlapping NAIP, a sensor component of the NAIP-NLRC4 inflammasome, that affects levels of the pro-inflammatory cytokine interleukin 18. In summary, this work presents a comprehensive resource of genetic variation affecting plasma protein levels and provides interpretations of the identified effects.
Affiliation(s)
- Anette Kalnapenkis
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia.
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.
- Maarja Jõeloo
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Kaido Lepik
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Center for Primary Care and Public Health, Lausanne, Switzerland
- Viktorija Kukuškina
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Mart Kals
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Reedik Mägi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Tõnu Esko
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia.
- Urmo Võsa
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia.

14
Hujoel MLA, Handsaker RE, Sherman MA, Kamitaki N, Barton AR, Mukamel RE, Terao C, McCarroll SA, Loh PR. Protein-altering variants at copy number-variable regions influence diverse human phenotypes. Nat Genet 2024; 56:569-578. [PMID: 38548989 PMCID: PMC11018521 DOI: 10.1038/s41588-024-01684-z] [Received: 06/23/2023] [Accepted: 02/08/2024] [Indexed: 04/09/2024]
Abstract
Copy number variants (CNVs) are among the largest genetic variants, yet CNVs have not been effectively ascertained in most genetic association studies. Here we ascertained protein-altering CNVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting subexonic CNVs and variation within segmental duplications. Incorporating CNVs into analyses of rare variants predicted to cause gene loss of function (LOF) identified 100 associations of predicted LOF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 conferred one of the strongest protective effects of gene LOF on hypertension risk (odds ratio = 0.86 (0.82-0.90)). Protein-coding variation in rapidly evolving gene families within segmental duplications, previously invisible to most analysis methods, generated some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Affiliation(s)
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Serinus Biosciences Inc., New York, NY, USA
- Nolan Kamitaki
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

15
Linderman MD, Wallace J, van der Heyde A, Wieman E, Brey D, Shi Y, Hansen P, Shamsi Z, Liu J, Gelb BD, Bashir A. NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data. Bioinformatics 2024; 40:btae129. [PMID: 38444093 PMCID: PMC10955255 DOI: 10.1093/bioinformatics/btae129] [Received: 08/24/2023] [Revised: 01/15/2024] [Accepted: 03/04/2024] [Indexed: 03/07/2024]
Abstract
MOTIVATION: Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing (SRS) data. Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation.
RESULTS: NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state of the art in SV genotyping accuracy across different SV call sets, samples, and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described: by automatically correcting imprecisely or incorrectly described SVs, it improves deletion genotyping concordance by a further 1.5 percentage points for GIAB SVs (92%).
AVAILABILITY AND IMPLEMENTATION: Python/C++ source code and pre-trained models are freely available at https://github.com/mlinderm/npsv2.
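The abstract frames genotyping as comparing a pileup image built from real reads with pileups simulated for each candidate genotype. The sketch below illustrates only that comparison step, under stated assumptions: pileups are reduced to flat lists of read-depth values, similarity is plain cosine similarity rather than the learned deep-learning similarity NPSV-deep uses, and the toy depth profiles and the helper names `cosine_similarity` and `genotype_by_similarity` are hypothetical, not part of the npsv2 code base.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical simplification: a "pileup image" is a flat list of per-position
# read-depth values. NPSV-deep itself uses richer pileup images and a learned
# similarity; plain cosine similarity stands in for it here.
Image = Sequence[float]


def cosine_similarity(a: Image, b: Image) -> float:
    """Cosine similarity between two images with the same number of pixels."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def genotype_by_similarity(observed: Image, simulated: Dict[str, List[Image]]) -> str:
    """Pick the genotype whose simulated pileups are, on average, most similar
    to the pileup built from the real reads."""
    if not simulated:
        raise ValueError("need at least one simulated genotype")
    best_genotype, best_score = "", -math.inf
    for genotype, replicates in simulated.items():
        score = sum(cosine_similarity(observed, s) for s in replicates) / len(replicates)
        if score > best_score:
            best_genotype, best_score = genotype, score
    return best_genotype


# Toy depth profiles: one flanking position, a two-position deleted interval,
# and another flanking position.
SIMULATED = {
    "0/0": [[1.0, 1.0, 1.0, 1.0]],                        # no deletion
    "0/1": [[1.0, 0.5, 0.5, 1.0], [1.0, 0.6, 0.4, 1.0]],  # depth roughly halved
    "1/1": [[1.0, 0.0, 0.0, 1.0]],                        # no reads in the interval
}
OBSERVED = [1.0, 0.45, 0.55, 1.0]

print(genotype_by_similarity(OBSERVED, SIMULATED))  # prints 0/1
```

In the toy data a heterozygous deletion roughly halves read depth over the deleted interval, so the observed profile sits closest to the 0/1 simulations. Averaging over several simulated replicates per genotype is a design choice meant to smooth simulation noise; it is an assumption of this sketch, not a documented detail of NPSV-deep.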
Affiliation(s)
- Michael D Linderman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Jacob Wallace
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Alderik van der Heyde
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Eliza Wieman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Daniel Brey
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Yiran Shi
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Peter Hansen
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Bruce D Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Ali Bashir
- Google, Mountain View, CA 94043, United States

16
Ling E, Nemesh J, Goldman M, Kamitaki N, Reed N, Handsaker RE, Genovese G, Vogelgsang JS, Gerges S, Kashin S, Ghosh S, Esposito JM, Morris K, Meyer D, Lutservitz A, Mullally CD, Wysoker A, Spina L, Neumann A, Hogan M, Ichihara K, Berretta S, McCarroll SA. A concerted neuron-astrocyte program declines in ageing and schizophrenia. Nature 2024; 627:604-611. [PMID: 38448582 PMCID: PMC10954558 DOI: 10.1038/s41586-024-07109-5] [Received: 12/04/2022] [Accepted: 01/23/2024] [Indexed: 03/08/2024]
Abstract
Human brains vary across people and over time; such variation is not yet understood in cellular terms. Here we describe a relationship between people's cortical neurons and cortical astrocytes. We used single-nucleus RNA sequencing to analyse the prefrontal cortex of 191 human donors aged 22-97 years, including healthy individuals and people with schizophrenia. Latent-factor analysis of these data revealed that, in people whose cortical neurons more strongly expressed genes encoding synaptic components, cortical astrocytes more strongly expressed distinct genes with synaptic functions and genes for synthesizing cholesterol, an astrocyte-supplied component of synaptic membranes. We call this relationship the synaptic neuron and astrocyte program (SNAP). In schizophrenia and ageing, two conditions that involve declines in cognitive flexibility and plasticity [1,2], cells divested from SNAP: astrocytes, glutamatergic (excitatory) neurons and GABAergic (inhibitory) neurons all showed reduced SNAP expression to corresponding degrees. The distinct astrocytic and neuronal components of SNAP both involved genes in which genetic risk factors for schizophrenia were strongly concentrated. SNAP, which varies quantitatively even among healthy people of similar age, may underlie many aspects of normal human interindividual differences and may be an important point of convergence for multiple kinds of pathophysiology.
Affiliation(s)
- Emi Ling
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Genetics, Harvard Medical School, Boston, MA, USA.
- James Nemesh
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Melissa Goldman
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Nolan Kamitaki
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Nora Reed
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Robert E Handsaker
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Giulio Genovese
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Jonathan S Vogelgsang
- McLean Hospital, Belmont, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Sherif Gerges
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Seva Kashin
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Sulagna Ghosh
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Daniel Meyer
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Alyssa Lutservitz
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Christopher D Mullally
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Alec Wysoker
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Liv Spina
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Anna Neumann
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Marina Hogan
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Kiku Ichihara
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Sabina Berretta
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- McLean Hospital, Belmont, MA, USA.
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA.
- Program in Neuroscience, Harvard Medical School, Boston, MA, USA.
- Steven A McCarroll
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Genetics, Harvard Medical School, Boston, MA, USA.

17
Cerdán-Vélez D, Tress ML. The T2T-CHM13 reference assembly uncovers essential WASH1 and GPRIN2 paralogues. Bioinform Adv 2024; 4:vbae029. [PMID: 38464973 PMCID: PMC10924726 DOI: 10.1093/bioadv/vbae029] [Received: 10/02/2023] [Revised: 01/02/2024] [Accepted: 02/26/2024] [Indexed: 03/12/2024]
Abstract
Summary: The recently published T2T-CHM13 reference assembly completed the annotation of the final 8% of the human genome. It introduced 1956 genes, close to 100 of which are predicted to be coding because they have a protein-coding parent gene. Here, we confirm the coding status and functional relevance of two of these genes, paralogues of WASHC1 and GPRIN2. We find that LOC124908094, one of four novel subtelomeric WASH1 genes uncovered in the new assembly, produces the WASH1 protein that forms part of the vital actin-regulatory WASH complex. Its coding status is supported by abundant proteomics, conservation, and cDNA evidence. It was previously assumed that gene WASHC1 produced the functional WASH1 protein, but new evidence shows that WASHC1 is a human-derived duplication and likely to be one of 12 WASH1 pseudogenes in the human gene set. We also find that the T2T-CHM13 assembly has added a functionally important copy of GPRIN2 to the human gene set. We demonstrate that uniquely mapping peptides from proteomics databases support the novel LOC124900631 rather than the GRCh38 assembly GPRIN2 gene. These new additions to the set of human coding genes underline the importance of the new T2T-CHM13 assembly.
Availability and implementation: None.
Affiliation(s)
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
- Michael Liam Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain

18
Wang H, Chang TS, Dombroski BA, Cheng PL, Si YQ, Tucci A, Patil V, Valiente-Banuet L, Farrell K, Mclean C, Molina-Porcel L, Alex R, Paul De Deyn P, Le Bastard N, Gearing M, Donker Kaat L, Van Swieten JC, Dopper E, Ghetti BF, Newell KL, Troakes C, G de Yébenes J, Rábano-Gutierrez A, Meller T, Oertel WH, Respondek G, Stamelou M, Arzberger T, Roeber S, Müller U, Hopfner F, Pastor P, Brice A, Durr A, Ber IL, Beach TG, Serrano GE, Hazrati LN, Litvan I, Rademakers R, Ross OA, Galasko D, Boxer AL, Miller BL, Seeley WW, Van Deerlin VM, Lee EB, White CL, Morris HR, de Silva R, Crary JF, Goate AM, Friedman JS, Leung YY, Coppola G, Naj AC, Wang LS, Dickson DW, Höglinger GU, Tzeng JY, Geschwind DH, Schellenberg GD, Lee WP. Association of Structural Forms of 17q21.31 with the Risk of Progressive Supranuclear Palsy and MAPT Sub-haplotypes. medRxiv 2024:2024.02.26.24303379. [PMID: 38464214 PMCID: PMC10925353 DOI: 10.1101/2024.02.26.24303379] [Indexed: 03/12/2024]
Abstract
Importance: The chromosome 17q21.31 region, containing a 900 kb inversion that defines the H1 and H2 haplotypes, represents the strongest genetic risk locus in progressive supranuclear palsy (PSP). In addition to H1 and H2, various structural forms of 17q21.31, characterized by the copy numbers of α, β, and γ duplications, have been identified. However, the specific effect of each structural form on the risk of PSP has never been evaluated in a large cohort study.
Objective: To assess the association of different structural forms of 17q21.31, defined by the copy numbers of α, β, and γ duplications, with the risk of PSP and with MAPT sub-haplotypes.
Design, setting, and participants: Using whole-genome sequencing data from 1,684 individuals with PSP (1,386 autopsy confirmed) and 2,392 control subjects, a case-control study was conducted to investigate the association of the copy numbers of α, β, and γ duplications, and of the structural forms of 17q21.31, with the risk of PSP. All study subjects were selected from the Alzheimer's Disease Sequencing Project (ADSP) Umbrella NG00067.v7. Data were analyzed between March 2022 and November 2023.
Main outcomes and measures: The main outcomes were the risk of PSP, expressed as odds ratios (ORs) with 95% CIs. Risks for PSP were evaluated by logistic regression models.
Results: The copy numbers of α and β were associated with the risk of PSP only through their correlation with H1 and H2, whereas the copy number of γ was independently associated with increased risk of PSP. Each additional γ duplication was associated with a 1.10-fold increased risk of PSP (95% CI, 1.04-1.17; P = 0.0018) when conditioning on H1 and H2. Within the H1 haplotype, additional γ duplications were associated with higher odds ratios for PSP: the odds ratio increased from 1.21 (95% CI, 1.10-1.33; P = 5.47 × 10⁻⁵) for H1β1γ1 to 1.29 (95% CI, 1.16-1.43; P = 1.35 × 10⁻⁶) for H1β1γ2, 1.45 (95% CI, 1.27-1.65; P = 3.94 × 10⁻⁸) for H1β1γ3, and 1.57 (95% CI, 1.10-2.26; P = 1.35 × 10⁻²) for H1β1γ4. Moreover, H1β1γ3 is in linkage disequilibrium with H1c (R² = 0.31), a widely recognized MAPT sub-haplotype associated with increased risk of PSP. The proportion of MAPT sub-haplotypes associated with increased risk of PSP (i.e., H1c, H1d, H1g, H1o, and H1h) increased from 34% in H1β1γ1 to 77% in H1β1γ4.
Conclusions and relevance: This study revealed that the copy number of γ was associated with the risk of PSP independently of H1 and H2. H1 haplotypes with more γ duplications showed higher odds ratios for PSP and were associated with MAPT sub-haplotypes with increased risk of PSP. These findings expand our understanding of how the complex structure at 17q21.31 affects the risk of PSP.
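The Results report a single per-duplication odds ratio for γ. Assuming γ copy number enters the logistic regression as one additive term (which the phrase "each additional duplication" suggests but the abstract does not state outright) and that the reported interval is a Wald interval on the log-odds scale, the implied odds ratio for k extra copies is the per-copy OR raised to the k-th power, and so are both CI bounds. The helper name `or_for_extra_copies` is hypothetical; this is a worked illustration of the arithmetic, not the study's analysis.

```python
import math
from typing import Tuple


def or_for_extra_copies(
    per_copy_or: float, per_copy_ci: Tuple[float, float], extra_copies: int
) -> Tuple[float, Tuple[float, float]]:
    """Odds ratio and 95% CI for `extra_copies` additional copies, assuming
    copy number enters the logistic model as one additive term.

    The log-odds ratio for k copies is k * beta, with beta = ln(per-copy OR).
    Assuming a Wald interval beta +/- 1.96 * SE, the interval for k * beta is
    k times each bound, so the point estimate and both CI bounds are raised
    to the k-th power on the odds-ratio scale.
    """
    if extra_copies < 0:
        raise ValueError("extra_copies must be non-negative")
    k = extra_copies
    beta = math.log(per_copy_or)
    low, high = per_copy_ci
    return (
        math.exp(k * beta),
        (math.exp(k * math.log(low)), math.exp(k * math.log(high))),
    )


# Per-duplication estimate for gamma reported in the abstract (conditioned on H1/H2).
GAMMA_OR, GAMMA_CI = 1.10, (1.04, 1.17)

for k in (1, 2, 3):
    point, (lo, hi) = or_for_extra_copies(GAMMA_OR, GAMMA_CI, k)
    print(f"{k} extra gamma copies: OR {point:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under these assumptions, two extra copies correspond to an OR of about 1.21 (95% CI 1.08-1.37). These extrapolations are not the same quantities as the structural-form estimates for H1β1γ1 to H1β1γ4, which were estimated for each structural form rather than derived from the per-copy term, so agreement or disagreement between the two should not be read into.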
Affiliation(s)
- Hui Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Timothy S Chang
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Po-Liang Cheng
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ya-Qin Si
- Bioinformatics Research Center, North Carolina State University, NC, USA
- Albert Tucci
- Bioinformatics Research Center, North Carolina State University, NC, USA
- Vishakha Patil
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Leopoldo Valiente-Banuet
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Kurt Farrell
- Department of Pathology, Department of Artificial Intelligence & Human Health, Nash Family Department of Neuroscience, Ronald M. Loeb Center for Alzheimer’s Disease, Friedman Brain Institute, Neuropathology Brain Bank & Research CoRE, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Catriona Mclean
- Victorian Brain Bank, The Florey Institute of Neuroscience and Mental Health, Parkville, Victoria, Australia
- Laura Molina-Porcel
- Alzheimer’s Disease and Other Cognitive Disorders Unit, Neurology Service, Hospital Clínic, Fundació Recerca Clínic Barcelona (FRCB), Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Neurological Tissue Bank of the Biobanc-Hospital Clínic-IDIBAPS, Barcelona, Spain
- Rajput Alex
- Movement Disorders Program, Division of Neurology, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Peter Paul De Deyn
- Laboratory of Neurochemistry and Behavior, Experimental Neurobiology Unit, University of Antwerp, Wilrijk (Antwerp), Belgium
- Department of Neurology, University Medical Center Groningen, NL-9713 AV Groningen, Netherlands
- Marla Gearing
- Department of Pathology and Laboratory Medicine and Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
- Elise Dopper
- Netherlands Brain Bank and Erasmus University, Netherlands
- Bernardino F Ghetti
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Kathy L Newell
- Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Claire Troakes
- London Neurodegenerative Diseases Brain Bank, King’s College London, London, UK
- Alberto Rábano-Gutierrez
- Fundación CIEN (Centro de Investigación de Enfermedades Neurológicas) - Centro Alzheimer Fundación Reina Sofía, Madrid, Spain
- Tina Meller
- Department of Neurology, Philipps-Universität, Marburg, Germany
- Gesine Respondek
- German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
- Maria Stamelou
- Parkinson’s disease and Movement Disorders Department, HYGEIA Hospital, Athens, Greece
- European University of Cyprus, Nicosia, Cyprus
- Thomas Arzberger
- Department of Psychiatry and Psychotherapy, University Hospital Munich, Ludwig-Maximilians-University Munich, Germany
- Center for Neuropathology and Prion Research, Ludwig-Maximilians-University Munich, Germany
- Franziska Hopfner
- Department of Neurology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) München; German Center for Neurodegenerative Diseases (DZNE), Munich, Germany; and Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
- Pau Pastor
- Unit of Neurodegenerative diseases, Department of Neurology, University Hospital Germans Trias i Pujol, Badalona, Barcelona, Spain
- Neurosciences, The Germans Trias i Pujol Research Institute (IGTP) Badalona, Badalona, Spain
- Alexis Brice
- Sorbonne Université, Paris Brain Institute – Institut du Cerveau – ICM, Inserm U1127, CNRS UMR 7225, APHP - Hôpital Pitié-Salpêtrière, Paris, France
- Alexandra Durr
- Sorbonne Université, Paris Brain Institute – Institut du Cerveau – ICM, Inserm U1127, CNRS UMR 7225, APHP - Hôpital Pitié-Salpêtrière, Paris, France
- Isabelle Le Ber
- Sorbonne Université, Paris Brain Institute – Institut du Cerveau – ICM, Inserm U1127, CNRS UMR 7225, APHP - Hôpital Pitié-Salpêtrière, Paris, France
- Irene Litvan
- Department of Neuroscience, University of California, San Diego, CA, USA
- Rosa Rademakers
- VIB Center for Molecular Neurology, University of Antwerp, Belgium
- Department of Neuroscience, Mayo Clinic Jacksonville, FL, USA
- Owen A Ross
- Department of Neuroscience, Mayo Clinic Jacksonville, FL, USA
- Douglas Galasko
- Department of Neuroscience, University of California, San Diego, CA, USA
- Adam L Boxer
- Memory and Aging Center, University of California, San Francisco, CA, USA
- Bruce L Miller
- Memory and Aging Center, University of California, San Francisco, CA, USA
- William W Seeley
- Memory and Aging Center, University of California, San Francisco, CA, USA
- Vivianna M Van Deerlin
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Edward B Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Center for Neurodegenerative Disease Research, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Charles L White
- University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Huw R Morris
- Departmento of Clinical and Movement Neuroscience, University College of London, London, UK
| | - Rohan de Silva
- Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, London, UK
| | - John F Crary
- Department of Pathology, Department of Artificial Intelligence & Human Health, Nash Family, Department of Neuroscience, Ronald M. Loeb Center for Alzheimer’s Disease, Friedman Brain, Institute, Neuropathology Brain Bank & Research CoRE, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alison M Goate
- Department of Genetics and Genomic Sciences, New York, NY, USA; Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jeffrey S Friedman
- Friedman Bioventure, Inc., Del Mar, CA, USA: Department of Genetics and Genomic Sciences, New York, NY, USA
| | - Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Giovanni Coppola
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Psychiatry, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
| | - Adam C Naj
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | | | | | - Günter U Höglinger
- Department of Neurology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) München; German Center for Neurodegenerative Diseases (DZNE), Munich, Germany; and Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, NC, USA
- Department of Statistics, North Carolina State University, NC, USA
| | - Daniel H Geschwind
- Movement Disorders Programs, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, USA
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
19
Edwards MM, Wang N, Massey DJ, Bhatele S, Egli D, Koren A. Incomplete reprogramming of DNA replication timing in induced pluripotent stem cells. Cell Rep 2024; 43:113664. [PMID: 38194345 PMCID: PMC11231959 DOI: 10.1016/j.celrep.2023.113664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 10/27/2023] [Accepted: 12/21/2023] [Indexed: 01/10/2024]
Abstract
Induced pluripotent stem cells (iPSCs) are the foundation of cell therapy. Differences in gene expression, DNA methylation, and chromatin conformation, which could affect differentiation capacity, have been identified between iPSCs and embryonic stem cells (ESCs). Less is known about whether DNA replication timing, a process linked to both genome regulation and genome stability, is efficiently reprogrammed to the embryonic state. To answer this, we compare genome-wide replication timing between ESCs, iPSCs, and cells reprogrammed by somatic cell nuclear transfer (NT-ESCs). While NT-ESCs replicate their DNA in a manner indistinguishable from ESCs, a subset of iPSCs exhibits delayed replication at heterochromatic regions containing genes downregulated in iPSCs with incompletely reprogrammed DNA methylation. DNA replication delays are not the result of gene expression or DNA methylation aberrations and persist after cells differentiate to neuronal precursors. Thus, DNA replication timing can be resistant to reprogramming and influence the quality of iPSCs.
Affiliation(s)
- Matthew M Edwards
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Ning Wang
- Department of Pediatrics and Naomi Berrie Diabetes Center, Columbia University, New York, NY 10032, USA; Columbia University Stem Cell Initiative, New York, NY 10032, USA
- Dashiell J Massey
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Sakshi Bhatele
- Department of Pediatrics and Naomi Berrie Diabetes Center, Columbia University, New York, NY 10032, USA; Columbia University Stem Cell Initiative, New York, NY 10032, USA
- Dieter Egli
- Department of Pediatrics and Naomi Berrie Diabetes Center, Columbia University, New York, NY 10032, USA; Columbia University Stem Cell Initiative, New York, NY 10032, USA
- Amnon Koren
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA; Department of Molecular and Cellular Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA
20
Ling E, Nemesh J, Goldman M, Kamitaki N, Reed N, Handsaker RE, Genovese G, Vogelgsang JS, Gerges S, Kashin S, Ghosh S, Esposito JM, French K, Meyer D, Lutservitz A, Mullally CD, Wysoker A, Spina L, Neumann A, Hogan M, Ichihara K, Berretta S, McCarroll SA. Concerted neuron-astrocyte gene expression declines in aging and schizophrenia. bioRxiv: the preprint server for biology 2024:2024.01.07.574148. [PMID: 38260461 PMCID: PMC10802483 DOI: 10.1101/2024.01.07.574148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Human brains vary across people and over time; such variation is not yet understood in cellular terms. Here we describe a striking relationship between people's cortical neurons and cortical astrocytes. We used single-nucleus RNA-seq to analyze the prefrontal cortex of 191 human donors aged 22–97 years, including healthy individuals and persons with schizophrenia. Latent-factor analysis of these data revealed that in persons whose cortical neurons more strongly expressed genes for synaptic components, cortical astrocytes more strongly expressed distinct genes with synaptic functions and genes for synthesizing cholesterol, an astrocyte-supplied component of synaptic membranes. We call this relationship the Synaptic Neuron-and-Astrocyte Program (SNAP). In schizophrenia and aging, two conditions that involve declines in cognitive flexibility and plasticity (refs. 1,2), cells had divested from SNAP: astrocytes, glutamatergic (excitatory) neurons, and GABAergic (inhibitory) neurons all reduced SNAP expression to corresponding degrees. The distinct astrocytic and neuronal components of SNAP both involved genes in which genetic risk factors for schizophrenia were strongly concentrated. SNAP, which varies quantitatively even among healthy persons of similar age, may underlie many aspects of normal interindividual differences in humans and may be an important point of convergence for multiple kinds of pathophysiology.
Affiliation(s)
- Emi Ling
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- James Nemesh
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Melissa Goldman
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Nolan Kamitaki
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Nora Reed
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Robert E. Handsaker
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Giulio Genovese
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Jonathan S. Vogelgsang
- McLean Hospital, Belmont, MA 02478, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- Sherif Gerges
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Seva Kashin
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Sulagna Ghosh
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Daniel Meyer
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Alyssa Lutservitz
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Christopher D. Mullally
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Alec Wysoker
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Liv Spina
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Anna Neumann
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Marina Hogan
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Kiku Ichihara
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Sabina Berretta
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McLean Hospital, Belmont, MA 02478, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- Program in Neuroscience, Harvard Medical School, Boston, MA 02215, USA
- Steven A. McCarroll
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
21
Marin WM, Augusto DG, Wade KJ, Hollenbach JA. High-throughput complement component 4 genomic sequence analysis with C4Investigator. HLA 2024; 103:e15273. [PMID: 37899688 PMCID: PMC11099535 DOI: 10.1111/tan.15273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 09/01/2023] [Accepted: 10/13/2023] [Indexed: 10/31/2023]
Abstract
The complement component 4 (C4) locus, composed of the C4A and C4B genes and located on chromosome 6, encodes the C4 proteins, a key intermediate in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4A and C4B genes exhibit copy number variation, with each gene varying between 0 and 5 copies per haplotype. C4A and C4B genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) insertion in intron 9, denoted C4(L) for the long form and C4(S) for the short form; this insertion affects expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodgers epitope generally found on the C4A protein and the Chido epitope generally found on the C4B protein. C4A and C4B copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4A and C4B variants has been impeded by the high degree of sequence similarity and the complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline, named C4Investigator, for comprehensive, high-throughput characterization of human C4A and C4B sequences from short-read sequencing data. Using paired-end targeted or whole-genome sequence data as input, C4Investigator determines the overall C4 gene copy number, as well as the copy numbers of C4A, C4B, C4(Rodger), C4(Ch), C4(L), and C4(S). Additionally, C4Investigator reports the full aligned C4A and C4B sequences, enabling nucleotide-level analysis.
To demonstrate the utility of this workflow, we analyzed C4A and C4B variation in the 1000 Genomes Project dataset, showing that these genes are highly polyallelic, with many variants that have the potential to impact C4 protein function.
Affiliation(s)
- Wesley M. Marin
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Danillo G. Augusto
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Department of Biological Sciences, University of North Carolina Charlotte, Charlotte, NC, United States
- Programa de Pós-Graduação em Genética, Universidade Federal do Paraná, Curitiba, Brazil
- Kristen J. Wade
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Jill A. Hollenbach
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, United States
22
Schmitz D, Li Z, Lo Faro V, Rask-Andersen M, Ameur A, Rafati N, Johansson Å. Copy number variations and their effect on the plasma proteome. Genetics 2023; 225:iyad179. [PMID: 37793096 PMCID: PMC10697815 DOI: 10.1093/genetics/iyad179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 08/25/2023] [Accepted: 09/15/2023] [Indexed: 10/06/2023]
Abstract
Structural variations, including copy number variations (CNVs), affect around 20 million bases in the human genome and are common causes of rare conditions. CNVs are rarely investigated in complex disease research because most CNVs are not targeted by genotyping arrays or included in the reference panels used for genetic imputation. In this study, we characterize CNVs in a Swedish cohort (N = 1,021) using short-read whole-genome sequencing (WGS), use long-read WGS for validation in a subcohort (N = 15), and explore their effect on 438 plasma proteins. We detected 184,182 polymorphic CNVs and identified 15 CNVs associated with 16 proteins (P < 8.22 × 10⁻¹⁰). Of these, 5 CNVs could be perfectly validated using long-read sequencing, including a CNV that was associated with measurements of the osteoclast-associated immunoglobulin-like receptor (OSCAR) and located upstream of OSCAR, a gene important for bone health. Two other CNVs were identified as clusters of many short repetitive elements, and another represented a complex rearrangement including an inversion. Our findings provide insights into the structure of common CNVs and their effects on the plasma proteome, and highlight the importance of investigating common CNVs, including in relation to complex diseases.
Affiliation(s)
- Daniel Schmitz
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
- Zhiwei Li
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
- Valeria Lo Faro
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
- Mathias Rask-Andersen
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
- Adam Ameur
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
- Nima Rafati
- Department of Medical Biochemistry and Microbiology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Box 582, 751 23 Uppsala, Sweden
- Åsa Johansson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Box 815, 751 08 Uppsala, Sweden
23
Yang P, Wang G, Jiang S, Chen M, Zeng J, Pang Q, Du D, Zhou M. Comparative analysis of genome-wide copy number variations between Tibetan sheep and White Suffolk sheep. Anim Biotechnol 2023; 34:986-993. [PMID: 34865600 DOI: 10.1080/10495398.2021.2007937] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/19/2022]
Abstract
DNA copy number variations (CNVs) affect a wide range of biological functions, such as environmental adaptation. Tibetan sheep and White Suffolk sheep are representative indigenous and exotic breeds, respectively, raised in Sichuan, China, and they differ in many biological characteristics. In this study, we employed a high-throughput sequencing approach to investigate genome-wide CNVs in the two sheep breeds. A total of 11,135 CNV regions (CNVRs), consisting of 6,488 deletions and 4,647 duplications, were detected, ranging in length from 1,599 bp to 0.56 Mb (mean 4,658 bp). A total of 281 CNVRs segregated between Tibetan sheep and White Suffolk sheep, 18 of which were fixed within both breeds. Functional analyses of candidate genes within the segregating CNVRs highlighted the thyroid hormone signaling pathway and the CTNNB1 gene as potentially responsible for the breeds' differing biological characteristics, such as energy metabolism, seasonal reproduction, and litter size. Furthermore, the segregating CNVRs identified in this study overlapped with many known quantitative trait loci associated with growth, testis weight, and reproductive seasonality. In conclusion, these results help us better understand the differing biological characteristics of Tibetan sheep and White Suffolk sheep.
Affiliation(s)
- Pinggui Yang
- Institute of Plateau Animals, Sichuan Academy of Grassland Sciences, Chengdu, China
- Gaofu Wang
- Chongqing Academy of Animal Sciences, Chongqing, China
- Shihai Jiang
- Institute of Plateau Animals, Sichuan Academy of Grassland Sciences, Chengdu, China
- Minghua Chen
- Institute of Plateau Animals, Sichuan Academy of Grassland Sciences, Chengdu, China
- Jie Zeng
- Institute of Plateau Animals, Sichuan Academy of Grassland Sciences, Chengdu, China
- Qian Pang
- Institute of Plateau Animals, Sichuan Academy of Grassland Sciences, Chengdu, China
- Dan Du
- Institute of Plateau Animals, Sichuan Academy of Grassland Sciences, Chengdu, China
- Mingliang Zhou
- Institute of Plateau Animals, Sichuan Academy of Grassland Sciences, Chengdu, China
24
Steyaert W, Haer-Wigman L, Pfundt R, Hellebrekers D, Steehouwer M, Hampstead J, de Boer E, Stegmann A, Yntema H, Kamsteeg EJ, Brunner H, Hoischen A, Gilissen C. Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation. Nat Commun 2023; 14:6845. [PMID: 37891200 PMCID: PMC10611741 DOI: 10.1038/s41467-023-42531-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/22/2022] [Accepted: 10/13/2023] [Indexed: 10/29/2023]
Abstract
The short length of reads produced by short-read sequencing complicates the analysis of paralogous genomic regions in exome and genome sequencing data. As a result, most genetic variants within these homologous regions remain unidentified in standard analyses. Here, we present a method (Chameleolyser) that accurately identifies single-nucleotide variants and small insertions/deletions (SNVs/indels), copy number variants, and ectopic gene conversion events in duplicated genomic regions using whole-exome sequencing data. Application to a cohort of 41,755 exome samples yields 20,432 rare homozygous deletions and 2,529,791 rare SNVs/indels, of which we show that 338,084 are due to gene conversion events. None of these SNVs/indels are detectable using regular analysis techniques. Validation by high-fidelity long-read sequencing in 20 samples confirms >88% of called variants. Focusing on variation in known disease genes leads to a direct molecular diagnosis in 25 previously undiagnosed patients. Our method can readily be applied to existing exome data.
Affiliation(s)
- Wouter Steyaert
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Nijmegen, Netherlands
- Lonneke Haer-Wigman
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Rolph Pfundt
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Debby Hellebrekers
- Maastricht University Medical Center+, Department of Clinical Genetics, Maastricht, Netherlands
- Marloes Steehouwer
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Juliet Hampstead
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Elke de Boer
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Alexander Stegmann
- Maastricht University Medical Center+, Department of Clinical Genetics, Maastricht, Netherlands
- Helger Yntema
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Erik-Jan Kamsteeg
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Han Brunner
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Maastricht University Medical Center+, Department of Clinical Genetics, Maastricht, Netherlands
- Alexander Hoischen
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Nijmegen, Netherlands
- Radboud University Medical Center, Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Nijmegen, Netherlands
- Christian Gilissen
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Nijmegen, Netherlands
25
Pajuste FD, Remm M. GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads. Sci Rep 2023; 13:17765. [PMID: 37853040 PMCID: PMC10584998 DOI: 10.1038/s41598-023-44636-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 10/10/2023] [Indexed: 10/20/2023]
Abstract
Genomes exhibit large regions with segmental copy number variation, many of which include entire genes and are multiallelic. We have developed a computational method, GeneToCN, that counts the frequencies of gene-specific k-mers in FASTQ files and uses this information to infer the copy number of the gene. We validated the copy number predictions for amylase genes (AMY1, AMY2A, AMY2B) using experimental data from droplet digital PCR (ddPCR) on 39 individuals and observed a strong correlation (R = 0.99) between GeneToCN predictions and experimentally determined copy numbers. An additional validation on FCGR3 genes showed higher concordance for FCGR3A than two other methods, but reduced accuracy for FCGR3B. We further tested the method on three different genomic regions (SMN, NPY4R, and the LPA Kringle IV-2 domain). Predicted copy number distributions of these genes in a set of 500 individuals from the Estonian Biobank were in good agreement with previously published studies. In addition, we investigated the possibility of using GeneToCN on sequencing data generated by different technologies by comparing copy number predictions from Illumina, PacBio, and Oxford Nanopore data of the same sample. Despite differences in the variability of k-mer frequencies, all three sequencing technologies gave similar predictions with GeneToCN.
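The general idea of alignment-free copy number estimation, comparing the depth of gene-specific k-mers with the depth of k-mers from a region assumed to be single-copy, can be illustrated with a minimal Python sketch. This is not GeneToCN's implementation: the function names, the use of medians, the in-memory list of read strings, and the assumption of a diploid single-copy reference region are all hypothetical simplifications; a real tool would stream FASTQ files and use curated k-mer sets.

```python
from collections import Counter
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq: str, k: int):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_target_kmers(reads, targets: set, k: int) -> Counter:
    """Count how often each canonical k-mer in `targets` occurs across all reads."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            c = canonical(km)
            if c in targets:
                counts[c] += 1
    return counts


def estimate_copy_number(reads, gene_kmers: set, reference_kmers: set, k: int) -> float:
    """Hypothetical estimator: 2 * median gene k-mer depth / median reference k-mer depth.

    The factor 2 assumes the reference region occurs once per haplotype
    (two copies in a diploid genome), so its depth corresponds to copy number 2.
    """
    counts = count_target_kmers(reads, gene_kmers | reference_kmers, k)
    gene_depth = median(counts[km] for km in gene_kmers)
    ref_depth = median(counts[km] for km in reference_kmers)
    if ref_depth == 0:
        raise ValueError("reference k-mers not observed; cannot normalise depth")
    return 2 * gene_depth / ref_depth
```

Canonicalizing k-mers lets reads from either strand contribute to the same count, and taking medians rather than means makes the estimate less sensitive to individual k-mers inflated by repeats or depleted by sequencing errors.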
Affiliation(s)
- Fanny-Dhelia Pajuste
- Institute of Molecular and Cell Biology, University of Tartu, 23 Riia Str., 51010 Tartu, Estonia
- Maido Remm
- Institute of Molecular and Cell Biology, University of Tartu, 23 Riia Str., 51010 Tartu, Estonia
26
del Rosario RC, Krienen FM, Zhang Q, Goldman M, Mello C, Lutservitz A, Ichihara K, Wysoker A, Nemesh J, Feng G, McCarroll SA. Sibling chimerism among microglia in marmosets. bioRxiv: the preprint server for biology 2023:2023.10.16.562516. [PMID: 37904944 PMCID: PMC10614798 DOI: 10.1101/2023.10.16.562516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Chimerism is rare among most mammals but is common in marmosets and tamarins, a result of fraternal twin or triplet birth patterns in which circulatory systems connected in utero (through which stem cells transit) lead to persistent blood chimerism (12–80%) throughout life. The presence of Y-chromosome DNA sequences in other organs of female marmosets has long suggested that chimerism might also affect these organs. However, a longstanding question is whether this chimerism is driven by blood-derived cells or involves contributions from other cell types. To address this question, we analyzed single-cell RNA-seq data from blood, liver, kidney, and multiple brain regions across a number of marmosets, using transcribed single-nucleotide polymorphisms (SNPs) to identify cells carrying the sibling's genome within the various cell types of these tissues. Sibling-derived chimerism in all tissues arose entirely from cells of hematopoietic origin (i.e., myeloid and lymphoid lineages). In brain tissue, this was reflected as sibling-derived chimerism among microglia (20–52%) and macrophages (18–64%) but not among other resident cell types (i.e., neurons, glia, or ependymal cells). The percentage of microglia that were sibling-derived varied significantly across brain regions, even within individual animals, likely reflecting distinct responses by siblings' microglia to local recruitment or proliferation cues or, potentially, distinct clonal expansion histories in different brain areas. In the animals and tissues we analyzed, microglial gene expression profiles bore a much stronger relationship to local/host context than to sibling genetic differences. Naturally occurring marmoset chimerism will provide new ways to understand the effects of genes, mutations, and brain contexts on microglial biology and to distinguish the effects of microglia from those of other cell types on brain phenotypes.
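Assigning a single cell to the host or the sibling genome from transcribed SNPs can be framed as a likelihood comparison. The minimal Python sketch below illustrates that generic idea; it is not the pipeline used in the study. It assumes known host and sibling genotypes (coded as alt-allele dosage 0, 1, or 2), per-cell (ref, alt) read counts at each SNP, a fixed sequencing error rate, and a hypothetical log-likelihood margin `min_margin` below which a cell is left unassigned.

```python
from math import log


def alt_prob(dosage: int, error: float) -> float:
    """Expected fraction of alt-allele reads for genotype dosage 0/1/2, allowing for sequencing error."""
    frac = dosage / 2
    return frac * (1 - error) + (1 - frac) * error


def log_likelihood(cell_counts, genotypes, error=0.01):
    """Binomial log-likelihood of a cell's (ref, alt) counts under one donor's genotypes.

    The binomial coefficient is omitted because it is identical for both donors.
    """
    total = 0.0
    for snp, (ref_n, alt_n) in cell_counts.items():
        p = alt_prob(genotypes[snp], error)
        total += alt_n * log(p) + ref_n * log(1 - p)
    return total


def assign_cell(cell_counts, host_gt, sibling_gt, error=0.01, min_margin=3.0):
    """Return 'host', 'sibling', or 'ambiguous' for one cell (hypothetical helper)."""
    # Only SNPs where the two genotypes differ can discriminate between donors.
    informative = {
        snp: counts
        for snp, counts in cell_counts.items()
        if snp in host_gt and snp in sibling_gt and host_gt[snp] != sibling_gt[snp]
    }
    ll_host = log_likelihood(informative, host_gt, error)
    ll_sibling = log_likelihood(informative, sibling_gt, error)
    if abs(ll_host - ll_sibling) < min_margin:
        return "ambiguous"
    return "host" if ll_host > ll_sibling else "sibling"
```

Because single-cell RNA-seq typically yields only a few reads per SNP per cell, the evidence is summed across all informative SNPs, and cells without enough discriminating reads are left as "ambiguous" rather than forced into either group.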
Affiliation(s)
- Ricardo C.H. del Rosario
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Fenna M. Krienen
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- current address: Princeton Neuroscience Institute
| | - Qiangge Zhang
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Melissa Goldman
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Curtis Mello
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alyssa Lutservitz
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kiku Ichihara
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alec Wysoker
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- James Nemesh
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Guoping Feng
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- current address: Princeton Neuroscience Institute
- Steven A. McCarroll
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
27
Liu G, Yang H, He Z. Detection of copy number variations based on a local distance using next-generation sequencing data. Front Genet 2023; 14:1147761. [PMID: 37811148 PMCID: PMC10556732 DOI: 10.3389/fgene.2023.1147761] [Received: 02/01/2023] [Accepted: 09/14/2023] [Indexed: 10/10/2023]
Abstract
As one of the main types of structural variation in the human genome, copy number variation (CNV) plays an important role in the occurrence and development of human cancers. Next-generation sequencing (NGS) technology provides base-level resolution, which favors accurate detection of CNVs. However, accurately detecting CNVs in cancer samples with varying purity and low sequencing coverage remains very challenging. In this work we propose local distance-based CNV detection (LDCNV), a computational approach for predicting CNVs from NGS data. For each read depth (RD), LDCNV defines the distance of its k nearest neighbors (KNNs) as the average distance between the RD and its KNNs. It defines the internal distance of those KNNs as the average distance among the KNNs themselves. A local distance score is then constructed for each RD as the ratio of the KNN distance to the KNN internal distance. The local distance scores are used to fit a normal distribution that evaluates the significance level of each RD, and a hypothesis test is then used to predict the CNVs. The performance of the proposed method is verified with simulated and real data and compared with several popular methods. The experimental results show that the proposed method outperforms the other methods tested. Therefore, the proposed method can be helpful for cancer diagnosis and targeted drug development.
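The scoring steps described in the LDCNV abstract can be sketched as follows. This is a minimal illustration under stated assumptions and is not the published LDCNV code:

- Distance is the absolute difference between RD values.
- Preprocessing such as GC correction or segmentation is omitted.
- Significance comes from a one-sided upper-tail test against a normal distribution fitted to the scores.
- The names `local_distance_scores`, `call_cnv_bins`, `k` and `alpha` are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def local_distance_scores(rd, k=4):
    """Local distance score for each read-depth (RD) value.

    score = (mean distance from the RD to its k nearest neighbours)
          / (mean pairwise distance among those k neighbours)
    Distance here is the absolute difference between RD values (assumption).
    """
    if not 2 <= k < len(rd):
        raise ValueError("need 2 <= k < number of RD values")
    scores = []
    for i, x in enumerate(rd):
        # k nearest neighbours of rd[i], excluding rd[i] itself
        nearest = sorted((abs(x - y), j) for j, y in enumerate(rd) if j != i)[:k]
        knn_distance = mean(d for d, _ in nearest)
        neighbours = [rd[j] for _, j in nearest]
        # internal distance: mean over all pairs of neighbours
        internal = mean(
            abs(a - b)
            for n, a in enumerate(neighbours)
            for b in neighbours[n + 1:]
        )
        # floor the denominator so identical neighbours cannot divide by zero
        scores.append(knn_distance / max(internal, 1e-12))
    return scores


def call_cnv_bins(rd, k=4, alpha=0.01):
    """Indices of RD values whose score is improbably large under a fitted normal."""
    scores = local_distance_scores(rd, k)
    sigma = stdev(scores)
    if sigma == 0:
        return []  # every score identical: nothing stands out
    fitted = NormalDist(mean(scores), sigma)
    # one-sided test: only unusually LARGE scores are flagged
    return [i for i, s in enumerate(scores) if 1.0 - fitted.cdf(s) < alpha]


# Made-up, smoothly varying depths with one simulated loss and one gain
rd = [100.0 + 0.1 * i for i in range(60)]
rd[30] = 50.0
rd[45] = 160.0
print(call_cnv_bins(rd))  # -> [30, 45]
```

Edge effects are a limitation of this sketch. On the same smooth series without the two injected values, the bins at either end of the depth range receive the largest scores and can cross the threshold. Real data would therefore need more care than this sketch provides.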
Affiliation(s)
- Guojun Liu
- School of Mathematics, Xi’an University of Finance and Economics, Xi’an, China
- Hongzhi Yang
- Department of Radiology, XD Group Hospital, Xi’an, China
- Zongzhen He
- School of Mathematics, Xi’an University of Finance and Economics, Xi’an, China
28
Badrane H, Cheng S, Dupont CL, Hao B, Driscoll E, Morder K, Liu G, Newbrough A, Fleres G, Kaul D, Espinoza JL, Clancy CJ, Nguyen MH. Genotypic diversity and unrecognized antifungal resistance among populations of Candida glabrata from positive blood cultures. Nat Commun 2023; 14:5918. [PMID: 37739935 PMCID: PMC10516878 DOI: 10.1038/s41467-023-41509-x] [Received: 03/17/2023] [Accepted: 09/07/2023] [Indexed: 09/24/2023]
Abstract
The longstanding model is that most bloodstream infections (BSIs) are caused by a single organism. We perform whole genome sequencing of five-to-ten strains from blood culture (BC) bottles in each of ten patients with Candida glabrata BSI. We demonstrate that BCs contain mixed populations of clonal but genetically diverse strains. Genetically distinct strains from two patients exhibit phenotypes that are potentially important during BSIs, including differences in susceptibility to antifungal agents and phagocytosis. In both patients, the clinical microbiology lab recovered a fluconazole-susceptible index strain, but we identify mixed fluconazole-susceptible and -resistant populations. Diversity in drug susceptibility is likely clinically relevant, as fluconazole-resistant strains were subsequently recovered by the clinical laboratory during persistent or relapsing infections. In one patient, unrecognized respiration-deficient small colony variants are fluconazole-resistant and significantly attenuated for virulence during murine candidiasis. Our data suggest a population-based model of C. glabrata genotypic and phenotypic diversity during BSIs.
Affiliation(s)
- Binghua Hao
- University of Pittsburgh, Pittsburgh, PA, USA
- Guojun Liu
- University of Pittsburgh, Pittsburgh, PA, USA
- Drishti Kaul
- J. Craig Venter Institute, La Jolla, CA, 92037, USA
- Cornelius J Clancy
- University of Pittsburgh, Pittsburgh, PA, USA
- VA Pittsburgh Healthcare System, Pittsburgh, PA, USA
29
Antinucci M, Comas D, Calafell F. Population history modulates the fitness effects of Copy Number Variation in the Roma. Hum Genet 2023; 142:1327-1343. [PMID: 37311904 PMCID: PMC10449987 DOI: 10.1007/s00439-023-02579-5] [Received: 04/17/2023] [Accepted: 06/02/2023] [Indexed: 06/15/2023]
Abstract
We provide the first whole-genome Copy Number Variant (CNV) study of the Roma, along with reference populations from South Asia, the Middle East and Europe. Using CNV calling software for short-read sequence data, we identified 3171 deletions and 489 duplications. Taking into account the known population history of the Roma, as inferred from whole-genome nucleotide variation, we could discern how this history has shaped CNV variation. As expected, patterns of deletion variation in the Roma, but not duplication, followed those obtained from single nucleotide polymorphisms (SNPs). Reduced effective population size, resulting in slightly relaxed natural selection, may explain our observation of an increase in intronic (but not exonic) deletions within Loss of Function (LoF)-intolerant genes. Over-representation analysis of LoF-intolerant gene sets hosting intronic deletions highlights a substantial accumulation of shared biological processes in the Roma. Intriguingly, these processes relate to signaling, nervous system and development features, which may be related to the known profile of private disease in the population. Finally, we show the link between deletions and known trait-related SNPs reported in the genome-wide association study (GWAS) catalog, which exhibited even frequency distributions among the studied populations. This suggests that, in general human populations, the strong association between deletions and SNPs associated with biomedical conditions and traits could be widespread across continental populations, reflecting a common background of potentially disease/trait-related CNVs.
Affiliation(s)
- Marco Antinucci
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- David Comas
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Francesc Calafell
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
30
Babadi M, Fu JM, Lee SK, Smirnov AN, Gauthier LD, Walker M, Benjamin DI, Zhao X, Karczewski KJ, Wong I, Collins RL, Sanchis-Juan A, Brand H, Banks E, Talkowski ME. GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nat Genet 2023; 55:1589-1597. [PMID: 37604963 PMCID: PMC10904014 DOI: 10.1038/s41588-023-01449-0] [Received: 08/30/2022] [Accepted: 06/16/2023] [Indexed: 08/23/2023]
Abstract
Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the genome analysis toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.
Affiliation(s)
- Mehrtash Babadi
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jack M Fu
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Samuel K Lee
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Andrey N Smirnov
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura D Gauthier
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark Walker
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- David I Benjamin
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Xuefang Zhao
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Konrad J Karczewski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Isaac Wong
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Ryan L Collins
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Alba Sanchis-Juan
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Harrison Brand
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Eric Banks
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael E Talkowski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
31
Mukamel RE, Handsaker RE, Sherman MA, Barton AR, Hujoel MLA, McCarroll SA, Loh PR. Repeat polymorphisms underlie top genetic risk loci for glaucoma and colorectal cancer. Cell 2023; 186:3659-3673.e23. [PMID: 37527660 PMCID: PMC10528368 DOI: 10.1016/j.cell.2023.07.002] [Received: 10/04/2022] [Revised: 04/07/2023] [Accepted: 07/03/2023] [Indexed: 08/03/2023]
Abstract
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Affiliation(s)
- Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bioinformatics and Integrative Genomics Program, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
32
Lundtoft C, Eriksson D, Bianchi M, Aranda-Guillén M, Landegren N, Rantapää-Dahlqvist S, Söderkvist P, Meadows JRS, Bensing S, Pielberg GR, Lindblad-Toh K, Rönnblom L, Kämpe O. Relation between HLA and copy number variation of steroid 21-hydroxylase in a Swedish cohort of patients with autoimmune Addison's disease. Eur J Endocrinol 2023; 189:235-241. [PMID: 37553728 DOI: 10.1093/ejendo/lvad102] [Received: 11/03/2022] [Revised: 03/27/2023] [Accepted: 06/26/2023] [Indexed: 08/10/2023]
Abstract
OBJECTIVE Autoantibodies against the adrenal enzyme 21-hydroxylase are a hallmark manifestation of autoimmune Addison's disease (AAD). Steroid 21-hydroxylase is encoded by CYP21A2, which is located in the human leucocyte antigen (HLA) region together with the highly similar pseudogene CYP21A1P. The 2 genes show a high level of copy number variation; therefore, we asked whether genetic variation of the CYP21 genes is associated with AAD. DESIGN Case-control study of patients with AAD and healthy controls. METHODS Using next-generation DNA sequencing, we estimated the copy number of CYP21A2 and CYP21A1P, together with HLA alleles, in 479 Swedish patients with AAD and autoantibodies against 21-hydroxylase and in 1393 healthy controls. RESULTS With 95% of individuals carrying 2 functional 21-hydroxylase genes, no difference in CYP21A2 copy number was found when comparing patients and controls. In contrast, we discovered a lower copy number of the pseudogene CYP21A1P among AAD patients (P = 5 × 10^-44), together with associations of additional nucleotide variants in the CYP21 region. However, the strongest association was found for HLA-DQB1*02:01 (P = 9 × 10^-63), which, in combination with the DRB1*04:04-DQB1*03:02 haplotype, imposed the greatest risk of AAD. CONCLUSIONS We identified strong associations between copy number variants in the CYP21 region and risk of AAD, although these associations are most likely due to linkage disequilibrium with disease-associated HLA class II alleles.
Affiliation(s)
- Daniel Eriksson
- Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Uppsala University Hospital, Uppsala, Sweden
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Maribel Aranda-Guillén
- Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Nils Landegren
- Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Peter Söderkvist
- Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Jennifer R S Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Sophie Bensing
- Department of Endocrinology, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Gerli Rosengren Pielberg
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Broad Institute, MIT and Harvard, Cambridge, MA, United States
- Lars Rönnblom
- Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Olle Kämpe
- Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Endocrinology, Karolinska University Hospital, Stockholm, Sweden
- Department of Clinical Science, University of Bergen, Bergen, Norway
- K.G. Jebsen Center for Autoimmune Diseases, University of Bergen, Bergen, Norway
33
Soto DC, Uribe-Salazar JM, Shew CJ, Sekar A, McGinty S, Dennis MY. Genomic structural variation: A complex but important driver of human evolution. Am J Biol Anthropol 2023; 181 Suppl 76:118-144. [PMID: 36794631 PMCID: PMC10329998 DOI: 10.1002/ajpa.24713] [Received: 10/02/2022] [Revised: 01/21/2023] [Accepted: 02/05/2023] [Indexed: 02/17/2023]
Abstract
Structural variants (SVs)-including duplications, deletions, and inversions of DNA-can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by (1) how they have shaped great ape genomes resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which subsequently has played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
Affiliation(s)
- Daniela C. Soto
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
- José M. Uribe-Salazar
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
- Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
- Aarthi Sekar
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
- Sean McGinty
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
- Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
34
Marin WM, Augusto DG, Wade KJ, Hollenbach JA. High-throughput complement component 4 genomic sequence analysis with C4Investigator. bioRxiv 2023:2023.07.18.549551. [PMID: 37503256 PMCID: PMC10370142 DOI: 10.1101/2023.07.18.549551] [Indexed: 07/29/2023]
Abstract
The complement component 4 gene locus, composed of the C4A and C4B genes and located on chromosome 6, encodes the C4 protein, a key intermediate in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4 gene locus exhibits copy number variation, with each composite gene varying between 0 and 5 copies per haplotype. C4 genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) insertion in intron 9, denoted C4(L) for the long form and C4(S) for the short form. This insertion modulates expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodgers epitope generally found on C4A protein and the Chido epitope generally found on C4B protein. C4 copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4 variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed C4Investigator, a novel bioinformatic pipeline for comprehensive, high-throughput characterization of human C4 sequence from short-read sequencing data. Using paired-end targeted or whole-genome sequence data as input, C4Investigator determines gene copy number for overall C4, C4A, C4B, C4(Rodger), C4(Ch), C4(L), and C4(S). Additionally, C4Investigator reports the full overall C4 aligned sequence, enabling nucleotide-level analysis of C4. To demonstrate the utility of this workflow, we analyzed C4 variation in the 1000 Genomes Project dataset. This analysis shows that the C4 genes are highly polyallelic, with many variants that have the potential to impact C4 protein function.
Affiliation(s)
- Wesley M. Marin
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Danillo G. Augusto
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Department of Biological Sciences, University of North Carolina Charlotte, Charlotte, NC, United States
- Programa de Pós-Graduação em Genética, Universidade Federal do Paraná, Curitiba, Brazil
- Kristen J. Wade
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Jill A. Hollenbach
- Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, United States
35
Edwards MM, Wang N, Massey DJ, Egli D, Koren A. Incomplete Reprogramming of DNA Replication Timing in Induced Pluripotent Stem Cells. bioRxiv 2023:2023.06.12.544654. [PMID: 37398435 PMCID: PMC10312660 DOI: 10.1101/2023.06.12.544654] [Indexed: 07/04/2023]
Abstract
Induced pluripotent stem cells (iPSC) are a widely used cell system and a foundation for cell therapy. Differences in gene expression, DNA methylation, and chromatin conformation, which have the potential to affect differentiation capacity, have been identified between iPSCs and embryonic stem cells (ESCs). Less is known about whether DNA replication timing - a process linked to both genome regulation and genome stability - is efficiently reprogrammed to the embryonic state. To answer this, we profiled and compared genome-wide replication timing between ESCs, iPSCs, and cells reprogrammed by somatic cell nuclear transfer (NT-ESCs). While NT-ESCs replicated their DNA in a manner indistinguishable from ESCs, a subset of iPSCs exhibit delayed replication at heterochromatic regions containing genes downregulated in iPSC with incompletely reprogrammed DNA methylation. DNA replication delays were not the result of gene expression and DNA methylation aberrations and persisted after differentiating cells to neuronal precursors. Thus, DNA replication timing can be resistant to reprogramming and lead to undesirable phenotypes in iPSCs, establishing it as an important genomic feature to consider when evaluating iPSC lines.
Affiliation(s)
- Matthew M. Edwards
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Ning Wang
- Department of Pediatrics and Naomi Berrie Diabetes Center, Columbia University, New York, New York 10032, USA
- Columbia University Stem Cell Initiative, New York, New York 10032, USA
- Dashiell J. Massey
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Dieter Egli
- Department of Pediatrics and Naomi Berrie Diabetes Center, Columbia University, New York, New York 10032, USA
- Columbia University Stem Cell Initiative, New York, New York 10032, USA
- Amnon Koren
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
36
Hujoel ML, Handsaker RE, Sherman MA, Kamitaki N, Barton AR, Mukamel RE, Terao C, McCarroll SA, Loh PR. Hidden protein-altering variants influence diverse human phenotypes. bioRxiv 2023:2023.06.07.544066. [PMID: 37333244 PMCID: PMC10274781 DOI: 10.1101/2023.06.07.544066] [Indexed: 06/20/2023]
Abstract
Structural variants (SVs) are the largest genetic variants, altering anywhere from 50 base pairs to megabases of DNA. However, SVs have not been effectively ascertained in most genetic association studies, leaving a key gap in our understanding of human complex trait genetics. We ascertained protein-altering SVs from UK Biobank whole-exome sequencing data (n=468,570) using haplotype-informed methods capable of detecting sub-exonic SVs and variation within segmental duplications. Incorporating SVs into analyses of rare variants predicted to cause gene loss-of-function (pLoF) identified 100 associations of pLoF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 appeared to confer one of the strongest protective effects of gene LoF on hypertension risk (OR = 0.86 [0.82-0.90]). Protein-coding variation in rapidly evolving gene families within segmental duplications, previously invisible to most analysis methods, appeared to generate some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype, and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Affiliation(s)
- Margaux L.A. Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Robert E. Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard University, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Maxwell A. Sherman
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nolan Kamitaki
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alison R. Barton
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Ronen E. Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Steven A. McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard University, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
37
Lee YL, Bosse M, Takeda H, Moreira GCM, Karim L, Druet T, Oget-Ebrad C, Coppieters W, Veerkamp RF, Groenen MAM, Georges M, Bouwman AC, Charlier C. High-resolution structural variants catalogue in a large-scale whole genome sequenced bovine family cohort data. BMC Genomics 2023; 24:225. [PMID: 37127590 PMCID: PMC10152703 DOI: 10.1186/s12864-023-09259-8] [Received: 08/28/2022] [Accepted: 03/20/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND Structural variants (SVs) are chromosomal segments that differ between genomes, such as deletions, duplications, insertions, inversions and translocations. The genomics revolution enabled the discovery of sub-microscopic SVs via array and whole-genome sequencing (WGS) data, paving the way to unravel the functional impact of SVs. Recent human expression QTL mapping studies demonstrated that SVs play a disproportionately large role in altering gene expression, underlining the importance of including SVs in genetic analyses. Therefore, this study aimed to generate and explore a high-quality bovine SV catalogue exploiting unique cattle family cohort data (266 samples in total, forming 127 trios). RESULTS We curated 13,731 SVs (> 50 bp) segregating in the population, consisting of 12,201 deletions, 1,509 duplications, and 21 multi-allelic CNVs. Of these, we validated a subset of copy number variants (CNVs) using a direct genotyping approach in an independent cohort, indicating that at least 62% of the CNVs are true variants segregating in the population. Among gene-disrupting SVs, we prioritised two likely high-impact duplications, encompassing the ORM1 and POPDC3 genes, respectively. Liver expression QTL mapping results revealed that these duplications likely cause altered gene expression, confirming the functional importance of SVs. Although most of the accurately genotyped CNVs are tagged by single nucleotide polymorphisms (SNPs) ascertained in WGS data, most CNVs were not captured by individual SNPs obtained from a 50K genotyping array. CONCLUSION We generated a high-quality SV catalogue exploiting unique whole-genome sequenced bovine family cohort data. Two high-impact duplications upregulating ORM1 and POPDC3 are putative candidates for postpartum feed intake and hoof health traits, thus warranting further investigation. Generally, CNVs were in low LD with SNPs on the 50K array. Hence, it remains crucial to incorporate CNVs via means other than tagging SNPs, such as investigation of tagging haplotypes, direct imputation of CNVs, or direct genotyping as done in the current study. The SV catalogue and the custom genotyping array generated in the current study will serve as valuable resources, accelerating utilisation of the full spectrum of genetic variants in bovine genomes.
Affiliation(s)
- Young-Lim Lee
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
- Mirte Bosse
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Haruko Takeda
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
- Latifa Karim
- GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
- Tom Druet
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
- Claire Oget-Ebrad
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
- Wouter Coppieters
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
- GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
- Roel F Veerkamp
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Martien A M Groenen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Michel Georges
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
- Aniek C Bouwman
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Carole Charlier
- Unit of Animal Genomics, Faculty of Veterinary Medicine, GIGA-R &, University of Liège, Liège, Belgium
38
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Affiliation(s)
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Mobin Asri
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Marina Haukness
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Julian K Lucas
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Jean Monlong
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Haley J Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Xian H Chang
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Robert S Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Miten Jain
- Northeastern University, Boston, MA, USA
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Charles Markello
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M Novak
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Hugh E Olsen
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Trevor Pesout
- Genomics Institute, University of California, Santa Cruz, CA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jonas A Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
- Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Carl A Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Robert M Cook-Deegan
- Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
- Omar E Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
- Mark Diekhans
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam L Felsenfeld
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Yan Gao
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Nanibaa' A Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
- Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Barbara A Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
- Jan O Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Hugo Magalhães
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
- Pierre Marijon
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Alice B Popejoy
- Department of Public Health Sciences, University of California, Davis, CA, USA
- Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
- Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Baergen I Schultz
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Michael W Smith
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Heidi J Sofia
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
- Ahmad N Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Karen H Miga
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Ira M Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA, USA
39
Samelak-Czajka A, Wojciechowski P, Marszalek-Zenczak M, Figlerowicz M, Zmienko A. Differences in the intraspecies copy number variation of Arabidopsis thaliana conserved and nonconserved miRNA genes. Funct Integr Genomics 2023; 23:120. [PMID: 37036577 PMCID: PMC10085913 DOI: 10.1007/s10142-023-01043-x] [Received: 01/15/2023] [Revised: 03/23/2023] [Accepted: 03/25/2023] [Indexed: 04/11/2023]
Abstract
MicroRNAs (miRNAs) regulate gene expression through the RNA interference mechanism. In plants, miRNA genes (MIRs) that are grouped into conserved families, i.e. those present across different plant taxa, are involved in the regulation of many developmental and physiological processes. The roles of the nonconserved MIRs (MIRs restricted to one plant family, genus, or even species) are less well recognized; however, many of them participate in the responses to biotic and abiotic stresses. Both over- and underproduction of miRNAs may influence various biological processes. Consequently, maintaining intracellular miRNA homeostasis seems to be crucial for the organism. Deletions and duplications in the genomic sequence may alter gene dosage and/or activity. We evaluated the extent of copy number variations (CNVs) among Arabidopsis thaliana (Arabidopsis) MIRs in over 1000 natural accessions, using population-based analysis of short-read sequencing data. We showed that the conserved MIRs were unlikely to display CNVs and their deletions were extremely rare, whereas nonconserved MIRs presented moderate variation. Transposon-derived MIRs displayed exceptionally high diversity. Conversely, MIRs involved in the epigenetic control of transposons reactivated during development were mostly invariable. MIR overlap with protein-coding genes also limited their variability. At the expression level, a higher rate of nonvariable, nonconserved miRNAs than of variable, nonconserved miRNAs was detectable in Col-0 leaves, inflorescences, and siliques, although the expression of both groups was much lower than that of the conserved MIRs. Our data indicate that the CNV rate of Arabidopsis MIRs is related to their age, function, and genomic localization.
Affiliation(s)
- Anna Samelak-Czajka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
- Pawel Wojciechowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
- Institute of Computing Science, Faculty of Computing and Telecommunications, Poznan University of Technology, 60-965, Poznan, Poland
- Marek Figlerowicz
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
- Agnieszka Zmienko
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland
40
Badrane H, Cheng S, Dupont CL, Hao B, Driscoll E, Morder K, Liu G, Newbrough A, Fleres G, Kaul D, Espinoza JL, Clancy CJ, Nguyen MH. Genotypic diversity and unrecognized antifungal resistance among populations of Candida glabrata from positive blood cultures. Research Square 2023:rs.3.rs-2706400. [PMID: 37066226 PMCID: PMC10104189 DOI: 10.21203/rs.3.rs-2706400/v1] [Indexed: 05/04/2023]
Abstract
The longstanding paradigm is that most bloodstream infections (BSIs) are caused by a single organism. We performed whole genome sequencing of five to ten strains from blood culture (BC) bottles in each of ten patients with Candida glabrata BSI. We demonstrated that BCs contained mixed populations of clonal but genetically diverse strains. Genetically distinct strains from two patients exhibited phenotypes that were potentially important during BSIs, including differences in susceptibility to antifungal agents and phagocytosis. In both patients, the clinical microbiology lab recovered a fluconazole-susceptible index strain, but we identified mixed fluconazole-susceptible and -resistant populations. Diversity in drug susceptibility was likely clinically relevant, as fluconazole-resistant strains were subsequently recovered by the clinical laboratory during persistent or relapsing infections. In one patient, unrecognized respiration-deficient small colony variants were fluconazole-resistant and significantly attenuated for virulence during murine candidiasis. Our data suggest a new population-based paradigm of C. glabrata genotypic and phenotypic diversity during BSIs.
Affiliation(s)
- Shaoji Cheng
- University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Binghua Hao
- University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Guojun Liu
- University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Cornelius J Clancy
- University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, USA
41
Martin-Fernandez L, Garcia-Martínez I, Lopez S, Martinez-Perez A, Vilalta N, Plaza M, Moret C, Viñuela A, Brown AA, Panousis NI, Buil A, Dermitzakis ET, Corrales I, Souto JC, Vidal F, Soria JM. Multiallelic Copy Number Variation in ORM1 is Associated with Plasma Cell-Free DNA Levels as an Intermediate Phenotype for Venous Thromboembolism. Thromb Haemost 2023; 123:438-452. [PMID: 36696913 DOI: 10.1055/s-0043-1760844] [Indexed: 01/27/2023]
Abstract
Venous thromboembolism (VTE) is a common disease with high heritability. However, only a small portion of the genetic variance of VTE can be explained by known genetic risk factors. Neutrophil extracellular traps (NETs) have been associated with prothrombotic activity. Therefore, the genetic basis of NETs could reveal novel risk factors for VTE. A recent genome-wide association study of plasma cell-free DNA (cfDNA) levels in the Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT-2) Project showed a significant associated locus near ORM1. We aimed to further explore this candidate region by next-generation sequencing, copy number variation (CNV) quantification, and expression analysis using an extreme phenotype sampling design involving 80 individuals from the GAIT-2 Project. The RETROVE study with 400 VTE cases and 400 controls was used to replicate the results. A total of 105 genetic variants and a multiallelic CNV (mCNV) spanning ORM1 were identified in GAIT-2. Of these, 17 independent common variants, a region of 22 rare variants, and the mCNV were significantly associated with cfDNA levels. In addition, eight of these common variants and the mCNV influenced ORM1 expression. The association between the mCNV and cfDNA levels was replicated in RETROVE (p-value = 1.19 × 10⁻⁶). Additional associations between the mCNV and thrombin generation parameters were identified. Our results reveal that increased mCNV dosages in ORM1 decrease gene expression and increase cfDNA levels. Therefore, the mCNV in ORM1 appears to be a novel marker for cfDNA levels, which could contribute to VTE risk.
Affiliation(s)
- Laura Martin-Fernandez
- Genomics of Complex Diseases Unit, Research Institute Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
- Congenital Coagulopathies Laboratory, Blood and Tissue Bank, Barcelona, Spain
- Fundación Española de Trombosis y Hemostasia (FETH), Madrid, Spain
- Transfusional Medicine, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona (VHIR-UAB), Barcelona, Spain
- Iris Garcia-Martínez
- Congenital Coagulopathies Laboratory, Blood and Tissue Bank, Barcelona, Spain
- Transfusional Medicine, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona (VHIR-UAB), Barcelona, Spain
- Sonia Lopez
- Genomics of Complex Diseases Unit, Research Institute Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
- Angel Martinez-Perez
- Genomics of Complex Diseases Unit, Research Institute Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
- Noelia Vilalta
- Hemostasis and Thrombosis Unit, Department of Hematology, Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
- Melania Plaza
- Hemostasis and Thrombosis Unit, Department of Hematology, Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
- Carla Moret
- Hemostasis and Thrombosis Unit, Department of Hematology, Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
- Ana Viñuela
- Biosciences Institute, Faculty of Medicine, Newcastle University, Newcastle Upon Tyne, United Kingdom
- Andrew A Brown
- Population Health and Genomics, University of Dundee, Dundee, Scotland, United Kingdom
- Nikolaos I Panousis
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, South Cambridgeshire, United Kingdom
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
- Alfonso Buil
- Institute of Biological Psychiatry, Mental Health Sct. Hans Hospital, Roskilde, Denmark
- Irene Corrales
- Congenital Coagulopathies Laboratory, Blood and Tissue Bank, Barcelona, Spain
- Transfusional Medicine, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona (VHIR-UAB), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Instituto Carlos III (ISCIII), Madrid, Spain
- Juan Carlos Souto
- Hemostasis and Thrombosis Unit, Department of Hematology, Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
- Francisco Vidal
- Congenital Coagulopathies Laboratory, Blood and Tissue Bank, Barcelona, Spain
- Transfusional Medicine, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona (VHIR-UAB), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Instituto Carlos III (ISCIII), Madrid, Spain
- Jose Manuel Soria
- Genomics of Complex Diseases Unit, Research Institute Hospital de la Santa Creu i Sant Pau, IIB-Sant Pau, Barcelona, Spain
42
Dallmann-Sauer M, Xu YZ, da Costa ALF, Tao S, Gomes TA, Prata RBDS, Correa-Macedo W, Manry J, Alcaïs A, Abel L, Cobat A, Fava VM, Pinheiro RO, Lara FA, Probst CM, Mira MT, Schurr E. Allele-dependent interaction of LRRK2 and NOD2 in leprosy. PLoS Pathog 2023; 19:e1011260. [PMID: 36972292 PMCID: PMC10079233 DOI: 10.1371/journal.ppat.1011260] [Received: 12/14/2022] [Revised: 04/06/2023] [Accepted: 03/02/2023] [Indexed: 03/29/2023]
Abstract
Leprosy, caused by Mycobacterium leprae, rarely affects children younger than 5 years. Here, we studied a multiplex leprosy family that included monozygotic twins aged 22 months suffering from paucibacillary leprosy. Whole genome sequencing identified three amino acid mutations previously associated with Crohn’s disease and Parkinson’s disease as candidate variants for early-onset leprosy: LRRK2 N551K, R1398H and NOD2 R702W. In genome-edited macrophages, we demonstrated that cells expressing the LRRK2 mutations displayed reduced apoptosis activity following mycobacterial challenge independently of NOD2. However, employing co-immunoprecipitation and confocal microscopy, we showed that LRRK2 and NOD2 proteins interacted in RAW cells and monocyte-derived macrophages, and that this interaction was substantially reduced for the NOD2 R702W mutation. Moreover, we observed a joint effect of LRRK2 and NOD2 variants on Bacillus Calmette-Guérin (BCG)-induced respiratory burst, NF-κB activation and cytokine/chemokine secretion, with a strong impact for the genotypes found in the twins, consistent with a role of the identified mutations in the development of early-onset leprosy.
Affiliation(s)
- Monica Dallmann-Sauer
- Program in Infectious Diseases and Immunity in Global Health, The Research Institute of the McGill University Health Centre; Montreal, Canada
- McGill International TB Centre, McGill University; Montreal, Canada
- Departments of Human Genetics and Medicine, Faculty of Medicine and Health Science, McGill University; Montreal, Canada
- Graduate Program in Health Sciences, School of Medicine and Life Sciences, Pontifícia Universidade Católica do Paraná; Curitiba, Brazil
- Yong Zhong Xu
- Program in Infectious Diseases and Immunity in Global Health, The Research Institute of the McGill University Health Centre; Montreal, Canada
- McGill International TB Centre, McGill University; Montreal, Canada
- Ana Lúcia França da Costa
- Department of Specialized Medicine, Health Sciences Center, Federal University of Piauí; Teresina, Brazil
- Shao Tao
- Division of Experimental Medicine, Faculty of Medicine, McGill University; Montreal, Canada
- The Translational Research in Respiratory Diseases Program, The Research Institute of the McGill University Health Centre; Montreal, Canada
- Tiago Araujo Gomes
- Laboratory of Cellular Microbiology, Oswaldo Cruz Institute, Oswaldo Cruz Foundation; Rio de Janeiro, Brazil
- Wilian Correa-Macedo
- Program in Infectious Diseases and Immunity in Global Health, The Research Institute of the McGill University Health Centre; Montreal, Canada
- McGill International TB Centre, McGill University; Montreal, Canada
- Department of Biochemistry, Faculty of Medicine and Health Science, McGill University; Montreal, Canada
- Jérémy Manry
- Program in Infectious Diseases and Immunity in Global Health, The Research Institute of the McGill University Health Centre; Montreal, Canada
- McGill International TB Centre, McGill University; Montreal, Canada
- Alexandre Alcaïs
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, Institut National de la Santé et de la Recherche Médicale U.1163, Paris, France
- Université Paris Cité, Imagine Institute, Paris, France
- Laurent Abel
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, Institut National de la Santé et de la Recherche Médicale U.1163, Paris, France
- Université Paris Cité, Imagine Institute, Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, Rockefeller University, New York, United States of America
- Aurélie Cobat
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, Institut National de la Santé et de la Recherche Médicale U.1163, Paris, France
- Université Paris Cité, Imagine Institute, Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, Rockefeller University, New York, United States of America
- Vinicius M. Fava
- Program in Infectious Diseases and Immunity in Global Health, The Research Institute of the McGill University Health Centre; Montreal, Canada
- McGill International TB Centre, McGill University; Montreal, Canada
- Roberta Olmo Pinheiro
- Leprosy Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation; Rio de Janeiro, Brazil
- Flavio Alves Lara
- Laboratory of Cellular Microbiology, Oswaldo Cruz Institute, Oswaldo Cruz Foundation; Rio de Janeiro, Brazil
- Christian M. Probst
- Laboratory of Systems and Molecular Biology of Trypanosomatids, Instituto Carlos Chagas; FIOCRUZ, Curitiba, Brazil
- Marcelo T. Mira
- Graduate Program in Health Sciences, School of Medicine and Life Sciences, Pontifícia Universidade Católica do Paraná; Curitiba, Brazil
- * E-mail: (M.T.M); (E.S.)
- Erwin Schurr
- Program in Infectious Diseases and Immunity in Global Health, The Research Institute of the McGill University Health Centre; Montreal, Canada
- McGill International TB Centre, McGill University; Montreal, Canada
- Departments of Human Genetics and Medicine, Faculty of Medicine and Health Science, McGill University; Montreal, Canada
- Department of Biochemistry, Faculty of Medicine and Health Science, McGill University; Montreal, Canada
- * E-mail: (M.T.M); (E.S.)
43
Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, Nguyen LT, Ross EM, Hayes BJ, Chamberlain AJ, MacLeod IM. In it for the long run: perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants. Genet Sel Evol 2023; 55:9. [PMID: 36721111 PMCID: PMC9887926 DOI: 10.1186/s12711-023-00783-5] [Received: 09/27/2022] [Accepted: 01/23/2023] [Indexed: 02/02/2023]
Abstract
Studies have demonstrated that structural variants (SV) play a substantial role in the evolution of species and have an impact on Mendelian traits in the genome. However, unlike small variants (< 50 bp), it has been challenging to accurately identify and genotype SV at the population scale using short-read sequencing. Long-read sequencing technologies are becoming competitively priced and can address several of the disadvantages of short-read sequencing for the discovery and genotyping of SV. In livestock species, analysis of SV at the population scale still faces challenges due to the lack of resources, high costs, technological barriers, and computational limitations. In this review, we summarize recent progress in the characterization of SV in the major livestock species, the obstacles that still need to be overcome, as well as the future directions in this growing field. It seems timely that research communities pool resources to build global population-scale long-read sequencing consortiums for the major livestock species for which the application of genomic tools has become cost-effective.
Affiliation(s)
- Tuan V. Nguyen
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Christy J. Vander Jagt
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Jianghui Wang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Hans D. Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
- Ruidong Xiang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia
- Michael E. Goddard
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia
- Loan T. Nguyen
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
- Elizabeth M. Ross
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
- Ben J. Hayes
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
- Amanda J. Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
- Iona M. MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
44
Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak. Nat Commun 2023; 14:283. [PMID: 36650186 PMCID: PMC9845341 DOI: 10.1038/s41467-023-35996-1] [Received: 01/14/2022] [Accepted: 01/11/2023] [Indexed: 01/19/2023]
Abstract
Long-read sequencing has demonstrated great potential for characterizing all types of structural variations (SVs). However, existing algorithms have insufficient sensitivity and precision. To address these limitations, we present DeBreak, a computational method for comprehensive and accurate SV discovery. Based on alignment results, DeBreak employs a density-based approach for clustering SV candidates together with a local de novo assembly approach for reconstructing long insertions. A partial order alignment algorithm ensures precise SV breakpoints with single base-pair resolution, and a k-means clustering method can report multi-allele SV events. DeBreak outperforms existing tools on both simulated and real long-read sequencing data from both PacBio and Nanopore platforms. An important application of DeBreak is analyzing cancer genomes for potentially tumor-driving SVs. DeBreak can also be used for supplementing whole-genome assembly-based SV discovery.
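The abstract names a density-based clustering step for SV candidates but gives no parameters. As a rough illustration only, the sketch below groups per-read SV signatures by chromosome and SV type, chains breakpoints that lie within a hypothetical `max_gap` of their neighbour, and drops clusters supported by fewer than a hypothetical `min_support` reads. This gap-chaining is a simplified stand-in, loosely similar in spirit to one-dimensional DBSCAN; it is not DeBreak's implementation, and the `Candidate` type and all default values are assumptions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Candidate:
    """Hypothetical per-read SV signature extracted from an alignment."""
    chrom: str
    pos: int      # breakpoint position on the reference
    size: int     # SV length implied by this read
    svtype: str   # e.g. "DEL" or "INS"


def cluster_candidates(cands, max_gap=200, min_support=3):
    """Cluster nearby breakpoints of the same chromosome and SV type.

    Candidates are sorted by position; a new cluster starts whenever the gap
    to the previous candidate exceeds max_gap. Sparse clusters (fewer than
    min_support reads) are discarded as noise.
    """
    groups = {}
    for c in cands:
        groups.setdefault((c.chrom, c.svtype), []).append(c)

    calls = []
    for key in sorted(groups):
        chrom, svtype = key
        items = sorted(groups[key], key=lambda c: c.pos)
        cluster = [items[0]]
        for c in items[1:]:
            if c.pos - cluster[-1].pos <= max_gap:
                cluster.append(c)
            else:
                calls.extend(_summarize(chrom, svtype, cluster, min_support))
                cluster = [c]
        calls.extend(_summarize(chrom, svtype, cluster, min_support))
    return calls


def _summarize(chrom, svtype, cluster, min_support):
    """Collapse one cluster into a consensus call using median position/size."""
    if len(cluster) < min_support:
        return []
    return [{
        "chrom": chrom,
        "svtype": svtype,
        "pos": int(median(c.pos for c in cluster)),
        "size": int(median(c.size for c in cluster)),
        "support": len(cluster),
    }]
```

With three reads supporting a deletion near chr1:10,040 and one isolated read far away, only the supported cluster is reported; the median keeps one outlier read from shifting the consensus breakpoint much.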
45
Söylev A, Çokoglu SS, Koptekin D, Alkan C, Somel M. CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data. PLoS Comput Biol 2022; 18:e1010788. [PMID: 36516232 PMCID: PMC9873172 DOI: 10.1371/journal.pcbi.1010788] [Received: 08/07/2022] [Revised: 01/24/2023] [Accepted: 12/03/2022] [Indexed: 12/15/2022]
Abstract
To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1×) and short fragments (<80 bp), which prevent standard CNV detection software from being applied effectively to ancient genomes. Here we present CONGA, a method tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbp with F-scores >0.75 at ≥1× coverage, and can distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44× to 26× (median 4×) and average read lengths of 52-121 bp (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity measured using SNPs and using CONGA-genotyped deletions are highly correlated. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.
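The abstract does not spell out CONGA's genotyping model. To show why deletion genotypes remain distinguishable at around 1× coverage, the sketch below uses a generic read-depth likelihood, assuming (as a simplification, not CONGA's published model) that the number of reads falling in a deletion is Poisson with a mean proportional to the number of reference copies. The `error_rate` floor for mismapped reads in a homozygous deletion is a hypothetical parameter.

```python
import math

# genotype -> number of reference copies of the deleted segment
GENOTYPES = {"0/0": 2, "0/1": 1, "1/1": 0}


def poisson_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def genotype_deletion(observed_reads, deletion_len, mean_coverage, read_len,
                      error_rate=0.05):
    """Return (best_genotype, log-likelihoods) for one candidate deletion.

    Expected reads for two copies ~= mean_coverage * deletion_len / read_len.
    A homozygous deletion is given a small non-zero expectation
    (error_rate * diploid expectation) to absorb mismapped reads.
    """
    diploid_expect = mean_coverage * deletion_len / read_len
    loglik = {}
    for gt, copies in GENOTYPES.items():
        lam = max(diploid_expect * copies / 2, diploid_expect * error_rate)
        loglik[gt] = poisson_logpmf(observed_reads, lam)
    best = max(loglik, key=loglik.get)
    return best, loglik
```

For a 5 kbp deletion at 1× with 70 bp reads, about 71 reads are expected when both copies are present and about 36 when one is, so even sparse data separate the three states; shorter deletions or lower coverage shrink these counts and make calls less certain.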
Affiliation(s)
- Arda Söylev
- Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- * E-mail: (AS); (MS)
- Dilek Koptekin
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara, Turkey
- Mehmet Somel
- Department of Biology, Middle East Technical University, Ankara, Turkey
- * E-mail: (AS); (MS)
46
Doleschall M, Darvasi O, Herold Z, Doleschall Z, Nyirő G, Somogyi A, Igaz P, Patócs A. Quantitative PCR from human genomic DNA: The determination of gene copy numbers for congenital adrenal hyperplasia and RCCX copy number variation. PLoS One 2022; 17:e0277299. [PMID: 36454796 PMCID: PMC9714944 DOI: 10.1371/journal.pone.0277299] [Received: 03/16/2022] [Accepted: 10/25/2022] [Indexed: 12/05/2022]
Abstract
Quantitative PCR (qPCR) is used for the determination of gene copy number (GCN). GCNs contribute to human disorders and characterize copy number variation (CNV). Single-laboratory method validations of duplex qPCR assays with hydrolysis probes for the CYP21A1P and CYP21A2 genes, which reside in a CNV (RCCX CNV) and are related to congenital adrenal hyperplasia, were performed using 46 human genomic DNA samples. We also performed verifications of 5 qPCR assays for the genetic elements of RCCX CNV: C4A, C4B, the CNV breakpoint, and the HERV-K(C4) CNV deletion and insertion alleles. Precision of each qPCR assay was under 1.01 CV%. Accuracy (relative error) ranged from 4.96±4.08% to 9.91±8.93%. Accuracy was not tightly linked to precision, but was significantly correlated with the efficiency of normalization using the RPPH1 internal reference gene (Spearman's ρ: 0.793-0.940, p>0.0001), ambiguity (ρ = 0.671, p = 0.029) and misclassification (ρ = 0.769, p = 0.009). A strong genomic matrix effect was observed, and target-singleplex (one target gene in one assay) qPCR could, at best, appropriately differentiate 2 GCN from 3 GCN. Analyzing all GCNs from the 7 qPCR assays with a multiplex approach increased the resolution of differentiation and yielded 98% of GCNs unambiguously, all of which were in 100% concordance with GCNs measured by Southern blot, MLPA and aCGH. We conclude that the use of an internal reference gene (in the same assay as the target gene), the use of allele-specific primers or probes, and the multiplex approach (in one assay or different assays) are crucial for GCN determination using qPCR or other methods.
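The abstract stresses normalization to an internal reference gene (RPPH1) but does not give the calculation. A common way to turn Ct values into a copy number is the comparative ΔΔCt method against a calibrator of known GCN; the sketch below assumes that method, a per-cycle amplification factor `efficiency` (2.0 meaning 100% efficiency), and a hypothetical `max_dev` cut-off for flagging ambiguous calls. It is an illustration, not the authors' validated procedure.

```python
def gene_copy_number(ct_target, ct_ref, cal_ct_target, cal_ct_ref,
                     cal_gcn=2, efficiency=2.0):
    """Relative copy number by the comparative (delta-delta Ct) method.

    delta Ct = Ct(target) - Ct(reference gene) in the same reaction;
    GCN = cal_gcn * efficiency ** -(dCt_sample - dCt_calibrator).
    """
    delta_sample = ct_target - ct_ref
    delta_cal = cal_ct_target - cal_ct_ref
    return cal_gcn * efficiency ** -(delta_sample - delta_cal)


def call_gcn(raw, max_dev=0.25):
    """Round to the nearest integer GCN, or return None if ambiguous."""
    nearest = round(raw)
    return nearest if abs(raw - nearest) <= max_dev else None
```

For example, a sample whose target amplifies one cycle earlier (relative to RPPH1) than in a two-copy calibrator gives a raw GCN of 4.0. The same arithmetic hints at why singleplex resolution tops out early: 2 versus 3 copies differ by log2(3/2) ≈ 0.58 cycles, but 3 versus 4 differ by only log2(4/3) ≈ 0.42 cycles, so ordinary Ct scatter blurs higher copy numbers.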
Affiliation(s)
- Márton Doleschall
- Molecular Medicine Research Group, Eotvos Lorand Research Network and Semmelweis University, Budapest, Hungary
- * E-mail:
- Ottó Darvasi
- Hereditary Tumours Research Group, Eotvos Lorand Research Network and Semmelweis University, Budapest, Hungary
- Zoltán Herold
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Zoltán Doleschall
- Department of Pathogenetics, National Institute of Oncology, Budapest, Hungary
- Gábor Nyirő
- Molecular Medicine Research Group, Eotvos Lorand Research Network and Semmelweis University, Budapest, Hungary
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Anikó Somogyi
- Department of Internal Medicine and Hematology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Péter Igaz
- Molecular Medicine Research Group, Eotvos Lorand Research Network and Semmelweis University, Budapest, Hungary
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Department of Endocrinology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Attila Patócs
- Hereditary Tumours Research Group, Eotvos Lorand Research Network and Semmelweis University, Budapest, Hungary
- Department of Internal Medicine and Hematology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Department of Molecular Genetics, National Institute of Oncology, Budapest, Hungary
47
Pokrovac I, Pezer Ž. Recent advances and current challenges in population genomics of structural variation in animals and plants. Front Genet 2022; 13:1060898. [PMID: 36523759 PMCID: PMC9745067 DOI: 10.3389/fgene.2022.1060898] [Received: 10/03/2022] [Accepted: 11/15/2022] [Indexed: 05/02/2024]
Abstract
The field of population genomics has seen a surge of studies on genomic structural variation over the past two decades. These studies have shown that structural variation is taxonomically ubiquitous and represents a dominant form of genetic variation within species. Recent advances in technology, especially the development of long-read sequencing platforms, have enabled the discovery of structural variants (SVs) in previously inaccessible genomic regions, which unlocked additional structural variation for population studies and revealed that more SVs contribute to evolution than previously perceived. An increasing number of studies suggest that SVs of all types and sizes may have a large effect on phenotype and consequently a major impact on rapid adaptation, population divergence, and speciation. However, the functional effect of the vast majority of SVs is unknown, and the field generally lacks evidence on the phenotypic consequences of most SVs that are suggested to have adaptive potential. Non-human genomes are heavily under-represented in population-scale studies of SVs. We argue that more research on other species is needed to objectively estimate the contribution of SVs to evolution. We discuss technical challenges associated with SV detection and outline the most recent advances towards more representative reference genomes, which open a new era in population-scale studies of structural variation.
Affiliation(s)
- Željka Pezer
- Laboratory for Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
48
Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, Cheng H, Asri M, Logsdon GA, Carnevali P, Chaisson MJP, Chin CS, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton RS, Fulton LL, Garg S, Gerton JL, Ghurye J, Granat A, Green RE, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger EB, Jain M, Kirsche M, Kolmogorov M, Korbel JO, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell MW, McDaniel J, Nie F, Olsen HE, Olson ND, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg SL, Sanders AD, Schatz MC, Schmitt A, Schneider VA, Selvaraj S, Shafin K, Shumate A, Stitziel NO, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin AV, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook JM, Eichler EE, Phillippy AM, Paten B, Howe K, Miga KH. Semi-automated assembly of high-quality diploid human reference genomes. Nature 2022; 611:519-531. [PMID: 36261518 PMCID: PMC9668749 DOI: 10.1038/s41586-022-05325-5] [Received: 12/29/2021] [Accepted: 09/06/2022] [Indexed: 01/01/2023]
Abstract
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Affiliation(s)
- Erich D. Jarvis
- grid.134907.80000 0001 2166 1519Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA ,grid.413575.10000 0001 2167 1581Howard Hughes Medical Institute, Chevy Chase, MD USA
| | - Giulio Formenti
- grid.134907.80000 0001 2166 1519Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
| | - Arang Rhie
- grid.94365.3d0000 0001 2297 5165Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
| | - Andrea Guarracino
- grid.510779.d0000 0004 9414 6915Genomics Research Centre, Human Technopole, Viale Rita Levi-Montalcini, Milan, Italy
| | - Chentao Yang
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China
| | - Jonathan Wood
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Alan Tracey
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Francoise Thibaud-Nissen
- grid.94365.3d0000 0001 2297 5165National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
| | - Mitchell R. Vollger
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - David Porubsky
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - Haoyu Cheng
- grid.65499.370000 0001 2106 9910Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA ,grid.38142.3c000000041936754XDepartment of Biomedical Informatics, Harvard Medical School, Boston, MA USA
| | - Mobin Asri
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Glennis A. Logsdon
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - Paolo Carnevali
- grid.507326.50000 0004 6090 4941Chan Zuckerberg Initiative, Redwood City, CA USA
| | - Mark J. P. Chaisson
- grid.42505.360000 0001 2156 6853Quantitative and Computational Biology, University of Southern California, Los Angeles, CA USA
| | | | - Sarah Cody
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Joanna Collins
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Peter Ebert
- grid.411327.20000 0001 2176 9917Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Merly Escalona
- grid.205975.c0000 0001 0740 6917Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA USA
| | - Olivier Fedrigo
- grid.134907.80000 0001 2166 1519Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
| | - Robert S. Fulton
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Lucinda L. Fulton
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Shilpa Garg
- grid.5254.60000 0001 0674 042XDepartment of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Jennifer L. Gerton
- grid.250820.d0000 0000 9420 1591Stowers Institute for Medical Research, Kansas City, MO USA
| | - Jay Ghurye
- grid.504403.6Dovetail Genomics, Scotts Valley, CA USA
| | | | - Richard E. Green
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - William Harvey
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - Patrick Hasenfeld
- grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Alex Hastie
- grid.470262.50000 0004 0473 1353Bionano Genomics, San Diego, CA USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Erich B. Jaeger
- Illumina, Inc., San Diego, CA USA
- Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA USA
- Jan O. Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
- Jonas Korlach
- Pacific Biosciences, Menlo Park, CA USA
- Joyce Lee
- Bionano Genomics, San Diego, CA USA
- Daofeng Li
- Department of Genetics, Washington University School of Medicine, St. Louis, MO USA; The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
- Tina Lindsay
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
- Julian Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Feng Luo
- School of Computing, Clemson University, Clemson, SC USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Matthew W. Mitchell
- Coriell Institute for Medical Research, Camden, NJ USA
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Fan Nie
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Hugh E. Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO USA
- Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Allison Regier
- DNAnexus, Mountain View, CA USA
- Jue Ruan
- Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Steven L. Salzberg
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Ashley D. Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
- Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
- Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Nathan O. Stitziel
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO USA; Cardiovascular Division, John T. Milliken Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO USA
- Catherine Stober
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- James Torrance
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Aaron Wenger
- Pacific Biosciences, Menlo Park, CA USA
- Chuanle Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
- Guojie Zhang
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
- Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO USA; The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN USA
- David Haussler
- Howard Hughes Medical Institute, Chevy Chase, MD USA; Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA USA
- Ira Hall
- Yale School of Medicine, New Haven, CT USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
- Evan E. Eichler
- Howard Hughes Medical Institute, Chevy Chase, MD USA; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
49
Hujoel MLA, Sherman MA, Barton AR, Mukamel RE, Sankaran VG, Terao C, Loh PR. Influences of rare copy-number variation on human complex traits. Cell 2022; 185:4233-4248.e27. [PMID: 36306736 PMCID: PMC9800003 DOI: 10.1016/j.cell.2022.09.028] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/19/2021] [Revised: 07/22/2022] [Accepted: 09/19/2022] [Indexed: 11/06/2022]
Abstract
The human genome contains hundreds of thousands of regions harboring copy-number variants (CNV). However, the phenotypic effects of most such polymorphisms are unknown because only larger CNVs have been ascertainable from SNP-array data generated by large biobanks. We developed a computational approach leveraging haplotype sharing in biobank cohorts to more sensitively detect CNVs. Applied to UK Biobank, this approach accounted for approximately half of all rare gene inactivation events produced by genomic structural variation. This CNV call set enabled a detailed analysis of associations between CNVs and 56 quantitative traits, identifying 269 independent associations (p < 5 × 10⁻⁸) likely to be causally driven by CNVs. Putative target genes were identifiable for nearly half of the loci, enabling insights into dosage sensitivity of these genes and uncovering several gene-trait relationships. These results demonstrate the ability of haplotype-informed analysis to provide insights into the genetic basis of human complex traits.
Affiliation(s)
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
- Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Vijay G Sankaran
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan; Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
50
Lepamets M, Auwerx C, Nõukas M, Claringbould A, Porcu E, Kals M, Jürgenson T, Morris AP, Võsa U, Bochud M, Stringhini S, Wijmenga C, Franke L, Peterson H, Vilo J, Lepik K, Mägi R, Kutalik Z. Omics-informed CNV calls reduce false-positive rates and improve power for CNV-trait associations. HGG ADVANCES 2022; 3:100133. [PMID: 36035246 PMCID: PMC9399386 DOI: 10.1016/j.xhgg.2022.100133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/29/2022]
Abstract
Copy-number variations (CNV) are believed to play an important role in a wide range of complex traits, but discovering such associations remains challenging. While whole-genome sequencing (WGS) is the gold-standard approach for CNV detection, there are several orders of magnitude more samples with available genotyping microarray data. Such array data can be exploited for CNV detection using dedicated software (e.g., PennCNV); however, these calls suffer from elevated false-positive and -negative rates. In this study, we developed a CNV quality score that weights PennCNV calls (pCNVs) based on their likelihood of being true positive. First, we established a measure of pCNV reliability by leveraging evidence from multiple omics data (WGS, transcriptomics, and methylomics) obtained from the same samples. Next, we built a predictor of omics-confirmed pCNVs, termed omics-informed quality score (OQS), using only PennCNV software output parameters. Promisingly, OQS assigned to pCNVs detected in close family members was up to 35% higher than the OQS of pCNVs not carried by other relatives (p < 3.0 × 10⁻⁹⁰), outperforming other scores. Finally, in an association study of four anthropometric traits in 89,516 Estonian Biobank samples, the use of OQS led to a relative increase in the trait variance explained by CNVs of up to 56% compared with published quality filtering methods or scores. Overall, we put forward a flexible framework to improve any CNV detection method leveraging multi-omics evidence, applied it to improve PennCNV calls, and demonstrated its utility by improving the statistical power for downstream association analyses.
Affiliation(s)
- Maarja Lepamets
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, Tartu 51010, Estonia
- Corresponding author
- Chiara Auwerx
- Center for Integrative Genomics, University of Lausanne, Lausanne 1015, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Center for Primary Care and Public Health (Unisanté), Department of Epidemiology and Health Systems, University of Lausanne, Lausanne 1010, Switzerland
- Margit Nõukas
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, Tartu 51010, Estonia
- Eleonora Porcu
- Center for Integrative Genomics, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Center for Primary Care and Public Health (Unisanté), Department of Epidemiology and Health Systems, University of Lausanne, Lausanne 1010, Switzerland
- Mart Kals
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki 00014, Finland
- Tuuli Jürgenson
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Institute of Mathematics and Statistics, University of Tartu, Tartu 51009, Estonia
- Andrew Paul Morris
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, The University of Manchester, Manchester M13 9PL, UK
- Urmo Võsa
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Murielle Bochud
- Center for Primary Care and Public Health (Unisanté), Department of Epidemiology and Health Systems, University of Lausanne, Lausanne 1010, Switzerland
- Silvia Stringhini
- Unit of Population Epidemiology, Division of Primary Care, Geneva 1205, Switzerland
- Cisca Wijmenga
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9713 AV Groningen, the Netherlands
- Lude Franke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9713 AV Groningen, the Netherlands
- Oncode Institute, 3521 AL Utrecht, the Netherlands
- Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu 51009, Estonia
- Jaak Vilo
- Institute of Computer Science, University of Tartu, Tartu 51009, Estonia
- Kaido Lepik
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Center for Primary Care and Public Health (Unisanté), Department of Epidemiology and Health Systems, University of Lausanne, Lausanne 1010, Switzerland
- Institute of Computer Science, University of Tartu, Tartu 51009, Estonia
- Reedik Mägi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Center for Primary Care and Public Health (Unisanté), Department of Epidemiology and Health Systems, University of Lausanne, Lausanne 1010, Switzerland
- Corresponding author