1
Petrazzini BO, Balick DJ, Forrest IS, Cho J, Rocheleau G, Jordan DM, Do R. Ensemble and consensus approaches to prediction of recessive inheritance for missense variants in human disease. Cell Rep Methods 2024; 4:100914. [PMID: 39657681 PMCID: PMC11704621 DOI: 10.1016/j.crmeth.2024.100914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/19/2024] [Accepted: 11/13/2024] [Indexed: 12/12/2024]
Abstract
Mode of inheritance (MOI) is necessary for clinical interpretation of pathogenic variants; however, the majority of variants lack this information. Furthermore, variant effect predictors are fundamentally insensitive to recessive-acting diseases. Here, we present MOI-Pred, a variant pathogenicity prediction tool that accounts for MOI, and ConMOI, a consensus method that integrates variant MOI predictions from three independent tools. MOI-Pred integrates evolutionary and functional annotations to produce variant-level predictions that are sensitive to both dominant-acting and recessive-acting pathogenic variants. Both MOI-Pred and ConMOI show state-of-the-art performance on standard benchmarks. Importantly, dominant and recessive predictions from both tools are enriched in individuals with pathogenic variants for dominant- and recessive-acting diseases, respectively, in a real-world electronic health record (EHR)-based validation approach of 29,981 individuals. ConMOI outperforms its component methods in benchmarking and validation, demonstrating the value of consensus among multiple prediction methods. Predictions for all possible missense variants are provided in the "Data and code availability" section.
Affiliation(s)
- Ben O Petrazzini
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel J Balick
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Iain S Forrest
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Judy Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Ghislain Rocheleau
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel M Jordan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
2
Schraiber JG, Spence JP, Edge MD. Estimation of demography and mutation rates from one million haploid genomes. bioRxiv 2024:2024.09.18.613708. [PMID: 39345369 PMCID: PMC11429810 DOI: 10.1101/2024.09.18.613708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
As genetic sequencing costs have plummeted, datasets with sizes previously unthinkable have begun to appear. Such datasets present new opportunities to learn about evolutionary history, particularly via rare alleles that record the very recent past. However, beyond the computational challenges inherent in the analysis of many large-scale datasets, large population-genetic datasets present theoretical problems. In particular, the majority of population-genetic tools require the assumption that each mutant allele in the sample is the result of a single mutation (the "infinite sites" assumption), which is violated in large samples. Here, we present DR EVIL, a method for estimating mutation rates and recent demographic history from very large samples. DR EVIL avoids the infinite-sites assumption by using a diffusion approximation to a branching-process model with recurrent mutation. The branching-process approach limits the method to rare alleles, but, along with recent results, renders tractable likelihoods with recurrent mutation. We show that DR EVIL performs well in simulations and apply it to rare-variant data from a million haploid samples, identifying a signal of mutation-rate heterogeneity within commonly analyzed classes and predicting that in modern sample sizes, most rare variants at sites with high mutation rates represent the descendants of multiple mutation events.
3
Kobren SN, Moldovan MA, Reimers R, Traviglia D, Li X, Barnum D, Veit A, Corona RI, Carvalho Neto GDV, Willett J, Berselli M, Ronchetti W, Nelson SF, Martinez-Agosto JA, Sherwood R, Krier J, Kohane IS, Sunyaev SR. Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations. bioRxiv 2024:2024.02.13.580158. [PMID: 38405764 PMCID: PMC10888768 DOI: 10.1101/2024.02.13.580158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform "N-of-1" analyses on individual patients with ultra-rare diseases. The increasing sizes of ultra-rare disease cohorts internationally newly enables cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development [1,2]. The Undiagnosed Diseases Network (UDN) brings multiple clinical, research and experimental centers under the same umbrella across the United States to facilitate and scale N-of-1 analyses. Here, we present the first joint analysis of whole genome sequencing data of UDN patients across the network. We introduce new, well-calibrated statistical methods for prioritizing disease genes with de novo recurrence and compound heterozygosity. We also detect pathways enriched with candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We further release a software package, RaMeDiES, enabling automated cross-analysis of deidentified sequenced cohorts for new diagnostic and research discoveries. Gene-level findings and variant-level information across the cohort are available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that N-of-1 efforts should be supplemented by a joint genomic analysis across cohorts.
Affiliation(s)
- Daniel Traviglia
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Xinyun Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT
- Alexander Veit
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Rosario I. Corona
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA
- George de V. Carvalho Neto
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA
- Julian Willett
- Department of Pathology and Laboratory Medicine, NewYork-Presbyterian Weill Cornell Medical Center, New York, NY
- Michele Berselli
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- William Ronchetti
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Stanley F. Nelson
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA
- Julian A. Martinez-Agosto
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA
- Richard Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Joel Krier
- Department of Genetics, Atrius Health, Boston, MA
- Isaac S. Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Shamil R. Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
4
Whiffin N. Improving estimates of loss-of-function constraint for short genes. Nat Genet 2024; 56:1544-1545. [PMID: 39009668 DOI: 10.1038/s41588-024-01829-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Affiliation(s)
- Nicola Whiffin
- Big Data Institute and Centre for Human Genetics, University of Oxford, Oxford, UK.
5
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. Nat Genet 2024; 56:1632-1643. [PMID: 38977852 DOI: 10.1038/s41588-024-01820-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 05/29/2024] [Indexed: 07/10/2024]
Abstract
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
- Department of Genetics, Stanford University, Stanford, CA, USA.
- Hakhamanesh Mostafavi
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Population Health, New York University, New York, NY, USA
- Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, CA, USA.
- Department of Biology, Stanford University, Stanford, CA, USA.
6
Sun KY, Bai X, Chen S, Bao S, Zhang C, Kapoor M, Backman J, Joseph T, Maxwell E, Mitra G, Gorovits A, Mansfield A, Boutkov B, Gokhale S, Habegger L, Marcketta A, Locke AE, Ganel L, Hawes A, Kessler MD, Sharma D, Staples J, Bovijn J, Gelfman S, Di Gioia A, Rajagopal VM, Lopez A, Varela JR, Alegre-Díaz J, Berumen J, Tapia-Conyer R, Kuri-Morales P, Torres J, Emberson J, Collins R, Cantor M, Thornton T, Kang HM, Overton JD, Shuldiner AR, Cremona ML, Nafde M, Baras A, Abecasis G, Marchini J, Reid JG, Salerno W, Balasubramanian S. A deep catalogue of protein-coding variation in 983,578 individuals. Nature 2024; 631:583-592. [PMID: 38768635 PMCID: PMC11254753 DOI: 10.1038/s41586-024-07556-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 05/10/2024] [Indexed: 05/22/2024]
Abstract
Rare coding variants that substantially affect function provide insights into the biology of a gene [1-3]. However, ascertaining the frequency of such variants requires large sample sizes [4-8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
Affiliation(s)
- Siying Chen
- Regeneron Genetics Center, Tarrytown, NY, USA
- Suying Bao
- Regeneron Genetics Center, Tarrytown, NY, USA
- Liron Ganel
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jesús Alegre-Díaz
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Jaime Berumen
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Roberto Tapia-Conyer
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Pablo Kuri-Morales
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
- Jason Torres
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan Emberson
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Rory Collins
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Mona Nafde
- Regeneron Genetics Center, Tarrytown, NY, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
7
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Bhat V, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet 2024; 56:925-937. [PMID: 38658794 PMCID: PMC11669423 DOI: 10.1038/s41588-024-01726-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 03/21/2024] [Indexed: 04/26/2024]
Abstract
CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confounds the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Martin Jankowiak
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Zhijian Li
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Manuel Tognon
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
- Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Michael I Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Vineel Bhat
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Guillaume Lettre
- Montreal Heart Institute, Montréal, Quebec, Canada
- Faculté de Médecine, Université de Montréal, Montréal, Quebec, Canada
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Christopher A Cassa
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Luca Pinello
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA.
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Department of Pathology, Harvard Medical School, Boston, MA, USA.
8
Safadi A, Lovell SC, Doig AJ. Essentiality, protein-protein interactions and evolutionary properties are key predictors for identifying cancer-associated genes using machine learning. Sci Rep 2024; 14:9199. [PMID: 38649399 PMCID: PMC11035574 DOI: 10.1038/s41598-023-44118-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 10/04/2023] [Indexed: 04/25/2024]
Abstract
The distinctive nature of cancer as a disease prompts an exploration of the special characteristics that genes implicated in cancer exhibit. The identification of cancer-associated genes and their characteristics is crucial to furthering our understanding of this disease and enhancing the likelihood that therapeutic drug targets succeed. However, the rate at which cancer genes are being identified experimentally is slow. Applying predictive analysis techniques, through the building of accurate machine learning models, is potentially a useful approach in enhancing the identification rate of these genes and their characteristics. Here, we investigated gene essentiality scores and found that they tend to be higher for cancer-associated genes compared to other protein-coding human genes. We built a dataset of extended gene properties linked to essentiality and used it to train a machine-learning model; this model reached 89% accuracy and > 0.85 for the Area Under Curve (AUC). The model showed that essentiality, evolutionary-related properties, and properties arising from protein-protein interaction networks are particularly effective in predicting cancer-associated genes. We were able to use the model to identify potential candidate genes that have not been previously linked to cancer. Prioritising genes that score highly by our methods could aid scientists in their cancer gene research.
Affiliation(s)
- Amro Safadi
- Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PT, UK
- Simon C Lovell
- Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PT, UK
- Andrew J Doig
- Division of Neuroscience, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9BL, UK.
9
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. bioRxiv 2024:2023.05.19.541520. [PMID: 37292653 PMCID: PMC10245655 DOI: 10.1101/2023.05.19.541520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ∼25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
- Department of Genetics, Stanford University, Stanford CA
- Jonathan K. Pritchard
- Department of Genetics, Stanford University, Stanford CA
- Department of Biology, Stanford University, Stanford CA
10
Wang B, Vartak R, Zaltsman Y, Naing ZZC, Hennick KM, Polacco BJ, Bashir A, Eckhardt M, Bouhaddou M, Xu J, Sun N, Lasser MC, Zhou Y, McKetney J, Guiley KZ, Chan U, Kaye JA, Chadha N, Cakir M, Gordon M, Khare P, Drake S, Drury V, Burke DF, Gonzalez S, Alkhairy S, Thomas R, Lam S, Morris M, Bader E, Seyler M, Baum T, Krasnoff R, Wang S, Pham P, Arbalaez J, Pratt D, Chag S, Mahmood N, Rolland T, Bourgeron T, Finkbeiner S, Swaney DL, Bandyopadhay S, Ideker T, Beltrao P, Willsey HR, Obernier K, Nowakowski TJ, Hüttenhain R, State MW, Willsey AJ, Krogan NJ. A foundational atlas of autism protein interactions reveals molecular convergence. bioRxiv 2024:2023.12.03.569805. [PMID: 38076945 PMCID: PMC10705567 DOI: 10.1101/2023.12.03.569805] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/20/2023]
Abstract
Translating high-confidence (hc) autism spectrum disorder (ASD) genes into viable treatment targets remains elusive. We constructed a foundational protein-protein interaction (PPI) network in HEK293T cells involving 100 hcASD risk genes, revealing over 1,800 PPIs (87% novel). Interactors, expressed in the human brain and enriched for ASD but not schizophrenia genetic risk, converged on protein complexes involved in neurogenesis, tubulin biology, transcriptional regulation, and chromatin modification. A PPI map of 54 patient-derived missense variants identified differential physical interactions, and we leveraged AlphaFold-Multimer predictions to prioritize direct PPIs and specific variants for interrogation in Xenopus tropicalis and human forebrain organoids. A mutation in the transcription factor FOXP1 led to reconfiguration of DNA binding sites and altered development of deep cortical layer neurons in forebrain organoids. This work offers new insights into molecular mechanisms underlying ASD and describes a powerful platform to develop and test therapeutic strategies for many genetically-defined conditions.
11
Ravichandran P, Parsana P, Keener R, Hansen KD, Battle A. Aggregation of recount3 RNA-seq data improves inference of consensus and tissue-specific gene co-expression networks. bioRxiv 2024:2024.01.20.576447. [PMID: 38328080 PMCID: PMC10849507 DOI: 10.1101/2024.01.20.576447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Background: Gene co-expression networks (GCNs) describe relationships among expressed genes key to maintaining cellular identity and homeostasis. However, the small sample size of typical RNA-seq experiments, which is several orders of magnitude fewer than the number of genes, is too low to infer GCNs reliably. recount3, a publicly available dataset comprised of 316,443 uniformly processed human RNA-seq samples, provides an opportunity to improve power for accurate network reconstruction and obtain biological insight from the resulting networks.
Results: We compared alternate aggregation strategies to identify an optimal workflow for GCN inference by data aggregation and inferred three consensus networks: a universal network, a non-cancer network, and a cancer network in addition to 27 tissue context-specific networks. Central network genes from our consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas central context-specific network genes included tissue-specific transcription factors and factorization based on the hubs led to clustering of related tissue contexts. We discovered that annotations corresponding to context-specific networks inferred from aggregated data were enriched for trait heritability beyond known functional genomic annotations and were significantly more enriched when we aggregated over a larger number of samples.
Conclusion: This study outlines best practices for network GCN inference and evaluation by data aggregation. We recommend estimating and regressing confounders in each data set before aggregation and prioritizing large sample size studies for GCN reconstruction. Increased statistical power in inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability independent of functional genomic annotations that are context-agnostic. While we observed strictly increasing held-out log-likelihood with data aggregation, we noted diminishing marginal improvements. Future directions aimed at alternate methods for estimating confounders and integrating orthogonal information from modalities such as Hi-C and ChIP-seq can further improve GCN inference.
Affiliation(s)
- Princy Parsana
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Rebecca Keener
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Kaspar D Hansen
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
- Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA
12
Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024; 22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]
Abstract
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Affiliation(s)
- Ziyue Gao
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America

13
Zhao Y, Zhong G, Hagen J, Pan H, Chung WK, Shen Y. A probabilistic graphical model for estimating selection coefficient of missense variants from human population sequence data. medRxiv 2023:2023.12.11.23299809. [PMID: 38168397 PMCID: PMC10760286 DOI: 10.1101/2023.12.11.23299809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Accurately predicting the effect of missense variants is a central problem in interpretation of genomic variation. Commonly used computational methods do not capture the quantitative impact on fitness in populations. We developed MisFit to estimate missense fitness effect using biobank-scale human population genome data. MisFit jointly models the effect at molecular level (d) and population level (selection coefficient, s), assuming that in the same gene, missense variants with similar d have similar s. MisFit is a probabilistic graphical model that integrates deep neural network components and population genetics models efficiently with inductive bias based on biological causality of variant effect. We trained it by maximizing probability of observed allele counts in 236,017 European individuals. We show that s is informative in predicting frequency across ancestries and consistent with the fraction of de novo mutations given s. Finally, MisFit outperforms previous methods in prioritizing missense variants in individuals with neurodevelopmental disorders.
Affiliation(s)
- Yige Zhao
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY 10032
- Guojie Zhong
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY 10032
- Jake Hagen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032
- Hongbing Pan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032
- Wendy K. Chung
- Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School, Boston, MA 02115
- Yufeng Shen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032

14
Kyriazis CC, Robinson JA, Lohmueller KE. Using Computational Simulations to Model Deleterious Variation and Genetic Load in Natural Populations. Am Nat 2023; 202:737-752. [PMID: 38033186 PMCID: PMC10897732 DOI: 10.1086/726736] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/02/2023]
Abstract
Deleterious genetic variation is abundant in wild populations, and understanding the ecological and conservation implications of such variation is an area of active research. Genomic methods are increasingly used to quantify the impacts of deleterious variation in natural populations; however, these approaches remain limited by an inability to accurately predict the selective and dominance effects of mutations. Computational simulations of deleterious variation offer a complementary tool that can help overcome these limitations, although such approaches have yet to be widely employed. In this perspective article, we aim to encourage ecological and conservation genomics researchers to adopt greater use of computational simulations to aid in deepening our understanding of deleterious variation in natural populations. We first provide an overview of the components of a simulation of deleterious variation, describing the key parameters involved in such models. Next, we discuss several approaches for validating simulation models. Finally, we compare and validate several recently proposed deleterious mutation models, demonstrating that models based on estimates of selection parameters from experimental systems are biased toward highly deleterious mutations. We describe a new model that is supported by multiple orthogonal lines of evidence and provide example scripts for implementing this model (https://github.com/ckyriazis/simulations_review).
Affiliation(s)
- Christopher C. Kyriazis
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles; Los Angeles, CA, USA
- Jacqueline A. Robinson
- Institute for Human Genetics, University of California, San Francisco; San Francisco, CA, USA
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles; Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, USA

15
Seplyarskiy V, Koch EM, Lee DJ, Lichtman JS, Luan HH, Sunyaev SR. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat Genet 2023; 55:2235-2242. [PMID: 38036792 PMCID: PMC11348951 DOI: 10.1038/s41588-023-01562-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/20/2022] [Accepted: 10/06/2023] [Indexed: 12/02/2023]
Abstract
De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes, depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than existing models. We use Roulette to refine the estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a tenfold increase in mutation rate in nearly all genes transcribed by polymerase III (Pol III), suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively used in testis and residing in promoters.
Affiliation(s)
- Vladimir Seplyarskiy
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Evan M Koch
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Daniel J Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Joshua S Lichtman
- NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
- Soleil Labs, South San Francisco, CA, USA
- Harding H Luan
- NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
- Soleil Labs, South San Francisco, CA, USA
- Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA

16
Sun KY, Bai X, Chen S, Bao S, Kapoor M, Zhang C, Backman J, Joseph T, Maxwell E, Mitra G, Gorovits A, Mansfield A, Boutkov B, Gokhale S, Habegger L, Marcketta A, Locke A, Kessler MD, Sharma D, Staples J, Bovijn J, Gelfman S, Gioia AD, Rajagopal V, Lopez A, Varela JR, Alegre J, Berumen J, Tapia-Conyer R, Kuri-Morales P, Torres J, Emberson J, Collins R, Cantor M, Thornton T, Kang HM, Overton J, Shuldiner AR, Cremona ML, Nafde M, Baras A, Abecasis G, Marchini J, Reid JG, Salerno W, Balasubramanian S. A deep catalog of protein-coding variation in 985,830 individuals. bioRxiv 2023:2023.05.09.539329. [PMID: 37214792 PMCID: PMC10197621 DOI: 10.1101/2023.05.09.539329] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/24/2023]
Abstract
Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
Affiliation(s)
- Siying Chen
- Regeneron Genetics Center, Tarrytown, NY, USA
- Suying Bao
- Regeneron Genetics Center, Tarrytown, NY, USA
- Adam Locke
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jesus Alegre
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Jaime Berumen
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Roberto Tapia-Conyer
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Pablo Kuri-Morales
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Jason Torres
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan Emberson
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Rory Collins
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Mona Nafde
- Regeneron Genetics Center, Tarrytown, NY, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA

17
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the discrete-time Wright-Fisher model to biobank-scale datasets. Genetics 2023; 225:iyad168. [PMID: 37724741 PMCID: PMC10627256 DOI: 10.1093/genetics/iyad168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/01/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023]
Abstract
The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
Affiliation(s)
- Jeffrey P Spence
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Tony Zeng
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Biology, Stanford University, Stanford, CA 94305, USA

18
Brown BC, Morris JA, Lappalainen T, Knowles DA. Large-scale causal discovery using interventional data sheds light on the regulatory network architecture of blood traits. bioRxiv 2023:2023.10.13.562293. [PMID: 37905013 PMCID: PMC10614812 DOI: 10.1101/2023.10.13.562293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Inference of directed biological networks is an important but notoriously challenging problem. We introduce inverse sparse regression (inspre), an approach to learning causal networks that leverages large-scale intervention-response data. Applied to 788 genes from the genome-wide perturb-seq dataset, inspre helps elucidate the network architecture of blood traits.
Affiliation(s)
- Brielin C. Brown
- New York Genome Center, New York, NY, USA
- Data Science Institute, Columbia University, New York, NY, USA
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Systems Biology, Columbia University, New York, NY
- David A. Knowles
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY
- Department of Computer Science, Columbia University, New York, NY

19
Vance Z, McLysaght A. Ohnologs and SSD Paralogs Differ in Genomic and Expression Features Related to Dosage Constraints. Genome Biol Evol 2023; 15:evad174. [PMID: 37776514 PMCID: PMC10563793 DOI: 10.1093/gbe/evad174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 09/21/2023] [Accepted: 09/26/2023] [Indexed: 10/02/2023]
Abstract
Gene duplication is recognized as a critical process in genome evolution; however, many questions about this process remain unanswered. Although gene duplicability has been observed to differ by duplication mechanism and evolutionary rate, there is so far no broad characterization of its determinants. Many features correlate with this difference in duplicability; however, our ability to exploit these observations to advance our understanding of the role of duplication in evolution is hampered by limitations within existing work. In particular, the existence of methodological differences across studies impedes meaningful comparison. Here, we use consistent definitions of duplicability in the human lineage to explore these associations, allow resolution of the impact of confounding factors, and define the overall relevance of individual features. Using a classifier approach and controlling for the confounding effect of duplicate longevity, we find a subset of gene features important in differentiating genes duplicable by small-scale duplication from those duplicable by whole-genome duplication, revealing critical roles for gene dosage and expression costs in duplicability. We further delve into patterns of functional enrichment and find a lack of constraint on duplicate retention in any context for genes duplicable by small-scale duplication.
Affiliation(s)
- Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland

20
Liu Z, Huang YF. Deep multiple-instance learning accurately predicts gene haploinsufficiency and deletion pathogenicity. bioRxiv 2023:2023.08.29.555384. [PMID: 37693607 PMCID: PMC10491176 DOI: 10.1101/2023.08.29.555384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Copy number losses (deletions) are a major contributor to the etiology of severe genetic disorders. Although haploinsufficient genes play a critical role in deletion pathogenicity, current methods for deletion pathogenicity prediction fail to integrate multiple lines of evidence for haploinsufficiency at the gene level, limiting their power to pinpoint deleterious deletions associated with genetic disorders. Here we introduce DosaCNV, a deep multiple-instance learning framework that, for the first time, models deletion pathogenicity jointly with gene haploinsufficiency. By integrating over 30 gene-level features potentially predictive of haploinsufficiency, DosaCNV shows unmatched performance in prioritizing pathogenic deletions associated with a broad spectrum of genetic disorders. Furthermore, DosaCNV outperforms existing methods in predicting gene haploinsufficiency even though it is not trained on known haploinsufficient genes. Finally, DosaCNV leverages a state-of-the-art technique to quantify the contributions of individual gene-level features to haploinsufficiency, allowing for human-understandable explanations of model predictions. Altogether, DosaCNV is a powerful computational tool for both fundamental and translational research.
Affiliation(s)
- Zhihan Liu
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Molecular, Cellular, and Integrative Biosciences Program, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA

21
Walton NA, Nguyen HH, Procknow SS, Johnson D, Anzelmi A, Jay PY. Repurposing Normal Chromosomal Microarray Data to Harbor Genetic Insights into Congenital Heart Disease. Biology 2023; 12:1290. [PMID: 37887000 PMCID: PMC10604103 DOI: 10.3390/biology12101290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 09/08/2023] [Accepted: 09/08/2023] [Indexed: 10/28/2023]
Abstract
About 15% of congenital heart disease (CHD) patients have a known pathogenic copy number variant. The majority of their chromosomal microarray (CMA) tests are deemed normal. Diagnostic interpretation typically ignores microdeletions smaller than 100 kb. We hypothesized that unreported microdeletions are enriched for CHD genes. We analyzed "normal" CMAs of 1762 patients who were evaluated at a pediatric referral center, of which 319 (18%) had CHD. Using CMAs from monozygotic twins or replicates from the same individual, we established a size threshold based on probe count for the reproducible detection of small microdeletions. Genes in the microdeletions were sequentially filtered by their nominal association with a CHD diagnosis, the expression level in the fetal heart, and the deleteriousness of a loss-of-function mutation. The subsequent enrichment for CHD genes was assessed using the presence of known or potentially novel genes implicated by a large whole-exome sequencing study of CHD. The unreported microdeletions were modestly enriched for both known CHD genes and those of unknown significance identified using their de novo mutation in CHD patients. Our results show that readily available "normal" CMA data can be a fruitful resource for genetic discovery and that smaller deletions should receive more attention in clinical evaluation.
Affiliation(s)
- Nephi A. Walton
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Hoang H. Nguyen
- Department of Pediatrics, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Sara S. Procknow
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Darren Johnson
- Genomic Medicine Institute, Geisinger, Danville, PA 17822, USA
- Alexander Anzelmi
- Department of Medicine, Thomas Jefferson University Hospitals, Philadelphia, PA 19107, USA
- Patrick Y. Jay
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO 63110, USA

22
LaPolice TM, Huang YF. An unsupervised deep learning framework for predicting human essential genes from population and functional genomic data. BMC Bioinformatics 2023; 24:347. [PMID: 37723435 PMCID: PMC10506225 DOI: 10.1186/s12859-023-05481-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 09/13/2023] [Indexed: 09/20/2023]
Abstract
BACKGROUND: The ability to accurately predict essential genes intolerant to loss-of-function (LOF) mutations can dramatically improve the identification of disease-associated genes. Recently, there have been numerous computational methods developed to predict human essential genes from population genomic data. While the existing methods are highly predictive of essential genes of long length, they have limited power in pinpointing short essential genes due to the sparsity of polymorphisms in the human genome. RESULTS: Motivated by the premise that population and functional genomic data may provide complementary evidence for gene essentiality, here we present an evolution-based deep learning model, DeepLOF, to predict essential genes in an unsupervised manner. Unlike previous population genetic methods, DeepLOF utilizes a novel deep learning framework to integrate both population and functional genomic data, allowing us to pinpoint short essential genes that can hardly be predicted from population genomic data alone. Compared with previous methods, DeepLOF shows unmatched performance in predicting ClinGen haploinsufficient genes, mouse essential genes, and essential genes in human cell lines. Notably, at a false positive rate of 5%, DeepLOF detects 50% more ClinGen haploinsufficient genes than previous methods. Furthermore, DeepLOF discovers 109 novel essential genes that are too short to be identified by previous methods. CONCLUSION: The predictive power of DeepLOF shows that it is a compelling computational method to aid in the discovery of essential genes.
Affiliation(s)
- Troy M LaPolice
- Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, PA, 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA

23
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. medRxiv 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Zhijian Li
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Manuel Tognon
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
- Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Michael I. Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Guillaume Lettre
- Montreal Heart Institute, Montréal, QC H1T 1C8, Canada
- Faculté de Médecine, Université de Montréal, Montréal, QC H3T 1J4, Canada
- David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Christopher A. Cassa
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Richard I. Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA

24
Foreman J, Perrett D, Mazaika E, Hunt SE, Ware JS, Firth HV. DECIPHER: Improving Genetic Diagnosis Through Dynamic Integration of Genomic and Clinical Data. Annu Rev Genomics Hum Genet 2023; 24:151-176. [PMID: 37285546 PMCID: PMC7615097 DOI: 10.1146/annurev-genom-102822-100509] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) shares candidate diagnostic variants and phenotypic data from patients with genetic disorders to facilitate research and improve the diagnosis, management, and therapy of rare diseases. The platform sits at the boundary between genomic research and the clinical community. DECIPHER aims to ensure that the most up-to-date data are made rapidly available within its interpretation interfaces to improve clinical care. Newly integrated cardiac case-control data that provide evidence of gene-disease associations and inform variant interpretation exemplify this mission. New research resources are presented in a format optimized for use by a broad range of professionals supporting the delivery of genomic medicine. The interfaces within DECIPHER integrate and contextualize variant and phenotypic data, helping to determine a robust clinico-molecular diagnosis for rare-disease patients, which combines both variant classification and clinical fit. DECIPHER supports discovery research, connecting individuals within the rare-disease community to pursue hypothesis-driven research.
Affiliation(s)
- Julia Foreman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Daniel Perrett
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Erica Mazaika
  - National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- James S Ware
  - National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
  - Royal Brompton and Harefield Hospitals, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom
- Helen V Firth
  - Wellcome Sanger Institute, Hinxton, United Kingdom
  - East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
25
Krill-Burger JM, Dempster JM, Borah AA, Paolella BR, Root DE, Golub TR, Boehm JS, Hahn WC, McFarland JM, Vazquez F, Tsherniak A. Partial gene suppression improves identification of cancer vulnerabilities when CRISPR-Cas9 knockout is pan-lethal. Genome Biol 2023; 24:192. [PMID: 37612728 PMCID: PMC10464129 DOI: 10.1186/s13059-023-03020-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/22/2022] [Accepted: 07/21/2023] [Indexed: 08/25/2023]
Abstract
BACKGROUND: Hundreds of functional genomic screens have been performed across a diverse set of cancer contexts, as part of efforts such as the Cancer Dependency Map, to identify gene dependencies, that is, genes whose loss of function reduces cell viability or fitness. Recently, large-scale screening efforts have shifted from RNAi to CRISPR-Cas9, due to superior efficacy and specificity. However, many effective oncology drugs only partially inhibit their protein targets, leading us to question whether partial suppression of genes using RNAi could reveal cancer vulnerabilities that are missed by complete knockout using CRISPR-Cas9. Here, we compare CRISPR-Cas9 and RNAi dependency profiles of genes across approximately 400 matched cancer cell lines. RESULTS: We find that CRISPR screens accurately identify more gene dependencies per cell line, but the majority of each cell line's dependencies are part of a set of 1867 genes that are shared dependencies across the entire collection (pan-lethals). While RNAi knockdown of about 30% of these genes is also pan-lethal, approximately 50% have selective dependency patterns across cell lines, suggesting they could still be cancer vulnerabilities. The accuracy of the unique RNAi selectivity is supported by associations to multi-omics profiles, drug sensitivity, and other expected co-dependencies. CONCLUSIONS: Incorporating RNAi data for genes that are pan-lethal knockouts facilitates the discovery of a wider range of gene targets than could be detected using the CRISPR dataset alone. This can aid in the interpretation of contrasting results obtained from CRISPR and RNAi screens and reinforce the importance of partial gene suppression methods in building a cancer dependency map.
Affiliation(s)
- Ashir A Borah
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David E Root
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Todd R Golub
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Jesse S Boehm
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- William C Hahn
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Francisca Vazquez
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
26
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. RESEARCH SQUARE 2023:rs.3.rs-3012879. [PMID: 37398424 PMCID: PMC10312940 DOI: 10.21203/rs.3.rs-3012879/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/04/2023]
Abstract
Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford, CA
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University, Stanford, CA
  - Department of Biology, Stanford University, Stanford, CA
27
Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK, Sundaram L, Wu Y, Adhikari A, Field Y, Chen C, Batzoglou S, Aguet F, Lemire G, Reimers R, Balick D, Janiak MC, Kuhlwilm M, Orkin JD, Manu S, Valenzuela A, Bergman J, Rousselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath JE, Hvilsom C, Juan D, Frandsen P, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, do Amaral JV, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Bataillon T, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin A, Guschanski K, Schierup MH, Beck RMD, Umapathy G, Roos C, Boubli JP, Lek M, Sunyaev S, O'Donnell-Luria A, Rehm HL, Xu J, Rogers J, Marques-Bonet T, Farh KKH. The landscape of tolerated genetic variation in humans and primates. Science 2023; 380:eabn8153. [PMID: 37262156 DOI: 10.1126/science.abn8197] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 12/31/2021] [Accepted: 03/22/2023] [Indexed: 06/03/2023]
Abstract
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.
Affiliation(s)
- Hong Gao
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Tobias Hamp
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Jeffrey Ede
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Joshua G Schraiber
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Jeremy McRae
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Moriel Singer-Berk
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Yanshen Yang
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Petko P Fiziev
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Lukas F K Kuderna
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Laksshman Sundaram
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Yibing Wu
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Aashish Adhikari
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Yair Field
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Chen Chen
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Serafim Batzoglou
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Francois Aguet
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Gabrielle Lemire
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
  - Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Rebecca Reimers
  - Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Daniel Balick
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Mareike C Janiak
  - School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Martin Kuhlwilm
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Department of Evolutionary Anthropology, University of Vienna, Djerassiplatz 1, 1030 Vienna, Austria
  - Human Evolution and Archaeological Sciences (HEAS), University of Vienna, 1030 Vienna, Austria
- Joseph D Orkin
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Département d'anthropologie, Université de Montréal, 3150 Jean-Brillant, Montréal, QC H3T 1N8, Canada
- Shivakumara Manu
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
  - Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
- Alejandro Valenzuela
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Juraj Bergman
  - Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
  - Section for Ecoinformatics & Biodiversity, Department of Biology, Aarhus University, 8000 Aarhus, Denmark
- Felipe Ennes Silva
  - Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Estrada da Bexiga 2584, Tefé, Amazonas, CEP 69553-225, Brazil
  - Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Av. Franklin D. Roosevelt 50, CP 160/12, B-1050 Brussels, Belgium
- Lidia Agueda
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Julie Blanc
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Marta Gut
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Dorien de Vries
  - School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Ian Goodhead
  - School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- R Alan Harris
  - Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Muthuswamy Raveendran
  - Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Axel Jensen
  - Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
- Julie E Horvath
  - North Carolina Museum of Natural Sciences, Raleigh, NC 27601, USA
  - Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC 27707, USA
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
  - Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- David Juan
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Fabrício Bertuol
  - Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
- Hazel Byrne
  - Department of Anthropology, University of Utah, Salt Lake City, UT 84102, USA
- Iracilda Sampaio
  - Universidade Federal do Para, Guamá, Belém - PA, 66075-110, Brazil
- Izeni Farias
  - Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
- João Valsecchi do Amaral
  - Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Amazonas, 69553-225, Brazil
  - Rede de Pesquisa para Estudos sobre Diversidade, Conservação e Uso da Fauna na Amazônia - RedeFauna, Manaus, Amazonas, 69080-900, Brazil
  - Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica - ComFauna, Iquitos, Loreto, 16001, Peru
- Mariluce Messias
  - Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil
  - PPGREN - Programa de Pós-Graduação "Conservação e Uso dos Recursos Naturais" and BIONORTE - Programa de Pós-Graduação em Biodiversidade e Biotecnologia da Rede BIONORTE, Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil
- Maria N F da Silva
  - Instituto Nacional de Pesquisas da Amazonia, Petrópolis, Manaus - AM, 69067-375, Brazil
- Mihir Trivedi
  - Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
- Rogerio Rossi
  - Universidade Federal do Mato Grosso, Boa Esperança, Cuiabá - MT, 78060-900, Brazil
- Tomas Hrbek
  - Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
  - Department of Biology, Trinity University, San Antonio, TX 78212, USA
- Nicole Andriaholinirina
  - Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
- Clément J Rabarivola
  - Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
- Alphonse Zaramody
  - Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
- Gregory Wilkerson
  - Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
- Christian Abee
  - Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
- Joe H Simmons
  - Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
- Eduardo Fernandez-Duque
  - Yale University, New Haven, CT 06520, USA
  - Universidad Nacional de Formosa, Argentina Fundacion ECO, Formosa, Argentina
- Fekadu Shiferaw
  - Guinea Worm Eradication Program, The Carter Center Ethiopia, PoB 16316, Addis Ababa 1000, Ethiopia
- Dongdong Wu
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Long Zhou
  - Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Yong Shao
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Guojie Zhang
  - Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
  - Liangzhu Laboratory, Zhejiang University Medical Center, 1369 West Wenyi Road, Hangzhou 311121, China
  - Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Shangcheng District, Hangzhou 310006, China
- Julius D Keyyu
  - Tanzania Wildlife Research Institute (TAWIRI), Head Office, P.O. Box 661, Arusha, Tanzania
- Sascha Knauf
  - Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, 17493 Greifswald - Insel Riems, Germany
- Minh D Le
  - Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi 100000, Vietnam
- Esther Lizano
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain
- Stefan Merker
  - Department of Zoology, State Museum of Natural History Stuttgart, 70191 Stuttgart, Germany
- Arcadi Navarro
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Av. Doctor Aiguader, N88, 08003 Barcelona, Spain
  - BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C. Wellington 30, 08005 Barcelona, Spain
- Thomas Bataillon
  - Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
- Tilo Nadler
  - Cuc Phuong Commune, Nho Quan District, Ninh Binh Province 430000, Vietnam
- Chiea Chuen Khor
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
- Jessica Lee
  - Mandai Nature, 80 Mandai Lake Road, Singapore 729826, Republic of Singapore
- Patrick Tan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
  - SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore 168582, Republic of Singapore
  - Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 168582, Republic of Singapore
- Weng Khong Lim
  - SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore 168582, Republic of Singapore
  - Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 168582, Republic of Singapore
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore 168582, Republic of Singapore
- Andrew C Kitchener
  - Department of Natural Sciences, National Museums Scotland, Chambers Street, Edinburgh EH1 1JF, UK
  - School of Geosciences, University of Edinburgh, Drummond Street, Edinburgh EH8 9XP, UK
- Dietmar Zinner
  - Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077 Göttingen, Germany
  - Department of Primate Cognition, Georg-August-Universität Göttingen, 37077 Göttingen, Germany
  - Leibniz Science Campus Primate Cognition, 37077 Göttingen, Germany
- Ivo Gut
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra, Pg. Luís Companys 23, 08010 Barcelona, Spain
- Amanda Melin
  - Department of Anthropology & Archaeology, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
  - Department of Medical Genetics, 3330 Hospital Drive NW, HMRB 202, Calgary, AB T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
- Katerina Guschanski
  - Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh EH8 9XP, UK
- Robin M D Beck
  - School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Govindhaswamy Umapathy
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
  - Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
- Christian Roos
  - Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
- Jean P Boubli
  - School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Monkol Lek
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
- Shamil Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
  - Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Jinbo Xu
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
  - Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Jeffrey Rogers
  - Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Tomas Marques-Bonet
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Kyle Kai-How Farh
  - Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
28
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the Discrete-time Wright Fisher model to biobank-scale datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.19.541517. [PMID: 37293115 PMCID: PMC10245735 DOI: 10.1101/2023.05.19.541517] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
The Discrete-Time Wright Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
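As a rough illustration of the first observation in the abstract above (Binomial distributions are approximately sparse), the Python sketch below compares one exact DTWF generation step with a version that only evaluates each Binomial row inside a window around its mean. This is not the authors' algorithm: the function names, the haploid selection and symmetric mutation parameterization, and the `width` window (in standard deviations) are all assumptions made for illustration. The window is a crude heuristic without the provable error bound the abstract describes, and the low-rank observation is not shown.

```python
import math


def binom_pmf(n, k, p):
    """Binomial(n, p) probability of k successes, evaluated in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def next_freq(i, n, s=0.0, mu=0.0):
    """Expected allele frequency after selection and mutation.

    Hypothetical parameterization: haploid selection with advantage s,
    followed by symmetric mutation at rate mu.
    """
    x = i / n
    x = x * (1.0 + s) / (1.0 + s * x)
    return x * (1.0 - mu) + (1.0 - x) * mu


def dtwf_step_dense(v, s=0.0, mu=0.0):
    """Exact one-generation update of the allele-count distribution v (quadratic time)."""
    n = len(v) - 1
    out = [0.0] * (n + 1)
    for i, mass in enumerate(v):
        if mass == 0.0:
            continue
        p = next_freq(i, n, s, mu)
        for j in range(n + 1):
            out[j] += mass * binom_pmf(n, j, p)
    return out


def dtwf_step_truncated(v, s=0.0, mu=0.0, width=8.0):
    """Same update, but each Binomial row is evaluated only near its mean.

    Entries further than `width` standard deviations from the mean are
    treated as zero (heuristic truncation, an assumption of this sketch).
    """
    n = len(v) - 1
    out = [0.0] * (n + 1)
    for i, mass in enumerate(v):
        if mass == 0.0:
            continue
        p = next_freq(i, n, s, mu)
        mean = n * p
        sd = math.sqrt(n * p * (1.0 - p))
        lo = max(0, math.floor(mean - width * sd))
        hi = min(n, math.ceil(mean + width * sd))
        for j in range(lo, hi + 1):
            out[j] += mass * binom_pmf(n, j, p)
    return out
```

Each row of the truncated update touches on the order of `width * sqrt(n)` entries instead of `n + 1`, which is where the savings from sparsity come from in this simplified setting. Rows with very small or very large success probability have heavier relative tails, so a fixed standard-deviation window loses more mass there.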
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University
  - Department of Biology, Stanford University
29
Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich A, Fiziev P, Kuderna L, Sundaram L, Wu Y, Adhikari A, Field Y, Chen C, Batzoglou S, Aguet F, Lemire G, Reimers R, Balick D, Janiak MC, Kuhlwilm M, Orkin JD, Manu S, Valenzuela A, Bergman J, Rouselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath J, Hvilsom C, Juan D, Frandsen P, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, do Amaral JV, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Batallion T, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin A, Guschanski K, Schierup MH, Beck RMD, Umapathy G, Roos C, Boubli JP, Lek M, Sunyaev S, O’Donnell A, Rehm H, Xu J, Rogers J, Marques-Bonet T, Kai-How Farh K. The landscape of tolerated genetic variation in humans and primates. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.01.538953. [PMID: 37205491 PMCID: PMC10187174 DOI: 10.1101/2023.05.01.538953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole genome sequencing data for 809 individuals from 233 primate species, and identified 4.3 million common protein-altering variants with orthologs in human. We show that these variants can be inferred to have non-deleterious effects in human based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases. One Sentence Summary: Deep learning classifier trained on 4.3 million common primate missense variants predicts variant pathogenicity in humans.
Affiliation(s)
- Hong Gao
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Tobias Hamp
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Jeffrey Ede
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Joshua G. Schraiber
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Jeremy McRae
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Moriel Singer-Berk
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
- Yanshen Yang
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Anastasia Dietrich
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Petko Fiziev
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Lukas Kuderna
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
  - Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Laksshman Sundaram
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Yibing Wu
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Aashish Adhikari
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Yair Field
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Chen Chen
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Serafim Batzoglou
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Francois Aguet
  - Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Gabrielle Lemire
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
  - Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Rebecca Reimers
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Daniel Balick
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Mareike C. Janiak
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Martin Kuhlwilm
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Department of Evolutionary Anthropology, University of Vienna; Djerassiplatz 1, 1030, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna; 1030, Vienna, Austria
| | - Joseph D. Orkin
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Département d’anthropologie, Université de Montréal; 3150 Jean-Brillant, Montréal, QC, H3T 1N8, Canada
| | - Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR); Ghaziabad, 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology; Hyderabad, 500007, India
| | - Alejandro Valenzuela
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Juraj Bergman
- Bioinformatics Research Centre, Aarhus University; Aarhus, 8000, Denmark
- Section for Ecoinformatics & Biodiversity, Department of Biology, Aarhus University; Aarhus, 8000, Denmark
| | | | - Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development; Estrada da Bexiga 2584, Tefé, Amazonas, CEP 69553-225, Brazil
- Faculty of Sciences, Department of Organismal Biology, Unit of Evolutionary Biology and Ecology, Université Libre de Bruxelles (ULB); Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
| | - Lidia Agueda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Julie Blanc
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Marta Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Dorien de Vries
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Ian Goodhead
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - R. Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine; Houston, Texas, 77030, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine; Houston, Texas, 77030, USA
| | - Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University; SE-75236, Uppsala, Sweden
| | | | - Julie Horvath
- North Carolina Museum of Natural Sciences; Raleigh, North Carolina, 27601, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University; Durham, North Carolina, 27707, USA
- Department of Biological Sciences, North Carolina State University; Raleigh, North Carolina, 27695, USA
- Department of Evolutionary Anthropology, Duke University; Durham, North Carolina, 27708, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | | | - David Juan
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | | | | | - Fabricio Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL); Manaus, Amazonas, 69080-900, Brazil
| | - Hazel Byrne
- Department of Anthropology, University of Utah; Salt Lake City, Utah, 84102, USA
| | - Iracilda Sampaio
- Universidade Federal do Para; Guamá, Belém - PA, 66075-110, Brazil
| | - Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL); Manaus, Amazonas, 69080-900, Brazil
| | - João Valsecchi do Amaral
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development; Tefé, Amazonas, 69553-225, Brazil
- Rede de Pesquisa para Estudos sobre Diversidade, Conservação e Uso da Fauna na Amazônia – RedeFauna; Manaus, Amazonas, 69080-900, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica – ComFauna; Iquitos, Loreto, 16001, Peru
| | - Mariluce Messias
- Universidade Federal de Rondonia; Porto Velho, Rondônia, 78900-000, Brazil
- PPGREN - Programa de Pós-Graduação “Conservação e Uso dos Recursos Naturais” and BIONORTE - Programa de Pós-Graduação em Biodiversidade e Biotecnologia da Rede BIONORTE, Universidade Federal de Rondonia; Porto Velho, Rondônia, 78900-000, Brazil
| | - Maria N. F. da Silva
- Instituto Nacional de Pesquisas da Amazonia; Petrópolis, Manaus - AM, 69067-375, Brazil
| | - Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology; Hyderabad, 500007, India
| | - Rogerio Rossi
- Universidade Federal do Mato Grosso; Boa Esperança, Cuiabá - MT, 78060-900, Brazil
| | - Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL); Manaus, Amazonas, 69080-900, Brazil
- Department of Biology, Trinity University; San Antonio, Texas, 78212, USA
| | - Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga; Mahajanga, 401, Madagascar
| | - Clément J. Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga; Mahajanga, 401, Madagascar
| | - Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga; Mahajanga, 401, Madagascar
| | | | | | - Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center; Houston, Texas, 77030, USA
| | | | - Joe H. Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center; Houston, Texas, 77030, USA
| | - Eduardo Fernandez-Duque
- Yale University; New Haven, Connecticut, 06520, USA
- Universidad Nacional de Formosa, Argentina Fundacion ECO, Formosa, Argentina
| | | | | | - Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences; Kunming, Yunnan, 650223, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, 310058, China
| | - Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences; Kunming, Yunnan, 650223, China
| | - Guojie Zhang
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen; Copenhagen, DK-2100, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650223, China
- Liangzhu Laboratory, Zhejiang University Medical Center; 1369 West Wenyi Road, Hangzhou, 311121, China
- Women’s Hospital, School of Medicine, Zhejiang University; 1 Xueshi Road, Shangcheng District, Hangzhou, 310006, China
| | - Julius D. Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Head Office; P.O.Box 661, Arusha, Tanzania
| | - Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health; 17493 Greifswald - Isle of Riems, Germany
| | - Minh D. Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University; Hanoi, 100000, Vietnam
| | - Esther Lizano
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
| | - Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart; 70191 Stuttgart, Germany
| | - Arcadi Navarro
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) and Universitat Pompeu Fabra, Pg. Luís Companys 23, Barcelona, 08010, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology; Av. Doctor Aiguader, N88, Barcelona, 08003, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation; C. Wellington 30, Barcelona, 08005, Spain
| | - Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University; Aarhus, 8000, Denmark
| | - Tilo Nadler
- Cuc Phuong Commune; Nho Quan District, Ninh Binh Province, 430000, Vietnam
| | - Chiea Chuen Khor
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
| | - Jessica Lee
- Mandai Nature; 80 Mandai Lake Road, Singapore 729826, Republic of Singapore
| | - Patrick Tan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM); Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School; Singapore 168582, Republic of Singapore
| | - Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM); Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School; Singapore 168582, Republic of Singapore
- SingHealth Duke-NUS Genomic Medicine Centre; Singapore 168582, Republic of Singapore
| | - Andrew C. Kitchener
- Department of Natural Sciences, National Museums Scotland; Chambers Street, Edinburgh, EH1 1JF, UK
- School of Geosciences, University of Edinburgh; Drummond Street, Edinburgh, EH8 9XP, UK
| | - Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research; 37077 Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen; 37077 Göttingen, Germany
| | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
- Universitat Pompeu Fabra, Pg. Luís Companys 23, Barcelona, 08010, Spain
| | - Amanda Melin
- Leibniz Science Campus Primate Cognition; 37077 Göttingen, Germany
- Department of Anthropology & Archaeology and Department of Medical Genetics
| | - Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University; SE-75236, Uppsala, Sweden
- Alberta Children’s Hospital Research Institute; University of Calgary; 2500 University Dr NW T2N 1N4, Calgary, Alberta, Canada
| | | | - Robin M. D. Beck
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR); Ghaziabad, 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology; Hyderabad, 500007, India
| | - Christian Roos
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh; Edinburgh, EH8 9XP, UK
| | - Jean P. Boubli
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Monkol Lek
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research; Kellnerweg 4, 37077 Göttingen, Germany
| | - Shamil Sunyaev
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
- Department of Genetics, Yale School of Medicine; New Haven, Connecticut, 06520, USA
| | - Anne O’Donnell
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA
| | - Heidi Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Jinbo Xu
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Toyota Technological Institute at Chicago; Chicago, Illinois, 60637, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine; Houston, Texas, 77030, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) and Universitat Pompeu Fabra, Pg. Luís Companys 23, Barcelona, 08010, Spain
| | - Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| |
|
30
|
Barroso GV, Lohmueller KE. Inferring the mode and strength of ongoing selection. Genome Res 2023; 33:632-643. [PMID: 37055196 PMCID: PMC10234300 DOI: 10.1101/gr.276386.121] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/12/2021] [Accepted: 03/29/2023] [Indexed: 04/15/2023]
Abstract
Genome sequence data are no longer scarce. The UK Biobank alone comprises 200,000 individual genomes, with more on the way, leading the field of human genetics toward sequencing entire populations. Within the next decades, other model organisms will follow suit, especially domesticated species such as crops and livestock. Having sequences from most individuals in a population will present new challenges for using these data to improve health and agriculture in the pursuit of a sustainable future. Existing population genetic methods are designed to model hundreds of randomly sampled sequences but are not optimized for extracting the information contained in the larger and richer data sets that are beginning to emerge, with thousands of closely related individuals. Here we develop a new method called trio-based inference of dominance and selection (TIDES) that uses data from tens of thousands of family trios to make inferences about natural selection acting in a single generation. TIDES further improves on the state of the art by making no assumptions regarding demography, linkage, or dominance. We discuss how our method paves the way for studying natural selection from new angles.
Affiliation(s)
- Gustavo V Barroso
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
| | - Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
| |
|
31
|
Kyriazis CC, Beichman AC, Brzeski KE, Hoy SR, Peterson RO, Vucetich JA, Vucetich LM, Lohmueller KE, Wayne RK. Genomic Underpinnings of Population Persistence in Isle Royale Moose. Mol Biol Evol 2023; 40:msad021. [PMID: 36729989 PMCID: PMC9927576 DOI: 10.1093/molbev/msad021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 01/20/2023] [Accepted: 01/25/2023] [Indexed: 02/03/2023]
Abstract
Island ecosystems provide natural laboratories to assess the impacts of isolation on population persistence. However, most studies of persistence have focused on a single species, without comparisons to other organisms they interact with in the ecosystem. The case study of moose and gray wolves on Isle Royale allows for a direct contrast of genetic variation in isolated populations that have experienced dramatically differing population trajectories over the past decade. Whereas the Isle Royale wolf population recently declined nearly to extinction due to severe inbreeding depression, the moose population has thrived and continues to persist, despite having low genetic diversity and being isolated for ∼120 years. Here, we examine the patterns of genomic variation underlying the continued persistence of the Isle Royale moose population. We document high levels of inbreeding in the population, roughly as high as the wolf population at the time of its decline. However, inbreeding in the moose population manifests in the form of intermediate-length runs of homozygosity suggestive of historical inbreeding and purging, contrasting with the long runs of homozygosity observed in the smaller wolf population. Using simulations, we confirm that substantial purging has likely occurred in the moose population. However, we also document notable increases in genetic load, which could eventually threaten population viability over the long term. Overall, our results demonstrate a complex relationship between inbreeding, genetic diversity, and population viability that highlights the use of genomic datasets and computational simulation tools for understanding the factors enabling persistence in isolated populations.
Affiliation(s)
- Christopher C Kyriazis
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA
| | | | - Kristin E Brzeski
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI
| | - Sarah R Hoy
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI
| | - Rolf O Peterson
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI
| | - John A Vucetich
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI
| | - Leah M Vucetich
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI
| | - Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA
| | - Robert K Wayne
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA
| |
|
32
|
Agarwal I, Fuller ZL, Myers SR, Przeworski M. Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs. eLife 2023; 12:e83172. [PMID: 36648429 PMCID: PMC9937649 DOI: 10.7554/elife.83172] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/01/2022] [Accepted: 01/16/2023] [Indexed: 01/18/2023]
Abstract
Causal loss-of-function (LOF) variants for Mendelian and severe complex diseases are enriched in 'mutation intolerant' genes. We show how such observations can be interpreted in light of a model of mutation-selection balance and use the model to relate the pathogenic consequences of LOF mutations at present to their evolutionary fitness effects. To this end, we first infer posterior distributions for the fitness costs of LOF mutations in 17,318 autosomal and 679 X-linked genes from exome sequences in 56,855 individuals. Estimated fitness costs for the loss of a gene copy are typically above 1%; they tend to be largest for X-linked genes, whether or not they have a Y homolog, followed by autosomal genes and genes in the pseudoautosomal region. We compare inferred fitness effects for all possible de novo LOF mutations to those of de novo mutations identified in individuals diagnosed with one of six severe, complex diseases or developmental disorders. Probands carry an excess of mutations with estimated fitness effects above 10%; as we show by simulation, when sampled in the population, such highly deleterious mutations are typically only a couple of generations old. Moreover, the proportion of highly deleterious mutations carried by probands reflects the typical age of onset of the disease. The study design also has a discernible influence: a greater proportion of highly deleterious mutations is detected in pedigree than case-control studies, and for autism, in simplex than multiplex families and in female versus male probands. Thus, anchoring observations in human genetics to a population genetic model allows us to learn about the fitness effects of mutations identified by different mapping strategies and for different traits.
Affiliation(s)
- Ipsita Agarwal
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Zachary L Fuller
- Department of Biological Sciences, Columbia University, New York, United States
| | - Simon R Myers
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
| |
|
33
|
Molecular Landscape of Tourette's Disorder. Int J Mol Sci 2023; 24:ijms24021428. [PMID: 36674940 PMCID: PMC9865021 DOI: 10.3390/ijms24021428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 12/29/2022] [Accepted: 01/01/2023] [Indexed: 01/12/2023]
Abstract
Tourette's disorder (TD) is a highly heritable childhood-onset neurodevelopmental disorder and is caused by a complex interplay of multiple genetic and environmental factors. Yet, the molecular mechanisms underlying the disorder remain largely elusive. In this study, we used the available omics data to compile a list of TD candidate genes, and we subsequently conducted tissue/cell type specificity and functional enrichment analyses of this list. Using genomic data, we also investigated genetic sharing between TD and blood and cerebrospinal fluid (CSF) metabolite levels. Lastly, we built a molecular landscape of TD through integrating the results from these analyses with an extensive literature search to identify the interactions between the TD candidate genes/proteins and metabolites. We found evidence for an enriched expression of the TD candidate genes in four brain regions and the pituitary. The functional enrichment analyses implicated two pathways ('cAMP-mediated signaling' and 'Endocannabinoid Neuronal Synapse Pathway') and multiple biological functions related to brain development and synaptic transmission in TD etiology. Furthermore, we found genetic sharing between TD and the blood and CSF levels of 39 metabolites. The landscape of TD not only provides insights into the (altered) molecular processes that underlie the disease but, through the identification of potential drug targets (such as FLT3, NAALAD2, CX3CL1-CX3CR1, OPRM1, and HRH2), it also yields clues for developing novel TD treatments.
|
34
|
Bhat V, Adzhubei IA, Fife JD, Lebo M, Cassa CA. Informing variant assessment using structured evidence from prior classifications (PS1, PM5, and PVS1 sequence variant interpretation criteria). Genet Med 2023; 25:16-26. [PMID: 36305854 DOI: 10.1016/j.gim.2022.09.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2022] [Revised: 09/15/2022] [Accepted: 09/17/2022] [Indexed: 11/06/2022]
Abstract
PURPOSE: This study aimed to explore whether evidence of pathogenicity from prior variant classifications in ClinVar could be used to inform variant interpretation using the American College of Medical Genetics and Genomics/Association for Molecular Pathology clinical guidelines. METHODS: We identified distinct single-nucleotide variants (SNVs) that are either similar in location or in functional consequence to pathogenic variants in ClinVar and analyzed evidence in support of pathogenicity using 3 interpretation criteria. RESULTS: Thousands of variants, including many in clinically actionable disease genes (American College of Medical Genetics and Genomics secondary findings v3.0), have evidence of pathogenicity from existing variant classifications, accounting for 2.5% of nonsynonymous SNVs within ClinVar. Notably, there are many variants with uncertain or conflicting classifications that cause the same amino acid substitution as other pathogenic variants (PS1, N = 323), variants that are predicted to cause different amino acid substitutions in the same codon as pathogenic variants (PM5, N = 7692), and loss-of-function variants that are present in genes in which many loss-of-function variants are classified as pathogenic (PVS1, N = 3635). Most of these variants have similar computational predictions of pathogenicity and splicing effect as their associated pathogenic variants. CONCLUSION: Broadly, for >1.4 million SNVs exome wide, information from previously classified variants could be used to provide evidence of pathogenicity. We have developed a pipeline to identify variants meeting these criteria that may inform interpretation efforts.
Affiliation(s)
- Vineel Bhat
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Ivan A Adzhubei
- Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, MA
| | - James D Fife
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Matthew Lebo
- Laboratory for Molecular Medicine, Mass General Brigham Personalized Medicine, Boston, MA; Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| | - Christopher A Cassa
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
| |
|
35
|
Murat F, Mbengue N, Winge SB, Trefzer T, Leushkin E, Sepp M, Cardoso-Moreira M, Schmidt J, Schneider C, Mößinger K, Brüning T, Lamanna F, Belles MR, Conrad C, Kondova I, Bontrop R, Behr R, Khaitovich P, Pääbo S, Marques-Bonet T, Grützner F, Almstrup K, Schierup MH, Kaessmann H. The molecular evolution of spermatogenesis across mammals. Nature 2023; 613:308-316. [PMID: 36544022 PMCID: PMC9834047 DOI: 10.1038/s41586-022-05547-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 11/02/2021] [Accepted: 11/09/2022] [Indexed: 12/24/2022]
Abstract
The testis produces gametes through spermatogenesis and evolves rapidly at both the morphological and molecular level in mammals [1-6], probably owing to the evolutionary pressure on males to be reproductively successful [7]. However, the molecular evolution of individual spermatogenic cell types across mammals remains largely uncharacterized. Here we report evolutionary analyses of single-nucleus transcriptome data for testes from 11 species that cover the three main mammalian lineages (eutherians, marsupials and monotremes) and birds (the evolutionary outgroup), and include seven primates. We find that the rapid evolution of the testis was driven by accelerated fixation rates of gene expression changes, amino acid substitutions and new genes in late spermatogenic stages, probably facilitated by reduced pleiotropic constraints, haploid selection and transcriptionally permissive chromatin. We identify temporal expression changes of individual genes across species and conserved expression programs controlling ancestral spermatogenic processes. Genes predominantly expressed in spermatogonia (germ cells fuelling spermatogenesis) and Sertoli (somatic support) cells accumulated on X chromosomes during evolution, presumably owing to male-beneficial selective forces. Further work identified transcriptomal differences between X- and Y-bearing spermatids and uncovered that meiotic sex-chromosome inactivation (MSCI) also occurs in monotremes and hence is common to mammalian sex-chromosome systems. Thus, the mechanism of meiotic silencing of unsynapsed chromatin, which underlies MSCI, is an ancestral mammalian feature. Our study illuminates the molecular evolution of spermatogenesis and associated selective forces, and provides a resource for investigating the biology of the testis across mammals.
Affiliation(s)
- Florent Murat
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany; INRAE, LPGP, Rennes, France
| | - Noe Mbengue
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | - Sofia Boeg Winge
- Department of Growth and Reproduction, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark; International Center for Research and Research Training in Endocrine Disruption of Male Reproduction and Child Health, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Timo Trefzer
- Berlin Institute of Health at Charité, University of Medicine Berlin, Corporate Member of the Free University of Berlin, Humboldt-University of Berlin, Berlin, Germany
| | - Evgeny Leushkin
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | - Mari Sepp
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | | | - Julia Schmidt
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | - Celine Schneider
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | - Katharina Mößinger
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | - Thoomke Brüning
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | - Francesco Lamanna
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
| | | | - Christian Conrad
- Berlin Institute of Health at Charité, University of Medicine Berlin, Corporate Member of the Free University of Berlin, Humboldt-University of Berlin, Berlin, Germany
| | - Ivanela Kondova
- Biomedical Primate Research Center (BPRC), Rijswijk, the Netherlands
| | - Ronald Bontrop
- Biomedical Primate Research Center (BPRC), Rijswijk, the Netherlands
| | - Rüdiger Behr
- German Primate Center (DPZ), Platform Degenerative Diseases, Göttingen, Germany; German Center for Cardiovascular Research (DZHK), Partner Site Göttingen, Göttingen, Germany
| | - Philipp Khaitovich
- Center for Neurobiology and Brain Restoration, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Svante Pääbo
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain; CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Miquel Crusafont Catalan Institute of Paleontology, Autonomous University of Barcelona, Barcelona, Spain
| | - Frank Grützner
- The Robinson Research Institute, School of Biological Science, University of Adelaide, Adelaide, South Australia, Australia
| | - Kristian Almstrup
- Department of Growth and Reproduction, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark; International Center for Research and Research Training in Endocrine Disruption of Male Reproduction and Child Health, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark; Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | | | - Henrik Kaessmann
- Center for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany.
|
36
|
Lazareva TE, Barbitoff YA, Changalidis AI, Tkachenko AA, Maksiutenko EM, Nasykhova YA, Glotov AS. Biobanking as a Tool for Genomic Research: From Allele Frequencies to Cross-Ancestry Association Studies. J Pers Med 2022; 12:2040. [PMID: 36556260 PMCID: PMC9783756 DOI: 10.3390/jpm12122040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2022] [Revised: 11/19/2022] [Accepted: 11/28/2022] [Indexed: 12/14/2022] Open
Abstract
In recent years, great advances have been made in the field of collection, storage, and analysis of biological samples. Large collections of samples, biobanks, have been established in many countries. Biobanks typically collect large amounts of biological samples and associated clinical information; the largest collections include over a million samples. In this review, we summarize the main directions in which biobanks aid medical genetics and genomic research, from providing reference allele frequency information to allowing large-scale cross-ancestry meta-analyses. The largest biobanks greatly vary in the size of the collection, and the amount of available phenotype and genotype data. Nevertheless, all of them are extensively used in genomics, providing a rich resource for genome-wide association analysis, genetic epidemiology, and statistical research into the structure, function, and evolution of the human genome. Recently, multiple research efforts were based on trans-biobank data integration, which increases sample size and allows for the identification of robust genetic associations. We provide prominent examples of such data integration and discuss important caveats which have to be taken into account in trans-biobank research.
Affiliation(s)
- Tatyana E. Lazareva
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
- Department of Genetics and Biotechnology, St. Petersburg State University, 199034 St. Petersburg, Russia
| | - Yury A. Barbitoff
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
- Department of Genetics and Biotechnology, St. Petersburg State University, 199034 St. Petersburg, Russia
| | - Anton I. Changalidis
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
- Faculty of Software Engineering and Computer Systems, ITMO University, 197101 St. Petersburg, Russia
| | - Alexander A. Tkachenko
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
| | - Evgeniia M. Maksiutenko
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
| | - Yulia A. Nasykhova
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
| | - Andrey S. Glotov
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia
|
37
|
Cormier MJ, Pedersen BS, Bayrak-Toydemir P, Quinlan AR. Combining genetic constraint with predictions of alternative splicing to prioritize deleterious splicing in rare disease studies. BMC Bioinformatics 2022; 23:482. [PMID: 36376793 PMCID: PMC9664736 DOI: 10.1186/s12859-022-05041-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/28/2022] [Accepted: 11/07/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND: Despite numerous molecular and computational advances, roughly half of patients with a rare disease remain undiagnosed after exome or genome sequencing. A particularly challenging barrier to diagnosis is identifying variants that cause deleterious alternative splicing at intronic or exonic loci outside of canonical donor or acceptor splice sites. RESULTS: Several existing tools predict the likelihood that a genetic variant causes alternative splicing. We sought to extend such methods by developing a new metric that aids in discerning whether a genetic variant leads to deleterious alternative splicing. Our metric combines genetic variation in the Genome Aggregation Database with alternative splicing predictions from SpliceAI to compare observed and expected levels of splice-altering genetic variation. We infer genic regions with significantly less splice-altering variation than expected to be constrained. The resulting model of regional splicing constraint captures differential splicing constraint across gene and exon categories, and the most constrained genic regions are enriched for pathogenic splice-altering variants. Building from this model, we developed ConSpliceML. This ensemble machine learning approach combines regional splicing constraint with multiple per-nucleotide alternative splicing scores to guide the prediction of deleterious splicing variants in protein-coding genes. ConSpliceML more accurately distinguishes deleterious and benign splicing variants than state-of-the-art splicing prediction methods, especially in "cryptic" splicing regions beyond canonical donor or acceptor splice sites. CONCLUSION: Integrating a model of genetic constraint with annotations from existing alternative splicing tools allows ConSpliceML to prioritize potentially deleterious splice-altering variants in studies of rare human diseases.
Affiliation(s)
- Michael J Cormier
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
| | - Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
| | | | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA.
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
|
38
|
Zhang H, Xu MS, Fan X, Chung WK, Shen Y. Predicting functional effect of missense variants using graph attention neural networks. Nat Mach Intell 2022; 4:1017-1028. [PMID: 37484202 PMCID: PMC10361701 DOI: 10.1038/s42256-022-00561-w] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/09/2021] [Accepted: 10/07/2022] [Indexed: 11/16/2022]
Abstract
Accurate prediction of damaging missense variants is critically important for interpreting a genome sequence. Although many methods have been developed, their performance has been limited. Recent advances in machine learning and the availability of large-scale population genomic sequencing data provide new opportunities to considerably improve computational predictions. Here we describe the graphical missense variant pathogenicity predictor (gMVP), a new method based on graph attention neural networks. Its main component is a graph with nodes that capture predictive features of amino acids and edges weighted by co-evolution strength, enabling effective pooling of information from the local protein context and functionally correlated distal positions. Evaluation of deep mutational scan data shows that gMVP outperforms other published methods in identifying damaging variants in TP53, PTEN, BRCA1 and MSH2. Furthermore, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from those in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.
Affiliation(s)
- Haicang Zhang
- Department of Systems Biology, Columbia University, New York, NY, USA
| | | | - Xiao Fan
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Pediatrics, Columbia University, New York, NY, USA
| | - Wendy K. Chung
- Department of Pediatrics, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
| | - Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA
|
39
|
Zug R, Uller T. Evolution and dysfunction of human cognitive and social traits: A transcriptional regulation perspective. Evolutionary Human Sciences 2022; 4:e43. [PMID: 37588924 PMCID: PMC10426018 DOI: 10.1017/ehs.2022.42] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 08/11/2022] [Accepted: 09/11/2022] [Indexed: 11/07/2022] Open
Abstract
Evolutionary changes in brain and craniofacial development have endowed humans with unique cognitive and social skills, but also predisposed us to debilitating disorders in which these traits are disrupted. What are the developmental genetic underpinnings that connect the adaptive evolution of our cognition and sociality with the persistence of mental disorders with severe negative fitness effects? We argue that loss of function of genes involved in transcriptional regulation represents a crucial link between the evolution and dysfunction of human cognitive and social traits. The argument is based on the haploinsufficiency of many transcriptional regulator genes, which makes them particularly sensitive to loss-of-function mutations. We discuss how human brain and craniofacial traits evolved through partial loss of function (i.e. reduced expression) of these genes, a perspective compatible with the idea of human self-domestication. Moreover, we explain why selection against loss-of-function variants supports the view that mutation-selection-drift, rather than balancing selection, underlies the persistence of psychiatric disorders. Finally, we discuss testable predictions.
Affiliation(s)
- Roman Zug
- Department of Biology, Lund University, Lund, Sweden
| | - Tobias Uller
- Department of Biology, Lund University, Lund, Sweden
|
40
|
Zhou X, Feliciano P, Shu C, Wang T, Astrovskaya I, Hall JB, Obiajulu JU, Wright JR, Murali SC, Xu SX, Brueggeman L, Thomas TR, Marchenko O, Fleisch C, Barns SD, Snyder LG, Han B, Chang TS, Turner TN, Harvey WT, Nishida A, O'Roak BJ, Geschwind DH, Michaelson JJ, Volfovsky N, Eichler EE, Shen Y, Chung WK. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat Genet 2022; 54:1305-1319. [PMID: 35982159 PMCID: PMC9470534 DOI: 10.1038/s41588-022-01148-2] [Citation(s) in RCA: 160] [Impact Index Per Article: 53.3] [Received: 09/26/2021] [Accepted: 06/28/2022] [Indexed: 12/16/2022]
Abstract
To capture the full spectrum of genetic risk for autism, we performed a two-stage analysis of rare de novo and inherited coding variants in 42,607 autism cases, including 35,130 new cases recruited online by SPARK. We identified 60 genes with exome-wide significance (P < 2.5 × 10⁻⁶), including five new risk genes (NAV3, ITSN1, MARK2, SCAF1 and HNRNPUL2). The association of NAV3 with autism risk is primarily driven by rare inherited loss-of-function (LoF) variants, with an estimated relative risk of 4, consistent with moderate effect. Autistic individuals with LoF variants in the four moderate-risk genes (NAV3, ITSN1, SCAF1 and HNRNPUL2; n = 95) have less cognitive impairment than 129 autistic individuals with LoF variants in highly penetrant genes (CHD8, SCN2A, ADNP, FOXP1 and SHANK3) (59% vs 88%, P = 1.9 × 10⁻⁶). Power calculations suggest that much larger numbers of autism cases are needed to identify additional moderate-risk genes.
Affiliation(s)
- Xueya Zhou
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA; Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | | | - Chang Shu
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA; Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Tianyun Wang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Department of Medical Genetics, Center for Medical Genetics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Neuroscience Research Institute, Department of Neurobiology, School of Basic Medical Sciences, Peking University Health Science Center; Key Laboratory for Neuroscience, Ministry of Education of China & National Health Commission of China, Beijing, China
| | | | | | - Joseph U Obiajulu
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA; Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | | | - Shwetha C Murali
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | | | - Leo Brueggeman
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | - Taylor R Thomas
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | | | | | | | | | - Bing Han
- Simons Foundation, New York, NY, USA
| | - Timothy S Chang
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Tychele N Turner
- Department of Genetics, Washington University, St. Louis, MO, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Andrew Nishida
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
| | - Brian J O'Roak
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
| | - Daniel H Geschwind
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | | | - Jacob J Michaelson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | | | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Yufeng Shen
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA; Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
| | - Wendy K Chung
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA; Simons Foundation, New York, NY, USA; Department of Medicine, Columbia University Medical Center, New York, NY, USA.
|
41
|
Tilk S, Tkachenko S, Curtis C, Petrov DA, McFarland CD. Most cancers carry a substantial deleterious load due to Hill-Robertson interference. eLife 2022; 11:67790. [PMID: 36047771 PMCID: PMC9499534 DOI: 10.7554/elife.67790] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/23/2021] [Accepted: 08/31/2022] [Indexed: 11/13/2022] Open
Abstract
Cancer genomes exhibit surprisingly weak signatures of negative selection [1,2]. This may be because selective pressures are relaxed or because genome-wide linkage prevents deleterious mutations from being removed (Hill-Robertson interference) [3]. By stratifying tumors by their genome-wide mutational burden, we observe negative selection (dN/dS ~ 0.56) in low mutational burden tumors, while remaining cancers exhibit dN/dS ratios ~1. This suggests that most tumors do not remove deleterious passengers. To buffer against deleterious passengers, tumors upregulate heat shock pathways as their mutational burden increases. Finally, evolutionary modeling finds that Hill-Robertson interference alone can reproduce patterns of attenuated selection and estimates the total fitness cost of passengers to be 46% per cell on average. Collectively, our findings suggest that the lack of observed negative selection in most tumors is not due to relaxed selective pressures, but rather the inability of selection to remove deleterious mutations in the presence of genome-wide linkage.
Affiliation(s)
- Susanne Tilk
- Department of Biology, Stanford University, Stanford, United States
| | - Svyatoslav Tkachenko
- Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, United States
| | - Christina Curtis
- Department of Genetics, Stanford University, Stanford, United States
| | - Dmitri A Petrov
- Department of Biology, Stanford University, Stanford, United States
| | - Christopher D McFarland
- Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, United States
|
42
|
Collins RL, Glessner JT, Porcu E, Lepamets M, Brandon R, Lauricella C, Han L, Morley T, Niestroj LM, Ulirsch J, Everett S, Howrigan DP, Boone PM, Fu J, Karczewski KJ, Kellaris G, Lowther C, Lucente D, Mohajeri K, Nõukas M, Nuttle X, Samocha KE, Trinh M, Ullah F, Võsa U, Hurles ME, Aradhya S, Davis EE, Finucane H, Gusella JF, Janze A, Katsanis N, Matyakhina L, Neale BM, Sanders D, Warren S, Hodge JC, Lal D, Ruderfer DM, Meck J, Mägi R, Esko T, Reymond A, Kutalik Z, Hakonarson H, Sunyaev S, Brand H, Talkowski ME. A cross-disorder dosage sensitivity map of the human genome. Cell 2022; 185:3041-3055.e25. [PMID: 35917817 PMCID: PMC9742861 DOI: 10.1016/j.cell.2022.06.036] [Citation(s) in RCA: 164] [Impact Index Per Article: 54.7] [Received: 12/22/2020] [Revised: 03/17/2022] [Accepted: 06/20/2022] [Indexed: 02/06/2023]
Abstract
Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.
Affiliation(s)
- Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
| | - Joseph T Glessner
- Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Division of Human Genetics, Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Eleonora Porcu
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maarja Lepamets
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia; Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia
| | | | | | - Lide Han
- Division of Genetic Medicine, Department of Medicine, and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Theodore Morley
- Division of Genetic Medicine, Department of Medicine, and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | | | - Jacob Ulirsch
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Selin Everett
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Daniel P Howrigan
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Philip M Boone
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
| | - Jack Fu
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Konrad J Karczewski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Georgios Kellaris
- Advanced Center for Translational and Genetic Medicine, Stanley Manne Children's Research Institute, Lurie Children's Hospital, Chicago, IL 60611, USA; Departments of Pediatrics and Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Chelsea Lowther
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Diane Lucente
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Kiana Mohajeri
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - Margit Nõukas
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia; Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia
| | - Xander Nuttle
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Kaitlin E Samocha
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10, UK
| | - Mi Trinh
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10, UK
| | - Farid Ullah
- Advanced Center for Translational and Genetic Medicine, Stanley Manne Children's Research Institute, Lurie Children's Hospital, Chicago, IL 60611, USA; Departments of Pediatrics and Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Urmo Võsa
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
| | | | | | - Matthew E Hurles
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10, UK
| | | | - Erica E Davis
- Advanced Center for Translational and Genetic Medicine, Stanley Manne Children's Research Institute, Lurie Children's Hospital, Chicago, IL 60611, USA; Departments of Pediatrics and Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Hilary Finucane
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - James F Gusella
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | | | - Nicholas Katsanis
- Advanced Center for Translational and Genetic Medicine, Stanley Manne Children's Research Institute, Lurie Children's Hospital, Chicago, IL 60611, USA; Departments of Pediatrics and Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | | | - Benjamin M Neale
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Jennelle C Hodge
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Dennis Lal
- Cologne Center for Genomics, University of Cologne, 51149 Cologne, Germany; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Douglas M Ruderfer
- Division of Genetic Medicine, Department of Medicine, and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Center for Precision Medicine, Department of Biomedical Informatics, and Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Reedik Mägi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
- Tõnu Esko
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
- Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Zoltán Kutalik
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Center for Primary Care and Public Health, University of Lausanne, 1015 Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Hakon Hakonarson
- Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Division of Human Genetics, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Shamil Sunyaev
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, MA 02114, USA
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
43
Gudmundsson S, Singer‐Berk M, Watts NA, Phu W, Goodrich JK, Solomonson M, Rehm HL, MacArthur DG, O'Donnell‐Luria A. Variant interpretation using population databases: Lessons from gnomAD. Hum Mutat 2022; 43:1012-1030. [PMID: 34859531 PMCID: PMC9160216 DOI: 10.1002/humu.24309] [Citation(s) in RCA: 243] [Impact Index Per Article: 81.0] [Received: 06/18/2021] [Revised: 11/02/2021] [Accepted: 11/28/2021] [Indexed: 01/22/2023]
Abstract
Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser, and its usage for variant and gene interpretation. We introduce key features including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.
Affiliation(s)
- Sanna Gudmundsson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Moriel Singer‐Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nicholas A. Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- William Phu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Julia K. Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Daniel G. MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research, University of New South Wales Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Anne O'Donnell‐Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
44
Milligan WR, Amster G, Sella G. The impact of genetic modifiers on variation in germline mutation rates within and among human populations. Genetics 2022; 221:iyac087. [PMID: 35666194 PMCID: PMC9339295 DOI: 10.1093/genetics/iyac087] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/06/2022] [Accepted: 05/16/2022] [Indexed: 11/14/2022]
Abstract
Mutation rates and spectra differ among human populations. Here, we examine whether this variation could be explained by evolution at mutation modifiers. To this end, we consider genetic modifier sites at which mutations, "mutator alleles," increase genome-wide mutation rates and model their evolution under purifying selection due to the additional deleterious mutations that they cause, genetic drift, and demographic processes. We solve the model analytically for a constant population size and characterize how evolution at modifier sites impacts variation in mutation rates within and among populations. We then use simulations to study the effects of modifier sites under a plausible demographic model for Africans and Europeans. When comparing populations that evolve independently, weakly selected modifier sites (2N_e s ≈ 1), which evolve slowly, contribute the most to variation in mutation rates. In contrast, when populations recently split from a common ancestral population, strongly selected modifier sites (2N_e s ≫ 1), which evolve rapidly, contribute the most to variation between them. Moreover, a modest number of modifier sites (e.g. 10 per mutation type in the standard classification into 96 types) subject to moderate to strong selection (2N_e s > 1) could account for the variation in mutation rates observed among human populations. If such modifier sites indeed underlie differences among populations, they should also cause variation in mutation rates within populations and their effects should be detectable in pedigree studies.
Affiliation(s)
- William R Milligan
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Guy Amster
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Flatiron Health Inc., New York, NY 10013, USA
- Guy Sella
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Program for Mathematical Genomics, Columbia University, New York, NY 10032, USA
45
Jamet S, Ha S, Ho TH, Houghtaling S, Timms A, Yu K, Paquette A, Maga AM, Greene NDE, Beier DR. The arginine methyltransferase Carm1 is necessary for heart development. G3 GENES|GENOMES|GENETICS 2022; 12:6613934. [PMID: 35736367 PMCID: PMC9339313 DOI: 10.1093/g3journal/jkac155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 05/27/2022] [Indexed: 11/28/2022]
Abstract
To discover genes implicated in human congenital disorders, we performed ENU mutagenesis in the mouse and screened for mutations affecting embryonic development. In this work, we report defects of heart development in mice homozygous for a mutation of coactivator-associated arginine methyltransferase 1 (Carm1). While Carm1 has been extensively studied, it has never been previously associated with a role in heart development. Phenotype analysis combining histology and microcomputed tomography imaging shows a range of cardiac defects. Most notably, many affected midgestation embryos appear to have cardiac rupture and hemorrhaging in the thorax. Mice that survive to late gestation show a variety of cardiac defects, including ventricular septal defects, double outlet right ventricle, and persistent truncus arteriosus. Transcriptome analyses of the mutant embryos by mRNA-seq reveal the perturbation of several genes involved in cardiac morphogenesis and muscle development and function. In addition, we observe the mislocalization of cardiac neural crest cells at E12.5 in the outflow tract. The cardiac phenotype of Carm1 mutant embryos is similar to that of Pax3 null mutants, and PAX3 is a putative target of CARM1. However, our analysis does not support the hypothesis that developmental defects in Carm1 mutant embryos are primarily due to a functional defect of PAX3.
Affiliation(s)
- Sophie Jamet
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Seungshin Ha
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Tzu-Hua Ho
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Scott Houghtaling
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Andrew Timms
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Kai Yu
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Alison Paquette
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ali Murat Maga
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Nicholas D E Greene
- Developmental Biology & Cancer Department, UCL Great Ormond Street Institute of Child Health, London WC1N 1EH, UK
- David R Beier
- Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA 98101, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA 98195, USA
46
Dukler N, Mughal MR, Ramani R, Huang YF, Siepel A. Extreme purifying selection against point mutations in the human genome. Nat Commun 2022; 13:4312. [PMID: 35879308 PMCID: PMC9314448 DOI: 10.1038/s41467-022-31872-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/10/2021] [Accepted: 07/07/2022] [Indexed: 12/13/2022]
Abstract
Large-scale genome sequencing has enabled the measurement of strong purifying selection in protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring such selection in noncoding as well as coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of "ultraselection" by the fractional depletion of rare single-nucleotide variants, after controlling for variation in mutation rates. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find abundant ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. By contrast, we find much less ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest levels in ultraconserved elements. We estimate that ~0.4-0.7% of the human genome is ultraselected, implying ~0.26-0.51 strongly deleterious mutations per generation. Overall, our study sheds new light on the genome-wide distribution of fitness effects by combining deep sequencing data and classical theory from population genetics.
Affiliation(s)
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Mehreen R Mughal
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ritika Ramani
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Yi-Fei Huang
- Department of Biology and Huck Institute of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
47
Green TE, Motelow JE, Bennett MF, Ye Z, Bennett CA, Griffin NG, Damiano JA, Leventer RJ, Freeman JL, Harvey AS, Lockhart PJ, Sadleir LG, Boys A, Scheffer IE, Major H, Darbro BW, Bahlo M, Goldstein DB, Kerrigan JF, Heinzen EL, Berkovic SF, Hildebrand MS. Sporadic hypothalamic hamartoma is a ciliopathy with somatic and bi-allelic contributions. Hum Mol Genet 2022; 31:2307-2316. [PMID: 35137044 PMCID: PMC9307310 DOI: 10.1093/hmg/ddab366] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/04/2021] [Revised: 12/02/2021] [Accepted: 12/15/2021] [Indexed: 11/13/2022]
Abstract
Hypothalamic hamartoma with gelastic seizures is a well-established cause of drug-resistant epilepsy in early life. The development of novel surgical techniques has permitted the genomic interrogation of hypothalamic hamartoma tissue. This has revealed causative mosaic variants within GLI3, OFD1 and other key regulators of the sonic-hedgehog pathway in a minority of cases. Sonic-hedgehog signalling proteins localize to the cellular organelle primary cilia. We therefore explored the hypothesis that cilia gene variants may underlie hitherto unsolved cases of sporadic hypothalamic hamartoma. We performed high-depth exome sequencing and chromosomal microarray on surgically resected hypothalamic hamartoma tissue and paired leukocyte-derived DNA from 27 patients. We searched for both germline and somatic variants under both dominant and bi-allelic genetic models. In hamartoma-derived DNA of seven patients, we identified bi-allelic (one germline, one somatic) variants within one of four cilia genes: DYNC2I1, DYNC2H1, IFT140 or SMO. In eight patients, we identified single somatic variants in the previously established hypothalamic hamartoma disease genes GLI3 or OFD1. Overall, we established a plausible molecular cause for 15/27 (56%) patients. Here, we expand the genetic architecture beyond single variants within dominant disease genes that cause sporadic hypothalamic hamartoma to bi-allelic (one germline/one somatic) variants, implicate three novel cilia genes and reconceptualize the disorder as a ciliopathy.
Affiliation(s)
- Timothy E Green
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Joshua E Motelow
- Institute for Genomic Medicine, Columbia University, New York, NY 10032, USA
- Mark F Bennett
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Victoria 3052, Australia
- Zimeng Ye
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Caitlin A Bennett
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Nicole G Griffin
- Institute for Genomic Medicine, Columbia University, New York, NY 10032, USA
- John A Damiano
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Richard J Leventer
- Department of Neurology, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, VIC 3052, Australia
- Murdoch Children’s Research Institute, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- Jeremy L Freeman
- Department of Neurology, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- Murdoch Children’s Research Institute, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- A Simon Harvey
- Department of Neurology, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, VIC 3052, Australia
- Murdoch Children’s Research Institute, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- Paul J Lockhart
- Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, VIC 3052, Australia
- Murdoch Children’s Research Institute, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- Lynette G Sadleir
- Department of Paediatrics and Child Health, University of Otago, Wellington 6242, New Zealand
- Amber Boys
- Victorian Clinical Genetics Services, Parkville, VIC 3052, Australia
- Ingrid E Scheffer
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Department of Neurology, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- Murdoch Children’s Research Institute, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
- The Florey Institute of Neuroscience and Mental Health, Parkville, VIC 3052, Australia
- Heather Major
- Department of Pediatrics, The University of Iowa, Iowa City, IA 52246, USA
- Benjamin W Darbro
- Department of Pediatrics, The University of Iowa, Iowa City, IA 52246, USA
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Victoria 3052, Australia
- David B Goldstein
- Institute for Genomic Medicine, Columbia University, New York, NY 10032, USA
- Department of Genetics and Development, Columbia University, New York, NY 10032, USA
- John F Kerrigan
- Division of Pediatric Neurology, Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ 85013, USA
- Erin L Heinzen
- Eshelman School of Pharmacy, Division of Pharmacotherapy and Experimental Therapeutics, and Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Samuel F Berkovic
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Michael S Hildebrand
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Murdoch Children’s Research Institute, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
48
Qing T, Mohsen H, Cannataro VL, Marczyk M, Rozenblit M, Foldi J, Murray M, Townsend JP, Kluger Y, Gerstein M, Pusztai L. Cancer Relevance of Human Genes. J Natl Cancer Inst 2022; 114:988-995. [PMID: 35417011 PMCID: PMC9275765 DOI: 10.1093/jnci/djac068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Revised: 01/03/2022] [Accepted: 03/29/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND: We hypothesize that genes that directly or indirectly interact with core cancer genes (CCGs) in a comprehensive gene-gene interaction network may have functional importance in cancer.
METHODS: We categorized 12 767 human genes into CCGs (n = 468), 1 (n = 5467), 2 (n = 5573), 3 (n = 915), and more than 3 steps (n = 416) removed from the nearest CCG in the Search Tool for the Retrieval of Interacting Genes/Proteins network. We estimated cancer-relevant functional importance in these neighborhood categories using 1) gene dependency score, which reflects the effect of a gene on cell viability after knockdown; 2) somatic mutation frequency in The Cancer Genome Atlas; 3) effect size that estimates to what extent a mutation in a gene enhances cell survival; and 4) negative selection pressure of germline protein-truncating variants in healthy populations.
RESULTS: Cancer biology-related functional importance of genes decreases as their distance from the CCGs increases. Genes closer to cancer genes show greater connectedness in the network, have greater importance in maintaining cancer cell viability, are under greater negative germline selection pressure, and have higher somatic mutation frequency in cancer. Based on these 4 metrics, we provide cancer relevance annotation to known human genes.
CONCLUSIONS: A large number of human genes are connected to CCGs and could influence cancer biology to various extents when dysregulated; any given mutation may be functionally important in one but not in another individual depending on genomic context.
Affiliation(s)
- Tao Qing
- Breast Medical Oncology, School of Medicine, Yale University, New Haven, CT, USA
- Hussein Mohsen
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA
- Michal Marczyk
- Breast Medical Oncology, School of Medicine, Yale University, New Haven, CT, USA
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Mariya Rozenblit
- Breast Medical Oncology, School of Medicine, Yale University, New Haven, CT, USA
- Julia Foldi
- Breast Medical Oncology, School of Medicine, Yale University, New Haven, CT, USA
- Michael Murray
- Department of Genetics, Yale Center for Genomic Health, New Haven, CT, USA
- Jeffrey P Townsend
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Yuval Kluger
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA
- Department of Pathology, School of Medicine, Yale University, New Haven, CT, USA
- Applied Mathematics Program, Yale University, New Haven, CT, USA
- Mark Gerstein
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
- Department of Computer Science, Yale University, New Haven, CT, USA
- Department of Statistics & Data Science, Yale University, New Haven, CT, USA
- Lajos Pusztai
- Breast Medical Oncology, School of Medicine, Yale University, New Haven, CT, USA
49
Lesurf R, Said A, Akinrinade O, Breckpot J, Delfosse K, Liu T, Yao R, Persad G, McKenna F, Noche RR, Oliveros W, Mattioli K, Shah S, Miron A, Yang Q, Meng G, Yue MCS, Sung WWL, Thiruvahindrapuram B, Lougheed J, Oechslin E, Mondal T, Bergin L, Smythe J, Jayappa S, Rao VJ, Shenthar J, Dhandapany PS, Semsarian C, Weintraub RG, Bagnall RD, Ingles J, Melé M, Maass PG, Ellis J, Scherer SW, Mital S. Whole genome sequencing delineates regulatory, copy number, and cryptic splice variants in early onset cardiomyopathy. NPJ Genom Med 2022; 7:18. [PMID: 35288587 PMCID: PMC8921194 DOI: 10.1038/s41525-022-00288-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/10/2021] [Accepted: 02/04/2022] [Indexed: 11/08/2022]
Abstract
Cardiomyopathy (CMP) is a heritable disorder. Over 50% of cases are gene-elusive on clinical gene panel testing. The contribution of variants in non-coding DNA elements that result in cryptic splicing and regulate gene expression has not been explored. We analyzed whole-genome sequencing (WGS) data in a discovery cohort of 209 pediatric CMP patients and 1953 independent replication genomes and exomes. We searched for protein-coding variants, and non-coding variants predicted to affect the function or expression of genes. Thirty-nine percent of cases harbored pathogenic coding variants in known CMP genes, and 5% harbored high-risk loss-of-function (LoF) variants in additional candidate CMP genes. Fifteen percent harbored high-risk regulatory variants in promoters and enhancers of CMP genes (odds ratio 2.25, p = 6.70 × 10⁻⁷ versus controls). Genes involved in α-dystroglycan glycosylation (FKTN, DTNA) and desmosomal signaling (DSC2, DSG2) were most highly enriched for regulatory variants (odds ratio 6.7-58.1). Functional effects were confirmed in patient myocardium and reporter assays in human cardiomyocytes, and in zebrafish CRISPR knockouts. We provide strong evidence for the genomic contribution of functionally active variants in new genes and in regulatory elements of known CMP genes to early onset CMP.
Affiliation(s)
- Robert Lesurf
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Abdelrahman Said
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Oyediran Akinrinade
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- St. George's University School of Medicine, Grenada, Grenada
- Kathleen Delfosse
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Ting Liu
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Roderick Yao
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Gabrielle Persad
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Fintan McKenna
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Ramil R Noche
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Zebrafish Genetics and Disease Models Core, The Hospital for Sick Children, Toronto, ON, Canada
- Winona Oliveros
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Catalonia, Spain
- Kaia Mattioli
- Division of Genetics, Department of Medicine, Brigham & Women's Hospital and Harvard Medical School, Boston, MA, USA
- Shreya Shah
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Anastasia Miron
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Qian Yang
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Guoliang Meng
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Wilson W L Sung
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- Jane Lougheed
- Division of Cardiology, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada
- Erwin Oechslin
- Peter Munk Cardiac Centre, Division of Cardiology, Toronto General Hospital, University of Toronto, Toronto, ON, Canada
- Tapas Mondal
- Department of Pediatrics, Hamilton Health Sciences Centre, Hamilton, ON, Canada
- Lynn Bergin
- Division of Cardiology, London Health Sciences Centre, London, ON, Canada
- John Smythe
- Department of Pediatrics, Kingston General Hospital, Kingston, ON, Canada
- Shashank Jayappa
- Cardiovascular Biology and Disease Theme, Institute for Stem Cell Science and Regenerative Medicine, Bangalore (inStem), Bangalore, India
- Vinay J Rao
- Cardiovascular Biology and Disease Theme, Institute for Stem Cell Science and Regenerative Medicine, Bangalore (inStem), Bangalore, India
- Jayaprakash Shenthar
- Department of Cardiology, Sri Jayadeva Institute of Cardiovascular Sciences and Research, Bengaluru, India
- Perundurai S Dhandapany
- Cardiovascular Biology and Disease Theme, Institute for Stem Cell Science and Regenerative Medicine, Bangalore (inStem), Bangalore, India
- Christopher Semsarian
- Agnes Ginges Centre for Molecular Cardiology at Centenary Institute, The University of Sydney, Sydney, Australia
- Department of Cardiology, Royal Prince Alfred Hospital, Sydney, Australia
- Robert G Weintraub
- Cardiology Department, Royal Children's Hospital, Melbourne, Australia
- Murdoch Children's Research Institute and Department of Paediatrics, University of Melbourne, Melbourne, Australia
- Richard D Bagnall
- Department of Cardiology, Royal Prince Alfred Hospital, Sydney, Australia
- Jodie Ingles
- Agnes Ginges Centre for Molecular Cardiology at Centenary Institute, The University of Sydney, Sydney, Australia
- Cardio Genomics Program at Centenary Institute, The University of Sydney, Sydney, Australia
- Marta Melé
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Catalonia, Spain
- Philipp G Maass
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- James Ellis
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Stephen W Scherer
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
- McLaughlin Centre, University of Toronto, Toronto, ON, Canada
- Seema Mital
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Ted Rogers Centre for Heart Research, Toronto, ON, Canada
- Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
50
Valentini P, Pierattini B, Zacco E, Mangoni D, Espinoza S, Webster NA, Andrews B, Carninci P, Tartaglia GG, Pandolfini L, Gustincich S. Towards SINEUP-based therapeutics: Design of an in vitro synthesized SINEUP RNA. MOLECULAR THERAPY. NUCLEIC ACIDS 2022; 27:1092-1102. [PMID: 35228902 PMCID: PMC8857549 DOI: 10.1016/j.omtn.2022.01.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2021] [Accepted: 01/28/2022] [Indexed: 12/28/2022]
Abstract
SINEUPs are a novel class of natural and synthetic non-coding antisense RNA molecules able to increase the translation of a target mRNA. They present a modular organization comprising an unstructured antisense target-specific domain, which sets the specificity of each individual SINEUP, and a structured effector domain, which is responsible for the translation enhancement. In order to design a fully functional in vitro transcribed SINEUP for therapeutic applications, SINEUP RNAs were synthesized in vitro with a variety of chemical modifications and screened for their activity on endogenous target mRNA upon transfection. Three combinations of modified ribonucleotides, namely 2'O methyl-ATP (Am), N6 methyl-ATP (m6A), and pseudo-UTP (ψ), conferred SINEUP activity to naked RNA. The best combination tested in this study was fully modified with m6A and ψ. Aside from functionality, this combination conferred improved stability upon transfection and higher thermal stability. Common structural determinants of activity were identified by circular dichroism, defining a core functional structure that is achieved with different combinations of modifications.
Affiliation(s)
- Paola Valentini
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Bianca Pierattini
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Area of Neuroscience, International School for Advanced Studies (SISSA), 34136 Trieste, Italy
- Elsa Zacco
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Damiano Mangoni
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Stefano Espinoza
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Natalie A. Webster
- STORM Therapeutics, Babraham Research Campus, Moneta Building, Cambridge, CB22 3AT, UK
- Byron Andrews
- STORM Therapeutics, Babraham Research Campus, Moneta Building, Cambridge, CB22 3AT, UK
- Piero Carninci
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- Luca Pandolfini
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Stefano Gustincich
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy