1
Uppili B, Faruq M. STRIDE-DB: a comprehensive database for exploration of instability and phenotypic relevance of short tandem repeats in the human genome. Database (Oxford) 2024; 2024:baae020. [PMID: 38602506 PMCID: PMC11008502 DOI: 10.1093/database/baae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 11/10/2023] [Accepted: 03/07/2024] [Indexed: 04/12/2024]
Abstract
Short tandem repeats (STRs) are genetic markers made up of repeating DNA sequences. Variation in STRs is widely studied in forensic analysis, population studies and genetic testing for a variety of neuromuscular disorders. Understanding polymorphic STR variation and its causes is crucial for deciphering genetic information and finding links to various disorders. In this paper, we present STRIDE-DB, a novel platform to explore STR Instability and its Phenotypic Relevance, and a comprehensive database of STRs in the human genome. We used RepeatMasker to identify all STRs in the human genome (hg19) and combined them with frequency data from the 1000 Genomes Project. STRIDE-DB is a user-friendly resource for investigating the relationship between STR variation, instability and phenotype. By integrating data from genome-wide association studies (GWAS), the ClinVar database, Alu loci, haplotype blocks in the genome and STR conservation, it serves as an important tool for researchers exploring the variability of STRs in the human genome and its direct impact on phenotypes. STRIDE-DB has broad applicability in research domains such as forensic science and the study of repeat expansion disorders. Database URL: https://stridedb.igib.res.in.
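The abstract defines STRs as repeating DNA sequences and says they were identified genome-wide with RepeatMasker. The Python sketch below illustrates what such a scan looks for. It reports perfect (uninterrupted) tandem repeats of 1-6 bp motifs in a DNA string. It is not RepeatMasker's algorithm and not the STRIDE-DB pipeline. The function name `find_perfect_strs` and its thresholds (`min_unit`, `max_unit`, `min_copies`) are hypothetical choices. Real STR annotation also tolerates mismatches and indels inside a repeat.

```python
def find_perfect_strs(seq, min_unit=1, max_unit=6, min_copies=3):
    """Hypothetical helper: find perfect tandem repeats in a DNA string.

    Returns (start, motif, copies) tuples with 0-based start positions.
    Illustrative only; interrupted or imperfect repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (span_bp, motif, copies) of the longest repeat starting at i
        for k in range(min_unit, max_unit + 1):
            motif = seq[i:i + k]
            if len(motif) < k:  # not enough sequence left for this motif length
                break
            copies = 1
            # Extend while the next k bases repeat the motif exactly.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            span = copies * k
            # Keep the longest span; on ties the shorter motif (found first) wins.
            if copies >= min_copies and (best is None or span > best[0]):
                best = (span, motif, copies)
        if best:
            hits.append((i, best[1], best[2]))
            i += best[0]  # skip past the reported repeat
        else:
            i += 1
    return hits


print(find_perfect_strs("TTCAGCAGCAGCAGAA"))  # [(2, 'CAG', 4)]
```

The greedy left-to-right scan reports a repeat in whatever phase it is first encountered. Production tools instead normalize motifs and merge overlapping calls.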
Affiliation(s)
- Bharathram Uppili
- Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110007, India
- CSIR-HRDC Campus, Academy for Scientific and Innovative Research, Ghaziabad 201002, India
- Mohammed Faruq
- Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110007, India
2
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B, Abel HJ, Antonacci-Fulton LL, Asri M, Baid G, Baker CA, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson MJP, Chang PC, Chang XH, Cheng H, Chu J, Cody S, Colonna V, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler EE, Eizenga JM, Fairley S, Fedrigo O, Felsenfeld AL, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton RS, Gao Y, Garg S, Garrison E, Garrison NA, Giron CG, Green RE, Groza C, Guarracino A, Haggerty L, Hall IM, Harvey WT, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis ED, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Li H, Liao WW, Lu S, Lu TY, Lucas JK, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin FJ, McCartney A, McDaniel J, Miga KH, Mitchell MW, Monlong J, Mountcastle J, Munson KM, Mwaniki MN, Nattestad M, Novak AM, Nurk S, Olsen HE, Olson ND, Paten B, Pesout T, Phillippy AM, Popejoy AB, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Sibbesen JA, Sirén J, Smith MW, Sofia HJ, Tayoun ANA, Thibaud-Nissen F, Tomlinson C, Tricomi FF, Villani F, Vollger MR, Wagner J, Walenz B, Wang T, Wood JMD, Zimin AV, Zook JM. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024; 42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
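The abstract describes a pangenome as a set of haplotypes stored together with their alignment as a graph. The toy Python model below illustrates that idea. It holds sequence nodes and keeps each haplotype as a path (an ordered list of node IDs), so that a SNP and an insertion both appear as alternative routes through the same graph. This is a hypothetical data structure for illustration only. It ignores node orientation and is not the Minigraph-Cactus pipeline or any of its output formats.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy pangenome graph: node sequences plus haplotypes stored as paths."""
    nodes: dict[int, str] = field(default_factory=dict)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # Every step of a haplotype path must refer to an existing node.
        missing = [n for n in node_ids if n not in self.nodes]
        if missing:
            raise KeyError(f"path {name!r} uses unknown nodes {missing}")
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        """Recover a haplotype's linear sequence by concatenating its nodes."""
        return "".join(self.nodes[n] for n in self.paths[name])


g = SequenceGraph()
for node_id, seq in [(1, "ACGT"), (2, "A"), (3, "G"), (4, "TTC"), (5, "GGA")]:
    g.add_node(node_id, seq)
g.add_path("ref", [1, 2, 4])      # reference allele A
g.add_path("hap1", [1, 3, 4])     # SNP: A -> G
g.add_path("hap2", [1, 2, 5, 4])  # 3 bp insertion "GGA"
print(g.spell("hap2"))  # ACGTAGGATTC
```

Shared sequence (nodes 1 and 4) is stored once, and each haplotype remains recoverable as a linear string. This is the property that lets a graph hold variation of different sizes without choosing a single reference.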
Affiliation(s)
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haley J. Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Carl A. Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Canadian Center for Computational Genomics, McGill University, Montreal, QC, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Mark J. P. Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Xian H. Chang
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Robert M. Cook-Deegan
- Arizona State University, Barrett and O’Connor Washington Center, Washington, DC, USA
- Omar E. Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Daniel Doerr
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jana Ebler
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam L. Felsenfeld
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Robert S. Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Nanibaa’ A. Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Carlos Garcia Giron
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
- Cristian Groza
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
- Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ira M. Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Miten Jain
- Northeastern University, Boston, MA, USA
- Erich D. Jarvis
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eimear E. Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Barbara A. Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jan O. Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Julian K. Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Hugo Magalhães
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
- Pierre Marijon
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Charles Markello
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Tobias Marschall
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Fergal J. Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Hugh E. Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Alice B. Popejoy
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
- Ashley D. Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Baergen I. Schultz
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Jonas A. Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
- Jouni Sirén
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Michael W. Smith
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Heidi J. Sofia
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Ahmad N. Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children’s Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
3
Ji Q, Yao Y, Li Z, Zhou Z, Qian J, Tang Q, Xie J. Characterizing identity by descent segments in Chinese interpopulation unrelated individual pairs. Mol Genet Genomics 2024; 299:37. [PMID: 38494535 DOI: 10.1007/s00438-024-02132-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 02/22/2024] [Indexed: 03/19/2024]
Abstract
Identity by descent (IBD) segments, uninterrupted DNA segments inherited from the same ancestral chromosome, are widely used as indicators of relationships in genetics. Much research focuses on IBD segments between related pairs, whereas statistical analyses of segments shared by unrelated individuals are rare. In this study, we investigated the basic features of IBD segments in unrelated pairs from the Chinese populations of the 1000 Genomes Project. A total of 5922 IBD segments were detected with IBIS in Chinese interpopulation unrelated individual pairs, with an average segment length of 3.71 Mb. We found that 17.86% of unrelated pairs in the Chinese cohort shared at least one IBD segment. Furthermore, we identified 49 chromosomal regions in which IBD segments clustered at high abundance; these might be sharing hotspots in the human genome. Such regions were also observed in populations of other ancestries, implying that similar IBD backgrounds exist there as well. Altogether, these results describe the distribution of common background IBD segments, which can help improve the accuracy of pedigree studies based on IBD analysis.
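The abstract reports three summary statistics: segment count, mean segment length, and the fraction of pairs sharing at least one segment. These are straightforward to compute once segments have been called. The minimal Python sketch below assumes segments were already parsed into hypothetical `(id1, id2, start_bp, end_bp)` tuples with physical coordinates. It does not read IBIS output directly and is not the authors' analysis code.

```python
def ibd_summary(pairs, segments):
    """Summarize IBD sharing over a set of analysed individual pairs.

    pairs:    iterable of (id1, id2) pairs included in the analysis
    segments: iterable of (id1, id2, start_bp, end_bp) IBD segments (assumed format)
    Returns (segment count, mean length in Mb, fraction of pairs sharing >= 1 segment).
    """
    # Pairs are unordered, so (A, B) and (B, A) count as the same pair.
    analysed = {frozenset(p) for p in pairs}
    # Ignore segments between individuals outside the analysed pair set.
    kept = [s for s in segments if frozenset(s[:2]) in analysed]
    lengths_mb = [(end - start) / 1e6 for _, _, start, end in kept]
    sharing = {frozenset(s[:2]) for s in kept}
    mean_mb = sum(lengths_mb) / len(lengths_mb) if lengths_mb else 0.0
    frac_sharing = len(sharing) / len(analysed) if analysed else 0.0
    return len(kept), mean_mb, frac_sharing


pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
segments = [("A", "B", 1_000_000, 4_000_000),
            ("B", "A", 10_000_000, 14_500_000),
            ("C", "B", 2_000_000, 5_000_000)]
print(ibd_summary(pairs, segments))  # (3, 3.5, 0.5)
```

Restricting the denominator to the analysed pairs matters here. The study reports sharing among interpopulation unrelated pairs, not among all possible pairs in the cohort.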
Affiliation(s)
- Qiqi Ji
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Yining Yao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhimin Li
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhihan Zhou
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Jinglei Qian
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Qiqun Tang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
- Jianhui Xie
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
4
Chen L, Zhang C, Xue R, Liu M, Bai J, Bao J, Wang Y, Jiang N, Li Z, Wang W, Wang R, Zheng B, Yang A, Hu J, Liu K, Shen S, Zhang Y, Bai M, Wang Y, Zhu Y, Yang S, Gao Q, Gu J, Gao D, Wang XW, Nakagawa H, Zhang N, Wu L, Rozen SG, Bai F, Wang H. Deep whole-genome analysis of 494 hepatocellular carcinomas. Nature 2024; 627:586-593. [PMID: 38355797 DOI: 10.1038/s41586-024-07054-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 01/10/2024] [Indexed: 02/16/2024]
Abstract
Over half of hepatocellular carcinoma (HCC) cases diagnosed worldwide are in China [1-3]. However, whole-genome analysis of hepatitis B virus (HBV)-associated HCC in Chinese individuals is limited [4-8], with current analyses of HCC mainly from non-HBV-enriched populations [9,10]. Here we initiated the Chinese Liver Cancer Atlas (CLCA) project and performed deep whole-genome sequencing (average depth, 120×) of 494 HCC tumours. We identified 6 coding and 28 non-coding previously undescribed driver candidates. Five previously undescribed mutational signatures were found, including aristolochic-acid-associated indel and doublet base signatures, and a single-base-substitution signature that we termed SBS_H8. Pentanucleotide context analysis and experimental validation confirmed that SBS_H8 was distinct from the aristolochic-acid-associated SBS22. Notably, HBV integrations could take the form of extrachromosomal circular DNA, resulting in elevated copy numbers and gene expression. Our high-depth data also enabled us to characterize subclonal clustered alterations, including chromothripsis, chromoplexy and kataegis, suggesting that these catastrophic events could also occur in late stages of hepatocarcinogenesis. Pathway analysis of all classes of alterations further linked non-coding mutations to dysregulation of liver metabolism. Finally, we performed in vitro and in vivo assays to show that fibrinogen alpha chain (FGA), identified as both a candidate coding and non-coding driver, regulates HCC progression and metastasis. Our CLCA study depicts a detailed genomic landscape and evolutionary history of HCC in Chinese individuals, with important clinical implications.
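Single-base-substitution signatures such as SBS_H8 and SBS22 are defined over substitution types in sequence context. The abstract mentions a pentanucleotide context, meaning two bases on each side of the mutated base rather than the usual one. The minimal Python sketch below labels one substitution in 5-mer context. It assumes the widely used pyrimidine-reference convention, in which a purine reference base is reported on the reverse-complement strand. The function names and the `XX[R>A]YY` label format are hypothetical, and this is not the CLCA pipeline.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def pentanucleotide_context(ref_seq: str, pos: int, alt: str) -> str:
    """Label the substitution at 0-based `pos`, e.g. 'TA[C>T]GA'.

    Requires two flanking bases on each side. If the reference base is a
    purine (A/G), context and alleles are reverse-complemented so that the
    reported reference base is always a pyrimidine (C/T).
    """
    if pos < 2 or pos + 3 > len(ref_seq):
        raise ValueError("need two flanking bases on each side of pos")
    context = ref_seq[pos - 2:pos + 3].upper()
    alt = alt.upper()
    if context[2] == alt:
        raise ValueError("alt allele must differ from the reference base")
    if context[2] in "AG":
        context = revcomp(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[:2]}[{context[2]}>{alt}]{context[3:]}"


print(pentanucleotide_context("TTACGAT", 3, "T"))  # TA[C>T]GA
print(pentanucleotide_context("GGAGTCC", 3, "A"))  # GA[C>T]TC
```

Counting these labels across all mutations in a tumour gives the 5-mer mutation spectrum. Finer flanking context can separate signatures that look similar at the trinucleotide level.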
Affiliation(s)
- Lei Chen
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Chong Zhang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), School of Life Sciences, Peking University, Beijing, China
- Ruidong Xue
- Peking University-Yunnan Baiyao International Medical Research Center, International Cancer Institute, Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Translational Cancer Research Center, Peking University First Hospital, Beijing, China
- Mo Liu
- Centre for Computational Biology and Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore
- Jian Bai
- Berry Oncology Corporation, Beijing, China
- Jinxia Bao
- Model Animal Research Center, Medical School, Nanjing University, Nanjing, China
- Yin Wang
- Berry Oncology Corporation, Beijing, China
- Nanhai Jiang
- Centre for Computational Biology and Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore
- Zhixuan Li
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Wenwen Wang
- The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Ruiru Wang
- Berry Oncology Corporation, Beijing, China
- Bo Zheng
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Ji Hu
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Ke Liu
- Berry Oncology Corporation, Beijing, China
- Siyun Shen
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Yangqianwen Zhang
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Mixue Bai
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Yan Wang
- Berry Oncology Corporation, Beijing, China
- Yanjing Zhu
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Shuai Yang
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Qiang Gao
- Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Jin Gu
- MOE Key Laboratory for Bioinformatics, Department of Automation, Tsinghua University, Beijing, China
- Dong Gao
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, CAS, Shanghai, China
- Xin Wei Wang
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Hidewaki Nakagawa
- Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Ning Zhang
- Peking University-Yunnan Baiyao International Medical Research Center, International Cancer Institute, Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Translational Cancer Research Center, Peking University First Hospital, Beijing, China
| | - Lin Wu
- Berry Oncology Corporation, Beijing, China.
| | - Steven G Rozen
- Centre for Computational Biology and Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore.
| | - Fan Bai
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), School of Life Sciences, Peking University, Beijing, China.
| | - Hongyang Wang
- National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China.
5
Merav M, Bitensky EM, Heilbrun EE, Hacohen T, Kirshenbaum A, Golan-Berman H, Cohen Y, Adar S. Gene architecture is a determinant of the transcriptional response to bulky DNA damages. Life Sci Alliance 2024; 7:e202302328. [PMID: 38167611 PMCID: PMC10761554 DOI: 10.26508/lsa.202302328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 12/19/2023] [Accepted: 12/21/2023] [Indexed: 01/05/2024]
Abstract
Bulky DNA damages block transcription and compromise genome integrity and function. The cellular response to these damages includes global transcription shutdown. Still, active transcription is necessary for transcription-coupled repair and for induction of damage-response genes. To uncover common features of a general bulky DNA damage response, and to identify response-related transcripts that are expressed despite damage, we performed a systematic RNA-seq study comparing the transcriptional response to three independent damage-inducing agents: UV, the chemotherapeutic drug cisplatin, and benzo[a]pyrene, a component of cigarette smoke. Reduction in gene expression after damage was associated with higher damage rates, greater gene length, and lower GC content. We identified genes with relatively higher expression after all three damage treatments, including NR4A2, a potential novel damage-response transcription factor. Up-regulated genes exhibit higher exon content that is associated with preferential repair, which could enable rapid damage removal and transcription restoration. The attenuated response to BPDE (benzo[a]pyrene diol epoxide, the reactive metabolite of benzo[a]pyrene) highlights that not all bulky damages elicit the same response. These findings frame gene architecture as a major determinant of the transcriptional response that is hardwired into the human genome.
Affiliation(s)
- May Merav
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
- Elnatan M Bitensky
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
- Elisheva E Heilbrun
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
- Tamar Hacohen
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
- Ayala Kirshenbaum
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
- Hadar Golan-Berman
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
- Yuval Cohen
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
- Sheera Adar
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel Canada, Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
6
Bick AG, Metcalf GA, Mayo KR, Lichtenstein L, Rura S, Carroll RJ, Musick A, Linder JE, Jordan IK, Nagar SD, Sharma S, Meller R, Basford M, Boerwinkle E, Cicek MS, Doheny KF, Eichler EE, Gabriel S, Gibbs RA, Glazer D, Harris PA, Jarvik GP, Philippakis A, Rehm HL, Roden DM, Thibodeau SN, Topper S, Blegen AL, Wirkus SJ, Wagner VA, Meyer JG, Cicek MS, Muzny DM, Venner E, Mawhinney MZ, Griffith SML, Hsu E, Ling H, Adams MK, Walker K, Hu J, Doddapaneni H, Kovar CL, Murugan M, Dugan S, Khan Z, Boerwinkle E, Lennon NJ, Austin-Tse C, Banks E, Gatzen M, Gupta N, Henricks E, Larsson K, McDonough S, Harrison SM, Kachulis C, Lebo MS, Neben CL, Steeves M, Zhou AY, Smith JD, Frazar CD, Davis CP, Patterson KE, Wheeler MM, McGee S, Lockwood CM, Shirts BH, Pritchard CC, Murray ML, Vasta V, Leistritz D, Richardson MA, Buchan JG, Radhakrishnan A, Krumm N, Ehmen BW, Schwartz S, Aster MMT, Cibulskis K, Haessly A, Asch R, Cremer A, Degatano K, Shergill A, Gauthier LD, Lee SK, Hatcher A, Grant GB, Brandt GR, Covarrubias M, Banks E, Able A, Green AE, Carroll RJ, Zhang J, Condon HR, Wang Y, Dillon MK, Albach CH, Baalawi W, Choi SH, Wang X, Rosenthal EA, Ramirez AH, Lim S, Nambiar S, Ozenberger B, Wise AL, Lunt C, Ginsburg GS, Denny JC. Genomic data in the All of Us Research Program. Nature 2024; 627:340-346. [PMID: 38374255 PMCID: PMC10937371 DOI: 10.1038/s41586-023-06957-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 12/08/2023] [Indexed: 02/21/2024]
Abstract
Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics[1-4]. The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health[5,6]. Here we describe the programme's genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all.
7
Buffalo V, Kern AD. A quantitative genetic model of background selection in humans. PLoS Genet 2024; 20:e1011144. [PMID: 38507461 PMCID: PMC10984650 DOI: 10.1371/journal.pgen.1011144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 04/01/2024] [Accepted: 01/19/2024] [Indexed: 03/22/2024]
Abstract
Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This "linked selection signal" reflects the impact of selection according to the physical placement of functional regions and recombination rates along chromosomes. Previous work has shown that purifying selection acting against the steady influx of new deleterious mutations at functional portions of the genome shapes patterns of genomic variation. To date, statistical efforts to estimate purifying selection parameters from linked selection models have relied on classic Background Selection theory, which is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of linked selection, which models how polygenic additive fitness variance distributed along the genome increases the rate of stochastic allele frequency change. By jointly predicting the equilibrium fitness variance and substitution rate due to both strongly and weakly deleterious mutations, we estimate the distribution of fitness effects (DFE) and mutation rate across three geographically distinct human samples. While our model can accommodate weaker selection, we find evidence of strong selection operating similarly across all human samples. Although our quantitative genetic model of linked selection fits better than previous models, substitution rates of the most constrained sites disagree with observed divergence levels. We find that a model incorporating selective interference better predicts observed divergence in conserved regions, but overall our results suggest uncertainty remains about the processes generating fitness variation in humans.
Affiliation(s)
- Vince Buffalo
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
- Andrew D. Kern
- Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
8
Zheng Y, Li Y, Zhou K, Li T, VanDusen NJ, Hua Y. Precise genome-editing in human diseases: mechanisms, strategies and applications. Signal Transduct Target Ther 2024; 9:47. [PMID: 38409199 PMCID: PMC10897424 DOI: 10.1038/s41392-024-01750-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 02/28/2024]
Abstract
Precise genome-editing platforms are versatile tools for generating specific, site-directed DNA insertions, deletions, and substitutions. The continuous enhancement of these tools has led to a revolution in the life sciences, which promises to deliver novel therapies for genetic disease. Precise genome-editing can be traced back to the 1950s with the discovery of the DNA double helix and, after 70 years of development, has evolved from crude in vitro applications to a wide range of sophisticated capabilities, including in vivo applications. Nonetheless, precise genome-editing faces constraints such as modest efficiency, delivery challenges, and off-target effects. In this review, we explore precise genome-editing, with a focus on the landmark events in its history, the various platforms, delivery systems, and applications. First, we discuss the landmark events in the history of precise genome-editing. Second, we describe the current state of precise genome-editing strategies and explain how these techniques offer unprecedented precision and versatility for modifying the human genome. Third, we introduce the current delivery systems used to deploy precise genome-editing components through DNA, RNA, and RNPs. Finally, we summarize the current applications of precise genome-editing in labeling endogenous genes, screening genetic variants, molecular recording, generating disease models, and gene therapy, including ex vivo therapy and in vivo therapy, and discuss potential future advances.
Affiliation(s)
- Yanjiang Zheng
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Yifei Li
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Kaiyu Zhou
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Tiange Li
- Department of Cardiovascular Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Nathan J VanDusen
- Department of Pediatrics, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Yimin Hua
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
9
Matsushima W, Planet E, Trono D. Ancestral genome reconstruction enhances transposable element annotation by identifying degenerate integrants. Cell Genom 2024; 4:100497. [PMID: 38295789 PMCID: PMC10879028 DOI: 10.1016/j.xgen.2024.100497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 08/09/2023] [Accepted: 01/06/2024] [Indexed: 02/17/2024]
Abstract
Growing evidence indicates that transposable elements (TEs) play important roles in evolution by providing genomes with coding and non-coding sequences. Identification of TE-derived functional elements, however, has relied on TE annotations in individual species, which limits its scope to relatively intact TE sequences. Here, we report a novel approach to uncover previously unannotated degenerate TEs (degTEs) by probing multiple ancestral genomes reconstructed from hundreds of species. We applied this method to the human genome and achieved a 10.8% increase in coverage over the most recent annotation. Further, we discovered that degTEs contribute to various cis-regulatory elements and transcription factor binding sites, including those of a known TE-controlling family, the KRAB zinc-finger proteins. We also report unannotated chimeric transcripts between degTEs and human genes expressed in embryos. This study provides a novel methodology and a freely available resource that will facilitate the investigation of TE co-option events on a full scale.
Affiliation(s)
- Wayo Matsushima
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Evarist Planet
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Didier Trono
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
10
Siau JW, Siddiqui AA, Lau SY, Kannan S, Peter S, Zeng Y, Verma C, Droge P, Ghadessy JF. Expanding the DNA editing toolbox: Novel lambda integrase variants targeting microalgal and human genome sequences. PLoS One 2024; 19:e0292479. [PMID: 38349923 PMCID: PMC10863862 DOI: 10.1371/journal.pone.0292479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 01/26/2024] [Indexed: 02/15/2024]
Abstract
Recombinase enzymes are extremely efficient at integrating very large DNA fragments into target genomes. However, intrinsic sequence specificities curtail their use to DNA sequences with sufficient homology to endogenous target motifs. Extensive engineering is therefore required to broaden applicability and robustness. Here, we describe the directed evolution of novel lambda integrase variants capable of editing exogenous target sequences identified in the diatom Phaeodactylum tricornutum and the alga Nannochloropsis oceanica. These microorganisms hold great promise as conduits for green biomanufacturing and carbon sequestration. The evolved enzyme variants show a >1000-fold switch in specificity towards the non-natural target sites when assayed in vitro. A single-copy target motif in the human genome with homology to the Nannochloropsis oceanica site can also be efficiently targeted using an engineered integrase, both in vitro and in human cells. The developed integrase variants represent useful additions to the DNA editing toolbox, with particular application for targeted genomic insertion of large DNA cargos.
Affiliation(s)
- Jia Wei Siau
- Protein and Peptide Engineering Research Laboratory, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore, Singapore
- Asim Azhar Siddiqui
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Sze Yi Lau
- Protein and Peptide Engineering Research Laboratory, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore, Singapore
- Sabrina Peter
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Yingying Zeng
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Chandra Verma
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore
- Peter Droge
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- LambdaGen Pte. Ltd., Singapore, Singapore
- John F. Ghadessy
- Protein and Peptide Engineering Research Laboratory, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore, Singapore
11
Huang S, Liu S, Huang M, He JR, Wang C, Wang T, Feng X, Kuang Y, Lu J, Gu Y, Xia X, Lin S, Zhou W, Fu Q, Xia H, Qiu X. The Born in Guangzhou Cohort Study enables generational genetic discoveries. Nature 2024; 626:565-573. [PMID: 38297123 DOI: 10.1038/s41586-023-06988-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 12/15/2023] [Indexed: 02/02/2024]
Abstract
Genomic research that targets large-scale, prospective birth cohorts constitutes an essential strategy for understanding the influence of genetics and environment on human health[1]. Nonetheless, such studies remain scarce, particularly in Asia. Here we present the phase I genome study of the Born in Guangzhou Cohort Study[2] (BIGCS), which encompasses the sequencing and analysis of 4,053 Chinese individuals, primarily composed of trios or mother-infant duos residing in South China. Our analysis reveals novel genetic variants, a high-quality reference panel, and fine-scale local genetic structure within BIGCS. Notably, we identify previously unreported East Asian-specific genetic associations with maternal total bile acid, gestational weight gain and infant cord blood traits. Additionally, we observe prevalent age-specific genetic effects on lipid levels in mothers and infants. In an exploratory intergenerational Mendelian randomization analysis, we estimate the maternal putatively causal and fetal genetic effects of seven adult phenotypes on seven fetal growth-related measurements. These findings illuminate the genetic links between maternal and early-life traits in an East Asian population and lay the groundwork for future research into the intricate interplay of genetics, intrauterine exposures and early-life experiences in shaping long-term health.
Affiliation(s)
- Shujia Huang
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Siyang Liu
- School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
- Mingxi Huang
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Jian-Rong He
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Provincial Clinical Research Center for Child Health, Guangzhou, China
- Department of Obstetrics and Gynecology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Chengrui Wang
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Tianyi Wang
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
- Xiaotian Feng
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
- Yashu Kuang
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Provincial Clinical Research Center for Child Health, Guangzhou, China
- Jinhua Lu
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Provincial Clinical Research Center for Child Health, Guangzhou, China
- Yuqin Gu
- School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
- Xiaoyan Xia
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Department of Women's Health, Provincial Key Clinical Specialty of Woman and Child Health, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Shanshan Lin
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Department of Women's Health, Provincial Key Clinical Specialty of Woman and Child Health, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Wenhao Zhou
- Division of Neonatology and Center for Newborn Care, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Qiaomei Fu
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
- Huimin Xia
- Provincial Clinical Research Center for Child Health, Guangzhou, China
- Provincial Key Laboratory of Research in Structure Birth Defect Disease, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Department of Pediatric Surgery, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Xiu Qiu
- Division of Birth Cohort Study, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Provincial Clinical Research Center for Child Health, Guangzhou, China
- Department of Women's Health, Provincial Key Clinical Specialty of Woman and Child Health, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
12
Zhang Y, Zhang H, Wu Y. A general approach for inferring the ancestry of recent ancestors of an admixed individual. Proc Natl Acad Sci U S A 2024; 121:e2316242120. [PMID: 38165936 PMCID: PMC10786287 DOI: 10.1073/pnas.2316242120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 11/27/2023] [Indexed: 01/04/2024]
Abstract
The genome of an individual from an admixed population consists of segments originating from different ancestral populations. Most existing ancestry inference approaches focus on calling these segments for the extant individual. In this paper, we present a general ancestry inference approach for inferring recent ancestors from an extant genome. Given the genome of an individual from a recently admixed population, our method can estimate the proportions of the genomes of the recent ancestors of this individual that originated from particular ancestral populations. The key step of our method is the inference of the ancestors (called founders) present immediately after the admixed population formed. The inferred founders can then be used to infer the ancestry of recent ancestors of an extant individual. Our method is implemented in a computer program called PedMix2. To the best of our knowledge, there is no existing method that can practically infer ancestors beyond grandparents from an extant individual's genome. Results on both simulated and real data show that PedMix2 performs well in ancestry inference.
Affiliation(s)
- Yiming Zhang
- School of Computing, College of Engineering, University of Connecticut, Storrs, CT 06269
- Haotian Zhang
- School of Computing, College of Engineering, University of Connecticut, Storrs, CT 06269
- Yufeng Wu
- School of Computing, College of Engineering, University of Connecticut, Storrs, CT 06269
13
Vasiliou V. Celebrating 20 years of human genomics: a journey of discovery. Hum Genomics 2024; 18:1. [PMID: 38163870 PMCID: PMC10759603 DOI: 10.1186/s40246-023-00569-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024]
Affiliation(s)
- Vasilis Vasiliou
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, USA
14
Nogrady B. Australian Indigenous genomes are highly diverse and unlike those anywhere else. Nature 2024; 625:15-16. [PMID: 38093071 DOI: 10.1038/d41586-023-04006-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
15
Allentoft ME, Sikora M, Fischer A, Sjögren KG, Ingason A, Macleod R, Rosengren A, Schulz Paulsson B, Jørkov MLS, Novosolov M, Stenderup J, Price TD, Fischer Mortensen M, Nielsen AB, Ulfeldt Hede M, Sørensen L, Nielsen PO, Rasmussen P, Jensen TZT, Refoyo-Martínez A, Irving-Pease EK, Barrie W, Pearson A, Sousa da Mota B, Demeter F, Henriksen RA, Vimala T, McColl H, Vaughn A, Vinner L, Renaud G, Stern A, Johannsen NN, Ramsøe AD, Schork AJ, Ruter A, Gotfredsen AB, Henning Nielsen B, Brinch Petersen E, Kannegaard E, Hansen J, Buck Pedersen K, Pedersen L, Klassen L, Meldgaard M, Johansen M, Uldum OC, Lotz P, Lysdahl P, Bangsgaard P, Petersen PV, Maring R, Iversen R, Wåhlin S, Anker Sørensen S, Andersen SH, Jørgensen T, Lynnerup N, Lawson DJ, Rasmussen S, Korneliussen TS, Kjær KH, Durbin R, Nielsen R, Delaneau O, Werge T, Kristiansen K, Willerslev E. 100 ancient genomes show repeated population turnovers in Neolithic Denmark. Nature 2024; 625:329-337. [PMID: 38200294 PMCID: PMC10781617 DOI: 10.1038/s41586-023-06862-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/06/2023] [Accepted: 11/13/2023] [Indexed: 01/12/2024]
Abstract
Major migration events in Holocene Eurasia have been characterized genetically at broad regional scales[1-4]. However, insights into the population dynamics in the contact zones are hampered by a lack of ancient genomic data sampled at high spatiotemporal resolution[5-7]. Here, to address this, we analysed shotgun-sequenced genomes from 100 skeletons spanning 7,300 years of the Mesolithic period, Neolithic period and Early Bronze Age in Denmark and integrated these with proxies for diet (¹³C and ¹⁵N content), mobility (⁸⁷Sr/⁸⁶Sr ratio) and vegetation cover (pollen). We observe that Danish Mesolithic individuals of the Maglemose, Kongemose and Ertebølle cultures form a distinct genetic cluster related to other Western European hunter-gatherers. Despite shifts in material culture they displayed genetic homogeneity from around 10,500 to 5,900 calibrated years before present, when Neolithic farmers with Anatolian-derived ancestry arrived. Although the Neolithic transition was delayed by more than a millennium relative to Central Europe, it was very abrupt and resulted in a population turnover with limited genetic contribution from local hunter-gatherers. The succeeding Neolithic population, associated with the Funnel Beaker culture, persisted for only about 1,000 years before immigrants with eastern Steppe-derived ancestry arrived. This second and equally rapid population replacement gave rise to the Single Grave culture with an ancestry profile more similar to present-day Danes. In our multiproxy dataset, these major demographic events are manifested as parallel shifts in genotype, phenotype, diet and land use.
Affiliation(s)
- Morten E Allentoft
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Perth, Western Australia, Australia.
| | - Martin Sikora
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
| | - Anders Fischer
- Cluster of Excellence ROOTS, Kiel University, Kiel, Germany
- Sealand Archaeology, Kalundborg, Denmark
| | - Karl-Göran Sjögren
- Department of Historical Studies, Gothenburg University, Göteborg, Sweden
| | - Andrés Ingason
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Copenhagen, Denmark
| | - Ruairidh Macleod
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- Research Department of Genetics, Evolution and Environment, University College London, London, UK
- Anders Rosengren
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Copenhagen, Denmark
- Maria Novosolov
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Jesper Stenderup
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- T Douglas Price
- Laboratory for Archaeological Chemistry, Department of Anthropology, University of Wisconsin-Madison, Madison, WI, USA
- Alba Refoyo-Martínez
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Evan K Irving-Pease
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- William Barrie
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Alice Pearson
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Bárbara Sousa da Mota
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Fabrice Demeter
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Eco-anthropologie (EA), Dpt ABBA, Muséum National d'Histoire Naturelle, CNRS, Université Paris Cité, Musée de l'Homme, Paris, France
- Rasmus A Henriksen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Tharsika Vimala
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Hugh McColl
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Andrew Vaughn
- Center for Computational Biology, University of California, Berkeley, USA
- Lasse Vinner
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Gabriel Renaud
- Department of Health Technology, Section of Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark
- Aaron Stern
- Center for Computational Biology, University of California, Berkeley, USA
- Abigail Daisy Ramsøe
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Andrew Joseph Schork
- Laboratory of Biological Anthropology, University of Copenhagen, Copenhagen, Denmark
- Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA
- Anthony Ruter
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Anne Birgitte Gotfredsen
- Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Morten Meldgaard
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Health and Nature, University of Greenland, Nuuk, Greenland
- Per Lotz
- Museum Nordsjælland, Hillerød, Denmark
- Museum Vestsjælland, Holbæk, Denmark
- Per Lysdahl
- Vendsyssel Historiske Museum, Hjørring, Denmark
- Pernille Bangsgaard
- Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Rikke Maring
- Department of Archaeology and Heritage Studies, Aarhus University, Aarhus, Denmark
- Museum Østjylland, Randers, Denmark
- Rune Iversen
- The Saxo Institute, University of Copenhagen, Copenhagen, Denmark
- Niels Lynnerup
- Laboratory of Biological Anthropology, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark
- Daniel J Lawson
- Institute of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
- Simon Rasmussen
- Novo Nordisk Foundation Centre for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark
- Kurt H Kjær
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Rasmus Nielsen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Integrative Biology and Statistics, UC Berkeley, Berkeley, CA, USA
- Olivier Delaneau
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Thomas Werge
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Kristian Kristiansen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Historical Studies, Gothenburg University, Göteborg, Sweden
- Eske Willerslev
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- MARUM Center for Marine Environmental Sciences and Faculty of Geosciences, University of Bremen, Bremen, Germany
16
Alig SK, Shahrokh Esfahani M, Garofalo A, Li MY, Rossi C, Flerlage T, Flerlage JE, Adams R, Binkley MS, Shukla N, Jin MC, Olsen M, Telenius A, Mutter JA, Schroers-Martin JG, Sworder BJ, Rai S, King DA, Schultz A, Bögeholz J, Su S, Kathuria KR, Liu CL, Kang X, Strohband MJ, Langfitt D, Pobre-Piza KF, Surman S, Tian F, Spina V, Tousseyn T, Buedts L, Hoppe R, Natkunam Y, Fornecker LM, Castellino SM, Advani R, Rossi D, Lynch R, Ghesquières H, Casasnovas O, Kurtz DM, Marks LJ, Link MP, André M, Vandenberghe P, Steidl C, Diehn M, Alizadeh AA. Distinct Hodgkin lymphoma subtypes defined by noninvasive genomic profiling. Nature 2024; 625:778-787. [PMID: 38081297 DOI: 10.1038/s41586-023-06903-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 11/28/2023] [Indexed: 01/06/2024]
Abstract
The scarcity of malignant Hodgkin and Reed-Sternberg cells hampers tissue-based comprehensive genomic profiling of classic Hodgkin lymphoma (cHL). By contrast, liquid biopsies show promise for molecular profiling of cHL due to relatively high circulating tumour DNA (ctDNA) levels [1-4]. Here we show that the plasma representation of mutations exceeds the bulk tumour representation in most cases, making cHL particularly amenable to noninvasive profiling. Leveraging single-cell transcriptional profiles of cHL tumours, we demonstrate Hodgkin and Reed-Sternberg ctDNA shedding to be shaped by DNASE1L3, whose increased tumour microenvironment-derived expression drives high ctDNA concentrations. Using this insight, we comprehensively profile 366 patients, revealing two distinct cHL genomic subtypes with characteristic clinical and prognostic correlates, as well as distinct transcriptional and immunological profiles. Furthermore, we identify a novel class of truncating IL4R mutations that are dependent on IL-13 signalling and therapeutically targetable with IL-4Rα-blocking antibodies. Finally, using PhasED-seq [5], we demonstrate the clinical value of pretreatment and on-treatment ctDNA levels for longitudinally refining cHL risk prediction and for detection of radiographically occult minimal residual disease. Collectively, these results support the utility of noninvasive strategies for genotyping and dynamic monitoring of cHL, as well as capturing molecularly distinct subtypes with diagnostic, prognostic and therapeutic potential.
Affiliation(s)
- Stefan K Alig
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Andrea Garofalo
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Michael Yu Li
- Centre for Lymphoid Cancer, British Columbia Cancer, Vancouver, British Columbia, Canada
- Cédric Rossi
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Hematology Department, University Hospital F. Mitterrand and Inserm UMR 1231, Dijon, France
- Tim Flerlage
- Department of Infectious Diseases, St Jude Children's Research Hospital, Memphis, TN, USA
- Jamie E Flerlage
- Department of Oncology, St Jude Children's Research Hospital, Memphis, TN, USA
- Ragini Adams
- Department of Pediatrics, Division of Pediatric Hematology and Oncology, Stanford University, Stanford, CA, USA
- Michael S Binkley
- Department of Radiation Oncology, Stanford University Medical Center, Stanford, CA, USA
- Navika Shukla
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Michael C Jin
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Mari Olsen
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Adèle Telenius
- Centre for Lymphoid Cancer, British Columbia Cancer, Vancouver, British Columbia, Canada
- Jurik A Mutter
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Joseph G Schroers-Martin
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Brian J Sworder
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Shinya Rai
- Centre for Lymphoid Cancer, British Columbia Cancer, Vancouver, British Columbia, Canada
- Daniel A King
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Andre Schultz
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Jan Bögeholz
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Shengqin Su
- Department of Radiation Oncology, Stanford University Medical Center, Stanford, CA, USA
- Karan R Kathuria
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Chih Long Liu
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Xiaoman Kang
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Maya J Strohband
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Deanna Langfitt
- Department of Bone Marrow Transplant and Cellular Therapy, St Jude Children's Research Hospital, Memphis, TN, USA
- Sherri Surman
- Department of Infectious Diseases, St Jude Children's Research Hospital, Memphis, TN, USA
- Feng Tian
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Valeria Spina
- Laboratory of Molecular Diagnostics, Department of Medical Genetics EOLAB, Bellinzona, Switzerland
- Thomas Tousseyn
- Department of Imaging and Pathology, KU Leuven, Leuven, Belgium
- Richard Hoppe
- Department of Radiation Oncology, Stanford University Medical Center, Stanford, CA, USA
- Luc-Matthieu Fornecker
- Institut de Cancérologie Strasbourg Europe (ICANS) and University of Strasbourg, Strasbourg, France
- Sharon M Castellino
- Department of Pediatrics, Emory University, Aflac Cancer and Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, GA, USA
- Ranjana Advani
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Davide Rossi
- Clinic of Hematology, Oncology Institute of Southern Switzerland, Ente Ospedaliero Cantonale, Bellinzona, Switzerland
- Laboratory of Experimental Hematology, Institute of Oncology Research, Bellinzona, Switzerland
- Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland
- Ryan Lynch
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Hervé Ghesquières
- Department of Hematology, Centre Hospitalier Lyon Sud, Hospices Civils de Lyon, Pierre Benite, France
- Olivier Casasnovas
- Hematology Department, University Hospital F. Mitterrand and Inserm UMR 1231, Dijon, France
- David M Kurtz
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
- Lianna J Marks
- Department of Pediatrics, Division of Pediatric Hematology and Oncology, Stanford University, Stanford, CA, USA
- Michael P Link
- Department of Pediatrics, Division of Pediatric Hematology and Oncology, Stanford University, Stanford, CA, USA
- Marc André
- Department of Haematology, Université Catholique de Louvain, CHU UCL Namur, Yvoir, Belgium
- Peter Vandenberghe
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Department of Hematology, University Hospitals Leuven, Leuven, Belgium
- Christian Steidl
- Centre for Lymphoid Cancer, British Columbia Cancer, Vancouver, British Columbia, Canada
- Maximilian Diehn
- Department of Radiation Oncology, Stanford University Medical Center, Stanford, CA, USA
- Ash A Alizadeh
- Department of Medicine, Divisions of Oncology and Hematology, Stanford University, Stanford, CA, USA
17
Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, Reich D. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024; 56:143-151. [PMID: 38123640 PMCID: PMC10786714 DOI: 10.1038/s41588-023-01582-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.
Affiliation(s)
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- BIOMICs Research Group, University of the Basque Country, Vitoria-Gasteiz, Spain
- Ikerbasque-Basque Foundation of Science, Bilbao, Spain
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
18
Irving-Pease EK, Refoyo-Martínez A, Barrie W, Ingason A, Pearson A, Fischer A, Sjögren KG, Halgren AS, Macleod R, Demeter F, Henriksen RA, Vimala T, McColl H, Vaughn AH, Speidel L, Stern AJ, Scorrano G, Ramsøe A, Schork AJ, Rosengren A, Zhao L, Kristiansen K, Iversen AKN, Fugger L, Sudmant PH, Lawson DJ, Durbin R, Korneliussen T, Werge T, Allentoft ME, Sikora M, Nielsen R, Racimo F, Willerslev E. The selection landscape and genetic legacy of ancient Eurasians. Nature 2024; 625:312-320. [PMID: 38200293 PMCID: PMC10781624 DOI: 10.1038/s41586-023-06705-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/17/2022] [Accepted: 10/03/2023] [Indexed: 01/12/2024]
Abstract
The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes [1], we modelled the selection landscape during the transition from hunting and gathering to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years. We also find strong selection in the HLA region, possibly due to increased exposure to pathogens during the Bronze Age. Using ancient individuals to infer local ancestry tracts in over 400,000 samples from the UK Biobank, we identify widespread differences in the distribution of Mesolithic, Neolithic and Bronze Age ancestries across Eurasia. By calculating ancestry-specific polygenic risk scores, we show that height differences between Northern and Southern Europe are associated with differential Steppe ancestry, rather than selection, and that risk alleles for mood-related phenotypes are enriched for Neolithic farmer ancestry, whereas risk alleles for diabetes and Alzheimer's disease are enriched for Western hunter-gatherer ancestry. Our results indicate that ancient selection and migration were large contributors to the distribution of phenotypic diversity in present-day Europeans.
Affiliation(s)
- Evan K Irving-Pease
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Alba Refoyo-Martínez
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- William Barrie
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- Andrés Ingason
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Roskilde, Denmark
- Alice Pearson
- Department of Genetics, University of Cambridge, Cambridge, UK
- Department of Zoology, University of Cambridge, Cambridge, UK
- Anders Fischer
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden
- Sealand Archaeology, Kalundborg, Denmark
- Karl-Göran Sjögren
- Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden
- Alma S Halgren
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Ruairidh Macleod
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- UCL Genetics Institute, University College London, London, UK
- Fabrice Demeter
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Eco-anthropologie, Muséum national d'Histoire naturelle, CNRS, Université Paris Cité, Musée de l'Homme, Paris, France
- Rasmus A Henriksen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Tharsika Vimala
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Hugh McColl
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Andrew H Vaughn
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Leo Speidel
- UCL Genetics Institute, University College London, London, UK
- Ancient Genomics Laboratory, The Francis Crick Institute, London, UK
- Aaron J Stern
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Gabriele Scorrano
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Abigail Ramsøe
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Andrew J Schork
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Roskilde, Denmark
- Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA
- Anders Rosengren
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Roskilde, Denmark
- Lei Zhao
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Kristian Kristiansen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden
- Astrid K N Iversen
- Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Lars Fugger
- Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Department of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
- MRC Human Immunology Unit, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Peter H Sudmant
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Daniel J Lawson
- Institute of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
- Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
- Thorfinn Korneliussen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Thomas Werge
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Institute of Biological Psychiatry, Mental Health Center Sct Hans, Copenhagen University Hospital, Copenhagen, Denmark
- Morten E Allentoft
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Science, Curtin University, Perth, Western Australia, Australia
- Martin Sikora
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Rasmus Nielsen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Departments of Integrative Biology and Statistics, UC Berkeley, Berkeley, CA, USA
- Fernando Racimo
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Eske Willerslev
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- MARUM Center for Marine Environmental Sciences and Faculty of Geosciences, University of Bremen, Bremen, Germany
19
Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD, Poterba T, Wilson MW, Tarasova Y, Phu W, Grant R, Yohannes MT, Koenig Z, Farjoun Y, Banks E, Donnelly S, Gabriel S, Gupta N, Ferriera S, Tolonen C, Novod S, Bergelson L, Roazen D, Ruano-Rubio V, Covarrubias M, Llanwarne C, Petrillo N, Wade G, Jeandet T, Munshi R, Tibbetts K, O'Donnell-Luria A, Solomonson M, Seed C, Martin AR, Talkowski ME, Rehm HL, Daly MJ, Tiao G, Neale BM, MacArthur DG, Karczewski KJ. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 2024; 625:92-100. [PMID: 38057664 DOI: 10.1038/s41586-023-06045-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 03/16/2022] [Accepted: 04/03/2023] [Indexed: 12/08/2023]
Abstract
The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
Affiliation(s)
- Siwei Chen
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Laurent C Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Julia K Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ryan L Collins
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Qingbo Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Jessica Alföldi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nicholas A Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Christopher Vittal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Laura D Gauthier
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Timothy Poterba
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael W Wilson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Yekaterina Tarasova
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- William Phu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Riley Grant
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mary T Yohannes
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zan Koenig
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yossi Farjoun
- Richards Lab, Lady Davis Institute, Montreal, Quebec, Canada
- Eric Banks
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stacey Gabriel
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Namrata Gupta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven Ferriera
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Charlotte Tolonen
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sam Novod
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Louis Bergelson
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Roazen
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Miguel Covarrubias
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nikelle Petrillo
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gordon Wade
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Thibault Jeandet
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ruchi Munshi
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kathleen Tibbetts
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Cotton Seed
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alicia R Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
- Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
20
Bergström A. Shared chromosomal segments connect ancient human societies. Nat Genet 2024; 56:10-11. [PMID: 38123641 DOI: 10.1038/s41588-023-01606-5] [Indexed: 12/23/2023]
Affiliation(s)
- Anders Bergström
- School of Biological Sciences, University of East Anglia, Norwich, UK.
21
Zhernakova DV, Wang D, Liu L, Andreu-Sánchez S, Zhang Y, Ruiz-Moreno AJ, Peng H, Plomp N, Del Castillo-Izquierdo Á, Gacesa R, Lopera-Maya EA, Temba GS, Kullaya VI, van Leeuwen SS, Xavier RJ, de Mast Q, Joosten LAB, Riksen NP, Rutten JHW, Netea MG, Sanna S, Wijmenga C, Weersma RK, Zhernakova A, Harmsen HJM, Fu J. Host genetic regulation of human gut microbial structural variation. Nature 2024; 625:813-821. [PMID: 38172637 PMCID: PMC10808065 DOI: 10.1038/s41586-023-06893-w] [Received: 12/22/2022] [Accepted: 11/23/2023] [Indexed: 01/05/2024]
Abstract
Although the impact of host genetics on gut microbial diversity and the abundance of specific taxa is well established1-6, little is known about how host genetics regulates the genetic diversity of gut microorganisms. Here we conducted a meta-analysis of associations between human genetic variation and gut microbial structural variation in 9,015 individuals from four Dutch cohorts. Strikingly, the presence rate of a structural variation segment in Faecalibacterium prausnitzii that harbours an N-acetylgalactosamine (GalNAc) utilization gene cluster is higher in individuals who secrete the type A oligosaccharide antigen terminating in GalNAc, a feature that is jointly determined by human ABO and FUT2 genotypes, and we could replicate this association in a Tanzanian cohort. In vitro experiments demonstrated that GalNAc can be used as the sole carbohydrate source for F. prausnitzii strains that carry the GalNAc-metabolizing pathway. Further in silico and in vitro studies demonstrated that other ABO-associated species can also utilize GalNAc, particularly Collinsella aerofaciens. The GalNAc utilization genes are also associated with the host's cardiometabolic health, particularly in individuals with mucosal A-antigen. Together, the findings of our study demonstrate that genetic associations across the human genome and bacterial metagenome can provide functional insights into the reciprocal host-microbiome relationship.
Affiliation(s)
- Daria V Zhernakova
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- Daoming Wang
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- University of Groningen, University Medical Center Groningen, Department of Pediatrics, Groningen, The Netherlands
- Lei Liu
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- Sergio Andreu-Sánchez
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- University of Groningen, University Medical Center Groningen, Department of Pediatrics, Groningen, The Netherlands
- Yue Zhang
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- University of Groningen, University Medical Center Groningen, Department of Pediatrics, Groningen, The Netherlands
- Angel J Ruiz-Moreno
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- University of Groningen, University Medical Center Groningen, Department of Pediatrics, Groningen, The Netherlands
- Haoran Peng
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- Niels Plomp
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- University of Groningen, University Medical Center Groningen, Department of Gastroenterology and Hepatology, Groningen, The Netherlands
- Ángela Del Castillo-Izquierdo
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- Ranko Gacesa
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- University of Groningen, University Medical Center Groningen, Department of Gastroenterology and Hepatology, Groningen, The Netherlands
- Esteban A Lopera-Maya
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- Godfrey S Temba
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Medical Biochemistry and Molecular Biology, Kilimanjaro Christian Medical University College, Moshi, Tanzania
- Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
- Vesla I Kullaya
- Department of Medical Biochemistry and Molecular Biology, Kilimanjaro Christian Medical University College, Moshi, Tanzania
- Kilimanjaro Clinical Research Institute, Kilimanjaro Christian Medical Center, Moshi, Tanzania
- Sander S van Leeuwen
- University of Groningen, University Medical Center Groningen, Department of Laboratory Medicine, Groningen, The Netherlands
- Ramnik J Xavier
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Computational and Integrative Biology, Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
- Quirijn de Mast
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
- Leo A B Joosten
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Medical Genetics, Iuliu Haţieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Niels P Riksen
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Joost H W Rutten
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Mihai G Netea
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Immunology and Metabolism, Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Human Genomics Laboratory, Craiova University of Medicine and Pharmacy, Craiova, Romania
- Serena Sanna
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- Institute for Genetic and Biomedical Research, National Research Council, Cagliari, Italy
- Cisca Wijmenga
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- Rinse K Weersma
- University of Groningen, University Medical Center Groningen, Department of Gastroenterology and Hepatology, Groningen, The Netherlands
- Alexandra Zhernakova
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands
- Hermie J M Harmsen
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands.
- Jingyuan Fu
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands.
- University of Groningen, University Medical Center Groningen, Department of Pediatrics, Groningen, The Netherlands.
22
Ren X, Yang H, Nierenberg JL, Sun Y, Chen J, Beaman C, Pham T, Nobuhara M, Takagi MA, Narayan V, Li Y, Ziv E, Shen Y. High-throughput PRIME-editing screens identify functional DNA variants in the human genome. Mol Cell 2023; 83:4633-4645.e9. [PMID: 38134886 PMCID: PMC10766087 DOI: 10.1016/j.molcel.2023.11.021] [Received: 05/12/2023] [Revised: 10/07/2023] [Accepted: 11/16/2023] [Indexed: 12/24/2023]
Abstract
Despite tremendous progress in detecting DNA variants associated with human disease, interpreting their functional impact in a high-throughput and single-base resolution manner remains challenging. Here, we develop a pooled prime-editing screen method, PRIME, that can be applied to characterize thousands of coding and non-coding variants in a single experiment with high reproducibility. To showcase its applications, we first identified essential nucleotides for a 716 bp MYC enhancer via PRIME-mediated single-base resolution analysis. Next, we applied PRIME to functionally characterize 1,304 genome-wide association study (GWAS)-identified non-coding variants associated with breast cancer and 3,699 variants from ClinVar. We discovered that 103 non-coding variants and 156 variants of uncertain significance are functional via affecting cell fitness. Collectively, we demonstrate that PRIME is capable of characterizing genetic variants at single-base resolution and scale, advancing accurate genome annotation for disease risk prediction, diagnosis, and therapeutic target identification.
Affiliation(s)
- Xingjie Ren
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Han Yang
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jovia L Nierenberg
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Yifan Sun
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Cooper Beaman
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Thu Pham
- Pharmaceutical Sciences and Pharmacogenomics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Mai Nobuhara
- Pharmaceutical Sciences and Pharmacogenomics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Maya Asami Takagi
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Vivek Narayan
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- Elad Ziv
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Division of General Internal Medicine, Department of Medicine, and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Yin Shen
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
23
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023]
Abstract
Background: The utility of long-read genome sequencing platforms has been shown in many fields, including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results: For this study, we leveraged publicly available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state-of-the-art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. The tools disagreed widely on which reads to leave unaligned, affecting the final genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genomic Variants (DGV) for sample NA12878 or the truth set for NA24385. Conclusions: These results suggest that a combined approach is needed for long-read sequencing (LRS) alignments in human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. Best practice should be to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2, as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool.
If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data on our cluster, but the PacBio results were promising in that those computations completed faster than Minimap2. Because of its heavy computational burden and slow run time, GraphMap2 is not an ideal tool for exploring a whole human genome generated on a long-read sequencing platform.
Affiliation(s)
- Jonathan LoTempio
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
- Emmanuele Delot
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
- Eric Vilain
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
24
Capps B, Chadwick R, Lederman Z, Lysaght T, Mills C, Mulvihill JJ, Oetting WS, Winship I. The Human Genome Organisation (HUGO) and a vision for Ecogenomics: the Ecological Genome Project. Hum Genomics 2023; 17:115. [PMID: 38111041 PMCID: PMC10726505 DOI: 10.1186/s40246-023-00560-x] [Received: 05/04/2023] [Accepted: 11/30/2023] [Indexed: 12/20/2023]
Abstract
BACKGROUND: The following outlines ethical reasons for widening the Human Genome Organisation's (HUGO) mandate to include ecological genomics. MAIN: The environment influences an organism's genome through ambient factors in the biosphere (e.g. climate and UV radiation), as well as through the agents it comes into contact with, i.e. the epigenetic and mutagenic effects of inanimate chemicals and pollution, and pathogenic organisms. The emerging scientific consensus is that social determinants of health, environmental conditions and genetic factors work together to influence the risk of many complex illnesses. That paradigm can also explain the environmental and ecological determinants of health as factors that underlie the (un)healthy ecosystems on which communities rely. We suggest that The Ecological Genome Project is an aspirational opportunity to explore connections between the human genome and nature. We propose consolidating a view of Ecogenomics to provide a blueprint for responding to the environmental challenges that societies face. This can only be achieved through interdisciplinary engagement between genomics and the broad field of ecology and the related practice of conservation. In this respect, the One Health approach is a model for environmentally oriented work. The idea of Ecogenomics, a term that has been used to refer to the scientific field of ecological genomics, becomes the conceptual study of genomes within the social and natural environment. CONCLUSION: The HUGO Committee on Ethics, Law and Society (CELS) recommends that an interdisciplinary One Health approach should be adopted in genomic sciences to promote ethical environmentalism. This perspective has been reviewed and endorsed by the HUGO CELS and the HUGO Executive Board.
Affiliation(s)
- Benjamin Capps
- Department of Bioethics, Dalhousie University, 5849 University Avenue, CRC Building, Room C-312, PO Box 15000, Halifax, NS, B3H 4R2, Canada.
- Tamra Lysaght
- National University of Singapore, Singapore, Singapore
- John J Mulvihill
- University of Oklahoma Health Sciences Center, Oklahoma City, USA
25
Abstract
Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R² = 0.97243 for baseline vcfeval to 0.99996 for vcfdist.
Affiliation(s)
- Tim Dunn
- Computer Science and Engineering, University of Michigan, 2260 Hayward Street, Ann Arbor, MI, 48109, USA.
- Satish Narayanasamy
- Computer Science and Engineering, University of Michigan, 2260 Hayward Street, Ann Arbor, MI, 48109, USA
26
Reis ALM, Rapadas M, Hammond JM, Gamaarachchi H, Stevanovski I, Ayuputeri Kumaheri M, Chintalaphani SR, Dissanayake DSB, Siggs OM, Hewitt AW, Llamas B, Brown A, Baynam G, Mann GJ, McMorran BJ, Easteal S, Hermes A, Jenkins MR, Patel HR, Deveson IW. The landscape of genomic structural variation in Indigenous Australians. Nature 2023; 624:602-610. [PMID: 38093003 PMCID: PMC10733147 DOI: 10.1038/s41586-023-06842-7] [Received: 01/16/2023] [Accepted: 11/07/2023] [Indexed: 12/20/2023]
Abstract
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets1-3. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing4 to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion-deletion variants (20-49 bp; n = 136,797), structural variants (50 b-50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci5, uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
Affiliation(s)
- Andre L M Reis
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Melissa Rapadas
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Jillian M Hammond
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Hasindu Gamaarachchi
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales, Australia
- Igor Stevanovski
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Meutia Ayuputeri Kumaheri
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Sanjog R Chintalaphani
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Duminda S B Dissanayake
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
- Owen M Siggs
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Department of Ophthalmology, Flinders University, Bedford Park, South Australia, Australia
- Alex W Hewitt
- Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
- Bastien Llamas
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Australian Centre for Ancient DNA, School of Biological Sciences and Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
- ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, South Australia, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
- Alex Brown
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
- Gareth Baynam
- Telethon Kids Institute and Division of Paediatrics, Faculty of Health and Medical Sciences, University of Western Australia, Perth, Western Australia, Australia
- Genetic Services of Western Australia, Western Australian Department of Health, Perth, Western Australia, Australia
- Western Australian Register of Developmental Anomalies, Western Australian Department of Health, Perth, Western Australia, Australia
- Graham J Mann
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Brendan J McMorran
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Simon Easteal
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Azure Hermes
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Misty R Jenkins
- Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Hardip R Patel
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Ira W Deveson
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia

27
Callaway E. World's biggest set of human genome sequences opens to scientists. Nature 2023; 624:16-17. [PMID: 38036674 DOI: 10.1038/d41586-023-03763-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]

28
Choo ZN, Behr JM, Deshpande A, Hadi K, Yao X, Tian H, Takai K, Zakusilo G, Rosiene J, Da Cruz Paula A, Weigelt B, Setton J, Riaz N, Powell SN, Busam K, Shoushtari AN, Ariyan C, Reis-Filho J, de Lange T, Imieliński M. Most large structural variants in cancer genomes can be detected without long reads. Nat Genet 2023; 55:2139-2148. [PMID: 37945902 PMCID: PMC10703688 DOI: 10.1038/s41588-023-01540-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 09/19/2023] [Indexed: 11/12/2023]
Abstract
Short-read sequencing is the workhorse of cancer genomics yet is thought to miss many structural variants (SVs), particularly large chromosomal alterations. To characterize missing SVs in short-read whole genomes, we analyzed 'loose ends': local violations of mass balance between adjacent DNA segments. In the landscape of loose ends across 1,330 high-purity cancer whole genomes, most large (>10-kb) clonal SVs were fully resolved by short reads in the 87% of the human genome where copy number could be reliably measured. Some loose ends represent neotelomeres, which we propose as a hallmark of the alternative lengthening of telomeres phenotype. These pan-cancer findings were confirmed by long-molecule profiles of 38 breast cancer and melanoma cases. Our results indicate that aberrant homologous recombination is unlikely to drive the majority of large cancer SVs. Furthermore, analysis of mass balance in short-read whole genome data provides a surprisingly complete picture of cancer chromosomal structure.
Affiliation(s)
- Zi-Ning Choo
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional MD PhD Program, Weill Cornell Medicine, New York, NY, USA
- Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
- Julie M Behr
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Aditya Deshpande
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Kevin Hadi
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
- Xiaotong Yao
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Huasong Tian
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA
- Kaori Takai
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- George Zakusilo
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- Joel Rosiene
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Britta Weigelt
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Jeremy Setton
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Nadeem Riaz
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Simon N Powell
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Klaus Busam
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Titia de Lange
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- Marcin Imieliński
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA
- Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA

29
Silcocks M, Farlow A, Hermes A, Tsambos G, Patel HR, Huebner S, Baynam G, Jenkins MR, Vukcevic D, Easteal S, Leslie S. Indigenous Australian genomes show deep structure and rich novel variation. Nature 2023; 624:593-601. [PMID: 38093005 PMCID: PMC10733150 DOI: 10.1038/s41586-023-06831-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2022] [Accepted: 11/03/2023] [Indexed: 12/20/2023]
Abstract
The Indigenous peoples of Australia have a rich linguistic and cultural history. How this relates to genetic diversity remains largely unknown because of their limited engagement with genomic studies. Here we analyse the genomes of 159 individuals from four remote Indigenous communities, including people who speak a language (Tiwi) not from the most widespread family (Pama-Nyungan). This large collection of Indigenous Australian genomes was made possible by careful community engagement and consultation. We observe exceptionally strong population structure across Australia, driven by divergence times between communities of 26,000-35,000 years ago and long-term low but stable effective population sizes. This demographic history, including early divergence from Papua New Guinean (47,000 years ago) and Eurasian groups [1], has generated the highest proportion of previously undescribed genetic variation seen outside Africa and the most extended homozygosity compared with global samples. A substantial proportion of this variation is not observed in global reference panels or clinical datasets, and variation with predicted functional consequence is more likely to be homozygous than in other populations, with consequent implications for medical genomics [2]. Our results show that Indigenous Australians are not a single homogeneous genetic group and their genetic relationship with the peoples of New Guinea is not uniform. These patterns imply that the full breadth of Indigenous Australian genetic diversity remains uncharacterized, potentially limiting genomic medicine and equitable healthcare for Indigenous Australians.
Affiliation(s)
- Matthew Silcocks
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- University of Melbourne, School of Biosciences, Parkville, Victoria, Australia
- Ashley Farlow
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- University of Melbourne, School of Mathematics and Statistics, Parkville, Victoria, Australia
- Azure Hermes
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Georgia Tsambos
- University of Melbourne, School of Mathematics and Statistics, Parkville, Victoria, Australia
- Hardip R Patel
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Sharon Huebner
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Gareth Baynam
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Faculty of Health and Medical Sciences, Division of Paediatrics and Telethon Kids Institute, University of Western Australia, Perth, Western Australia, Australia
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital and Rare Care Centre, Perth Children's Hospital, Perth, Western Australia, Australia
- Misty R Jenkins
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- University of Melbourne, Department of Medical Biology, Parkville, Victoria, Australia
- Damjan Vukcevic
- University of Melbourne, School of Mathematics and Statistics, Parkville, Victoria, Australia
- Simon Easteal
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- Stephen Leslie
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- University of Melbourne, School of Biosciences, Parkville, Victoria, Australia
- University of Melbourne, School of Mathematics and Statistics, Parkville, Victoria, Australia

30
Greulich BM, Rajendran S, Downing NF, Nicholas TR, Hollenhorst PC. A complex with poly(A)-binding protein and EWS facilitates the transcriptional function of oncogenic ETS transcription factors in prostate cells. J Biol Chem 2023; 299:105453. [PMID: 37956771 PMCID: PMC10704431 DOI: 10.1016/j.jbc.2023.105453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023]
Abstract
The ETS transcription factor ERG is aberrantly expressed in approximately 50% of prostate tumors due to chromosomal rearrangements such as TMPRSS2/ERG. The ability of ERG to drive oncogenesis in prostate epithelial cells requires interaction with distinct coactivators, such as the RNA-binding protein EWS. Here, we find that ERG has both direct and indirect interactions with EWS, and the indirect interaction is mediated by the poly-A RNA-binding protein PABPC1. PABPC1 directly bound both ERG and EWS. ERG expression in prostate cells promoted PABPC1 localization to the nucleus and recruited PABPC1 to ERG/EWS-binding sites in the genome. Knockdown of PABPC1 in prostate cells abrogated ERG-mediated phenotypes and decreased the ability of ERG to activate transcription. These findings define a complex including ERG and the RNA-binding proteins EWS and PABPC1 that represents a potential therapeutic target for ERG-positive prostate cancer and identify a novel nuclear role for PABPC1.
Affiliation(s)
- Saranya Rajendran
- Medical Sciences Program, Indiana University School of Medicine, Bloomington, Indiana, USA
- Nicholas F Downing
- Medical Sciences Program, Indiana University School of Medicine, Bloomington, Indiana, USA
- Taylor R Nicholas
- Medical Sciences Program, Indiana University School of Medicine, Bloomington, Indiana, USA
- Peter C Hollenhorst
- Medical Sciences Program, Indiana University School of Medicine, Bloomington, Indiana, USA

31
Nakatsuka N, Holguin B, Sedig J, Langenwalter PE, Carpenter J, Culleton BJ, García-Moreno C, Harper TK, Martin D, Martínez-Ramírez J, Porcayo-Michelini A, Tiesler V, Villapando-Canchola ME, Valdes Herrera A, Callan K, Curtis E, Kearns A, Iliev L, Lawson AM, Mah M, Mallick S, Micco A, Michel M, Workman JN, Oppenheimer J, Qiu L, Zalzala F, Rohland N, Punzo Diaz JL, Johnson JR, Reich D. Genetic continuity and change among the Indigenous peoples of California. Nature 2023; 624:122-129. [PMID: 37993721 PMCID: PMC10872549 DOI: 10.1038/s41586-023-06771-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 10/20/2023] [Indexed: 11/24/2023]
Abstract
Before the colonial period, California harboured more language variation than all of Europe, and linguistic and archaeological analyses have led to many hypotheses to explain this diversity [1]. We report genome-wide data from 79 ancient individuals from California and 40 ancient individuals from Northern Mexico dating to 7,400-200 years before present (BP). Our analyses document long-term genetic continuity between people living on the Northern Channel Islands of California and the adjacent Santa Barbara mainland coast from 7,400 years BP to modern Chumash groups represented by individuals who lived around 200 years BP. The distinctive genetic lineages that characterize present-day and ancient people from Northwest Mexico increased in frequency in Southern and Central California by 5,200 years BP, providing evidence for northward migrations that are candidates for spreading Uto-Aztecan languages before the dispersal of maize agriculture from Mexico [2-4]. Individuals from Baja California share more alleles with the earliest individual from Central California in the dataset than with later individuals from Central California, potentially reflecting an earlier linguistic substrate, whose impact on local ancestry was diluted by later migrations from inland regions [1,5]. After 1,600 years BP, ancient individuals from the Channel Islands lived in communities with effective sizes similar to those in pre-agricultural Caribbean and Patagonia, and smaller than those on the California mainland and in sampled regions of Mexico.
Affiliation(s)
- Nathan Nakatsuka
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
- Brian Holguin
- Department of Anthropology, University of California at Santa Barbara, Santa Barbara, CA, USA
- Jakob Sedig
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- John Carpenter
- Instituto Nacional de Antropología e Historia, Sonora, Hermosillo, México
- Brendan J Culleton
- Institute of Energy and the Environment, The Pennsylvania State University, University Park, PA, USA
- Thomas K Harper
- Institute of Energy and the Environment, The Pennsylvania State University, University Park, PA, USA
- Debra Martin
- Department of Anthropology, University of Nevada, Las Vegas, NV, USA
- Vera Tiesler
- Universidad Autónoma de Yucatán, Facultad de Ciencias Antropológicas, Mérida, México
- Kim Callan
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Elizabeth Curtis
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Aisling Kearns
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Lora Iliev
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Ann Marie Lawson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Matthew Mah
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Adam Micco
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Megan Michel
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- J Noah Workman
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Jonas Oppenheimer
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Lijun Qiu
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Fatma Zalzala
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nadin Rohland
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- John R Johnson
- Santa Barbara Museum of Natural History, Santa Barbara, CA, USA
- David Reich
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA

32
Bukhnikashvili L. Overlaps Between CDS Regions of Protein-Coding Genes in the Human Genome: A Case Study on the NR1D1-THRA Gene Pair. J Mol Evol 2023; 91:963-975. [PMID: 38006429 DOI: 10.1007/s00239-023-10147-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 11/12/2023] [Indexed: 11/27/2023]
Abstract
For several decades, it has been known that a substantial number of genes within human DNA exhibit overlap; however, the biological and evolutionary significance of these overlaps remains poorly understood. This study focused on investigating specific instances of overlap where the overlapping DNA region encompasses the coding DNA sequences (CDSs) of protein-coding genes. The results revealed that proteins encoded by overlapping CDSs exhibit greater disorder than those from nonoverlapping CDSs. Additionally, these DNA regions were identified as GC-rich. This could be partially attributed to the absence of stop codons from two distinct reading frames rather than one. Furthermore, these regions were found to harbour fewer single-nucleotide polymorphism (SNP) sites, possibly due to constraints arising from the overlapping state, where mutations could affect two genes simultaneously. While elucidating these properties, the NR1D1-THRA gene pair emerged as an exceptional case with highly structured proteins and a distinctly conserved sequence across eutherian mammals. Both NR1D1 and THRA are nuclear receptors lacking a ligand-binding domain at their C-terminus, which is the region where the two genes overlap. The NR1D1 gene is involved in the regulation of circadian rhythm, while the THRA gene encodes a thyroid hormone receptor, and both play crucial roles in various physiological processes. This study suggests that, in addition to their well-established functions, the specifically overlapping CDS regions of these genes may encode protein segments with additional, yet undiscovered, biological roles.

33
Chu C, Lin EW, Tran A, Jin H, Ho NI, Veit A, Cortes-Ciriano I, Burns KH, Ting DT, Park PJ. The landscape of human SVA retrotransposons. Nucleic Acids Res 2023; 51:11453-11465. [PMID: 37823611 PMCID: PMC10681720 DOI: 10.1093/nar/gkad821] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/26/2022] [Revised: 09/12/2023] [Accepted: 09/20/2023] [Indexed: 10/13/2023]
Abstract
SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied than other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.
Affiliation(s)
- Chong Chu
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Eric W Lin
- Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA
- Department of Medicine, Massachusetts General Hospital Harvard Medical School, Boston, MA 02114, USA
- Antuan Tran
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Hu Jin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Natalie I Ho
- Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA
- Department of Medicine, Massachusetts General Hospital Harvard Medical School, Boston, MA 02114, USA
- Alexander Veit
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Isidro Cortes-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Kathleen H Burns
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA
- David T Ting
- Massachusetts General Hospital Cancer Center, Harvard Medical School, Charlestown, MA 02129, USA
- Department of Medicine, Massachusetts General Hospital Harvard Medical School, Boston, MA 02114, USA
- Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA

34
Stroup EK, Ji Z. Deep learning of human polyadenylation sites at nucleotide resolution reveals molecular determinants of site usage and relevance in disease. Nat Commun 2023; 14:7378. [PMID: 37968271 PMCID: PMC10651852 DOI: 10.1038/s41467-023-43266-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/28/2023] [Accepted: 11/05/2023] [Indexed: 11/17/2023]
Abstract
The genomic distribution of cleavage and polyadenylation (polyA) sites should be co-evolutionarily optimized with the local gene structure. Otherwise, spurious polyadenylation can cause premature transcription termination and generate aberrant proteins. To obtain mechanistic insights into polyA site optimization across the human genome, we develop deep/machine learning models to identify genome-wide putative polyA sites at unprecedented nucleotide-level resolution and calculate their strength and usage in the genomic context. Our models quantitatively measure position-specific motif importance and their crosstalk in polyA site formation and cleavage heterogeneity. Intronic site expression is governed by the surrounding splicing landscape. The usage of alternative polyA sites in terminal exons is modulated by their relative locations and distance to downstream genes. Finally, we apply our models to reveal thousands of disease- and trait-associated genetic variants altering polyadenylation activity. Altogether, our models represent a valuable resource to dissect molecular mechanisms mediating genome-wide polyA site expression and characterize their functional roles in human diseases.
Affiliation(s)
- Emily Kunce Stroup
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Zhe Ji
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL, 60628, USA

35
Miga KH, Eichler EE. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes. Am J Hum Genet 2023; 110:1832-1840. [PMID: 37922882 PMCID: PMC10645551 DOI: 10.1016/j.ajhg.2023.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023]
Abstract
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans.
Affiliation(s)
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA

36
White S, Haas M, Laginha KJ, Laurendet K, Gaff C, Vears D, Newson AJ. What's in a name? Justifying terminology for genomic findings beyond the initial test indication: A scoping review. Genet Med 2023; 25:100936. [PMID: 37454281 DOI: 10.1016/j.gim.2023.100936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 07/09/2023] [Indexed: 07/18/2023]
Abstract
Genome sequencing can generate findings beyond the initial test indication that may be relevant to a patient or research participant's health. In the decade since the American College of Medical Genetics and Genomics published its recommendations for reporting these findings, consensus regarding terminology has remained elusive and a variety of terms are in use globally. We conducted a scoping review to explore terminology choice and the justifications underlying those choices. Documents were included if they contained a justification for their choice of term(s) related to findings beyond the initial genomic test indication. From 3571 unique documents, 52 were included, just over half of which pertained to the clinical context (n = 29, 56%). We identified four inter-related concepts used to defend or oppose terms: expectedness of the finding, effective communication, relatedness to the original test indication, and how genomic information was generated. A variety of justifications were used to oppose the term "incidental," whereas "secondary" had broader support as a term to describe findings deliberately sought. Terminology choice would benefit from further work to include the views of patients. We contend that clear definitions will improve ethical debate and support communication about genomic findings beyond the initial test indication.
Affiliation(s)
- Stephanie White
- Sydney Health Ethics, Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, NSW, Australia; Australian Genomics, Parkville, VIC, Australia; Graduate School of Health, University of Technology Sydney, Sydney, NSW, Australia
| | - Matilda Haas
- Australian Genomics, Parkville, VIC, Australia; Murdoch Children's Research Institute, Parkville, VIC, Australia
| | - Kitty-Jean Laginha
- Sydney Health Ethics, Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, NSW, Australia; Australian Genomics, Parkville, VIC, Australia
| | - Kirsten Laurendet
- Sydney Health Ethics, Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, NSW, Australia; Australian Genomics, Parkville, VIC, Australia
| | - Clara Gaff
- Australian Genomics, Parkville, VIC, Australia; Murdoch Children's Research Institute, Parkville, VIC, Australia; Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia; Melbourne Genomics Health Alliance, Parkville, VIC, Australia
- Danya Vears
- Australian Genomics, Parkville, VIC, Australia; Murdoch Children's Research Institute, Parkville, VIC, Australia
- Ainsley J Newson
- Sydney Health Ethics, Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, NSW, Australia; Australian Genomics, Parkville, VIC, Australia.
37
Calandrelli R, Wen X, Charles Richard JL, Luo Z, Nguyen TC, Chen CJ, Qi Z, Xue S, Chen W, Yan Z, Wu W, Zaleta-Rivera K, Hu R, Yu M, Wang Y, Li W, Ma J, Ren B, Zhong S. Genome-wide analysis of the interplay between chromatin-associated RNA and 3D genome organization in human cells. Nat Commun 2023; 14:6519. [PMID: 37845234 PMCID: PMC10579264 DOI: 10.1038/s41467-023-42274-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/05/2023] [Indexed: 10/18/2023]
Abstract
The interphase genome is dynamically organized in the nucleus and decorated with chromatin-associated RNA (caRNA). It remains unclear whether the genome architecture modulates the spatial distribution of caRNA and vice versa. Here, we generate a resource of genome-wide RNA-DNA and DNA-DNA contact maps in human cells. These maps reveal the chromosomal domains demarcated by locally transcribed RNA, hereafter termed RNA-defined chromosomal domains. Further, the spreading of caRNA is constrained by the boundaries of topologically associating domains (TADs), demonstrating the role of the 3D genome structure in modulating the spatial distribution of RNA. Conversely, stopping transcription or acute depletion of RNA induces thousands of chromatin loops genome-wide. Activation or suppression of the transcription of specific genes suppresses or creates chromatin loops straddling these genes. Deletion of a specific caRNA-producing genomic sequence promotes chromatin loops that straddle the interchromosomal target sequences of this caRNA. These data suggest a feedback loop where the 3D genome modulates the spatial distribution of RNA, which in turn affects the dynamic 3D genome organization.
Affiliation(s)
- Riccardo Calandrelli
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Xingzhao Wen
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Zhifei Luo
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Tri C Nguyen
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Chien-Ju Chen
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Zhijie Qi
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Shuanghong Xue
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Weizhong Chen
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Zhangming Yan
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Weixin Wu
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Kathia Zaleta-Rivera
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Rong Hu
- Department of Cellular and Molecular Medicine, Center for Epigenomics, University of California San Diego, La Jolla, CA, USA
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Miao Yu
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Yuchuan Wang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Wenbo Li
- Department of Biochemistry and Molecular Biology, McGovern Medical School, University of Texas Health Science Center, Houston, TX, USA
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Bing Ren
- Department of Cellular and Molecular Medicine, Center for Epigenomics, University of California San Diego, La Jolla, CA, USA
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Sheng Zhong
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
38
Medina-Muñoz SG, Ortega-Del Vecchyo D, Cruz-Hervert LP, Ferreyra-Reyes L, García-García L, Moreno-Estrada A, Ragsdale AP. Demographic modeling of admixed Latin American populations from whole genomes. Am J Hum Genet 2023; 110:1804-1816. [PMID: 37725976 PMCID: PMC10577084 DOI: 10.1016/j.ajhg.2023.08.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 08/17/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023]
Abstract
Demographic models of Latin American populations often fail to fully capture their complex evolutionary history, which has been shaped by both recent admixture and deeper-in-time demographic events. To address this gap, we used high-coverage whole-genome data from Indigenous American ancestries in present-day Mexico and existing genomes from across Latin America to infer multiple demographic models that capture the impact of different timescales on genetic diversity. Our approach, which combines analyses of allele frequencies and ancestry tract length distributions, represents a significant improvement over current models in predicting patterns of genetic variation in admixed Latin American populations. We jointly modeled the contribution of European, African, East Asian, and Indigenous American ancestries into present-day Latin American populations. We infer that the ancestors of Indigenous Americans and East Asians diverged ∼30 thousand years ago, and we characterize genetic contributions of recent migrations from East and Southeast Asia to Peru and Mexico. Our inferred demographic histories are consistent across different genomic regions and annotations, suggesting that our inferences are robust to the potential effects of linked selection. In conjunction with published distributions of fitness effects for new nonsynonymous mutations in humans, we show in large-scale simulations that our models recover important features of both neutral and deleterious variation. By providing a more realistic framework for understanding the evolutionary history of Latin American populations, our models can help address the historical under-representation of admixed groups in genomics research and can be a valuable resource for future studies of populations with complex admixture and demographic histories.
Affiliation(s)
- Santiago G Medina-Muñoz
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato 36824, Mexico
- Diego Ortega-Del Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de Mexico, Juriquilla, Querétaro 76230, Mexico
- Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato 36824, Mexico.
- Aaron P Ragsdale
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato 36824, Mexico; Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA.
39
Taber JM, Peters E, Klein WMP, Cameron LD, Turbitt E, Biesecker BB. Motivations to learn genomic information are not exceptional: Lessons from behavioral science. Clin Genet 2023; 104:397-405. [PMID: 37491896 DOI: 10.1111/cge.14401] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/25/2023] [Revised: 06/29/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
Whether to undergo genome sequencing in a clinical or research context is generally a voluntary choice. Individuals are often motivated to learn genomic information even when clinical utility (the possibility that the test could inform medical recommendations or health outcomes) is low or absent. Motivations to seek one's genomic information can be cognitive, affective, social, or mixed (e.g., cognitive and affective) in nature. These motivations are based on the perceived value of the information, specifically, its clinical utility and personal utility. We suggest that motivations to learn genomic information are no different from motivations to learn other types of personal information, including one's health status and disease risk. Here, we review behavioral science relevant to motivations that may drive engagement with genome sequencing, both in the presence of varying degrees of clinical utility and in the absence of clinical utility. Specifically, we elucidate 10 motivations that are expected to underlie decisions to undergo genome sequencing. Recognizing these motivations to learn genomic information will guide future research and ultimately help clinicians to facilitate informed decision making among individuals as genome sequencing becomes increasingly available.
Affiliation(s)
- Jennifer M Taber
- Department of Psychological Sciences, Kent State University, Kent, Ohio, USA
- Ellen Peters
- Center for Science Communication Research and Psychology Department, University of Oregon, Eugene, Oregon, USA
- William M P Klein
- Behavioral Research Program, National Cancer Institute, Bethesda, Maryland, USA
- Linda D Cameron
- Department of Psychological Sciences, University of California, Merced, California, USA
- Erin Turbitt
- Graduate School of Health, University of Technology Sydney, Sydney, New South Wales, Australia
- Barbara B Biesecker
- Genomics, Bioinformatics and Translational Science, RTI International, Research Triangle Park, North Carolina, USA
40
Allou L, Mundlos S. Disruption of regulatory domains and novel transcripts as disease-causing mechanisms. Bioessays 2023; 45:e2300010. [PMID: 37381881 DOI: 10.1002/bies.202300010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 05/24/2023] [Accepted: 06/06/2023] [Indexed: 06/30/2023]
Abstract
Deletions, duplications, insertions, inversions, and translocations, collectively called structural variations (SVs), affect more base pairs of the genome than any other sequence variant. The recent technological advancements in genome sequencing have enabled the discovery of tens of thousands of SVs per human genome. These SVs primarily affect non-coding DNA sequences, but the difficulties in interpreting their impact limit our understanding of human disease etiology. The functional annotation of non-coding DNA sequences and methodologies to characterize their three-dimensional (3D) organization in the nucleus have greatly expanded our understanding of the basic mechanisms underlying gene regulation, thereby improving the interpretation of SVs for their pathogenic impact. Here, we discuss the various mechanisms by which SVs can result in altered gene regulation and how these mechanisms can result in rare genetic disorders. Beyond changing gene expression, SVs can produce novel gene-intergenic fusion transcripts at the SV breakpoints.
Affiliation(s)
- Lila Allou
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Stefan Mundlos
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Berlin, Germany
41
Rautiainen M, Nurk S, Walenz BP, Logsdon GA, Porubsky D, Rhie A, Eichler EE, Phillippy AM, Koren S. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol 2023; 41:1474-1482. [PMID: 36797493 PMCID: PMC10427740 DOI: 10.1038/s41587-023-01662-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Received: 06/24/2022] [Accepted: 01/03/2023] [Indexed: 02/18/2023]
Abstract
The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
Affiliation(s)
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies, Oxford, UK
- Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
42
Ziyatdinov A, Torres J, Alegre-Díaz J, Backman J, Mbatchou J, Turner M, Gaynor SM, Joseph T, Zou Y, Liu D, Wade R, Staples J, Panea R, Popov A, Bai X, Balasubramanian S, Habegger L, Lanche R, Lopez A, Maxwell E, Jones M, García-Ortiz H, Ramirez-Reyes R, Santacruz-Benítez R, Nag A, Smith KR, Damask A, Lin N, Paulding C, Reppell M, Zöllner S, Jorgenson E, Salerno W, Petrovski S, Overton J, Reid J, Thornton TA, Abecasis G, Berumen J, Orozco-Orozco L, Collins R, Baras A, Hill MR, Emberson JR, Marchini J, Kuri-Morales P, Tapia-Conyer R. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 2023; 622:784-793. [PMID: 37821707 PMCID: PMC10600010 DOI: 10.1038/s41586-023-06595-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/27/2022] [Accepted: 08/31/2023] [Indexed: 10/13/2023]
Abstract
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.
Affiliation(s)
- Jason Torres
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK.
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK.
- Jesús Alegre-Díaz
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Michael Turner
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Oxford Kidney Unit, Churchill Hospital, Oxford, UK
- Yuxin Zou
- Regeneron Genetics Center, Tarrytown, NY, USA
- Daren Liu
- Regeneron Genetics Center, Tarrytown, NY, USA
- Rachel Wade
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Alex Popov
- Regeneron Genetics Center, Tarrytown, NY, USA
- Alex Lopez
- Regeneron Genetics Center, Tarrytown, NY, USA
- Raul Ramirez-Reyes
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rogelio Santacruz-Benítez
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Abhishek Nag
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Katherine R Smith
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Amy Damask
- Regeneron Genetics Center, Tarrytown, NY, USA
- Nan Lin
- Regeneron Genetics Center, Tarrytown, NY, USA
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Jaime Berumen
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rory Collins
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
- Michael R Hill
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan R Emberson
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Pablo Kuri-Morales
- Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
- Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
- Roberto Tapia-Conyer
- Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico.
43
Sohail M, Palma-Martínez MJ, Chong AY, Quinto-Cortés CD, Barberena-Jonas C, Medina-Muñoz SG, Ragsdale A, Delgado-Sánchez G, Cruz-Hervert LP, Ferreyra-Reyes L, Ferreira-Guerrero E, Mongua-Rodríguez N, Canizales-Quintero S, Jimenez-Kaufmann A, Moreno-Macías H, Aguilar-Salinas CA, Auckland K, Cortés A, Acuña-Alonzo V, Gignoux CR, Wojcik GL, Ioannidis AG, Fernández-Valverde SL, Hill AVS, Tusié-Luna MT, Mentzer AJ, Novembre J, García-García L, Moreno-Estrada A. Mexican Biobank advances population and medical genomics of diverse ancestries. Nature 2023; 622:775-783. [PMID: 37821706 PMCID: PMC10600006 DOI: 10.1038/s41586-023-06560-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 05/02/2022] [Accepted: 08/22/2023] [Indexed: 10/13/2023]
Abstract
Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data [1]. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information, creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics [2-6]. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS compared to the UK Biobank GWAS [7,8]. We identified genetic and environmental factors associating with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.
Affiliation(s)
- Mashaal Sohail
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico.
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
- Centro de Ciencias Genómicas (CCG), Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Mexico.
- María J Palma-Martínez
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico
- Amanda Y Chong
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Consuelo D Quinto-Cortés
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico
- Carmina Barberena-Jonas
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico
- Santiago G Medina-Muñoz
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico
- Aaron Ragsdale
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Luis Pablo Cruz-Hervert
- Instituto Nacional de Salud Pública (INSP), Cuernavaca, Mexico
- División de Estudios de Posgrado e Investigación, Facultad de Odontología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Andrés Jimenez-Kaufmann
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico
- Hortensia Moreno-Macías
- Unidad de Biología Molecular y Medicina Genómica, Instituto de Investigaciones Biomédicas UNAM/Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
- Universidad Autónoma Metropolitana, Mexico City, Mexico
- Carlos A Aguilar-Salinas
- Division de Nutrición, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
- Kathryn Auckland
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Adrián Cortés
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Selene L Fernández-Valverde
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico
- School of Biotechnology and Biomolecular Sciences and the RNA Institute, The University of New South Wales, Sydney, New South Wales, Australia
- Adrian V S Hill
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- The Jenner Institute, University of Oxford, Oxford, UK
- María Teresa Tusié-Luna
- Unidad de Biología Molecular y Medicina Genómica, Instituto de Investigaciones Biomédicas UNAM/Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
- Alexander J Mentzer
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
- John Novembre
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Andrés Moreno-Estrada
- Unidad de Genómica Avanzada (UGA-LANGEBIO), Centro de Investigación y Estudios Avanzados del IPN (Cinvestav), Irapuato, Mexico.
44
Eldjarn GH, Ferkingstad E, Lund SH, Helgason H, Magnusson OT, Gunnarsdottir K, Olafsdottir TA, Halldorsson BV, Olason PI, Zink F, Gudjonsson SA, Sveinbjornsson G, Magnusson MI, Helgason A, Oddsson A, Halldorsson GH, Magnusson MK, Saevarsdottir S, Eiriksdottir T, Masson G, Stefansson H, Jonsdottir I, Holm H, Rafnar T, Melsted P, Saemundsdottir J, Norddahl GL, Thorleifsson G, Ulfarsson MO, Gudbjartsson DF, Thorsteinsdottir U, Sulem P, Stefansson K. Large-scale plasma proteomics comparisons through genetics and disease associations. Nature 2023; 622:348-358. [PMID: 37794188 PMCID: PMC10567571 DOI: 10.1038/s41586-023-06563-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 08/04/2022] [Accepted: 08/22/2023] [Indexed: 10/06/2023]
Abstract
High-throughput proteomics platforms measuring thousands of proteins in plasma combined with genomic and phenotypic information have the power to bridge the gap between the genome and diseases. Here we performed association studies of Olink Explore 3072 data generated by the UK Biobank Pharma Proteomics Project [1] on plasma samples from more than 50,000 UK Biobank participants with phenotypic and genotypic data, stratifying on British or Irish, African and South Asian ancestries. We compared the results with those of a SomaScan v4 study on plasma from 36,000 Icelandic people [2], for 1,514 of whom Olink data were also available. We found modest correlation between the two platforms. Although cis protein quantitative trait loci were detected for a similar absolute number of assays on the two platforms (2,101 on Olink versus 2,120 on SomaScan), the proportion of assays with such supporting evidence for assay performance was higher on the Olink platform (72% versus 43%). A considerable number of proteins had genomic associations that differed between the platforms. We provide examples where differences between platforms may influence conclusions drawn from the integration of protein levels with the study of diseases. We demonstrate how leveraging the diverse ancestries of participants in the UK Biobank helps to detect novel associations and refine genomic location. Our results show the value of the information provided by the two most commonly used high-throughput proteomics platforms and demonstrate the differences between them that at times provide useful complementarity.
Affiliation(s)
- Sigrun H Lund
- deCODE Genetics/Amgen, Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Hannes Helgason
- deCODE Genetics/Amgen, Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Bjarni V Halldorsson
- deCODE Genetics/Amgen, Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavik, Iceland
- Agnar Helgason
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Department of Anthropology, University of Iceland, Reykjavik, Iceland
- Magnus K Magnusson
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Saedis Saevarsdottir
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | | | | | | | - Ingileif Jonsdottir
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Hilma Holm
- deCODE Genetics/Amgen, Reykjavik, Iceland
| | | | - Pall Melsted
- deCODE Genetics/Amgen, Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | | | | | | | - Magnus O Ulfarsson
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Faculty of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland
| | - Daniel F Gudbjartsson
- deCODE Genetics/Amgen, Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Unnur Thorsteinsdottir
- deCODE Genetics/Amgen, Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | | | - Kari Stefansson
- deCODE Genetics/Amgen, Reykjavik, Iceland.
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland.
45
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023]
Abstract
Since the release of the complete human genome, the priority of human genomic study has been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haplotypes achieve telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population, owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as references showed that reference bias profoundly affects rare SNP calling, with nearly 2 million rare SNPs called discordantly between the two reference genomes. Finally, using CN1 as the reference, we discovered 5.80 Mb and 4.21 Mb of putative introgressed sequences from Neanderthals and Denisovans, respectively, including many East Asian-specific ones that went undetected when CHM13 was used as the reference. Our analyses demonstrate the advantages of using CN1 as a reference for population genomic and paleogenomic studies. This complete genome will serve as an alternative reference for future genomic studies of the East Asian population.
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
- Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
- Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
- Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
46
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. Nature 2023; 622:41-47. [PMID: 37794265 PMCID: PMC10575709 DOI: 10.1038/s41586-023-06490-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/09/2023] [Accepted: 07/27/2023] [Indexed: 10/06/2023]
Abstract
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes, so that the human gene catalogue can be used in clinical settings.
Affiliation(s)
- Paulo Amaral
- INSPER Institute of Education and Research, Sao Paulo, Brazil
- Francisco M De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Tempus Labs, Chicago, IL, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Thomas Gingeras
- Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Roderic Guigo
- Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jennifer L Harrow
- Centre for Genomics Research, Discovery Sciences, AstraZeneca, Royston, UK
- Artemis G Hatzigeorgiou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Hellenic Pasteur Institute, Athens, Greece
- Rory Johnson
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
- Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research, University of Bern, Bern, Switzerland
- Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hazuki Takahashi
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Igor Ulitsky
- Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Christine A Wells
- Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
- Mark Yandell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Piero Carninci
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Human Technopole, Milan, Italy
- Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
47
Ichikawa K, Kawahara R, Asano T, Morishita S. A landscape of complex tandem repeats within individual human genomes. Nat Commun 2023; 14:5530. [PMID: 37709751 PMCID: PMC10502081 DOI: 10.1038/s41467-023-41262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Markedly expanded tandem repeats (TRs) have been correlated with ~60 diseases. TR diversity has been considered a clue toward understanding missing heritability. However, haplotype-resolved long TRs remain mostly hidden or blacked out because their complex structures (TRs composed of various units and minisatellites containing >10-bp units) make them difficult to determine accurately with existing methods. Here, using a high-precision algorithm to determine complex TR structures from long, accurate PacBio HiFi reads, we investigated 270 Japanese control samples and obtained several genome-wide findings. Approximately 322,000 TRs are difficult to impute from the surrounding single-nucleotide variants. Greater genetic divergence of TR loci is significantly correlated with more events of recent replication slippage. Complex TRs are more abundant than single-unit TRs, and at loci with ≥500-bp TRs there is a statistically significant tendency for complex TRs to consist of <10-bp units and for single-unit TRs to be minisatellites. Of note, 8909 loci with extended TRs (>100 bp longer than the mode) contain several known disease-associated TRs and are considered candidates for association with disorders. Overall, complex TRs and minisatellites are found to be abundant and diverse, even in genetically small Japanese populations, yielding insights into the landscape of long TRs.
Affiliation(s)
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Riki Kawahara
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Takeshi Asano
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
48
Li C, Chen L, Pan G, Zhang W, Li SC. Deciphering complex breakage-fusion-bridge genome rearrangements with Ambigram. Nat Commun 2023; 14:5528. [PMID: 37684230 PMCID: PMC10491683 DOI: 10.1038/s41467-023-41259-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 08/28/2023] [Indexed: 09/10/2023]
Abstract
Breakage-fusion-bridge (BFB) is a complex rearrangement that leads to tumor malignancy. Existing models for detecting BFBs rely on the ideal BFB hypothesis, ruling out the possibility of BFBs entangled with other structural variations, that is, complex BFBs. We propose an algorithm, Ambigram, to identify complex BFBs and reconstruct the rearranged structure of the local genome during cancer subclone evolution. Ambigram handles short-read, linked-read, long-read and single-cell sequencing data, as well as optical mapping data. Compared against state-of-the-art methods, Ambigram successfully deciphers gold- or silver-standard complex BFBs in multiple cancers. Ambigram dissects the intratumor heterogeneity of complex BFB events with single-cell reads from melanoma and gastric cancer. Furthermore, applying Ambigram to liver and cervical cancer data suggests that the BFB mechanism may mediate oncovirus integrations. BFB also exists in noncancer genomes. Investigating the complete human reference genome with Ambigram suggests that the BFB mechanism may be involved in two genome reorganizations of Homo sapiens during evolution. Moreover, Ambigram discovers signals of recurrent foldback inversions in whole-genome data from the 1000 Genomes Project, and of complex BFBs in whole-genome data from congenital heart disease.
Affiliation(s)
- Chaohui Li
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Lingxi Chen
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Guangze Pan
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Wenqian Zhang
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
49
Brandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat Genet 2023; 55:1512-1522. [PMID: 37563329 PMCID: PMC10484790 DOI: 10.1038/s41588-023-01465-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Received: 08/08/2022] [Accepted: 07/05/2023] [Indexed: 08/12/2023]
Abstract
Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects.
Affiliation(s)
- Nadav Brandes
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Grant Goldman
- Biological and Medical Informatics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Charlotte H Wang
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Chun Jimmie Ye
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Parker Institute for Cancer Immunotherapy, University of California, San Francisco, San Francisco, CA, USA
- Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Vasilis Ntranos
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Diabetes Center, University of California, San Francisco, San Francisco, CA, USA
50
Hallast P, Ebert P, Loftus M, Yilmaz F, Audano PA, Logsdon GA, Bonder MJ, Zhou W, Höps W, Kim K, Li C, Hoyt SJ, Dishuck PC, Porubsky D, Tsetsos F, Kwon JY, Zhu Q, Munson KM, Hasenfeld P, Harvey WT, Lewis AP, Kordosky J, Hoekzema K, O'Neill RJ, Korbel JO, Tyler-Smith C, Eichler EE, Shi X, Beck CR, Marschall T, Konkel MK, Lee C. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established boundary [1]. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Affiliation(s)
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marc Jan Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Wolfram Höps
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Kwondo Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fotios Tsetsos
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jee Young Kwon
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Patrick Hasenfeld
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
- Jan O Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Miriam K Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA