1
|
Smaruj PN, Xiao Y, Fudenberg G. Recipes and ingredients for deep learning models of 3D genome folding. Curr Opin Genet Dev 2025; 91:102308. [PMID: 39862604 DOI: 10.1016/j.gde.2024.102308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Revised: 12/19/2024] [Accepted: 12/31/2024] [Indexed: 01/27/2025]
Abstract
Three-dimensional genome folding plays roles in gene regulation and disease. In this review, we compare and contrast recent deep learning models for predicting genome contact maps. We survey preprocessing, architecture, training, evaluation, and interpretation methods, highlighting the capabilities and limitations of different models. In each area, we highlight challenges, opportunities, and potential future directions for genome-folding models.
Collapse
Affiliation(s)
- Paulina N Smaruj
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Yao Xiao
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Geoffrey Fudenberg
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
| |
Collapse
|
2
|
Chen H, Xu S. Population genomics advances in frontier ethnic minorities in China. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-024-2659-2. [PMID: 39643831 DOI: 10.1007/s11427-024-2659-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2024] [Accepted: 06/18/2024] [Indexed: 12/09/2024]
Abstract
China, with its large geographic span, possesses rich genetic diversity across vast frontier regions in addition to the Han Chinese majority. Importantly, demographic events and various natural and cultural environments in Chinese frontier regions have shaped the genomic diversity of ethnic minorities via local adaptations. Thus, insights into the genetic diversity and adaptive evolution of these under-represented ethnic groups are crucial for understanding evolutionary scenarios and biomedical implications in East Asian populations. Here, we focus on ethnic minorities in Chinese frontier regions and review research advances regarding genomic diversity, genetic structure, population history, genetic admixture, and local adaptation. We first provide an overview of the extensive genetic diversity across populations in different Chinese frontier regions. Next, we summarize research progress regarding genetic ancestry, demographic history, the adaptive process, and the archaic identification of multiple ethnic minorities in different Chinese frontier regions. Finally, we discuss the gaps and opportunities in genomic studies of Chinese populations and the need for a more comprehensive understanding of genomic diversity and the evolution of populations of East Asian ancestry in the post-genomic era.
Collapse
Affiliation(s)
- Hao Chen
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Shuhua Xu
- Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, 200438, China.
| |
Collapse
|
3
|
Villalba A, Smajdor A, Brassington I, Cutas D. The ethics of synthetic DNA. JOURNAL OF MEDICAL ETHICS 2024:jme-2024-110124. [PMID: 39567177 DOI: 10.1136/jme-2024-110124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/05/2024] [Accepted: 10/13/2024] [Indexed: 11/22/2024]
Abstract
In this paper, we discuss the ethical concerns that may arise from the synthesis of human DNA. To date, only small stretches of DNA have been constructed, but the prospect of generating human genomes is becoming feasible. At the same time, the significance of genes for identity, health and reproduction is coming under increased scrutiny. We examine the implications of DNA synthesis and its impact on debates over the relationship with our DNA and the ownership of our genes, its potential to disrupt common understandings of reproduction and privacy, and the way in which synthetic DNA challenges traditional associations between genes and identity. We explore the degree to which synthetic DNA may further undermine overgeneticised accounts of identity, health, reproduction, parenthood and privacy that are prevalent in the public domain and in some areas of policy-making. While avoiding making normative claims of our own, we conclude that there is a need for reflection on the ethical implications of these developing technologies before they are on us.
Collapse
Affiliation(s)
- Adrian Villalba
- Université Paris Cité, Paris, France
- University of Granada, Granada, Spain
| | | | | | | |
Collapse
|
4
|
Kobayashi Y, Chen E, Facio FM, Metz H, Poll SR, Swartzlander D, Johnson B, Aradhya S. Clinical Variant Reclassification in Hereditary Disease Genetic Testing. JAMA Netw Open 2024; 7:e2444526. [PMID: 39504018 PMCID: PMC11541632 DOI: 10.1001/jamanetworkopen.2024.44526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Accepted: 09/17/2024] [Indexed: 11/08/2024] Open
Abstract
Importance Because accurate and consistent classification of DNA sequence variants is fundamental to germline genetic testing, understanding patterns of initial variant classification (VC) and subsequent reclassification from large-scale, empirical data can help improve VC methods, promote equity among race, ethnicity, and ancestry (REA) groups, and provide insights to inform clinical practice. Objectives To measure the degree to which initial VCs met certainty thresholds set by professional guidelines and quantify the rates of, the factors associated with, and the impact of reclassification among more than 2 million variants. Design, Setting, and Participants This cohort study used clinical multigene panel and exome sequencing data from diagnostic testing for hereditary disorders, carrier screening, or preventive genetic screening from individuals for whom genetic testing was performed between January 1, 2015, and June 30, 2023. Exposure DNA variants were classified into 1 of 5 categories: benign, likely benign, variant of uncertain significance (VUS), likely pathogenic, or pathogenic. Main Outcomes and Measures The main outcomes were accuracy of classifications, rates and directions of reclassifications, evidence contributing to reclassifications, and their impact across different clinical areas and REA groups. One-way analysis of variance followed by post hoc pairwise Tukey honest significant difference tests were used to analyze differences among means, and pairwise Pearson χ2 tests with Bonferroni corrections were used to compare categorical variables among groups. Results The cohort comprised 3 272 035 individuals (median [range] age, 44 [0-89] years; 2 240 506 female [68.47%] and 1 030 729 male [31.50%]; 216 752 Black [6.62%]; 336 414 Hispanic [10.28%]; 1 804 273 White [55.14%]). Among 2 051 736 variants observed over 8 years in this cohort, 94 453 (4.60%) were reclassified. Some variants were reclassified more than once, resulting in 105 172 total reclassification events. The majority (64 752 events [61.65%]) were changes from VUS to either likely benign, benign, likely pathogenic, or pathogenic categories. An additional 37.66% of reclassifications (39 608 events) were gains in classification certainty to terminal categories (ie, likely benign to benign and likely pathogenic to pathogenic). Only a small fraction (663 events [0.63%]) moved toward less certainty, or very rarely (61 events [0.06%]) were classification reversals. When normalized by the number of individuals tested, VUS reclassification rates were higher among specific underrepresented REA populations (Ashkenazi Jewish, Asian, Black, Hispanic, Pacific Islander, and Sephardic Jewish). Approximately one-half of VUS reclassifications (37 074 of 64 840 events [57.18%]) resulted from improved use of data from computational modeling. Conclusions and Relevance In this cohort study of individuals undergoing genetic testing, the empirically estimated accuracy of pathogenic, likely pathogenic, benign, and likely benign classifications exceeded the certainty thresholds set by current VC guidelines, suggesting the need to reevaluate definitions of these classifications. The relative contribution of various strategies to resolve VUS, including emerging machine learning-based computational methods, RNA analysis, and cascade family testing, provides useful insights that can be applied toward further improving VC methods, reducing the rate of VUS, and generating more definitive results for patients.
Collapse
Affiliation(s)
- Yuya Kobayashi
- Labcorp Genetics Inc (formerly Invitae Corporation), San Francisco, California
| | - Elaine Chen
- Invitae Corporation (now part of Labcorp Genetics), San Francisco, California
- Now with Midi Health, Los Altos Hills, California
| | - Flavia M. Facio
- Labcorp Genetics Inc (formerly Invitae Corporation), San Francisco, California
| | - Hillery Metz
- Labcorp Genetics Inc (formerly Invitae Corporation), San Francisco, California
| | - Sarah R. Poll
- Labcorp Genetics Inc (formerly Invitae Corporation), San Francisco, California
| | - Dan Swartzlander
- Labcorp Genetics Inc (formerly Invitae Corporation), San Francisco, California
| | - Britt Johnson
- Invitae Corporation (now part of Labcorp Genetics), San Francisco, California
- Now with GeneDx, Stamford, Connecticut
| | - Swaroop Aradhya
- Invitae Corporation (now part of Labcorp Genetics), San Francisco, California
- Now with Illumina, San Diego, California
- Department of Pathology, Stanford University, Stanford, California
| |
Collapse
|
5
|
Adams PE, Thies JL, Sutton JM, Millwood JD, Caldwell GA, Caldwell KA, Fierst JL. Identifying transgene insertions in Caenorhabditis elegans genomes with Oxford Nanopore sequencing. PeerJ 2024; 12:e18100. [PMID: 39285918 PMCID: PMC11404476 DOI: 10.7717/peerj.18100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2024] [Accepted: 08/26/2024] [Indexed: 09/19/2024] Open
Abstract
Genetically modified organisms are commonly used in disease research and agriculture but the precise genomic alterations underlying transgenic mutations are often unknown. The position and characteristics of transgenes, including the number of independent insertions, influences the expression of both transgenic and wild-type sequences. We used long-read, Oxford Nanopore Technologies (ONT) to sequence and assemble two transgenic strains of Caenorhabditis elegans commonly used in the research of neurodegenerative diseases: BY250 (pPdat-1::GFP) and UA44 (GFP and human α-synuclein), a model for Parkinson's research. After scaffolding to the reference, the final assembled sequences were ∼102 Mb with N50s of 17.9 Mb and 18.0 Mb, respectively, and L90s of six contiguous sequences, representing chromosome-level assemblies. Each of the assembled sequences contained more than 99.2% of the Nematoda BUSCO genes found in the C. elegans reference and 99.5% of the annotated C. elegans reference protein-coding genes. We identified the locations of the transgene insertions and confirmed that all transgene sequences were inserted in intergenic regions, leaving the organismal gene content intact. The transgenic C. elegans genomes presented here will be a valuable resource for Parkinson's research as well as other neurodegenerative diseases. Our work demonstrates that long-read sequencing is a fast, cost-effective way to assemble genome sequences and characterize mutant lines and strains.
Collapse
Affiliation(s)
- Paula E Adams
- Department of Biological Sciences, Auburn University, Auburn, AL, United States of America
- Department of Biological Sciences, University of Alabama - Tuscaloosa, Tuscaloosa, AL, United States of America
| | - Jennifer L Thies
- Department of Biological Sciences, University of Alabama - Tuscaloosa, Tuscaloosa, AL, United States of America
- Curriculum in Toxicology and Environmental Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
| | - John M Sutton
- Department of Biological Sciences, University of Alabama - Tuscaloosa, Tuscaloosa, AL, United States of America
- Absci, Vancouver, WA, United States of America
| | - Joshua D Millwood
- Department of Biological Sciences, University of Alabama - Tuscaloosa, Tuscaloosa, AL, United States of America
- Department of Biological and Environmental Sciences, University of West Alabama, Livingston, AL, United States of America
| | - Guy A Caldwell
- Department of Biological Sciences, University of Alabama - Tuscaloosa, Tuscaloosa, AL, United States of America
| | - Kim A Caldwell
- Department of Biological Sciences, University of Alabama - Tuscaloosa, Tuscaloosa, AL, United States of America
| | - Janna L Fierst
- Department of Biological Sciences, Florida International University, Miami, FL, United States of America
- Biomolecular Sciences Institute, Florida International University, Miami, FL, United States of America
| |
Collapse
|
6
|
L Rocha J, Lou RN, Sudmant PH. Structural variation in humans and our primate kin in the era of telomere-to-telomere genomes and pangenomics. Curr Opin Genet Dev 2024; 87:102233. [PMID: 39042999 PMCID: PMC11695101 DOI: 10.1016/j.gde.2024.102233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/25/2024]
Abstract
Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.
Collapse
Affiliation(s)
- Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@joanocha
| | - Runyang N Lou
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@NicolasLou10
| | - Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, USA.
| |
Collapse
|
7
|
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Collapse
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA; , ,
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA; , ,
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA; ,
| | - Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA; ,
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA;
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA;
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA; , ,
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France;
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA; , ,
| | - Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA; , ,
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA; ,
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA; , ,
| |
Collapse
|
8
|
Tavakoli N, Gibney D, Aluru S. GraphSlimmer: Preserving Read Mappability with the Minimum Number of Variants. J Comput Biol 2024; 31:616-637. [PMID: 38990757 DOI: 10.1089/cmb.2024.0601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/13/2024] Open
Abstract
Modern genomic datasets, like those generated under the 1000 Genome Project, contain millions of variants belonging to known haplotypes. Although these datasets are more representative than a single reference sequence and can alleviate issues like reference bias, they are significantly more computationally burdensome to work with, often involving large-indexed genome graph data structures for tasks such as read mapping. The construction, preprocessing, and mapping algorithms can require substantial computational resources depending on the size of these variant sets. Moreover, the accuracy of mapping algorithms has been shown to decrease when working with complete variant sets. Therefore, a drastically reduced set of variants that preserves important properties of the original set is desirable. This work provides a technique for finding a minimal subset of variants S such that for given parameters α and δ, all substrings up to length α in the haplotypes are guaranteed to be still alignable to the appropriate locations with either Hamming or edit distance at most δ, using only S . Our contributions include showing the NP-hardness and inapproximability of these optimization problems and providing Integer Linear Programming (ILP) formulations. Our edit distance ILP formulation carefully decomposes the problem according to variant locations, which allows it to scale to support all of chromosome 22's variants from the 1000 Genome Project. Our experiments also demonstrate a significant reduction in the number of variants. For example, for moderately long reads, e.g., α = 1000, over 75% of the variants can be removed while preserving read mappability with edit distance at most one.
Collapse
Affiliation(s)
- Neda Tavakoli
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Gxeorgia, USA
| | - Daniel Gibney
- Department of Computer Science, University of Texas at Dallas, Richardson, Texas, USA
| | - Srinivas Aluru
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Gxeorgia, USA
| |
Collapse
|
9
|
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Paasch A, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted phasing of 2-200 kilobase DNA fragments with a short-read sequencer and a single-tube linked-read library method. Sci Rep 2024; 14:7988. [PMID: 38580715 PMCID: PMC10997766 DOI: 10.1038/s41598-024-58733-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Accepted: 04/02/2024] [Indexed: 04/07/2024] Open
Abstract
In the human genome, heterozygous sites refer to genomic positions with a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulse-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PKI3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.
Collapse
Affiliation(s)
| | - Madison Rzepka
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | | | - Yu Xia
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Peter L Chang
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | | | - Amber Paasch
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Long Pham
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Naisarg Modi
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Likun Yao
- Department of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Adrian Perez-Agustin
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
| | - Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
| | | | - Ming Lei
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
| | - Yong Wang
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
| | | | - Zhoutao Chen
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA.
| |
Collapse
|
10
|
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B, Abel HJ, Antonacci-Fulton LL, Asri M, Baid G, Baker CA, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson MJP, Chang PC, Chang XH, Cheng H, Chu J, Cody S, Colonna V, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler EE, Eizenga JM, Fairley S, Fedrigo O, Felsenfeld AL, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton RS, Gao Y, Garg S, Garrison E, Garrison NA, Giron CG, Green RE, Groza C, Guarracino A, Haggerty L, Hall IM, Harvey WT, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis ED, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Li H, Liao WW, Lu S, Lu TY, Lucas JK, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin FJ, McCartney A, McDaniel J, Miga KH, Mitchell MW, Monlong J, Mountcastle J, Munson KM, Mwaniki MN, Nattestad M, Novak AM, Nurk S, Olsen HE, Olson ND, Paten B, Pesout T, Phillippy AM, Popejoy AB, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Sibbesen JA, Sirén J, Smith MW, Sofia HJ, Tayoun ANA, Thibaud-Nissen F, Tomlinson C, Tricomi FF, Villani F, Vollger MR, Wagner J, Walenz B, Wang T, Wood JMD, Zimin AV, Zook JM. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024; 42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 60] [Impact Index Per Article: 60.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
Collapse
Affiliation(s)
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
| | - Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | | | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Haley J. Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Carl A. Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Canadian Center for Computational Genomics, McGill University, Montreal, QC, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
| | | | - Mark J. P. Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | | | - Xian H. Chang
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | | | - Robert M. Cook-Deegan
- Arizona State University, Barrett and O’Connor Washington Center, Washington, DC, USA
| | - Omar E. Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Daniel Doerr
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Peter Ebert
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Jana Ebler
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam L. Felsenfeld
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Robert S. Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Nanibaa’ A. Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Carlos Garcia Giron
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ira M. Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Erich D. Jarvis
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Eimear E. Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Barbara A. Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | | | - Jan O. Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Julian K. Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Hugo Magalhães
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Charles Markello
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Tobias Marschall
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Fergal J. Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
| | | | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E. Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice B. Popejoy
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ashley D. Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I. Schultz
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Jonas A. Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Michael W. Smith
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J. Sofia
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N. Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children’s Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| |
Collapse
|
11
|
Venner E, Patterson K, Kalra D, Wheeler MM, Chen YJ, Kalla SE, Yuan B, Karnes JH, Walker K, Smith JD, McGee S, Radhakrishnan A, Haddad A, Empey PE, Wang Q, Lichtenstein L, Toledo D, Jarvik G, Musick A, Gibbs RA. The frequency of pathogenic variation in the All of Us cohort reveals ancestry-driven disparities. Commun Biol 2024; 7:174. [PMID: 38374434 PMCID: PMC10876563 DOI: 10.1038/s42003-023-05708-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 12/13/2023] [Indexed: 02/21/2024] Open
Abstract
Disparities in data underlying clinical genomic interpretation is an acknowledged problem, but there is a paucity of data demonstrating it. The All of Us Research Program is collecting data including whole-genome sequences, health records, and surveys for at least a million participants with diverse ancestry and access to healthcare, representing one of the largest biomedical research repositories of its kind. Here, we examine pathogenic and likely pathogenic variants that were identified in the All of Us cohort. The European ancestry subgroup showed the highest overall rate of pathogenic variation, with 2.26% of participants having a pathogenic variant. Other ancestry groups had lower rates of pathogenic variation, including 1.62% for the African ancestry group and 1.32% in the Latino/Admixed American ancestry group. Pathogenic variants were most frequently observed in genes related to Breast/Ovarian Cancer or Hypercholesterolemia. Variant frequencies in many genes were consistent with the data from the public gnomAD database, with some notable exceptions resolved using gnomAD subsets. Differences in pathogenic variant frequency observed between ancestral groups generally indicate biases of ascertainment of knowledge about those variants, but some deviations may be indicative of differences in disease prevalence. This work will allow targeted precision medicine efforts at revealed disparities.
Collapse
Affiliation(s)
- Eric Venner
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
| | - Karynne Patterson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Marsha M Wheeler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Yi-Ju Chen
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Sara E Kalla
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Bo Yuan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Jason H Karnes
- University of Arizona, R Ken Coit College of Pharmacy, Department of Pharmacy Practice and Science, Tucson, AZ, USA
- Vanderbilt University Medical Center, Department of Biomedical Informatics, Boston, MA, USA
| | - Kimberly Walker
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Joshua D Smith
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sean McGee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - Andrew Haddad
- Department of Pharmaceutical Sciences, University of Pittsburgh School of Pharmacy, Pittsburgh, PA, USA
| | - Philip E Empey
- Department of Pharmacy and Therapeutics, University of Pittsburgh School of Pharmacy, Pittsburgh, PA, USA
| | - Qiaoyan Wang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Diana Toledo
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gail Jarvik
- Department of Medicine (Medical Genetics), University of Washington School of Medicine, Seattle, WA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Anjene Musick
- NIH All of Us Research Program, National Institutes of Health Office of the Director, Bethesda, MD, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| |
Collapse
|
12
|
Abondio P, Bruno F, Passarino G, Montesanto A, Luiselli D. Pangenomics: A new era in the field of neurodegenerative diseases. Ageing Res Rev 2024; 94:102180. [PMID: 38163518 DOI: 10.1016/j.arr.2023.102180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]
Abstract
A pangenome is composed of all the genetic variability of a group of individuals, and its application to the study of neurodegenerative diseases may provide valuable insights into the underlying aspects of genetic heterogenetiy for these complex ailments, including gene expression, epigenetics, and translation mechanisms. Furthermore, a reference pangenome allows for the identification of previously undetected structural commonalities and differences among individuals, which may help in the diagnosis of a disease, support the prediction of what will happen over time (prognosis) and aid in developing novel treatments in the perspective of personalized medicine. Therefore, in the present review, the application of the pangenome concept to the study of neurodegenerative diseases will be discussed and analyzed for its potential to enable an improvement in diagnosis and prognosis for these illnesses, leading to the development of tailored treatments for individual patients from the knowledge of the genomic composition of a whole population.
Collapse
Affiliation(s)
- Paolo Abondio
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy.
| | - Francesco Bruno
- Academy of Cognitive Behavioral Sciences of Calabria (ASCoC), Lamezia Terme, Italy; Regional Neurogenetic Centre (CRN), Department of Primary Care, Azienda Sanitaria Provinciale Di Catanzaro, Viale A. Perugini, 88046 Lamezia Terme, CZ, Italy; Association for Neurogenetic Research (ARN), Lamezia Terme, CZ, Italy
| | - Giuseppe Passarino
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
| | - Alberto Montesanto
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
| | - Donata Luiselli
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
| |
Collapse
|
13
|
Dotan E, Lynch SM, Ryan JC, Mitchell EP. Disparities in care of older adults of color with cancer: A narrative review. Cancer Med 2024; 13:e6790. [PMID: 38234214 PMCID: PMC10905558 DOI: 10.1002/cam4.6790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Revised: 10/06/2023] [Accepted: 11/23/2023] [Indexed: 01/19/2024] Open
Abstract
This review describes the barriers and challenges faced by older adults of color with cancer and highlights methods to improve their overall care. In the next decade, cancer incidence rates are expected to increase in the United States for people aged ≥65 years. A large proportion will be older adults of color who often have worse outcomes than older White patients. Many issues contribute to racial disparities in older adults, including biological factors and social determinants of health (SDOH) related to healthcare access, socioeconomic concerns, systemic racism, mistrust, and the neighborhood where a person lives. These disparities are exacerbated by age-related challenges often experienced by older adults, such as decreased functional status, impaired cognition, high rates of comorbidities and polypharmacy, poor nutrition, and limited social support. Additionally, underrepresentation of both patients of color and older adults in cancer clinical research results in a lack of adequate data to guide the management of these patients. Use of geriatric assessments (GA) can aid providers in uncovering age-related concerns and personalizing interventions for older patients. Research demonstrates the ability of GA-directed care to result in fewer treatment-related toxicities and improved quality of life, thus supporting the routine incorporation of validated GA into these patients' care. GA can be enhanced by including evaluation of SDOH, which can help healthcare providers understand and address the needs of older adults of color with cancer who face disparities related to their age and race.
Collapse
Affiliation(s)
- Efrat Dotan
- Department of Hematology/OncologyFox Chase Cancer CenterPhiladelphiaPennsylvaniaUSA
| | | | | | - Edith P. Mitchell
- Clinical Professor of Medicine and Medical OncologySidney Kimmel Cancer Center at JeffersonPhiladelphiaPennsylvaniaUSA
| |
Collapse
|
14
|
Zhang Z, Jiang T, Li G, Cao S, Liu Y, Liu B, Wang Y. Kled: an ultra-fast and sensitive structural variant detection tool for long-read sequencing data. Brief Bioinform 2024; 25:bbae049. [PMID: 38385878 PMCID: PMC10883419 DOI: 10.1093/bib/bbae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Revised: 01/12/2024] [Accepted: 01/26/2024] [Indexed: 02/23/2024] Open
Abstract
Structural Variants (SVs) are a crucial type of genetic variant that can significantly impact phenotypes. Therefore, the identification of SVs is an essential part of modern genomic analysis. In this article, we present kled, an ultra-fast and sensitive SV caller for long-read sequencing data given the specially designed approach with a novel signature-merging algorithm, custom refinement strategies and a high-performance program structure. The evaluation results demonstrate that kled can achieve optimal SV calling compared to several state-of-the-art methods on simulated and real long-read data for different platforms and sequencing depths. Furthermore, kled excels at rapid SV calling and can efficiently utilize multiple Central Processing Unit (CPU) cores while maintaining low memory usage. The source code for kled can be obtained from https://github.com/CoREse/kled.
Collapse
Affiliation(s)
- Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Gaoyang Li
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Bo Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| |
Collapse
|
15
|
Dylus D, Altenhoff A, Majidian S, Sedlazeck FJ, Dessimoz C. Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree. Nat Biotechnol 2024; 42:139-147. [PMID: 37081138 PMCID: PMC10791578 DOI: 10.1038/s41587-023-01753-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2022] [Accepted: 03/16/2023] [Indexed: 04/22/2023]
Abstract
Current methods for inference of phylogenetic trees require running complex pipelines at substantial computational and labor costs, with additional constraints in sequencing coverage, assembly and annotation quality, especially for large datasets. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes and bypasses traditional steps in phylogeny inference, such as genome assembly, annotation and all-versus-all sequence comparisons, while retaining accuracy. In a benchmark encompassing a broad variety of datasets, Read2Tree is 10-100 times faster than assembly-based approaches and in most cases more accurate-the exception being when sequencing coverage is high and reference species very distant. Here, to illustrate the broad applicability of the tool, we reconstruct a yeast tree of life of 435 species spanning 590 million years of evolution. We also apply Read2Tree to >10,000 Coronaviridae samples, accurately classifying highly diverse animal samples and near-identical severe acute respiratory syndrome coronavirus 2 sequences on a single tree. The speed, accuracy and versatility of Read2Tree enable comparative genomics at scale.
Collapse
Affiliation(s)
- David Dylus
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- F. Hoffmann-La Roche Ltd, Immunology, Infectious Disease, and Ophthalmology (I2O), Roche Pharmaceutical Research and Early Development (pRED), Basel, Switzerland
| | - Adrian Altenhoff
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computer Science, ETH, Zurich, Switzerland
| | - Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Department of Computer Science, University College London, London, UK.
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK.
| |
Collapse
|
16
|
Faltejsková K, Vondrášek J. PAPerFly: Partial Assembly-based Peak Finder for ab initio binding site reconstruction. BMC Bioinformatics 2023; 24:487. [PMID: 38114921 PMCID: PMC10731698 DOI: 10.1186/s12859-023-05613-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2022] [Accepted: 12/11/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND The specific recognition of a DNA locus by a given transcription factor is a widely studied issue. It is generally agreed that the recognition can be influenced not only by the binding motif but by the larger context of the binding site. In this work, we present a novel heuristic algorithm that can reconstruct the unique binding sites captured in a sequencing experiment without using the reference genome. RESULTS We present PAPerFly, the Partial Assembly-based Peak Finder, a tool for the binding site and binding context reconstruction from the sequencing data without any prior knowledge. This tool operates without the need to know the reference genome of the respective organism. We employ algorithmic approaches that are used during genome assembly. The proposed algorithm constructs a de Bruijn graph from the sequencing data. Based on this graph, sequences and their enrichment are reconstructed using a novel heuristic algorithm. The reconstructed sequences are aligned and the peaks in the sequence enrichment are identified. Our approach was tested by processing several ChIP-seq experiments available in the ENCODE database and comparing the results of Paperfly and standard methods. CONCLUSIONS We show that PAPerFly, an algorithm tailored for experiment analysis without the reference genome, yields better results than an aggregation of ChIP-seq agnostic tools. Our tool is freely available at https://github.com/Caeph/paperfly/ or on Zenodo ( https://doi.org/10.5281/zenodo.7116424 ).
Collapse
Affiliation(s)
- Kateřina Faltejsková
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 542/2, 160 00, Prague, Czech Republic.
- Computer Science Institute, Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, 118 00, Prague, Czech Republic.
| | - Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 542/2, 160 00, Prague, Czech Republic.
| |
Collapse
|
17
|
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023] Open
Abstract
Background The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results For this study, we leveraged publicly-available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state of the art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. Each tool widely disagreed on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385. Conclusions These results suggest a combined approach is needed for LRS alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, Graphmap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
Collapse
Affiliation(s)
- Jonathan LoTempio
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
| | - Emmanuele Delot
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
| | - Eric Vilain
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
| |
Collapse
|
18
|
Kabata F, Thaldar D. The human genome as the common heritage of humanity. Front Genet 2023; 14:1282515. [PMID: 38028596 PMCID: PMC10662319 DOI: 10.3389/fgene.2023.1282515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
While debate on the international regulation of human genomic research remains unsettled, the Universal Declaration on the Human Genome and Human Rights, 1997 qualifies the human genome as "heritage of humankind" in a symbolic sense. Using document analysis this article assesses whether, how and to what extent the common heritage framework is relevant in regulation of human genomic research. The article traces the history of the Human Genome Project to reveal the international community's race against privatization of the human genome and its resulting qualification as the common heritage of humanity. Further, it reviews the archival records of UNESCO's International Bioethics Committee to discover the rationale for qualifying the human genome as common heritage of humankind. The article finds that the common heritage of mankind framework remains relevant to the application of the human genome at the collective level. However, the framework is at odds with the individual dimension of the human genome based on individual personality rights. The article thus argues that the right to benefit from scientific progress and its applications offers an alternative international regulatory framework for human genomic research.
Collapse
Affiliation(s)
- Faith Kabata
- School of Law, University of KwaZulu-Natal, Durban, South Africa
| | - Donrich Thaldar
- School of Law, University of KwaZulu-Natal, Durban, South Africa
- Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics, Harvard Law School, Cambridge, MA, United States
| |
Collapse
|
19
|
Chrisman B, He C, Jung JY, Stockham N, Paskov K, Washington P, Petereit J, Wall DP. Localizing unmapped sequences with families to validate the Telomere-to-Telomere assembly and identify new hotspots for genetic diversity. Genome Res 2023; 33:1734-1746. [PMID: 37879860 PMCID: PMC10691534 DOI: 10.1101/gr.277175.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2022] [Accepted: 05/25/2023] [Indexed: 10/27/2023]
Abstract
Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.
Collapse
Affiliation(s)
- Brianna Chrisman
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA;
- Nevada Bioinformatics Center, University of Nevada, Reno, Nevada 89557, USA
| | - Chloe He
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
| | - Jae-Yoon Jung
- Department of Pediatrics (Systems Medicine), Stanford University, Stanford, California 94305, USA
| | - Nate Stockham
- Department of Neuroscience, Stanford University, Stanford, California 94305, USA
| | - Kelley Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
| | - Peter Washington
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA
| | - Juli Petereit
- Nevada Bioinformatics Center, University of Nevada, Reno, Nevada 89557, USA
| | - Dennis P Wall
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics (Systems Medicine), Stanford University, Stanford, California 94305, USA
| |
Collapse
|
20
|
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miss-called with different reference genomes. Finally, applying the CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advances of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Collapse
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
| | - Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
| | - Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | | | | | - Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
| | - Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
| | - Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
| |
Collapse
|
21
|
Lee H, Greer SU, Pavlichin DS, Zhou B, Urban AE, Weissman T, Ji HP. Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. CELL REPORTS METHODS 2023; 3:100543. [PMID: 37671027 PMCID: PMC10475782 DOI: 10.1016/j.crmeth.2023.100543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/17/2022] [Revised: 04/14/2023] [Accepted: 07/06/2023] [Indexed: 09/07/2023]
Abstract
The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
Collapse
Affiliation(s)
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stephanie U. Greer
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Dmitri S. Pavlichin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Alexander E. Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
| | - Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
| |
Collapse
|
22
|
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Collapse
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; ,
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; ,
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
Collapse
|
23
|
Rulten SL, Grose RP, Gatz SA, Jones JL, Cameron AJM. The Future of Precision Oncology. Int J Mol Sci 2023; 24:12613. [PMID: 37628794 PMCID: PMC10454858 DOI: 10.3390/ijms241612613] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 08/03/2023] [Accepted: 08/05/2023] [Indexed: 08/27/2023] Open
Abstract
Our understanding of the molecular mechanisms underlying cancer development and evolution have evolved rapidly over recent years, and the variation from one patient to another is now widely recognized. Consequently, one-size-fits-all approaches to the treatment of cancer have been superseded by precision medicines that target specific disease characteristics, promising maximum clinical efficacy, minimal safety concerns, and reduced economic burden. While precision oncology has been very successful in the treatment of some tumors with specific characteristics, a large number of patients do not yet have access to precision medicines for their disease. The success of next-generation precision oncology depends on the discovery of new actionable disease characteristics, rapid, accurate, and comprehensive diagnosis of complex phenotypes within each patient, novel clinical trial designs with improved response rates, and worldwide access to novel targeted anticancer therapies for all patients. This review outlines some of the current technological trends, and highlights some of the complex multidisciplinary efforts that are underway to ensure that many more patients with cancer will be able to benefit from precision oncology in the near future.
Collapse
Affiliation(s)
| | - Richard P. Grose
- Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK; (R.P.G.); (J.L.J.)
| | - Susanne A. Gatz
- Cancer Research UK Clinical Trials Unit (CRCTU), Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK;
| | - J. Louise Jones
- Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK; (R.P.G.); (J.L.J.)
| | - Angus J. M. Cameron
- Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK; (R.P.G.); (J.L.J.)
| |
Collapse
|
24
|
Yu H, Zheng Z, Su J, Lam TW, Luo R. Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP. BMC Bioinformatics 2023; 24:308. [PMID: 37537536 PMCID: PMC10401749 DOI: 10.1186/s12859-023-05434-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Accepted: 07/31/2023] [Indexed: 08/05/2023] Open
Abstract
BACKGROUND With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data. RESULTS We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP . CONCLUSIONS These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications.
Collapse
Affiliation(s)
- Huijing Yu
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
| | - Zhenxian Zheng
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
| | - Junhao Su
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China.
| | - Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China.
| | - Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China.
| |
Collapse
|
25
|
Ma J, Cáceres M, Salmela L, Mäkinen V, Tomescu AI. Chaining for accurate alignment of erroneous long reads to acyclic variation graphs. Bioinformatics 2023; 39:btad460. [PMID: 37494467 PMCID: PMC10423031 DOI: 10.1093/bioinformatics/btad460] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 06/08/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023] Open
Abstract
MOTIVATION Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds. RESULTS We present a new algorithm to co-linearly chain a set of seeds in a string labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm into a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1, 22, and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads, and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19.)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73.)] obtain an accuracy of <60% on this setting. AVAILABILITY AND IMPLEMENTATION GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline can be reached from the previous address.
Collapse
Affiliation(s)
- Jun Ma
- Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
| | - Manuel Cáceres
- Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
| | - Leena Salmela
- Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
| | - Veli Mäkinen
- Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
| | - Alexandru I Tomescu
- Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
| |
Collapse
|
26
|
Houwaart T, Scholz S, Pollock NR, Palmer WH, Kichula KM, Strelow D, Le DB, Belick D, Hülse L, Lautwein T, Wachtmeister T, Wollenweber TE, Henrich B, Köhrer K, Parham P, Guethlein LA, Norman PJ, Dilthey AT. Complete sequences of six major histocompatibility complex haplotypes, including all the major MHC class II structures. HLA 2023; 102:28-43. [PMID: 36932816 PMCID: PMC10986641 DOI: 10.1111/tan.15020] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2022] [Revised: 02/10/2023] [Accepted: 02/24/2023] [Indexed: 03/19/2023]
Abstract
Accurate and comprehensive immunogenetic reference panels are key to the successful implementation of population-scale immunogenomics. The 5Mbp Major Histocompatibility Complex (MHC) is the most polymorphic region of the human genome and associated with multiple immune-mediated diseases, transplant matching and therapy responses. Analysis of MHC genetic variation is severely complicated by complex patterns of sequence variation, linkage disequilibrium and a lack of fully resolved MHC reference haplotypes, increasing the risk of spurious findings on analyzing this medically important region. Integrating Illumina, ultra-long Nanopore, and PacBio HiFi sequencing as well as bespoke bioinformatics, we completed five of the alternative MHC reference haplotypes of the current (GRCh38/hg38) build of the human reference genome and added one other. The six assembled MHC haplotypes encompass the DR1 and DR4 haplotype structures in addition to the previously completed DR2 and DR3, as well as six distinct classes of the structurally variable C4 region. Analysis of the assembled haplotypes showed that MHC class II sequence structures, including repeat element positions, are generally conserved within the DR haplotype supergroups, and that sequence diversity peaks in three regions around HLA-A, HLA-B+C, and the HLA class II genes. Demonstrating the potential for improved short-read analysis, the number of proper read pairs recruited to the MHC was found to be increased by 0.06%-0.49% in a 1000 Genomes Project read remapping experiment with seven diverse samples. Furthermore, the assembled haplotypes can serve as references for the community and provide the basis of a structurally accurate genotyping graph of the complete MHC region.
Collapse
Affiliation(s)
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| | - Stephan Scholz
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| | - Nicholas R. Pollock
- Department of Biomedical InformaticsAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
- Department of Immunology and MicrobiologyAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
| | - William H. Palmer
- Department of Biomedical InformaticsAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
- Department of Immunology and MicrobiologyAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
| | - Katherine M. Kichula
- Department of Biomedical InformaticsAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
- Department of Immunology and MicrobiologyAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
| | - Daniel Strelow
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| | - Duyen B. Le
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| | - Dana Belick
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| | - Lisanna Hülse
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| | - Tobias Lautwein
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ)Genomics & Transcriptomics Laboratory, Heinrich Heine University DüsseldorfDüsseldorfGermany
| | - Thorsten Wachtmeister
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ)Genomics & Transcriptomics Laboratory, Heinrich Heine University DüsseldorfDüsseldorfGermany
| | - Tassilo E. Wollenweber
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ)Genomics & Transcriptomics Laboratory, Heinrich Heine University DüsseldorfDüsseldorfGermany
| | - Birgit Henrich
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| | - Karl Köhrer
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ)Genomics & Transcriptomics Laboratory, Heinrich Heine University DüsseldorfDüsseldorfGermany
| | - Peter Parham
- Department of Structural Biology, and Department of Microbiology and ImmunologyStanford UniversityStanfordCaliforniaUSA
| | - Lisbeth A. Guethlein
- Department of Structural Biology, and Department of Microbiology and ImmunologyStanford UniversityStanfordCaliforniaUSA
| | - Paul J. Norman
- Department of Biomedical InformaticsAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
- Department of Immunology and MicrobiologyAnschutz Medical Campus, University of ColoradoAuroraColoradoUSA
| | - Alexander T. Dilthey
- Institute of Medical Microbiology and Hospital HygieneHeinrich Heine University DüsseldorfDüsseldorfGermany
| |
Collapse
|
27
|
Gao Y, Yang X, Chen H, Tan X, Yang Z, Deng L, Wang B, Kong S, Li S, Cui Y, Lei C, Wang Y, Pan Y, Ma S, Sun H, Zhao X, Shi Y, Yang Z, Wu D, Wu S, Zhao X, Shi B, Jin L, Hu Z, Lu Y, Chu J, Ye K, Xu S. A pangenome reference of 36 Chinese populations. Nature 2023; 619:112-121. [PMID: 37316654 PMCID: PMC10322713 DOI: 10.1038/s41586-023-06173-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Accepted: 05/05/2023] [Indexed: 06/16/2023]
Abstract
Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium, including a collection of 116 high-quality and haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference1. The Chinese Pangenome Consortium data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals are included from underrepresented minority ethnic groups. The missing reference sequences were enriched with archaic-derived alleles and genes that confer essential functions related to keratinization, response to ultraviolet radiation, DNA repair, immunological responses and lifespan, implying great potential for shedding new light on human evolution and recovering missing heritability in complex disease mapping.
Collapse
Affiliation(s)
- Yang Gao
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| | - Hao Chen
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Xinjiang Tan
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Zhaoqing Yang
- Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China
| | - Lian Deng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Baonan Wang
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Shuang Kong
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Songyang Li
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Yuhang Cui
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Chang Lei
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Yimin Wang
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yuwen Pan
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Sen Ma
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Hao Sun
- Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China
| | - Xiaohan Zhao
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Yingbing Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Ziyi Yang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Shaoyuan Wu
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, China
| | - Xingming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Ministry of Education Key (MOE) Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science Fudan University, Shanghai, China
| | - Binyin Shi
- Department of Endocrinology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
| | - Zhibin Hu
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Yan Lu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China.
| | - Jiayou Chu
- Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China.
| | - Kai Ye
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China.
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China.
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, China.
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
| |
Collapse
|
28
|
Smith TPL, Bickhart DM, Boichard D, Chamberlain AJ, Djikeng A, Jiang Y, Low WY, Pausch H, Demyda-Peyrás S, Prendergast J, Schnabel RD, Rosen BD. The Bovine Pangenome Consortium: democratizing production and accessibility of genome assemblies for global cattle breeds and other bovine species. Genome Biol 2023; 24:139. [PMID: 37337218 DOI: 10.1186/s13059-023-02975-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Accepted: 05/19/2023] [Indexed: 06/21/2023] Open
Abstract
The Bovine Pangenome Consortium (BPC) is an international collaboration dedicated to the assembly of cattle genomes to develop a more complete representation of cattle genomic diversity. The goal of the BPC is to provide genome assemblies and a community-agreed pangenome representation to replace breed-specific reference assemblies for cattle genomics. The BPC invites partners sharing our vision to participate in the production of these assemblies and the development of a common, community-approved, pangenome reference as a public resource for the research community ( https://bovinepangenome.github.io/ ). This community-driven resource will provide the context for comparison between studies and the future foundation for cattle genomic selection.
Collapse
Affiliation(s)
- Timothy P L Smith
- US Meat Animal Research Center, USDA-ARS, Clay Center, NE, 68933, USA
| | | | - Didier Boichard
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
| | - Amanda J Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
| | - Appolinaire Djikeng
- Centre for Tropical Livestock Genetics and Health, ILRI Kenya, Nairobi, 30709-00100, Kenya
- Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
| | - Yu Jiang
- Center for Ruminant Genetics and Evolution, Northwest A&F University, Yangling, 712100, China
| | - Wai Y Low
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, SA, 5371, Australia
| | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8092, Zurich, Switzerland
| | - Sebastian Demyda-Peyrás
- Departamento de Producción Animal, Facultad de Ciencias Veterinarias, Universidad Nacional de La Plata, 1900, La Plata, Argentina
- Consejo Superior de Investigaciones Científicas Y Tecnológicas (CONICET), CCT-La Plata, 1900, La Plata, Argentina
| | - James Prendergast
- Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
- The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, UK
| | - Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, 20705, USA.
| |
Collapse
|
29
|
A pangenome reference representative of 36 minority Chinese ethnic groups. Nature 2023:10.1038/d41586-023-01675-w. [PMID: 37316594 DOI: 10.1038/d41586-023-01675-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
|
30
|
Abondio P, Cilli E, Luiselli D. Human Pangenomics: Promises and Challenges of a Distributed Genomic Reference. Life (Basel) 2023; 13:1360. [PMID: 37374141 DOI: 10.3390/life13061360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
A pangenome is a collection of the common and unique genomes that are present in a given species. It combines the genetic information of all the genomes sampled, resulting in a large and diverse range of genetic material. Pangenomic analysis offers several advantages compared to traditional genomic research. For example, a pangenome is not bound by the physical constraints of a single genome, so it can capture more genetic variability. Thanks to the introduction of the concept of pangenome, it is possible to use exceedingly detailed sequence data to study the evolutionary history of two different species, or how populations within a species differ genetically. In the wake of the Human Pangenome Project, this review aims at discussing the advantages of the pangenome around human genetic variation, which are then framed around how pangenomic data can inform population genetics, phylogenetics, and public health policy by providing insights into the genetic basis of diseases or determining personalized treatments, targeting the specific genetic profile of an individual. Moreover, technical limitations, ethical concerns, and legal considerations are discussed.
Collapse
Affiliation(s)
- Paolo Abondio
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
| | - Elisabetta Cilli
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
| | - Donata Luiselli
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
| |
Collapse
|
31
|
Matalon DR, Zepeda-Mendoza CJ, Aarabi M, Brown K, Fullerton SM, Kaur S, Quintero-Rivera F, Vatta M. Clinical, technical, and environmental biases influencing equitable access to clinical genetics/genomics testing: A points to consider statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2023; 25:100812. [PMID: 37058144 DOI: 10.1016/j.gim.2023.100812] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Accepted: 02/07/2023] [Indexed: 04/15/2023] Open
Affiliation(s)
- Dena R Matalon
- Division of Medical Genetics, Department of Pediatrics, Stanford Medicine, Stanford University, Stanford, CA
| | - Cinthya J Zepeda-Mendoza
- Divisions of Hematopathology and Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
| | - Mahmoud Aarabi
- UPMC Medical Genetics and Genomics Laboratories, UPMC Magee-Womens Hospital, Pittsburgh, PA; Departments of Pathology and Obstetrics, Gynecology and Reproductive Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA
| | | | - Stephanie M Fullerton
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA; Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA
| | - Shagun Kaur
- Department of Child Health, Phoenix Children's Hospital, University of Arizona College of Medicine-Phoenix, Phoenix, AZ
| | - Fabiola Quintero-Rivera
- Division of Genetic and Genomic Medicine, Departments of Pathology, Laboratory Medicine, and Pediatrics, University of California Irvine, Irvine, CA
| | | |
Collapse
|
32
|
Yang S, Kim SH, Kang M, Joo JY. Harnessing deep learning into hidden mutations of neurological disorders for therapeutic challenges. Arch Pharm Res 2023:10.1007/s12272-023-01450-5. [PMID: 37261600 DOI: 10.1007/s12272-023-01450-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Accepted: 05/26/2023] [Indexed: 06/02/2023]
Abstract
The relevant study of transcriptome-wide variations and neurological disorders in the evolved field of genomic data science is on the rise. Deep learning has been highlighted utilizing algorithms on massive amounts of data in a human-like manner, and is expected to predict the dependency or druggability of hidden mutations within the genome. Enormous mutational variants in coding and noncoding transcripts have been discovered along the genome by far, despite of the fine-tuned genetic proofreading machinery. These variants could be capable of inducing various pathological conditions, including neurological disorders, which require lifelong care. Several limitations and questions emerge, including the use of conventional processes via limited patient-driven sequence acquisitions and decoding-based inferences as well as how rare variants can be deduced as a population-specific etiology. These puzzles require harnessing of advanced systems for precise disease prediction, drug development and drug applications. In this review, we summarize the pathophysiological discoveries of pathogenic variants in both coding and noncoding transcripts in neurological disorders, and the current advantage of deep learning applications. In addition, we discuss the challenges encountered and how to outperform them with advancing interpretation.
Collapse
Affiliation(s)
- Sumin Yang
- Department of Pharmacy, College of Pharmacy, Hanyang University, Rm 407, Bldg.42, 55 Hanyangdaehak-Ro, Sangnok-Gu Ansan, Ansan, Gyeonggi-Do, 15588, Republic of Korea
| | - Sung-Hyun Kim
- Department of Pharmacy, College of Pharmacy, Hanyang University, Rm 407, Bldg.42, 55 Hanyangdaehak-Ro, Sangnok-Gu Ansan, Ansan, Gyeonggi-Do, 15588, Republic of Korea
| | - Mingon Kang
- Department of Computer Science, University of Nevada, Las Vegas, NV, 89154, USA
| | - Jae-Yeol Joo
- Department of Pharmacy, College of Pharmacy, Hanyang University, Rm 407, Bldg.42, 55 Hanyangdaehak-Ro, Sangnok-Gu Ansan, Ansan, Gyeonggi-Do, 15588, Republic of Korea.
| |
Collapse
|
33
|
Zhuo X, Hsu S, Purushotham D, Kuntala PK, Harrison JK, Du AY, Chen S, Li D, Wang T. Comparing genomic and epigenomic features across species using the WashU Comparative Epigenome Browser. Genome Res 2023; 33:824-835. [PMID: 37156621 PMCID: PMC10317122 DOI: 10.1101/gr.277550.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Accepted: 05/03/2023] [Indexed: 05/10/2023]
Abstract
Genome browsers have become an intuitive and critical tool to visualize and analyze genomic features and data. Conventional genome browsers display data/annotations on a single reference genome/assembly; there are also genomic alignment viewer/browsers that help users visualize alignment, mismatch, and rearrangement between syntenic regions. However, there is a growing need for a comparative epigenome browser that can display genomic and epigenomic data sets across different species and enable users to compare them between syntenic regions. Here, we present the WashU Comparative Epigenome Browser. It allows users to load functional genomic data sets/annotations mapped to different genomes and display them over syntenic regions simultaneously. The browser also displays genetic differences between the genomes from single-nucleotide variants (SNVs) to structural variants (SVs) to visualize the association between epigenomic differences and genetic differences. Instead of anchoring all data sets to the reference genome coordinates, it creates independent coordinates of different genome assemblies to faithfully present features and data mapped to different genomes. It uses a simple, intuitive genome-align track to illustrate the syntenic relationship between different species. It extends the widely used WashU Epigenome Browser infrastructure and can be expanded to support multiple species. This new browser function will greatly facilitate comparative genomic/epigenomic research, as well as support the recent growing needs to directly compare and benchmark the T2T CHM13 assembly and other human genome assemblies.
Collapse
Affiliation(s)
- Xiaoyu Zhuo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Silas Hsu
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Deepak Purushotham
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Prashant Kumar Kuntala
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Jessica K Harrison
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Alan Y Du
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Samuel Chen
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Daofeng Li
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| |
Collapse
|
34
|
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted Phasing of 2-200 Kilobase DNA Fragments with a Short-Read Sequencer and a Single-Tube Linked-Read Library Method. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.05.531179. [PMID: 36945366 PMCID: PMC10028795 DOI: 10.1101/2023.03.05.531179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/11/2023]
Abstract
In the human genome, heterozygous sites are genomic positions with different alleles inherited from each parent. On average, there is a heterozygous site every 1-2 kilobases (kb). Resolving whether two alleles in neighboring heterozygous positions are physically linked-that is, phased-is possible with a short-read sequencer if the sequencing library captures long-range information. TELL-Seq is a library preparation method based on millions of barcoded micro-sized beads that enables instrument-free phasing of a whole human genome in a single PCR tube. TELL-Seq incorporates a unique molecular identifier (barcode) to the short reads generated from the same high-molecular-weight (HMW) DNA fragment (known as 'linked-reads'). However, genome-scale TELL-Seq is not cost-effective for applications focusing on a single locus or a few loci. Here, we present an optimized TELL-Seq protocol that enables the cost-effective phasing of enriched loci (targets) of varying sizes, purity levels, and heterozygosity. Targeted TELL-Seq maximizes linked-read efficiency and library yield while minimizing input requirements, fragment collisions on microbeads, and sequencing burden. To validate the targeted protocol, we phased seven 180-200 kb loci enriched by CRISPR/Cas9-mediated excision coupled with pulse-field electrophoresis, four 20 kb loci enriched by CRISPR/Cas9-mediated protection from exonuclease digestion, and six 2-13 kb loci amplified by PCR. The selected targets have clinical and research relevance (BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PKI3CA). These analyses reveal that targeted TELL-Seq provides a reliable way of phasing allelic variants within targets (2-200 kb in length) with the low cost and high accuracy of short-read sequencing.
Collapse
Affiliation(s)
| | - Madison Rzepka
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
| | | | - Yu Xia
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
| | - Peter L. Chang
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
| | | | - Long Pham
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
| | - Naisarg Modi
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
| | - Likun Yao
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093 USA
| | - Adrian Perez-Agustin
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
| | - Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
| | | | - Ming Lei
- Universal Sequencing Technology Corp., Canton, MA 02021, USA
| | - Yong Wang
- Universal Sequencing Technology Corp., Canton, MA 02021, USA
| | | | - Zhoutao Chen
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
| |
Collapse
|
35
|
Deorowicz S, Danek A, Li H. AGC: compact representation of assembled genomes with fast queries and updates. Bioinformatics 2023; 39:7067744. [PMID: 36864624 PMCID: PMC9994791 DOI: 10.1093/bioinformatics/btad097] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 01/13/2023] [Indexed: 03/04/2023] Open
Abstract
MOTIVATION High-quality sequence assembly is the ultimate representation of complete genetic information of an individual. Several ongoing pangenome projects are producing collections of high-quality assemblies of various species. Each project has already generated assemblies of hundreds of gigabytes on disk, greatly impeding the distribution of and access to such rich datasets. RESULTS Here, we show how to reduce the size of the sequenced genomes by 2-3 orders of magnitude. Our tool compresses the genomes significantly better than the existing programs and is much faster. Moreover, its unique feature is the ability to access any contig (or its part) in a fraction of a second and easily append new samples to the compressed collections. Thanks to this, AGC could be useful not only for backup or transfer purposes but also for routine analysis of pangenome sequences in common pipelines. With the rapidly reduced cost and improved accuracy of sequencing technologies, we anticipate more comprehensive pangenome projects with much larger sample sizes. AGC is likely to become a foundation tool to store, distribute and access pangenome data. AVAILABILITY AND IMPLEMENTATION The source code of AGC is available at https://github.com/refresh-bio/agc. The package can be installed via Bioconda at https://anaconda.org/bioconda/agc. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Sebastian Deorowicz
- Department of Algorithmics and Software, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice 44-100, Poland
| | - Agnieszka Danek
- Department of Algorithmics and Software, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice 44-100, Poland
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| |
Collapse
|
36
|
Marco-Sola S, Eizenga JM, Guarracino A, Paten B, Garrison E, Moreto M. Optimal gap-affine alignment in O(s) space. Bioinformatics 2023; 39:7030690. [PMID: 36749013 PMCID: PMC9940620 DOI: 10.1093/bioinformatics/btad074] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 01/02/2023] [Indexed: 02/08/2023] Open
Abstract
MOTIVATION Pairwise sequence alignment remains a fundamental problem in computational biology and bioinformatics. Recent advances in genomics and sequencing technologies demand faster and scalable algorithms that can cope with the ever-increasing sequence lengths. Classical pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory. The recently proposed wavefront alignment algorithm (WFA) introduced an efficient algorithm to perform exact gap-affine alignment in O(ns) time, where s is the optimal score and n is the sequence length. Notwithstanding these bounds, WFA's O(s2) memory requirements become computationally impractical for genome-scale alignments, leading to a need for further improvement. RESULTS In this article, we present the bidirectional WFA algorithm, the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA's time complexity of O(ns). As a result, this work improves the lowest known memory bound O(n) to compute gap-affine alignments. In practice, our implementation never requires more than a few hundred MBs aligning noisy Oxford Nanopore Technologies reads up to 1 Mbp long while maintaining competitive execution times. AVAILABILITY AND IMPLEMENTATION All code is publicly available at https://github.com/smarco/BiWFA-paper. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain.,Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona 08193, Spain
| | - Jordan M Eizenga
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Andrea Guarracino
- Genomics Research Centre, Human Technopole, Milan 20157, Italy.,Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Miquel Moreto
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain.,Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona 08034, Spain
| |
Collapse
|
37
|
Mostajo-Radji MA. A Latin American perspective on neurodiplomacy. FRONTIERS IN MEDICAL TECHNOLOGY 2023; 4:1005043. [PMID: 36712171 PMCID: PMC9880232 DOI: 10.3389/fmedt.2022.1005043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Accepted: 12/08/2022] [Indexed: 01/15/2023] Open
Affiliation(s)
- Mohammed A. Mostajo-Radji
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States
- Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, CA, United States
| |
Collapse
|
38
|
Dylus D, Altenhoff A, Majidian S, Sedlazeck FJ, Dessimoz C. Read2Tree: scalable and accurate phylogenetic trees from raw reads. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.04.18.488678. [PMID: 36561179 PMCID: PMC9774205 DOI: 10.1101/2022.04.18.488678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
The inference of phylogenetic trees is foundational to biology. However, state-of-the-art phylogenomics requires running complex pipelines, at significant computational and labour costs, with additional constraints in sequencing coverage, assembly and annotation quality. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes. In a benchmark encompassing a broad variety of datasets, our assembly-free approach was 10-100x faster than conventional approaches, and in most cases more accurate-the exception being when sequencing coverage was high and reference species very distant. To illustrate the broad applicability of the tool, we reconstructed a yeast tree of life of 435 species spanning 590 million years of evolution. Applied to Coronaviridae samples, Read2Tree accurately classified highly diverse animal samples and near-identical SARS-CoV-2 sequences on a single tree-thereby exhibiting remarkable breadth and depth. The speed, accuracy, and versatility of Read2Tree enables comparative genomics at scale.
Collapse
Affiliation(s)
- David Dylus
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- present address: F. Hoffmann-La Roche Ltd, Immunology, Infectious Disease, and Ophthalmology (I2O), Roche Pharmaceutical Research and Early Development (pRED), Basel, 4070, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Adrian Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computer Science, ETH, 8092 Zurich, Switzerland
| | - Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, 77005, USA
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E, UK
| |
Collapse
|
39
|
Modzelewski AJ, Gan Chong J, Wang T, He L. Mammalian genome innovation through transposon domestication. Nat Cell Biol 2022; 24:1332-1340. [PMID: 36008480 PMCID: PMC9729749 DOI: 10.1038/s41556-022-00970-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2022] [Accepted: 06/27/2022] [Indexed: 01/13/2023]
Abstract
Since the discovery of transposons, their sheer abundance in host genomes has puzzled many. While historically viewed as largely harmless 'parasitic' DNAs during evolution, transposons are not a mere record of ancient genome invasion. Instead, nearly every element of transposon biology has been integrated into host biology. Here we review how host genome sequences introduced by transposon activities provide raw material for genome innovation and document the distinct evolutionary path of each species.
Collapse
Affiliation(s)
- Andrew J Modzelewski
- Division of Cellular and Developmental Biology, MCB Department, University of California, Berkeley, CA, USA
- Department of Biomedical Sciences, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Johnny Gan Chong
- Division of Cellular and Developmental Biology, MCB Department, University of California, Berkeley, CA, USA
| | - Ting Wang
- Department of Genetics, Edison Family Center for Genome Science and System Biology, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Lin He
- Division of Cellular and Developmental Biology, MCB Department, University of California, Berkeley, CA, USA.
| |
Collapse
|
40
|
Zhou Y, Yang L, Han X, Han J, Hu Y, Li F, Xia H, Peng L, Boschiero C, Rosen BD, Bickhart DM, Zhang S, Guo A, Van Tassell CP, Smith TPL, Yang L, Liu GE. Assembly of a pangenome for global cattle reveals missing sequences and novel structural variations, providing new insights into their diversity and evolutionary history. Genome Res 2022; 32:1585-1601. [PMID: 35977842 PMCID: PMC9435747 DOI: 10.1101/gr.276550.122] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2022] [Accepted: 07/21/2022] [Indexed: 02/03/2023]
Abstract
A cattle pangenome representation was created based on the genome sequences of 898 cattle representing 57 breeds. The pangenome identified 83 Mb of sequence not found in the cattle reference genome, representing 3.1% novel sequence compared with the 2.71-Gb reference. A catalog of structural variants developed from this cattle population identified 3.3 million deletions, 0.12 million inversions, and 0.18 million duplications. Estimates of breed ancestry and hybridization between cattle breeds using insertion/deletions as markers were similar to those produced by single nucleotide polymorphism-based analysis. Hundreds of deletions were observed to have stratification based on subspecies and breed. For example, an insertion of a Bov-tA1 repeat element was identified in the first intron of the APPL2 gene and correlated with cattle breed geographic distribution. This insertion falls within a segment overlapping predicted enhancer and promoter regions of the gene, and could affect important traits such as immune response, olfactory functions, cell proliferation, and glucose metabolism in muscle. The results indicate that pangenomes are a valuable resource for studying diversity and evolutionary history, and help to delineate how domestication, trait-based breeding, and adaptive introgression have shaped the cattle genome.
Collapse
Affiliation(s)
- Yang Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Lv Yang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Xiaotao Han
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Jiazheng Han
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Yan Hu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Fan Li
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Han Xia
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Lingwei Peng
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Clarissa Boschiero
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, Maryland 20705, USA
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, Maryland 20705, USA
| | - Derek M Bickhart
- Dairy Forage Research Center, ARS USDA, Madison, Wisconsin 53706, USA
| | - Shujun Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Aizhen Guo
- The State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China
| | - Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, Maryland 20705, USA
| | - Timothy P L Smith
- U.S. Meat Animal Research Center, ARS USDA, Clay Center, Nebraska 68933, USA
| | - Liguo Yang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - George E Liu
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, Maryland 20705, USA
| |
Collapse
|
41
|
Löytynoja A. Thousands of human mutation clusters are explained by short-range template switching. Genome Res 2022; 32:1437-1447. [PMID: 35760560 PMCID: PMC9435742 DOI: 10.1101/gr.276478.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Accepted: 06/21/2022] [Indexed: 02/03/2023]
Abstract
Variation within human genomes is unevenly distributed, and variants show spatial clustering. DNA replication-related template switching is a poorly known mutational mechanism capable of causing major chromosomal rearrangements as well as creating short inverted sequence copies that appear as local mutation clusters in sequence comparisons. In this study, haplotype-resolved genome assemblies representing 25 human populations and multinucleotide variants aggregated from 140,000 human sequencing experiments were reanalyzed. Local template switching could explain thousands of complex mutation clusters across the human genome, the loci segregating within and between populations. During the study, computational tools were developed for identification of template switch events using both short-read sequencing data and genotype data, and for genotyping candidate loci using short-read data. The characteristics of template-switch mutations complicate their detection, and widely used analysis pipelines for short-read sequencing data, normally capable of identifying single nucleotide changes, were found to miss template-switch mutations of tens of base pairs, potentially invalidating medical genetic studies searching for a causative allele behind genetic diseases. Combined with the massive sequencing data now available for humans, the novel tools described here enable building catalogs of affected loci and studying the cellular mechanisms behind template switching in both healthy organisms and disease.
Collapse
Affiliation(s)
- Ari Löytynoja
- Institute of Biotechnology, University of Helsinki, FI-00014 Helsinki, Finland
| |
Collapse
|
42
|
Doyle JJ. Cell types as species: Exploring a metaphor. FRONTIERS IN PLANT SCIENCE 2022; 13:868565. [PMID: 36072310 PMCID: PMC9444152 DOI: 10.3389/fpls.2022.868565] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/02/2022] [Accepted: 07/29/2022] [Indexed: 06/05/2023]
Abstract
The concept of "cell type," though fundamental to cell biology, is controversial. Cells have historically been classified into types based on morphology, physiology, or location. More recently, single cell transcriptomic studies have revealed fine-scale differences among cells with similar gross phenotypes. Transcriptomic snapshots of cells at various stages of differentiation, and of cells under different physiological conditions, have shown that in many cases variation is more continuous than discrete, raising questions about the relationship between cell type and cell state. Some researchers have rejected the notion of fixed types altogether. Throughout the history of discussions on cell type, cell biologists have compared the problem of defining cell type with the interminable and often contentious debate over the definition of arguably the most important concept in systematics and evolutionary biology, "species." In the last decades, systematics, like cell biology, has been transformed by the increasing availability of molecular data, and the fine-grained resolution of genetic relationships have generated new ideas about how that variation should be classified. There are numerous parallels between the two fields that make exploration of the "cell types as species" metaphor timely. These parallels begin with philosophy, with discussion of both cell types and species as being either individuals, groups, or something in between (e.g., homeostatic property clusters). In each field there are various different types of lineages that form trees or networks that can (and in some cases do) provide criteria for grouping. Developing and refining models for evolutionary divergence of species and for cell type differentiation are parallel goals of the two fields. The goal of this essay is to highlight such parallels with the hope of inspiring biologists in both fields to look for new solutions to similar problems outside of their own field.
Collapse
Affiliation(s)
- Jeff J. Doyle
- Section of Plant Biology and Section of Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| |
Collapse
|
43
|
Resende CMM, Muniz MTC. Hardy-Weinberg Disequilibrium: could it bring conclusive results in studies that link genetic markers and risk to diseases development? Nutrition 2022; 103-104:111792. [DOI: 10.1016/j.nut.2022.111792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2022] [Revised: 06/14/2022] [Accepted: 06/27/2022] [Indexed: 11/26/2022]
|
44
|
A complete, telomere-to-telomere human genome sequence presents new opportunities for evolutionary genomics. Nat Methods 2022; 19:635-638. [PMID: 35689027 DOI: 10.1038/s41592-022-01512-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
45
|
Quan C, Lu H, Lu Y, Zhou G. Population-scale genotyping of structural variation in the era of long-read sequencing. Comput Struct Biotechnol J 2022; 20:2639-2647. [PMID: 35685364 PMCID: PMC9163579 DOI: 10.1016/j.csbj.2022.05.047] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 05/24/2022] [Accepted: 05/24/2022] [Indexed: 11/29/2022] Open
Abstract
Population-scale studies of structural variation (SV) are growing rapidly worldwide with the development of long-read sequencing technology, yielding a considerable number of novel SVs and complete gap-closed genome assemblies. Herein, we highlight recent studies using a hybrid sequencing strategy and present the challenges toward large-scale genotyping for SVs due to the reference bias. Genotyping SVs at a population scale remains challenging, which severely impacts genotype-based population genetic studies or genome-wide association studies of complex diseases. We summarize academic efforts to improve genotype quality through linear or graph representations of reference and alternative alleles. Graph-based genotypers capable of integrating diverse genetic information are effectively applied to large and diverse cohorts, contributing to unbiased downstream analysis. Meanwhile, there is still an urgent need in this field for efficient tools to construct complex graphs and perform sequence-to-graph alignments.
Collapse
Affiliation(s)
- Cheng Quan
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
| | - Hao Lu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
| | - Yiming Lu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Hebei University, Baoding, Hebei Province 071002, PR China
| | - Gangqiao Zhou
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Collaborative Innovation Center for Personalized Cancer Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province 211166, PR China
- Medical College of Guizhou University, Guiyang, Guizhou Province 550025, PR China
- Hebei University, Baoding, Hebei Province 071002, PR China
| |
Collapse
|
46
|
Brand CM, Colbran LL, Capra JA. Predicting Archaic Hominin Phenotypes from Genomic Data. Annu Rev Genomics Hum Genet 2022; 23:591-612. [PMID: 35440148 DOI: 10.1146/annurev-genom-111521-121903] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Ancient DNA provides a powerful window into the biology of extant and extinct species, including humans' closest relatives: Denisovans and Neanderthals. Here, we review what is known about archaic hominin phenotypes from genomic data and how those inferences have been made. We contend that understanding the influence of variants on lower-level molecular phenotypes-such as gene expression and protein function-is a promising approach to using ancient DNA to learn about archaic hominin traits. Molecular phenotypes have simpler genetic architectures than organism-level complex phenotypes, and this approach enables moving beyond association studies by proposing hypotheses about the effects of archaic variants that are testable in model systems. The major challenge to understanding archaic hominin phenotypes is broadening our ability to accurately map genotypes to phenotypes, but ongoing advances ensure that there will be much more to learn about archaic hominin phenotypes from their genomes. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 23 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Collapse
Affiliation(s)
- Colin M Brand
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA; , .,Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA
| | - Laura L Colbran
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - John A Capra
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA; , .,Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA
| |
Collapse
|
47
|
Abstract
A near-complete sequence outlines a path for a more inclusive reference.
Collapse
|
48
|
Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, Cox E, Davidson C, Ermolaeva O, Farrell CM, Fatima R, Gil L, Goldfarb T, Gonzalez JM, Haddad D, Hardy M, Hunt T, Jackson J, Joardar VS, Kay M, Kodali VK, McGarvey KM, McMahon A, Mudge JM, Murphy DN, Murphy MR, Rajput B, Rangwala SH, Riddick LD, Thibaud-Nissen F, Threadgold G, Vatsan AR, Wallin C, Webb D, Flicek P, Birney E, Pruitt KD, Frankish A, Cunningham F, Murphy TD. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 2022; 604:310-315. [PMID: 35388217 PMCID: PMC9007741 DOI: 10.1038/s41586-022-04558-8] [Citation(s) in RCA: 218] [Impact Index Per Article: 72.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2021] [Accepted: 02/07/2022] [Indexed: 12/25/2022]
Abstract
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE1 and RefSeq2 launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.
Collapse
Affiliation(s)
- Joannella Morales
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Alex Astashyn
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Eric Cox
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Olga Ermolaeva
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Laurent Gil
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Tamara Goldfarb
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Jose M Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - John Jackson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Vinita S Joardar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Michael Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Vamsi K Kodali
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Kelly M McGarvey
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Aoife McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Daniel N Murphy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Michael R Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Bhanu Rajput
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Sanjida H Rangwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Lillian D Riddick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Glen Threadgold
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Anjana R Vatsan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Craig Wallin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - David Webb
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
49
|
Gershman A, Sauria MEG, Guitart X, Vollger MR, Hook PW, Hoyt SJ, Jain M, Shumate A, Razaghi R, Koren S, Altemose N, Caldas GV, Logsdon GA, Rhie A, Eichler EE, Schatz MC, O'Neill RJ, Phillippy AM, Miga KH, Timp W. Epigenetic patterns in a complete human genome. Science 2022; 376:eabj5089. [PMID: 35357915 PMCID: PMC9170183 DOI: 10.1126/science.abj5089] [Citation(s) in RCA: 140] [Impact Index Per Article: 46.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.
Collapse
Affiliation(s)
- Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | - Michael E G Sauria
- Department of Biology and Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Savannah J Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Roham Razaghi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nicolas Altemose
- Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA
| | - Gina V Caldas
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley CA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Michael C Schatz
- Department of Biology and Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| |
Collapse
|
50
|
Altemose N, Glennis A, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Citation(s) in RCA: 242] [Impact Index Per Article: 80.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Collapse
Affiliation(s)
- Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - A. Glennis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Pragya Sidhwani
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Sasha A. Langley
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Lev Uralsky
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
| | | | - Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | | | | | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | | | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT. USA
| | | | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Fedor Gusev
- Vavilov Institute of General Genetics, Moscow, Russia
| | - Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sofie R. Salama
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Gary H. Karpen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Abby F. Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
| | | | - Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT. USA
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | - Rachel J. O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Charles H. Langley
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
| | - Ivan A. Alexandrov
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| |
Collapse
|