1
Achom M, Sadagopan A, Bao C, McBride F, Li J, Konda P, Tourdot RW, Xu Q, Nakhoul M, Gallant DS, Ahmed UA, O'Toole J, Freeman D, Lee GSM, Hecht JL, Kauffman EC, Einstein DJ, Choueiri TK, Zhang CZ, Viswanathan SR. A genetic basis for sex differences in Xp11 translocation renal cell carcinoma. Cell 2024; 187:5735-5752.e25. [PMID: 39168126 PMCID: PMC11455617 DOI: 10.1016/j.cell.2024.07.038] [Received: 09/15/2023] [Revised: 06/21/2024] [Accepted: 07/23/2024] [Indexed: 08/23/2024]
Abstract
Xp11 translocation renal cell carcinoma (tRCC) is a rare, female-predominant cancer driven by a fusion between the transcription factor binding to IGHM enhancer 3 (TFE3) gene on chromosome Xp11.2 and a partner gene on either chromosome X (chrX) or an autosome. It remains unknown what types of rearrangements underlie TFE3 fusions, whether fusions can arise from both the active (chrXa) and inactive X (chrXi) chromosomes, and whether TFE3 fusions from chrXi translocations account for the female predominance of tRCC. To address these questions, we performed haplotype-specific analyses of chrX rearrangements in tRCC whole genomes. We show that TFE3 fusions universally arise as reciprocal translocations and that oncogenic TFE3 fusions can arise from chrXi:autosomal translocations. Female-specific chrXi:autosomal translocations result in a 2:1 female-to-male ratio of TFE3 fusions involving autosomal partner genes and account for the female predominance of tRCC. Our results highlight how X chromosome genetics constrains somatic chrX alterations and underlies cancer sex differences.
Affiliation(s)
- Mingkee Achom
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Harvard Medical School, Boston, MA 02215, USA
- Ananthan Sadagopan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Chunyang Bao
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02215, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Fiona McBride
- Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, MA 02215, USA
- Jiao Li
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Harvard Medical School, Boston, MA 02215, USA
- Prathyusha Konda
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Harvard Medical School, Boston, MA 02215, USA
- Richard W Tourdot
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, MA 02215, USA
- Qingru Xu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Maria Nakhoul
- Department of Informatics & Analytics, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Daniel S Gallant
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Usman Ali Ahmed
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Jillian O'Toole
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Dory Freeman
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Gwo-Shu Mary Lee
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Jonathan L Hecht
- Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Eric C Kauffman
- Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14203, USA
- David J Einstein
- Division of Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Toni K Choueiri
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Harvard Medical School, Boston, MA 02215, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA
- Cheng-Zhong Zhang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02215, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Srinivas R Viswanathan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Medicine, Harvard Medical School, Boston, MA 02215, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA.
2
Sharp JA, Sparago E, Thomas R, Alimenti K, Wang W, Blower MD. Role of the SAF-A SAP domain in X inactivation, transcription, splicing, and cell proliferation. bioRxiv 2024:2024.09.09.612041. [PMID: 39314300 PMCID: PMC11419091 DOI: 10.1101/2024.09.09.612041] [Indexed: 09/25/2024]
Abstract
SAF-A is conserved throughout vertebrates and has emerged as an important factor regulating a multitude of nuclear functions, including lncRNA localization, gene expression, and splicing. SAF-A has several functional domains, including an N-terminal SAP domain that binds directly to DNA. Phosphorylation of SAP domain serines S14 and S26 is important for SAF-A localization and function during mitosis; however, whether these serines are involved in interphase functions of SAF-A is not known. In this study we tested the role of the SAP domain, and of SAP domain serines S14 and S26, in X chromosome inactivation, protein dynamics, gene expression, splicing, and cell proliferation. Here we show that the SAP domain serines S14 and S26 are required to maintain XIST RNA localization and polycomb-dependent histone modifications on the inactive X chromosome in female cells. In addition, we present evidence that an Xi localization signal resides in the SAP domain. We found that the SAP domain is not required to maintain gene expression and plays only a minor role in mRNA splicing. In contrast, the SAF-A SAP domain, in particular serines S14 and S26, is required for normal protein dynamics and to maintain normal cell proliferation. We propose a model whereby dynamic phosphorylation of SAF-A serines S14 and S26 mediates rapid turnover of SAF-A interactions with DNA during interphase.
Affiliation(s)
- Judith A. Sharp
- Department of Biochemistry and Cell Biology, Chobanian and Avedisian School of Medicine, Boston University, 72 E. Concord St, K112, Boston, MA 02118
- Emily Sparago
- Department of Biochemistry and Cell Biology, Chobanian and Avedisian School of Medicine, Boston University, 72 E. Concord St, K112, Boston, MA 02118
- Rachael Thomas
- Department of Biochemistry and Cell Biology, Chobanian and Avedisian School of Medicine, Boston University, 72 E. Concord St, K112, Boston, MA 02118
- Kaitlyn Alimenti
- Department of Biochemistry and Cell Biology, Chobanian and Avedisian School of Medicine, Boston University, 72 E. Concord St, K112, Boston, MA 02118
- Wei Wang
- Department of Biochemistry and Cell Biology, Chobanian and Avedisian School of Medicine, Boston University, 72 E. Concord St, K112, Boston, MA 02118
- Michael D. Blower
- Department of Biochemistry and Cell Biology, Chobanian and Avedisian School of Medicine, Boston University, 72 E. Concord St, K112, Boston, MA 02118
3
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet 2024; 25:658-670. [PMID: 38649458 DOI: 10.1038/s41576-024-00718-w] [Accepted: 02/27/2024] [Indexed: 04/25/2024]
Abstract
Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.
Affiliation(s)
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Richard Durbin
- Department of Genetics, Cambridge University, Cambridge, UK.
4
Zhang CZ, Pellman D. Chromosome breakage-replication/fusion enables rapid DNA amplification. bioRxiv 2024:2024.08.17.608415. [PMID: 39229211 PMCID: PMC11370323 DOI: 10.1101/2024.08.17.608415] [Indexed: 09/05/2024]
Abstract
DNA rearrangements are thought to arise from two classes of processes. The first class involves DNA breakage and fusion ("cut-and-paste") without net DNA gain or loss. The second class involves aberrant DNA replication ("copy-and-paste") and can produce either net DNA gain or loss. We previously demonstrated that the partitioning of chromosomes into aberrant structures of the nucleus, micronuclei or chromosome bridges, can generate cut-and-paste rearrangements by chromosome fragmentation and ligation. Surprisingly, in the progeny clones of single cells that have undergone chromosome bridge breakage, we identified large segmental duplications and short sequence insertions that are commonly attributed to copy-and-paste processes. Here, we demonstrate that both large duplications and short insertions are inherent outcomes of the replication and fusion of unligated DNA ends, a process we term breakage-replication/fusion (B-R/F). We propose that B-R/F provides a unifying explanation for complex rearrangement patterns including chromothripsis and chromoanasynthesis and enables rapid DNA amplification after chromosome fragmentation.
5
Xiang Z, Liu Z, Dinh KN. Inference of chromosome selection parameters and missegregation rate in cancer from DNA-sequencing data. Sci Rep 2024; 14:17699. [PMID: 39085295 PMCID: PMC11291923 DOI: 10.1038/s41598-024-67842-9] [Received: 04/18/2024] [Accepted: 07/16/2024] [Indexed: 08/02/2024]
Abstract
Aneuploidy is frequently observed in cancers and has been linked to poor patient outcome. Analysis of aneuploidy in DNA-sequencing (DNA-seq) data necessitates untangling the effects of the Copy Number Aberration (CNA) occurrence rates and the selection coefficients that act upon the resulting karyotypes. We introduce a parameter inference algorithm that takes advantage of both bulk and single-cell DNA-seq cohorts. The method is based on Approximate Bayesian Computation (ABC) and utilizes CINner, our recently introduced simulation algorithm of chromosomal instability in cancer. We examine three groups of statistics to summarize the data in the ABC routine: (A) Copy Number-based measures, (B) phylogeny tip statistics, and (C) phylogeny balance indices. Using these statistics, our method can recover both the CNA probabilities and selection parameters from ground truth data, and performs well even for data cohorts of relatively small sizes. We find that only statistics in groups A and C are well-suited for identifying CNA probabilities, and only group A carries the signals for estimating selection parameters. Moreover, the low number of CNA events at large scale compared to cell counts in single-cell samples means that statistics in group B cannot be estimated accurately using phylogeny reconstruction algorithms at the chromosome level. As data from both bulk and single-cell DNA-sequencing techniques becomes increasingly available, our inference framework promises to facilitate the analysis of distinct cancer types, differentiation between selection and neutral drift, and prediction of cancer clonal dynamics.
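The Approximate Bayesian Computation idea named in this abstract can be illustrated with a minimal rejection-sampling sketch. This is not CINner or the authors' inference routine: the toy simulator, the uniform prior on the rate, the single summary statistic (mean event count per cell), and every function and parameter name below are assumptions made purely for illustration.

```python
import random


def simulate_event_counts(rate, n_cells, n_divisions, rng):
    """Toy simulator (hypothetical): each cell divides n_divisions times and
    each division missegregates independently with probability `rate`.
    Returns the number of missegregation events per cell."""
    return [
        sum(rng.random() < rate for _ in range(n_divisions))
        for _ in range(n_cells)
    ]


def summary_statistic(counts):
    """Single summary statistic: mean number of events per cell."""
    return sum(counts) / len(counts)


def abc_rejection(observed_summary, n_draws, tolerance,
                  n_cells, n_divisions, seed=0):
    """Basic ABC rejection sampler.

    Draws a rate from an assumed Uniform(0, 0.2) prior, simulates data,
    and keeps the draw when the simulated summary lies within `tolerance`
    of the observed summary. The kept draws approximate the posterior."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.2)
        simulated = simulate_event_counts(rate, n_cells, n_divisions, rng)
        if abs(summary_statistic(simulated) - observed_summary) <= tolerance:
            accepted.append(rate)
    return accepted
```

Calling `abc_rejection(observed_summary=1.0, n_draws=1000, tolerance=0.1, n_cells=100, n_divisions=20)` returns the accepted rates; with 20 divisions per cell, an observed mean of one event per cell should concentrate them near 0.05. A realistic analysis would combine several summary statistics, as the abstract describes, rather than one.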
Affiliation(s)
- Zijin Xiang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Zhihan Liu
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Khanh N Dinh
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA.
6
Chen Y, Huang JH, Sun Y, Zhang Y, Li Y, Xu X. Haplotype-resolved assembly of diploid and polyploid genomes using quantum computing. Cell Rep Methods 2024; 4:100754. [PMID: 38614089 PMCID: PMC11133727 DOI: 10.1016/j.crmeth.2024.100754] [Received: 10/08/2023] [Revised: 01/03/2024] [Accepted: 03/20/2024] [Indexed: 04/15/2024]
Abstract
Precision medicine's emphasis on individual genetic variants highlights the importance of haplotype-resolved assembly, a computational challenge in bioinformatics given its combinatorial nature. While classical algorithms have made strides in addressing this issue, the potential of quantum computing remains largely untapped. Here, we present the vehicle routing problem (VRP) assembler: an approach that transforms this task into a vehicle routing problem, an optimization formulation solvable on a quantum computer. We demonstrate its potential and feasibility through a proof of concept on short synthetic diploid and triploid genomes using a D-Wave quantum annealer. To tackle larger-scale assembly problems, we integrate the VRP assembler with Google's OR-Tools, achieving a haplotype-resolved local assembly across the human major histocompatibility complex (MHC) region. Our results show encouraging performance compared to Hifiasm with phasing accuracy approaching the theoretical limit, underscoring the promising future of quantum computing in bioinformatics.
Affiliation(s)
- Yibo Chen
- BGI Research, Shenzhen 518083, China
- Yuhui Sun
- BGI Research, Shenzhen 518083, China
- Yong Zhang
- BGI Research, Wuhan 430047, China; Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen 518083, China.
- Yuxiang Li
- BGI Research, Wuhan 430047, China; Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen 518083, China.
- Xun Xu
- BGI Research, Shenzhen 518083, China; BGI Research, Wuhan 430047, China.
7
Volpe E, Corda L, Di Tommaso E, Pelliccia F, Ottalevi R, Licastro D, Guarracino A, Capulli M, Formenti G, Tassone E, Giunta S. The complete diploid reference genome of RPE-1 identifies human phased epigenetic landscapes. bioRxiv 2023:2023.11.01.565049. [PMID: 38168337 PMCID: PMC10760208 DOI: 10.1101/2023.11.01.565049] [Indexed: 01/05/2024]
Abstract
Comparative analysis of recent human genome assemblies highlights profound sequence divergence that peaks within polymorphic loci such as centromeres. This raises the question of whether human reference genomes are adequate for accurately analyzing sequencing data derived from experimental cell lines. Here, we generated the complete diploid genome assembly for human retinal epithelial cells (RPE-1), a widely used non-cancer laboratory cell line with a stable karyotype, to use as a matched reference for multi-omics sequencing data analysis. Our RPE1v1.0 assembly presents completely phased haplotypes and chromosome-level scaffolds that span centromeres with ultra-high base accuracy (>QV60). We mapped the haplotype-specific genomic variation unique to this cell line, including t(Xq;10q), a stable 73.18 Mb duplication of chromosome 10 translocated onto the microdeleted chromosome X telomere. Polymorphisms between haplotypes of the same genome reveal genetic and epigenetic variation for all chromosomes, especially at centromeres. Using the RPE-1 assembly as a matched reference genome improves the mapping quality of multi-omics reads originating from RPE-1 cells, with a drastic reduction in alignment mismatches compared to using the most complete human reference to date (CHM13). Leveraging the accuracy achieved using a matched reference, we were able to identify the kinetochore sites at base-pair resolution and show unprecedented variation between haplotypes. This work showcases the use of matched reference genomes for multi-omics analyses and serves as the foundation for a call to comprehensively assemble experimentally relevant cell lines for widespread application.
Affiliation(s)
- Emilia Volpe
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Luca Corda
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Elena Di Tommaso
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Franca Pelliccia
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Riccardo Ottalevi
- Department of Bioinformatic, Dante Genomics Corp Inc., 667 Madison Avenue, New York, NY 10065 USA and S.s.17, 67100, L’Aquila, Italy
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Mattia Capulli
- Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
- Giulio Formenti
- The Rockefeller University, 1230 York Avenue, 10065 New York, USA
- Evelyne Tassone
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Simona Giunta
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
8
Höjer P, Frick T, Siga H, Pourbozorgi P, Aghelpasand H, Martin M, Ahmadian A. BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies. Nucleic Acids Res 2023; 51:e114. [PMID: 37941142 PMCID: PMC10711428 DOI: 10.1093/nar/gkad1010] [Received: 11/09/2022] [Revised: 10/04/2023] [Accepted: 10/18/2023] [Indexed: 11/10/2023]
Abstract
Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-Seq and stLFR. Running BLR on DBS linked reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13,616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when considering switch errors. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
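The switch error rate quoted in this abstract can be made concrete with a small sketch. It uses one common definition (the fraction of adjacent heterozygous-site pairs whose relative phase disagrees with a truth set inside a single phase block). It is not taken from the BLR pipeline; the function name and the 0/1 haplotype encoding are assumptions for illustration.

```python
def switch_error_rate(truth, predicted):
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    `truth` and `predicted` are equal-length lists of 0/1 labels giving,
    for each heterozygous site in one phase block, which haplotype
    carries the alternate allele. Only relative phase matters, so a
    prediction that is the exact complement of the truth has no
    switch errors."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if len(truth) < 2:
        return 0.0
    # True where the predicted label matches the truth at that site.
    agree = [t == p for t, p in zip(truth, predicted)]
    # A switch occurs wherever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(truth) - 1)
```

Under this definition a single long-range switch counts once, whereas an isolated flipped site counts as two consecutive switches; some benchmarking tools report those two cases separately.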
Affiliation(s)
- Pontus Höjer
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Tobias Frick
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Humam Siga
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Parham Pourbozorgi
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Hooman Aghelpasand
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Marcel Martin
- Stockholm University, Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Afshin Ahmadian
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
9
Bao C, Tourdot RW, Brunette GJ, Stewart C, Sun L, Baba H, Watanabe M, Agoston AT, Jajoo K, Davison JM, Nason KS, Getz G, Wang KK, Imamura Y, Odze R, Bass AJ, Stachler MD, Zhang CZ. Genomic signatures of past and present chromosomal instability in Barrett's esophagus and early esophageal adenocarcinoma. Nat Commun 2023; 14:6203. [PMID: 37794034 PMCID: PMC10550953 DOI: 10.1038/s41467-023-41805-6] [Received: 08/01/2023] [Accepted: 09/18/2023] [Indexed: 10/06/2023]
Abstract
The progression of precancerous lesions to malignancy is often accompanied by increasing complexity of chromosomal alterations, but how these alterations arise is poorly understood. Here we perform haplotype-specific analysis of chromosomal copy-number evolution in the progression of Barrett's esophagus (BE) to esophageal adenocarcinoma (EAC) on multiregional whole-genome sequencing data of BE with dysplasia and microscopic EAC foci. We identify distinct patterns of copy-number evolution indicating multigenerational chromosomal instability that is initiated by cell division errors but propagated only after p53 loss. While abnormal mitosis, including whole-genome duplication, underlies chromosomal copy-number changes, segmental alterations display signatures of successive breakage-fusion-bridge cycles and chromothripsis of unstable dicentric chromosomes. Our analysis elucidates how multigenerational chromosomal instability generates copy-number variation in BE cells, precipitates complex alterations including DNA amplifications, and promotes their independent clonal expansion and transformation. In particular, we suggest sloping copy-number variation as a signature of ongoing chromosomal instability that precedes copy-number complexity. These findings suggest that copy-number heterogeneity in advanced cancers originates from chromosomal instability in precancerous cells and that such instability may be identified from the presence of sloping copy-number variation in bulk sequencing data.
Affiliation(s)
- Chunyang Bao
- Department of Medical Oncology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA
- Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA
- Department of Pathology, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, USA
- Cancer Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
- Richard W Tourdot
- Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA
- Cancer Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
- Department of Biomedical Informatics, Blavatnik Institute of Harvard Medical School, 10 Shattuck St, Boston, MA, 02115, USA
- Gregory J Brunette
- Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA
- Department of Biomedical Informatics, Blavatnik Institute of Harvard Medical School, 10 Shattuck St, Boston, MA, 02115, USA
- Chip Stewart
- Cancer Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
- Lili Sun
- Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA
- Single-Cell Sequencing Program, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA
- Hideo Baba
- Department of Gastroenterological Surgery, Graduate School of Medical Sciences, Kumamoto University, 2 Chome-40-1 Kurokami, Chuo Ward, Kumamoto, Japan
- Masayuki Watanabe
- Department of Gastroenterological Surgery, Cancer Institute Hospital of Japanese Foundation of Cancer Research, 3-8-31 Ariake, Koto, Tokyo, Japan
- Agoston T Agoston
- Department of Pathology, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, USA
- Kunal Jajoo
- Division of Gastroenterology, Department of Medicine, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, USA
- Jon M Davison
- Department of Pathology, University of Pittsburgh School of Medicine, 200 Lothrop Street, Pittsburgh, PA, 15213, USA
- Katie S Nason
- Department of Surgery, Baystate Medical Center, University of Massachusetts Medical School, 759 Chestnut St, Springfield, MA, 01107, USA
- Gad Getz
- Department of Pathology, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, USA
- Kenneth K Wang
- Division of Gastroenterology and Hepatology, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
- Yu Imamura
- Department of Gastroenterological Surgery, Cancer Institute Hospital of Japanese Foundation of Cancer Research, 3-8-31 Ariake, Koto, Tokyo, Japan
- Robert Odze
- Department of Pathology, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, USA
- Department of Pathology and Lab Medicine, Tufts University School of Medicine, 145 Harrison Ave, Boston, MA, 02111, USA
- Adam J Bass
- Department of Medical Oncology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA.
- Cancer Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA.
- Novartis Institutes for Biomedical Research, Cambridge, MA, USA.
- Matthew D Stachler
- Department of Medical Oncology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA.
- Department of Pathology, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, USA.
- Cancer Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA.
- Department of Pathology, University of California, San Francisco. 513 Parnassus Ave, San Francisco, CA, 94143, USA.
- Cheng-Zhong Zhang
- Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA.
- Department of Pathology, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, USA.
- Cancer Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA.
10
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. arXiv 2023:arXiv:2308.07877v1. [PMID: 37645045 PMCID: PMC10462168] [Indexed: 08/31/2023]
Abstract
De novo assembly is the process of reconstructing the genome sequence of an organism from sequencing reads. Genome sequences are essential to biology, and assembly has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best but technological advances in long-read sequencing now enable near complete chromosome-level assembly, also known as telomere-to-telomere assembly, for many organisms. Here we review recent progress on assembly algorithms and protocols. We focus on how to derive near telomere-to-telomere assemblies and discuss potential future developments.
Affiliation(s)
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard Durbin
- Department of Genetics, Cambridge University, Cambridge, UK
11
Achom M, Sadagopan A, Bao C, McBride F, Xu Q, Konda P, Tourdot RW, Li J, Nakhoul M, Gallant DS, Ahmed UA, O’Toole J, Freeman D, Lee GSM, Hecht JL, Kauffman EC, Einstein DJ, Choueiri TK, Zhang CZ, Viswanathan SR. A genetic basis for cancer sex differences revealed in Xp11 translocation renal cell carcinoma. bioRxiv 2023:2023.08.04.552029. [PMID: 37577497 PMCID: PMC10418269 DOI: 10.1101/2023.08.04.552029] [Indexed: 08/15/2023]
Abstract
Xp11 translocation renal cell carcinoma (tRCC) is a female-predominant kidney cancer driven by translocations between the TFE3 gene on chromosome Xp11.2 and partner genes located on either chrX or on autosomes. The rearrangement processes that underlie TFE3 fusions, and whether they are linked to the female sex bias of this cancer, are largely unexplored. Moreover, whether oncogenic TFE3 fusions arise from both the active and inactive X chromosomes in females remains unknown. Here we address these questions by haplotype-specific analyses of whole-genome sequences of 29 tRCC samples from 15 patients and by re-analysis of 145 published tRCC whole-exome sequences. We show that TFE3 fusions universally arise as reciprocal translocations with minimal DNA loss or insertion at paired break ends. Strikingly, we observe a near exact 2:1 female:male ratio in TFE3 fusions arising via X:autosomal translocation (but not via X inversion), which accounts for the female predominance of tRCC. This 2:1 ratio is at least partially attributable to oncogenic fusions involving the inactive X chromosome and is accompanied by partial re-activation of silenced chrX genes on the rearranged chromosome. Our results highlight how somatic alterations involving the X chromosome place unique constraints on tumor initiation and exemplify how genetic rearrangements of the sex chromosomes can underlie cancer sex differences.
Affiliation(s)
- Mingkee Achom
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Data Science, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Medicine, Harvard Medical School; Boston, MA, USA
- Ananthan Sadagopan
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
- Chunyang Bao
  - Department of Data Science, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Pathology, Brigham and Women’s Hospital; Boston, MA, USA
  - Cancer Program, Broad Institute of MIT and Harvard; Cambridge, MA, USA
- Fiona McBride
  - Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School; Boston, MA, USA
- Qingru Xu
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Data Science, Dana-Farber Cancer Institute; Boston, MA, USA
- Prathyusha Konda
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Medicine, Harvard Medical School; Boston, MA, USA
- Richard W. Tourdot
  - Department of Data Science, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School; Boston, MA, USA
- Jiao Li
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Medicine, Harvard Medical School; Boston, MA, USA
- Maria Nakhoul
  - Department of Informatics & Analytics, Dana-Farber Cancer Institute; Boston, MA, USA
- Daniel S. Gallant
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
- Usman Ali Ahmed
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
- Jillian O’Toole
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
- Dory Freeman
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
- Gwo-Shu Mary Lee
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
- Jonathan L. Hecht
  - Department of Pathology, Beth Israel Deaconess Medical Center; Boston, MA, USA
- Eric C Kauffman
  - Department of Urology, Roswell Park Comprehensive Cancer Center; Buffalo, New York, USA
- David J Einstein
  - Division of Medical Oncology, Beth Israel Deaconess Medical Center; Boston, MA, USA
- Toni K. Choueiri
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Medicine, Harvard Medical School; Boston, MA, USA
  - Department of Medicine, Brigham and Women’s Hospital; Boston, MA, USA
- Cheng-Zhong Zhang
  - Department of Data Science, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Pathology, Brigham and Women’s Hospital; Boston, MA, USA
  - Cancer Program, Broad Institute of MIT and Harvard; Cambridge, MA, USA
- Srinivas R. Viswanathan
  - Department of Medical Oncology, Dana-Farber Cancer Institute; Boston, MA, USA
  - Department of Medicine, Harvard Medical School; Boston, MA, USA
  - Cancer Program, Broad Institute of MIT and Harvard; Cambridge, MA, USA
  - Department of Medicine, Brigham and Women’s Hospital; Boston, MA, USA
|
12
|
Papathanasiou S, Mynhier NA, Liu S, Brunette G, Stokasimov E, Jacob E, Li L, Comenho C, van Steensel B, Buenrostro JD, Zhang CZ, Pellman D. Heritable transcriptional defects from aberrations of nuclear architecture. Nature 2023; 619:184-192. [PMID: 37286600 PMCID: PMC10322708 DOI: 10.1038/s41586-023-06157-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 12/22/2021] [Accepted: 05/02/2023] [Indexed: 06/09/2023]
Abstract
Transcriptional heterogeneity due to plasticity of the epigenetic state of chromatin contributes to tumour evolution, metastasis and drug resistance [1-3]. However, the mechanisms that cause this epigenetic variation are incompletely understood. Here we identify micronuclei and chromosome bridges, aberrations in the nucleus common in cancer [4,5], as sources of heritable transcriptional suppression. Using a combination of approaches, including long-term live-cell imaging and same-cell single-cell RNA sequencing (Look-Seq2), we identified reductions in gene expression in chromosomes from micronuclei. With heterogeneous penetrance, these changes in gene expression can be heritable even after the chromosome from the micronucleus has been re-incorporated into a normal daughter cell nucleus. Concomitantly, micronuclear chromosomes acquire aberrant epigenetic chromatin marks. These defects may persist as variably reduced chromatin accessibility and reduced gene expression after clonal expansion from single cells. Persistent transcriptional repression is strongly associated with, and may be explained by, markedly long-lived DNA damage. Epigenetic alterations in transcription may therefore be inherently coupled to chromosomal instability and aberrations in nuclear architecture.
Affiliation(s)
- Stamatis Papathanasiou
  - Department of Cell Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Institute of Molecular Biology, Mainz, Germany
- Nikos A Mynhier
  - Department of Cell Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Shiwei Liu
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Gregory Brunette
  - Department of Cell Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Ema Stokasimov
  - Department of Cell Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Etai Jacob
  - Single-Cell Sequencing Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - AstraZeneca, Waltham, MA, USA
- Lanting Li
  - Single-Cell Sequencing Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Caroline Comenho
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bas van Steensel
  - Division of Gene Regulation and Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Jason D Buenrostro
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cheng-Zhong Zhang
  - Single-Cell Sequencing Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Pellman
  - Department of Cell Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Single-Cell Sequencing Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
|
13
|
Gao T, Soldatov R, Sarkar H, Kurkiewicz A, Biederstedt E, Loh PR, Kharchenko PV. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nat Biotechnol 2023; 41:417-426. [PMID: 36163550 PMCID: PMC10289836 DOI: 10.1038/s41587-022-01468-y] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 01/23/2022] [Accepted: 08/11/2022] [Indexed: 11/09/2022]
Abstract
Genome instability and aberrant alterations of transcriptional programs both play important roles in cancer. Single-cell RNA sequencing (scRNA-seq) has the potential to investigate both genetic and nongenetic sources of tumor heterogeneity in a single assay. Here we present a computational method, Numbat, that integrates haplotype information obtained from population-based phasing with allele and expression signals to enhance detection of copy number variations from scRNA-seq. Numbat exploits the evolutionary relationships between subclones to iteratively infer single-cell copy number profiles and tumor clonal phylogeny. Analysis of 22 tumor samples, including multiple myeloma, gastric, breast and thyroid cancers, shows that Numbat can reconstruct the tumor copy number profile and precisely identify malignant cells in the tumor microenvironment. We identify genetic subpopulations with transcriptional signatures relevant to tumor progression and therapy resistance. Numbat requires neither sample-matched DNA data nor a priori genotyping, and is applicable to a wide range of experimental settings and cancer types.
Affiliation(s)
- Teng Gao
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Ruslan Soldatov
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Hirak Sarkar
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Adam Kurkiewicz
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Evan Biederstedt
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Po-Ru Loh
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Peter V Kharchenko
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
  - Altos Labs, San Diego, CA, USA
|
14
|
Sinha S, Zhang CZ. Determining Complete Chromosomal Haplotypes by mLinker. Methods Mol Biol 2023; 2590:149-159. [PMID: 36335498 DOI: 10.1007/978-1-0716-2819-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Haplotype ("haploid genotype") phase is the combination of genotypes at sites of genetic variation along a chromosome [1]. We previously demonstrated that the complete chromosomal haplotype of diploid human genomes can be determined using molecular linkage from Hi-C sequencing and linked-reads sequencing [2]. In this chapter, we present a step-by-step guide to perform this analysis using mLinker, a software package for haplotype inference.
Affiliation(s)
- Sumit Sinha
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Cheng-Zhong Zhang
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
15
|
Lin JH, Chen LC, Yu SC, Huang YT. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants. Bioinformatics 2022; 38:1816-1822. [PMID: 35104333 DOI: 10.1093/bioinformatics/btac058] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/22/2021] [Revised: 01/04/2022] [Accepted: 01/26/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Long-read phasing has been used for reconstructing diploid genomes, improving variant calling and resolving microbial strains in metagenomics. However, the phasing blocks of existing methods are broken by large structural variations (SVs), and the efficiency is unsatisfactory for population-scale phasing. RESULTS: This article presents a novel algorithm, LongPhase, which can simultaneously phase single nucleotide polymorphisms (SNPs) and SVs of a human genome in 10-20 min, 10× faster than the state-of-the-art WhatsHap, HapCUT2 and Margin. In particular, co-phasing SNPs and SVs produces much larger haplotype blocks (N50 = 25 Mbp) than those of existing methods (N50 = 10-15 Mbp). We show that LongPhase combined with Nanopore ultra-long reads is a cost-effective and highly contiguous solution, which can produce between one and 26 blocks per chromosome arm without the need for additional trios, chromosome-conformation and strand-seq data. AVAILABILITY AND IMPLEMENTATION: LongPhase is freely available at https://github.com/twolinin/LongPhase/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
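The abstract above compares phasing methods by haplotype-block N50. As a reading aid, here is a minimal sketch of how an N50 is conventionally computed from a list of block lengths. It is not taken from LongPhase; the function name `n50` and the example block lengths are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def n50(block_lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of block lengths.

    N50 is the largest length L such that blocks of length >= L
    together cover at least half of the total length.
    """
    if not block_lengths:
        raise ValueError("block_lengths must not be empty")
    # Walk from the longest block down, accumulating covered length.
    ordered = sorted(block_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical haplotype block lengths in Mbp (illustrative only).
blocks_mbp = [40, 25, 20, 10, 5]
print(n50(blocks_mbp))  # prints 25
```

Because the statistic is weighted by length, a few long blocks dominate it, so a higher N50 reflects longer contiguous phased stretches and not simply a smaller number of blocks.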
Affiliation(s)
- Jyun-Hong Lin
  - Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
- Liang-Chi Chen
  - Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
- Shu-Chi Yu
  - Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
- Yao-Ting Huang
  - Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
|
16
|
Cheng H, Jarvis ED, Fedrigo O, Koepfli KP, Urban L, Gemmell NJ, Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nat Biotechnol 2022; 40:1332-1335. [PMID: 35332338 PMCID: PMC9464699 DOI: 10.1038/s41587-022-01261-x] [Citation(s) in RCA: 139] [Impact Index Per Article: 69.5] [Received: 09/10/2021] [Accepted: 02/14/2022] [Indexed: 12/29/2022]
Abstract
Routine haplotype-resolved genome assembly from single samples remains an unresolved problem. Here we describe an algorithm that combines PacBio HiFi reads and Hi-C chromatin interaction data to produce a haplotype-resolved assembly without the sequencing of parents. Applied to human and other vertebrate samples, our algorithm consistently outperforms existing single-sample assembly pipelines and generates assemblies of similar quality to the best pedigree-based assemblies.
Affiliation(s)
- Haoyu Cheng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Erich D. Jarvis
  - The Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065
  - Howard Hughes Medical Institute, Chevy Chase, MD, 20815
- Olivier Fedrigo
  - The Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065
- Klaus-Peter Koepfli
  - Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
  - Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, D.C., 20008, USA
  - ITMO University, Computer Technologies Laboratory, St. Petersburg 197101, Russia
- Lara Urban
  - Department of Anatomy, University of Otago, Dunedin 9016, New Zealand
- Neil J. Gemmell
  - Department of Anatomy, University of Otago, Dunedin 9016, New Zealand
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
17
|
Somatic structural variant formation is guided by and influences genome architecture. Genome Res 2022; 32:643-655. [PMID: 35177558 PMCID: PMC8997353 DOI: 10.1101/gr.275790.121] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/18/2021] [Accepted: 02/11/2022] [Indexed: 11/25/2022]
Abstract
The occurrence and formation of genomic structural variants (SVs) is known to be influenced by the 3D chromatin architecture, but the extent and magnitude have been challenging to study. Here, we apply Hi-C to study chromatin organization before and after induction of chromothripsis in human cells. We use Hi-C to manually assemble the derivative chromosomes following the occurrence of massive complex rearrangements, which allows us to study the sources of SV formation and their consequences on gene regulation. We observe an action–reaction interplay whereby the 3D chromatin architecture directly impacts the location and formation of SVs. In turn, the SVs reshape the chromatin organization to alter the local topologies, replication timing, and gene regulation in cis. We show that SVs have a strong tendency to occur between similar chromatin compartments and replication timing regions. Moreover, we find that SVs frequently occur at 3D loop anchors, that SVs can cause a switch in chromatin compartments and replication timing, and that this is a major source of SV-mediated effects on nearby gene expression changes. Finally, we provide evidence for a general mechanistic bias of the 3D chromatin on SV occurrence using data from more than 2700 patient-derived cancer genomes.
|
18
|
Zverinova S, Guryev V. Variant calling: Considerations, practices, and developments. Hum Mutat 2021; 43:976-985. [PMID: 34882898 PMCID: PMC9545713 DOI: 10.1002/humu.24311] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/25/2021] [Revised: 11/02/2021] [Accepted: 12/03/2021] [Indexed: 11/10/2022]
Abstract
The success of many clinical, association, or population genetics studies critically relies on a properly performed variant calling step. The variety of modern genomics protocols, techniques, and platforms makes the choice of methods and algorithms difficult, and there is no "one size fits all" solution for study design and data analysis. In this review, we discuss considerations that need to be taken into account while designing the study and preparing for the experiments. We outline the variety of variant types that can be detected using sequencing approaches and highlight some specific requirements and basic principles of their detection. Finally, we cover interesting developments that enable variant calling for a broad range of applications in the genomics field. We conclude by discussing technological and algorithmic advances that have the potential to change the ways of calling DNA variants in the near future.
Affiliation(s)
- Stepanka Zverinova
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
|
19
|
Janzen T, Miró Pina V. Estimating the time since admixture from phased and unphased molecular data. Mol Ecol Resour 2021; 22:908-926. [PMID: 34599646 PMCID: PMC9291888 DOI: 10.1111/1755-0998.13519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/26/2022]
Abstract
After admixture, recombination breaks down genomic blocks of contiguous ancestry. The breakdown of these blocks forms a new “molecular clock” that ticks at a much faster rate than the mutation clock, enabling accurate dating of admixture events in the recent past. However, existing theory on the breakdown of these blocks, or the accumulation of delineations between blocks, so-called “junctions”, has mostly been limited to using regularly spaced markers on phased data. Here, we present an extension to the theory of junctions using the ancestral recombination graph that describes the expected number of junctions for any distribution of markers along the genome. Furthermore, we provide a new framework to infer the time since admixture using unphased data. We demonstrate both the phased and unphased methods on simulated data and show that our new extensions have improved accuracy with respect to previous methods, especially for smaller population sizes and more ancient admixture times. Lastly, we demonstrate the applicability of our method on three empirical data sets, including lab crosses of yeast (Saccharomyces cerevisiae) and two case studies of hybridization in swordtail fish and Populus trees.
Affiliation(s)
- Thijs Janzen
  - Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
  - Carl von Ossietzky University, Oldenburg, Germany
- Verónica Miró Pina
  - Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), México City, México
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
|