1
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2
Garg P, Jadhav B, Lee W, Rodriguez OL, Martin-Trujillo A, Sharp AJ. A phenome-wide association study identifies effects of copy-number variation of VNTRs and multicopy genes on multiple human traits. Am J Hum Genet 2022; 109:1065-1076. [PMID: 35609568 PMCID: PMC9247821 DOI: 10.1016/j.ajhg.2022.04.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/27/2021] [Accepted: 04/28/2022] [Indexed: 01/04/2023]
Abstract
The human genome contains tens of thousands of large tandem repeats and hundreds of genes that show common and highly variable copy-number changes. Due to their large size and repetitive nature, these variable number tandem repeats (VNTRs) and multicopy genes are generally recalcitrant to standard genotyping approaches and, as a result, this class of variation is poorly characterized. However, several recent studies have demonstrated that copy-number variation of VNTRs can modify local gene expression, epigenetics, and human traits, indicating that many have a functional role. Here, using read depth from whole-genome sequencing to profile copy number, we report results of a phenome-wide association study (PheWAS) of VNTRs and multicopy genes in a discovery cohort of ∼35,000 samples, identifying 32 traits associated with copy number of 38 VNTRs and multicopy genes at 1% FDR. We replicated many of these signals in an independent cohort and observed that VNTRs showing trait associations were significantly enriched for expression QTLs with nearby genes, providing strong support for our results. Fine-mapping studies indicated that in the majority (∼90%) of cases, the VNTRs and multicopy genes we identified represent the causal variants underlying the observed associations. Furthermore, several lie in regions where prior SNV-based GWASs have failed to identify any significant associations with these traits. Our study indicates that copy number of VNTRs and multicopy genes contributes to diverse human traits and suggests that complex structural variants potentially explain some of the so-called "missing heritability" of SNV-based GWASs.
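Read-depth copy-number profiling rests on a simple proportionality: in whole-genome sequencing data, the number of reads covering a region scales with how many copies of it a genome carries. The Python sketch below illustrates only that idea and is not the authors' pipeline. The function name `read_depth_copy_number`, the diploid baseline, and the use of the genome-wide median bin depth as the normalizer are assumptions; practical callers also correct for GC content, mappability and batch effects.

```python
from statistics import mean, median


def read_depth_copy_number(region_depths, genome_bin_depths, ploidy=2):
    """Hypothetical helper: estimate a region's copy number from read depth.

    region_depths: per-base (or per-bin) depths across the VNTR or multicopy gene.
    genome_bin_depths: depths of bins across the genome, assumed to be mostly at
        the baseline ploidy, so their median approximates the baseline depth.
    Returns ploidy * (mean region depth / median genome-wide depth).
    """
    if not region_depths or not genome_bin_depths:
        raise ValueError("depth lists must be non-empty")
    baseline = median(genome_bin_depths)
    if baseline <= 0:
        raise ValueError("genome-wide baseline depth must be positive")
    return ploidy * mean(region_depths) / baseline


# A region covered at twice the genome-wide depth is estimated at 4 copies
# in a diploid genome.
genome = [30, 31, 29, 30, 30, 28, 32]
vntr = [61, 59, 60, 60]
print(read_depth_copy_number(vntr, genome))  # prints 4.0
```

Using the median rather than the mean for the baseline keeps a few amplified or deleted bins from shifting the reference depth.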
Affiliation(s)
- Paras Garg
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Bharati Jadhav
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- William Lee
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Oscar L Rodriguez
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Alejandro Martin-Trujillo
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
- Andrew J Sharp
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA
3
The structure, function and evolution of a complete human chromosome 8. Nature 2021; 593:101-107. [PMID: 33828295 PMCID: PMC8099727 DOI: 10.1038/s41586-021-03420-7] [Citation(s) in RCA: 179] [Impact Index Per Article: 59.7] [Received: 09/04/2020] [Accepted: 03/04/2021] [Indexed: 02/07/2023]
Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
4
Song XH, Hsu HK, Su MT, Chang TS, Su PY, Chen M, Kuo PL. Euchromatic variants of 8q21.2 in twins. Taiwan J Obstet Gynecol 2017; 56:227-229. [PMID: 28420513 DOI: 10.1016/j.tjog.2016.07.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 07/01/2016] [Indexed: 10/19/2022]
Abstract
OBJECTIVE Euchromatic variants (EVs) of 8q21.2 are extremely rare chromosomal abnormalities; so far, only two reports of 8q21.2 EVs have been published. Here, we report an 8q21.2 EV detected in cultured amniotic-fluid cells from twins, which was later found to be inherited from the phenotypically normal mother. CASE REPORT A pregnant woman underwent amniocentesis at 16 weeks of gestation because of advanced maternal age. The pregnancy was a naturally conceived monozygotic twin pregnancy. Cytogenetic analysis of cultured amniocytes revealed 46,XY,?dup(8)(q21.2). Chromosomal microarray analysis revealed no abnormalities. C-banding and fluorescence in situ hybridization with a chromosome 8 painting probe suggested that the extra chromosomal band was euchromatic. Karyotyping of the parents showed that the EV was inherited from the mother. CONCLUSION Many, but not all, EVs are clinically innocuous. This is the first case of an 8q21.2 EV reported in the Han Chinese population. More cases are needed to clarify whether 8q21.2 duplication is a bona fide EV.
Affiliation(s)
- Xiao-Hui Song
- Department of Obstetrics and Gynecology, Maternal and Child Health Hospital of Weihai City, Shandong Province, China
- Hui-Kuo Hsu
- Department of Obstetrics and Gynecology, National Cheng Kung University Hospital and College of Medicine, Tainan, Taiwan
- Mei-Tsz Su
- Department of Obstetrics and Gynecology, National Cheng Kung University Hospital and College of Medicine, Tainan, Taiwan
- Ming Chen
- Department of Genomic Medicine and Center for Medical Genetics, Changhua Christian Hospital, Changhua, Taiwan
- Pao-Lin Kuo
- Department of Obstetrics and Gynecology, National Cheng Kung University Hospital and College of Medicine, Tainan, Taiwan
5
Barber JCK, Sharp AJ, Hollox EJ, Tyson C. Copy number variation of the REXO1L1 gene cluster; euchromatic deletion variant or susceptibility factor? Eur J Hum Genet 2016; 25:8-9. [PMID: 27485411 DOI: 10.1038/ejhg.2016.104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- John C K Barber
- Department of Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton General Hospital, Southampton, UK
- Andrew J Sharp
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York City, NY, USA
- Edward J Hollox
- Department of Genetics, University of Leicester, Leicester, UK
- Christine Tyson
- Department of Pathology, Cytogenetics Laboratory, Royal Columbian Hospital, New Westminster, British Columbia, Canada
6
Paull D, Sevilla A, Zhou H, Hahn AK, Kim H, Napolitano C, Tsankov A, Shang L, Krumholz K, Jagadeesan P, Woodard CM, Sun B, Vilboux T, Zimmer M, Forero E, Moroziewicz DN, Martinez H, Malicdan MCV, Weiss KA, Vensand LB, Dusenberry CR, Polus H, Sy KTL, Kahler DJ, Gahl WA, Solomon SL, Chang S, Meissner A, Eggan K, Noggle SA. Automated, high-throughput derivation, characterization and differentiation of induced pluripotent stem cells. Nat Methods 2015; 12:885-92. [PMID: 26237226 DOI: 10.1038/nmeth.3507] [Citation(s) in RCA: 173] [Impact Index Per Article: 19.2] [Received: 09/25/2014] [Accepted: 06/25/2015] [Indexed: 12/16/2022]
Abstract
Induced pluripotent stem cells (iPSCs) are an essential tool for modeling how causal genetic variants impact cellular function in disease, as well as an emerging source of tissue for regenerative medicine. The preparation of somatic cells, their reprogramming and the subsequent verification of iPSC pluripotency are laborious, manual processes that limit the scale and reproducibility of this technology. Here we describe a modular, robotic platform for iPSC reprogramming that enables automated, high-throughput conversion of skin biopsies into iPSCs and differentiated cells with minimal manual intervention. We demonstrate that automated reprogramming and the pooled selection of polyclonal pluripotent cells result in high-quality, stable iPSCs. These lines display less line-to-line variation than either manually produced lines or lines produced through automation followed by single-colony subcloning. The robotic platform we describe will enable the application of iPSCs to population-scale biomedical problems, including the study of complex genetic diseases and the development of personalized medicines.
Affiliation(s)
- Daniel Paull
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Ana Sevilla
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Hongyan Zhou
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Aana Kim Hahn
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Hesed Kim
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Alexander Tsankov
- The Broad Institute, Cambridge, Massachusetts, USA
- The Harvard Stem Cell Institute, Harvard University, Cambridge, Massachusetts, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
- Linshan Shang
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Katie Krumholz
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Chris M Woodard
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Bruce Sun
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Thierry Vilboux
- Section on Human Biochemical Genetics, Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- Division of Medical Genomics, Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, USA
- Matthew Zimmer
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Eliana Forero
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Hector Martinez
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- May Christine V Malicdan
- Section on Human Biochemical Genetics, Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- Keren A Weiss
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Lauren B Vensand
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Carmen R Dusenberry
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Hannah Polus
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Karla Therese L Sy
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- David J Kahler
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- William A Gahl
- Section on Human Biochemical Genetics, Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- NIH Undiagnosed Diseases Program, Common Fund, Office of the Director, National Institutes of Health and National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- Susan L Solomon
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Stephen Chang
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
- Alexander Meissner
- The Broad Institute, Cambridge, Massachusetts, USA
- The Harvard Stem Cell Institute, Harvard University, Cambridge, Massachusetts, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
- Kevin Eggan
- The Broad Institute, Cambridge, Massachusetts, USA
- The Harvard Stem Cell Institute, Harvard University, Cambridge, Massachusetts, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
- The Howard Hughes Medical Institute, Cambridge, Massachusetts, USA
- Scott A Noggle
- The New York Stem Cell Foundation Research Institute, New York, New York, USA
7
Abstract
MOTIVATION Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders such as Huntington's disease. Accurately resolving repeats, and in particular TRs, remains a challenging task in genome alignment, assembly and variant calling. Although tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may resolve a broader spectrum of TRs because of their greater read length, but their substantially higher raw error rates make accurately identifying and estimating TRs a unique challenge that requires new approaches. RESULTS Here we present PacmonSTR, a reference-based probabilistic approach to identify the TR region and estimate the number of TR elements in long DNA reads. The multistep approach takes as input a reference region and the reference TR element. First, the TR region is identified in the long DNA reads via a 3-stage modified Smith-Waterman approach; then, the expected number of TR elements is calculated using a method based on pair hidden Markov models (pair-HMMs). Finally, TR-based genotype selection (clustering into homozygous or heterozygous genotypes) is performed with Gaussian mixture models, using the Akaike information criterion and coverage expectations.
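The first PacmonSTR stage locates the TR region with a 3-stage modified Smith-Waterman approach that the abstract does not detail. As a point of reference only, the Python sketch below implements plain, unmodified Smith-Waterman local alignment with a linear gap penalty and returns the best score together with the read interval where a reference sequence aligns. The function name and the scores (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions rather than PacmonSTR's parameters, and the pair-HMM counting and Gaussian-mixture genotyping stages are not shown.

```python
def smith_waterman(query, read, match=2, mismatch=-1, gap=-2):
    """Plain Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, start, end) such that read[start:end] is the part of
    `read` covered by the highest-scoring local alignment of `query`.
    """
    rows, cols = len(query) + 1, len(read) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix; row 0 and column 0 stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == read[j - 1] else mismatch
            H[i][j] = max(0,                    # start a fresh local alignment
                          H[i - 1][j - 1] + s,  # align query base to read base
                          H[i - 1][j] + gap,    # query base against a gap
                          H[i][j - 1] + gap)    # read base against a gap
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the local alignment score drops to 0.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == read[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, best_j


# A reference CAG repeat region embedded between read flanks.
print(smith_waterman("CAGCAGCAG", "TTTTCAGCAGCAGGGGG"))  # prints (18, 4, 13)
```

Local rather than global alignment is the natural choice here because only a small part of each long read overlaps the reference TR region; the zero floor lets the alignment start and stop anywhere in the read.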
Affiliation(s)
- Ajay Ummat
- Department of Genetics and Genomic Science and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ali Bashir
- Department of Genetics and Genomic Science and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
8
Brahmachary M, Guilmatre A, Quilez J, Hasson D, Borel C, Warburton P, Sharp AJ. Digital genotyping of macrosatellites and multicopy genes reveals novel biological functions associated with copy number variation of large tandem repeats. PLoS Genet 2014; 10:e1004418. [PMID: 24945355 PMCID: PMC4063668 DOI: 10.1371/journal.pgen.1004418] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 07/16/2013] [Accepted: 04/22/2014] [Indexed: 11/30/2022]
Abstract
Tandem repeats are common in eukaryotic genomes, but because they are difficult to assay they remain poorly studied. Here, we demonstrate the utility of Nanostring technology as a targeted approach to accurately measure tandem repeats even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations and five species of non-human primates. We observed extreme variability in copy number of tandemly repeated genes, with many loci showing 5–10-fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and the true copy number of many large tandem repeats is significantly under-represented even in the high-quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variations are not tagged by nearby SNPs and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis, we identify many cis correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variations in tandem repeat length are associated with functional effects on the local genomic environment. This includes an example where expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that has been systematically ignored by most previous studies, and that their copy number variation can exert functionally significant effects.
We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity. Here we use Nanostring digital assays and show their utility for estimating copy number of 186 multicopy genes and tandem repeats. By analyzing patterns of single nucleotide variation around these variants, we show that copy number at the vast majority of tandem repeat loci is not effectively tagged by nearby SNPs, and thus standard genome-wide association studies that focus on SNPs provide little or no information about such variants. By comparing patterns of tandem repeat copy number with variation in local gene expression and DNA methylation, we also identify extensive effects on local genome function. This includes an example of a non-coding macrosatellite repeat whose expansion exerts a repressive effect on a nearby gene, accompanied by accumulation of local DNA methylation. Finally, comparison of diverse human populations with a number of primate genomes shows that many of these sequences have undergone extreme changes in copy number during recent human and primate evolution and show signatures that suggest possible selective effects. Overall, we conclude that multicopy genes and macrosatellites represent a highly variable fraction of the genome with important functional effects that has been systematically ignored by previous studies.
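Whether a tandem repeat is "tagged" by a nearby SNP is commonly judged by the squared Pearson correlation (r²) between SNP allele dosage and the repeat's copy number across the same samples. The Python sketch below computes that statistic. The function names, the 0/1/2 dosage coding and the r² ≥ 0.8 cutoff are widely used conventions assumed here for illustration, not values reported in the study.

```python
def tagging_r2(snp_dosages, copy_numbers):
    """Squared Pearson correlation between SNP dosage (0, 1 or 2 copies of the
    alternate allele) and tandem-repeat copy number across the same samples.
    Returns 0.0 when either variable is constant (no linear signal)."""
    if len(snp_dosages) != len(copy_numbers) or len(snp_dosages) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    n = len(snp_dosages)
    mx = sum(snp_dosages) / n
    my = sum(copy_numbers) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(snp_dosages, copy_numbers))
    sxx = sum((x - mx) ** 2 for x in snp_dosages)
    syy = sum((y - my) ** 2 for y in copy_numbers)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def is_tagged(snp_dosages, copy_numbers, threshold=0.8):
    """Hypothetical rule: call the repeat tagged if r^2 >= threshold."""
    return tagging_r2(snp_dosages, copy_numbers) >= threshold


print(tagging_r2([0, 1, 2, 0, 1, 2], [2, 3, 4, 2, 3, 4]))  # prints 1.0
print(tagging_r2([0, 0, 2, 2], [2, 4, 2, 4]))              # prints 0.0
```

Because r² only captures linear association with a single biallelic marker, a highly multiallelic repeat whose copy number has drifted independently of the surrounding haplotypes will score low against every nearby SNP, which is the situation the abstract describes.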
Affiliation(s)
- Manisha Brahmachary
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Audrey Guilmatre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Javier Quilez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Dan Hasson
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Christelle Borel
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Peter Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Andrew J. Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America