1
Zhou B, Purmann C, Guo H, Shin G, Huang Y, Pattni R, Meng Q, Greer SU, Roychowdhury T, Wood RN, Ho M, zu Dohna H, Abyzov A, Hallmayer JF, Wong WH, Ji HP, Urban AE. Resolving the 22q11.2 deletion using CTLR-Seq reveals chromosomal rearrangement mechanisms and individual variance in breakpoints. Proc Natl Acad Sci U S A 2024; 121:e2322834121. [PMID: 39042694 PMCID: PMC11295037 DOI: 10.1073/pnas.2322834121] [Received: 12/28/2023] [Accepted: 06/15/2024] [Indexed: 07/25/2024]
Abstract
We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions in the human genome that had been previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome and pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced (amplification-free) at high on-target coverage using long-read sequencing, allowing for their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability for both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis for the 22q11.2 SegDups from patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.
Affiliation(s)
- Bo Zhou
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
  - Stanford Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305
- Carolin Purmann
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
  - Stanford Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
- Hanmin Guo
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
  - Stanford Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
  - Department of Statistics, Stanford University, Stanford, CA 94305
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305
- GiWon Shin
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305
- Yiling Huang
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
- Reenal Pattni
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
- Qingxi Meng
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305
- Stephanie U. Greer
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305
- Tanmoy Roychowdhury
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905
- Raegan N. Wood
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305
- Marcus Ho
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
- Heinrich zu Dohna
  - Department of Biology, American University of Beirut, Beirut 1107 2020, Lebanon
- Alexej Abyzov
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905
- Joachim F. Hallmayer
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
- Wing H. Wong
  - Department of Statistics, Stanford University, Stanford, CA 94305
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305
- Hanlee P. Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305
- Alexander E. Urban
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305
  - Stanford Maternal and Child Health Research Institute, Stanford University School of Medicine, Stanford, CA 94305
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
  - Program on Genetics of Brain Function, Stanford Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA 94305
2
Jeong H, Dishuck PC, Yoo D, Harvey WT, Munson KM, Lewis AP, Kordosky J, Garcia GH, Yilmaz F, Hallast P, Lee C, Pastinen T, Eichler EE. Structural polymorphism and diversity of human segmental duplications. bioRxiv 2024:2024.06.04.597452. [PMID: 38895457 PMCID: PMC11185583 DOI: 10.1101/2024.06.04.597452] [Indexed: 06/21/2024]
Abstract
Segmental duplications (SDs) contribute significantly to human disease, evolution, and diversity yet have been difficult to resolve at the sequence level. We present a population genetics survey of SDs by analyzing 170 human genome assemblies where the majority of SDs are fully resolved using long-read sequence assembly. Excluding the acrocentric short arms, we identify 173.2 Mbp of duplicated sequence (47.4 Mbp not present in the telomere-to-telomere reference) distinguishing fixed from structurally polymorphic events. We find that intrachromosomal SDs are among the most variable with rare events mapping near their progenitor sequences. African genomes harbor significantly more intrachromosomal SDs and are more likely to have recently duplicated gene families with higher copy number when compared to non-African samples. A comparison to a resource of 563 million full-length Iso-Seq reads identifies 201 novel, potentially protein-coding genes corresponding to these copy number polymorphic SDs.
Affiliation(s)
- Hyeonsoo Jeong
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Altos Labs, San Diego, CA, USA
- Philip C. Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William T. Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M. Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P. Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jennifer Kordosky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Gage H. Garcia
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Feyza Yilmaz
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Tomi Pastinen
  - Children’s Mercy Hospital and University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
3
Laudanski K, Elmadhoun O, Mathew A, Kahn-Pascual Y, Kerfeld MJ, Chen J, Sisniega DC, Gomez F. Anesthetic Considerations for Patients with Hereditary Neuropathy with Liability to Pressure Palsies: A Narrative Review. Healthcare (Basel) 2024; 12:858. [PMID: 38667620 PMCID: PMC11050561 DOI: 10.3390/healthcare12080858] [Received: 02/11/2024] [Revised: 03/28/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Hereditary neuropathy with liability to pressure palsies (HNPP) is an autosomal dominant demyelinating neuropathy characterized by an increased susceptibility to peripheral nerve injury from trauma, compression, or shear forces. Patients with this condition are unique, necessitating distinct considerations for anesthesia and surgical teams. This review describes the etiology, prevalence, clinical presentation, and management of HNPP and presents contemporary evidence and recommendations for optimal care for HNPP patients in the perioperative period. While the incidence of HNPP is reported at 7-16 per 100,000, this figure may be an underestimation due to underdiagnosis, further complicating medicolegal issues. With the subtle nature of symptoms associated with HNPP, patients with this condition may remain unrecognized during the perioperative period, posing significant risks. Several aspects of caring for this population, including anesthetic choices, intraoperative positioning, and monitoring strategy, may deviate from standard practices. As such, a tailored approach to caring for this unique population, coupled with meticulous preoperative planning, is crucial and requires a multidisciplinary approach.
Affiliation(s)
- Krzysztof Laudanski
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- Omar Elmadhoun
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- Amal Mathew
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA 19104, USA
- Yul Kahn-Pascual
  - St George’s University Hospitals NHS Foundation Trust, London SW17 0QT, UK
- Mitchell J. Kerfeld
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- James Chen
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- Daniella C. Sisniega
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Francisco Gomez
  - Department of Neurology, University of Missouri, Columbia, MO 65211, USA
4
Meng A, Li X, Li Z, Miao F, Ma L, Li S, Sun W, Huang J, Yang G. Genome assembly of Melilotus officinalis provides a new reference genome for functional genomics. BMC Genom Data 2024; 25:37. [PMID: 38637749 PMCID: PMC11025269 DOI: 10.1186/s12863-024-01224-y] [Received: 01/02/2024] [Accepted: 04/10/2024] [Indexed: 04/20/2024]
Abstract
BACKGROUND: Sweet yellow clover (Melilotus officinalis) is a diploid plant (2n = 16) that is native to Europe. It is an excellent legume forage that can both fix nitrogen and serve as a medicine. A genome assembly of Melilotus officinalis that was collected from Best corporation in Beijing is available based on Nanopore sequencing. The genome of Melilotus officinalis was sequenced, assembled, and annotated. RESULTS: The latest PacBio third generation HiFi assembly and sequencing strategies were used to produce a Melilotus officinalis genome assembly with a size of 1,066 Mbp, contig N50 = 5 Mbp, scaffold N50 = 130 Mbp, and complete benchmarking universal single-copy orthologs (BUSCOs) = 96.4%. The annotation produced 47,873 high-confidence gene models, which will substantially aid in our research on molecular breeding. A collinear analysis showed that Melilotus officinalis and Medicago truncatula shared conserved synteny. Analysis of gene family expansion and contraction showed that 565 gene families expanded and 56 gene families contracted in Melilotus officinalis. The contracted gene families were associated with response to stimulus, nucleotide binding, and small molecule binding. Thus, it is related to a family of genes associated with peptidase activity, which could lead to better stress tolerance in plants. CONCLUSIONS: In this study, the latest PacBio technology was used to assemble and sequence the genome of Melilotus officinalis and annotate its protein-coding genes. These results will expand the genomic resources available for Melilotus officinalis and should assist in subsequent research on sweet yellow clover plants.
Affiliation(s)
- Aoran Meng
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Xinru Li
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Zhiguang Li
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Fuhong Miao
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Lichao Ma
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Shuo Li
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Wenfei Sun
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
- Guofeng Yang
  - Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, College of Grassland Science, Qingdao Agricultural University, 266109, Qingdao, China
5
Bukhman YV, Morin PA, Meyer S, Chu LF, Jacobsen JK, Antosiewicz-Bourget J, Mamott D, Gonzales M, Argus C, Bolin J, Berres ME, Fedrigo O, Steill J, Swanson SA, Jiang P, Rhie A, Formenti G, Phillippy AM, Harris RS, Wood JMD, Howe K, Kirilenko BM, Munegowda C, Hiller M, Jain A, Kihara D, Johnston JS, Ionkov A, Raja K, Toh H, Lang A, Wolf M, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography. Mol Biol Evol 2024; 41:msae036. [PMID: 38376487 PMCID: PMC10919930 DOI: 10.1093/molbev/msae036] [Received: 03/08/2023] [Revised: 01/11/2024] [Accepted: 01/22/2024] [Indexed: 02/21/2024]
Abstract
The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long-read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate the blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.
Affiliation(s)
- Yury V Bukhman
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Phillip A Morin
  - Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
- Susanne Meyer
  - Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
- Li-Fang Chu
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
  - Department of Comparative Biology and Experimental Medicine, University of Calgary, Calgary, Canada
- Daniel Mamott
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Maylie Gonzales
  - Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
- Cara Argus
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Jennifer Bolin
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Mark E Berres
  - University of Wisconsin Biotechnology Center, Bioinformatics Resource Center, University of Wisconsin - Madison, Madison, WI 53706, USA
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
- John Steill
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Scott A Swanson
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Peng Jiang
  - Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH, USA
  - Department of Biological, Geological and Environmental Sciences, Cleveland State University, Cleveland, OH, USA
  - Center for RNA Science and Therapeutics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Arang Rhie
  - Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
- Giulio Formenti
  - Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
- Adam M Phillippy
  - Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
- Robert S Harris
  - Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Bogdan M Kirilenko
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
- Chetan Munegowda
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
- Michael Hiller
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
- Aashish Jain
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- J Spencer Johnston
  - Department of Entomology, Texas A&M University, College Station, TX 77843, USA
- Alexander Ionkov
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Kalpana Raja
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Huishi Toh
  - Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
- Aimee Lang
  - Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
- Magnus Wolf
  - Institute for Evolution and Biodiversity (IEB), University of Muenster, 48149, Muenster, Germany
  - Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
- James A Thomson
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
  - Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI 53726, USA
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Ron Stewart
  - Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
6
Yang Y, Wu Z, Wu Z, Li T, Shen Z, Zhou X, Wu X, Li G, Zhang Y. A near-complete assembly of asparagus bean provides insights into anthocyanin accumulation in pods. Plant Biotechnol J 2023; 21:2473-2489. [PMID: 37558431 PMCID: PMC10651155 DOI: 10.1111/pbi.14142] [Received: 11/17/2022] [Revised: 07/11/2023] [Accepted: 07/23/2023] [Indexed: 08/11/2023]
Abstract
Asparagus bean (Vigna unguiculata ssp. sesquipedialis), a subspecies of V. unguiculata, is a vital legume crop widely cultivated in Asia for its tender pods consumed as vegetables. However, the existing asparagus bean assemblies still contain numerous gaps and unanchored sequences, which presents challenges to functional genomics research. Here, we present an improved reference genome sequence of an elite asparagus bean variety, Fengchan 6, achieved through the integration of nanopore ultra-long reads, PacBio high-fidelity reads, and Hi-C technology. The improved assembly is 521.3 Mb in length and demonstrates several enhancements, including a higher N50 length (46.4 Mb), an anchor ratio of 99.8%, and the presence of only one gap. Furthermore, we successfully assembled 14 telomeres and all 11 centromeres, including four telomere-to-telomere chromosomes. Remarkably, the centromeric regions cover a total length of 38.1 Mb, providing valuable insights into the complex architecture of centromeres. Among the 30 594 predicted protein-coding genes, we identified 2356 genes that are tandemly duplicated in segmental duplication regions. These findings have implications for defence responses and may contribute to evolutionary processes. By utilizing the reference genome, we were able to effectively identify the presence of the gene VuMYB114, which regulates the accumulation of anthocyanins, thereby controlling the purple coloration of the pods. This discovery holds significant implications for understanding the underlying mechanisms of color determination and the breeding process. Overall, the highly improved reference genome serves as a crucial resource and lays a solid foundation for asparagus bean genomic studies and genetic improvement efforts.
Affiliation(s)
- Yi Yang
  - Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
- Zhikun Wu
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat‐Sen University, Guangzhou, China
- Zengxiang Wu
  - Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
- Tinyao Li
  - Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
- Zhuo Shen
  - Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
- Xuan Zhou
  - Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
- Xinyi Wu
  - Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Guojing Li
  - Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Yan Zhang
  - Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
  - Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
7
Tan KT, Slevin MK, Leibowitz ML, Garrity-Janger M, Li H, Meyerson M. Neotelomeres and Telomere-Spanning Chromosomal Arm Fusions in Cancer Genomes Revealed by Long-Read Sequencing. bioRxiv 2023:2023.11.30.569101. [PMID: 38077026 PMCID: PMC10705422 DOI: 10.1101/2023.11.30.569101] [Indexed: 12/23/2023]
Abstract
Alterations in the structure and location of telomeres are key events in cancer genome evolution. However, previous genomic approaches, unable to span long telomeric repeat arrays, could not characterize the nature of these alterations. Here, we applied both long-read and short-read genome sequencing to assess telomere repeat-containing structures in cancers and cancer cell lines. Using long-read genome sequences that span telomeric repeat arrays, we defined four types of telomere repeat variations in cancer cells: neotelomeres where telomere addition heals chromosome breaks, chromosomal arm fusions spanning telomere repeats, fusions of neotelomeres, and peri-centromeric fusions with adjoined telomere and centromere repeats. Analysis of lung adenocarcinoma genome sequences identified somatic neotelomere and telomere-spanning fusion alterations. These results provide a framework for systematic study of telomeric repeat arrays in cancer genomes that could serve as a model for understanding the somatic evolution of other repetitive genomic elements.
Affiliation(s)
- Kar-Tong Tan
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Michael K. Slevin
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Mitchell L. Leibowitz
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Max Garrity-Janger
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
- Matthew Meyerson
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
  - Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Lead contact
8
Wang S, Shen Y, Lin Z, Miao Y, Wang C, Zhang W, Zhang Y. New genes driven by segmental duplications share a testis-specific expression pattern in the chromosome-level genome assembly of tree sparrow. Integr Zool 2023. [PMID: 38014459 DOI: 10.1111/1749-4877.12789] [Indexed: 11/29/2023]
Abstract
Based on a chromosome-level genome assembly, a burst of new genes with different structures but a similar testis-specific expression pattern was detected in tree sparrow.
Affiliation(s)
- Shengnan Wang
  - Gansu Key Laboratory of Biomonitoring and Bioremediation for Environmental Pollution, School of Life Science, Lanzhou University, Lanzhou, China
- Yue Shen
  - Gansu Key Laboratory of Biomonitoring and Bioremediation for Environmental Pollution, School of Life Science, Lanzhou University, Lanzhou, China
- Zhaocun Lin
  - Gansu Key Laboratory of Biomonitoring and Bioremediation for Environmental Pollution, School of Life Science, Lanzhou University, Lanzhou, China
- Yuquan Miao
  - Gansu Key Laboratory of Biomonitoring and Bioremediation for Environmental Pollution, School of Life Science, Lanzhou University, Lanzhou, China
- Chengqi Wang
  - Gansu Key Laboratory of Biomonitoring and Bioremediation for Environmental Pollution, School of Life Science, Lanzhou University, Lanzhou, China
- Wenya Zhang
  - Gansu Key Laboratory of Biomonitoring and Bioremediation for Environmental Pollution, School of Life Science, Lanzhou University, Lanzhou, China
- Yingmei Zhang
  - Gansu Key Laboratory of Biomonitoring and Bioremediation for Environmental Pollution, School of Life Science, Lanzhou University, Lanzhou, China
9
|
Klussmeier A, Putke K, Klasberg S, Kohler M, Sauter J, Schefzyk D, Schöfl G, Massalski C, Schäfer G, Schmidt AH, Roers A, Lange V. High population frequencies of MICA copy number variations originate from independent recombination events. Front Immunol 2023; 14:1297589. [PMID: 38035108 PMCID: PMC10684724 DOI: 10.3389/fimmu.2023.1297589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 10/24/2023] [Indexed: 12/02/2023] Open
Abstract
MICA is a stress-induced ligand of the NKG2D receptor that stimulates NK and T cell responses and was identified as a key determinant of anti-tumor immunity. The MICA gene is located inside the MHC complex and is in strong linkage disequilibrium with HLA-B. While an HLA-B*48-linked MICA deletion-haplotype was previously described in Asian populations, little is known about other MICA copy number variations. Here, we report the genotyping of more than two million individuals revealing high frequencies of MICA duplications (1%) and MICA deletions (0.4%). Their prevalence differs between ethnic groups and can rise to 2.8% (Croatia) and 9.2% (Mexico), respectively. Targeted sequencing of more than 70 samples indicates that these copy number variations originate from independent nonallelic homologous recombination events between segmental duplications upstream of MICA and MICB. Overall, our data warrant further investigation of disease associations and consideration of MICA copy number data in oncological study protocols.
Affiliation(s)
| | - Axel Roers
- Institute for Immunology, Medical Faculty Carl Gustav Carus, University of Technology (TU) Dresden, Dresden, Germany
- Institute for Immunology, University Hospital Heidelberg, Heidelberg, Germany
10
|
Feng LY, Lin PF, Xu RJ, Kang HQ, Gao LZ. Comparative Genomic Analysis of Asian Cultivated Rice and Its Wild Progenitor (Oryza rufipogon) Has Revealed Evolutionary Innovation of the Pentatricopeptide Repeat Gene Family through Gene Duplication. Int J Mol Sci 2023; 24:16313. [PMID: 38003501 PMCID: PMC10671101 DOI: 10.3390/ijms242216313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 11/10/2023] [Accepted: 11/12/2023] [Indexed: 11/26/2023] Open
Abstract
The pentatricopeptide repeat (PPR) gene family is one of the largest gene families in land plants. However, current knowledge about the evolution of the PPR gene family remains largely limited. In this study, we performed a comparative genomic analysis of the PPR gene family in O. sativa and its wild progenitor, O. rufipogon, and outlined a comprehensive landscape of gene duplications. Our findings suggest that the majority of PPR genes originated from dispersed duplications. Although segmental duplications have only expanded approximately 11.30% and 13.57% of the PPR gene families in the O. sativa and O. rufipogon genomes, we interestingly obtained evidence that segmental duplication promotes the structural diversity of PPR genes through incomplete gene duplications. In the O. sativa and O. rufipogon genomes, 10 (~33.33%) and 22 pairs of gene duplications (~45.83%) had non-PPR paralogous genes through incomplete gene duplication. Segmental duplications leading to incomplete gene duplications might result in the acquisition of domains, thus promoting functional innovation and structural diversification of PPR genes. This study offers a unique perspective on the evolution of PPR gene structures and underscores the potential role of segmental duplications in PPR gene structural diversity.
Affiliation(s)
- Li-Ying Feng
- Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China; (L.-Y.F.); (P.-F.L.)
| | - Pei-Fan Lin
- Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China; (L.-Y.F.); (P.-F.L.)
| | - Rong-Jing Xu
- Tropical Biodiversity and Genomics Research Center, Hainan University, Haikou 570228, China; (R.-J.X.); (H.-Q.K.)
| | - Hai-Qi Kang
- Tropical Biodiversity and Genomics Research Center, Hainan University, Haikou 570228, China; (R.-J.X.); (H.-Q.K.)
| | - Li-Zhi Gao
- Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China; (L.-Y.F.); (P.-F.L.)
- Tropical Biodiversity and Genomics Research Center, Hainan University, Haikou 570228, China; (R.-J.X.); (H.-Q.K.)
11
|
Sun M, Yao C, Shu Q, He Y, Chen G, Yang G, Xu S, Liu Y, Xue Z, Wu J. Telomere-to-telomere pear (Pyrus pyrifolia) reference genome reveals segmental and whole genome duplication driving genome evolution. Horticulture Research 2023; 10:uhad201. [PMID: 38023478 PMCID: PMC10681005 DOI: 10.1093/hr/uhad201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Accepted: 10/01/2023] [Indexed: 12/01/2023]
Abstract
Previously released pear genomes contain a plethora of gaps and unanchored genetic regions. Here, we report a telomere-to-telomere (T2T) gap-free genome for the red-skinned pear, 'Yunhong No. 1' (YH1; Pyrus pyrifolia), which is mainly cultivated in Yunnan Province (southwest China), the pear's primary region of origin. The YH1 genome is 501.20 Mb long with a contig N50 length of 29.26 Mb. All 17 chromosomes were assembled to the T2T level with 34 characterized telomeres. The 17 centromeres were predicted and mainly consist of centromeric-specific monomers (CEN198) and long terminal repeat (LTR) Gypsy elements (≥74.73%). By filling all unclosed gaps, the integrity of YH1 is markedly improved over previous P. pyrifolia genomes ('Cuiguan' and 'Nijisseiki'). A total of 1531 segmental duplication (SD) driven duplicated genes were identified and enriched in stress response pathways. Intrachromosomal SDs drove the expansion of disease resistance genes, suggesting the potential of SDs in adaptive pear evolution. A large proportion of duplicated gene pairs exhibit dosage effects or sub-/neo-functionalization, which may affect agronomic traits like stone cell content, sugar content, and fruit skin russet. Furthermore, as core regulators of anthocyanin biosynthesis, we found that MYB10 and MYB114 underwent various gene duplication events. Multiple copies of MYB10 and MYB114 displayed obvious dosage effects, indicating role differentiation in the formation of red-skinned pear fruit. In summary, the T2T gap-free pear genome provides invaluable resources for genome evolution and functional genomics.
Affiliation(s)
- Manyi Sun
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Chenjie Yao
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Qun Shu
- Institute of Horticulture, Yunnan Academy of Agricultural Sciences, Kunming 650205, China
| | - Yingyun He
- Institute of Horticulture, Yunnan Academy of Agricultural Sciences, Kunming 650205, China
| | - Guosong Chen
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Guangyan Yang
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Shaozhuo Xu
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Yueyuan Liu
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Zhaolong Xue
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Jun Wu
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
12
|
Hanssen R, Auwerx C, Jõeloo M, Sadler MC, Henning E, Keogh J, Bounds R, Smith M, Firth HV, Kutalik Z, Farooqi IS, Reymond A, Lawler K. Chromosomal deletions on 16p11.2 encompassing SH2B1 are associated with accelerated metabolic disease. Cell Rep Med 2023; 4:101155. [PMID: 37586323 PMCID: PMC10439272 DOI: 10.1016/j.xcrm.2023.101155] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Revised: 06/08/2023] [Accepted: 07/18/2023] [Indexed: 08/18/2023]
Abstract
New approaches are needed to treat people whose obesity and type 2 diabetes (T2D) are driven by specific mechanisms. We investigate a deletion on chromosome 16p11.2 (breakpoint 2-3 [BP2-3]) encompassing SH2B1, a mediator of leptin and insulin signaling. Phenome-wide association scans in the UK (N = 502,399) and Estonian (N = 208,360) biobanks show that deletion carriers have increased body mass index (BMI; p = 1.3 × 10⁻¹⁰) and increased rates of T2D. Compared with BMI-matched controls, deletion carriers have an earlier onset of T2D, with poorer glycemic control despite higher medication usage. Cystatin C, a biomarker of kidney function, is significantly elevated in deletion carriers, suggesting increased risk of renal impairment. In a Mendelian randomization study, decreased SH2B1 expression increases T2D risk (p = 8.1 × 10⁻⁶). We conclude that people with 16p11.2 BP2-3 deletions have early, complex obesity and T2D and may benefit from therapies that enhance leptin and insulin signaling.
Affiliation(s)
- Ruth Hanssen
- University of Cambridge Metabolic Research Laboratories, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK
| | - Chiara Auwerx
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; University Center for Primary Care and Public Health, 1010 Lausanne, Switzerland
| | - Maarja Jõeloo
- Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia; Estonian Genome Centre, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
| | - Marie C Sadler
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; University Center for Primary Care and Public Health, 1010 Lausanne, Switzerland
| | - Elana Henning
- University of Cambridge Metabolic Research Laboratories, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK
| | - Julia Keogh
- University of Cambridge Metabolic Research Laboratories, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK
| | - Rebecca Bounds
- University of Cambridge Metabolic Research Laboratories, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK
| | - Miriam Smith
- University of Cambridge Metabolic Research Laboratories, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK
| | - Helen V Firth
- Department of Clinical Genetics, Cambridge University Hospitals NHS Foundation Trust & Wellcome Sanger Institute, Cambridge, UK
| | - Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; University Center for Primary Care and Public Health, 1010 Lausanne, Switzerland
| | - I Sadaf Farooqi
- University of Cambridge Metabolic Research Laboratories, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.
| | - Katherine Lawler
- University of Cambridge Metabolic Research Laboratories, Wellcome-MRC Institute of Metabolic Science and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.
13
|
Zhang X, Li J, Cao Y, Huang J, Duan Q. Genome-Wide Identification and Expression Analysis under Abiotic Stress of BrAHL Genes in Brassica rapa. Int J Mol Sci 2023; 24:12447. [PMID: 37569822 PMCID: PMC10420281 DOI: 10.3390/ijms241512447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 07/31/2023] [Accepted: 08/01/2023] [Indexed: 08/13/2023] Open
Abstract
The AT-hook motif nuclear localized (AHL) gene family encodes highly conserved transcription factors critical for the growth, development, and stress tolerance of plants. However, the function of the AHL gene family in Brassica rapa (B. rapa) remains unclear. In this study, 42 AHL family members were identified from the B. rapa genome and mapped to nine B. rapa chromosomes. The family has diverged into two clades during its evolution. The results showed that most products encoded by AHL family genes are located in the nucleus. Gene duplication was common and expanded the BrAHL gene family. According to the analysis of cis-regulatory elements, the genes are associated with stress responses (osmotic, cold, and heavy metal stress), major hormones (abscisic acid), and light responses. In addition, the expression profiles revealed that BrAHL genes are widely expressed in different tissues. BrAHL16 was upregulated at 4 h under drought stress, highly expressed under cadmium conditions, and downregulated in response to cold conditions. BrAHL02 and BrAHL24 were upregulated at the initial time point and peaked at 12 h under cold and cadmium stress, respectively. Notably, interactions between AHL genes and proteins under drought, cold, and heavy metal stresses were observed when predicting the protein-protein interaction network.
Affiliation(s)
| | - Jiabao Huang
- College of Horticulture Science and Engineering, Shandong Agricultural University, Tai’an 271018, China; (X.Z.); (J.L.); (Y.C.)
| | - Qiaohong Duan
- College of Horticulture Science and Engineering, Shandong Agricultural University, Tai’an 271018, China; (X.Z.); (J.L.); (Y.C.)
14
|
Soto DC, Uribe-Salazar JM, Shew CJ, Sekar A, McGinty S, Dennis MY. Genomic structural variation: A complex but important driver of human evolution. American Journal of Biological Anthropology 2023; 181 Suppl 76:118-144. [PMID: 36794631 PMCID: PMC10329998 DOI: 10.1002/ajpa.24713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Revised: 01/21/2023] [Accepted: 02/05/2023] [Indexed: 02/17/2023]
Abstract
Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which subsequently has played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
Affiliation(s)
- Daniela C. Soto
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - José M. Uribe-Salazar
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Aarthi Sekar
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Sean McGinty
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
15
|
Leggatt G, Cheng G, Narain S, Briseño-Roa L, Annereau JP, Gast C, Gilbert RD, Ennis S. A genotype-to-phenotype approach suggests under-reporting of single nucleotide variants in nephrocystin-1 (NPHP1) related disease (UK 100,000 Genomes Project). Sci Rep 2023; 13:9369. [PMID: 37296294 PMCID: PMC10256716 DOI: 10.1038/s41598-023-32169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 03/23/2023] [Indexed: 06/12/2023] Open
Abstract
Autosomal recessive whole gene deletions of nephrocystin-1 (NPHP1) result in abnormal structure and function of the primary cilia. These deletions can result in a tubulointerstitial kidney disease known as nephronophthisis and retinal (Senior-Løken syndrome) and neurological (Joubert syndrome) diseases. Nephronophthisis is a common cause of end-stage kidney disease (ESKD) in children and up to 1% of adult onset ESKD. Single nucleotide variants (SNVs) and small insertions and deletions (Indels) have been less well characterised. We used a gene pathogenicity scoring system (GenePy) and a genotype-to-phenotype approach on individuals recruited to the UK Genomics England (GEL) 100,000 Genomes Project (100kGP) (n = 78,050). This approach identified all participants with NPHP1-related diseases reported by NHS Genomics Medical Centres and an additional eight participants. Extreme NPHP1 gene scores, often underpinned by clear recessive inheritance, were observed in patients from diverse recruitment categories, including cancer, suggesting the possibility of a more widespread disease than previously appreciated. In total, ten participants had homozygous CNV deletions with eight homozygous or compound heterozygous with SNVs. Our data also reveal strong in-silico evidence that approximately 44% of NPHP1 related disease may be due to SNVs with AlphaFold structural modelling evidence for a significant impact on protein structure. This study suggests historical under-reporting of SNVs in NPHP1 related diseases compared with CNVs.
Affiliation(s)
- Gary Leggatt
- University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK.
- Wessex Kidney Centre, Portsmouth Hospitals University NHS Trust, Southwick Hill Road, Cosham, Portsmouth, PO6 3LY, UK.
- University Hospital Southampton NHS Foundation Trust, Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK.
| | - Guo Cheng
- University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK
| | - Sumit Narain
- University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK
| | - Luis Briseño-Roa
- Medetia, Imagine Institute for Genetic Diseases, 24 Boulevard du Montparnasse, 75015, Paris, France
| | - Jean-Philippe Annereau
- Medetia, Imagine Institute for Genetic Diseases, 24 Boulevard du Montparnasse, 75015, Paris, France
| | - Christine Gast
- University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK
- Wessex Kidney Centre, Portsmouth Hospitals University NHS Trust, Southwick Hill Road, Cosham, Portsmouth, PO6 3LY, UK
| | - Rodney D Gilbert
- University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK
- Southampton Children's Hospital, Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK
| | - Sarah Ennis
- University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road Shirley, Southampton, SO16 6YD, UK
16
|
Ferraj A, Audano PA, Balachandran P, Czechanski A, Flores JI, Radecki AA, Mosur V, Gordon DS, Walawalkar IA, Eichler EE, Reinholdt LG, Beck CR. Resolution of structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements. Cell Genomics 2023; 3:100291. [PMID: 37228752 PMCID: PMC10203049 DOI: 10.1016/j.xgen.2023.100291] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 09/14/2022] [Revised: 02/03/2023] [Accepted: 03/10/2023] [Indexed: 05/25/2023]
Abstract
Diverse inbred mouse strains are important biomedical research models, yet genome characterization of many strains is fundamentally lacking in comparison with humans. In particular, catalogs of structural variants (SVs) (variants ≥ 50 bp) are incomplete, limiting the discovery of causative alleles for phenotypic variation. Here, we resolve genome-wide SVs in 20 genetically distinct inbred mice with long-read sequencing. We report 413,758 site-specific SVs affecting 13% (356 Mbp) of the mouse reference assembly, including 510 previously unannotated coding variants. We substantially improve the Mus musculus transposable element (TE) callset, and we find that TEs comprise 39% of SVs and account for 75% of altered bases. We further utilize this callset to investigate how TE heterogeneity affects mouse embryonic stem cells and find multiple TE classes that influence chromatin accessibility. Our work provides a comprehensive analysis of SVs found in diverse mouse genomes and illustrates the role of TEs in epigenetic differences.
Affiliation(s)
- Ardian Ferraj
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Peter A. Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Jacob I. Flores
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Alexander A. Radecki
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Varun Mosur
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - David S. Gordon
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Isha A. Walawalkar
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Evan E. Eichler
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Christine R. Beck
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
17
|
Vollger MR, Dishuck PC, Harvey WT, DeWitt WS, Guitart X, Goldberg ME, Rozanski AN, Lucas J, Asri M, Munson KM, Lewis AP, Hoekzema K, Logsdon GA, Porubsky D, Paten B, Harris K, Hsieh P, Eichler EE. Increased mutation and gene conversion within human segmental duplications. Nature 2023; 617:325-334. [PMID: 37165237 PMCID: PMC10172114 DOI: 10.1038/s41586-023-05895-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 07/06/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences.
Affiliation(s)
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William S DeWitt
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
| | - Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Michael E Goldberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Kelley Harris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
18
|
Xiong Y, Zhang H, Zhou S, Ma L, Xiao W, Wu Y, Yuan YJ. Structural Variations and Adaptations of Synthetic Chromosome Ends Driven by SCRaMbLE in Haploid and Diploid Yeasts. ACS Synth Biol 2023; 12:689-699. [PMID: 36821394 DOI: 10.1021/acssynbio.2c00424] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/24/2023]
Abstract
Variations and adaptations of chromosome ends play an important role in eukaryotic karyotype evolution. Traditional experimental studies of the adaptations of chromosome ends mainly rely on the strategy of introducing defects; thus, the adaptation methods of survivors may vary depending on the initial defects. Here, using the SCRaMbLE strategy, we obtained a library of haploid and diploid synthetic strains with variations in chromosome ends. Analysis of the SCRaMbLEd survivors revealed four routes of adaptation: homologous recombination between nonhomologous chromosome arms (haploids) or homologous chromosome arms (diploids), site-specific recombination between intra- or interchromosomal ends, circularization of chromosomes, and loss of whole chromosomes (diploids). We also found that circularization of synthetic chromosomes can be generated by SCRaMbLE. Our study of various adaptation routes of chromosome ends provides insight into eukaryotic karyotype evolution from the viewpoint of synthetic genomics.
Affiliation(s)
- Yao Xiong
- Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China; Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Hui Zhang
- Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China; Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Sijie Zhou
- Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China; Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Lu Ma
- Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China; Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Wenhai Xiao
- Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China; Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Yi Wu
- Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China; Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Ying-Jin Yuan
- Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China; Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| |
|
19
|
Sun YH, Cui H, Song C, Shen JT, Zhuo X, Wang RH, Yu X, Ndamba R, Mu Q, Gu H, Wang D, Murthy GG, Li P, Liang F, Liu L, Tao Q, Wang Y, Orlowski S, Xu Q, Zhou H, Jagne J, Gokcumen O, Anthony N, Zhao X, Li XZ. Amniotes co-opt intrinsic genetic instability to protect germ-line genome integrity. Nat Commun 2023; 14:812. [PMID: 36781861 PMCID: PMC9925758 DOI: 10.1038/s41467-023-36354-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 01/27/2023] [Indexed: 02/15/2023] Open
Abstract
Unlike PIWI-interacting RNA (piRNA) in other species that mostly target transposable elements (TEs), >80% of piRNAs in adult mammalian testes lack obvious targets. However, mammalian piRNA sequences and piRNA-producing loci evolve more rapidly than the rest of the genome for unknown reasons. Here, through comparative studies of chickens, ducks, mice, and humans, as well as long-read nanopore sequencing on diverse chicken breeds, we find that piRNA loci across amniotes experience: (1) a high local mutation rate of structural variations (SVs, mutations ≥ 50 bp in size); (2) positive selection to suppress young and actively mobilizing TEs commencing at the pachytene stage of meiosis during germ cell development; and (3) negative selection to purge deleterious SV hotspots. Our results indicate that genetic instability at pachytene piRNA loci, while producing certain pathogenic SVs, also protects genome integrity against TE mobilization by driving the formation of rapid-evolving piRNA sequences.
Affiliation(s)
- Yu H Sun
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Hongxiao Cui
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Chi Song
- College of Public Health, Division of Biostatistics, The Ohio State University, Columbus, OH, 43210, USA
| | - Jiafei Teng Shen
- International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, 322000, China
| | - Xiaoyu Zhuo
- Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Ruoqiao Huiyi Wang
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Xiaohui Yu
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Rudo Ndamba
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Qian Mu
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Hanwen Gu
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Duolin Wang
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Gayathri Guru Murthy
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Pidong Li
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Fan Liang
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Lei Liu
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Qing Tao
- Grandomics Biosciences Co., Ltd, Beijing, 102206, China
| | - Ying Wang
- Department of Animal Science, University of California, Davis, CA, 95616, USA
| | - Sara Orlowski
- Department of Poultry Science, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Qi Xu
- Department of Animal Science, McGill University, Quebec, H9X 3V9, Canada
| | - Huaijun Zhou
- Department of Animal Science, University of California, Davis, CA, 95616, USA
| | - Jarra Jagne
- Animal Health Diagnostic Center, Cornell University College of Veterinary Medicine, Ithaca, NY, 14850, USA
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, State University of New York, Buffalo, NY, 14260, USA
| | - Nick Anthony
- Department of Poultry Science, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Xin Zhao
- Department of Animal Science, McGill University, Quebec, H9X 3V9, Canada.
| | - Xin Zhiguo Li
- Center for RNA Biology: From Genome to Therapeutics, Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY, 14642, USA.
| |
|
20
|
Wang Y, Yu J, Jiang M, Lei W, Zhang X, Tang H. Sequencing and Assembly of Polyploid Genomes. Methods Mol Biol 2023; 2545:429-458. [PMID: 36720827 DOI: 10.1007/978-1-0716-2561-3_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Polyploidy has been observed throughout major eukaryotic clades and has played a vital role in the evolution of angiosperms. Recent polyploidizations often result in highly complex genome structures, posing challenges to genome assembly and phasing. Recent advances in sequencing technologies and genome assembly algorithms have enabled high-quality, near-complete chromosome-level assemblies of polyploid genomes. Advances in novel sequencing technologies include highly accurate single-molecule sequencing with HiFi reads, chromosome conformation capture with Hi-C technique, and linked reads sequencing. Additionally, new computational approaches have also significantly improved the precision and reliability of polyploid genome assembly and phasing, such as HiCanu, hifiasm, ALLHiC, and PolyGembler. Herein, we review recently published polyploid genomes and compare the various sequencing, assembly, and phasing approaches that are utilized in these genome studies. Finally, we anticipate that accurate and telomere-to-telomere chromosome-level assembly of polyploid genomes could ultimately become a routine procedure in the near future.
Affiliation(s)
- Yibin Wang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Jiaxin Yu
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Mengwei Jiang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Wenlong Lei
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Xingtan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Haibao Tang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
| |
|
21
|
Serrano C, Lopes-Marques M, Amorim A, João Prata M, Azevedo L. A partial duplication of an X-linked gene exclusive of a primate lineage (Macaca). Gene 2023; 851:146997. [DOI: 10.1016/j.gene.2022.146997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 10/12/2022] [Accepted: 10/18/2022] [Indexed: 11/04/2022]
|
22
|
Li C, Fan X, Guo X, Liu Y, Wang M, Zhao XC, Wu P, Yan Q, Sun L. Accuracy benchmark of the GeneMind GenoLab M sequencing platform for WGS and WES analysis. BMC Genomics 2022; 23:533. [PMID: 35869426 PMCID: PMC9308344 DOI: 10.1186/s12864-022-08775-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 07/18/2022] [Indexed: 11/23/2022] Open
Abstract
Background: GenoLab M is a recently developed next-generation sequencing (NGS) platform from GeneMind Biosciences. To establish the performance of GenoLab M, we present the first report benchmarking the WGS and WES data of the GenoLab M sequencer against the NovaSeq 6000 and NextSeq 550 platforms in various types of analysis. For WGS, thirty-fold sequencing on the Illumina NovaSeq platform, processed by the GATK pipeline, is currently considered the gold standard; this dataset was therefore generated as the benchmark reference in this study. Results: GenoLab M showed an average Q20 percentage of 94.62% for base quality, while NovaSeq was slightly higher at 96.97%. However, GenoLab M outperformed NovaSeq and NextSeq in duplication rate, suggesting more usable data after deduplication. For WGS short variant calling, GenoLab M showed a significant accuracy improvement over the same-depth dataset from NovaSeq, and at 22X depth reached accuracy similar to that of the NovaSeq 33X dataset. For 100X WES, the F-score and precision of GenoLab M were higher than those of NovaSeq or NextSeq, especially for InDel calling. Conclusions: GenoLab M is a promising NGS platform for high-performance WGS and WES applications. For WGS, 22X depth on the GenoLab M sequencing platform offers a cost-effective alternative to the current mainstream 33X depth on Illumina.
|
23
|
Khera AV, Wang M, Chaffin M, Emdin CA, Samani NJ, Schunkert H, Watkins H, McPherson R, Elosua R, Boerwinkle E, Ardissino D, Butterworth AS, Di Angelantonio E, Naheed A, Danesh J, Chowdhury R, Krumholz HM, Sheu WHH, Rich SS, Rotter JI, Chen YDI, Gabriel S, Lander ES, Saleheen D, Kathiresan S. Gene Sequencing Identifies Perturbation in Nitric Oxide Signaling as a Nonlipid Molecular Subtype of Coronary Artery Disease. Circ Genom Precis Med 2022; 15:e003598. [PMID: 36215124 PMCID: PMC9771961 DOI: 10.1161/circgen.121.003598] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/19/2021] [Accepted: 06/24/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND: A key goal of precision medicine is to disaggregate common, complex diseases into discrete molecular subtypes. Rare coding variants in the low-density lipoprotein receptor gene (LDLR) are identified in 1% to 2% of coronary artery disease (CAD) patients, defining a molecular subtype with risk driven by hypercholesterolemia. METHODS: To search for additional subtypes, we compared the frequency of rare, predicted loss-of-function and damaging missense variants aggregated within a given gene in 41 081 CAD cases versus 217 115 controls. RESULTS: Rare variants in LDLR were most strongly associated with CAD, present in 1% of cases and associated with 4.4-fold increased CAD risk. A second subtype was characterized by variants in the endothelial nitric oxide synthase gene (NOS3), which encodes a key enzyme regulating vascular tone, endothelial function, and platelet aggregation. A rare predicted loss-of-function or damaging missense variant in NOS3 was present in 0.6% of cases and associated with 2.42-fold increased risk of CAD (95% CI, 1.80-3.26; P=5.50×10⁻⁹). These variants were associated with higher systolic blood pressure (+3.25 mm Hg; [95% CI, 1.86-4.65]; P=5.00×10⁻⁶) and increased risk of hypertension (adjusted odds ratio 1.31; [95% CI, 1.14-1.51]; P=2.00×10⁻⁴) but not circulating cholesterol concentrations, suggesting that, beyond lipid pathways, nitric oxide synthesis is a key nonlipid driver of CAD risk. CONCLUSIONS: Beyond LDLR, we identified an additional nonlipid molecular subtype of CAD characterized by rare variants in the NOS3 gene.
Affiliation(s)
- Amit V. Khera
- Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
- Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Dept of Medicine, Harvard Medical School, Boston, MA
- Cardiology Division, Dept of Medicine, Massachusetts General Hospital, Boston, MA
| | - Minxian Wang
- Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
- CAS Key Laboratory of Genome Sciences & Information, Beijing Inst of Genomics, Chinese Academy of Sciences & China National Ctr for Bioinformation, Beijing, China
| | - Mark Chaffin
- Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
| | - Connor A. Emdin
- Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Dept of Medicine, Harvard Medical School, Boston, MA
- Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
| | - Nilesh J. Samani
- Dept of Cardiovascular Sciences, Univ of Leicester, Leicester, UK
- NIHR Leicester Biomedical Research Ctr, Glenfield Hospital, Leicester, UK
| | - Heribert Schunkert
- Dept of Cardiology, German Heart Ctr Munich, Technical Univ of Munich, Munich, Germany
- DZHK (German Ctr for Cardiovascular Research), Partner site Munich, Munich Heart Alliance, Munich, Germany
| | - Hugh Watkins
- Division of Cardiovascular Medicine, Radcliffe Dept of Medicine, Univ of Oxford, Headington, UK
- Wellcome Trust Ctr for Human Genetics, Univ of Oxford, Oxford, UK
| | - Ruth McPherson
- Inst for Cardiogenetics, Univ of Lübeck, Lübeck, Schleswig-Holstein, Germany
- German Research Ctr for Cardiovascular Research, Partner Site Hamburg/Lübeck/Kiel & Univ Heart Center Lübeck (J.E.), Berlin, Brandenburg, Germany
- Depts of Medicine & Biochemistry, Univ of Ottawa Heart Inst, Ottawa, ON, Canada
| | - Roberto Elosua
- Cardiovascular Epidemiology & Genetics, Hospital del Mar Research Inst, Barcelona, Spain
- CIBER Enfermedades Cardiovasculares, Barcelona, Spain
- Facultat de Medicina, Universitat de Vic-Central de Cataluña, Barcelona, Spain
| | - Eric Boerwinkle
- Ctr for Human Genetics & Dept. of Epidemiology, Univ of Texas Health Science Ctr School of Public Health, Houston, TX
| | - Diego Ardissino
- Cardiology, Azienda Ospedaliero-Universitaria di Parma, Univ of Parma, Parma, Italy
- Associazione per lo Studio Della Trombosi in Cardiologia, Pavia, Italy
| | - Adam S. Butterworth
- British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK
- National Inst for Health Research Blood & Transplant Research Unit in Donor Health & Genomics, Univ of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus & Univ of Cambridge, Cambridge, UK
| | - Emanuele Di Angelantonio
- British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus & Univ of Cambridge, Cambridge, UK
- NIHR Blood & Transplant Research Unit in Donor Health & Genomics, Univ of Cambridge, Cambridge, UK
- BHF Ctr of Research Excellence, School of Clinical Medicine, Addenbrooke’s Hospital, Univ of Cambridge, Cambridge, UK
- Health Data Science Research Ctr, Human Technopole, Milan, Italy
| | - Aliya Naheed
- Initiative for Noncommunicable Diseases, Health Systems & Population Studies Division, International Ctr for Diarrhoeal Disease Research, Bangladesh, Dhaka, Bangladesh
| | - John Danesh
- British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK
- National Inst for Health Research Blood & Transplant Research Unit in Donor Health & Genomics, Univ of Cambridge, Cambridge, UK
- British Heart Foundation Ctr of Research Excellence, Univ of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus & Univ of Cambridge, Cambridge, UK
- Dept of Human Genetics, Wellcome Sanger Inst, Hinxton, UK
| | - Rajiv Chowdhury
- British Heart Foundation Cardiovascular Epidemiology Unit, Dept of Public Health & Primary Care, Univ of Cambridge, Cambridge, UK
- Centre for Non-Communicable Disease Research, Dhaka, Bangladesh
| | - Harlan M. Krumholz
- Section of Cardiovascular Medicine, Dept of Medicine, Yale Univ, New Haven, CT
- Ctr for Outcomes Research & Evaluation, Yale-New Haven Hospital, New Haven, CT
| | - Wayne H-H Sheu
- Cardiovascular Research Ctr, Dept of Medicine, National Yang Ming Univ School of Medicine, Taipei, Taiwan
| | - Stephen S. Rich
- Ctr for Public Health Genomics, Univ of Virginia, Charlottesville, VA
| | - Jerome I. Rotter
- The Inst for Translational Genomics & Population Sciences, Dept of Pediatrics, The Lundquist Inst for Biomedical Innovation at Harbor-UCLA Medical Ctr, Torrance, CA
| | - Yii-der Ida Chen
- The Inst for Translational Genomics & Population Sciences, Dept of Pediatrics, The Lundquist Inst for Biomedical Innovation at Harbor-UCLA Medical Ctr, Torrance, CA
| | - Stacey Gabriel
- Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
| | - Eric S. Lander
- Program in Medical & Population Genetics, Broad Inst of MIT & Harvard, Cambridge, MA
- Dept of Biology, MIT, Cambridge, MA
- Dept of Systems Biology, Harvard Medical School, Boston, MA
| | - Danish Saleheen
- Dept of Medicine, Columbia Univ, New York, NY
- Ctr for Non-Communicable Diseases, Karachi, Sindh, Pakistan
| | - Sekar Kathiresan
- Ctr for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Dept of Medicine, Harvard Medical School, Boston, MA
- Cardiology Division, Dept of Medicine, Massachusetts General Hospital, Boston, MA
- Verve Therapeutics, Cambridge, MA
| |
|
24
|
Tan R, Shen Y. Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning. Nucleic Acids Res 2022; 50:e123. [PMID: 36124672 PMCID: PMC9756945 DOI: 10.1093/nar/gkac788] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/10/2022] [Revised: 08/08/2022] [Accepted: 09/01/2022] [Indexed: 12/24/2022] Open
Abstract
Exome sequencing is widely used in genetic studies of human diseases and clinical genetic diagnosis. Accurate detection of copy number variants (CNVs) is important to fully utilize exome sequencing data. However, exome data are noisy, and none of the existing methods alone can achieve both high precision and high recall. A common practice is to perform heuristic filtration followed by manual inspection of the read depth of putative CNVs. This approach does not scale to large studies. To address this issue, we developed a transfer learning method, CNV-espresso, for in silico confirmation of rare CNVs from exome sequencing data. CNV-espresso encodes candidate CNVs from exome data as images and uses pretrained convolutional neural network models to classify copy number states. We trained CNV-espresso using an offspring-parents trio exome sequencing dataset, with inherited CNVs as positives and CNVs with Mendelian errors as negatives. We evaluated the performance using additional samples that have both exome and whole-genome sequencing (WGS) data. Taking the CNVs detected from WGS data as a proxy for ground truth, CNV-espresso significantly improves precision while keeping recall almost intact, especially for CNVs that span a small number of exons. CNV-espresso can effectively replace manual inspection of CNVs in large-scale exome sequencing studies.
Affiliation(s)
- Renjie Tan
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032, USA
| |
|
25
|
Vervoort L, Vermeesch JR. The 22q11.2 Low Copy Repeats. Genes (Basel) 2022; 13:2101. [PMID: 36421776 PMCID: PMC9690962 DOI: 10.3390/genes13112101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2022] [Revised: 10/19/2022] [Accepted: 10/25/2022] [Indexed: 07/22/2023] Open
Abstract
LCR22s are among the most complex loci in the human genome and are susceptible to nonallelic homologous recombination. This can lead to a variety of genomic disorders, including deletions, duplications, and translocations, of which the 22q11.2 deletion syndrome is the most common in humans. Interrogating these phenomena is difficult due to the high complexity of the LCR22s and the inaccurate representation of the LCRs across different reference genomes. Optical mapping techniques, which provide long-range chromosomal maps, could be used to unravel the complex duplicon structure. These techniques have already uncovered the hypervariability of the LCR22-A haplotype in the human population. Although optical LCR22 mapping is a major step forward, long-read sequencing approaches will be essential to reach nucleotide resolution of the LCR22s and map the crossover sites. Accurate maps and sequences are needed to pinpoint potential predisposing alleles and, most importantly, allow for genotype-phenotype studies exploring the role of the LCR22s in health and disease. In addition, this research might provide a paradigm for the study of other rare genomic disorders.
|
26
|
Agrawal S, Wang M, Klarqvist MDR, Smith K, Shin J, Dashti H, Diamant N, Choi SH, Jurgens SJ, Ellinor PT, Philippakis A, Claussnitzer M, Ng K, Udler MS, Batra P, Khera AV. Inherited basis of visceral, abdominal subcutaneous and gluteofemoral fat depots. Nat Commun 2022; 13:3771. [PMID: 35773277 PMCID: PMC9247093 DOI: 10.1038/s41467-022-30931-2] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 12/08/2021] [Accepted: 05/25/2022] [Indexed: 12/11/2022] Open
Abstract
For any given level of overall adiposity, individuals vary considerably in fat distribution. The inherited basis of fat distribution in the general population is not fully understood. Here, we study up to 38,965 UK Biobank participants with MRI-derived visceral (VAT), abdominal subcutaneous (ASAT), and gluteofemoral (GFAT) adipose tissue volumes. Because these fat depot volumes are highly correlated with BMI, we additionally study six local adiposity traits: VAT adjusted for BMI and height (VATadj), ASATadj, GFATadj, VAT/ASAT, VAT/GFAT, and ASAT/GFAT. We identify 250 independent common variants (39 newly-identified) associated with at least one trait, with many associations more pronounced in female participants. Rare variant association studies extend prior evidence for PDE3B as an important modulator of fat distribution. Local adiposity traits (1) highlight depot-specific genetic architecture and (2) enable construction of depot-specific polygenic scores that have divergent associations with type 2 diabetes and coronary artery disease. These results - using MRI-derived, BMI-independent measures of local adiposity - confirm fat distribution as a highly heritable trait with important implications for cardiometabolic health outcomes.
Affiliation(s)
- Saaket Agrawal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Minxian Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | | | - Kirk Smith
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Joseph Shin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Hesam Dashti
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nathaniel Diamant
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Seung Hoan Choi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Sean J Jurgens
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Experimental Cardiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Patrick T Ellinor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Anthony Philippakis
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Melina Claussnitzer
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Kenney Ng
- Center for Computational Health, IBM Research, Cambridge, MA, USA
| | - Miriam S Udler
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Puneet Batra
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Amit V Khera
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Verve Therapeutics, Cambridge, MA, USA.
| |
|
27
|
Prodanov T, Bansal V. Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing. Nat Commun 2022; 13:3221. [PMID: 35680869 PMCID: PMC9184528 DOI: 10.1038/s41467-022-30930-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 05/20/2022] [Indexed: 11/09/2022] Open
Abstract
The human genome contains hundreds of low-copy repeats (LCRs) that are challenging to analyze using short-read sequencing technologies due to extensive copy number variation and ambiguity in read mapping. Copy number and sequence variants in more than 150 duplicated genes that overlap LCRs have been implicated in monogenic and complex human diseases. We describe a computational tool, Parascopy, for estimating the aggregate and paralog-specific copy number of duplicated genes using whole-genome sequencing (WGS). Parascopy is an efficient method that jointly analyzes reads mapped to different repeat copies without the need for global realignment. It leverages multiple samples to mitigate sequencing bias and to identify reliable paralogous sequence variants (PSVs) that differentiate repeat copies. Analysis of WGS data for 2504 individuals from diverse populations showed that Parascopy is robust to sequencing bias, has higher accuracy compared to existing methods and enables prioritization of pathogenic copy number changes in duplicated genes.
Affiliation(s)
- Timofey Prodanov
- Bioinformatics and Systems Biology Graduate Program, University of California, La Jolla, San Diego, CA, 92093, USA
| | - Vikas Bansal
- Department of Pediatrics, School of Medicine, University of California, La Jolla, San Diego, CA, 92093, USA.
| |
|
28
|
Sagath L, Lehtokari VL, Wallgren-Pettersson C, Pelin K, Kiiski K. A custom ddPCR method for the detection of copy number variations in the nebulin triplicate region. PLoS One 2022; 17:e0267793. [PMID: 35576196 PMCID: PMC9109913 DOI: 10.1371/journal.pone.0267793] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/29/2021] [Accepted: 04/15/2022] [Indexed: 11/21/2022] Open
Abstract
The human genome contains repetitive regions, such as segmental duplications, known to be prone to copy number variation. Segmental duplications are highly identical and homologous sequences, posing a specific challenge for most mutation detection methods. The giant nebulin gene is expressed in skeletal muscle. It harbors a large segmental duplication region composed of eight exons repeated three times, the so-called triplicate region. Mutations in nebulin are known to cause nemaline myopathy and other congenital myopathies. Using our custom targeted Comparative Genomic Hybridization arrays, we have previously shown that copy number variations in the nebulin triplicate region are pathogenic when the copy number of the segmental duplication block deviates two or more copies from the normal number, which is three per allele. To complement our Comparative Genomic Hybridization arrays, we have established a custom Droplet Digital PCR method for the detection of copy number variations within the nebulin triplicate region. The custom Droplet Digital PCR assays allow sensitive, rapid, high-throughput, and cost-effective detection of copy number variations within this region and are ready for implementation as a screening method for disease-causing copy number variations of the nebulin triplicate region. We suggest that Droplet Digital PCR may also be used in the study and diagnostics of other segmental duplication regions of the genome.
Affiliation(s)
- Lydia Sagath
- Folkhälsan Research Center, Helsinki, Finland
- Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- * E-mail: , (LS); (KK)
| | - Vilma-Lotta Lehtokari
- Folkhälsan Research Center, Helsinki, Finland
- Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
| | - Carina Wallgren-Pettersson
- Folkhälsan Research Center, Helsinki, Finland
- Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
| | - Katarina Pelin
- Folkhälsan Research Center, Helsinki, Finland
- Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Molecular and Integrative Biosciences Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
| | - Kirsi Kiiski
- Folkhälsan Research Center, Helsinki, Finland
- Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- * E-mail: , (LS); (KK)
| |
|
29
|
Array Comparative Genomic Hybridisation and Droplet Digital PCR Uncover Recurrent Copy Number Variation of the TTN Segmental Duplication Region. Genes (Basel) 2022; 13:genes13050905. [PMID: 35627290 PMCID: PMC9142044 DOI: 10.3390/genes13050905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 05/13/2022] [Accepted: 05/17/2022] [Indexed: 02/04/2023] Open
Abstract
Intragenic segmental duplication regions are potential hotspots for recurrent copy number variation and possible pathogenic aberrations. Two large sarcomeric genes, nebulin and titin, both contain such segmental duplication regions. Using our custom Comparative Genomic Hybridisation array, we have previously shown that a gain or loss of more than one copy of the repeated block of the nebulin triplicate region constitutes a recessive pathogenic mutation. Using targeted array-CGH, similar copy number variants can be detected in the segmental duplication region of titin. Due to the limitations of the array-CGH methodology and the repetitiveness of the region, the exact copy numbers of the blocks could not be determined. Therefore, we developed complementary custom Droplet Digital PCR assays for the titin segmental duplication region to confirm true variation. Our combined methods show that the titin segmental duplication region is subject to recurrent copy number variation. Gains and losses were detected in samples from healthy individuals as well as in samples from patients with different muscle disorders. The copy number variation observed in our cohort is likely benign, but pathogenic copy number variants in the segmental duplication region of titin cannot be excluded. Further investigations are needed; however, this region should no longer be neglected in genetic analyses.
|
30
|
Damert A. SVA retrotransposons and a low copy repeat in humans and great apes: a mobile connection. Mol Biol Evol 2022; 39:6586216. [PMID: 35574660 PMCID: PMC9132208 DOI: 10.1093/molbev/msac103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
Segmental duplications (SDs) constitute a considerable fraction of primate genomes. They contribute to genetic variation and provide raw material for evolution. Groups of SDs are characterized by the presence of shared core duplicons. One of these core duplicons, low copy repeat (lcr)16a, has been shown to be particularly active in the propagation of interspersed SDs in primates. The underlying mechanisms are, however, only partially understood. Alu short interspersed elements (SINEs) are frequently found at breakpoints and have been implicated in the expansion of SDs. Detailed analysis of lcr16a-containing SDs shows that the hominid-specific SVA (SINE-R-VNTR-Alu) retrotransposon is an integral component of the core duplicon in Asian and African great apes. In orang-utan, it provides breakpoints and contributes to both interchromosomal and intrachromosomal lcr16a mobility by inter-element recombination. Furthermore, the data suggest that in hominines (human, chimpanzee, gorilla) SVA recombination-mediated integration of a circular intermediate is the founding event of a lineage-specific lcr16a expansion. One of the hominine lcr16a copies displays large flanking direct repeats, a structural feature shared by other SDs in the human genome. Taken together, the results obtained extend the range of SVAs’ contribution to genome evolution from RNA-mediated transduction to DNA-based recombination. In addition, they provide further support for a role of circular intermediates in SD mobilization.
Affiliation(s)
- Annette Damert
- Infection Biology Unit and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
| |
|
31
|
Olson ND, Wagner J, McDaniel J, Stephens SH, Westreich ST, Prasanna AG, Johanson E, Boja E, Maier EJ, Serang O, Jáspez D, Lorenzo-Salazar JM, Muñoz-Barrera A, Rubio-Rodríguez LA, Flores C, Kyriakidis K, Malousi A, Shafin K, Pesout T, Jain M, Paten B, Chang PC, Kolesnikov A, Nattestad M, Baid G, Goel S, Yang H, Carroll A, Eveleigh R, Bourgey M, Bourque G, Li G, Ma C, Tang L, Du Y, Zhang S, Morata J, Tonda R, Parra G, Trotta JR, Brueffer C, Demirkaya-Budak S, Kabakci-Zorlu D, Turgut D, Kalay Ö, Budak G, Narcı K, Arslan E, Brown R, Johnson IJ, Dolgoborodov A, Semenyuk V, Jain A, Tetikol HS, Jain V, Ruehle M, Lajoie B, Roddey C, Catreux S, Mehio R, Ahsan MU, Liu Q, Wang K, Ebrahim Sahraeian SM, Fang LT, Mohiyuddin M, Hung C, Jain C, Feng H, Li Z, Chen L, Sedlazeck FJ, Zook JM. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. CELL GENOMICS 2022; 2:S2666-979X(22)00058-1. [PMID: 35720974 PMCID: PMC9205427 DOI: 10.1016/j.xgen.2022.100129] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 12/07/2020] [Revised: 11/01/2021] [Accepted: 04/08/2022] [Indexed: 11/19/2022]
Abstract
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Affiliation(s)
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
| | | | | | | | - Elaine Johanson
- Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
| | - Emily Boja
- Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
| | - Ezekiel J. Maier
- Booz Allen Hamilton, 8283 Greensboro Drive, Mclean, VA 22102, USA
| | - Omar Serang
- DNAnexus, Inc., 1975 W El Camino Real #204, Mountain View, CA 94040, USA
| | - David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
| | - José M. Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
| | - Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
| | - Luis A. Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
| | - Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Research Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
- Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain
| | - Konstantinos Kyriakidis
- School of Pharmacy, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
- Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
| | - Andigoni Malousi
- Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
- Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Pi-Chuan Chang
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | | | - Maria Nattestad
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Gunjan Baid
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Sidharth Goel
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Howard Yang
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Andrew Carroll
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Robert Eveleigh
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
| | - Mathieu Bourgey
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
| | - Guillaume Bourque
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
| | - Gen Li
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - ChouXian Ma
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - LinQi Tang
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - YuanPing Du
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - ShaoWei Zhang
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - Jordi Morata
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Raúl Tonda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Genís Parra
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Jean-Rémi Trotta
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Christian Brueffer
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
| | | | | | - Deniz Turgut
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Özem Kalay
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Gungor Budak
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Kübra Narcı
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Elif Arslan
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | | | | | | | | | - Amit Jain
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | | | | | | | | | | | | | | | - Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | | | - Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA 95050, USA
| | | | | | - Chirag Jain
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
| |
|
32
|
Vollger MR, Guitart X, Dishuck PC, Mercuri L, Harvey WT, Gershman A, Diekhans M, Sulovari A, Munson KM, Lewis AP, Hoekzema K, Porubsky D, Li R, Nurk S, Koren S, Miga KH, Phillippy AM, Timp W, Ventura M, Eichler EE. Segmental duplications and their variation in a complete human genome. Science 2022; 376:eabj6965. [PMID: 35357917 PMCID: PMC8979283 DOI: 10.1126/science.abj6965] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Indexed: 12/11/2022]
Abstract
Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.
Affiliation(s)
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ludovica Mercuri
- Department of Biology, University of Bari, Aldo Moro, Bari 70125, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ruiyang Li
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mario Ventura
- Department of Biology, University of Bari, Aldo Moro, Bari 70125, Italy
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
33
|
Aganezov S, Yan SM, Soto DC, Kirsche M, Zarate S, Avdeyev P, Taylor DJ, Shafin K, Shumate A, Xiao C, Wagner J, McDaniel J, Olson ND, Sauria MEG, Vollger MR, Rhie A, Meredith M, Martin S, Lee J, Koren S, Rosenfeld JA, Paten B, Layer R, Chin CS, Sedlazeck FJ, Hansen NF, Miller DE, Phillippy AM, Miga KH, McCoy RC, Dennis MY, Zook JM, Schatz MC. A complete reference genome improves analysis of human genetic variation. Science 2022; 376:eabl3533. [PMID: 35357935 DOI: 10.1126/science.abl3533] [Citation(s) in RCA: 121] [Impact Index Per Article: 60.5] [Indexed: 12/11/2022]
Abstract
Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie M Yan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela C Soto
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel Avdeyev
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
| | - Justin Wagner
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Jennifer McDaniel
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D Olson
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Melissa Meredith
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Skylar Martin
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | - Sergey Koren
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Ryan Layer
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Nancy F Hansen
- Comparative Genomics Analysis Unit, National Human Genome Research Institute, Rockville, MD, USA
| | - Danny E Miller
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y Dennis
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Justin M Zook
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| |
|
34
|
Išerić H, Alkan C, Hach F, Numanagić I. Fast characterization of segmental duplication structure in multiple genome assemblies. Algorithms Mol Biol 2022; 17:4. [PMID: 35303886 PMCID: PMC8932185 DOI: 10.1186/s13015-022-00210-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/16/2021] [Accepted: 02/08/2022] [Indexed: 11/29/2022] Open
Abstract
Motivation The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today. Results Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7–33× speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications to as far as 300 million years. Availability and implementation BISER is implemented in Seq programming language and is publicly available at https://github.com/0xTCG/biser.
|
35
|
Karaoglanoglu F, Chauve C, Hach F. Genion, an accurate tool to detect gene fusion from long transcriptomics reads. BMC Genomics 2022; 23:129. [PMID: 35164688 PMCID: PMC8842519 DOI: 10.1186/s12864-022-08339-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/04/2021] [Accepted: 01/27/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND The advent of next-generation sequencing technologies empowered a wide variety of transcriptomics studies. A widely studied topic is gene fusion which is observed in many cancer types and suspected of having oncogenic properties. Gene fusions are the result of structural genomic events that bring two genes closely located and result in a fused transcript. This is different from fusion transcripts created during or after the transcription process. These chimeric transcripts are also known as read-through and trans-splicing transcripts. Gene fusion discovery with short reads is a well-studied problem, and many methods have been developed. But the sensitivity of these methods is limited by the technology, especially the short read length. Advances in long-read sequencing technologies allow the generation of long transcriptomics reads at a low cost. Transcriptomic long-read sequencing presents unique opportunities to overcome the shortcomings of short-read technologies for gene fusion detection while introducing new challenges. RESULTS We present Genion, a sensitive and fast gene fusion detection method that can also detect read-through events. We compare Genion against a recently introduced long-read gene fusion discovery method, LongGF, both on simulated and real datasets. On simulated data, Genion accurately identifies the gene fusions and its clustering accuracy for detecting fusion reads is better than LongGF. Furthermore, our results on the breast cancer cell line MCF-7 show that Genion correctly identifies all the experimentally validated gene fusions. CONCLUSIONS Genion is an accurate gene fusion caller. Genion is implemented in C++ and is available at https://github.com/vpc-ccg/genion.
Affiliation(s)
- Fatih Karaoglanoglu
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Vancouver Prostate Centre, Vancouver, BC, Canada
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada.
| | - Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC, Canada; Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada.
| |
|
36
|
Dong Z, Wang Y, Yin D, Hang X, Pu L, Zhang J, Geng J, Chang L. Advanced techniques for gene heterogeneity research: Single‐cell sequencing and on‐chip gene analysis systems. VIEW 2022. [DOI: 10.1002/viw.20210011] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 02/05/2023] Open
Affiliation(s)
- Zaizai Dong
- Key Laboratory of Biomechanics and Mechanobiology, Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering Beihang University Beijing China
| | - Yu Wang
- Department of Laboratory Medicine State Key Laboratory of Biotherapy and Cancer Center West China Hospital Sichuan University/Collaborative Innovation Center Chengdu China
| | - Dedong Yin
- Key Laboratory of Biomechanics and Mechanobiology, Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering Beihang University Beijing China
| | - Xinxin Hang
- Key Laboratory of Biomechanics and Mechanobiology, Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering Beihang University Beijing China
| | - Lei Pu
- Department of Laboratory Medicine State Key Laboratory of Biotherapy and Cancer Center West China Hospital Sichuan University/Collaborative Innovation Center Chengdu China
| | - Jianfu Zhang
- Department of Laboratory Medicine State Key Laboratory of Biotherapy and Cancer Center West China Hospital Sichuan University/Collaborative Innovation Center Chengdu China
| | - Jia Geng
- Department of Laboratory Medicine State Key Laboratory of Biotherapy and Cancer Center West China Hospital Sichuan University/Collaborative Innovation Center Chengdu China
| | - Lingqian Chang
- Key Laboratory of Biomechanics and Mechanobiology, Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering Beihang University Beijing China
| |
|
37
|
Mirzaei G, Petreaca RC. Distribution of copy number variations and rearrangement endpoints in human cancers with a review of literature. Mutat Res 2022; 824:111773. [PMID: 35091282 DOI: 10.1016/j.mrfmmm.2021.111773] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/11/2021] [Revised: 12/08/2021] [Accepted: 12/10/2021] [Indexed: 12/13/2022]
Abstract
Copy number variations (CNVs) which include deletions, duplications, inversions, translocations, and other forms of chromosomal re-arrangements are common to human cancers. In this report we investigated the pattern of these variations with the goal of understanding whether there exist specific cancer signatures. We used re-arrangement endpoint data deposited on the Catalogue of Somatic Mutations in Cancers (COSMIC) for our analysis. Indeed, we find that human cancers are characterized by specific patterns of chromosome rearrangements endpoints which in turn result in cancer specific CNVs. A review of the literature reveals tissue specific mutations which either drive these CNVs or appear as a consequence of CNVs because they confer an advantage to the cancer cell. We also identify several rearrangement endpoints hotspots that were not previously reported. Our analysis suggests that in addition to local chromosomal architecture, CNVs are driven by the internal cellular or nuclear physiology of each cancer tissue.
Affiliation(s)
- Golrokh Mirzaei
- Department of Computer Science and Engineering, The Ohio State University at Marion, Marion, OH, 43302, USA
| | - Ruben C Petreaca
- Department of Molecular Genetics, The Ohio State University at Marion, Marion, OH, 43302, USA; Cancer Biology Program, The Ohio State University James Comprehensive Cancer Center, Columbus, OH, 43210, USA.
| |
|
38
|
Quantitative assessment reveals the dominance of duplicated sequences in germline-derived extrachromosomal circular DNA. Proc Natl Acad Sci U S A 2021; 118:2102842118. [PMID: 34789574 PMCID: PMC8617514 DOI: 10.1073/pnas.2102842118] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Accepted: 10/04/2021] [Indexed: 01/08/2023] Open
Abstract
Extrachromosomal circular DNA (eccDNA) plays a role in human diseases such as cancer, but little is known about the impact of eccDNA in healthy human biology. Since eccDNA is a tiny fraction of nuclear DNA, artificial amplification has been employed to increase eccDNA amounts, resulting in the loss of native compositions. We developed an approach to enrich eccDNA populations at the native state (naïve small circular DNA, nscDNA) and investigated their origins in the human genome. We found that, in human sperm, the vast majority of nscDNA came from high-copy genomic regions, including the most variable regions between individuals. Because eccDNA can be incorporated back into chromosomes, eccDNA may promote human genetic variation.
Extrachromosomal circular DNA (eccDNA) originates from linear chromosomal DNA in various human tissues under physiological and disease conditions. The genomic origins of eccDNA have largely been investigated using in vitro–amplified DNA. However, in vitro amplification obscures quantitative information by skewing the total population stoichiometry. In addition, the analyses have focused on eccDNA stemming from single-copy genomic regions, leaving eccDNA from multicopy regions unexamined. To address these issues, we isolated eccDNA without in vitro amplification (naïve small circular DNA, nscDNA) and assessed the populations quantitatively by integrated genomic, molecular, and cytogenetic approaches. nscDNA of up to tens of kilobases were successfully enriched by our approach and were predominantly derived from multicopy genomic regions including segmental duplications (SDs). SDs, which account for 5% of the human genome and are hotspots for copy number variations, were significantly overrepresented in sperm nscDNA, with three times more sequencing reads derived from SDs than from the entire single-copy regions. SDs were also overrepresented in mouse sperm nscDNA, which we estimated to comprise 0.2% of nuclear DNA. Considering that eccDNA can be integrated into chromosomes, germline-derived nscDNA may be a mediator of genome diversity.
|
39
|
Li K, Jiang W, Hui Y, Kong M, Feng LY, Gao LZ, Li P, Lu S. Gapless indica rice genome reveals synergistic contributions of active transposable elements and segmental duplications to rice genome evolution. Molecular Plant 2021; 14:1745-1756. [PMID: 34171481 DOI: 10.1016/j.molp.2021.06.017] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 05/04/2021] [Revised: 06/18/2021] [Accepted: 06/22/2021] [Indexed: 05/04/2023]
Abstract
The ultimate goal of genome assembly is a high-accuracy gapless genome. Here, we report a new assembly pipeline that is used to produce a gapless genome for the indica rice cultivar Minghui 63. The resulting 397.71-Mb final assembly is composed of 12 contigs with a contig N50 size of 31.93 Mb. Each chromosome is represented by a single contig and the genomic sequences of all chromosomes are gapless. Quality evaluation of this gapless genome assembly showed that gene regions in our assembly have the highest completeness compared with the other 15 reported high-quality rice genomes. Further comparison with the japonica rice genome revealed that the gapless indica genome assembly contains more transposable elements (TEs) and segmental duplications (SDs), the latter of which produce many duplicated genes that can affect agronomic traits through dose effect or sub-/neo-functionalization. The insertion of TEs can also affect the expression of duplicated genes, which may drive the evolution of these genes. Furthermore, we found the expansion of nucleotide-binding site with leucine-rich repeat disease-resistance genes and cis-zeatin-O-glucosyltransferase growth-related genes in SDs in the gapless indica genome assembly, suggesting that SDs contribute to the adaptive evolution of rice disease resistance and developmental processes. Collectively, our findings suggest that active TEs and SDs synergistically contribute to rice genome evolution.
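The contig N50 quoted in this abstract is a standard assembly statistic: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal Python sketch follows; the function name and the example lengths are illustrative assumptions, not values or code from the paper.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Mb (not the Minghui 63 contigs).
example_lengths = [8, 8, 4, 3, 3, 2, 2, 2]
print(contig_n50(example_lengths))  # prints 8
```

With only 12 contigs, one per chromosome, the N50 simply reflects the chromosome length distribution, which is why a gapless assembly reports such a large value.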
Affiliation(s)
- Kui Li
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
| | - Wenkai Jiang
- Novogene Bioinformatics Institute, Building 301, Zone A10 Jiuxianqiao North Road, Chaoyang District, Beijing 100083, China
| | - Yuanyuan Hui
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
| | - Mengjuan Kong
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
| | - Li-Ying Feng
- Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China
| | - Li-Zhi Gao
- Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China.
| | - Pengfu Li
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China.
| | - Shan Lu
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China; Shenzhen Research Institute of Nanjing University, Shenzhen 518000, China.
| |
|
40
|
Belyeu JR, Brand H, Wang H, Zhao X, Pedersen BS, Feusier J, Gupta M, Nicholas TJ, Brown J, Baird L, Devlin B, Sanders SJ, Jorde LB, Talkowski ME, Quinlan AR. De novo structural mutation rates and gamete-of-origin biases revealed through genome sequencing of 2,396 families. Am J Hum Genet 2021; 108:597-607. [PMID: 33675682 PMCID: PMC8059337 DOI: 10.1016/j.ajhg.2021.02.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 11/04/2020] [Accepted: 02/12/2021] [Indexed: 01/05/2023] Open
Abstract
Each human genome includes de novo mutations that arose during gametogenesis. While these germline mutations represent a fundamental source of new genetic diversity, they can also create deleterious alleles that impact fitness. Whereas the rate and patterns of point mutations in the human germline are now well understood, far less is known about the frequency and features that impact de novo structural variants (dnSVs). We report a family-based study of germline mutations among 9,599 human genomes from 33 multigenerational CEPH-Utah families and 2,384 families from the Simons Foundation Autism Research Initiative. We find that de novo structural mutations detected by alignment-based, short-read WGS occur at an overall rate of at least 0.160 events per genome in unaffected individuals, and we observe a significantly higher rate (0.206 per genome) in ASD-affected individuals. In both probands and unaffected samples, nearly 73% of de novo structural mutations arose in paternal gametes, and we predict most de novo structural mutations to be caused by mutational mechanisms that do not require sequence homology. After multiple testing correction, we did not observe a statistically significant correlation between parental age and the rate of de novo structural variation in offspring. These results highlight that a spectrum of mutational mechanisms contribute to germline structural mutations and that these mechanisms most likely have markedly different rates and selective pressures than those leading to point mutations.
Affiliation(s)
- Jonathan R Belyeu
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
| | - Harold Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
| | - Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Julie Feusier
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA
| | - Meenal Gupta
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Thomas J Nicholas
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Joseph Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Lisa Baird
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Stephan J Sanders
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA.
| | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA.
| |
|
41
|
Cantsilieris S, Sunkin SM, Johnson ME, Anaclerio F, Huddleston J, Baker C, Dougherty ML, Underwood JG, Sulovari A, Hsieh P, Mao Y, Catacchio CR, Malig M, Welch AE, Sorensen M, Munson KM, Jiang W, Girirajan S, Ventura M, Lamb BT, Conlon RA, Eichler EE. An evolutionary driver of interspersed segmental duplications in primates. Genome Biol 2020; 21:202. [PMID: 32778141 PMCID: PMC7419210 DOI: 10.1186/s13059-020-02074-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/03/2019] [Accepted: 06/08/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP). RESULTS: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis. CONCLUSIONS: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.
Affiliation(s)
- Stuart Cantsilieris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Present Address: Centre for Eye Research Australia, Department of Surgery (Ophthalmology), University of Melbourne, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, 3002, Australia
| | | | - Matthew E Johnson
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Fabio Anaclerio
- Department of Biology-Genetics, University of Bari, Bari, Italy
| | - John Huddleston
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, 98195, USA
| | - Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Max L Dougherty
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Jason G Underwood
- Pacific Biosciences (PacBio) of California, Incorporated, Menlo Park, CA, 94025, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | | | - Maika Malig
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Present Address: Department of Molecular and Cellular Biology, University of California, Davis, CA, 95616, USA
- Present Address: Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, 95616, USA
| | - AnneMarie E Welch
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Present Address: Brain and Mitochondrial Research, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Weihong Jiang
- Case Transgenic and Targeting Facility, Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Santhosh Girirajan
- Department of Biochemistry and Molecular Biology, Department of Anthropology, Pennsylvania State University, University Park, PA, 16802, USA
| | - Mario Ventura
- Department of Biology-Genetics, University of Bari, Bari, Italy
| | - Bruce T Lamb
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Ronald A Conlon
- Case Transgenic and Targeting Facility, Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Howard Hughes Medical Institute, University of Washington School of Medicine, 3720 15th Ave NE, S413C, Box 355065, Seattle, WA, 98195-5065, USA.
| |
|
42
|
Shajii A, Numanagić I, Baghdadi R, Berger B, Amarasinghe S. Seq: A High-Performance Language for Bioinformatics. Proceedings of the ACM on Programming Languages 2019; 3:125. [PMID: 35775031 PMCID: PMC9241673 DOI: 10.1145/3360551] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100 (a factor of over 10^6), and the amount of data to be analyzed has increased proportionally. Yet, as Moore's Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers are primarily focused on biological analysis over computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines. Here, we introduce Seq, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq starts with a subset of Python, and is in many cases a drop-in replacement, yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks like reverse complementation, k-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160× improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.
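For readers unfamiliar with the benchmark tasks named in this abstract, the sketch below shows reverse complementation and k-mer manipulation in plain CPython. It is not Seq code and does not use Seq's built-in sequence types; the function names are illustrative assumptions.

```python
from collections import Counter

# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return every overlapping substring of length k, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_canonical_kmers(seq, k):
    """Count k-mers, merging each k-mer with its reverse complement."""
    counts = Counter()
    for kmer in kmers(seq, k):
        # The canonical form is the lexicographically smaller strand.
        counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


print(reverse_complement("ACGTN"))
print(count_canonical_kmers("ACGT", 2))
```

Loops like these are the kind of CPython baseline the abstract compares against.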
Affiliation(s)
- Ariya Shajii
- MIT CSAIL, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | | | | | - Bonnie Berger
- MIT CSAIL, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | | |
|
43
|
Parks MM, Raphael BJ, Lawrence CE. Using controls to limit false discovery in the era of big data. BMC Bioinformatics 2018; 19:323. [PMID: 30217148 PMCID: PMC6137876 DOI: 10.1186/s12859-018-2356-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2018] [Accepted: 09/03/2018] [Indexed: 12/04/2022] Open
Abstract
Background: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserved tails of the null distribution. Both of these intermediate steps are challenging and can compromise the reliability of the results. Results: We present a general method for controlling the FDR that capitalizes on the large amount of control data often found in big data studies to avoid these frequently problematic intermediate steps. The method utilizes control data to empirically construct the distribution of the test statistic under the null hypothesis and directly compares this distribution to the empirical distribution of the test data. By not relying on p-values, our control data-based empirical FDR procedure more closely follows the foundational principles of the scientific method: that inference is drawn by comparing test data to control data. The method is demonstrated through application to a problem in structural genomics. Conclusions: The method described here provides a general statistical framework for controlling the FDR that is specifically tailored for the big data setting. By relying on empirically constructed distributions and control data, it forgoes potentially problematic modeling steps and extrapolation into the unknown tails of the null distribution. This procedure is broadly applicable insofar as controlled experiments or internal negative controls are available, as is increasingly common in the big data setting.
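The idea of comparing the empirical distribution of test statistics with one built from control data can be illustrated with a generic tail-ratio estimate. This is only a sketch of that general idea under simplifying assumptions (larger statistics are more extreme, and every test is treated as potentially null); it is not the authors' procedure, and the function name and numbers are hypothetical.

```python
def empirical_fdr(test_stats, control_stats, threshold):
    """Estimate the FDR at a threshold from the control and test tail fractions."""
    if not test_stats or not control_stats:
        raise ValueError("both test and control statistics are required")
    # Fraction of test statistics called significant at this threshold.
    test_frac = sum(s >= threshold for s in test_stats) / len(test_stats)
    if test_frac == 0:
        return 0.0  # nothing is called, so there are no false discoveries
    # Fraction of control (null) statistics that would also be called.
    ctrl_frac = sum(s >= threshold for s in control_stats) / len(control_stats)
    return min(1.0, ctrl_frac / test_frac)


# Hypothetical statistics for illustration only.
test_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
control_values = [1, 1, 2, 2, 3, 3, 4, 4, 5, 6]
print(empirical_fdr(test_values, control_values, threshold=5))
```

No p-value is computed at any point; the estimate depends only on how often control data exceed the same threshold as the test data.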
Affiliation(s)
- Matthew M Parks
- Department of Physiology and Biophysics, Weill Cornell Medicine, 1300 York Ave, New York, NY, 10065, USA
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, 08540, USA
| | - Charles E Lawrence
- Center for Computational Molecular Biology, Brown University, 115 Waterman Street, Providence, RI, 02912, USA; Division of Applied Mathematics, Brown University, 182 George Street, Providence, RI, 02912, USA.
| |
|
44
|
Nguyen HT, Boocock J, Merriman TR, Black MA. SRBreak: A Read-Depth and Split-Read Framework to Identify Breakpoints of Different Events Inside Simple Copy-Number Variable Regions. Front Genet 2016; 7:160. [PMID: 27695476 PMCID: PMC5023681 DOI: 10.3389/fgene.2016.00160] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/17/2016] [Accepted: 08/24/2016] [Indexed: 12/28/2022] Open
Abstract
Copy-number variation (CNV) has been associated with increased risk of complex diseases. High-throughput sequencing (HTS) technologies facilitate the detection of copy-number variable regions (CNVRs) and their breakpoints. This helps in understanding genome structure as well as their evolution process. Various approaches have been proposed for detecting CNV breakpoints, but currently it is still challenging for tools based on a single analysis method to identify breakpoints of CNVs. It has been shown, however, that pipelines which integrate multiple approaches are able to report more reliable breakpoints. Here, based on HTS data, we have developed a pipeline to identify approximate breakpoints (±10 bp) relating to different ancestral events within a specific CNVR. The pipeline combines read-depth and split-read information to infer breakpoints, using information from multiple samples to allow an imputation approach to be taken. The main steps involve using a normal mixture model to cluster samples into different groups, followed by simple kernel-based approaches to maximize information obtained from read-depth and split-read approaches, after which common breakpoints of groups are inferred. The pipeline uses split-read information directly from CIGAR strings of BAM files, without using a re-alignment step. On simulated data sets, it was able to report breakpoints for very low-coverage samples including those for which only single-end reads were available. When applied to three loci from existing human resequencing data sets (NEGR1, LCE3, IRGM) the pipeline obtained good concordance with results from the 1000 Genomes Project (92, 100, and 82%, respectively). The package is available at https://github.com/hoangtn/SRBreak, and also as a docker-based application at https://registry.hub.docker.com/u/hoangtn/srbreak/.
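Reading split-read evidence directly from CIGAR strings, as this abstract describes, can be sketched as follows: a soft-clipped end (`S`) marks the reference position where the alignment stops, which is a candidate breakpoint. This is an illustrative Python sketch, not SRBreak's own code; the function name and the coordinate convention (1-based positions, reporting the first or last aligned base) are assumptions.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def soft_clip_breakpoints(pos, cigar):
    """Return candidate breakpoints implied by soft clips in one alignment.

    pos is the 1-based leftmost aligned reference position (SAM POS field).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips only occur at the ends and carry no sequence; ignore them.
    core = [(n, op) for n, op in ops if op != "H"]
    breakpoints = []
    if core and core[0][1] == "S":
        # Leading clip: the read starts matching at the first aligned base.
        breakpoints.append(pos)
    if len(core) > 1 and core[-1][1] == "S":
        # Trailing clip: the read stops matching at the last aligned base.
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


print(soft_clip_breakpoints(100, "30S70M"))
print(soft_clip_breakpoints(100, "60M40S"))
```

Pooling such positions across reads and samples gives the kind of signal a kernel-based step could then smooth into a consensus breakpoint.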
Affiliation(s)
- Hoang T Nguyen
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand; Department of Psychiatry, Mount Sinai School of Medicine, New York, NY, USA; Department of Mathematics, Cao Thang College of Technology, Ho Chi Minh City, Vietnam
| | - James Boocock
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand; Department of Psychiatry, Mount Sinai School of Medicine, New York, NY, USA
| | - Tony R Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand
| |
|
45
|
Integrated view of genome structure and sequence of a single DNA molecule in a nanofluidic device. Proc Natl Acad Sci U S A 2013; 110:4893-8. [PMID: 23479649 DOI: 10.1073/pnas.1214570110] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 11/18/2022] Open
Abstract
We show how a bird's-eye view of genomic structure can be obtained at ∼1-kb resolution from long (∼2 Mb) DNA molecules extracted from whole chromosomes in a nanofluidic laboratory-on-a-chip. We use an improved single-molecule denaturation mapping approach to detect repetitive elements and known as well as unique structural variation. Following its mapping, a molecule of interest was rescued from the chip; amplified and localized to a chromosome by FISH; and interrogated down to 1-bp resolution with a commercial sequencer, thereby reconciling haplotype-phased chromosome substructure with sequence.
|
46
|
Abstract
We report the genome sequence of melon, an important horticultural crop worldwide. We assembled 375 Mb of the double-haploid line DHL92, representing 83.3% of the estimated melon genome. We predicted 27,427 protein-coding genes, which we analyzed by reconstructing 22,218 phylogenetic trees, allowing mapping of the orthology and paralogy relationships of sequenced plant genomes. We observed the absence of recent whole-genome duplications in the melon lineage since the ancient eudicot triplication, and our data suggest that transposon amplification may in part explain the increased size of the melon genome compared with the close relative cucumber. A low number of nucleotide-binding site-leucine-rich repeat disease resistance genes were annotated, suggesting the existence of specific defense mechanisms in this species. The DHL92 genome was compared with that of its parental lines allowing the quantification of sequence variability in the species. The use of the genome sequence in future investigations will facilitate the understanding of evolution of cucurbits and the improvement of breeding strategies.
|
47
|
Kukita Y, Yahara K, Tahira T, Higasa K, Sonoda M, Yamamoto K, Kato K, Wake N, Hayashi K. A definitive haplotype map as determined by genotyping duplicated haploid genomes finds a predominant haplotype preference at copy-number variation events. Am J Hum Genet 2010; 86:918-28. [PMID: 20537301 DOI: 10.1016/j.ajhg.2010.05.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/10/2010] [Revised: 04/13/2010] [Accepted: 05/07/2010] [Indexed: 10/19/2022]
Abstract
The majority of complete hydatidiform moles (CHMs) harbor duplicated haploid genomes that originate from sperm. This makes CHMs more advantageous than conventional diploid cells for determining haplotypes of SNPs and copy-number variations (CNVs), because all of the genetic variants in a CHM genome are homozygous. Here we report SNP and CNV haplotype structures determined by analysis of 100 CHMs from Japanese subjects via high-density DNA arrays. The obtained haplotype map should be useful as a reference for the haplotype structure of Asian populations. We resolved common CNV regions (merged CNV segments across the examined samples) into CNV events (clusters of CNV segments) on the basis of mutual overlap and found that the haplotype backgrounds of different CNV events within the same CNV region were predominantly similar, perhaps because of inherent structural instability.
|
48
|
Evolution in health and medicine Sackler colloquium: Genomic disorders: a window into human gene and genome evolution. Proc Natl Acad Sci U S A 2010; 107 Suppl 1:1765-71. [PMID: 20080665 DOI: 10.1073/pnas.0906222107] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 12/13/2022]
Abstract
Gene duplications alter the genetic constitution of organisms and can be a driving force of molecular evolution in humans and the great apes. In this context, the study of genomic disorders has uncovered the essential role played by the genomic architecture, especially low copy repeats (LCRs) or segmental duplications (SDs). In fact, regardless of the mechanism, LCRs can mediate or stimulate rearrangements, inciting genomic instability and generating dynamic and unstable regions prone to rapid molecular evolution. In humans, copy-number variation (CNV) has been implicated in common traits such as neuropathy, hypertension, color blindness, infertility, and behavioral traits including autism and schizophrenia, as well as disease susceptibility to HIV, lupus nephritis, and psoriasis among many other clinical phenotypes. The same mechanisms implicated in the origin of genomic disorders may also play a role in the emergence of segmental duplications and the evolution of new genes by means of genomic and gene duplication and triplication, exon shuffling, exon accretion, and fusion/fission events.
|
49
|
Ponjavic J, Oliver PL, Lunter G, Ponting CP. Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet 2009. [PMID: 19696892 DOI: 10.1371/journal.pgen.1000617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/20/2023]
Abstract
Besides protein-coding mRNAs, eukaryotic transcriptomes include many long non-protein-coding RNAs (ncRNAs) of unknown function that are transcribed away from protein-coding loci. Here, we have identified 659 intergenic long ncRNAs whose genomic sequences individually exhibit evolutionary constraint, a hallmark of functionality. Of this set, those expressed in the brain are more frequently conserved and are significantly enriched with predicted RNA secondary structures. Furthermore, brain-expressed long ncRNAs are preferentially located adjacent to protein-coding genes that are (1) also expressed in the brain and (2) involved in transcriptional regulation or in nervous system development. This led us to the hypothesis that spatiotemporal co-expression of ncRNAs and nearby protein-coding genes represents a general phenomenon, a prediction that was confirmed subsequently by in situ hybridisation in developing and adult mouse brain. We provide the full set of constrained long ncRNAs as an important experimental resource and present, for the first time, substantive and predictive criteria for prioritising long ncRNA and mRNA transcript pairs when investigating their biological functions and contributions to development and disease.
Affiliation(s)
- Jasmina Ponjavic
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
|
50
|
Paterson AH, Bowers JE, Feltus FA, Tang H, Lin L, Wang X. Comparative genomics of grasses promises a bountiful harvest. Plant Physiol 2009; 149:125-31. [PMID: 19126703 PMCID: PMC2613718 DOI: 10.1104/pp.108.129262] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 09/03/2008] [Accepted: 11/05/2008] [Indexed: 05/18/2023]
Affiliation(s)
- Andrew H Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia 30602, USA.
|