1
Luo M, Chen Y, Yin X, Li J. Cribriform-morular Thyroid Carcinoma Arising in a Medulloblastoma Survivor: Two Metachronous Tumors Shared with the Activation of the Wnt Signaling Pathway. Int J Surg Pathol 2024; 32:1531-1536. [PMID: 38377966 DOI: 10.1177/10668969241231971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Wnt signaling pathway activation is involved in the pathogenesis of a series of malignant tumors and is characterized by the nuclear accumulation of β-catenin protein. The occurrence of two or more Wnt pathway-associated tumors in a single individual is uncommon and generally attributed to an inherited cancer syndrome, especially familial adenomatous polyposis (FAP). Herein, we present a rare case of a child who developed Wnt-activated medulloblastoma and cribriform-morular thyroid carcinoma (CMTC) within a 9-year interval. She had no history of FAP and harbored an unexpected somatic mutation of the APC gene in the CMTC tumor. The potential agents, other than FAP, involved in the pathogenesis of the two molecularly linked tumors are discussed in this report.
Affiliation(s)
- Minghua Luo
- Department of Pathology, Peking University Shenzhen Hospital, Shenzhen, Guangdong Province, China
- Yaoli Chen
- Department of Pathology, Peking University Shenzhen Hospital, Shenzhen, Guangdong Province, China
- Xiaomin Yin
- Department of Pathology, Peking University Shenzhen Hospital, Shenzhen, Guangdong Province, China
- Jian Li
- Department of Pathology, Peking University Shenzhen Hospital, Shenzhen, Guangdong Province, China
- State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, Guangdong Province, China
2
Qian J, Ge L, Lu C, Han X, Li M, Bian Z. LINC00665 aggravates the malignant phenotypes in chondrosarcoma cells through miR-665/FGF9 pathway. Int J Biol Macromol 2024; 280:135727. [PMID: 39293617 DOI: 10.1016/j.ijbiomac.2024.135727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 09/10/2024] [Accepted: 09/14/2024] [Indexed: 09/20/2024]
Abstract
Long non-coding RNAs (lncRNAs) have been demonstrated to participate in a variety of physiological and pathological processes, including tumor initiation and development. Nevertheless, few of them have been investigated in chondrosarcoma. Here, we aimed to unveil the role of long intergenic non-protein coding RNA 665 (LINC00665) in chondrosarcoma. RT-qPCR was used to detect gene expression. Biological processes in chondrosarcoma cells were assessed by CCK-8, EdU, TUNEL, Transwell and wound healing assays. The relationships between genes in chondrosarcoma cells were evaluated by a series of mechanism experiments, including RIP and luciferase reporter assays. LINC00665 was expressed at a high level in chondrosarcoma cell lines. LINC00665 interference suppressed cell proliferation, migration and invasion in chondrosarcoma. In addition, LINC00665 interacted with microRNA-665 (miR-665), which was then verified to be down-regulated in chondrosarcoma cells. Moreover, LINC00665 and miR-665 inhibited each other in chondrosarcoma cells. Importantly, LINC00665 stimulated fibroblast growth factor 9 (FGF9) expression in chondrosarcoma cells by sponging miR-665. Furthermore, FGF9 participated in the regulation of LINC00665-promoted chondrosarcoma development. CONCLUSION: LINC00665 facilitates chondrosarcoma progression via the miR-665/FGF9 axis, which might indicate a new path for the treatment of chondrosarcoma.
Affiliation(s)
- Jin Qian
- Department of Orthopedics, Affiliated Hangzhou First People's Hospital, WestLake University School of Medicine, Hangzhou 310006, Zhejiang Province, China
- Lujie Ge
- Department of Orthopedics, Affiliated Hangzhou First People's Hospital, WestLake University School of Medicine, Hangzhou 310006, Zhejiang Province, China
- Congcong Lu
- Department of Orthopedics, Affiliated Hangzhou First People's Hospital, WestLake University School of Medicine, Hangzhou 310006, Zhejiang Province, China
- Xiao Han
- Department of Orthopedics, Affiliated Hangzhou First People's Hospital, WestLake University School of Medicine, Hangzhou 310006, Zhejiang Province, China
- Maoqiang Li
- Department of Orthopedics, Affiliated Hangzhou First People's Hospital, WestLake University School of Medicine, Hangzhou 310006, Zhejiang Province, China.
- Zhenyu Bian
- Department of Orthopedics, Affiliated Hangzhou First People's Hospital, WestLake University School of Medicine, Hangzhou 310006, Zhejiang Province, China.
3
Perez-Becerril C, Burghel GJ, Hartley C, Rowlands CF, Evans DG, Smith MJ. Improved sensitivity for detection of pathogenic variants in familial NF2-related schwannomatosis. J Med Genet 2024; 61:452-458. [PMID: 38302265 DOI: 10.1136/jmg-2023-109586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 12/07/2023] [Indexed: 02/03/2024]
Abstract
PURPOSE To determine the impact of additional genetic screening techniques on the rate of detection of pathogenic variants leading to familial NF2-related schwannomatosis. METHODS We conducted genetic screening of a cohort of 168 second-generation individuals meeting the clinical criteria for NF2-related schwannomatosis. In addition to the current clinical screening techniques, targeted next-generation sequencing (NGS) and multiplex ligation-dependent probe amplification analysis, we applied additional genetic screening techniques, including karyotype and RNA analysis. For characterisation of a complex structural variant, we also performed long-read sequencing analysis. RESULTS Additional genetic analysis resulted in increased sensitivity of detection of pathogenic variants from 87% to 95% in our second-generation NF2-related schwannomatosis cohort. A number of pathogenic variants identified through extended analysis had been previously observed after NGS analysis but had been overlooked or classified as variants of uncertain significance. CONCLUSION Our study indicates there is added value in performing additional genetic analysis for detection of pathogenic variants that are difficult to identify with current clinical genetic screening methods. In particular, RNA analysis is valuable for accurate classification of non-canonical splicing variants. Karyotype analysis and whole genome sequencing analysis are of particular value for identification of large and/or complex structural variants, with additional advantages in the use of long-read sequencing techniques.
Affiliation(s)
- Cristina Perez-Becerril
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
- George J Burghel
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
- Claire Hartley
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Charles F Rowlands
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- D Gareth Evans
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
- Miriam J Smith
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, The University of Manchester, Manchester, UK
4
Wu Z, Li T, Jiang Z, Zheng J, Gu Y, Liu Y, Liu Y, Xie Z. Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res 2024; 52:2212-2230. [PMID: 38364871 PMCID: PMC10954445 DOI: 10.1093/nar/gkae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 01/18/2024] [Accepted: 01/27/2024] [Indexed: 02/18/2024] Open
Abstract
Nonreference sequences (NRSs) are DNA sequences present in global populations but absent in the current human reference genome. However, the extent and functional significance of NRSs in human genomes and populations remain unclear. Here, we de novo assembled 539 genomes from five genetically divergent human populations using long-read sequencing technology, resulting in the identification of 5.1 million NRSs. These were merged into 45284 unique NRSs, with 29.7% being novel discoveries. Among these NRSs, 38.7% were common across the five populations, and 35.6% were population specific. The use of a graph-based pangenome approach allowed for the detection of 565 transcript expression quantitative trait loci on NRSs, with 426 of these being novel findings. Moreover, 26 NRS candidates displayed evidence of adaptive selection within human populations. Genes situated in close proximity to or intersecting with these candidates may be associated with metabolism and type 2 diabetes. Genome-wide association studies revealed 14 NRSs to be significantly associated with eight phenotypes. Additionally, 154 NRSs were found to be in strong linkage disequilibrium with 258 phenotype-associated SNPs in the GWAS catalogue. Our work expands the understanding of human NRSs and provides novel insights into their functions, facilitating evolutionary and biomedical research.
Affiliation(s)
- Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Tong Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zehang Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingjing Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yizhou Gu
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- University of Wisconsin-Madison, WI, USA
- Yizhi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yun Liu
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Shanghai Xuhui Central Hospital, Fudan University, Shanghai, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
5
Liu M, Song Y, Zhang S, Yu L, Yuan Z, Yang H, Zhang M, Zhou Z, Seim I, Liu S, Fan G, Yang H. A chromosome-level genome of electric catfish (Malapterurus electricus) provided new insights into order Siluriformes evolution. MARINE LIFE SCIENCE & TECHNOLOGY 2024; 6:1-14. [PMID: 38433969 PMCID: PMC10901758 DOI: 10.1007/s42995-023-00197-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 09/22/2023] [Indexed: 03/05/2024]
Abstract
The electric catfish (Malapterurus electricus), belonging to the family Malapteruridae, order Siluriformes (Actinopterygii: Ostariophysi), is one of the six branches that have independently evolved electric organs. We assembled a 796.75 Mb M. electricus genome and anchored 88.72% of the sequences into 28 chromosomes. Gene family analysis revealed 295 expanded gene families that were enriched in functions related to glutamate receptors. Convergent evolutionary analyses of electric organs among different lineages of electric fishes further revealed that the coding gene of rho guanine nucleotide exchange factor 4-like (arhgef4), which is associated with the G-protein coupled receptor (GPCR) signaling pathway, underwent adaptive parallel evolution. Gene identification suggests visual degradation in catfishes, and an important role for taste in environmental adaptation. Our findings fill in the genomic data for a branch of electric fish and provide a relevant genetic basis for the adaptive evolution of Siluriformes. Supplementary Information: The online version contains supplementary material available at 10.1007/s42995-023-00197-8.
Affiliation(s)
- Meiru Liu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049 China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- BGI-Shenzhen, Shenzhen, 518083 China
- Yue Song
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- BGI-Shenzhen, Shenzhen, 518083 China
- Suyu Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049 China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- BGI-Shenzhen, Shenzhen, 518083 China
- Lili Yu
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- Zengbao Yuan
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049 China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- BGI-Shenzhen, Shenzhen, 518083 China
- Hengjia Yang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- Mengqi Zhang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- Zhuocheng Zhou
- Professional Committee of Native Aquatic Organisms and Water Ecosystem of China Fisheries Association, Beijing, 100125 China
- Inge Seim
- Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, 210023 China
- Shanshan Liu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049 China
- Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555 China
- BGI-Shenzhen, Shenzhen, 518083 China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083 China
- Huanming Yang
- BGI-Shenzhen, Shenzhen, 518083 China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083 China
6
Höjer P, Frick T, Siga H, Pourbozorgi P, Aghelpasand H, Martin M, Ahmadian A. BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies. Nucleic Acids Res 2023; 51:e114. [PMID: 37941142 PMCID: PMC10711428 DOI: 10.1093/nar/gkad1010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 10/04/2023] [Accepted: 10/18/2023] [Indexed: 11/10/2023] Open
Abstract
Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-seq and stLFR. Running BLR on DBS linked-reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when considering switch errors. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch-error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
Affiliation(s)
- Pontus Höjer
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Tobias Frick
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Humam Siga
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Parham Pourbozorgi
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Hooman Aghelpasand
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Marcel Martin
- Stockholm University, Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Afshin Ahmadian
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
7
Hu Y, Wang L, Yang G, Wang S, Guo M, Lu H, Zhang T. VDR promotes testosterone synthesis in mouse Leydig cells via regulation of cholesterol side chain cleavage cytochrome P450 (Cyp11a1) expression. Genes Genomics 2023; 45:1377-1387. [PMID: 37747642 DOI: 10.1007/s13258-023-01444-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Accepted: 09/30/2022] [Indexed: 09/26/2023]
Abstract
BACKGROUND The vitamin D receptor (VDR) mediates pleiotropic biological actions, including roles in osteoporosis, immune responses and androgen synthesis. VDR is widely expressed in testis cells such as Leydig cells, Sertoli cells, and sperm. The levels of steroids are critical for sexual development. In the early stage of steroidogenesis, cholesterol is converted to pregnenolone (the precursor of most steroid hormones) by cholesterol side-chain lyase (CYP11A1); pregnenolone is eventually converted to the male hormone testosterone. OBJECTIVE This study aims to reveal how VDR regulates CYP11A1 expression and affects testosterone synthesis in murine Leydig cells. METHODS The levels of VDR and CYP11A1 were determined by quantitative real-time polymerase chain reaction (RT-qPCR) or western blot. The targeting relationship between VDR and Cyp11a1 was evaluated by dual-luciferase reporter assay. Testosterone concentrations in cell culture media serum were measured by enzyme-linked immunosorbent assay (ELISA). RESULTS Phylogenetic and motif analysis showed that the Cyp11a1 family had sequence loss, which may have special biological functions during evolution. The results of promoter prediction showed that a vitamin D response element (VDRE) existed in the upstream promoter region of murine Cyp11a1. Dual-luciferase assay confirmed that VDR could bind candidate VDREs in the upstream region of Cyp11a1 and enhance gene expression. Tissue distribution and localization analysis showed that Cyp11a1 was mainly expressed in testis and predominantly present in murine Leydig cells. Furthermore, overexpression of VDR and CYP11A1 significantly increased testosterone synthesis in mouse Leydig cells. CONCLUSIONS Active vitamin D3 (VD3) and Vdr interference treatment showed that VD3/VDR had a positive regulatory effect on Cyp11a1 expression and testosterone secretion. VDR promotes testosterone synthesis in male mice by up-regulating Cyp11a1 expression, which plays an important role in male reproduction.
Affiliation(s)
- Yuanyuan Hu
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Ling Wang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Shaanxi Province Key Laboratory of Bio-Resources, Shaanxi University of Technology, Hanzhong, 723001, China
- Ge Yang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Shanshan Wang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Miaomiao Guo
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Hongzhao Lu
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Qinba State Key Laboratory of Biological Resources and Ecological Environment, Shaanxi University of Technology, Hanzhong, 723001, China
- Tao Zhang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China.
- QinLing-Bashan Mountains Bioresources Comprehensive Development C. I. C., Shaanxi University of Technology, Hanzhong, 723001, China.
- Qinba State Key Laboratory of Biological Resources and Ecological Environment, Shaanxi University of Technology, Hanzhong, 723001, China.
8
O'Donnell S, Yue JX, Saada OA, Agier N, Caradec C, Cokelaer T, De Chiara M, Delmas S, Dutreux F, Fournier T, Friedrich A, Kornobis E, Li J, Miao Z, Tattini L, Schacherer J, Liti G, Fischer G. Telomere-to-telomere assemblies of 142 strains characterize the genome structural landscape in Saccharomyces cerevisiae. Nat Genet 2023; 55:1390-1399. [PMID: 37524789 PMCID: PMC10412453 DOI: 10.1038/s41588-023-01459-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 11/01/2022] [Accepted: 06/26/2023] [Indexed: 08/02/2023]
Abstract
Pangenomes provide access to an accurate representation of the genetic diversity of species, both in terms of sequence polymorphisms and structural variants (SVs). Here we generated the Saccharomyces cerevisiae Reference Assembly Panel (ScRAP) comprising reference-quality genomes for 142 strains representing the species' phylogenetic and ecological diversity. The ScRAP includes phased haplotype assemblies for several heterozygous diploid and polyploid isolates. We identified circa (ca.) 4,800 nonredundant SVs that provide a broad view of the genomic diversity, including the dynamics of telomere length and transposable elements. We uncovered frequent cases of complex aneuploidies where large chromosomes underwent large deletions and translocations. We found that SVs can impact gene expression near the breakpoints and substantially contribute to gene repertoire evolution. We also discovered that horizontally acquired regions insert at chromosome ends and can generate new telomeres. Overall, the ScRAP demonstrates the benefit of a pangenome in understanding genome evolution at population scale.
Affiliation(s)
- Samuel O'Donnell
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Jia-Xing Yue
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
- Omar Abou Saada
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Nicolas Agier
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Claudia Caradec
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Thomas Cokelaer
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
- Stéphane Delmas
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Fabien Dutreux
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Téo Fournier
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Etienne Kornobis
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
- Jing Li
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
- Zepu Miao
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Gianni Liti
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France.
- Gilles Fischer
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France.
9
Chan AP, Siddique A, Desplat Y, Choi Y, Ranganathan S, Choudhary KS, Khalid MF, Diaz J, Bezney J, DeAscanis D, George Z, Wong S, Selleck W, Bowers J, Zismann V, Reining L, Highlander S, Brown K, Armstrong JR, Hakak Y, Schork NJ. A CRISPR-enhanced metagenomic NGS test to improve pandemic preparedness. CELL REPORTS METHODS 2023; 3:100463. [PMID: 37323571 PMCID: PMC10110940 DOI: 10.1016/j.crmeth.2023.100463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 12/22/2022] [Accepted: 04/10/2023] [Indexed: 06/17/2023]
Abstract
The lack of preparedness for detecting and responding to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pathogen (i.e., COVID-19) has caused enormous harm to public health and the economy. Testing strategies deployed on a population scale at day zero, i.e., the time of the first reported case, would be of significant value. Next-generation sequencing (NGS) has such capabilities; however, it has limited detection sensitivity for low-copy-number pathogens. Here, we leverage the CRISPR-Cas9 system to effectively remove abundant sequences not contributing to pathogen detection and show that NGS detection sensitivity of SARS-CoV-2 approaches that of RT-qPCR. The resulting sequence data can also be used for variant strain typing, co-infection detection, and individual human host response assessment, all in a single molecular and analysis workflow. This NGS workflow is pathogen agnostic and, therefore, has the potential to transform how large-scale pandemic response and focused clinical infectious disease testing are pursued in the future.
Affiliation(s)
- Agnes P. Chan
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- Yongwook Choi
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- Josh Diaz
- Jumpcode Genomics, San Diego, CA 92121, USA
- Jon Bezney
- Jumpcode Genomics, San Diego, CA 92121, USA
- Shukmei Wong
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- William Selleck
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- Jolene Bowers
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- Victoria Zismann
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- Lauren Reining
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- Sarah Highlander
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- Nicholas J. Schork
- The Translational Genomics Research Institute (TGen), An Affiliate of the City of Hope National Medical Center, Phoenix, AZ 85004, USA
- The University of California, San Diego, San Diego, CA 92093, USA
- The Scripps Research Institute, San Diego, CA 92037, USA
10
Alkaloid production and response to natural adverse conditions in Peganum harmala: in silico transcriptome analyses. BIOTECHNOLOGIA 2022; 103:355-384. [PMID: 36685700 PMCID: PMC9837557 DOI: 10.5114/bta.2022.120706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Revised: 07/25/2022] [Accepted: 09/16/2022] [Indexed: 01/06/2023] Open
Abstract
Peganum harmala is a valuable wild plant that grows and survives under adverse conditions and produces pharmaceutical alkaloid metabolites. Using different assemblers to develop a transcriptome improves the quality of the assembled transcriptome. In this study, a concrete and accurate method for detecting stress-responsive transcripts by comparing stress-related gene ontology (GO) terms and public domains was designed. An integrated transcriptome for P. harmala including 42 656 coding sequences was created by merging de novo assembled transcriptomes. Around 35 000 transcripts were annotated with more than 90% resemblance to three closely related species of Citrus, which confirmed the robustness of the assembled transcriptome; 4853 stress-responsive transcripts were identified. CYP82, which is involved in alkaloid biosynthesis, showed a higher number of transcripts in P. harmala than in other plants, indicating its diverse alkaloid biosynthesis attributes. Transcription factors (TFs) and regulatory elements with 3887 transcripts comprised 9% of the transcriptome. Among the TFs of the integrated transcriptome, cysteine2/histidine2 (C2H2) and WD40 repeat families were the most abundant. The Kyoto Encyclopedia of Genes and Genomes (KEGG) MAPK (mitogen-activated protein kinase) signaling map and the plant hormone signal transduction map showed the highest number of genes assigned to these pathways, suggesting their potential stress resistance. The P. harmala whole-transcriptome survey provides important resources and paves the way for functional and comparative genomic studies on this plant to discover stress-tolerance-related markers and response mechanisms in stress physiology, phytochemistry, ecology, biodiversity, and evolution. P. harmala can be a potential model for studying adverse environmental cues and metabolite biosynthesis and a major source for the production of various alkaloids.
|
11
|
Pokrovac I, Pezer Ž. Recent advances and current challenges in population genomics of structural variation in animals and plants. Front Genet 2022; 13:1060898. [PMID: 36523759 PMCID: PMC9745067 DOI: 10.3389/fgene.2022.1060898] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/03/2022] [Accepted: 11/15/2022] [Indexed: 05/02/2024] Open
Abstract
The field of population genomics has seen a surge of studies on genomic structural variation over the past two decades. These studies have shown that structural variation is taxonomically ubiquitous and represents a dominant form of genetic variation within species. Recent advances in technology, especially the development of long-read sequencing platforms, have enabled the discovery of structural variants (SVs) in previously inaccessible genomic regions, which unlocked additional structural variation for population studies and revealed that more SVs contribute to evolution than previously perceived. An increasing number of studies suggest that SVs of all types and sizes may have a large effect on phenotype and consequently a major impact on rapid adaptation, population divergence, and speciation. However, the functional effect of the vast majority of SVs is unknown and the field generally lacks evidence on the phenotypic consequences of most SVs that are suggested to have adaptive potential. Non-human genomes are heavily under-represented in population-scale studies of SVs. We argue that more research on other species is needed to objectively estimate the contribution of SVs to evolution. We discuss technical challenges associated with SV detection and outline the most recent advances towards more representative reference genomes, which open a new era in population-scale studies of structural variation.
Affiliation(s)
- Željka Pezer
- Laboratory for Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
|
12
|
Meleshko D, Yang R, Marks P, Williams S, Hajirasouliha I. Efficient detection and assembly of non-reference DNA sequences with synthetic long reads. Nucleic Acids Res 2022; 50:e108. [PMID: 35924489 PMCID: PMC9561269 DOI: 10.1093/nar/gkac653] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/27/2022] [Revised: 06/10/2022] [Accepted: 08/01/2022] [Indexed: 11/14/2022] Open
Abstract
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion's share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.
Affiliation(s)
- Dmitry Meleshko
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA; Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Rui Yang
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA; Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Patrick Marks
- 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Stephen Williams
- 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Iman Hajirasouliha
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA; Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, NY 10021, USA
|
13
|
Dong W, Wong KHY, Liu Y, Levy-Sakin M, Hung WC, Li M, Li B, Jin SC, Choi J, Lopez-Giraldez F, Vaka D, Poon A, Chu C, Lao R, Balamir M, Movsesyan I, Malloy MJ, Zhao H, Kwok PY, Kane JP, Lifton RP, Pullinger CR. Whole-exome sequencing reveals damaging gene variants associated with hypoalphalipoproteinemia. J Lipid Res 2022; 63:100209. [PMID: 35460704 PMCID: PMC9126845 DOI: 10.1016/j.jlr.2022.100209] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/30/2021] [Revised: 04/12/2022] [Accepted: 04/13/2022] [Indexed: 12/02/2022] Open
Abstract
Low levels of high density lipoprotein-cholesterol (HDL-C) are associated with an elevated risk of arteriosclerotic coronary heart disease. Heritability of HDL-C levels is high. In this research discovery study, we used whole-exome sequencing to identify damaging gene variants that may play significant roles in determining HDL-C levels. We studied 204 individuals with a mean HDL-C level of 27.8 ± 6.4 mg/dl (range: 4-36 mg/dl). Data were analyzed by statistical gene burden testing and by filtering against candidate gene lists. We found 120 occurrences of probably damaging variants (116 heterozygous; four homozygous) among 45 of 104 recognized HDL candidate genes. Those with the highest prevalence of damaging variants were ABCA1 (n = 20), STAB1 (n = 9), OSBPL1A (n = 8), CPS1 (n = 8), CD36 (n = 7), LRP1 (n = 6), ABCA8 (n = 6), GOT2 (n = 5), AMPD3 (n = 5), WWOX (n = 4), and IRS1 (n = 4). Binomial analysis for damaging missense or loss-of-function variants identified the ABCA1 and LDLR genes at genome-wide significance. In conclusion, whole-exome sequencing of individuals with low HDL-C showed the burden of damaging rare variants in the ABCA1 and LDLR genes is particularly high and revealed numerous occurrences in HDL candidate genes, including many genes identified in genome-wide association study reports. Many of these genes are involved in cancer biology, which accords with epidemiologic findings of the association of HDL deficiency with increased risk of cancer, thus presenting a new area of interest in HDL genomics.
Affiliation(s)
- Weilai Dong
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Karen H Y Wong
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Youbin Liu
- Department of Cardiology, The Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Michal Levy-Sakin
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Wei-Chien Hung
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Mo Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Boyang Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Sheng Chih Jin
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Jungmin Choi
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Biomedical Sciences, Korea University College of Medicine, Seoul, Korea
- Dedeepya Vaka
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Annie Poon
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Catherine Chu
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Richard Lao
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Melek Balamir
- Department of Internal Medicine, Istanbul University, Istanbul, Turkey
- Irina Movsesyan
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Mary J Malloy
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Pediatrics, University of California, San Francisco, CA, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Pui-Yan Kwok
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Dermatology, University of California, San Francisco, CA, USA
- John P Kane
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Richard P Lifton
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Clive R Pullinger
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Physiological Nursing, University of California, San Francisco, CA, USA
|
14
|
Li M, Sun C, Xu N, Bian P, Tian X, Wang X, Wang Y, Jia X, Heller R, Wang M, Wang F, Dai X, Luo R, Guo Y, Wang X, Yang P, Hu D, Liu Z, Fu W, Zhang S, Li X, Wen C, Lan F, Siddiki AZ, Suwannapoom C, Zhao X, Nie Q, Hu X, Jiang Y, Yang N. De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable Phenomenon for Thousands of Core Genes on Microchromosomes and Subtelomeric Regions. Mol Biol Evol 2022; 39:msac066. [PMID: 35325213 PMCID: PMC9021737 DOI: 10.1093/molbev/msac066] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 11/29/2022] Open
Abstract
The gene numbers and evolutionary rates of birds were assumed to be much lower than those of mammals, which is in sharp contrast to the huge species number and morphological diversity of birds. It is, therefore, necessary to construct a complete avian genome and analyze its evolution. We constructed a chicken pan-genome from 20 de novo assembled genomes with high sequencing depth, and identified 1,335 protein-coding genes and 3,011 long noncoding RNAs not found in GRCg6a. The majority of these novel genes were detected across most individuals of the examined transcriptomes but were seldom detected in the DNA sequencing data, regardless of Illumina or PacBio technology. Furthermore, different from previous pan-genome models, most of these novel genes were overrepresented on chromosomal subtelomeric regions and microchromosomes, surrounded by extremely high proportions of tandem repeats, which strongly impede DNA sequencing. These hidden genes were shown to be shared by all chicken genomes, included many housekeeping genes, and were enriched in immune pathways. Comparative genomics revealed that the novel genes had 3-fold higher substitution rates than known ones, updating the knowledge about evolutionary rates in birds. Our study provides a framework for constructing a better chicken genome, which will contribute toward the understanding of avian evolution and the improvement of poultry breeding.
Affiliation(s)
- Ming Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Congjiao Sun
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Naiyi Xu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Peipei Bian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xiaomeng Tian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Yuzhe Wang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing 100193, China
- Xinzheng Jia
- Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- School of Life Science and Engineering, Foshan University, Foshan 528225, China
- Rasmus Heller
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N 2200, Denmark
- Mingshan Wang
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Fei Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xuelei Dai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Rongsong Luo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Yingwei Guo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xiangnan Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Peng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Dexiang Hu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Zhenyu Liu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Weiwei Fu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Shunjin Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xiaochang Li
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Chaoliang Wen
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Fangren Lan
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Amam Zonaed Siddiki
- Department of Pathology and Parasitology, Faculty of Veterinary Medicine, Chittagong Veterinary and Animal Sciences University, Chittagong 4202, Bangladesh
- Xin Zhao
- Department of Animal Science, McGill University, Montreal, QC, Canada
- Qinghua Nie
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Xiaoxiang Hu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Center for Functional Genomics, Institute of Future Agriculture, Northwest A&F University, China
- Ning Yang
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
|
15
|
Chrisman BS, Paskov KM, He C, Jung JY, Stockham N, Washington PY, Wall DP. A Method for Localizing Non-Reference Sequences to the Human Genome. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2022; 27:313-324. [PMID: 34890159 PMCID: PMC8730539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
As the last decade of human genomics research begins to bear the fruit of advancements in precision medicine, it is important to ensure that genomics' improvements in human health are distributed globally and equitably. An important step to ensuring health equity is to improve the human reference genome to capture global diversity by including a wide variety of alternative haplotypes, sequences that are not currently captured on the reference genome. We present a method that localizes 100 basepair (bp) long sequences extracted from short-read sequencing that can ultimately be used to identify what regions of the human genome non-reference sequences belong to. We extract reads that don't align to the reference genome, and compute the population's distribution of 100-mers found within the unmapped reads. We use genetic data from families to identify shared genetic material between siblings and match the distribution of unmapped k-mers to these inheritance patterns to determine the most likely genomic region of a k-mer. We perform this localization with two highly interpretable methods of artificial intelligence: a computationally tractable Hidden Markov Model coupled to a Maximum Likelihood Estimator. Using a set of alternative haplotypes with known locations on the genome, we show that our algorithm is able to localize 96% of k-mers with over 90% accuracy and less than 1Mb median resolution. As the collection of sequenced human genomes grows larger and more diverse, we hope that this method can be used to improve the human reference genome, a critical step in addressing precision medicine's diversity crisis.
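The k-mer extraction step mentioned in this abstract (collecting k-mers from unmapped reads and recording which individuals carry them) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy reads, and the small k are hypothetical, and the family-based Hidden Markov Model step is not shown.

```python
from collections import Counter

def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def count_kmers_per_sample(unmapped_reads_by_sample, k):
    """Count k-mers in each sample's unmapped reads (hypothetical helper)."""
    return {
        sample: Counter(km for read in reads for km in kmers(read, k))
        for sample, reads in unmapped_reads_by_sample.items()
    }

def samples_carrying(counts_by_sample, kmer):
    """Return the sorted sample names in which the k-mer was observed."""
    return sorted(s for s, c in counts_by_sample.items() if c[kmer] > 0)

# Toy data: the abstract uses k = 100; k = 4 keeps the example readable.
toy_reads = {
    "sib1": ["ACGTAC"],
    "sib2": ["CGTACG"],
    "sib3": ["TTTTTT"],
}
counts = count_kmers_per_sample(toy_reads, k=4)
print(samples_carrying(counts, "CGTA"))
```

A presence/absence pattern like the one printed here is the kind of per-individual signal that could then be compared with inheritance patterns among siblings.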
Affiliation(s)
- Kelley M Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Chloe He
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Jae-Yoon Jung
- Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
- Nate Stockham
- Department of Neuroscience, Stanford University, Stanford, CA 94305, USA
- Dennis Paul Wall
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
|
16
|
Li C, Yang X, Shao L, Zhang R, Liu Q, Zhang M, Liu S, Pan S, Xue W, Wang C, Mao C, Zhang H, Fan G. Bicolor angelfish (Centropyge bicolor) provides the first chromosome-level genome of the Pomacanthidae family. GIGABYTE 2021; 2021:gigabyte32. [PMID: 36824335 PMCID: PMC9650296 DOI: 10.46471/gigabyte.32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Accepted: 10/26/2021] [Indexed: 11/09/2022] Open
Abstract
The Bicolor Angelfish, Centropyge bicolor, is a tropical coral reef fish. It is named for its striking two-color body. However, a lack of high-quality genomic data means little is known about the genome of this species. Here, we present a chromosome-level C. bicolor genome constructed using Hi-C data. The assembled genome is 650 Mbp in size, with a scaffold N50 value of 4.4 Mbp, and a contig N50 value of 114 Kbp. A total of 21,774 protein-coding genes were annotated. Our analysis will help others to choose the most appropriate de novo genome sequencing strategy based on resources and target applications. To the best of our knowledge, this is the first chromosome-level genome for the Pomacanthidae family, which might contribute to further studies exploring coral reef fish evolution, diversity and conservation.
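For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")

# Toy example: total length is 20; 6 + 5 = 11 first reaches half of it.
print(n50([2, 3, 4, 5, 6]))  # prints 5
```

The same function applies to contigs or scaffolds; only the input lengths differ.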
Affiliation(s)
- Chunhua Li
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Xianwei Yang
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Libin Shao
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Rui Zhang
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Qun Liu
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Mengqi Zhang
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Shanshan Liu
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Shanshan Pan
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Weizhen Xue
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Congyan Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Chunyan Mao
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- He Zhang
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
- Department of Biology, Hong Kong Baptist University, Hong Kong, China
- Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao 26655-5, China
|
17
|
Krannich T, White WTJ, Niehus S, Holley G, Halldórsson BV, Kehr B. Population-scale detection of non-reference sequence variants using colored de Bruijn graphs. Bioinformatics 2021; 38:604-611. [PMID: 34726732 PMCID: PMC8756200 DOI: 10.1093/bioinformatics/btab749] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/12/2021] [Revised: 09/27/2021] [Accepted: 10/28/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: With the increasing throughput of sequencing technologies, structural variant (SV) detection has become possible across tens of thousands of genomes. Non-reference sequence (NRS) variants have drawn less attention compared with other types of SVs due to the computational complexity of detecting them. When using short-read data, the detection of NRS variants inevitably involves a de novo assembly which requires high-quality sequence data at high coverage. Previous studies have demonstrated how sequence data of multiple genomes can be combined for the reliable detection of NRS variants. However, the algorithms proposed in these studies have limited scalability to larger sets of genomes. RESULTS: We introduce PopIns2, a tool to discover and characterize NRS variants in many genomes, which scales to considerably larger numbers of genomes than its predecessor PopIns. In this article, we briefly outline the PopIns2 workflow and highlight our novel algorithmic contributions. We developed an entirely new approach for merging contig assemblies of unaligned reads from many genomes into a single set of NRS using a colored de Bruijn graph. Our tests on simulated data indicate that the new merging algorithm ranks among the best approaches in terms of quality and reliability and that PopIns2 shows the best precision for a growing number of genomes processed. Results on the Polaris Diversity Cohort and a set of 1000 Icelandic human genomes demonstrate unmatched scalability for the application on population-scale datasets. AVAILABILITY AND IMPLEMENTATION: The source code of PopIns2 is available from https://github.com/kehrlab/PopIns2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
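The data structure named in this abstract, a colored de Bruijn graph, can be sketched in a few lines: each k-mer is a node, edges connect k-mers that overlap by k-1 bases, and each node carries the set of samples ("colors") in which it occurs. The code below is a toy illustration under those assumptions only. It does not reproduce the PopIns2 implementation or its merging algorithm, and all names and input contigs are hypothetical.

```python
from collections import defaultdict

def build_colored_dbg(contigs_by_sample, k):
    """Map each k-mer (node) to the set of samples (colors) containing it."""
    colors = defaultdict(set)
    for sample, contigs in contigs_by_sample.items():
        for contig in contigs:
            for i in range(len(contig) - k + 1):
                colors[contig[i:i + k]].add(sample)
    return dict(colors)

def successors(graph, kmer):
    """Return k-mers in the graph that overlap the last k-1 bases of kmer."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in graph]

# Toy contigs from two samples; real inputs would be assembled unaligned reads.
toy_contigs = {"A": ["ACGTT"], "B": ["CGTTA"]}
graph = build_colored_dbg(toy_contigs, k=3)

print(sorted(graph["CGT"]))      # colors of a k-mer shared by both samples
print(successors(graph, "GTT"))  # neighbors reachable by a k-1 overlap
```

Storing the colors on shared nodes is what lets sequence from many genomes be represented once while still recording which genomes support each path.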
Affiliation(s)
- Sebastian Niehus
- Regensburg Center for Interventional Immunology (RCI), 93053 Regensburg, Germany
- Bjarni V Halldórsson
- deCODE Genetics, Reykjavík 102, Iceland; Department of Engineering, School of Technology, Reykjavík University, Reykjavík 102, Iceland
- Birte Kehr
- To whom correspondence should be addressed.
|
18
|
Yuan Y, Bayer PE, Batley J, Edwards D. Current status of structural variation studies in plants. PLANT BIOTECHNOLOGY JOURNAL 2021; 19:2153-2163. [PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 02/18/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 05/23/2023]
Abstract
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Affiliation(s)
- Yuxuan Yuan
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- School of Life Sciences and State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
|
19
|
Shieh JT, Penon-Portmann M, Wong KHY, Levy-Sakin M, Verghese M, Slavotinek A, Gallagher RC, Mendelsohn BA, Tenney J, Beleford D, Perry H, Chow SK, Sharo AG, Brenner SE, Qi Z, Yu J, Klein OD, Martin D, Kwok PY, Boffelli D. Application of full-genome analysis to diagnose rare monogenic disorders. NPJ Genom Med 2021; 6:77. [PMID: 34556655 PMCID: PMC8460793 DOI: 10.1038/s41525-021-00241-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/29/2020] [Accepted: 10/21/2020] [Indexed: 11/30/2022] Open
Abstract
Current genetic tests for rare diseases provide a diagnosis in only a modest proportion of cases. The Full-Genome Analysis method, FGA, combines long-range assembly and whole-genome sequencing to detect small variants, structural variants with breakpoint resolution, and phasing. We built a variant prioritization pipeline and tested FGA’s utility for diagnosis of rare diseases in a clinical setting. FGA identified structural variants and small variants with an overall diagnostic yield of 40% (20 of 50 cases) and 35% in exome-negative cases (8 of 23 cases); 4 of these were structural variants. FGA detected and mapped structural variants that are missed by short reads, including non-coding duplication, and phased variants across long distances of more than 180 kb. With the prioritization algorithm, longer DNA technologies could replace multiple tests for monogenic disorders and expand the range of variants detected. Our study suggests that genomes produced from technologies like FGA can improve variant detection and provide higher resolution genome maps for future application.
Affiliation(s)
- Joseph T Shieh
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA. .,Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA.
| | - Monica Penon-Portmann
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Karen H Y Wong
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Michal Levy-Sakin
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Michelle Verghese
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Anne Slavotinek
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Renata C Gallagher
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Bryce A Mendelsohn
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Jessica Tenney
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Daniah Beleford
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Hazel Perry
- Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA
| | - Stephen K Chow
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Andrew G Sharo
- Biophysics Graduate Group, University of California Berkeley, Berkeley, CA, USA
| | - Steven E Brenner
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
| | - Zhongxia Qi
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Jingwei Yu
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Ophir D Klein
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Division of Medical Genetics, Pediatrics, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA, USA.,Craniofacial Biology and Department of Orofacial Sciences, University of California San Francisco, San Francisco, CA, USA
| | - David Martin
- Children's Hospital Oakland Research Institute, Benioff Children's Hospital Oakland, University of California San Francisco, Oakland, CA, USA
| | - Pui-Yan Kwok
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.,Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA.,Department of Dermatology, University of California San Francisco, San Francisco, CA, USA
| | - Dario Boffelli
- Children's Hospital Oakland Research Institute, Benioff Children's Hospital Oakland, University of California San Francisco, Oakland, CA, USA
| |
|
20
|
Gao Z, You X, Zhang X, Chen J, Xu T, Huang Y, Lin X, Xu J, Bian C, Shi Q. A chromosome-level genome assembly of the striped catfish (Pangasianodon hypophthalmus). Genomics 2021; 113:3349-3356. [PMID: 34343676 DOI: 10.1016/j.ygeno.2021.07.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/18/2020] [Revised: 07/13/2021] [Accepted: 07/28/2021] [Indexed: 12/15/2022]
Abstract
Striped catfish (Pangasianodon hypophthalmus), belonging to the Pangasiidae family, has become an economically important fish with wide cultivation in Southeast Asia. Owing to the high-fat trait, it is always considered as an oily fish. In our present study, a high-quality genome assembly of the striped catfish was generated by integration of Illumina short reads, Nanopore long reads and Hi-C data. A 731.7-Mb genome assembly was finally obtained, with a contig N50 of 3.5 Mb, a scaffold N50 of 29.5 Mb, and anchoring of 98.46% of the assembly onto 30 pseudochromosomes. The genome contained 36.9% repeat sequences, and a total 18,895 protein-coding genes were predicted. Interestingly, we identified a tandem triplication of fatty acid binding protein 1 gene (fabp1; thereby named as fabp1-1, fabp1-2 and fabp1-3 respectively), which may be related to the high fat content in striped catfish. Meanwhile, the FABP1-2 and -3 isoforms differed from FABP1-1 by several missense mutations including R126T, which may affect the fatty acid binding properties. In summary, we report a high-quality chromosome-level genome assembly of the striped catfish, which provides a valuable genetic resource for biomedical studies on the high-fat trait, and lays a solid foundation for practical aquaculture and molecular breeding of this international teleost species.
Affiliation(s)
- Zijian Gao
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China; Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Xinxin You
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China; Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Xinhui Zhang
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Jieming Chen
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China; Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Tengfei Xu
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Yu Huang
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Xueqiang Lin
- BGI Marine-Hainan, BGI Marine, BGI, Wenchang 571327, China
| | - Junmin Xu
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Chao Bian
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China; Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China
| | - Qiong Shi
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China; Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China.
| |
|
21
|
Boev AS, Rakitko AS, Usmanov SR, Kobzeva AN, Popov IV, Ilinsky VV, Kiktenko EO, Fedorov AK. Genome assembly using quantum and quantum-inspired annealing. Sci Rep 2021; 11:13183. [PMID: 34162895 PMCID: PMC8222255 DOI: 10.1038/s41598-021-88321-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/20/2020] [Accepted: 04/09/2021] [Indexed: 02/05/2023] Open
Abstract
Recent advances in DNA sequencing open prospects to make whole-genome analysis rapid and reliable, which is promising for various applications including personalized medicine. However, existing techniques for de novo genome assembly, which is used for the analysis of genomic rearrangements, chromosome phasing, and reconstructing genomes without a reference, require solving tasks of high computational complexity. Here we demonstrate a method for solving genome assembly tasks with the use of quantum and quantum-inspired optimization techniques. Within this method, we present experimental results on genome assembly using quantum annealers both for simulated data and the ϕX174 bacteriophage. Our results pave a way for a significant increase in the efficiency of solving bioinformatics problems with the use of quantum computing technologies and, in particular, quantum annealing might be an effective method. We expect that the new generation of quantum annealing devices would outperform existing techniques for de novo genome assembly. To the best of our knowledge, this is the first experimental study of de novo genome assembly problems both for real and synthetic data on quantum annealing devices and quantum-inspired techniques.
Affiliation(s)
- A S Boev
- Russian Quantum Center, Skolkovo, Moscow, 143025, Russia
| | | | - S R Usmanov
- Russian Quantum Center, Skolkovo, Moscow, 143025, Russia
| | - A N Kobzeva
- Russian Quantum Center, Skolkovo, Moscow, 143025, Russia
| | - I V Popov
- Genotek ltd., Moscow, 105120, Russia
| | | | - E O Kiktenko
- Russian Quantum Center, Skolkovo, Moscow, 143025, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russia
| | - A K Fedorov
- Russian Quantum Center, Skolkovo, Moscow, 143025, Russia.
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russia.
| |
|
22
|
Alm Rosenblad M, Abramova A, Lind U, Ólason P, Giacomello S, Nystedt B, Blomberg A. Genomic Characterization of the Barnacle Balanus improvisus Reveals Extreme Nucleotide Diversity in Coding Regions. Mar Biotechnol (NY) 2021; 23:402-416. [PMID: 33931810 PMCID: PMC8270832 DOI: 10.1007/s10126-021-10033-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/08/2020] [Accepted: 04/05/2021] [Indexed: 05/11/2023]
Abstract
Barnacles are key marine crustaceans in several habitats, and they constitute a common practical problem by causing biofouling on man-made marine constructions and ships. Despite causing considerable ecological and economic impacts, there is a surprising void of basic genomic knowledge, and a barnacle reference genome is lacking. We here set out to characterize the genome of the bay barnacle Balanus improvisus (= Amphibalanus improvisus) based on short-read whole-genome sequencing and experimental genome size estimation. We show both experimentally (DNA staining and flow cytometry) and computationally (k-mer analysis) that B. improvisus has a haploid genome size of ~ 740 Mbp. A pilot genome assembly rendered a total assembly size of ~ 600 Mbp and was highly fragmented with an N50 of only 2.2 kbp. Further assembly-based and assembly-free analyses revealed that the very limited assembly contiguity is due to the B. improvisus genome having an extremely high nucleotide diversity (π) in coding regions (average π ≈ 5% and average π in fourfold degenerate sites ≈ 20%), and an overall high repeat content (at least 40%). We also report on high variation in the α-octopamine receptor OctA (average π = 3.6%), which might increase the risk that barnacle populations evolve resistance toward antifouling agents. The genomic features described here can help in planning for a future high-quality reference genome, which is urgently needed to properly explore and understand proteins of interest in barnacle biology and marine biotechnology and for developing better antifouling strategies.
Affiliation(s)
- Magnus Alm Rosenblad
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
| | - Anna Abramova
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
| | - Ulrika Lind
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
| | - Páll Ólason
- Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, 752 37, Uppsala, Sweden
| | - Stefania Giacomello
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121, Solna, Sweden
| | - Björn Nystedt
- Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, 752 37, Uppsala, Sweden
| | - Anders Blomberg
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden.
| |
|
23
|
Rehder C, Bean LJH, Bick D, Chao E, Chung W, Das S, O'Daniel J, Rehm H, Shashi V, Vincent LM. Next-generation sequencing for constitutional variants in the clinical laboratory, 2021 revision: a technical standard of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2021; 23:1399-1415. [PMID: 33927380 DOI: 10.1038/s41436-021-01139-4] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 02/25/2021] [Revised: 02/25/2021] [Accepted: 02/26/2021] [Indexed: 12/17/2022] Open
Abstract
Next-generation sequencing (NGS) technologies are now established in clinical laboratories as a primary testing modality in genomic medicine. These technologies have reduced the cost of large-scale sequencing by several orders of magnitude. It is now cost-effective to analyze an individual with disease-targeted gene panels, exome sequencing, or genome sequencing to assist in the diagnosis of a wide array of clinical scenarios. While clinical validation and use of NGS in many settings is established, there are continuing challenges as technologies and the associated informatics evolve. To assist clinical laboratories with the validation of NGS methods and platforms, the ongoing monitoring of NGS testing to ensure quality results, and the interpretation and reporting of variants found using these technologies, the American College of Medical Genetics and Genomics (ACMG) has developed the following technical standards.
Affiliation(s)
| | - Lora J H Bean
- Department of Human Genetics, Emory University, Atlanta, GA, USA
| | - David Bick
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Elizabeth Chao
- Division of Genetics and Genomics, Department of Pediatrics, University of California, Irvine, CA, USA
| | - Wendy Chung
- Departments of Pediatrics and Medicine, Columbia University, New York, NY, USA
| | - Soma Das
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Julianne O'Daniel
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Heidi Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vandana Shashi
- Department of Pediatrics, Duke University, Durham, NC, USA
| | - Lisa M Vincent
- Division of Pathology & Laboratory Medicine, Children's National Health System, Washington, DC, USA.,Departments of Pathology and Pediatrics, George Washington University, Washington, DC, USA
| | | |
|
24
|
Takayama J, Tadaka S, Yano K, Katsuoka F, Gocho C, Funayama T, Makino S, Okamura Y, Kikuchi A, Sugimoto S, Kawashima J, Otsuki A, Sakurai-Yageta M, Yasuda J, Kure S, Kinoshita K, Yamamoto M, Tamiya G. Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference. Nat Commun 2021; 12:226. [PMID: 33431880 PMCID: PMC7801658 DOI: 10.1038/s41467-020-20146-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 12/05/2019] [Accepted: 11/17/2020] [Indexed: 12/21/2022] Open
Abstract
The complete human genome sequence is used as a reference for next-generation sequencing analyses. However, some ethnic ancestries are under-represented in the reference genome (e.g., GRCh37) due to its bias toward European and African ancestries. Here, we perform de novo assembly of three Japanese male genomes using > 100× Pacific Biosciences long reads and Bionano Genomics optical maps per sample. We integrate the genomes using the major allele for consensus and anchor the scaffolds using genetic and radiation hybrid maps to reconstruct each chromosome. The resulting genome sequence, JG1, is contiguous, accurate, and carries the Japanese major allele at most loci. We adopt JG1 as the reference for confirmatory exome re-analyses of seven rare-disease Japanese families and find that re-analysis using JG1 reduces total candidate variant calls versus GRCh37 while retaining disease-causing variants. These results suggest that integrating multiple genomes from a single population can aid genome analyses of that population.
Affiliation(s)
- Jun Takayama
- Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
| | - Shu Tadaka
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Kenji Yano
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
| | - Fumiki Katsuoka
- Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Chinatsu Gocho
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Takamitsu Funayama
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Satoshi Makino
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Yasunobu Okamura
- Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Atsuo Kikuchi
- Department of Pediatrics, Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
| | - Sachiyo Sugimoto
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Junko Kawashima
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Akihito Otsuki
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Mika Sakurai-Yageta
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
| | - Jun Yasuda
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Division of Molecular and Cellular Oncology, Miyagi Cancer Center Research Institute, 47-1, Nodayama, Medeshima-Shiode, Natori, Miyagi, 981-1293, Japan
| | - Shigeo Kure
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Department of Pediatrics, Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
| | - Kengo Kinoshita
- Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan.
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan.
- Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramaki Aza-Aoba, Aoba-ku, Sendai, Miyagi, 980-8579, Japan.
| | - Masayuki Yamamoto
- Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan.
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan.
| | - Gen Tamiya
- Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan.
- Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan.
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan.
- Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
| |
|
25
|
Kaye AM, Wasserman WW. The genome atlas: navigating a new era of reference genomes. Trends Genet 2021; 37:807-818. [PMID: 33419587 DOI: 10.1016/j.tig.2020.12.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/09/2020] [Revised: 12/03/2020] [Accepted: 12/07/2020] [Indexed: 10/22/2022]
Abstract
The reference genome serves two distinct purposes within the field of genomics. First, it provides a persistent structure against which findings can be reported, allowing for universal knowledge exchange between users. Second, it reduces the computational costs and time required to process genomic data by creating a scaffold that can be relied upon by analysis software. Here, we posit that current efforts to extend the linear reference to a graph-based structure while trying to fulfil both of these purposes concurrently will face a trade-off between comprehensiveness and computational efficiency. In this article, we explore how the reference genome is used and suggest an alternative structure, The Genome Atlas (TGA), to fulfil the bipartite role of the reference genome.
Affiliation(s)
- Alice M Kaye
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada.
| |
|
26
|
A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions. Genes (Basel) 2020; 11:genes11111350. [PMID: 33202901 PMCID: PMC7697454 DOI: 10.3390/genes11111350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2020] [Revised: 11/06/2020] [Accepted: 11/09/2020] [Indexed: 11/26/2022] Open
Abstract
The current human reference genome (GRCh38), with its superior quality, has contributed significantly to genome analysis. However, GRCh38 may still underrepresent the ethnic genome, specifically for Asians, though exactly what we are missing is still elusive. Here, we juxtaposed GRCh38 with a high-contiguity genome assembly of one Korean (AK1) to show that a part of AK1 genome is missing in GRCh38 and that the missing regions harbored ~1390 putative coding elements. Furthermore, we found that multiple populations shared some certain parts in the missing genome when we analyzed the “unmapped” (to GRCh38) reads of fourteen individuals (five East-Asians, four Europeans, and five Africans), amounting to ~5.3 Mb (~0.2% of AK1) of the total genomic regions. The recovered AK1 regions from the “unmapped reads”, which were the estimated missing regions that did not exist in GRCh38, harbored candidate coding elements. We verified that most of the common (shared by ≥7 individuals) missing regions exist in human and chimpanzee DNA. Moreover, we further identified the occurrence mechanism and ethnic heterogeneity as well as the presence of the common missing regions. This study illuminates a potential advantage of using a pangenome reference and brings up the need for further investigations on the various features of regions globally missed in GRCh38.
|
27
|
Lee YG, Lee JY, Kim J, Kim YJ. Insertion variants missing in the human reference genome are widespread among human populations. BMC Biol 2020; 18:167. [PMID: 33187521 PMCID: PMC7666470 DOI: 10.1186/s12915-020-00894-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/13/2020] [Accepted: 10/09/2020] [Indexed: 01/07/2023] Open
Abstract
Background: Structural variants comprise diverse genomic arrangements including deletions, insertions, inversions, and translocations, which can generally be detected in humans through sequence comparison to the reference genome. Among structural variants, insertions are the least frequently identified variants, mainly due to ascertainment bias in the reference genome, lack of previous sequence knowledge, and low complexity of typical insertion sequences. Though recent developments in long-read sequencing deliver promise in annotating individual non-reference insertions, population-level catalogues on non-reference insertion variants have not been identified and the possible functional roles of these hidden variants remain elusive.
Results: To detect non-reference insertion variants, we developed a pipeline, InserTag, which generates non-reference contigs by local de novo assembly and then infers the full-sequence of insertion variants by tracing contigs from non-human primates and other human genome assemblies. Application of the pipeline to data from 2535 individuals of the 1000 Genomes Project helped identify 1696 non-reference insertion variants and re-classify the variants as retention of ancestral sequences or novel sequence insertions based on the ancestral state. Genotyping of the variants showed that individuals had, on average, 0.92-Mbp sequences missing from the reference genome, 92% of the variants were common (allele frequency > 5%) among human populations, and more than half of the variants were major alleles. Among human populations, African populations were the most divergent and had the most non-reference sequences, which was attributed to the greater prevalence of high-frequency insertion variants. The subsets of insertion variants were in high linkage disequilibrium with phenotype-associated SNPs and showed signals of recent continent-specific selection.
Conclusions: Non-reference insertion variants represent an important type of genetic variation in the human population, and our developed pipeline, InserTag, provides the frameworks for the detection and genotyping of non-reference sequences missing from human populations.
Supplementary information: Supplementary information accompanies this paper at 10.1186/s12915-020-00894-1.
Affiliation(s)
- Young-Gun Lee
- Department of Integrated Omics for Biomedical Science, WCU Graduate School, Yonsei University, Seoul, Republic of Korea
| | - Jin-Young Lee
- Department of Biochemistry, College of Life Science and Technology, Yonsei University, Seoul, Republic of Korea
| | - Junhyong Kim
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Young-Joon Kim
- Department of Integrated Omics for Biomedical Science, WCU Graduate School, Yonsei University, Seoul, Republic of Korea.,Department of Biochemistry, College of Life Science and Technology, Yonsei University, Seoul, Republic of Korea.
| |
|
28
|
Wong KHY, Ma W, Wei CY, Yeh EC, Lin WJ, Wang EHF, Su JP, Hsieh FJ, Kao HJ, Chen HH, Chow SK, Young E, Chu C, Poon A, Yang CF, Lin DS, Hu YF, Wu JY, Lee NC, Hwu WL, Boffelli D, Martin D, Xiao M, Kwok PY. Towards a reference genome that captures global genetic diversity. Nat Commun 2020; 11:5482. [PMID: 33127893 PMCID: PMC7599213 DOI: 10.1038/s41467-020-19311-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/09/2020] [Accepted: 10/08/2020] [Indexed: 02/05/2023] Open
Abstract
The current human reference genome is predominantly derived from a single individual and it does not adequately reflect human genetic diversity. Here, we analyze 338 high-quality human assemblies of genetically divergent human populations to identify missing sequences in the human reference genome with breakpoint resolution. We identify 127,727 recurrent non-reference unique insertions spanning 18,048,877 bp, some of which disrupt exons and known regulatory elements. To improve genome annotations, we linearly integrate these sequences into the chromosomal assemblies and construct a Human Diversity Reference. Leveraging this reference, an average of 402,573 previously unmapped reads can be recovered for a given genome sequenced to ~40X coverage. Transcriptomic diversity among these non-reference sequences can also be directly assessed. We successfully map tens of thousands of previously discarded RNA-Seq reads to this reference and identify transcription evidence in 4781 gene loci, underlining the importance of these non-reference sequences in functional genomics. Our extensive datasets are important advances toward a comprehensive reference representation of global human genetic diversity. The human reference genome does not fully reflect human genetic diversity. Here, the authors analyse 338 human genome assemblies from diverse populations to identify missing sequences, define non-reference unique insertions and construct a Human Diversity Reference.
Affiliation(s)
- Karen H Y Wong
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Walfred Ma
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Chun-Yu Wei
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Erh-Chan Yeh
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Wan-Jia Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Elin H F Wang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Jen-Ping Su
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Feng-Jen Hsieh
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Hsiao-Jung Kao
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Hsiao-Huei Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Stephen K Chow
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Eleanor Young
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, 19104, USA
| | - Catherine Chu
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, 94143, USA
| | - Annie Poon
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, 94143, USA
| | - Chi-Fan Yang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Dar-Shong Lin
- Department of Pediatrics, Mackay Memorial Hospital, Taipei, Taiwan.,Department of Medicine, Mackay Medical College, New Taipei, Taiwan
| | - Yu-Feng Hu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.,Department of Internal Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
| | - Jer-Yuarn Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
| | - Ni-Chung Lee
- Departments of Pediatrics and Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
| | - Wuh-Liang Hwu
- Departments of Pediatrics and Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
| | - Dario Boffelli
- Children's Hospital Oakland Research Institute, Oakland, CA, 94609, USA
| | - David Martin
- Children's Hospital Oakland Research Institute, Oakland, CA, 94609, USA
| | - Ming Xiao
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, 19104, USA.,Institute of Molecular Medicine and Infectious Disease in the School of Medicine, Drexel University, Philadelphia, PA, 19102, USA
| | - Pui-Yan Kwok
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA, 94158, USA.,Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.,Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, 94143, USA.,Department of Dermatology, University of California, San Francisco, San Francisco, CA, 94115, USA.
| |
|
29
|
Li R, Yang P, Li M, Fang W, Yue X, Nanaei HA, Gan S, Du D, Cai Y, Dai X, Yang Q, Cao C, Deng W, He S, Li W, Ma R, Liu M, Jiang Y. A Hu sheep genome with the first ovine Y chromosome reveal introgression history after sheep domestication. SCIENCE CHINA-LIFE SCIENCES 2020; 64:1116-1130. [PMID: 32997330 DOI: 10.1007/s11427-020-1807-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/04/2020] [Accepted: 08/25/2020] [Indexed: 01/21/2023]
Abstract
The Y chromosome plays key roles in male fertility and reflects the evolutionary history of paternal lineages. Here, we present a de novo genome assembly of the Hu sheep with the first draft assembly of ovine Y chromosome (oMSY), using nanopore sequencing and Hi-C technologies. The oMSY that we generated spans 10.6 Mb from which 775 Y-SNPs were identified by applying a large panel of whole genome sequences from worldwide sheep and wild Iranian mouflons. Three major paternal lineages (HY1a, HY1b and HY2) were defined across domestic sheep, of which HY2 was newly detected. Surprisingly, HY2 forms a monophyletic clade with the Iranian mouflons and is highly divergent from both HY1a and HY1b. Demographic analysis of Y chromosomes, mitochondrial and nuclear genomes confirmed that HY2 and the maternal counterpart of lineage C represented a distinct wild mouflon population in Iran that diverge from the direct ancestor of domestic sheep, the wild mouflons in Southeastern Anatolia. Our results suggest that wild Iranian mouflons had introgressed into domestic sheep and thereby introduced this Iranian mouflon specific lineage carrying HY2 to both East Asian and Africa sheep populations.
Affiliation(s)
- Ran Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Peng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Ming Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Wenwen Fang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Xiangpeng Yue
- State Key Laboratory of Grassland Agro-ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730020, China
| | - Hojjat Asadollahpour Nanaei
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Shangquan Gan
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
| | - Duo Du
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Yudong Cai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Xuelei Dai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Qimeng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Chunna Cao
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Weidong Deng
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, China
| | - Sangang He
- Key Laboratory of Genetics Breeding and Reproduction of Grass feeding Livestock, Ministry of Agriculture, Animal Biotechnology Research Institute, Xinjiang Academy of Animal Science, Urumqi, 830026, China
| | - Wenrong Li
- Key Laboratory of Genetics Breeding and Reproduction of Grass feeding Livestock, Ministry of Agriculture, Animal Biotechnology Research Institute, Xinjiang Academy of Animal Science, Urumqi, 830026, China
| | - Runlin Ma
- State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Mingjun Liu
- Key Laboratory of Genetics Breeding and Reproduction of Grass feeding Livestock, Ministry of Agriculture, Animal Biotechnology Research Institute, Xinjiang Academy of Animal Science, Urumqi, 830026, China
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
| |
|
30
|
Wohlers I, Künstner A, Munz M, Olbrich M, Fähnrich A, Calonga-Solís V, Ma C, Hirose M, El-Mosallamy S, Salama M, Busch H, Ibrahim S. An integrated personal and population-based Egyptian genome reference. Nat Commun 2020; 11:4719. [PMID: 32948767 PMCID: PMC7501257 DOI: 10.1038/s41467-020-17964-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/22/2019] [Accepted: 07/24/2020] [Indexed: 02/05/2023] Open
Abstract
A small number of de novo assembled human genomes have been reported to date, and few have been complemented with population-based genetic variation, which is particularly important for North Africa, a region underrepresented in current genome-wide references. Here, we combine long- and short-read whole-genome sequencing data with recent assembly approaches into a de novo assembly of an Egyptian genome. The assembly demonstrates well-balanced quality metrics and is complemented with variant phasing via linked reads into haploblocks, which we associate with gene expression changes in blood. To construct an Egyptian genome reference, we identify genome-wide genetic variation within a cohort of 110 Egyptian individuals. We show that differences in allele frequencies and linkage disequilibrium between Egyptians and Europeans may compromise the transferability of European ancestry-based genetic disease risk and polygenic scores, substantiating the need for multi-ethnic genome references. Thus, the Egyptian genome reference will be a valuable resource for precision medicine.
Affiliation(s)
- Inken Wohlers
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Axel Künstner
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Matthias Munz
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Michael Olbrich
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Anke Fähnrich
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Verónica Calonga-Solís
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
- Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, 81531-990, Curitiba, Brazil
| | - Caixia Ma
- Novogene (UK) Company Limited, 25 Cambridge Science Park, Milton Road, CB4 0FW, Cambridge, UK
| | - Misa Hirose
- Genetics Division, Lübeck Institute of Experimental Dermatology, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Shaaban El-Mosallamy
- Medical Experimental Research Center (MERC), Mansoura University, Elgomhouria St., Dakahlia Governorate, 35516, Mansoura, Egypt
| | - Mohamed Salama
- Medical Experimental Research Center (MERC), Mansoura University, Elgomhouria St., Dakahlia Governorate, 35516, Mansoura, Egypt
- Institute of Global Health and Human Ecology, The American University in Cairo, AUC avenue, 11835, Cairo, Egypt
| | - Hauke Busch
- Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany.
| | - Saleh Ibrahim
- Genetics Division, Lübeck Institute of Experimental Dermatology, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany.
| |
|
31
|
Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences. G3-GENES GENOMES GENETICS 2020; 10:2801-2809. [PMID: 32532800 PMCID: PMC7407462 DOI: 10.1534/g3.120.401280] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.
|
32
|
Almarri MA, Bergström A, Prado-Martinez J, Yang F, Fu B, Dunham AS, Chen Y, Hurles ME, Tyler-Smith C, Xue Y. Population Structure, Stratification, and Introgression of Human Structural Variation. Cell 2020; 182:189-199.e15. [PMID: 32531199 PMCID: PMC7369638 DOI: 10.1016/j.cell.2020.05.024] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 01/10/2020] [Revised: 03/04/2020] [Accepted: 05/12/2020] [Indexed: 02/07/2023]
Abstract
Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
Affiliation(s)
| | - Anders Bergström
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK; The Francis Crick Institute, London NW1 1AT, UK
| | | | | | - Beiyuan Fu
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
| | - Alistair S Dunham
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK; EMBL-EBI, Hinxton CB10 1SD, UK
| | - Yuan Chen
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
| | | | | | - Yali Xue
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK.
| |
|
33
|
Shumate A, Zimin AV, Sherman RM, Puiu D, Wagner JM, Olson ND, Pertea M, Salit ML, Zook JM, Salzberg SL. Assembly and annotation of an Ashkenazi human reference genome. Genome Biol 2020; 21:129. [PMID: 32487205 PMCID: PMC7265644 DOI: 10.1186/s13059-020-02047-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 04/06/2020] [Accepted: 05/15/2020] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases. RESULTS: Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes. CONCLUSIONS: The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.
Affiliation(s)
- Alaina Shumate
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Aleksey V Zimin
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Rachel M Sherman
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela Puiu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Justin M Wagner
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D Olson
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Marc L Salit
- Joint Initiative for Metrology in Biology, Stanford University, Stanford, CA, USA
| | - Justin M Zook
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
| |
|
34
|
Abstract
Since the early days of the genome era, the scientific community has relied on a single 'reference' genome for each species, which is used as the basis for a wide range of genetic analyses, including studies of variation within and across species. As sequencing costs have dropped, thousands of new genomes have been sequenced, and scientists have come to realize that a single reference genome is inadequate for many purposes. By sampling a diverse set of individuals, one can begin to assemble a pan-genome: a collection of all the DNA sequences that occur in a species. Here we review efforts to create pan-genomes for a range of species, from bacteria to humans, and we further consider the computational methods that have been proposed in order to capture, interpret and compare pan-genome data. As scientists continue to survey and catalogue the genomic variation across human populations and begin to assemble a human pan-genome, these efforts will increase our power to connect variation to human diversity, disease and beyond.
Affiliation(s)
- Rachel M Sherman
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
| | - Steven L Salzberg
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
| |
|
35
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
| |
|
36
|
Tattini L, Tellini N, Mozzachiodi S, D'Angiolo M, Loeillet S, Nicolas A, Liti G. Accurate Tracking of the Mutational Landscape of Diploid Hybrid Genomes. Mol Biol Evol 2020; 36:2861-2877. [PMID: 31397846 PMCID: PMC6878955 DOI: 10.1093/molbev/msz177] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/22/2022] Open
Abstract
Mutations, recombinations, and genome duplications may promote genetic diversity and trigger evolutionary processes. However, quantifying these events in diploid hybrid genomes is challenging. Here, we present an integrated experimental and computational workflow to accurately track the mutational landscape of yeast diploid hybrids (MuLoYDH) in terms of single-nucleotide variants, small insertions/deletions, copy-number variants, aneuploidies, and loss-of-heterozygosity. Pairs of haploid Saccharomyces parents were combined to generate ancestor hybrids with phased genomes and varying levels of heterozygosity. These diploids were evolved under different laboratory protocols, in particular mutation accumulation experiments. Variant simulations enabled the efficient integration of competitive and standard mapping of short reads, depending on local levels of heterozygosity. Experimental validations proved the high accuracy and resolution of our computational approach. Finally, applying MuLoYDH to four different diploids revealed striking genetic background effects. Homozygous Saccharomyces cerevisiae showed a ∼4-fold higher mutation rate compared with its closely related species S. paradoxus. Intraspecies hybrids unveiled that a substantial fraction of the genome (∼250 bp per generation) was shaped by loss-of-heterozygosity, a process strongly inhibited in interspecies hybrids by high levels of sequence divergence between homologous chromosomes. In contrast, interspecies hybrids exhibited higher single-nucleotide mutation rates compared with intraspecies hybrids. MuLoYDH provided an unprecedented quantitative insight into the evolutionary processes that mold diploid yeast genomes and can be generalized to other genetic systems.
Affiliation(s)
- Lorenzo Tattini
- CNRS UMR7284, INSERM, IRCAN, Université Côte d'Azur, Nice, France
| | - Nicolò Tellini
- CNRS UMR7284, INSERM, IRCAN, Université Côte d'Azur, Nice, France
| | | | | | - Sophie Loeillet
- CNRS UMR3244, Institut Curie, PSL Research University, Paris, France
| | - Alain Nicolas
- CNRS UMR3244, Institut Curie, PSL Research University, Paris, France
| | - Gianni Liti
- CNRS UMR7284, INSERM, IRCAN, Université Côte d'Azur, Nice, France
| |
|
37
|
Zhang X, Wu R, Wang Y, Yu J, Tang H. Unzipping haplotypes in diploid and polyploid genomes. Comput Struct Biotechnol J 2019; 18:66-72. [PMID: 31908732 PMCID: PMC6938933 DOI: 10.1016/j.csbj.2019.11.011] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 09/16/2019] [Revised: 11/25/2019] [Accepted: 11/26/2019] [Indexed: 11/18/2022] Open
Abstract
Diploid genomes consist of two homologous copies of chromosomes with one from each parent while polyploid genomes contain more than two homologous sets of chromosomes. Most of the reference genome assemblies collapsed haplotypes that represent 'mosaic' sequences, ignoring allelic variants that may be involved in important cellular and biological functions. Unzipping haplotypes into distinct sets of sequences has been a growing trend in recent genome studies, as it is an essential tool towards resolving important clinical and biological questions, such as compound heterozygotes, heterosis, and evolution. Herein, we review existing methods for alignment-based and assembly-based haplotype phasing for heterozygous diploid and polyploid genomes, as well as recent advances of experimental approaches for improved genome phasing. We anticipate that full haplotype phasing could become a routine procedure in genome studies in the near future.
Affiliation(s)
- Xingtan Zhang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Ruoxi Wu
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Yibin Wang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Jiaxin Yu
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Haibao Tang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Corresponding author.
| |
|
38
|
Zhang L, Zhou X, Weng Z, Sidow A. De novo diploid genome assembly for genome-wide structural variant detection. NAR Genom Bioinform 2019; 2:lqz018. [PMID: 33575568 PMCID: PMC7671403 DOI: 10.1093/nargab/lqz018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/08/2019] [Revised: 10/09/2019] [Accepted: 12/02/2019] [Indexed: 12/30/2022] Open
Abstract
Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six de novo 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the base-pair level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.
Affiliation(s)
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.,Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA.,Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Ziming Weng
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
| | - Arend Sidow
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA.,Department of Genetics, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
| |
|
39
|
Wong KH, Levy‐Sakin M, Ma W, Gonzaludo N, Mak AC, Vaka D, Poon A, Chu C, Lao R, Balamir M, Grenville Z, Wong N, Kane JP, Kwok P, Malloy MJ, Pullinger CR. Three patients with homozygous familial hypercholesterolemia: Genomic sequencing and kindred analysis. Mol Genet Genomic Med 2019; 7:e1007. [PMID: 31617323 PMCID: PMC6900368 DOI: 10.1002/mgg3.1007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/13/2019] [Revised: 09/18/2019] [Accepted: 09/25/2019] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Homozygous Familial Hypercholesterolemia (HoFH) is an inherited recessive condition associated with extremely high levels of low-density lipoprotein (LDL) cholesterol in affected individuals. It is usually caused by homozygous or compound heterozygous functional mutations in the LDL receptor (LDLR). A number of mutations causing FH have been reported in literature and such genetic heterogeneity presents great challenges for disease diagnosis. OBJECTIVE: We aim to determine the likely genetic defects responsible for three cases of pediatric HoFH in two kindreds. METHODS: We applied whole exome sequencing (WES) on the two probands to determine the likely functional variants among candidate FH genes. We additionally applied 10x Genomics (10xG) Linked-Reads whole genome sequencing (WGS) on one of the kindreds to identify potentially deleterious structural variants (SVs) underlying HoFH. A PCR-based screening assay was also established to detect the LDLR structural variant in a cohort of 641 patients with elevated LDL. RESULTS: In the Caucasian kindred, the FH homozygosity can be attributed to two compound heterozygous LDLR damaging variants, an exon 12 p.G592E missense mutation and a novel 3 kb exon 1 deletion. By analyzing the 10xG phased data, we ascertained that this deletion allele was most likely to have originated from a Russian ancestor. In the Mexican kindred, the strikingly elevated LDL cholesterol level can be attributed to a homozygous frameshift LDLR variant p.E113fs. CONCLUSIONS: While the application of WES can provide a cost-effective way of identifying the genetic causes of FH, it often lacks sensitivity for detecting structural variants. Our finding of the LDLR exon 1 deletion highlights the broader utility of Linked-Read WGS in detecting SVs in the clinical setting, especially when HoFH patients remain undiagnosed after WES.
Affiliation(s)
- Karen H.Y. Wong
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
| | - Michal Levy‐Sakin
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
| | - Walfred Ma
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
| | - Nina Gonzaludo
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
| | - Angel C.Y. Mak
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Lung Biology Center, University of California, San Francisco, CA, USA
| | - Dedeepya Vaka
- Institute for Human Genetics, University of California, San Francisco, CA, USA
| | - Annie Poon
- Institute for Human Genetics, University of California, San Francisco, CA, USA
| | - Catherine Chu
- Institute for Human Genetics, University of California, San Francisco, CA, USA
| | - Richard Lao
- Institute for Human Genetics, University of California, San Francisco, CA, USA
| | - Melek Balamir
- Department of Internal Medicine, Istanbul University, Istanbul, Turkey
| | - Zoe Grenville
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
| | - Nicolas Wong
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
| | - John P. Kane
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Department of Medicine, University of California, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
| | - Pui‐Yan Kwok
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Department of Dermatology, University of California, San Francisco, CA, USA
| | - Mary J. Malloy
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Department of Medicine, University of California, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, CA, USA
| | - Clive R. Pullinger
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Department of Physiological Nursing, University of California, San Francisco, CA, USA
| |
|
40
|
Dai Z, Li T, Li J, Han Z, Pan Y, Tang S, Diao X, Luo M. High-throughput long paired-end sequencing of a Fosmid library by PacBio. PLANT METHODS 2019; 15:142. [PMID: 31788019 PMCID: PMC6878638 DOI: 10.1186/s13007-019-0525-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/22/2019] [Accepted: 11/12/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND Large insert paired-end sequencing technologies are important tools for assembling genomes, delineating associated breakpoints and detecting structural rearrangements. To facilitate the comprehensive detection of inter- and intra-chromosomal structural rearrangements or variants (SVs) and complex genome assembly with long repeats and segmental duplications, we developed a new method based on single-molecule real-time synthesis sequencing technology for generating long paired-end sequences of large insert DNA libraries. RESULTS A Fosmid vector, pHZAUFOS3, was developed with the following new features: (1) two 18-bp non-palindromic I-SceI sites flank the cloning site, and another two sites are present in the backbone of the vector, allowing long DNA inserts (and the long paired-ends in this paper) to be recovered as single fragments, while the vector (~8 kb) is fragmented into 2-3 kb fragments by I-SceI digestion and is therefore effectively removed from the long paired-ends (5-10 kb); (2) the chloramphenicol (Cm) resistance gene and replicon (oriV), necessary for colony growth, are located near the two sides of the cloning site, helping to increase the proportion of paired-end fragments relative to single-end fragments in the paired-end libraries. Paired-end libraries were constructed by ligating the size-selected, mechanically sheared pooled Fosmid DNA fragments to the Ampicillin (Amp) resistance gene fragment and screening the colonies with Cm and Amp. We tested this method on yeast and Setaria italica Yugu1. Fosmid-size paired-ends with an average length longer than 2 kb for each end were generated. The N50 scaffold lengths of the de novo assemblies of the yeast and S. italica Yugu1 genomes were significantly improved. Five large and five small structural rearrangements or assembly errors spanning tens of bp to tens of kb were identified in S. italica Yugu1, including deletions, inversions, duplications and translocations.
CONCLUSIONS We developed a new method for long paired-end sequencing of large insert libraries, which can efficiently improve the quality of de novo genome assembly and identify large and small structural rearrangements or assembly errors.
Affiliation(s)
- Zhaozhao Dai
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Tong Li
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Jiadong Li
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Zhifei Han
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Yonglong Pan
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Sha Tang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 10081 China
| | - Xianmin Diao
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 10081 China
| | - Meizhong Luo
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| |
|
41
|
Li R, Fu W, Su R, Tian X, Du D, Zhao Y, Zheng Z, Chen Q, Gao S, Cai Y, Wang X, Li J, Jiang Y. Towards the Complete Goat Pan-Genome by Recovering Missing Genomic Segments From the Reference Genome. Front Genet 2019; 10:1169. [PMID: 31803240 PMCID: PMC6874019 DOI: 10.3389/fgene.2019.01169] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/12/2018] [Accepted: 10/23/2019] [Indexed: 01/08/2023] Open
Abstract
It is broadly expected that next-generation sequencing will ultimately generate a complete genome, as with the latest goat reference genome (ARS1), which is considered to be one of the most contiguous assemblies in livestock. However, the rich diversity of worldwide goat breeds indicates that a genome from one individual would be insufficient to represent the entire genomic content of goats. By comparing nine de novo assemblies from seven sibling species of domestic goat with ARS1 and using resequencing and transcriptome data from goats for verification, we identified a total of 38.3 Mb of sequences that were absent in ARS1. The pan-sequences contain genic fractions with considerable expression. Using the pan-genome (ARS1 together with the pan-sequences) as a reference genome, variation calling efficacy can be appreciably improved. A total of 56,657 spurious SNPs per individual were suppressed and 24,414 novel SNPs per individual on average were recovered as a result of better read mapping quality. The transcriptomic mapping rate was also increased by ∼1.15%. Our study demonstrated that comparing de novo assemblies from closely related species is an efficient and reliable strategy for finding sequences missing from the reference genome and could be applicable to other species. A pan-genome can serve as an improved reference genome in animals for a better exploration of the underlying genomic variations and could increase the probability of finding genotype-phenotype associations using a comprehensive variation database containing many more differences between individuals. We have constructed a goat pan-genome web interface for data visualization (http://animal.nwsuaf.edu.cn/panGoat).
Affiliation(s)
- Ran Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Weiwei Fu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Rui Su
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
| | - Xiaomeng Tian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Duo Du
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Yue Zhao
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Zhuqing Zheng
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Qiuming Chen
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Shan Gao
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Yudong Cai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Jinquan Li
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
| |
|
42
|
Li R, Tian X, Yang P, Fan Y, Li M, Zheng H, Wang X, Jiang Y. Recovery of non-reference sequences missing from the human reference genome. BMC Genomics 2019; 20:746. [PMID: 31619167 PMCID: PMC6796347 DOI: 10.1186/s12864-019-6107-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/02/2019] [Accepted: 09/20/2019] [Indexed: 01/12/2023] Open
Abstract
Background Non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structural variations with NRS exist. Results Here, we compared 31 human de novo assemblies with the current reference genome to identify the NRS and their location. We resolved the precise location of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, which were defined as having less than 90% (or no) identity with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes, including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had a high content of tandem repeats, indicating that their origin was associated with tandem repeats. Conclusions Our study detected a large number of NRS, including many alternate alleles that were previously uncharacterized. We suggest that the origin of alternate alleles was associated with tandem repeats. Our results enrich the spectrum of genetic variation in the human genome.
Affiliation(s)
- Ran Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Xiaomeng Tian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Peng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Yingzhi Fan
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Ming Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Hongxiang Zheng
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
| | - Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
| |
|
43
|
Oliynyk RT. Future Preventive Gene Therapy of Polygenic Diseases from a Population Genetics Perspective. Int J Mol Sci 2019; 20:E5013. [PMID: 31658652 PMCID: PMC6834143 DOI: 10.3390/ijms20205013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Revised: 10/01/2019] [Accepted: 10/08/2019] [Indexed: 12/15/2022] Open
Abstract
With the accumulation of scientific knowledge of the genetic causes of common diseases and continuous advancement of gene-editing technologies, gene therapies to prevent polygenic diseases may soon become possible. This study endeavored to assess population genetics consequences of such therapies. Computer simulations were used to evaluate the heterogeneity in causal alleles for polygenic diseases that could exist among geographically distinct populations. The results show that although heterogeneity would not be easily detectable by epidemiological studies following population admixture, even significant heterogeneity would not impede the outcomes of preventive gene therapies. Preventive gene therapies designed to correct causal alleles to a naturally-occurring neutral state of nucleotides would lower the prevalence of polygenic early- to middle-age-onset diseases in proportion to the decreased population relative risk attributable to the edited alleles. The outcome would manifest differently for late-onset diseases, for which the therapies would result in a delayed disease onset and decreased lifetime risk; however, the lifetime risk would increase again with prolonging population life expectancy, which is a likely consequence of such therapies. If the preventive heritable gene therapies were to be applied on a large scale, the decreasing frequency of risk alleles in populations would reduce the disease risk or delay the age of onset, even with a fraction of the population receiving such therapies. With ongoing population admixture, all groups would benefit over generations.
Affiliation(s)
- Roman Teo Oliynyk
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand.
- Department of Computer Science, University of Auckland, Auckland 1010, New Zealand.
| |
|
44
|
Yang G, Lu H, Wang L, Zhao J, Zeng W, Zhang T. Genome-Wide Identification and Transcriptional Expression of the METTL21C Gene Family in Chicken. Genes (Basel) 2019; 10:genes10080628. [PMID: 31434291 PMCID: PMC6723737 DOI: 10.3390/genes10080628] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/26/2019] [Revised: 08/06/2019] [Accepted: 08/15/2019] [Indexed: 12/31/2022] Open
Abstract
The chicken is a common type of poultry that is economically important for both its medicinal and nutritional value. Previous studies have found that free-range chickens have more skeletal muscle mass. The methyltransferase-like 21C gene (METTL21C) plays an important role in muscle development; however, there have been few reports on the role of METTL21C in chickens. In this study, we performed a genome-wide identification of chicken METTL21C genes, analyzed their phylogeny and transcriptional expression profiles, and performed real-time quantitative polymerase chain reaction (qPCR). We identified 10 GgMETTL21C genes from chickens, 11 from mice, and 32 from humans, and these genes were divided into six groups, which showed a large amount of variation among these three species. A total of 15 motifs were detected in METTL21C genes, and the intron phase of the gene structure showed that the METTL21C gene family was evolutionarily conserved. Further, both the transcript data and qPCR showed that the expression level of a single gene (GgMETTL21C3) increased with the muscle development of chickens, indicating that the METTL21C genes are involved in the development of chicken muscles. Our results provide a reference for subsequent studies of the function of METTL21C.
Affiliation(s)
- Ge Yang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, Shaanxi 723001, China
| | - Hongzhao Lu
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, Shaanxi 723001, China
| | - Ling Wang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, Shaanxi 723001, China
| | - Jiarong Zhao
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, Shaanxi 723001, China
| | - Wenxian Zeng
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, Shaanxi 723001, China
| | - Tao Zhang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, Shaanxi 723001, China.
| |
|
45
|
Belsare S, Levy-Sakin M, Mostovoy Y, Durinck S, Chaudhuri S, Xiao M, Peterson AS, Kwok PY, Seshagiri S, Wall JD. Evaluating the quality of the 1000 genomes project data. BMC Genomics 2019; 20:620. [PMID: 31416423 PMCID: PMC6696682 DOI: 10.1186/s12864-019-5957-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 10/18/2018] [Accepted: 07/04/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Data from the 1000 Genomes project is quite often used as a reference for human genomic analysis. However, its accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of the genotyping, phasing, and imputation accuracy data in the 1000 Genomes project. We compare the phased haplotype calls from the 1000 Genomes project to experimentally phased haplotypes for 28 of the same individuals sequenced using the 10X Genomics platform. RESULTS We observe that phasing and imputation for rare variants are unreliable, which likely reflects the limited sample size of the 1000 Genomes project data. Further, it appears that using a population specific reference panel does not improve the accuracy of imputation over using the entire 1000 Genomes data set as a reference panel. We also note that the error rates and trends depend on the choice of definition of error, and hence any error reporting needs to take these definitions into account. CONCLUSIONS The quality of the 1000 Genomes data needs to be considered while using this database for further studies. This work presents an analysis that can be used for these assessments.
Affiliation(s)
- Saurabh Belsare
- Institute for Human Genetics, University of California, San Francisco, CA, 94143, USA.
| | - Michal Levy-Sakin
- Department of Dermatology, University of California, San Francisco, CA, 94143, USA
| | - Yulia Mostovoy
- Department of Dermatology, University of California, San Francisco, CA, 94143, USA
| | - Steffen Durinck
- Department of Molecular Biology, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
| | - Subhra Chaudhuri
- Department of Molecular Biology, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
| | - Ming Xiao
- School of Biomedical Science, Engineering, and Health Systems, Drexel University, Philadelphia, PA, 19104, USA
| | - Andrew S Peterson
- Department of Molecular Biology, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
| | - Pui-Yan Kwok
- Institute for Human Genetics, University of California, San Francisco, CA, 94143, USA
- Department of Dermatology, University of California, San Francisco, CA, 94143, USA
- Cardiovascular Research Institute, San Francisco, CA, 94143, USA
| | - Somasekar Seshagiri
- Department of Molecular Biology, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
| | - Jeffrey D Wall
- Institute for Human Genetics, University of California, San Francisco, CA, 94143, USA.
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, 94143, USA.
| |
|
46
|
LncRNA KCNQ1OT1 acting as a ceRNA for miR-4458 enhances osteosarcoma progression by regulating CCND2 expression. In Vitro Cell Dev Biol Anim 2019; 55:694-702. [PMID: 31392505 DOI: 10.1007/s11626-019-00386-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/11/2019] [Accepted: 07/02/2019] [Indexed: 12/19/2022]
Abstract
Osteosarcoma is prevalent worldwide and characterized as a challenging health burden. It has been increasingly indicated that long non-coding RNAs (lncRNAs) are significant in the pathological processes of numerous cancers, exerting oncogenic or tumor-suppressive functions. However, the participation of KCNQ1OT1 in osteosarcoma has not been elaborated. In this study, we focus on interrogating the function of KCNQ1OT1 and its underlying mechanism in osteosarcoma. Our work demonstrated the upregulation of KCNQ1OT1 in osteosarcoma through qRT-PCR. In addition, loss-of-function assays (CCK-8, transwell migration) indicated that KCNQ1OT1 promoted cell proliferation and migration in osteosarcoma. Mechanistically, KCNQ1OT1 acted as a sponge for miR-4458 and antagonized its tumor-suppressive impact on CCND2 expression. The anti-apoptotic nature of KCNQ1OT1 was also unveiled via a caspase-3 activity assay. Overexpressed KCNQ1OT1 acted as a competing endogenous RNA (ceRNA) for miR-4458 and subsequently upregulated the target gene CCND2. Collectively, the results of rescue experiments suggested that the oncogenic role of KCNQ1OT1 was performed through sponging miR-4458 and upregulating CCND2 during osteosarcoma development, providing a novel perspective of intervention in osteosarcoma management.
|
47
|
Duan Z, Qiao Y, Lu J, Lu H, Zhang W, Yan F, Sun C, Hu Z, Zhang Z, Li G, Chen H, Xiang Z, Zhu Z, Zhao H, Yu Y, Wei C. HUPAN: a pan-genome analysis pipeline for human genomes. Genome Biol 2019; 20:149. [PMID: 31366358 PMCID: PMC6670167 DOI: 10.1186/s13059-019-1751-y] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 11/27/2018] [Accepted: 07/01/2019] [Indexed: 12/13/2022] Open
Abstract
The human reference genome is still incomplete, especially for those population-specific or individual-specific regions, which may have important functions. Here, we developed a HUman Pan-genome ANalysis (HUPAN) system to build the human pan-genome. We applied it to 185 deep sequencing and 90 assembled Han Chinese genomes and detected 29.5 Mb novel genomic sequences and at least 188 novel protein-coding genes missing in the human reference genome (GRCh38). It can be an important resource for the human genome-related biomedical studies, such as cancer genome analysis. HUPAN is freely available at http://cgm.sjtu.edu.cn/hupan/ and https://github.com/SJTU-CGM/HUPAN .
Affiliation(s)
- Zhongqu Duan
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Yuyang Qiao
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Jinyuan Lu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Huimin Lu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Wenmin Zhang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Fazhe Yan
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Chen Sun
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Zhiqiang Hu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Zhen Zhang
- Department of Radiation Oncology and Department of Oncology, Shanghai Medical College, Fudan University Shanghai Cancer Center, 270 Dong An Road, Shanghai, 200032, China
| | - Guichao Li
- Department of Radiation Oncology and Department of Oncology, Shanghai Medical College, Fudan University Shanghai Cancer Center, 270 Dong An Road, Shanghai, 200032, China
| | - Hongzhuan Chen
- Department of Pharmacology, Shanghai Key Laboratory For Translational Medicine, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, Shanghai, 200025, China
| | - Zhen Xiang
- Department of Surgery, Ruijin Hospital, Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Road, Shanghai, 200025, China
| | - Zhenggang Zhu
- Department of Surgery, Ruijin Hospital, Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Road, Shanghai, 200025, China
| | - Hongyu Zhao
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Department of Biostatistics, Yale University, 60 College Street, New Haven, CT, 06520, USA
| | - Yingyan Yu
- Department of Surgery, Ruijin Hospital, Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Road, Shanghai, 200025, China.
| | - Chaochun Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
- Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Pudong District, Shanghai, 201203, China.
| |
|
48
|
Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data. SCIENCE CHINA-LIFE SCIENCES 2019; 63:750-763. [PMID: 31290097 DOI: 10.1007/s11427-019-9551-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/07/2019] [Accepted: 04/03/2019] [Indexed: 01/23/2023]
Abstract
Pigs were domesticated independently in the Near East and China, indicating that a single reference genome from one individual is unable to represent the full spectrum of divergent sequences in pigs worldwide. Therefore, 12 de novo pig assemblies from Eurasia were compared in this study to identify the missing sequences from the reference genome. As a result, 72.5 Mb of non-redundant sequences (∼3% of the genome) were found to be absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of the pan-sequences, 9.0 Mb were dominant in Chinese pigs, in contrast with their low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of the tazarotene-induced gene 3 (TIG3) gene which is involved in fatty acid metabolism. Using flanking sequences and Hi-C based methods, 27.7% of the sequences could be anchored to the reference genome. The supplementation of these sequences could contribute to the accurate interpretation of the 3D chromatin structure. A web-based pan-genome database was further provided to serve as a primary resource for exploration of genetic diversity and promote pig breeding and biomedical research.
|
49
|
Zhang C, Bao C, Zhang X, Lin X, Pan D, Chen Y. Knockdown of lncRNA LEF1-AS1 inhibited the progression of oral squamous cell carcinoma (OSCC) via Hippo signaling pathway. Cancer Biol Ther 2019; 20:1213-1222. [PMID: 30983488 DOI: 10.1080/15384047.2019.1599671] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 02/06/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are known to play crucial roles in various cancers. LncRNA LEF1-AS1 is a reported oncogene in colorectal cancer and glioblastoma. In this study, we found that LEF1-AS1 was markedly upregulated in oral squamous cell carcinoma (OSCC) tissues and cell lines. In addition, OSCC patients with high levels of LEF1-AS1 tended to have a poor prognosis. Functionally, LEF1-AS1 knockdown inhibited cell survival, proliferation and migration, while enhancing cell apoptosis and inducing G0/G1 cell cycle arrest in vitro. Consistently, LEF1-AS1 silencing hindered tumor growth in vivo. Moreover, inhibition of LEF1-AS1, which directly interacts with LATS1, stimulated activation of the Hippo signaling pathway. Furthermore, we found that LEF1-AS1 silencing abolished the interaction of LEF1-AS1 with LATS1 while enhancing the binding of LATS1 to MOB, thereby promoting YAP phosphorylation but impairing YAP1 nuclear translocation. Additionally, we demonstrated that LEF1-AS1 regulated YAP1 translocation in a LATS1-dependent manner. We also found that YAP1 overexpression abolished the suppressive impact of LEF1-AS1 repression on the biological processes of OSCC cells. In summary, we concluded that LEF1-AS1 played an oncogenic role in OSCC by suppressing the Hippo signaling pathway through its interaction with LATS1, suggesting the therapeutic and prognostic potential of LEF1-AS1 in OSCC.
Affiliation(s)
- Chanqiong Zhang
- Department of Pathology, Wenzhou People's Hospital, Wenzhou, Zhejiang, China
| | - Chunchun Bao
- Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Xiuxing Zhang
- Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Xinshi Lin
- Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Dan Pan
- Department of Pathology, Wenzhou People's Hospital, Wenzhou, Zhejiang, China
| | - Yangzong Chen
- Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
| |
|
50
|
Genome maps across 26 human populations reveal population-specific patterns of structural variation. Nat Commun 2019; 10:1025. [PMID: 30833565 PMCID: PMC6399254 DOI: 10.1038/s41467-019-08992-7] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 12/18/2018] [Accepted: 02/12/2019] [Indexed: 01/10/2023] Open
Abstract
Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome. Large structural variants (SV) are understudied in human genetics research because of the difficulty to detect them in the routinely generated short-read sequencing data. Here, the authors generate optical genome maps of 154 individuals from 26 populations that allow comprehensive examination of large SVs.
|