1
Pei Y, Tanguy M, Giess A, Dixit A, Wilson LC, Gibbons RJ, Twigg SRF, Elgar G, Wilkie AOM. A Comparison of Structural Variant Calling from Short-Read and Nanopore-Based Whole-Genome Sequencing Using Optical Genome Mapping as a Benchmark. Genes (Basel) 2024; 15:925. [PMID: 39062704 PMCID: PMC11276380 DOI: 10.3390/genes15070925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/03/2024] [Accepted: 07/11/2024] [Indexed: 07/28/2024]
Abstract
The identification of structural variants (SVs) in genomic data represents an ongoing challenge because of difficulties in reliable SV calling leading to reduced sensitivity and specificity. We prepared high-quality DNA from 9 parent-child trios, who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; 8 probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individual visualisation using the Integrative Genomics Viewer with either or both of the Illumina and ONT raw sequence. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied according to variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor using the original Sniffles variant caller (48% overall) but improved substantially with use of Sniffles2 (36/40; 90% and 17/23; 74% for deletions and insertions, respectively). In summary, we show that the precision of OGM is very high. In addition, when applying the Sniffles2 caller, the sensitivity of SV calling using ONT long-read sequence data outperforms Illumina sequencing for most SV types.
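The percentages quoted in this abstract can be rechecked from the counts it reports. The following is a minimal sketch, not the authors' analysis code: the function name `proportion` and the dictionary labels are illustrative assumptions, and only the numerators and denominators come from the abstract.

```python
def proportion(hits: int, total: int) -> float:
    """Return hits / total, e.g. verified calls over all calls."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Counts as reported in the abstract; the labels are illustrative only.
counts = {
    "OGM precision (PPV)": (222, 234),
    "Illumina sensitivity, deletions": (115, 134),
    "Illumina sensitivity, insertions": (13, 58),
    "ONT Sniffles2 sensitivity, deletions": (36, 40),
    "ONT Sniffles2 sensitivity, insertions": (17, 23),
}

# Round to whole percentages, as the abstract does.
percentages = {
    label: round(100 * proportion(hits, total))
    for label, (hits, total) in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Precision here is the fraction of Bionano calls that were verified, while each sensitivity is the fraction of verified Bionano SVs of a given type that a caller recovered in the other dataset.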
Affiliation(s)
- Yang Pei
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
- Melanie Tanguy
  - Genomics England Limited, One Canada Square, London E14 5AB, UK
- Adam Giess
  - Genomics England Limited, One Canada Square, London E14 5AB, UK
- Abhijit Dixit
  - Clinical Genetics Service, Nottingham University Hospitals NHS Foundation Trust, City Hospital, Nottingham NG5 1PB, UK
- Louise C. Wilson
  - North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Foundation Trust, Great Ormond Street Hospital, London WC1N 3JH, UK
- Richard J. Gibbons
  - MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
- Stephen R. F. Twigg
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
- Greg Elgar
  - Genomics England Limited, One Canada Square, London E14 5AB, UK
- Andrew O. M. Wilkie
  - Clinical Genetics Group, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
2
Jia P, Dong L, Yang X, Wang B, Bush SJ, Wang T, Lin J, Wang S, Zhao X, Xu T, Che Y, Dang N, Ren L, Zhang Y, Wang X, Liang F, Wang Y, Ruan J, Xia H, Zheng Y, Shi L, Lv Y, Wang J, Ye K. Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet. Genome Biol 2023; 24:277. [PMID: 38049885 PMCID: PMC10694985 DOI: 10.1186/s13059-023-03116-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 11/21/2023] [Indexed: 12/06/2023]
Abstract
BACKGROUND: Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short- and long-read sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technology). RESULTS: The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map. For each haplotype, we also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9,726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations which are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity (including those located at long repeat regions, complex structural variants, and de novo mutations) are systematically examined in this study. CONCLUSIONS: In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offer expanded genomic coverage and insight into complex variant categories.
Affiliation(s)
- Peng Jia
  - National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Lianhua Dong
  - National Institute of Metrology, Beijing, 100029, China
- Xiaofei Yang
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- Bo Wang
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Stephen J Bush
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Tingjie Wang
  - National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Jiadong Lin
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Songbo Wang
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Xixi Zhao
  - National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Tun Xu
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Yizhuo Che
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Ningxin Dang
  - Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Yujing Zhang
  - National Institute of Metrology, Beijing, 100029, China
- Xia Wang
  - National Institute of Metrology, Beijing, 100029, China
- Fan Liang
  - GrandOmics Biosciences, Beijing, 100089, China
- Yang Wang
  - GrandOmics Biosciences, Beijing, 100089, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Han Xia
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Leming Shi
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Yi Lv
  - National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- Jing Wang
  - National Institute of Metrology, Beijing, 100029, China
- Kai Ye
  - National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
  - Faculty of Science, Leiden University, Leiden, 2311EZ, The Netherlands
3
Yang X, Zhao X, Qu S, Jia P, Wang B, Gao S, Xu T, Zhang W, Huang J, Ye K. Haplotype-resolved Chinese male genome assembly based on high-fidelity sequencing. Fundamental Research 2022; 2:946-953. [PMID: 38933383 PMCID: PMC11197534 DOI: 10.1016/j.fmre.2022.02.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/06/2021] [Revised: 02/17/2022] [Accepted: 02/27/2022] [Indexed: 01/10/2023]
Abstract
The length and accuracy of high-fidelity (HiFi) reads together enable chromosome-scale haplotype-resolved genome assembly. In this study, we used HiFi and Hi-C to sequence a cell line named HJ, established from a Chinese Han male individual. We assembled two high-quality haplotypes of the HJ genome (haplotype 1 (H1): 3.1 Gb, haplotype 2 (H2): 2.9 Gb). The continuity (H1: contig N50 = 28.2 Mb, H2: contig N50 = 25.9 Mb) and completeness (BUSCO: H1 = 94.9%, H2 = 93.5%) are substantially better than those of other Chinese genomes, for example, HX1, NH1.0, and YH2.0. By comparing the HJ genome with GRCh38, we reported the mutation landscape of HJ and found that 176 and 213 N-gaps were filled in H1 and H2, respectively. In addition, we detected 12.9 Mb and 13.4 Mb of novel sequence containing 246 and 135 protein-coding genes in H1 and H2, respectively. Our results demonstrate the advantages of HiFi reads in haplotype-resolved genome assembly and provide two high-quality haplotypes of a potential Chinese genome as a reference for the Chinese Han population.
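For readers unfamiliar with the contig N50 statistic quoted above, a minimal sketch of how it is commonly computed follows. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the HJ assembly; the sketch assumes the usual definition (the length of the contig at which the running total, taken over contigs sorted longest first, reaches half the assembly size).

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

In the toy example the total is 290, so the threshold is 145; the two longest contigs (80 and 70) already sum to 150, which makes 70 the N50.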
Affiliation(s)
- Xiaofei Yang
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, Shaanxi, China
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Xixi Zhao
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, Shaanxi, China
- Shoufang Qu
  - National Institutes for Food and Drug Control (NIFDC), No.2, Tiantan Xili, Dongcheng District, Beijing 102629, China
- Peng Jia
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Bo Wang
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Shenghan Gao
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Tun Xu
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Wenxin Zhang
  - National Institutes for Food and Drug Control (NIFDC), No.2, Tiantan Xili, Dongcheng District, Beijing 102629, China
- Jie Huang
  - National Institutes for Food and Drug Control (NIFDC), No.2, Tiantan Xili, Dongcheng District, Beijing 102629, China
- Kai Ye
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, Shaanxi, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - Faculty of Science, Leiden University, Leiden, The Netherlands
4
Liu Z, Zhao G, Xiao Y, Zeng S, Yuan Y, Zhou X, Fang Z, He R, Li B, Zhao Y, Pan H, Wang Y, Yu G, Peng IF, Wang D, Meng Q, Xu Q, Sun Q, Yan X, Shen L, Jiang H, Xia K, Wang J, Guo J, Liang F, Li J, Tang B. Profiling the Genome-Wide Landscape of Short Tandem Repeats by Long-Read Sequencing. Front Genet 2022; 13:810595. [PMID: 35601492 PMCID: PMC9117641 DOI: 10.3389/fgene.2022.810595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Accepted: 03/30/2022] [Indexed: 11/23/2022]
Abstract
Background: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases and the regulation of gene expression. Long-read sequencing (LRS) offers a potential solution to genome-wide STR analysis. However, characterizing STRs in human genomes using LRS on a large population scale has not been reported. Methods: We conducted a large LRS-based STR analysis in 193 unrelated samples of the Chinese population and performed genome-wide profiling of STR variation in the human genome. The repeat dynamic index (RDI) was introduced to evaluate the variability of STRs. We sourced expression data from the Genotype-Tissue Expression (GTEx) project to explore the tissue specificity of genes related to highly variable STRs. Enrichment analyses were also conducted to identify potential functional roles of the highly variable STRs. Results: This study reports a large-scale analysis of human STR variation by LRS and offers a reference STR database based on the LRS dataset. We found that disease-associated STRs (dSTRs) and STRs associated with the expression of nearby genes (eSTRs) were highly variable in the general population. Moreover, tissue-specific expression analysis showed that genes related to highly variable STRs had their highest expression levels in brain tissues, and pathway enrichment analysis found that those STRs are involved in synaptic function-related pathways. Conclusion: Our study profiled the genome-wide landscape of STRs using LRS and highlighted the highly variable STRs in the human genome, which provides a valuable resource for studying the role of STRs in human disease and complex traits.
Affiliation(s)
- Zhenhua Liu
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Guihu Zhao
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Sheng Zeng
  - Department of Geriatrics, The Second Xiangya Hospital, Central South University, Changsha, China
- Yanchun Yuan
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Xun Zhou
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Zhenghuan Fang
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Runcheng He
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Bin Li
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Yuwen Zhao
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Hongxu Pan
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Yige Wang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Qingtuan Meng
  - Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital of University of South China, Hengyang, China
- Qian Xu
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Qiying Sun
  - Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
- Xinxiang Yan
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Lu Shen
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Hong Jiang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
- Kun Xia
  - Centre for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Junling Wang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Jifeng Guo
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Fan Liang
  - GrandOmics Biosciences, Beijing, China
  - *Correspondence: Beisha Tang; Jinchen Li; Fan Liang
- Jinchen Li
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
  - Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, China
  - Centre for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
  - *Correspondence: Beisha Tang; Jinchen Li; Fan Liang
- Beisha Tang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
  - Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital of University of South China, Hengyang, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, China
  - *Correspondence: Beisha Tang; Jinchen Li; Fan Liang
5
Xu M, Guo L, Du X, Li L, Peters BA, Deng L, Wang O, Chen F, Wang J, Jiang Z, Han J, Ni M, Yang H, Xu X, Liu X, Huang J, Fan G. Accurate Haplotype-Resolved Assembly Reveals The Origin Of Structural Variants For Human Trios. Bioinformatics 2021; 37:2095-2102. [PMID: 33538292 PMCID: PMC8613828 DOI: 10.1093/bioinformatics/btab068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/27/2020] [Revised: 12/07/2020] [Accepted: 01/28/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Achieving a near-complete understanding of how the genome of an individual affects the phenotypes of that individual requires deciphering the order of variations along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains challenging. RESULTS: To address this, we have developed Haplotype-resolved Assembly for Synthetic long reads using a Trio-binning strategy, or HAST, which uses parental information to classify reads as maternal or paternal. Once sorted, these reads are used to independently de novo assemble the parent-specific haplotypes. We applied HAST to co-barcoded second-generation sequencing data from an Asian individual, resulting in a haplotype assembly covering 94.7% of the reference genome with a scaffold N50 longer than 11 Mb. The high haplotyping precision (∼99.7%) and recall (∼95.9%) represent a substantial improvement over the commonly used tool for assembling co-barcoded reads (Supernova), and are comparable to those of a trio-binning-based third-generation long-read assembly method (TrioCanu) but with a significantly higher single-base accuracy (up to 99.99997% (Q65)). This makes HAST a superior tool for accurate haplotyping and future haplotype-based studies. AVAILABILITY: The code of the analysis is available at https://github.com/BGI-Qingdao/HAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
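The pairing of 99.99997% single-base accuracy with Q65 in this abstract follows the standard Phred scale, Q = -10 log10(p), where p is the per-base error probability. The sketch below is an illustration of that conversion only; the function name `phred_quality` is an assumption and nothing here comes from the HAST code base.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


# 100% - 99.99997% accuracy leaves 0.00003% errors, i.e. 3 errors per 10 million bases.
error_probability = 3e-7
quality = phred_quality(error_probability)
print(f"Q{quality:.1f}")
```

An error rate of 3e-7 gives a value a little above 65, consistent with the Q65 reported in the abstract.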
Affiliation(s)
- Mengyang Xu
  - BGI-QingDao, Qingdao, 266555, China
  - BGI-Shenzhen, Shenzhen, 518083, China
- Lidong Guo
  - BGI-QingDao, Qingdao, 266555, China
  - BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
- Xiao Du
  - BGI-QingDao, Qingdao, 266555, China
  - BGI-Shenzhen, Shenzhen, 518083, China
- Lei Li
  - BGI-QingDao, Qingdao, 266555, China
  - School of Future Technology, University of Chinese Academy of Sciences, Beijing, 101408, China
- Brock A Peters
  - BGI-Shenzhen, Shenzhen, 518083, China
  - Complete Genomics Inc, 2904 Orchard Pkwy, San Jose, California, 95134, USA
- Li Deng
  - BGI-QingDao, Qingdao, 266555, China
- Ou Wang
  - BGI-Shenzhen, Shenzhen, 518083, China
- Fang Chen
  - MGI, BGI-Shenzhen, Shenzhen, 518083, China
- Jun Wang
  - BGI-QingDao, Qingdao, 266555, China
- Ming Ni
  - BGI-QingDao, Qingdao, 266555, China
  - BGI-Shenzhen, Shenzhen, 518083, China
- Xun Xu
  - BGI-Shenzhen, Shenzhen, 518083, China
- Xin Liu
  - BGI-QingDao, Qingdao, 266555, China
  - BGI-Shenzhen, Shenzhen, 518083, China
  - State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Jie Huang
  - National Institutes for Food and Drug Control (NIFDC), No.2 Tiantan Xili, Dongcheng District, Beijing, 10050, China
- Guangyi Fan
  - BGI-QingDao, Qingdao, 266555, China
  - BGI-Shenzhen, Shenzhen, 518083, China
  - State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China