1
|
Yang X, Zheng G, Jia P, Wang S, Ye K. Pindel-TD: A Tandem Duplication Detector Based on A Pattern Growth Approach. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae008. [PMID: 38862430 DOI: 10.1093/gpbjnl/qzae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Revised: 10/24/2023] [Accepted: 11/03/2023] [Indexed: 06/13/2024]
Abstract
Tandem duplication (TD) is a major type of structural variations (SVs) that plays an important role in novel gene formation and human diseases. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods due to the lack of specialized operation on TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs with a wide size range at single nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.
Collapse
Affiliation(s)
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Gaoyang Zheng
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
| | - Peng Jia
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Songbo Wang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Kai Ye
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Faculty of Science, Leiden University, Leiden 2311 EZ, Netherland
| |
Collapse
|
2
|
Denti L, Khorsand P, Bonizzoni P, Hormozdiari F, Chikhi R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat Methods 2023; 20:550-558. [PMID: 36550274 DOI: 10.1038/s41592-022-01674-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022]
Abstract
Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and by the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS)-a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.
Collapse
Affiliation(s)
- Luca Denti
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
| | | | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
| | - Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA, USA.
- UC Davis MIND Institute, Sacramento, CA, USA.
- Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA, USA.
| | - Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.
| |
Collapse
|
3
|
Yang J, Chaisson MJP. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol 2022; 23:110. [PMID: 35524317 PMCID: PMC9077962 DOI: 10.1186/s13059-022-02666-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Accepted: 03/30/2022] [Indexed: 01/30/2023] Open
Abstract
Variant benchmarking is often performed by comparing a test callset to a gold standard set of variants. In repetitive regions of the genome, it may be difficult to establish what is the truth for a call, for example, when different alignment scoring metrics provide equally supported but different variant calls on the same data. Here, we provide an alternative approach, TT-Mars, that takes advantage of the recent production of high-quality haplotype-resolved genome assemblies by providing false discovery rates for variant calls based on how well their call reflects the content of the assembly, rather than comparing calls themselves.
Collapse
Affiliation(s)
- Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
| |
Collapse
|
4
|
Quan C, Lu H, Lu Y, Zhou G. Population-scale genotyping of structural variation in the era of long-read sequencing. Comput Struct Biotechnol J 2022; 20:2639-2647. [PMID: 35685364 PMCID: PMC9163579 DOI: 10.1016/j.csbj.2022.05.047] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 05/24/2022] [Accepted: 05/24/2022] [Indexed: 11/29/2022] Open
Abstract
Population-scale studies of structural variation (SV) are growing rapidly worldwide with the development of long-read sequencing technology, yielding a considerable number of novel SVs and complete gap-closed genome assemblies. Herein, we highlight recent studies using a hybrid sequencing strategy and present the challenges toward large-scale genotyping for SVs due to the reference bias. Genotyping SVs at a population scale remains challenging, which severely impacts genotype-based population genetic studies or genome-wide association studies of complex diseases. We summarize academic efforts to improve genotype quality through linear or graph representations of reference and alternative alleles. Graph-based genotypers capable of integrating diverse genetic information are effectively applied to large and diverse cohorts, contributing to unbiased downstream analysis. Meanwhile, there is still an urgent need in this field for efficient tools to construct complex graphs and perform sequence-to-graph alignments.
Collapse
Affiliation(s)
- Cheng Quan
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
| | - Hao Lu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
| | - Yiming Lu
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Hebei University, Baoding, Hebei Province 071002, PR China
- Corresponding authors at: Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, PR China (G. Zhou). Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850, PR China (Y. Lu).
| | - Gangqiao Zhou
- Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Collaborative Innovation Center for Personalized Cancer Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province 211166, PR China
- Medical College of Guizhou University, Guiyang, Guizhou Province 550025, PR China
- Hebei University, Baoding, Hebei Province 071002, PR China
- Corresponding authors at: Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, PR China (G. Zhou). Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850, PR China (Y. Lu).
| |
Collapse
|