1
Masood D, Ren L, Nguyen C, Brundu FG, Zheng L, Zhao Y, Jaeger E, Li Y, Cha SW, Halpern A, Truong S, Virata M, Yan C, Chen Q, Pang A, Alberto R, Xiao C, Yang Z, Chen W, Wang C, Cross F, Catreux S, Shi L, Beaver JA, Xiao W, Meerzaman DM. Evaluation of somatic copy number variation detection by NGS technologies and bioinformatics tools on a hyper-diploid cancer genome. Genome Biol 2024; 25:163. [PMID: 38902799 PMCID: PMC11188507 DOI: 10.1186/s13059-024-03294-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 05/29/2024] [Indexed: 06/22/2024] [Open Access]
Abstract
BACKGROUND: Copy number variation (CNV) is a key genetic characteristic for cancer diagnostics and can be used as a biomarker for the selection of therapeutic treatments. Using data sets established in our previous study, we benchmark the performance of cancer CNV calling by six of the most recent and commonly used software tools in terms of detection accuracy, sensitivity, and reproducibility. By comparison with orthogonal methods, such as microarray and Bionano, we also explore the consistency of CNV calling across different technologies on a challenging genome. RESULTS: While consistent results are observed for copy gain, loss, and loss of heterozygosity (LOH) calls across sequencing centers, CNV callers, and different technologies, variation in CNV calls is mostly driven by the determination of genome ploidy. Using consensus results from six CNV callers and confirmation from three orthogonal methods, we establish a high-confidence CNV call set for the reference cancer cell line (HCC1395). CONCLUSIONS: NGS technologies and current bioinformatics tools can offer reliable results for detection of copy gain, loss, and LOH. However, when working with a hyper-diploid genome, some software tools can call excessive copy gain or loss due to inaccurate assessment of genome ploidy. With performance matrices on various experimental conditions, this study raises awareness within the cancer research community for the selection of sequencing platforms, sample preparation, sequencing coverage, and the choice of CNV detection tools.
Affiliation(s)
- Daniall Masood
- Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
- Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Cu Nguyen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Lily Zheng
- Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yong Li
- Illumina Inc., San Diego, CA, USA
- Chunhua Yan
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Qingrong Chen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Andy Pang
- Bionano Genomics, San Diego, CA, 20892, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- Zhaowei Yang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Frank Cross
- Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Julia A Beaver
- Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA
- Oncology Center of Excellence, Food and Drug Administration, Silver Spring, MD, USA
- Wenming Xiao
- Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA.
- Daoud M Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA.
2
Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. Bioinformatics 2024; 40:btae340. [PMID: 38796686 PMCID: PMC11153836 DOI: 10.1093/bioinformatics/btae340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 05/04/2024] [Accepted: 05/24/2024] [Indexed: 05/28/2024] [Open Access]
Abstract
SUMMARY: The increasing development of sequence-based machine learning models has raised the demand for manipulating sequences for this application. However, existing approaches to edit and evaluate genome sequences using models have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences. AVAILABILITY AND IMPLEMENTATION: SuPreMo was written in Python, and can be run using only one line of code to generate both sequences and 3D genome disruption scores. The codebase, instructions for installation and use, and tutorials are on the GitHub page: https://github.com/ketringjoni/SuPreMo.
Affiliation(s)
- Ketrin Gjoni
- Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
- Katherine S Pollard
- Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
- Chan Zuckerberg Biohub, San Francisco, CA 94158, United States
3
Zhao Q, Yang S, Hao S, Chen Z, Tang L, Wu Z, Wu J, Xu M, Ma Z, Zhou L, Xu J, Qin Q. Identification of transcriptionally-active human papillomavirus integrants through nanopore sequencing reveals viable targets for gene therapy against cervical cancer. J Med Virol 2024; 96:e29769. [PMID: 38932482 DOI: 10.1002/jmv.29769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 06/13/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024]
Abstract
Integration of the human papillomavirus (HPV) genome into the cellular genome is a key event that leads to constitutive expression of the viral oncoproteins E6/E7 and drives the progression of cervical cancer. However, HPV integration patterns differ on a case-by-case basis among related malignancies. Next-generation sequencing technologies still face challenges in interrogating HPV integration sites. In this study, utilizing Nanopore long-read sequencing, we identified 452 and 108 potential integration sites from the cervical cancer cell lines (CaSki and HeLa) and five tissue samples, respectively. Based on long Nanopore chimeric reads, we were able to analyze the methylation status of the HPV long control region (LCR), which controls oncogene E6/E7 expression, and to identify transcriptionally-active integrants among the numerous integrants. As a proof of concept, we identified an active HPV integrant between RUNX2 and CLIC5 on chromosome 6 in the CaSki cell line, which was supported by ATAC-seq, H3K27Ac ChIP-seq, and RNA-seq analysis. Knockout of the active HPV integrant by the CRISPR/Cas9 system dramatically crippled cell proliferation and induced cell senescence. In conclusion, identifying transcriptionally-active HPV integrants with Nanopore sequencing can provide viable targets for gene therapy against HPV-associated cancers.
Affiliation(s)
- Qianqian Zhao
- Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Computational Systems Biology Laboratory, Department of Bioinformatics, Shantou University Medical College, Shantou, China
- Shuaibing Yang
- Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Shijia Hao
- Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Zejia Chen
- Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Lihua Tang
- Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Zhaoting Wu
- Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Jiaxin Wu
- Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Mingqian Xu
- Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- Zebiao Ma
- Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Li Zhou
- Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Jianzhen Xu
- Computational Systems Biology Laboratory, Department of Bioinformatics, Shantou University Medical College, Shantou, China
- Qingsong Qin
- Department of Gynecologic Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
- Laboratory of Human Virology and Oncology, Shantou University Medical College, Shantou, China
- International Science and Technology Collaboration Center for Emerging Infectious Diseases, Shantou University Medical College, Shantou, China
4
Ermini L, Driguez P. The Application of Long-Read Sequencing to Cancer. Cancers (Basel) 2024; 16:1275. [PMID: 38610953 PMCID: PMC11011098 DOI: 10.3390/cancers16071275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] [Open Access]
Abstract
Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.
Affiliation(s)
- Luca Ermini
- NORLUX Neuro-Oncology Laboratory, Department of Cancer Research, Luxembourg Institute of Health, L-1210 Luxembourg, Luxembourg
- Patrick Driguez
- Bioscience Core Lab, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
5
Wang S, Lin J, Jia P, Xu T, Li X, Liu Y, Xu D, Bush SJ, Meng D, Ye K. De novo and somatic structural variant discovery with SVision-pro. Nat Biotechnol 2024:10.1038/s41587-024-02190-7. [PMID: 38519720 DOI: 10.1038/s41587-024-02190-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SVs comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches; in particular, it improves the resolution of complex SVs, with low Mendelian error rates, high sensitivity for low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
Affiliation(s)
- Songbo Wang
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Peng Jia
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Xiujuan Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yuezhuangnan Liu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Dan Xu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Deyu Meng
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau
- Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
- Kai Ye
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
- Faculty of Science, Leiden University, Leiden, The Netherlands.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
6
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024] [Open Access]
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
- Can Luo
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
- Staunton G Golding
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
- Jacob B Ioffe
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
- Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
7
Wang Y, Chen Y, Gao J, Xie H, Guo Y, Yang J, Liu J, Chen Z, Li Q, Li M, Ren J, Wen L, Tang F. Mapping crossover events of mouse meiotic recombination by restriction fragment ligation-based Refresh-seq. Cell Discov 2024; 10:26. [PMID: 38443370 PMCID: PMC10915157 DOI: 10.1038/s41421-023-00638-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 12/11/2023] [Indexed: 03/07/2024] [Open Access]
Abstract
Single-cell whole-genome sequencing methods have undergone great improvements over the past decade. However, allele dropout, the failure to detect both alleles simultaneously in an individual diploid cell, largely restricts the use of these methods, particularly for medical applications. Here, we develop a new single-cell whole-genome sequencing method based on a third-generation sequencing (TGS) platform, named Refresh-seq (restriction fragment ligation-based genome amplification and TGS). It is based on a restriction endonuclease cutting and ligation strategy in which the two alleles in an individual cell can be cut into equal fragments and tend to be amplified simultaneously. As a new single-cell long-read genome sequencing method, Refresh-seq features a much lower allele dropout rate than SMOOTH-seq. Furthermore, we apply Refresh-seq to 688 sperm cells and 272 female haploid cells (secondary polar bodies and parthenogenetic oocytes) from F1 hybrid mice. We acquire a high-resolution genetic map of mouse meiotic recombination at low sequencing depth and reveal the sexual dimorphism in meiotic crossovers. We also phase the structural variations (deletions and insertions) in sperm cells and female haploid cells with high precision. Refresh-seq shows great performance in screening aneuploid sperm cells and oocytes due to the low allele dropout rate and has great potential for medical applications such as preimplantation genetic diagnosis.
Affiliation(s)
- Yan Wang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Yijun Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Junpeng Gao
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Emergency Center, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Haoling Xie
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Yuqing Guo
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Jingwei Yang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Jun'e Liu
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Zonggui Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
- Qingqing Li
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Mengyao Li
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Jie Ren
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Lu Wen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Fuchou Tang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing, China.
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China.
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
- Changping Laboratory, Beijing, China.
8
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
Background: The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results: For this study, we leveraged publicly-available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state-of-the-art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. The tools disagreed widely on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385. Conclusions: These results suggest a combined approach is needed for LRS alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
Affiliation(s)
- Jonathan LoTempio
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
- Emmanuele Delot
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
- Eric Vilain
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
9
Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. bioRxiv 2023:2023.11.03.565556. [PMID: 37961123 PMCID: PMC10635135 DOI: 10.1101/2023.11.03.565556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Computationally editing genome sequences is a common bioinformatics task, but current approaches have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing in silico mutagenesis. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences.
Affiliation(s)
- Ketrin Gjoni
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, USA
- Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
10
Geoffroy V, Lamouche JB, Guignard T, Nicaise S, Kress A, Scheidecker S, Le Béchec A, Muller J. The AnnotSV webserver in 2023: updated visualization and ranking. Nucleic Acids Res 2023:7175348. [PMID: 37216590 DOI: 10.1093/nar/gkad426] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/26/2023] [Revised: 04/20/2023] [Accepted: 05/09/2023] [Indexed: 05/24/2023] Open
Abstract
Much of the human genetics variant repertoire is composed of single nucleotide variants (SNV) and small insertions/deletions (indels), but structural variants (SV) remain a major part of our modified DNA. SV detection has often been a complex question to answer, either because of the need to use different technologies (array CGH, SNP array, Karyotype, Optical Genome Mapping…) to detect each category of SV or because of the need for appropriate resolution (Whole Genome Sequencing). Thanks to the deluge of pangenomic analysis, human geneticists are accumulating SV, and their interpretation remains time-consuming and challenging. The AnnotSV webserver (https://www.lbgi.fr/AnnotSV/) aims to be an efficient tool to (i) annotate and interpret SV potential pathogenicity in the context of human diseases, (ii) recognize potential false positive variants among all the SV identified and (iii) visualize the patient variant repertoire. The most recent developments in the AnnotSV webserver are: (i) updated annotation sources and ranking, (ii) three novel output formats to allow diverse utilization (analysis, pipelines), as well as (iii) two novel user interfaces including an interactive circos view.
Affiliation(s)
- Véronique Geoffroy
- Université de Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Jean-Baptiste Lamouche
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Samuel Nicaise
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Arnaud Kress
- Complex Systems and Translational Bioinformatics, ICube, UMR 7357, University of Strasbourg, CNRS, FMTS, Strasbourg, France
- Sophie Scheidecker
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Laboratoires de Diagnostic Génétique, IGMA, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Antony Le Béchec
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Jean Muller
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Laboratoires de Diagnostic Génétique, IGMA, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
11
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a higher impact on gene expression variation than single nucleotide variants (SNV) in different plant species. Additionally, SV were associated with phenotypic variation in several crops. However, compared to the established SV detection based on short-read sequencing, fewer approaches have been described for linked-read based SV calling. We therefore evaluated the performance of six linked-read SV callers compared to an established short-read SV caller based on simulated linked-reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing to short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), as well as sequencing coverage on the SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared to short-read sequencing. We observed high break point resolutions (BPR) when detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of short-read signals provided by Manta and LinkedSV to detect short SV. Manta and NAIBR performed well for detecting larger deletions, inversions, and duplications. Detection of large SV was only weakly influenced by the HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results suggest using the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X based on at least 90x linked-read sequencing coverage to ensure the detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany; Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany.