1
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. Bioinformatics Advances 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
Summary: Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is an urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. Availability and implementation: All analysis done in the paper was based on publicly available data from the publication of each accessed tool.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yi Yang
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Robert B Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
2
Lai J, Liu Y, Scharpf RB, Karchin R. Evaluation of simulation methods for tumor subclonal reconstruction. arXiv 2024:arXiv:2402.09599v1. [PMID: 38410652 PMCID: PMC10896360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Most neoplastic tumors originate from a single cell, and their evolution can be genetically traced through lineages characterized by common alterations such as small somatic mutations (SSMs), copy number alterations (CNAs), structural variants (SVs), and aneuploidies. Due to the complexity of these alterations in most tumors and the errors introduced by sequencing protocols and calling algorithms, tumor subclonal reconstruction algorithms are necessary to recapitulate the DNA sequence composition and tumor evolution in silico. With a growing number of these algorithms available, there is a pressing need for consistent and comprehensive benchmarking, which relies on realistic tumor sequencing generated by simulation tools. Here, we examine the current simulation methods, identifying their strengths and weaknesses, and provide recommendations for their improvement. Our review also explores potential new directions for research in this area. This work aims to serve as a resource for understanding and enhancing tumor genomic simulations, contributing to the advancement of the field.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Robert B. Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
3
Wang X, Xu Y, Liu R, Lai X, Liu Y, Wang S, Zhang X, Wang J. PEcnv: accurate and efficient detection of copy number variations of various lengths. Brief Bioinform 2022; 23:6686740. [PMID: 36056740 PMCID: PMC9487654 DOI: 10.1093/bib/bbac375] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/23/2022] [Revised: 06/19/2022] [Accepted: 08/08/2022] [Indexed: 11/14/2022]
Abstract
Copy number variation (CNV) is a class of key biomarkers in many complex traits and diseases. Detecting CNV from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many proposed CNV detection approaches exist, the core statistical model at their foundation is weakened by two critical computational issues: (i) identifying the optimal setting for the sliding window and (ii) correcting for bias and noise. We designed a statistical process model to overcome these limitations by calculating regional read depths via an exponentially weighted moving average strategy. A one-run detection of CNVs of various lengths is then achieved by a dynamic sliding window, whose size is self-adapted according to the weighted averages. We also designed a novel bias/noise reduction model, accompanied by the moving average, which can handle complicated patterns and extend training data. This model, called PEcnv, accurately detects CNVs ranging from kb-scale to chromosome-arm level. The model performance was validated with simulation samples and real samples. Comparative analysis showed that PEcnv outperforms current popular approaches. Notably, PEcnv provided considerable advantages in detecting small CNVs (1 kb–1 Mb) in panel sequencing data. Thus, PEcnv fills the gap left by existing methods focusing on large CNVs. PEcnv may have broad applications in clinical testing where panel sequencing is the dominant strategy. Availability and implementation: Source code is freely available at https://github.com/Sherwin-xjtu/PEcnv
Affiliation(s)
- Xuwen Wang
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Ying Xu
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Ruoyu Liu
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Xin Lai
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Yuqian Liu
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Shenjie Wang
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Xuanping Zhang
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Jiayin Wang
  - Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
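The PEcnv abstract in entry 3 describes computing regional read depths with an exponentially weighted moving average. The idea can be sketched as below. This is only an illustrative sketch, not the PEcnv implementation: the function names `ewma_depth` and `flag_bins`, the smoothing factor `alpha`, and the fixed `low`/`high` ratio thresholds are all assumptions made for this example, and the dynamic window and bias/noise model mentioned in the abstract are not reproduced.

```python
import statistics


def ewma_depth(depths, alpha=0.3):
    """Return the exponentially weighted moving average of per-bin read depths.

    alpha is a hypothetical smoothing factor: values near 1 track the raw
    signal closely, values near 0 smooth more aggressively.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    current = None
    for depth in depths:
        # The first bin seeds the average; later bins blend in with weight alpha.
        current = depth if current is None else alpha * depth + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def flag_bins(depths, alpha=0.3, low=0.75, high=1.25):
    """Label each bin 'loss', 'gain', or 'neutral'.

    The label is based on the ratio of the smoothed depth to the median raw
    depth; the thresholds are illustrative, not taken from PEcnv.
    """
    baseline = statistics.median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    labels = []
    for value in ewma_depth(depths, alpha):
        ratio = value / baseline
        if ratio < low:
            labels.append("loss")
        elif ratio > high:
            labels.append("gain")
        else:
            labels.append("neutral")
    return labels
```

For example, `flag_bins([100] * 5 + [200] * 5 + [100] * 5, alpha=0.5)` labels the leading bins as neutral and the bins inside the doubled region as gains, with a short lag at each boundary caused by the smoothing.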
4
Angaroni F, Guidi A, Ascolani G, d'Onofrio A, Antoniotti M, Graudenzi A. J-SPACE: a Julia package for the simulation of spatial models of cancer evolution and of sequencing experiments. BMC Bioinformatics 2022; 23:269. [PMID: 35804300 PMCID: PMC9270769 DOI: 10.1186/s12859-022-04779-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 06/09/2022] [Indexed: 11/15/2022]
Abstract
Background: The combined effects of biological variability and measurement-related errors on cancer sequencing data remain largely unexplored. However, the spatio-temporal simulation of multi-cellular systems provides a powerful instrument to address this issue. In particular, efficient algorithmic frameworks are needed to overcome the harsh trade-off between scalability and expressivity, so as to allow one to simulate both realistic cancer evolution scenarios and the related sequencing experiments, which can then be used to benchmark downstream bioinformatics methods. Result: We introduce a Julia package for SPAtial Cancer Evolution (J-SPACE), which allows one to model and simulate a broad set of experimental scenarios, phenomenological rules and sequencing settings. Specifically, J-SPACE simulates the spatial dynamics of cells as a continuous-time multi-type birth-death stochastic process on an arbitrary graph, employing different rules of interaction and an optimised Gillespie algorithm. The evolutionary dynamics of genomic alterations (single-nucleotide variants and indels) is simulated either under the Infinite Sites Assumption or several different substitution models, including one based on mutational signatures. After mimicking the spatial sampling of tumour cells, J-SPACE returns the related phylogenetic model, and allows one to generate synthetic reads from several Next-Generation Sequencing (NGS) platforms, via the ART read simulator. The results are finally returned in standard FASTA, FASTQ, SAM, ALN and Newick file formats. Conclusion: J-SPACE is designed to efficiently simulate the heterogeneous behaviour of a large number of cancer cells and produces a rich set of outputs. Our framework is useful to investigate the emergent spatial dynamics of cancer subpopulations, as well as to assess the impact of incomplete sampling and of experiment-specific errors. Importantly, the output of J-SPACE is designed to allow the performance assessment of downstream bioinformatics pipelines processing NGS data. J-SPACE is freely available at: https://github.com/BIMIB-DISCo/J-Space.jl.
Affiliation(s)
- Fabrizio Angaroni
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Alessandro Guidi
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Gianluca Ascolani
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Alberto d'Onofrio
  - Department of Mathematics and Geosciences, Univ. of Trieste, Trieste, Italy
- Marco Antoniotti
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre (B4), Milan, Italy
- Alex Graudenzi
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre (B4), Milan, Italy
  - Inst. of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Segrate, Italy
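The J-SPACE abstract in entry 4 describes cell dynamics as a continuous-time birth-death process simulated with a Gillespie algorithm. A heavily simplified sketch of that idea follows. It assumes a single cell type in a well-mixed population with constant per-cell rates, so it omits the graph structure, multiple types, interaction rules, and optimisations that the abstract attributes to J-SPACE; the function name `gillespie_birth_death` and its parameters are hypothetical and chosen only for this illustration.

```python
import random


def gillespie_birth_death(n0, birth, death, t_max, seed=None):
    """Simulate a single-type birth-death process with the Gillespie algorithm.

    n0     initial number of cells
    birth  per-cell division rate (assumed constant)
    death  per-cell death rate (assumed constant)
    t_max  time at which the simulation stops
    Returns a list of (time, population size) pairs, one per event.
    """
    if birth < 0 or death < 0 or birth + death <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while n > 0:
        # The waiting time to the next event is exponential with the total rate.
        total_rate = (birth + death) * n
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # Choose the event type in proportion to its share of the total rate.
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory
```

Calling `gillespie_birth_death(10, 1.0, 0.5, 2.0, seed=1)` returns one stochastic trajectory starting from ten cells; repeating the call with different seeds gives an ensemble whose spread reflects the demographic noise of the process.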
5
Lei Y, Meng Y, Guo X, Ning K, Bian Y, Li L, Hu Z, Anashkina AA, Jiang Q, Dong Y, Zhu X. Overview of structural variation calling: Simulation, identification, and visualization. Comput Biol Med 2022; 145:105534. [DOI: 10.1016/j.compbiomed.2022.105534] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/20/2022] [Revised: 04/09/2022] [Accepted: 04/14/2022] [Indexed: 12/11/2022]
6
Identification of Copy Number Alterations from Next-Generation Sequencing Data. Advances in Experimental Medicine and Biology 2022; 1361:55-74. [DOI: 10.1007/978-3-030-91836-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
7
Dierckxsens N, Li T, Vermeesch JR, Xie Z. A benchmark of structural variation detection by long reads through a realistic simulated model. Genome Biol 2021; 22:342. [PMID: 34911553 PMCID: PMC8672642 DOI: 10.1186/s13059-021-02551-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/21/2020] [Accepted: 11/22/2021] [Indexed: 12/30/2022]
Abstract
Accurate simulations of structural variation distributions and sequencing data are crucial for the development and benchmarking of new tools. We develop Sim-it, a straightforward tool for the simulation of both structural variation and long-read data. These simulations from Sim-it reveal the strengths and weaknesses of currently available structural variation callers and long-read sequencing platforms. With these findings, we develop a new method (combiSV) that can combine the results from structural variation callers into a superior call set with increased recall and precision, which is also observed for the latest structural variation benchmark set developed by the GIAB Consortium.
Affiliation(s)
- Nicolas Dierckxsens
  - Center for Human Genetics, University Hospital Leuven and KU Leuven, Leuven, Belgium
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Tong Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Joris R Vermeesch
  - Center for Human Genetics, University Hospital Leuven and KU Leuven, Leuven, Belgium
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
8
Yuan X, Xu X, Zhao H, Duan J. ERINS: Novel Sequence Insertion Detection by Constructing an Extended Reference. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1893-1901. [PMID: 31751246 DOI: 10.1109/tcbb.2019.2954315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Next generation sequencing technology has led to the development of methods for the detection of novel sequence insertions (nsINS). Multiple signatures from short reads are usually extracted to improve nsINS detection performance. However, characterization of nsINSs larger than the mean insert size is still challenging. This article presents a new method, ERINS, to detect nsINS contents and genotypes across the full size spectrum. It integrates the features of structural variations and mapping states of split reads to find nsINS breakpoints, and then adopts a left-most mapping strategy to infer nsINS content by iteratively extending the standard reference at each breakpoint. Finally, it realigns all reads to the extended reference and infers nsINS genotypes through statistical testing on read counts. We test and validate the performance of ERINS on simulated and real sequencing datasets. The simulation results demonstrate that it outperforms several peer methods with respect to sensitivity and precision. The real data application indicates that ERINS obtains results highly consistent with those previously reported and detects nsINSs over 200 base pairs that many other methods fail to detect. In conclusion, ERINS can be used as a supplement to existing tools and will become a routine approach for characterizing nsINSs.
9
Bolognini D, Sanders A, Korbel JO, Magi A, Benes V, Rausch T. VISOR: a versatile haplotype-aware structural variant simulator for short- and long-read sequencing. Bioinformatics 2020; 36:1267-1269. [PMID: 31589307 DOI: 10.1093/bioinformatics/btz719] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/23/2019] [Revised: 07/29/2019] [Accepted: 10/01/2019] [Indexed: 12/19/2022]
Abstract
SUMMARY: VISOR is a tool for haplotype-specific simulations of simple and complex structural variants (SVs). The method is applicable to haploid, diploid or higher ploidy simulations for bulk or single-cell sequencing data. SVs are implanted into FASTA haplotypes at single-basepair resolution, optionally with nearby single-nucleotide variants. Short or long reads are drawn at random from these haplotypes using standard error profiles. Double- or single-stranded data can be simulated and VISOR supports the generation of haplotype-tagged BAM files. The tool further includes methods to interactively visualize simulated variants in single-stranded data. The versatility of VISOR is unmet by comparable tools and it lays the foundation to simulate haplotype-resolved cancer heterogeneity data in bulk or at single-cell resolution. AVAILABILITY AND IMPLEMENTATION: VISOR is implemented in Python 3.6, open-source and freely available at https://github.com/davidebolo1993/VISOR. Documentation is available at https://davidebolo1993.github.io/visordoc/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Davide Bolognini
  - Department of Experimental and Clinical Medicine, University of Florence, Florence 50134, Italy
  - European Molecular Biology Laboratory (EMBL), GeneCore, Heidelberg 69917, Germany
- Ashley Sanders
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg 69917, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg 69917, Germany
- Alberto Magi
  - Department of Information Engineering, University of Florence, Florence 50134, Italy
- Vladimir Benes
  - European Molecular Biology Laboratory (EMBL), GeneCore, Heidelberg 69917, Germany
- Tobias Rausch
  - European Molecular Biology Laboratory (EMBL), GeneCore, Heidelberg 69917, Germany
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg 69917, Germany
10
Yu Z, Du F, Ban R, Zhang Y. SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles. BMC Bioinformatics 2020; 21:331. [PMID: 32703148 PMCID: PMC7379788 DOI: 10.1186/s12859-020-03665-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/08/2018] [Accepted: 07/16/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: A number of simulators have been developed for emulating next-generation sequencing data by incorporating known errors such as base substitutions and indels. However, their practicality may be degraded by functional and runtime limitations. In particular, positional and genomic contextual information is not effectively utilized for reliably characterizing base substitution patterns, and the positional and contextual differences in Phred quality scores have not been fully investigated. Thus, a more effective and efficient bioinformatics tool is sorely needed. RESULTS: Here, we introduce a novel tool, SimuSCoP, to reliably emulate complex DNA sequencing data. The base substitution patterns and the statistical behavior of quality scores in Illumina sequencing data are fully explored and integrated into the simulation model for reliably emulating datasets for different applications. In addition, an integrated and easy-to-use pipeline is employed in SimuSCoP to facilitate end-to-end simulation of complex samples, and high runtime efficiency is achieved by implementing the tool to run in multithreading with low memory consumption. These features enable SimuSCoP to achieve substantial improvements in reliability, functionality, practicality and runtime efficiency. The tool is comprehensively evaluated in multiple aspects including consistency of profiles, simulation of genomic variations and complex tumor samples, and the results demonstrate the advantages of SimuSCoP over existing tools. CONCLUSIONS: SimuSCoP, a new bioinformatics tool, is developed to learn informative profiles from real sequencing data and reliably mimic complex data by introducing various genomic variations. We believe that the presented work will catalyse new development of downstream bioinformatics methods for analyzing sequencing data.
Affiliation(s)
- Zhenhua Yu
  - School of Information Engineering, Ningxia University, Yinchuan, 750021, China
- Fang Du
  - School of Information Engineering, Ningxia University, Yinchuan, 750021, China
- Rongjun Ban
  - Hefei National Laboratory for Physical Sciences at Microscale, USTC-SJH Joint Center for Human Reproduction and Genetics, School of Life Sciences, University of Science and Technology of China, Hefei, 230027, China
- Yuanwei Zhang
  - Hefei National Laboratory for Physical Sciences at Microscale, USTC-SJH Joint Center for Human Reproduction and Genetics, School of Life Sciences, University of Science and Technology of China, Hefei, 230027, China
11
Wilson-Sánchez D, Lup SD, Sarmiento-Mañús R, Ponce MR, Micol JL. Next-generation forward genetic screens: using simulated data to improve the design of mapping-by-sequencing experiments in Arabidopsis. Nucleic Acids Res 2020; 47:e140. [PMID: 31544937 PMCID: PMC6868388 DOI: 10.1093/nar/gkz806] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/30/2019] [Revised: 09/07/2019] [Accepted: 09/10/2019] [Indexed: 12/25/2022]
Abstract
Forward genetic screens have successfully identified many genes and continue to be powerful tools for dissecting biological processes in Arabidopsis and other model species. Next-generation sequencing technologies have revolutionized the time-consuming process of identifying the mutations that cause a phenotype of interest. However, due to the cost of such mapping-by-sequencing experiments, special attention should be paid to experimental design and technical decisions so that the read data allow the desired mutation to be mapped. Here, we simulated different mapping-by-sequencing scenarios. We first evaluated which short-read technology was best suited for analyzing gene-rich genomic regions in Arabidopsis and determined the minimum sequencing depth required to confidently call single nucleotide variants. We also designed ways to discriminate mutagenesis-induced mutations from background single nucleotide polymorphisms in mutants isolated in Arabidopsis non-reference lines. In addition, we simulated bulked segregant mapping populations for identifying point mutations and monitored how the size of the mapping population and the sequencing depth affect mapping precision. Finally, we provide the computational basis of a protocol that we already used to map T-DNA insertions with paired-end Illumina-like reads, using very low sequencing depths and pooling several mutants together; this approach can also be used with single-end reads and to map any other insertional mutagen. All these simulations proved useful for designing experiments that allowed us to map several mutations in Arabidopsis.
Affiliation(s)
- David Wilson-Sánchez
  - Instituto de Bioingeniería, Universidad Miguel Hernández, Campus de Elche, 03202 Elche, Spain
- Samuel Daniel Lup
  - Instituto de Bioingeniería, Universidad Miguel Hernández, Campus de Elche, 03202 Elche, Spain
- Raquel Sarmiento-Mañús
  - Instituto de Bioingeniería, Universidad Miguel Hernández, Campus de Elche, 03202 Elche, Spain
- María Rosa Ponce
  - Instituto de Bioingeniería, Universidad Miguel Hernández, Campus de Elche, 03202 Elche, Spain
- José Luis Micol
  - Instituto de Bioingeniería, Universidad Miguel Hernández, Campus de Elche, 03202 Elche, Spain
12
Yuan X, Gao M, Bai J, Duan J. SVSR: A Program to Simulate Structural Variations and Generate Sequencing Reads for Multiple Platforms. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1082-1091. [PMID: 30334804 DOI: 10.1109/tcbb.2018.2876527] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
Structural variation accounts for a major fraction of mutations in the human genome and confers susceptibility to complex diseases. Next generation sequencing, along with the rapid development of computational methods, provides a cost-effective procedure to detect such variations. Simulation of structural variations and sequencing reads with real characteristics is essential for benchmarking the computational methods. Here, we develop a new program, SVSR, to simulate five types of structural variations (indels, tandem duplications, CNVs, inversions, and translocations) and SNPs for the human genome and to generate sequencing reads with features from popular platforms (Illumina, SOLiD, 454, and Ion Torrent). We adopt a selection model trained from real data to predict copy number states, starting from the first site of a particular genome to the end. Furthermore, we utilize references of microbial genomes to produce insertion fragments and design probabilistic models to imitate inversions and translocations. Moreover, we create platform-specific errors and base quality profiles to generate normal, tumor, or normal-tumor mixture reads. Experimental results show that SVSR captures more realistic features and generates datasets with satisfactory quality scores. SVSR can be used to evaluate the performance of structural variation detection methods and to guide the development of new computational methods.
13
Li N, Yang J, Zhu W, Liang Y. MVSC: A Multi-variation Simulator of Cancer Genome. Comb Chem High Throughput Screen 2020; 23:326-333. [PMID: 32183666 DOI: 10.2174/1386207323666200317121136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/17/2019] [Revised: 11/29/2019] [Accepted: 02/27/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Many forms of variations exist in the genome, which are the main causes of individual phenotypic differences. The detection of variants, especially those located in the tumor genome, still faces many challenges due to the complexity of the genome structure. Thus, the performance assessment of variation detection tools using next-generation sequencing platforms is urgently needed. METHOD: We have created a software package called the Multi-Variation Simulator of Cancer genomes (MVSC) to simulate common genomic variants, including single nucleotide polymorphisms, small insertion and deletion polymorphisms, and structural variations (SVs), which are analogous to human somatically acquired variations. Three sets of variations embedded in genomic sequences in different periods were dynamically and sequentially simulated one by one. RESULTS: In cancer genome simulation, complex SVs are important because this type of variation is characteristic of the tumor genome structure. Overlapping variations of different sizes can also coexist in the same genome regions, adding to the complexity of cancer genome architecture. Our results show that MVSC can efficiently simulate a variety of genomic variants that cannot be simulated by existing software packages. CONCLUSION: The MVSC-simulated variants can be used to assess the performance of existing tools designed to detect SVs in next-generation sequencing data, and we also find that MVSC is memory and time-efficient compared with similar software packages.
Affiliation(s)
- Ning Li
  - School of Information and Electronic Engineering, Wuzhou University, Wuzhou, China
- Jialiang Yang
  - Department of Mathematics and Statistics, Hainan Normal University, Haikou, Hainan 571158, China
- Wen Zhu
  - Department of Mathematics and Statistics, Hainan Normal University, Haikou, Hainan 571158, China
  - College of Computer Science and Electronic Engineering, Hunan University, Hunan, China
- Ying Liang
  - College of Computer Science and Electronic Engineering, Hunan University, Hunan, China
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330000, China
14
Xing Y, Dabney AR, Li X, Wang G, Gill CA, Casola C. SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences From Reference Genomes. Front Genet 2020; 11:82. [PMID: 32153642 PMCID: PMC7046838 DOI: 10.3389/fgene.2020.00082] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/05/2019] [Accepted: 01/24/2020] [Indexed: 01/26/2023]
Abstract
Copy number variants are duplications and deletions of the genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested Simulator of Exome Copy Number Variants (SECNVs), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.
Affiliation(s)
- Yue Xing
- Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX, United States
- Department of Statistics, Texas A&M University, College Station, TX, United States
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States
| | - Alan R. Dabney
- Department of Statistics, Texas A&M University, College Station, TX, United States
| | - Xiao Li
- Department of Molecular and Cellular Medicine, Texas A&M University, College Station, TX, United States
| | - Guosong Wang
- Department of Animal Science, Texas A&M University, College Station, TX, United States
| | - Clare A. Gill
- Department of Animal Science, Texas A&M University, College Station, TX, United States
| | - Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University, College Station, TX, United States
|
15
|
Yang H, Lu B, Lai LH, Lim AH, Alvarez JJS, Zhai W. PSiTE: a Phylogeny guided Simulator for Tumor Evolution. Bioinformatics 2019; 35:3148-3150. [PMID: 30649258 DOI: 10.1093/bioinformatics/btz028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/15/2018] [Revised: 12/10/2018] [Accepted: 01/08/2019] [Indexed: 11/13/2022]
Abstract
SUMMARY Simulating realistic clonal dynamics of tumors is an important topic in cancer genomics. Here, we present Phylogeny guided Simulator for Tumor Evolution, a tool that can simulate different types of tumor samples including single sector, multi-sector bulk tumor as well as single-cell tumor data under a wide range of evolutionary trajectories. Phylogeny guided Simulator for Tumor Evolution provides an efficient tool for understanding clonal evolution of cancer. AVAILABILITY AND IMPLEMENTATION PSiTE is implemented in Python and is available at https://github.com/hchyang/PSiTE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hechuan Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, P.R. China; Human Genetics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | - Bingxin Lu
- Human Genetics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | - Lan Huong Lai
- Human Genetics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | - Abner Herbert Lim
- Human Genetics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | | | - Weiwei Zhai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, P.R. China; Human Genetics, Genome Institute of Singapore, A*STAR, Singapore, Singapore; National Cancer Centre Singapore, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, P.R. China
|
16
|
Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software. Nat Commun 2019; 10:3240. [PMID: 31324872 PMCID: PMC6642177 DOI: 10.1038/s41467-019-11146-4] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Received: 09/14/2018] [Accepted: 06/26/2019] [Indexed: 01/12/2023]
Abstract
In recent years, many software packages for identifying structural variants (SVs) using whole-genome sequencing data have been released. When published, a new method is commonly compared with those already available, but this tends to be selective and incomplete. The lack of comprehensive benchmarking of methods presents challenges for users in selecting methods and for developers in understanding algorithm behaviours and limitations. Here we report the comprehensive evaluation of 10 SV callers, selected following a rigorous process and spanning the breadth of detection approaches, using high-quality reference cell lines, as well as simulations. Due to the nature of available truth sets, our focus is on general-purpose rather than somatic callers. We characterise the impact on performance of event size and type, sequencing characteristics, and genomic context, and analyse the efficacy of ensemble calling and calibration of variant quality scores. Finally, we provide recommendations for both users and methods developers. A number of computational methods have been developed for calling structural variants (SVs) using short read sequencing data. Here, the authors perform a comprehensive benchmarking analysis comparing 10 general-purpose callers and provide recommendations for both users and methods developers.
|
17
|
Roca I, González-Castro L, Fernández H, Couce ML, Fernández-Marmiesse A. Free-access copy-number variant detection tools for targeted next-generation sequencing data. MUTATION RESEARCH-REVIEWS IN MUTATION RESEARCH 2019; 779:114-125. [DOI: 10.1016/j.mrrev.2019.02.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/13/2018] [Revised: 12/25/2018] [Accepted: 02/22/2019] [Indexed: 01/23/2023]
|
18
|
Xia LC, Ai D, Lee H, Andor N, Li C, Zhang NR, Ji HP. SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. Gigascience 2018; 7:5049476. [PMID: 29982625 PMCID: PMC6057526 DOI: 10.1093/gigascience/giy081] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/22/2018] [Revised: 05/22/2018] [Accepted: 06/26/2018] [Indexed: 11/29/2022]
Abstract
Background Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, there are only a few data simulators that provide structural variants in silico and even fewer that provide variants with different allelic fraction and haplotypes. Findings We developed SVEngine, an open-source tool to address this need. SVEngine simulates next-generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file, and/or a clonal phylogeny tree file (NEWICK). Subsequently, it simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs), and/or post-alignment files (BAMs). All of the files contain the desired variants, along with BED files containing the ground truth. SVEngine's flexible design process enables one to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions, and translocations. Finally, SVEngine simulates sequence data that replicate the characteristics of a sequencing library with mixed sizes of DNA insert molecules. To improve the compute speed, SVEngine is highly parallelized to reduce the simulation time. Conclusions We demonstrated the versatile features of SVEngine and its improved runtime comparisons with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogeneous cancer genome. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift, and neighboring hanging read pairs for representative variant types. Structural variant callers Lumpy and Manta and tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use.
Affiliation(s)
- Li Charlie Xia
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
- Department of Statistics, the Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 18014
| | - Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083 P. R. China
| | - Hojoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
| | - Noemi Andor
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
| | - Chao Li
- School of Mathematics and Physics, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083 P. R. China
| | - Nancy R Zhang
- Department of Statistics, the Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 18014
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
- Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304
|
19
|
Zhao M, Liu D, Qu H. Systematic review of next-generation sequencing simulators: computational tools, features and perspectives. Brief Funct Genomics 2018; 16:121-128. [PMID: 27069250 DOI: 10.1093/bfgp/elw012] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 01/03/2023]
Abstract
High-throughput next-generation sequencing (NGS) technologies have rapidly generated a large volume of genomic data. To aid the development and evaluation of new statistical models and computational methods, NGS-based simulators have been proposed to construct better experimental workflows. However, the comparative performance of these NGS simulators remains unclear. In this review, we conducted a comprehensive investigation of NGS simulators for various sequencing techniques, including DNA sequencing, metagenomic sequencing, RNA-seq, ChIP-seq and bisulfite sequencing for methylation.
|
20
|
Xia Y, Liu Y, Deng M, Xi R. Pysim-sv: a package for simulating structural variation data with GC-biases. BMC Bioinformatics 2017; 18:53. [PMID: 28361688 PMCID: PMC5374556 DOI: 10.1186/s12859-017-1464-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/01/2023]
Abstract
Background Structural variations (SVs) are wide-spread in human genomes and may have important implications in disease-related and evolutionary studies. High-throughput sequencing (HTS) has become a major platform for SV detection and simulation serves as a powerful and cost-effective approach for benchmarking SV detection algorithms. Accurate performance assessment by simulation requires a simulator capable of generating simulation data with all important features of real data, such as GC biases in HTS data and various complexities in tumor data. However, no available package has systematically addressed all issues in data simulation for SV benchmarking. Results Pysim-sv is a package for simulating HTS data to evaluate performance of SV detection algorithms. Pysim-sv can introduce a wide spectrum of germline and somatic genomic variations. The package contains functionalities to simulate tumor data with aneuploidy and heterogeneous subclones, which is very useful in assessing algorithm performance in tumor studies. Furthermore, Pysim-sv can introduce GC-bias, the most important and prevalent bias in HTS data, in the simulated HTS data. Conclusions Pysim-sv provides an unbiased toolkit for evaluating HTS-based SV detection algorithms.
Affiliation(s)
- Yuchao Xia
- School of Mathematics Science and Center for Statistical Science, Peking University, Yiheyuan Road 5, Beijing, 100871, China
| | - Yun Liu
- School of Mathematics Science and Center for Statistical Science, Peking University, Yiheyuan Road 5, Beijing, 100871, China
| | - Minghua Deng
- School of Mathematics Science and Center for Statistical Science, Peking University, Yiheyuan Road 5, Beijing, 100871, China.
| | - Ruibin Xi
- School of Mathematics Science and Center for Statistical Science, Peking University, Yiheyuan Road 5, Beijing, 100871, China.
|
21
|
Global analysis of somatic structural genomic alterations and their impact on gene expression in diverse human cancers. Proc Natl Acad Sci U S A 2016; 113:13768-13773. [PMID: 27856756 DOI: 10.1073/pnas.1606220113] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Indexed: 12/18/2022]
Abstract
Tumor genomes are mosaics of somatic structural variants (SVs) that may contribute to the activation of oncogenes or inactivation of tumor suppressors, for example, by altering gene copy number amplitude. However, there are multiple other ways in which SVs can modulate transcription, but the general impact of such events on tumor transcriptional output has not been systematically determined. Here we use whole-genome sequencing data to map SVs across 600 tumors and 18 cancers, and investigate the relationship between SVs, copy number alterations (CNAs), and mRNA expression. We find that 34% of CNA breakpoints can be clarified structurally and that most amplifications are due to tandem duplications. We observe frequent swapping of strong and weak promoters in the context of gene fusions, and find that this has a measurable global impact on mRNA levels. Interestingly, several long noncoding RNAs were strongly activated by this mechanism. Additionally, SVs were confirmed in telomere reverse transcriptase (TERT) upstream regions in several cancers, associated with elevated TERT mRNA levels. We also highlight high-confidence gene fusions supported by both genomic and transcriptomic evidence, including a previously undescribed paired box 8 (PAX8)-nuclear factor, erythroid 2 like 2 (NFE2L2) fusion in thyroid carcinoma. In summary, we combine SV, CNA, and expression data to provide insights into the structural basis of CNAs as well as the impact of SVs on gene expression in tumors.
|
22
|
Ivakhno S, Colombo C, Tanner S, Tedder P, Berri S, Cox AJ. tHapMix: simulating tumour samples through haplotype mixtures. Bioinformatics 2016; 33:280-282. [PMID: 27605106 DOI: 10.1093/bioinformatics/btw589] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/03/2016] [Revised: 08/11/2016] [Accepted: 09/02/2016] [Indexed: 11/14/2022]
Abstract
MOTIVATION Large-scale rearrangements and copy number changes combined with different modes of clonal evolution create extensive somatic genome diversity, making it difficult to develop versatile and scalable variant calling tools and create well-calibrated benchmarks. RESULTS We developed a new simulation framework tHapMix that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It easily scales to simulation of hundreds of somatic genomes, while re-use of real read data preserves noise and biases present in sequencing platforms. We further demonstrate tHapMix utility by creating a simulated set of 140 somatic genomes and showing how it can be used in training and testing of somatic copy number variant calling tools. AVAILABILITY AND IMPLEMENTATION tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/tHapMix. CONTACT sivakhno@illumina.com SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sergii Ivakhno
- Chesterford Research Park, Illumina Cambridge Ltd, Little Chesterford, CB10 1XL, UK
| | - Camilla Colombo
- Chesterford Research Park, Illumina Cambridge Ltd, Little Chesterford, CB10 1XL, UK
| | | | - Philip Tedder
- Chesterford Research Park, Illumina Cambridge Ltd, Little Chesterford, CB10 1XL, UK
| | - Stefano Berri
- Chesterford Research Park, Illumina Cambridge Ltd, Little Chesterford, CB10 1XL, UK
| | - Anthony J Cox
- Chesterford Research Park, Illumina Cambridge Ltd, Little Chesterford, CB10 1XL, UK
|
23
|
Ma L, Qin M, Liu B, Hu Q, Wei L, Wang J, Liu S. cnvCurator: an interactive visualization and editing tool for somatic copy number variations. BMC Bioinformatics 2015; 16:331. [PMID: 26472134 PMCID: PMC4608136 DOI: 10.1186/s12859-015-0766-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/19/2015] [Accepted: 10/08/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND As one of the most important classes of somatic aberrations, copy number variations (CNVs) in tumor genomes are believed to have a high probability of harboring oncotargets. Detection of somatic CNVs is an essential part of cancer genome sequencing analysis, but the accuracy is usually limited due to various factors. A post-processing procedure including manual review and refinement of CNV segments is often needed in practice to achieve better accuracy. RESULTS cnvCurator is a user-friendly tool with functions specifically designed to facilitate the process of interactively visualizing and editing somatic CNV calling results. Unlike other general genomics viewers, the indexing and display of CNV calling results in cnvCurator are segment-centric. It incorporates multiple kinds of CNV-specific information for concurrent, interactive display, as well as a number of relevant features allowing the user to examine and curate the CNV calls. CONCLUSIONS cnvCurator provides important and practical utilities to assist the manual review and editing of results from a chosen somatic CNV caller, such that curated CNV segments can be used for downstream applications.
Affiliation(s)
- Lingnan Ma
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA; Department of Mathematics, University at Buffalo, Buffalo, NY, 14260, USA; College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA.
| | - Maochun Qin
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Biao Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Qiang Hu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Lei Wei
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Jianmin Wang
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
| | - Song Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, 14263, USA.
|