For: Yuan X, Zhang J, Yang L. IntSIM: An Integrated Simulator of Next-Generation Sequencing Data. IEEE Trans Biomed Eng 2016;64:441-451. [PMID: 27164567 DOI: 10.1109/tbme.2016.2560939] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 11/08/2022]

Cited by Other Article(s)

Liu G, Yang H, He Z. Detection of copy number variations based on a local distance using next-generation sequencing data. Front Genet 2023;14:1147761. [PMID: 37811148 PMCID: PMC10556732 DOI: 10.3389/fgene.2023.1147761] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/01/2023] [Accepted: 09/14/2023] [Indexed: 10/10/2023]

Liu G, Yang H, Yuan X. A shortest path-based approach for copy number variation detection from next-generation sequencing data. Front Genet 2023;13:1084974. [PMID: 36733945 PMCID: PMC9887524 DOI: 10.3389/fgene.2022.1084974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 01/18/2023]

Abstract

Copy number variation (CNV) is one of the main structural variations in the human genome and accounts for a considerable proportion of variations. As CNVs can directly or indirectly cause cancer, mental illness, and genetic disease in humans, their effective detection is of great interest in the fields of oncogene discovery, clinical decision-making, bioinformatics, and drug discovery. The advent of next-generation sequencing data makes CNV detection possible, and a large number of CNV detection tools are based on next-generation sequencing data. Due to the complexity (e.g., bias, noise, alignment errors) of next-generation sequencing data and CNV structures, the accuracy of existing methods in detecting CNVs remains low. In this work, we design a new CNV detection approach, called shortest path-based copy number variation (SPCNV), to improve the detection accuracy of CNVs. SPCNV calculates the k nearest neighbors of each read depth and defines the shortest path, shortest path relation, and shortest path cost sets, from which it calculates the mean shortest path cost of each read depth and of its k nearest neighbors. We use the ratio between the mean shortest path cost of each read depth and the mean of the mean shortest path costs of its k nearest neighbors to construct a relative shortest path score formula that assigns a score to each read depth. Based on the score profile, a boxplot is then applied to predict CNVs. The performance of the proposed method is verified by simulation data experiments and compared against several popular methods of the same type. Experimental results show that the proposed method achieves the best balance between recall and precision in each set of simulated samples. To further verify the performance of the proposed method in real application scenarios, we then select real sample data from the 1000 Genomes Project to conduct experiments. The proposed method achieves the best F1-scores in almost all samples. Therefore, the proposed method can be used as a more reliable tool for the routine detection of CNVs.
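
The final step named in this abstract, applying a boxplot to a per-bin score profile, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `boxplot_outliers`, the conventional 1.5 × IQR whisker, and the toy scores are all assumptions, and the shortest-path scoring that produces the scores is not reproduced here.

```python
import statistics

def boxplot_outliers(scores, whisker=1.5):
    """Return indices of scores that fall outside the boxplot fences.

    Assumption: conventional Tukey fences (Q1 - w*IQR, Q3 + w*IQR)
    with w = 1.5; the abstract does not state the exact rule used.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)  # the three quartiles
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]

# Hypothetical per-bin scores in which one bin deviates strongly.
toy_scores = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 5.0, 1.01]
flagged = boxplot_outliers(toy_scores)
```

In this sketch the flagged indices would be treated as candidate CNV bins; how adjacent candidates are merged into regions is left open.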

Duncavage EJ, Coleman JF, de Baca ME, Kadri S, Leon A, Routbort M, Roy S, Suarez CJ, Vanderbilt C, Zook JM. Recommendations for the Use of in Silico Approaches for Next-Generation Sequencing Bioinformatic Pipeline Validation: A Joint Report of the Association for Molecular Pathology, Association for Pathology Informatics, and College of American Pathologists. J Mol Diagn 2023;25:3-16. [PMID: 36244574 DOI: 10.1016/j.jmoldx.2022.09.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/24/2022] [Revised: 09/14/2022] [Accepted: 09/28/2022] [Indexed: 11/21/2022]

Affiliation(s)

Eric J Duncavage In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri.
Joshua F Coleman In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, University of Utah, Salt Lake City, Utah
Monica E de Baca In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Pacific Pathology Partners, Seattle, Washington
Sabah Kadri In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Anne and Robert H Lurie Children's Hospital of Chicago, Chicago, Illinois
Annette Leon In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Color Health, Burlingame, California
Mark Routbort In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Hematopathology, MD Anderson Cancer Center, Houston, Texas
Somak Roy In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Laboratory Medicine, Cincinnati Children's Hospital, Cincinnati, Ohio
Carlos J Suarez In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Stanford University, Palo Alto, California
Chad Vanderbilt In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
Justin M Zook In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Biomarker and Genomic Sciences Group, National Institute of Standards and Technology, Gaithersburg, Maryland

WAVECNV: A New Approach for Detecting Copy Number Variation by Wavelet Clustering. Mathematics 2022. [DOI: 10.3390/math10122151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

svBreak: A New Approach for the Detection of Structural Variant Breakpoints Based on Convolutional Neural Network. Biomed Res Int 2022;2022:7196040. [PMID: 35345526 PMCID: PMC8957449 DOI: 10.1155/2022/7196040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2021] [Revised: 01/04/2022] [Accepted: 01/27/2022] [Indexed: 12/01/2022]

Xie K, Liu K, Alvi HAK, Chen Y, Wang S, Yuan X. KNNCNV: A K-Nearest Neighbor Based Method for Detection of Copy Number Variations Using NGS Data. Front Cell Dev Biol 2022;9:796249. [PMID: 35004691 PMCID: PMC8728060 DOI: 10.3389/fcell.2021.796249] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/16/2021] [Accepted: 11/23/2021] [Indexed: 11/19/2022]

Yuan X, Ma C, Zhao H, Yang L, Wang S, Xi J. STIC: Predicting Single Nucleotide Variants and Tumor Purity in Cancer Genome. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2692-2701. [PMID: 32086221 DOI: 10.1109/tcbb.2020.2975181] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]

Lee CY, Zeng JH, Lee SY, Lu RB, Kuo PH. SNP Data Science for Classification of Bipolar Disorder I and Bipolar Disorder II. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2862-2869. [PMID: 32324560 DOI: 10.1109/tcbb.2020.2988024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]

Huang T, Li J, Jia B, Sang H. CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021;12:700874. [PMID: 34484298 PMCID: PMC8415314 DOI: 10.3389/fgene.2021.700874] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/27/2021] [Accepted: 07/19/2021] [Indexed: 11/20/2022]

Yuan X, Li J, Bai J, Xi J. A Local Outlier Factor-Based Detection of Copy Number Variations From NGS Data. IEEE/ACM Trans Comput Biol Bioinform 2021;18:1811-1820. [PMID: 31880558 DOI: 10.1109/tcbb.2019.2961886] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/10/2023]

Liu G, Zhang J. A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021;12:699510. [PMID: 34262604 PMCID: PMC8273656 DOI: 10.3389/fgene.2021.699510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022]

Zhao HY, Li Q, Tian Y, Chen YH, Alvi HAK, Yuan XG. CIRCNV: Detection of CNVs Based on a Circular Profile of Read Depth from Sequencing Data. Biology 2021;10:biology10070584. [PMID: 34202028 PMCID: PMC8301091 DOI: 10.3390/biology10070584] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Revised: 06/10/2021] [Accepted: 06/21/2021] [Indexed: 12/29/2022]

Abstract

Simple Summary

In this study, we propose a copy number variation (CNV) detection method called CIRCNV, which is based on a circular profile of the read depth from sequencing data. The proposed method is an extended version of our previously developed method CNV-LOF. CIRCNV differs from CNV-LOF mainly in two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation to generate a meaningful two-dimensional dataset for CNV analysis and promote fairness between the ends and the middle part of the genome, and (2) it performs two rounds of CNV declaration by estimating tumor purity and recovering the true circular RD profile. We test and evaluate the performance of CIRCNV by conducting simulation studies and applying it to real sequencing data from tumor samples. The experimental results show that CIRCNV outperforms peer methods with respect to sensitivity, precision, and the F1-score. The experiments indicate that the proposed method is a reliable and effective tool in the field of variation analysis of tumor genomes.

Abstract

Copy number variation (CNV) is a common type of structural variation in the human genome. Accurate detection of CNVs from tumor genomes can provide crucial information for the study of tumorigenesis and cancer precision diagnosis. However, the contamination of tumor genomes by normal genomes and the crude profiles of the read depth make such a task difficult. In this paper, we propose an alternative approach, called CIRCNV, for the detection of CNVs from sequencing data. CIRCNV is an extension of our previously developed method CNV-LOF, which uses local outlier factors to predict CNVs. In comparison, CIRCNV can be performed on individual tumor samples and has the following two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation, in order to improve the efficiency of the read depth (RD) profile for the detection of CNVs; and (2) it performs a second round of CNV declaration based on the true circular RD profile, which is recovered by estimating tumor purity. We test and validate the performance of CIRCNV based on simulation and real sequencing data and perform comparisons with several peer methods. The results demonstrate that CIRCNV can obtain superior performance in terms of sensitivity and precision. We expect that our proposed method will be a supplement to existing methods and become a routine tool in the field of variation analysis of tumor genomes.
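
The first feature above, moving a read depth profile from a line to a circle, can be illustrated with a short sketch. The mapping used here is an assumption made for illustration only (bin i of n is placed at angle 2πi/n with radius equal to its read depth); the abstract does not specify the exact transformation, and the function name `to_circular_profile` is hypothetical.

```python
import math

def to_circular_profile(read_depths):
    """Map a linear read-depth profile onto points around a circle.

    Assumption: bin i of n gets angle 2*pi*i/n and a radius equal to
    its read depth; the published mapping may differ.
    """
    n = len(read_depths)
    points = []
    for i, rd in enumerate(read_depths):
        theta = 2.0 * math.pi * i / n
        points.append((rd * math.cos(theta), rd * math.sin(theta)))
    return points

# Four bins with equal depth land on a circle of radius 2.
points = to_circular_profile([2.0, 2.0, 2.0, 2.0])
```

Under this mapping the first and last bins become neighbors on the circle, which is one plausible reading of how the ends of the genome are treated like the middle.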

Guo Y, Wang S, Yuan X. HBOS-CNV: A New Approach to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021;12:642473. [PMID: 34163521 PMCID: PMC8215577 DOI: 10.3389/fgene.2021.642473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Accepted: 05/05/2021] [Indexed: 11/13/2022]

Mao YF, Yuan XG, Cun YP. A novel machine learning approach (svmSomatic) to distinguish somatic and germline mutations using next-generation sequencing data. Zool Res 2021;42:246-249. [PMID: 33709636 PMCID: PMC7995270 DOI: 10.24272/j.issn.2095-8137.2021.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/14/2021] [Accepted: 03/10/2021] [Indexed: 11/07/2022]

Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2021;18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 06/09/2023]

He Z, Zhang J, Yuan X, Zhang Y. Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods. Front Genet 2021;11:632901. [PMID: 33537063 PMCID: PMC7848170 DOI: 10.3389/fgene.2020.632901] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/24/2020] [Accepted: 12/30/2020] [Indexed: 12/13/2022]

Abstract

Breast cancer is the most common malignancy in women, and because it has a high mortality rate, there is an urgent need for computational methods that increase the accuracy of breast cancer survival prediction models. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutations with current molecular data, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was used to select, for each type of data, features that present high relevance to survival and low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved remarkably when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that by employing promising omics data such as somatic mutations and harnessing the power of proper feature selection methods and effective integration frameworks, breast cancer survival prediction accuracy can be further increased, thereby supporting better clinical diagnosis and more effective treatment for breast cancer patients.

Xie K, Tian Y, Yuan X. A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021;11:632311. [PMID: 33519925 PMCID: PMC7838601 DOI: 10.3389/fgene.2020.632311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/23/2020] [Accepted: 12/21/2020] [Indexed: 11/29/2022]

Alosaimi S, Bandiang A, van Biljon N, Awany D, Thami PK, Tchamga MSS, Kiran A, Messaoud O, Hassan RIM, Mugo J, Ahmed A, Bope CD, Allali I, Mazandu GK, Mulder NJ, Chimusa ER. A broad survey of DNA sequence data simulation tools. Brief Funct Genomics 2020;19:49-59. [PMID: 31867604 DOI: 10.1093/bfgp/elz033] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/07/2019] [Revised: 10/27/2019] [Accepted: 11/04/2019] [Indexed: 11/12/2022]

Affiliation(s)

Shatha Alosaimi Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Armand Bandiang Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Noelle van Biljon Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Denis Awany Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Prisca K Thami Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; Botswana Harvard AIDS Institute Partnership, Gaborone, Botswana
Milaine S S Tchamga Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Anmol Kiran Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi; Edinburgh University, Edinburgh, UK
Olfa Messaoud Université de Tunis El Manar, Institut Pasteur de Tunis, LR16IPT05 Génomique Biomédicale et Oncogénétique, Tunis, 1002, Tunisia
Radia Ismaeel Mohammed Hassan Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Jacquiline Mugo Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Azza Ahmed Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Sudan
Christian D Bope Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Imane Allali Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Gaston K Mazandu Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; African Institute for Mathematical Sciences (AIMS), Cape Town, South Africa
Nicola J Mulder Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Emile R Chimusa Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa

Liu G, Zhang J, Yuan X, Wei C. RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data. Front Genet 2020;11:569227. [PMID: 33329705 PMCID: PMC7673372 DOI: 10.3389/fgene.2020.569227] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/03/2020] [Accepted: 09/04/2020] [Indexed: 12/04/2022]

Lin X, Zhang X, Xu X. Efficient Classification of Hot Spots and Hub Protein Interfaces by Recursive Feature Elimination and Gradient Boosting. IEEE/ACM Trans Comput Biol Bioinform 2020;17:1525-1534. [PMID: 31380766 DOI: 10.1109/tcbb.2019.2931717] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 06/10/2023]

Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020;11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/29/2020] [Accepted: 07/24/2020] [Indexed: 11/21/2022]

Sater V, Viailly PJ, Lecroq T, Ruminy P, Bérard C, Prieur-Gaston É, Jardin F. UMI-Gen: A UMI-based read simulator for variant calling evaluation in paired-end sequencing NGS libraries. Comput Struct Biotechnol J 2020;18:2270-2280. [PMID: 32952940 PMCID: PMC7484502 DOI: 10.1016/j.csbj.2020.08.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2020] [Revised: 08/03/2020] [Accepted: 08/05/2020] [Indexed: 11/02/2022]

Yu Z, Du F, Ban R, Zhang Y. SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles. BMC Bioinformatics 2020;21:331. [PMID: 32703148 PMCID: PMC7379788 DOI: 10.1186/s12859-020-03665-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/08/2018] [Accepted: 07/16/2020] [Indexed: 11/10/2022]

Abstract

BACKGROUND

A number of simulators have been developed for emulating next-generation sequencing data by incorporating known errors such as base substitutions and indels. However, their practicality may be degraded by functional and runtime limitations. In particular, positional and genomic contextual information is not effectively used to reliably characterize base substitution patterns, and the positional and contextual differences in Phred quality scores have not been fully investigated. Thus, a more effective and efficient bioinformatics tool is needed.

RESULTS

Here, we introduce a novel tool, SimuSCoP, to reliably emulate complex DNA sequencing data. The base substitution patterns and the statistical behavior of quality scores in Illumina sequencing data are fully explored and integrated into the simulation model for reliably emulating datasets for different applications. In addition, an integrated and easy-to-use pipeline is employed in SimuSCoP to facilitate end-to-end simulation of complex samples, and high runtime efficiency is achieved by implementing the tool to run in multithreading with low memory consumption. These features enable SimuSCoP to achieve substantial improvements in reliability, functionality, practicality and runtime efficiency. The tool is comprehensively evaluated in multiple aspects including consistency of profiles, simulation of genomic variations and complex tumor samples, and the results demonstrate the advantages of SimuSCoP over existing tools.

CONCLUSIONS

SimuSCoP, a new bioinformatics tool, is developed to learn informative profiles from real sequencing data and reliably mimic complex data by introducing various genomic variations. We believe that the presented work will catalyse new development of downstream bioinformatics methods for analyzing sequencing data.

Yuan X, Bai J, Zhang J, Yang L, Duan J, Li Y, Gao M. CONDEL: Detecting Copy Number Variation and Genotyping Deletion Zygosity from Single Tumor Samples Using Sequence Data. IEEE/ACM Trans Comput Biol Bioinform 2020;17:1141-1153. [PMID: 30489272 DOI: 10.1109/tcbb.2018.2883333] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 06/09/2023]

Zhao H, Huang T, Li J, Liu G, Yuan X. MFCNV: A New Method to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2020;11:434. [PMID: 32499814 PMCID: PMC7243272 DOI: 10.3389/fgene.2020.00434] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/19/2020] [Accepted: 04/08/2020] [Indexed: 11/13/2022]

Abstract

Copy number variation (CNV) is a very important phenomenon in tumor genomes and plays a significant role in tumorigenesis. Accurate detection of CNVs has become a routine and necessary procedure for a deep investigation of tumor cells and diagnosis of tumor patients. The next-generation sequencing (NGS) technique has provided a wealth of data for the detection of CNVs at base-pair resolution. However, such a task is usually influenced by a number of factors, including GC-content bias, sequencing errors, and correlations among adjacent positions within CNVs. Although many existing methods have dealt with some of these artifacts by designing their own strategies, there is still a lack of comprehensive consideration of all the factors. In this paper, we propose a new method, MFCNV, for the accurate detection of CNVs from NGS data. Compared with existing methods, the proposed method has the following characteristics: (1) it fully considers the intrinsic correlations among adjacent positions in the genome to be analyzed, (2) it calculates read depth, GC-content bias, base quality, and correlation value for each genome bin and combines them as multiple features for the evaluation of genome bins, and (3) it addresses the joint effect among the factors by training a neural network algorithm for the prediction of CNVs. We test the performance of the MFCNV method by using simulation and real sequencing data and make comparisons with several peer methods. The results demonstrate that our method is superior to other methods in terms of sensitivity, precision, and F1-score and can detect many CNVs that other methods have not discovered. MFCNV is expected to be a complementary tool in the analysis of mutations in tumor genomes and can be extended to the analysis of single-cell sequencing data.
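
One of the per-bin features listed above, GC content, is simple enough to sketch. The following is a minimal illustration under stated assumptions, not MFCNV's code: the function name `gc_content_per_bin`, the fixed-width non-overlapping bins, and the handling of a shorter final bin are all choices made for this example.

```python
def gc_content_per_bin(sequence, bin_size):
    """Return the GC fraction of each consecutive bin of `sequence`.

    Assumption: fixed-width, non-overlapping bins; a shorter final bin
    is kept and normalized by its own length.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq), bin_size):
        window = seq[start:start + bin_size]
        gc = window.count("G") + window.count("C")
        fractions.append(gc / len(window))
    return fractions

# Toy reference fragment split into bins of four bases.
gc = gc_content_per_bin("GGCCAATTGC", 4)
```

A feature vector per bin would pair such a value with read depth, base quality, and a correlation value, as the abstract describes; those are not computed here.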

Wilson-Sánchez D, Lup SD, Sarmiento-Mañús R, Ponce MR, Micol JL. Next-generation forward genetic screens: using simulated data to improve the design of mapping-by-sequencing experiments in Arabidopsis. Nucleic Acids Res 2020;47:e140. [PMID: 31544937 PMCID: PMC6868388 DOI: 10.1093/nar/gkz806] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/30/2019] [Revised: 09/07/2019] [Accepted: 09/10/2019] [Indexed: 12/25/2022]

Yuan X, Gao M, Bai J, Duan J. SVSR: A Program to Simulate Structural Variations and Generate Sequencing Reads for Multiple Platforms. IEEE/ACM Trans Comput Biol Bioinform 2020;17:1082-1091. [PMID: 30334804 DOI: 10.1109/tcbb.2018.2876527] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]

Yuan X, Li Z, Zhao H, Bai J, Zhang J. Accurate Inference of Tumor Purity and Absolute Copy Numbers From High-Throughput Sequencing Data. Front Genet 2020;11:458. [PMID: 32425990 PMCID: PMC7205152 DOI: 10.3389/fgene.2020.00458] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/09/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023]

Yu Z, Du F, Sun X, Li A. SCSsim: an integrated tool for simulating single-cell genome sequencing data. Bioinformatics 2020;36:1281-1282. [PMID: 31584615 PMCID: PMC7703785 DOI: 10.1093/bioinformatics/btz713] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/18/2019] [Revised: 08/20/2019] [Accepted: 09/15/2019] [Indexed: 11/30/2022]

SM-RCNV: a statistical method to detect recurrent copy number variations in sequenced samples. Genes Genomics 2019;41:529-536. [PMID: 30779024 DOI: 10.1007/s13258-019-00788-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/19/2018] [Accepted: 01/21/2019] [Indexed: 12/13/2022]

Xia LC, Ai D, Lee H, Andor N, Li C, Zhang NR, Ji HP. SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. Gigascience 2018;7:5049476. [PMID: 29982625 PMCID: PMC6057526 DOI: 10.1093/gigascience/giy081] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/22/2018] [Revised: 05/22/2018] [Accepted: 06/26/2018] [Indexed: 11/29/2022]

Abstract

Background

Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, there are only a few data simulators that provide structural variants in silico and even fewer that provide variants with different allelic fraction and haplotypes.

Findings

We developed SVEngine, an open-source tool to address this need. SVEngine simulates next-generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file, and/or a clonal phylogeny tree file (NEWICK). Subsequently, it simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs), and/or post-alignment files (BAMs). All of the files contain the desired variants, along with BED files containing the ground truth. SVEngine's flexible design process enables one to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions, and translocations. Finally, SVEngine simulates sequence data that replicate the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce the simulation time.

Conclusions

We demonstrated the versatile features of SVEngine and its improved runtime compared with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogeneous cancer genome. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift, and neighboring hanging read pairs for representative variant types. Structural variant callers Lumpy and Manta and tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use.

Yuan X, Zhang J, Yang L, Bai J, Fan P. Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data. IEEE Trans Nanobioscience 2018;17:12-20. [PMID: 29570071 DOI: 10.1109/tnb.2017.2783910] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/10/2022]

Xia Y, Liu Y, Deng M, Xi R. Pysim-sv: a package for simulating structural variation data with GC-biases. BMC Bioinformatics 2017;18:53. [PMID: 28361688 PMCID: PMC5374556 DOI: 10.1186/s12859-017-1464-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/01/2023]