1
Haas BJ, Dobin A, Li B, Stransky N, Pochet N, Regev A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol 2019; 20:213. [PMID: 31639029 PMCID: PMC6802306 DOI: 10.1186/s13059-019-1842-9] [Citation(s) in RCA: 331] [Impact Index Per Article: 66.2] [Received: 05/19/2019] [Accepted: 09/28/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Accurate fusion transcript detection is essential for comprehensive characterization of cancer transcriptomes. Over the last decade, multiple bioinformatic tools have been developed to predict fusions from RNA-seq, based on either read mapping or de novo fusion transcript assembly. RESULTS: We benchmark 23 different methods including applications we develop, STAR-Fusion and TrinityFusion, leveraging both simulated and real RNA-seq. Overall, STAR-Fusion, Arriba, and STAR-SEQR are the most accurate and fastest for fusion detection on cancer transcriptomes. CONCLUSION: The lower accuracy of de novo assembly-based methods notwithstanding, they are useful for reconstructing fusion isoforms and tumor viruses, both of which are important in cancer research.
Affiliation(s)
- Brian J. Haas
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
- Alexander Dobin
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 USA
- Bo Li
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Center for Immunology and Inflammatory Diseases, Division of Rheumatology, Allergy, and Immunology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02129 USA
- Nathalie Pochet
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Ann Romney Center for Neurologic Diseases, Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA
- Aviv Regev
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Howard Hughes Medical Institute, and Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02140 USA
2
Timilsina M, Yang H, Sahay R, Rebholz-Schuhmann D. Predicting links between tumor samples and genes using 2-Layered graph based diffusion approach. BMC Bioinformatics 2019; 20:462. [PMID: 31500564 PMCID: PMC6734347 DOI: 10.1186/s12859-019-3056-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/14/2018] [Accepted: 08/26/2019] [Indexed: 12/21/2022]
Abstract
Background: Determining the association between a tumor sample and a gene is demanding because genetic experiments are costly. Each discovered association then requires clinical verification and validation, so the whole process is time-consuming and expensive. Predicting associations between tumor samples and genes therefore remains a challenge in biomedicine. Results: Here we present a computational model based on a heat diffusion algorithm that can predict the association between tumor samples and genes. We propose a 2-layered graph. In the first layer, we construct a graph of tumor samples and genes in which these two types of nodes are connected by a “hasGene” relationship. In the second layer, the gene nodes are connected by an “interaction” relationship. We applied the heat diffusion algorithm to nine variants of genetic interaction networks extracted from the STRING and BioGRID databases. The heat diffusion algorithm predicted the links between tumor samples and genes with a mean AUC-ROC score of 0.84. This score was obtained using weighted genetic interactions of the fusion or co-occurrence channels from the STRING database. For the unweighted genetic interactions from the BioGRID database, the algorithm predicts the links with an AUC-ROC score of 0.74. Conclusions: We demonstrate that gene-gene interaction scores could improve the power of the heat diffusion model to predict links between tumor samples and genes. We show the efficient runtime of the heat diffusion algorithm on various genetic interaction networks, and we statistically validate the quality of the predicted links between tumor samples and genes.
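The 2-layered graph and diffusion step described in this abstract can be illustrated with a minimal sketch. It assumes a generic graph-Laplacian heat kernel, exp(-tL), which may differ from the exact heat diffusion variant used in the paper; the node names, edge weights, diffusion time `t`, and the helper functions `heat_diffusion` and `add_edge` are all hypothetical.

```python
import numpy as np

def heat_diffusion(adjacency, initial_heat, t=0.5):
    """Return exp(-t * L) @ initial_heat, with L = D - A the graph Laplacian."""
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # L is symmetric for an undirected graph, so eigh applies.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    kernel = eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T
    return kernel @ np.asarray(initial_heat, dtype=float)

# Toy data (all hypothetical): two tumor samples and three genes.
nodes = ["S1", "S2", "G1", "G2", "G3"]
gene_idx = [2, 3, 4]
A = np.zeros((5, 5))

def add_edge(i, j, weight=1.0):
    """Add an undirected, optionally weighted edge to the adjacency matrix."""
    A[i, j] = A[j, i] = weight

add_edge(0, 2)       # layer 1: S1 hasGene G1
add_edge(1, 3)       # layer 1: S2 hasGene G2
add_edge(2, 3, 0.9)  # layer 2: G1 interacts with G2 (weighted)
add_edge(3, 4, 0.2)  # layer 2: G2 interacts with G3 (weighted)

h0 = np.zeros(5)
h0[0] = 1.0          # place one unit of heat on sample S1
heat = heat_diffusion(A, h0, t=0.5)

# Rank the genes not yet linked to S1 by the heat they received.
unlinked = [g for g in gene_idx if A[0, g] == 0]
ranked = [nodes[g] for g in sorted(unlinked, key=lambda g: heat[g], reverse=True)]
print(ranked)
```

In this toy setting, genes that receive more heat from a sample are treated as more likely link candidates; an unweighted network would simply set every interaction weight to 1.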
Affiliation(s)
- Mohan Timilsina
  - Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
- Haixuan Yang
  - School of Mathematics Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
- Ratnesh Sahay
  - Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
3
Xu M, Zhao Z, Zhang X, Gao A, Wu S, Wang J. Synstable Fusion: A Network-Based Algorithm for Estimating Driver Genes in Fusion Structures. Molecules 2018; 23:2055. [PMID: 30115851 PMCID: PMC6222865 DOI: 10.3390/molecules23082055] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/25/2018] [Revised: 08/02/2018] [Accepted: 08/07/2018] [Indexed: 12/22/2022]
Abstract
Gene fusion structures are a common class of somatic mutational events in cancer genomes and are often formed by chromosomal mutations. Identifying the driver gene(s) in a fusion structure is important for many downstream analyses and contributes to clinical practice. Existing computational approaches have prioritized the importance of oncogenes by incorporating prior knowledge from gene networks. However, different methods sometimes suffer from different weaknesses when handling gene fusion data due to multiple issues such as fusion gene representation, network integration, and the effectiveness of the evaluation algorithms. In this paper, Synstable Fusion (SYN), an algorithm for computationally evaluating fusion genes, is proposed. This algorithm uses a network-based strategy, incorporating gene networks as prior information, but estimates the driver genes according to the destructiveness hypothesis. This hypothesis balances the two popular evaluation strategies in existing studies, thereby providing more comprehensive results. A machine learning framework is introduced to integrate multiple networks and resolve conflicting results from different networks. In addition, a synchronous stability model is established to reduce the computational complexity of the evaluation algorithm. To evaluate the proposed algorithm, we conduct a series of experiments on both artificial and real datasets. The results demonstrate that the proposed algorithm performs well on different configurations and is robust when altering the internal parameter settings.
Affiliation(s)
- Mingzhe Xu
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  - Department of Automation, College of Intelligent Manufacturing and Automation, Henan University of Animal Husbandry and Economy, Zhengzhou 450011, China.
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Zhongmeng Zhao
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Xuanping Zhang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Aiqing Gao
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
- Shuyan Wu
  - Department of Network Technology, College of Intelligent Manufacturing and Automation, Henan University of Animal Husbandry and Economy, Zhengzhou 450011, China.
- Jiayin Wang
  - Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  - Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
4
Huang R, Liao X, Li Q. Identification and validation of potential prognostic gene biomarkers for predicting survival in patients with acute myeloid leukemia. Onco Targets Ther 2017; 10:5243-5254. [PMID: 29138577 PMCID: PMC5679677 DOI: 10.2147/ott.s147717] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 12/02/2022]
Abstract
Background: Molecular analysis is a promising source of clinically useful prognostic biomarkers. The aim of this investigation was to identify prognostic biomarkers for patients with acute myeloid leukemia (AML) using gene expression profile datasets from public databases. Methods: The gene expression profile datasets and corresponding overall survival (OS) information of three cohorts of AML patients from GSE12417 and The Cancer Genome Atlas AML project (TCGA-LAML) were included in the present study. Prognostic gene screening was performed using the survival package, whereas time-dependent receiver operating characteristic (ROC) curve analysis was performed using the survivalROC package. Results: In the three cohorts, 11 genes were identified that were significantly associated with AML OS. A linear prognostic model of the 11 genes was constructed and weighted by the regression coefficients (β) from the multivariate Cox regression analyses of the GSE12417 HG-U133A cohort to divide patients into high- and low-risk groups. GSE12417 HG-U133 plus 2.0 and TCGA-LAML were validation cohorts. Patients assigned to the high-risk group exhibited poorer OS than patients in the low-risk group. The 11-gene signature is a prognostic marker of AML and demonstrates good performance for predicting 1-, 3-, and 5-year OS as evaluated by survivalROC in the three cohorts. Conclusion: Our study has identified an mRNA signature including 11 genes, which may serve as a potential prognostic marker of AML.
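The linear prognostic model described in this abstract amounts to a weighted sum of expression values, score = Σ βᵢ × expressionᵢ. A minimal sketch follows; the three genes, the β values, the expression matrix, and the median cutoff for the high/low split are all assumptions made for illustration (the abstract does not state the cutoff, and the actual signature has 11 genes).

```python
import numpy as np

def risk_scores(expression, betas):
    """Linear score per patient: sum over genes of beta_i * expression_i."""
    return np.asarray(expression, dtype=float) @ np.asarray(betas, dtype=float)

def split_by_median(scores):
    """Label patients 'high' if their score exceeds the cohort median (assumed cutoff)."""
    cutoff = np.median(scores)
    return np.where(scores > cutoff, "high", "low"), cutoff

# Hypothetical Cox regression coefficients for three genes.
betas = [0.5, -0.2, 0.8]

# Hypothetical expression values: rows are patients, columns are genes.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [3.0, 3.0, 3.0],
]

scores = risk_scores(expression, betas)
groups, cutoff = split_by_median(scores)
for score, group in zip(scores, groups):
    print(f"score={score:.2f} group={group}")
```

In practice the coefficients would come from a fitted multivariate Cox model on a training cohort, and the same coefficients would then be applied unchanged to the validation cohorts.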
Affiliation(s)
- Xiwen Liao
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, People's Republic of China