1
Kalita CA, Gusev A. DeCAF: a novel method to identify cell-type specific regulatory variants and their role in cancer risk. Genome Biol 2022; 23:152. [PMID: 35804456 PMCID: PMC9264694 DOI: 10.1186/s13059-022-02708-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 06/15/2022] [Indexed: 01/09/2023]
Abstract
Here, we propose DeCAF (DEconvoluted cell type Allele specific Function), a new method to identify cell-fraction (cf) QTLs in tumors by leveraging both allelic and total expression information. Applying DeCAF to RNA-seq data from TCGA, we identify 3664 genes with cfQTLs (at 10% FDR) in 14 cell types, a 5.63× increase in discovery over conventional interaction-eQTL mapping. cfQTLs replicated in external cell-type-specific eQTL data are more enriched for cancer risk than conventional eQTLs. Our new method, DeCAF, empowers the discovery of biologically meaningful cfQTLs from bulk RNA-seq data in moderately sized studies.
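For context, the "conventional interaction-eQTL mapping" used as the baseline in this abstract is commonly written as a linear model with a genotype-by-cell-fraction interaction term. The form below is the generic model, not DeCAF's own likelihood (which the abstract says also uses allelic expression), and all symbols are assumptions introduced here for illustration:

$$
E_i = \beta_0 + \beta_g G_i + \beta_c C_i + \beta_{gc}\,(G_i \times C_i) + \varepsilon_i
$$

Here $E_i$ is the total expression of a gene in sample $i$, $G_i$ the genotype dosage at the tested variant, $C_i$ the estimated fraction of one cell type in the bulk sample, and $\varepsilon_i$ the residual. A variant would be called an interaction eQTL for that cell type when the test of $\beta_{gc} = 0$ is rejected.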
Affiliation(s)
- Cynthia A. Kalita
- Division of Population Sciences, Dana–Farber Cancer Institute & Harvard Medical School, Boston, USA
- Alexander Gusev
- Division of Population Sciences, Dana–Farber Cancer Institute & Harvard Medical School, Boston, USA; The Broad Institute, Boston, USA; Division of Genetics, Brigham & Women’s Hospital, Boston, USA
2
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 2021; 22:6330938. [PMID: 34329375 DOI: 10.1093/bib/bbab259] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/28/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/13/2022]
Abstract
Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved in conjunction with sequence technology and bioinformatic tools advances. In most projects, bulk RNA-Seq data is used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information including details of copy number alteration, microbial contamination, transposable elements, cell type (deconvolution) and the presence of neoantigens. Recent novel and advanced bioinformatic algorithms developed the capacity to retrieve this information from bulk RNA-Seq data, thus broadening its scope. The focus of this review is to comprehend the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.
Affiliation(s)
- Amarinder Singh Thind
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Isha Monga
- Columbia University, New York City, NY, USA
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Marie Ranson
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Bruce Ashford
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
3
Wang Y, Zhang X, Ding S, Geng Y, Liu J, Zhao Z, Zhang R, Xiao X, Wang J. A graph-based algorithm for estimating clonal haplotypes of tumor sample from sequencing data. BMC Med Genomics 2019; 12:27. [PMID: 30704456 PMCID: PMC6357344 DOI: 10.1186/s12920-018-0457-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/21/2022]
Abstract
BACKGROUND Haplotype phasing is an important step in many bioinformatics workflows. In cancer genomics, it is suggested that reconstructing the clonal haplotypes of a tumor sample could facilitate a comprehensive understanding of its clonal architecture and further provide valuable reference in clinical diagnosis and treatment. However, the sequencing data is an admixture of reads sampled from different clonal haplotypes, which complicates the computational problem by exponentially increasing the solution-space and leads the existing algorithms to an unacceptable time-/space- complexity. In addition, the evolutionary process among clonal haplotypes further weakens those algorithms by bringing indistinguishable candidate solutions.
RESULTS To improve the algorithmic performance of phasing clonal haplotypes, in this article, we propose MixSubHap, which is a graph-based computational pipeline working on cancer sequencing data. To reduce the computation complexity, MixSubHap adopts three bounding strategies to limit the solution space and filter out false positive candidates. It first estimates the global clonal structure by clustering the variant allelic frequencies on sampled point mutations. This offers a priori on the number of clonal haplotypes when copy-number variations are not considered. Then, it utilizes a greedy extension algorithm to approximately find the longest linkage of the locally assembled contigs. Finally, it incorporates a read-depth stripping algorithm to filter out false linkages according to the posterior estimation of tumor purity and the estimated percentage of each sub-clone in the sample. A series of experiments are conducted to verify the performance of the proposed pipeline.
CONCLUSIONS The results demonstrate that MixSubHap is able to identify about 90% on average of the preset clonal haplotypes under different simulation configurations. Especially, MixSubHap is robust when decreasing the mutation rates, in which cases the longest assembled contig could reach to 10kbps, while the accuracy of assigning a mutation to its haplotype still keeps more than 60% on average. MixSubHap is considered as a practical algorithm to reconstruct clonal haplotypes from cancer sequencing data. The source codes have been uploaded and maintained at https://github.com/YixuanWang1120/MixSubHap for academic use only.
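The first bounding strategy described in this abstract, clustering variant allelic frequencies (VAFs) to obtain a prior on the number of clonal haplotypes, can be illustrated with a minimal sketch. This is not MixSubHap's implementation: it assumes a simple gap-based 1-D clustering (split the sorted VAFs wherever neighbours differ by more than a cutoff), and the function names, the `min_gap` cutoff, and the toy data are all hypothetical.

```python
from statistics import mean


def cluster_vafs(vafs, min_gap=0.08):
    """Group VAFs by splitting the sorted values wherever two neighbours
    differ by more than min_gap (a hypothetical cutoff, not from the paper)."""
    if not vafs:
        return []
    ordered = sorted(vafs)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > min_gap:
            clusters.append([value])  # large gap: start a new cluster
        else:
            clusters[-1].append(value)  # small gap: same cluster
    return clusters


def estimate_clone_count(vafs, min_gap=0.08):
    """Return (number of VAF clusters, mean VAF of each cluster)."""
    clusters = cluster_vafs(vafs, min_gap)
    return len(clusters), [mean(c) for c in clusters]


# Toy data, made up for illustration: VAFs of sampled point mutations.
toy_vafs = [0.48, 0.51, 0.50, 0.27, 0.24, 0.26, 0.09, 0.11, 0.10]
count, centers = estimate_clone_count(toy_vafs)
print(count, [round(c, 2) for c in centers])
```

As the abstract notes, a cluster count of this kind is only informative when copy-number variations are not considered, since copy-number changes shift VAFs independently of clonal structure.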
Affiliation(s)
- Yixuan Wang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Xuanping Zhang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shuai Ding
- School of Management, Ministry of Education Key Laboratory of Process Optimization and Intelligent Decision-Making, Hefei University of Technology, Hefei, 23009 China
- Yu Geng
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Jianye Liu
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Zhongmeng Zhao
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Rong Zhang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Xiao Xiao
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Institute of Health Administration and Policy, School of Public Policy and Administration, Xi’an Jiaotong University, Xi’an, 710048 China
- Jiayin Wang
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China
- Shaanxi Engineering Research Center of Medical and Health Big Data, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710048 China