1
Aparicio B, Theunissen P, Hervas-Stubbs S, Fortes P, Sarobe P. Relevance of mutation-derived neoantigens and non-classical antigens for anticancer therapies. Hum Vaccin Immunother 2024; 20:2303799. [PMID: 38346926 PMCID: PMC10863374 DOI: 10.1080/21645515.2024.2303799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 01/06/2024] [Indexed: 02/15/2024] Open
Abstract
Efficacy of cancer immunotherapies relies on correct recognition of tumor antigens by lymphocytes, thereby eliciting functional responses capable of eliminating tumor cells. Therefore, considerable effort has gone into antigen identification, with the aim of understanding mechanisms of response to immunotherapy and designing safer and more efficient strategies. In addition to classical tumor-associated antigens identified during the last decades, implementation of next-generation sequencing methodologies is enabling the identification of neoantigens (neoAgs) arising from mutations, leading to the development of new neoAg-directed therapies. Moreover, there are numerous non-classical tumor antigens originating from other sources and identified by new methodologies. Here, we review the relevance of neoAgs in different immunotherapies and the results obtained by applying neoAg-based strategies. In addition, the different types of non-classical tumor antigens and the best approaches for their identification are described. This will help to increase the spectrum of targetable molecules useful in cancer immunotherapies.
Affiliation(s)
- Belen Aparicio
  - Program of Immunology and Immunotherapy, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
- Patrick Theunissen
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
  - DNA and RNA Medicine Division, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
- Sandra Hervas-Stubbs
  - Program of Immunology and Immunotherapy, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
- Puri Fortes
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
  - DNA and RNA Medicine Division, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
  - Spanish Network for Advanced Therapies (TERAV ISCIII), Spain
- Pablo Sarobe
  - Program of Immunology and Immunotherapy, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
2
Gaston JM, Alm EJ, Zhang AN. Fast and accurate variant identification tool for sequencing-based studies. BMC Biol 2024; 22:90. [PMID: 38644496 PMCID: PMC11034086 DOI: 10.1186/s12915-024-01891-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 04/17/2024] [Indexed: 04/23/2024] Open
Abstract
BACKGROUND Accurate identification of genetic variants, such as point mutations and insertions/deletions (indels), is crucial for various genetic studies into epidemic tracking, population genetics, and disease diagnosis. Genetic studies into microbiomes often require processing numerous sequencing datasets, necessitating variant identifiers with high speed, accuracy, and robustness. RESULTS We present QuickVariants, a bioinformatics tool that effectively summarizes variant information from read alignments and identifies variants. When tested on diverse bacterial sequencing data, QuickVariants demonstrates a ninefold higher median speed than bcftools, a widely used variant identifier, with higher accuracy in identifying both point mutations and indels. This accuracy extends to variant identification in virus samples, including SARS-CoV-2, particularly with significantly fewer false negative indels than bcftools. The high accuracy of QuickVariants is further demonstrated by its detection of a greater number of Omicron-specific indels (5 versus 0) and point mutations (61 versus 48-54) than bcftools in sewage metagenomes predominated by Omicron variants. Much of the reduced accuracy of bcftools was attributable to its misinterpretation of indels, often producing false negative indels and false positive point mutations at the same locations. CONCLUSIONS We introduce QuickVariants, a fast, accurate, and robust bioinformatics tool designed for identifying genetic variants for microbial studies. QuickVariants is available at https://github.com/caozhichongchong/QuickVariants.
Affiliation(s)
- Eric J Alm
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, USA
  - Department of Biological Engineering, Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, USA
- An-Ni Zhang
  - Department of Biological Engineering, Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, USA
3
Hagiwara K, Zhang J. Detecting Somatic Insertions/Deletions (Indels) Using Tumor RNA-Seq Data. Methods Mol Biol 2024; 2812:235-242. [PMID: 39068366 DOI: 10.1007/978-1-0716-3886-6_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Identification of somatic indels remains a major challenge in cancer genomic analysis and is rarely attempted for tumor-only RNA-Seq due to the lack of matching normal data and the complexity of read alignment, which involves mapping of both splice junctions and indels. In this chapter, we introduce RNAIndel, a software tool designed for identifying somatic coding indels using tumor-only RNA-Seq. RNAIndel performs indel realignment and employs a machine learning model to estimate the probability of a coding indel being somatic, germline, or artifact. Its high accuracy has been validated in RNA-Seq generated from multiple tumor types.
Affiliation(s)
- Kohei Hagiwara
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jinghui Zhang
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
4
Yang L, Wang J, Altreuter J, Jhaveri A, Wong CJ, Song L, Fu J, Taing L, Bodapati S, Sahu A, Tokheim C, Zhang Y, Zeng Z, Bai G, Tang M, Qiu X, Long HW, Michor F, Liu Y, Liu XS. Tutorial: integrative computational analysis of bulk RNA-sequencing data to characterize tumor immunity using RIMA. Nat Protoc 2023; 18:2404-2414. [PMID: 37391666 DOI: 10.1038/s41596-023-00841-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/07/2022] [Accepted: 02/22/2023] [Indexed: 07/02/2023]
Abstract
RNA-sequencing (RNA-seq) has become an increasingly cost-effective technique for molecular profiling and immune characterization of tumors. In the past decade, many computational tools have been developed to characterize tumor immunity from gene expression data. However, the analysis of large-scale RNA-seq data requires bioinformatics proficiency, large computational resources and cancer genomics and immunology knowledge. In this tutorial, we provide an overview of computational analysis of bulk RNA-seq data for immune characterization of tumors and introduce commonly used computational tools with relevance to cancer immunology and immunotherapy. These tools have diverse functions such as evaluation of expression signatures, estimation of immune infiltration, inference of the immune repertoire, prediction of immunotherapy response, neoantigen detection and microbiome quantification. We describe the RNA-seq IMmune Analysis (RIMA) pipeline integrating many of these tools to streamline RNA-seq analysis. We also developed a comprehensive and user-friendly guide in the form of a GitBook with text and video demos to assist users in analyzing bulk RNA-seq data for immune characterization at both individual sample and cohort levels by using RIMA.
Affiliation(s)
- Lin Yang
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Jin Wang
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - School of Life Science and Technology, Tongji University, Shanghai, China
- Jennifer Altreuter
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Aashna Jhaveri
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Cheryl J Wong
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Li Song
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Jingxin Fu
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - School of Life Science and Technology, Tongji University, Shanghai, China
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Len Taing
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Sudheshna Bodapati
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Avinash Sahu
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Collin Tokheim
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Yi Zhang
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Zexian Zeng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Gali Bai
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Ming Tang
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Xintao Qiu
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Henry W Long
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Franziska Michor
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, MA, USA
  - The Ludwig Center at Harvard, Boston, MA, USA
- Yang Liu
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- X Shirley Liu
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
5
Song Q, Zhang Y, Zhang M, Ma X, Zhang Q, Zhao C, Zhang Z, Zhao H, Hu W, Zhang X, Ren X, An M, Yang J, Liu Y. Identifying gene variants underlying the pathogenesis of diabetic retinopathy based on integrated genomic and transcriptomic analysis of clinical extreme phenotypes. Front Genet 2022; 13:929049. [PMID: 36035153 PMCID: PMC9399422 DOI: 10.3389/fgene.2022.929049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/26/2022] [Accepted: 07/14/2022] [Indexed: 11/13/2022] Open
Abstract
Diabetic retinopathy (DR) is a common complication and the leading cause of blindness in patients with type 2 diabetes. DR has been shown to be closely correlated with blood glucose levels and the duration of diabetes. However, the onset and progression of DR also display clinical heterogeneity. We applied whole-exome sequencing and RNA-seq approaches to study the gene mutation and transcription profiles in three groups of diabetic patients with extreme clinical phenotypes in DR onset, timing, and disease progression, aiming to identify genetic variants that may play roles in the pathogenesis of DR. We identified 23 putatively pathogenic genes, and ingenuity pathway analysis of these mutated genes reveals their functional association with glucose metabolism, diabetic complications, neural system activity, and dysregulated immune responses. In addition, ten potentially protective genes were also proposed. These findings shed light on the mechanisms underlying the pathogenesis of DR and may provide potential targets for developing new strategies to combat DR.
Affiliation(s)
- Qiaoling Song
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
  - Innovation Platform of Marine Drug Screening and Evaluation, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Yuchao Zhang
  - Department of Endocrinology, Qingdao Municipal Hospital, Qingdao, China
- Minghui Zhang
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
  - Innovation Platform of Marine Drug Screening and Evaluation, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Xiaoli Ma
  - Department of Endocrinology, Qingdao Municipal Hospital, Qingdao, China
- Qianyue Zhang
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
  - Innovation Platform of Marine Drug Screening and Evaluation, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Chenyang Zhao
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
- Zhongwen Zhang
  - Department of Endocrinology and Metabolism, the First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- Huichen Zhao
  - Department of Endocrinology, Qingdao Municipal Hospital, Qingdao, China
- Wenchao Hu
  - Department of Endocrinology, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, Qingdao, China
- Xinxin Zhang
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
  - Innovation Platform of Marine Drug Screening and Evaluation, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Xiwen Ren
  - Department of Emergency, Linyi People's Hospital, Linyi, China
- Ming An
  - Department of Ophthalmology, Qingdao Municipal Hospital, Qingdao, China
- Jinbo Yang
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
  - Innovation Platform of Marine Drug Screening and Evaluation, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Yuantao Liu
  - Department of Endocrinology, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, Qingdao, China
6
Wang TY, Yang R. Detecting Medium and Large Insertions and Deletions with transIndel. Methods Mol Biol 2022; 2493:67-75. [PMID: 35751809 DOI: 10.1007/978-1-0716-2293-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Insertions and deletions (indels) are primarily detected from DNA sequencing (DNA-seq) data, but their transcriptional consequences remain unexplored due to challenges in distinguishing medium- and large-sized indels from RNA splicing events in RNA-seq data. We introduce transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. Here, we describe the method and provide a tutorial on the installation and application of transIndel.
Affiliation(s)
- Ting-You Wang
  - The Hormel Institute, University of Minnesota, Austin, MN, USA
- Rendong Yang
  - The Hormel Institute, University of Minnesota, Austin, MN, USA
7
Graim K, Gorenshteyn D, Robinson DG, Carriero NJ, Cahill JA, Chakrabarti R, Goldschmidt MH, Durham AC, Funk J, Storey JD, Kristensen VN, Theesfeld CL, Sorenmo KU, Troyanskaya OG. Modeling molecular development of breast cancer in canine mammary tumors. Genome Res 2021; 31:337-347. [PMID: 33361113 PMCID: PMC7849403 DOI: 10.1101/gr.256388.119] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/25/2019] [Accepted: 12/17/2020] [Indexed: 12/30/2022]
Abstract
Understanding the changes in diverse molecular pathways underlying the development of breast tumors is critical for improving diagnosis, treatment, and drug development. Here, we used RNA-profiling of canine mammary tumors (CMTs) coupled with a robust analysis framework to model molecular changes in human breast cancer. Our study leveraged a key advantage of the canine model, the frequent presence of multiple naturally occurring tumors at diagnosis, thus providing samples spanning normal tissue and benign and malignant tumors from each patient. We showed that human breast cancer signals, at both the expression and mutation levels, are evident in CMTs. Profiling multiple tumors per patient, enabled by the CMT model, allowed us to resolve statistically robust transcription patterns and biological pathways specific to malignant tumors versus those arising in benign tumors or shared with normal tissues. We showed that multiple histological samples per patient are necessary to effectively capture these progression-related signatures, and that carcinoma-specific signatures are predictive of survival for human breast cancer patients. To catalyze and support similar analyses and use of the CMT model by other biomedical researchers, we provide FREYA, a robust data processing pipeline and statistical analysis framework.
Affiliation(s)
- Kiley Graim
  - Flatiron Institute, Simons Foundation, New York, New York 10010, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Dmitriy Gorenshteyn
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
  - Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey 08544, USA
- David G Robinson
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
  - Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey 08544, USA
- James A Cahill
  - Laboratory of the Neurogenetics of Language, Rockefeller University, New York, New York 10065, USA
- Rumela Chakrabarti
  - Department of Biomedical Sciences and the Penn Vet Cancer Center, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Michael H Goldschmidt
  - Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Amy C Durham
  - Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Julien Funk
  - Flatiron Institute, Simons Foundation, New York, New York 10010, USA
- John D Storey
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
  - Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey 08544, USA
- Vessela N Kristensen
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
- Chandra L Theesfeld
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Karin U Sorenmo
  - Department of Biomedical Sciences and the Penn Vet Cancer Center, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Olga G Troyanskaya
  - Flatiron Institute, Simons Foundation, New York, New York 10010, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08544, USA
8
Quaglieri A, Flensburg C, Speed TP, Majewski IJ. Finding a suitable library size to call variants in RNA-Seq. BMC Bioinformatics 2020; 21:553. [PMID: 33261552 PMCID: PMC7708150 DOI: 10.1186/s12859-020-03860-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/19/2020] [Accepted: 11/03/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND RNA sequencing allows the study of both gene expression changes and transcribed mutations, providing a highly effective way to gain insight into cancer biology. When planning the sequencing of a large cohort of samples, library size is a fundamental factor affecting both the overall cost and the quality of the results. Here we specifically address how overall library size influences the detection of somatic mutations in RNA-seq data in two acute myeloid leukaemia datasets. RESULTS We simulated shallower sequencing depths by downsampling 45 acute myeloid leukaemia samples (100 bp PE) that are part of the Leucegene project, which were originally sequenced at high depth. We compared the sensitivity of six methods of recovering validated mutations on the same samples. The methods compared are a combination of three popular callers (MuTect, VarScan, and VarDict) and two filtering strategies. We observed an incremental loss in sensitivity when simulating libraries of 80M, 50M, 40M, 30M and 20M fragments, with the largest loss detected with less than 30M fragments (below 90%, average loss of 7%). The sensitivity in recovering insertions and deletions varied markedly between callers, with VarDict showing the highest sensitivity (60%). Single nucleotide variant sensitivity is relatively consistent across methods, apart from MuTect, whose default filters need adjustment when using RNA-Seq. We also analysed 136 RNA-Seq samples from the TCGA-LAML cohort (50 bp PE) and assessed the change in sensitivity between the initial libraries (average 59M fragments) and after downsampling to 40M fragments. When considering single nucleotide variants in recurrently mutated myeloid genes we found a comparable performance, with a 6% average loss in sensitivity using 40M fragments. CONCLUSIONS Between 30M and 40M 100 bp PE reads are needed to recover 90-95% of the initial variants on recurrently mutated myeloid genes. To extend this result to another cancer type, an exploration of the characteristics of its mutations and gene expression patterns is suggested.
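The evaluation described in this abstract (downsample a library, call variants, then measure how many validated mutations are recovered) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `downsample` and `sensitivity`, the variant tuple layout, and the toy variants are all hypothetical, and the variant-calling step itself is left out.

```python
import random


def downsample(fragment_ids, target, seed=0):
    """Randomly keep `target` fragments without replacement (hypothetical helper).

    If the library already has `target` fragments or fewer, it is returned whole.
    """
    fragments = list(fragment_ids)
    if target >= len(fragments):
        return fragments
    return random.Random(seed).sample(fragments, target)


def sensitivity(called, validated):
    """Fraction of validated variants that are present in the call set."""
    validated = set(validated)
    if not validated:
        raise ValueError("no validated variants supplied")
    return len(validated & set(called)) / len(validated)


# Toy, made-up variants keyed as (gene, position, ref, alt).
validated = {
    ("geneA", 101, "C", "T"),
    ("geneB", 202, "G", "A"),
    ("geneC", 303, "A", "AT"),
    ("geneD", 404, "TG", "T"),
}
# Calls from a hypothetical downsampled library: three true hits, one extra call.
called = {
    ("geneA", 101, "C", "T"),
    ("geneB", 202, "G", "A"),
    ("geneC", 303, "A", "AT"),
    ("geneE", 505, "C", "G"),
}

subset = downsample(range(100), 30)
print(len(subset))                     # 30 fragments kept
print(sensitivity(called, validated))  # 3 of 4 validated variants recovered
```

Calls that are not in the validated set, such as the made-up `geneE` variant, do not affect sensitivity; they would only matter for a precision estimate.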
Affiliation(s)
- Anna Quaglieri
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
  - Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Grattan St, Melbourne, 3010, Australia
- Christoffer Flensburg
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
- Terence P Speed
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
  - Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Grattan St, Melbourne, 3010, Australia
  - Department of Mathematics and Statistics, The University of Melbourne, 813 Swanston Street, Melbourne, 3010, Australia
- Ian J Majewski
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
  - Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Grattan St, Melbourne, 3010, Australia
9
Zhang Z, Tang J, He X, Di R, Zhang X, Zhang J, Hu W, Chu M. Identification and Characterization of Hypothalamic Alternative Splicing Events and Variants in Ovine Fecundity-Related Genes. Animals (Basel) 2020; 10:ani10112111. [PMID: 33203033 PMCID: PMC7698220 DOI: 10.3390/ani10112111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2020] [Revised: 11/03/2020] [Accepted: 11/10/2020] [Indexed: 12/03/2022] Open
Abstract
Simple Summary: Previous studies revealed that alternative splicing (AS) events and gene variants played key roles in reproduction. However, their location and distribution in hypothalamic fecundity-related genes in sheep without the FecB mutation remain largely unknown. In this study, we performed a correlation analysis of transcriptomics and proteomics, and the results suggested several differentially expressed genes (DEGs)/differentially expressed proteins (DEPs), including galectin 3 (LGALS3), aspartoacylase (ASPA) and transthyretin (TTR), could be candidate genes influencing ovine litter size. Further analysis suggested that AS events, single nucleotide polymorphisms (SNPs) and microRNA (miRNA)-binding sites existed in key DEGs/DEPs, such as ASPA and TTR. This study provides a new insight into ovine and even other mammalian reproduction.
Abstract: Previous studies revealed that alternative splicing (AS) events and gene variants played key roles in reproduction; however, their location and distribution in hypothalamic fecundity-related genes in sheep without the FecB mutation remain largely unknown. Therefore, in this study, we described the hypothalamic AS events and variants in differentially expressed genes (DEGs) in Small Tail Han sheep without the FecB mutation at polytocous sheep in the follicular phase vs. monotocous sheep in the follicular phase (PF vs. MF) and polytocous sheep in the luteal phase vs. monotocous sheep in the luteal phase (PL vs. ML) via an RNA-seq study for the first time. We found 39 DEGs with AS events (AS DEGs) in PF vs. MF, while 42 AS DEGs were identified in PL vs. ML. No DEGs with single nucleotide polymorphisms (SNPs) were observed in PF vs. MF, but five were identified in PL vs. ML. We also performed a correlation analysis of transcriptomics and proteomics, and the results suggested several key DEGs/differentially expressed proteins (DEPs), such as galectin 3 (LGALS3) in PF vs. MF and aspartoacylase (ASPA) and transthyretin (TTR) in PL vs. ML, could be candidate genes influencing ovine litter size. In addition, further analyses suggested that AS events, SNPs and miRNA-binding sites existed in key DEGs/DEPs, such as ASPA and TTR. All in all, this study provides a new insight into ovine and even other mammalian reproduction.
Affiliation(s)
- Zhuangbiao Zhang
  - Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Jishun Tang
  - Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  - Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei 230031, China
- Xiaoyun He
  - Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Ran Di
  - Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaosheng Zhang
  - Tianjin Institute of Animal Sciences, Tianjin 300381, China
- Jinlong Zhang
  - Tianjin Institute of Animal Sciences, Tianjin 300381, China
- Wenping Hu
  - Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  - Correspondence: Tel.: +86-010-6281-6002 (W.H.); +86-010-6281-9850 (M.C.)
- Mingxing Chu
  - Key Laboratory of Animal Genetics and Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  - Correspondence: Tel.: +86-010-6281-6002 (W.H.); +86-010-6281-9850 (M.C.)
10
Hu Z, Cao J, Liu G, Zhang H, Liu X. Comparative Transcriptome Profiling of Skeletal Muscle from Black Muscovy Duck at Different Growth Stages Using RNA-seq. Genes (Basel) 2020; 11:genes11101228. [PMID: 33092100 PMCID: PMC7590229 DOI: 10.3390/genes11101228] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/26/2020] [Revised: 10/13/2020] [Accepted: 10/16/2020] [Indexed: 12/13/2022] Open
Abstract
In China, the production of duck meat is second only to that of chicken, and the demand for duck meat is also increasing. However, the internal mechanism regulating skeletal muscle growth and development in duck remains unclear. This study aimed to identify candidate genes related to the growth of duck skeletal muscle and explore the potential regulatory mechanism. RNA-seq technology was used to compare the transcriptome of skeletal muscles in black Muscovy ducks at different developmental stages (day 17, 21, 27, 31, and 34 of embryos and postnatal 6-month-olds). The SNPs and InDels of black Muscovy ducks at different growth stages were mainly in “INTRON”, “SYNONYMOUS_CODING”, “UTR_3_PRIME”, and “DOWNSTREAM”. The average number of AS in each sample was 37,267, mainly concentrated in TSS and TTS. In addition, a total of 19 to 5377 DEGs were detected in each pairwise comparison. Functional analysis showed that the DEGs were mainly involved in the processes of cell growth, muscle development, and cellular activities (junction, migration, assembly, differentiation, and proliferation). Many of the DEGs are well known to be related to the growth of skeletal muscle in black Muscovy duck, such as MyoG, FBXO1, MEF2A, and FoxN2. KEGG pathway analysis identified that the DEGs were significantly enriched in the pathways related to focal adhesion, the MAPK signaling pathway and regulation of the actin cytoskeleton. Some DEGs assigned to these pathways were potential candidate genes inducing the difference in muscle growth among the developmental stages, such as FAF1, RGS8, GRB10, SMYD3, and TNNI2. Our study identified several genes and pathways that may participate in the regulation of skeletal muscle growth in black Muscovy duck. These results should serve as an important resource revealing the molecular basis of muscle growth and development in duck.
|
11
|
Hagiwara K, Ding L, Edmonson MN, Rice SV, Newman S, Easton J, Dai J, Meshinchi S, Ries RE, Rusch M, Zhang J. RNAIndel: discovering somatic coding indels from tumor RNA-Seq data. Bioinformatics 2020; 36:1382-1390. [PMID: 31593214 DOI: 10.1093/bioinformatics/btz753] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/17/2019] [Revised: 08/29/2019] [Accepted: 10/01/2019] [Indexed: 12/23/2022]
Abstract
MOTIVATION: Reliable identification of expressed somatic insertions/deletions (indels) is an unmet need due to artifacts generated in PCR-based RNA-Seq library preparation and the lack of normal RNA-Seq data, presenting analytical challenges for discovery of somatic indels in the tumor transcriptome. RESULTS: We present RNAIndel, a tool for predicting somatic, germline and artifact indels from tumor RNA-Seq data. RNAIndel leverages features derived from indel sequence context and biological effect in a machine-learning framework. Except for tumor samples with microsatellite instability, RNAIndel robustly predicts 88-100% of somatic indels in five diverse test datasets of pediatric and adult cancers, even recovering subclonal (VAF range 0.01-0.15) driver indels missed by targeted deep-sequencing, outperforming the current best practice for RNA-Seq variant calling, which had 57% sensitivity but 14 times more false positives. AVAILABILITY AND IMPLEMENTATION: RNAIndel is freely available at https://github.com/stjude/RNAIndel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kohei Hagiwara
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Liang Ding
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Michael N Edmonson
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Stephen V Rice
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Scott Newman
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- John Easton
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Juncheng Dai
- Department of Epidemiology, Nanjing Medical University School of Public Health, Jiangning District, Nanjing, 211166, People's Republic of China
- Soheil Meshinchi
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Rhonda E Ries
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Michael Rusch
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Jinghui Zhang
- Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
|
12
|
Wei N, Song Y, Zhang F, Sun Z, Zhang X. Transcriptome Profiling of Acquired Gefitinib Resistant Lung Cancer Cells Reveals Dramatically Changed Transcription Programs and New Treatment Targets. Front Oncol 2020; 10:1424. [PMID: 32923394 PMCID: PMC7456826 DOI: 10.3389/fonc.2020.01424] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/23/2020] [Accepted: 07/06/2020] [Indexed: 01/24/2023]
Abstract
Background: Targeted therapy with tyrosine kinase inhibitors (TKIs) for lung cancer with epidermal growth factor receptor (EGFR) mutations represents one of the major breakthroughs in lung cancer management. However, gradually developing resistance to these drugs prevents sustained clinical benefit and calls for research into resistance mechanisms and identification of new therapeutic targets. The acquired T790M mutation accounts for the majority of resistance cases, yet transcriptome changes in these cells are less well characterized, and it is not known whether new treatment targets exist that can be addressed with available drugs. Methods: Transcriptome profiling by RNA sequencing was performed for the lung cancer cell line PC9 and its resistant line PC9GR, derived after long-term exposure to gefitinib. Differentially expressed genes and changed pathways were identified, along with existing drugs targeting the upregulated genes. Using 144 lung cancer cell lines with both gene expression and drug response data from the Cancer Cell Line Encyclopedia (CCLE) and Cancer Therapeutics Response Portal (CTRP), we screened 549 drugs for responses correlated with these upregulated genes in PC9GR cells, and top drugs were evaluated for their response in both PC9 and PC9GR cells. Results: In addition to the acquired T790M mutation, the resistant PC9GR cells had very different transcription programs from the sensitive PC9 cells. Multiple pathways were changed, with the top ones including TNFA signaling, androgen/estrogen response, P53 pathway, MTORC1 signaling, hypoxia, and epithelial mesenchymal transition. Thirty-two upregulated genes had available drugs that can potentially be effective in treating the resistant cells. From the response profiles of CCLE, we found 17 drugs whose responses were associated with at least four of these upregulated genes.
Among the four drugs evaluated (dasatinib, KPT-185, trametinib, and pluripotin), all except trametinib demonstrated strong inhibitory effects on the resistant PC9GR cells, with KPT-185 being the most potent. KPT-185 suppressed growth, caused apoptosis, and inhibited migration of the PC9GR cells at similar (or better) rates as in the sensitive PC9 cells, in a dose-dependent manner. Conclusions: Acquired TKI-resistant lung cancer cells (PC9GR) have dramatically changed transcription and pathway regulation, which exposes new treatment targets. Existing drugs may be repurposed to treat patients who have developed resistance to TKIs.
Affiliation(s)
- Nan Wei
- Department of Respiratory and Critical Care Medicine, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, China; Academy of Medical Science, Zhengzhou University, Zhengzhou, China
- Yong'an Song
- Department of Respiratory and Critical Care Medicine, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, China; Academy of Medical Science, Zhengzhou University, Zhengzhou, China
- Fan Zhang
- Department of Respiratory and Critical Care Medicine, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, China
- Zhifu Sun
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Xiaoju Zhang
- Department of Respiratory and Critical Care Medicine, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, China
|
13
|
Serin Harmanci A, Harmanci AO, Zhou X. CaSpER identifies and visualizes CNV events by integrative analysis of single-cell or bulk RNA-sequencing data. Nat Commun 2020; 11:89. [PMID: 31900397 PMCID: PMC6941987 DOI: 10.1038/s41467-019-13779-x] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 10/10/2018] [Accepted: 11/25/2019] [Indexed: 12/15/2022]
Abstract
RNA sequencing experiments generate large amounts of information about expression levels of genes. Although they are mainly used for quantifying expression levels, they contain much more biologically important information such as copy number variants (CNVs). Here, we present CaSpER, a signal processing approach for identification, visualization, and integrative analysis of focal and large-scale CNV events in multiscale resolution using either bulk or single-cell RNA sequencing data. CaSpER integrates the multiscale smoothing of expression signal and allelic shift signals for CNV calling. The allelic shift signal measures the loss-of-heterozygosity (LOH) which is valuable for CNV identification. CaSpER employs an efficient methodology for the generation of a genome-wide B-allele frequency (BAF) signal profile from the reads and utilizes it for correction of CNVs calls. CaSpER increases the utility of RNA-sequencing datasets and complements other tools for complete characterization and visualization of the genomic and transcriptomic landscape of single cell and bulk RNA sequencing data.
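To make the B-allele frequency and multiscale smoothing ideas in this abstract concrete, here is a minimal sketch. It is not CaSpER's implementation: the function names, the read counts, and the use of a sliding-window median as the smoother are all assumptions made for illustration.

```python
from statistics import median


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site that carry the B (alternate) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        return None  # no coverage, so the BAF is undefined
    return alt_reads / total


def allelic_shift(baf_values):
    """Distance of each BAF value from the 0.5 expected at a balanced heterozygous site."""
    return [abs(b - 0.5) for b in baf_values]


def median_smooth(signal, window):
    """Sliding-window median; the window is truncated at both ends of the signal."""
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]


# Hypothetical (ref, alt) read counts at consecutive heterozygous sites.
counts = [(10, 10), (12, 8), (2, 18), (1, 19), (0, 20), (9, 11)]
shift = allelic_shift([b_allele_frequency(r, a) for r, a in counts])
# Smoothing with several window sizes gives the same signal at several scales.
multiscale = {w: median_smooth(shift, w) for w in (3, 5)}
```

A sustained run of high allelic shift across neighbouring sites, visible at more than one smoothing scale, is the kind of pattern that would suggest loss of heterozygosity in this simplified picture.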
Affiliation(s)
- Akdes Serin Harmanci
- Center for Computational Systems Medicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Arif O Harmanci
- Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Xiaobo Zhou
- Center for Computational Systems Medicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Department of Integrative Biology and Pharmacology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- School of Dentistry, University of Texas Health Science Center at Houston, Houston, TX, 77054, USA.
|
14
|
Weng J, Li DD, Jiang BG, Yin XF. Temporal changes in the spinal cord transcriptome after peripheral nerve injury. Neural Regen Res 2020; 15:1360-1367. [PMID: 31960825 PMCID: PMC7047785 DOI: 10.4103/1673-5374.272618] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/29/2022]
Abstract
Peripheral nerve injury may trigger changes in mRNA levels in the spinal cord. Finding key mRNAs is important for improving repair after nerve injury. This study aimed to investigate changes in mRNAs in the spinal cord following sciatic nerve injury by transcriptomic analysis. The left sciatic nerve denervation model was established in C57BL/6 mice. The left L4-6 spinal cord segment was obtained at 0, 1, 2, 4 and 8 weeks after severing the sciatic nerve. mRNA expression profiles were generated by RNA sequencing. The sequencing results of spinal cord mRNA at 1, 2, 4, and 8 weeks after severing the sciatic nerve were compared with those at 0 weeks by bioinformatic analysis. We identified 1915 differentially expressed mRNAs in the spinal cord, of which 4, 1909, and 2 were differentially expressed at 1, 4, and 8 weeks after sciatic nerve injury, respectively. Sequencing results indicated that the number of differentially expressed mRNAs in the spinal cord was highest at 4 weeks after sciatic nerve injury. These mRNAs were associated with the cellular response to lipid, ATP metabolism, energy coupled proton transmembrane transport, nuclear transcription factor complex, vacuolar proton-transporting V-type ATPase complex, inner mitochondrial membrane protein complex, tau protein binding, NADH dehydrogenase activity and hydrogen ion transmembrane transporter activity. Of these mRNAs, Sgk1, Neurturin and Gpnmb took part in cell growth and development. Pathway analysis showed that these mRNAs were mainly involved in aldosterone-regulated sodium reabsorption, oxidative phosphorylation and collecting duct acid secretion. Functional assessment indicated that these mRNAs were associated with inflammation and cell morphology development. Our findings show that the number and type of spinal cord mRNAs involved in changes at different time points after peripheral nerve injury were different. 
The number of differentially expressed mRNAs in the spinal cord was highest at 4 weeks after sciatic nerve injury. These results provide reference data for finding new targets for the treatment of peripheral nerve injury, and for further gene therapy studies of peripheral nerve injury and repair. The study procedures were approved by the Ethics Committee of the Peking University People's Hospital (approval No. 2017PHC004) on March 5, 2017.
Affiliation(s)
- Jian Weng
- Department of Orthopedics and Trauma, Peking University People's Hospital, Beijing; Department of Bone & Joint Surgery, Peking University Shenzhen Hospital, Shenzhen, Guangdong Province, China
- Dong-Dong Li
- Department of Orthopedics and Trauma, Peking University People's Hospital, Beijing; Department of Surgery, the 517th Hospital of the People's Liberation Army, Xinzhou, Shanxi Province, China
- Bao-Guo Jiang
- Department of Orthopedics and Trauma, Peking University People's Hospital, Beijing, China
- Xiao-Feng Yin
- Department of Orthopedics and Trauma, Peking University People's Hospital, Beijing, China
|
15
|
Grant AD, Vail P, Padi M, Witkiewicz AK, Knudsen ES. Interrogating Mutant Allele Expression via Customized Reference Genomes to Define Influential Cancer Mutations. Sci Rep 2019; 9:12766. [PMID: 31484939 PMCID: PMC6726654 DOI: 10.1038/s41598-019-48967-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/10/2019] [Accepted: 08/12/2019] [Indexed: 11/16/2022]
Abstract
Genetic alterations are essential for cancer initiation and progression. However, differentiating mutations that drive the tumor phenotype from mutations that do not affect tumor fitness remains a fundamental challenge in cancer biology. To better understand the impact of a given mutation within cancer, RNA-sequencing data were used to categorize mutations based on their allelic expression. For this purpose, we developed the MAXX (Mutation Allelic Expression Extractor) software, which is highly effective at delineating the allelic expression of both single nucleotide variants and small insertions and deletions. Results from MAXX demonstrated that mutations can be separated into three groups: expression of the mutant allele, lack of expression from both alleles, or expression of only the wild-type allele. By taking into consideration the allelic expression patterns of genes that are mutated in pancreatic ductal adenocarcinoma (PDAC), it was possible to increase the sensitivity of widely used driver mutation detection methods, as well as to identify subtypes that have prognostic significance and are associated with sensitivity to select classes of therapeutic agents in cell culture. Thus, differentiating mutations based on their mutant allele expression via MAXX represents a means to parse somatic variants in tumor genomes, helping to elucidate a gene’s respective role in cancer.
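The three-way grouping described in this abstract can be sketched as a simple rule on allele-specific read counts. This is an illustration only, not the MAXX software: the function name, the group labels, and the `min_reads` threshold are hypothetical.

```python
def classify_mutation(ref_reads, alt_reads, min_reads=1):
    """Assign a mutation to one of three allelic-expression groups.

    ref_reads / alt_reads: RNA-seq reads supporting the wild-type / mutant allele.
    min_reads: hypothetical minimum read support for calling an allele expressed.
    """
    if alt_reads >= min_reads:
        return "mutant_expressed"
    if ref_reads >= min_reads:
        return "wild_type_only"
    return "not_expressed"


# Hypothetical read counts for three mutations, given as (ref_reads, alt_reads).
examples = {"mutation_a": (40, 35), "mutation_b": (50, 0), "mutation_c": (0, 0)}
groups = {name: classify_mutation(ref, alt) for name, (ref, alt) in examples.items()}
```

In practice a higher `min_reads` value would likely be needed to avoid treating one or two sequencing errors as evidence of mutant allele expression.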
Affiliation(s)
- Adam D Grant
- University of Arizona Cancer Center, Tucson, AZ, 85719, USA
- Paris Vail
- University of Arizona Cancer Center, Tucson, AZ, 85719, USA
- Megha Padi
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85719, USA
- Erik S Knudsen
- Department of Molecular and Cellular Biology, Roswell Park Cancer Center, Buffalo, NY, 14263, USA.
|
16
|
Mose LE, Perou CM, Parker JS. Improved indel detection in DNA and RNA via realignment with ABRA2. Bioinformatics 2019; 35:2966-2973. [PMID: 30649250 PMCID: PMC6735753 DOI: 10.1093/bioinformatics/btz033] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 06/08/2018] [Revised: 11/08/2018] [Accepted: 01/10/2019] [Indexed: 12/30/2022]
Abstract
MOTIVATION: Genomic variant detection from next-generation sequencing has become established as an extremely important component of research and clinical diagnoses in both cancer and Mendelian disorders. Insertions and deletions (indels) are a common source of variation and can frequently impact functionality, thus making their detection vitally important. While substantial effort has gone into detecting indels from DNA, there is still opportunity for improvement. Further, detection of indels from RNA-Seq data has largely been an afterthought and offers another critical area for variant detection. RESULTS: We present here ABRA2, a redesign of the original ABRA implementation that offers support for realignment of both RNA and DNA short reads. The process results in improved accuracy and scalability including support for human whole genomes. Results demonstrate substantial improvement in indel detection for a variety of data types, including those that were not previously supported by ABRA. Further, ABRA2 results in broad improvements to variant calling accuracy across a wide range of post-processing workflows including whole genomes, targeted exomes and transcriptome sequencing. AVAILABILITY AND IMPLEMENTATION: ABRA2 is implemented in a combination of Java and C/C++ and is freely available to all from: https://github.com/mozack/abra2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lisle E Mose
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles M Perou
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Joel S Parker
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
17
|
Batcha AMN, Bamopoulos SA, Kerbs P, Kumar A, Jurinovic V, Rothenberg-Thurley M, Ksienzyk B, Philippou-Massier J, Krebs S, Blum H, Schneider S, Konstandin N, Bohlander SK, Heckman C, Kontro M, Hiddemann W, Spiekermann K, Braess J, Metzeler KH, Greif PA, Mansmann U, Herold T. Allelic Imbalance of Recurrently Mutated Genes in Acute Myeloid Leukaemia. Sci Rep 2019; 9:11796. [PMID: 31409822 PMCID: PMC6692371 DOI: 10.1038/s41598-019-48167-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/03/2019] [Accepted: 07/29/2019] [Indexed: 12/24/2022]
Abstract
The patho-mechanism of somatic driver mutations in cancer usually involves transcription, but the proportion of mutations and wild-type alleles transcribed from DNA to RNA is largely unknown. We systematically compared the variant allele frequencies of recurrently mutated genes in DNA and RNA sequencing data of 246 acute myeloid leukaemia (AML) patients. We observed that 95% of all detected variants were transcribed while the rest were not detectable in RNA sequencing with a minimum read-depth cut-off (10x). Our analysis focusing on 11 genes harbouring recurring mutations demonstrated allelic imbalance (AI) in most patients. GATA2, RUNX1, TET2, SRSF2, IDH2, PTPN11, WT1, NPM1 and CEBPA showed significant AIs. While the effect size was small in general, GATA2 exhibited the largest allelic imbalance. By pooling heterogeneous data from three independent AML cohorts with paired DNA and RNA sequencing (N = 253), we could validate the preferential transcription of GATA2-mutated alleles. Differential expression analysis of the genes with significant AI showed no significant differential gene and isoform expression for the mutated genes, between mutated and wild-type patients. In conclusion, our analyses identified AI in nine out of eleven recurrently mutated genes. AI might be a common phenomenon in AML which potentially contributes to leukaemogenesis.
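The comparison of variant allele frequencies (VAFs) between DNA and RNA can be illustrated with a small sketch. The exact binomial test used here is a stand-in chosen for simplicity, not necessarily the authors' statistical method; the function names are hypothetical, and applying the 10x read-depth cut-off to both DNA and RNA is an assumption.

```python
from math import comb


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allelic_imbalance(dna_alt, dna_depth, rna_alt, rna_depth, min_depth=10):
    """Compare RNA and DNA VAFs for one variant; returns None below the depth cut-off."""
    if dna_depth < min_depth or rna_depth < min_depth:
        return None  # assumption: the cut-off is applied to both data types
    dna_vaf = dna_alt / dna_depth
    rna_vaf = rna_alt / rna_depth
    return {
        "dna_vaf": dna_vaf,
        "rna_vaf": rna_vaf,
        "shift": rna_vaf - dna_vaf,  # positive: mutant allele over-represented in RNA
        # Tests the RNA mutant read count against the DNA VAF as the expected proportion.
        "p_value": binomial_two_sided_p(rna_alt, rna_depth, dna_vaf),
    }


# Hypothetical counts: balanced in DNA (50/100), skewed toward the mutant allele in RNA (19/20).
result = allelic_imbalance(50, 100, 19, 20)
```

A positive `shift` with a small `p_value` would correspond to preferential transcription of the mutated allele, as the abstract reports for GATA2.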
Affiliation(s)
- Aarif M N Batcha
- Institute of Medical Data Processing, Biometrics and Epidemiology (IBE), Faculty of Medicine, LMU Munich, Munich, Germany; Data Integration for Future Medicine (DiFuture, www.difuture.de), LMU Munich, Munich, Germany
- Stefanos A Bamopoulos
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
- Paul Kerbs
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
- Ashwini Kumar
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Vindi Jurinovic
- Institute of Medical Data Processing, Biometrics and Epidemiology (IBE), Faculty of Medicine, LMU Munich, Munich, Germany; Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
- Maja Rothenberg-Thurley
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
- Bianka Ksienzyk
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
- Julia Philippou-Massier
- Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, University of Munich, Munich, Germany
- Stefan Krebs
- Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, University of Munich, Munich, Germany
- Helmut Blum
- Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, University of Munich, Munich, Germany
- Stephanie Schneider
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; Institute of Human Genetics, University Hospital, LMU Munich, Munich, Germany
- Nikola Konstandin
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
- Stefan K Bohlander
- Leukaemia and Blood Cancer Research Unit, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- Caroline Heckman
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Mika Kontro
- Department of Haematology, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland
- Wolfgang Hiddemann
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Karsten Spiekermann
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jan Braess
- Department of Oncology and Hematology, Hospital Barmherzige Brüder, Regensburg, Germany
- Klaus H Metzeler
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Philipp A Greif
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ulrich Mansmann
- Institute of Medical Data Processing, Biometrics and Epidemiology (IBE), Faculty of Medicine, LMU Munich, Munich, Germany; Data Integration for Future Medicine (DiFuture, www.difuture.de), LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Tobias Herold
- Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Research Unit Apoptosis in Hematopoietic Stem Cells, Helmholtz Zentrum München, German Research Center for Environmental Health (HMGU), Munich, Germany
|
18
|
Iacoangeli A, Al Khleifat A, Sproviero W, Shatunov A, Jones AR, Morgan SL, Pittman A, Dobson RJ, Newhouse SJ, Al-Chalabi A. DNAscan: personal computer compatible NGS analysis, annotation and visualisation. BMC Bioinformatics 2019; 20:213. [PMID: 31029080 PMCID: PMC6487045 DOI: 10.1186/s12859-019-2791-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/02/2018] [Accepted: 04/02/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Next Generation Sequencing (NGS) is a commonly used technology for studying the genetic basis of biological processes and it underpins the aspirations of precision medicine. However, there are significant challenges when dealing with NGS data. Firstly, a huge number of bioinformatics tools exist for a wide range of uses, so designing an analysis pipeline is challenging. Secondly, NGS analysis is computationally intensive and requires expensive infrastructure; many medical and research centres do not have adequate high performance computing facilities, and cloud computing is not always an option due to privacy and ownership issues. Finally, the interpretation of the results is not trivial and most available pipelines lack the utilities to support this crucial step. RESULTS: We have therefore developed a fast and efficient bioinformatics pipeline that allows for the analysis of DNA sequencing data, while requiring little computational effort and memory usage. DNAscan can analyse a whole exome sequencing sample in 1 h and a 40x whole genome sequencing sample in 13 h, on a midrange computer. The pipeline can look for single nucleotide variants, small indels, structural variants, repeat expansions and viral genetic material (or any other organism). Its results are annotated using a customisable variety of databases and are available for an on-the-fly visualisation with a local deployment of the gene.iobio platform. DNAscan is implemented in Python. Its code and documentation are available on GitHub: https://github.com/KHP-Informatics/DNAscan. Instructions for an easy and fast deployment with Docker and Singularity are also provided on GitHub. CONCLUSIONS: DNAscan is an extremely fast and computationally efficient pipeline for analysis, visualization and interpretation of NGS data. It is designed to provide a powerful and easy-to-use tool for applications in biomedical research and diagnostic medicine, at minimal computational cost. 
Its comprehensive approach will maximise the potential audience of users, bringing such analyses within the reach of non-specialist laboratories, and those from centres with limited funding available.
Affiliation(s)
- A Iacoangeli
- Department of Biostatistics and Health Informatics, King's College London, London, UK.
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK.
- A Al Khleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
- W Sproviero
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
- A Shatunov
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
- A R Jones
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
- S L Morgan
- Department of Molecular Neuroscience, UCL, Institute of Neurology, London, UK
- A Pittman
- Department of Molecular Neuroscience, UCL, Institute of Neurology, London, UK
- R J Dobson
- Department of Biostatistics and Health Informatics, King's College London, London, UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, UK
- National Institute for Health Research (NIHR) Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- S J Newhouse
- Department of Biostatistics and Health Informatics, King's College London, London, UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, UK
- National Institute for Health Research (NIHR) Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- A Al-Chalabi
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
- King's College Hospital, Bessemer Road, London, SE5 9RS, UK
|
19
|
Prodduturi N, Bhagwate A, Kocher JPA, Sun Z. Indel sensitive and comprehensive variant/mutation detection from RNA sequencing data for precision medicine. BMC Med Genomics 2018; 11:67. [PMID: 30255803 PMCID: PMC6157028 DOI: 10.1186/s12920-018-0391-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/28/2022]
Abstract
Background: RNA-seq is the most commonly used sequencing application. Not only does it measure gene expression, but it is also an excellent medium for detecting important variants such as single nucleotide variants (SNVs), insertions/deletions (indels), and fusion transcripts. However, detecting these variants from RNA-seq is challenging and complex. Here we describe a sensitive and accurate analytical pipeline that detects various mutations at once for translational precision medicine. Methods: The pipeline incorporates the most sensitive aligners for indels in RNA-seq, best practices for data preprocessing and variant calling, and STAR-Fusion for chimeric transcripts. Variants/mutations are annotated, and key genes can be extracted for further investigation and clinical actions. Three datasets were used to evaluate the performance of the pipeline for SNVs, indels and fusion transcripts. Results: For the well-defined variants from NA12878 by the GIAB project, sensitivities of about 95% and 80% were obtained for SNVs and indels, respectively, in matching RNA-seq. Comparison with other variant-specific tools showed good performance of the pipeline. For the lung cancer dataset with 41 known oncogenic mutations, 39 were detected by the pipeline with the STAR aligner and all 41 with the GSNAP aligner. An actionable EML4-ALK fusion was also detected in one of the tumors, which also demonstrated outlier ALK expression. For 9 fusions spiked into RNA-seq libraries at different concentrations, the pipeline detected all of them in unfiltered results, although some at very low concentrations may be missed when filtering is applied. Conclusions: The new RNA-seq workflow is an accurate and comprehensive mutation profiler. Key or actionable mutations are reliably detected from RNA-seq, which makes it a practical alternative source for personalized medicine. 
Electronic supplementary material The online version of this article (10.1186/s12920-018-0391-5) contains supplementary material, which is available to authorized users.
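The sensitivity figures quoted in this abstract follow from a simple ratio of detected to known variants. A minimal sketch using the lung cancer dataset counts from the abstract; the function name is an assumption made for illustration.

```python
def sensitivity(detected, total_known):
    """Fraction of known variants that were detected (true positives / all known positives)."""
    if total_known <= 0:
        raise ValueError("total_known must be positive")
    return detected / total_known


# Counts from the abstract: 41 known mutations, 39 detected with STAR, all 41 with GSNAP.
star_sensitivity = sensitivity(39, 41)   # about 0.951
gsnap_sensitivity = sensitivity(41, 41)  # 1.0
```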
Affiliation(s)
- Naresh Prodduturi
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, USA
- Aditya Bhagwate
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, USA
- Jean-Pierre A Kocher
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, USA
- Zhifu Sun
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, USA.
|
20
|
Michno JM, Stupar RM. The importance of genotype identity, genetic heterogeneity, and bioinformatic handling for properly assessing genomic variation in transgenic plants. BMC Biotechnol 2018; 18:38. [PMID: 29859067 PMCID: PMC5984819 DOI: 10.1186/s12896-018-0447-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/05/2018] [Accepted: 05/18/2018] [Indexed: 12/30/2022]
Abstract
BACKGROUND: The advent of -omics technologies has enabled the resolution of fine molecular differences among individuals within a species. DNA sequence variations, such as single nucleotide polymorphisms or small deletions, can be tabulated for many kinds of genotype comparisons. However, experimental designs and analytical approaches are replete with ways to overestimate the level of variation present within a given sample. Analytical pipelines that neither apply proper thresholds nor assess reproducibility among samples are susceptible to calling false-positive variants. Furthermore, issues with sample genotype identity, or failure to account for heterogeneity in reference genotypes, may lead to standing variants being misinterpreted as polymorphisms derived de novo. RESULTS: A recent publication that analyzed RNA-sequencing data from three transgenic soybean event series appeared to overestimate the number of sequence variants identified in plants that were exposed to a tissue culture-based transformation process. We reanalyzed these data with a stringent set of criteria and demonstrate three different factors that lead to variant overestimation: issues related to the genetic identity of the background genotype, unaccounted genetic heterogeneity in the reference genome, and insufficient bioinformatic filtering. CONCLUSIONS: This study serves as a cautionary tale for users of genomic and transcriptomic data who wish to assess the molecular variation attributable to tissue culture and transformation processes. Moreover, accounting for the factors that lead to sequence variant overestimation is equally applicable to samples derived from other germplasm sources, including chemical or irradiation mutagenesis and genome engineering (e.g., CRISPR) processes.
Affiliation(s)
- Jean-Michel Michno
- Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN USA
- Department of Agronomy and Plant Genetics, University of Minnesota, 1991 Upper Buford Circle, 411 Borlaug Hall, Saint Paul, MN 55108 USA
- Robert M. Stupar
- Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN USA
- Department of Agronomy and Plant Genetics, University of Minnesota, 1991 Upper Buford Circle, 411 Borlaug Hall, Saint Paul, MN 55108 USA
21
Yang R, Van Etten JL, Dehm SM. Indel detection from DNA and RNA sequencing data with transIndel. BMC Genomics 2018; 19:270. [PMID: 29673323 PMCID: PMC5909256 DOI: 10.1186/s12864-018-4671-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 10/18/2017] [Accepted: 04/13/2018] [Indexed: 12/18/2022]
Abstract
Background: Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data, but their transcriptional consequences remain unexplored because medium-sized and large indels are difficult to discriminate from splicing events in RNA-seq data. Results: Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Conclusions: Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology. Electronic supplementary material: The online version of this article (10.1186/s12864-018-4671-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rendong Yang
- The Hormel Institute, University of Minnesota, 801 16th AVE NE, Austin, MN, 55912, USA; Masonic Cancer Center, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA
- Jamie L Van Etten
- Masonic Cancer Center, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA
- Scott M Dehm
- Masonic Cancer Center, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA; Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, 55455, USA
22
RNA-seq Reveals the Overexpression of IGSF9 in Endometrial Cancer. J Oncol 2018; 2018:2439527. [PMID: 29666643 PMCID: PMC5832105 DOI: 10.1155/2018/2439527] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/25/2017] [Revised: 01/03/2018] [Accepted: 01/17/2018] [Indexed: 12/20/2022]
Abstract
We performed RNA-seq on an Illumina platform for 7 patients with endometrioid endometrial carcinoma for which both tumor tissue and adjacent noncancer tissue were available. A total of 66 genes were differentially expressed with significance level at adjusted p value < 0.01. Using the gene functional classification tool in the NIH DAVID bioinformatics resource, 5 genes were found to be the only enriched group out of that list of genes. The gene IGSF9 was chosen for further characterization with immunohistochemical staining of a larger cohort of human endometrioid carcinoma tissues. The expression level of IGSF9 in cancer cells was significantly higher than that in control glandular cells in paired tissue samples from the same patients (p = 0.008) or in overall comparison between cancer and the control (p = 0.003). IGSF9 expression is higher in patients with myometrium invasion relative to those without invasion (p = 0.015). Reanalysis of RNA-seq dataset from The Cancer Genome Atlas shows higher expression of IGSF9 in endometrial cancer versus normal control and expression was associated with poor prognosis. These results suggest IGSF9 as a new biomarker in endometrial cancer and warrant further studies on its function, mechanism of action, and potential clinical utility.
23
Rubinsteyn A, Kodysh J, Hodes I, Mondet S, Aksoy BA, Finnigan JP, Bhardwaj N, Hammerbacher J. Computational Pipeline for the PGV-001 Neoantigen Vaccine Trial. Front Immunol 2018; 8:1807. [PMID: 29403468 PMCID: PMC5778604 DOI: 10.3389/fimmu.2017.01807] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 08/06/2017] [Accepted: 11/30/2017] [Indexed: 12/17/2022]
Abstract
This paper describes the sequencing protocol and computational pipeline for the PGV-001 personalized vaccine trial. PGV-001 is a therapeutic peptide vaccine targeting neoantigens identified from patient tumor samples. Peptides are selected by a computational pipeline that identifies mutations from tumor/normal exome sequencing and ranks mutant sequences by a combination of predicted Class I MHC affinity and abundance estimated from tumor RNA. The personalized genomic vaccine (PGV) pipeline is modular and consists of independently usable tools and software libraries. We hope that the functionality of these tools may extend beyond the specifics of the PGV-001 trial and enable other research groups in their own neoantigen investigations.
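The ranking step described in this abstract, which combines predicted Class I MHC affinity with RNA-derived abundance, can be illustrated with a toy example. The abstract does not give the actual scoring formula, so everything below is an assumption made for illustration: the `Candidate` fields, the 500 nM reference affinity, the multiplicative score and the placeholder peptide labels are hypothetical and should not be read as the PGV-001 pipeline's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str            # placeholder identifier, not a real peptide
    ic50_nm: float        # predicted MHC class I affinity (lower binds better)
    rna_reads: int        # tumor RNA reads supporting the mutant allele


def score(c: Candidate, ref_ic50_nm: float = 500.0) -> float:
    """Hypothetical combined score: abundance weighted by a binding term.

    The binding term is 1.0 for very strong binders, 0.5 at the reference
    affinity, and tends to 0 for weak binders. This is an assumed form.
    """
    binding = 1.0 / (1.0 + c.ic50_nm / ref_ic50_nm)
    return c.rna_reads * binding


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("pep_A", ic50_nm=50.0, rna_reads=10),
    Candidate("pep_B", ic50_nm=500.0, rna_reads=30),
    Candidate("pep_C", ic50_nm=5000.0, rna_reads=40),
]

for c in rank(candidates):
    print(f"{c.label}: score={score(c):.2f}")
```

In this toy setting a moderately binding but well-expressed candidate can outrank a strong binder with little RNA support, which is the kind of trade-off a combined ranking is meant to capture.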
Affiliation(s)
- Alex Rubinsteyn
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Julia Kodysh
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Isaac Hodes
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Sebastien Mondet
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Bulent Arman Aksoy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Microbiology and Immunology, Medical University of South Carolina, Charleston, SC, United States
- John P Finnigan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Icahn School of Medicine at Mount Sinai, Tisch Cancer Institute, New York, NY, United States
- Nina Bhardwaj
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Icahn School of Medicine at Mount Sinai, Tisch Cancer Institute, New York, NY, United States
- Jeffrey Hammerbacher
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Microbiology and Immunology, Medical University of South Carolina, Charleston, SC, United States
24
Cieślik M, Chinnaiyan AM. Cancer transcriptome profiling at the juncture of clinical translation. Nat Rev Genet 2017; 19:93-109. [PMID: 29279605 DOI: 10.1038/nrg.2017.96] [Citation(s) in RCA: 156] [Impact Index Per Article: 22.3] [Indexed: 12/13/2022]
Abstract
Methodological breakthroughs over the past four decades have repeatedly revolutionized transcriptome profiling. Using RNA sequencing (RNA-seq), it has now become possible to sequence and quantify the transcriptional outputs of individual cells or thousands of samples. These transcriptomes provide a link between cellular phenotypes and their molecular underpinnings, such as mutations. In the context of cancer, this link represents an opportunity to dissect the complexity and heterogeneity of tumours and to discover new biomarkers or therapeutic strategies. Here, we review the rationale, methodology and translational impact of transcriptome profiling in cancer.
Affiliation(s)
- Marcin Cieślik
- Michigan Center for Translational Pathology, University of Michigan; Department of Pathology, University of Michigan
- Arul M Chinnaiyan
- Michigan Center for Translational Pathology, University of Michigan; Department of Pathology, University of Michigan; Comprehensive Cancer Center, University of Michigan; Department of Urology, University of Michigan; Howard Hughes Medical Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
25
Audoux J, Salson M, Grosset CF, Beaumeunier S, Holder JM, Commes T, Philippe N. SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines. BMC Bioinformatics 2017; 18:428. [PMID: 28969586 PMCID: PMC5623974 DOI: 10.1186/s12859-017-1831-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/20/2017] [Accepted: 09/08/2017] [Indexed: 11/10/2022]
Abstract
Background: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. To provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well-informed choices. Results: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, to give an accurate evaluation of performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved. Conclusion: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process.
Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of datasets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. SimBA software and datasets are available at http://cractools.gforge.inria.fr/softwares/simba/. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1831-5) contains supplementary material, which is available to authorized users.
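The comparison this abstract describes, scoring a pipeline's calls against the list of simulated variants and then checking whether combining pipelines helps, can be sketched with plain set operations. This is not BenchCT's interface: the `evaluate` function, the `(chrom, pos, ref, alt)` variant key and the toy call sets are assumptions made for illustration only.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt), an assumed key


def evaluate(calls: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Return (precision, recall) of a call set against a simulated truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data: variants inserted by a simulator, and calls from two pipelines.
truth: set[Variant] = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "T", "TA"),
    ("chr2", 75, "C", "G"),
}
pipeline_a: set[Variant] = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr3", 10, "A", "G"),
}
pipeline_b: set[Variant] = {
    ("chr1", 100, "A", "T"),
    ("chr2", 50, "T", "TA"),
    ("chr3", 10, "A", "G"),
    ("chr4", 5, "G", "A"),
}

for name, calls in [
    ("A", pipeline_a),
    ("B", pipeline_b),
    ("A union B", pipeline_a | pipeline_b),
    ("A intersect B", pipeline_a & pipeline_b),
]:
    precision, recall = evaluate(calls, truth)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the union of the two call sets raises recall over either pipeline alone, while the intersection lowers it; which combination helps in practice depends on the pipelines and the biological question, as the abstract notes.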
Affiliation(s)
- Jérôme Audoux
- SeqOne, IRMB, CHRU de Montpellier - Hopital St Eloi, 80 avenue Augustin Fliche, Montpellier, 34295, France; Institute of Computational Biology, Montpellier, 860, Rue Saint-Priest, Montpellier Cedex 5, 34095, France
- Mikaël Salson
- University Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, F-59000, France
- Sacha Beaumeunier
- SeqOne, IRMB, CHRU de Montpellier - Hopital St Eloi, 80 avenue Augustin Fliche, Montpellier, 34295, France; Institute of Computational Biology, Montpellier, 860, Rue Saint-Priest, Montpellier Cedex 5, 34095, France
- Jean-Marc Holder
- SeqOne, IRMB, CHRU de Montpellier - Hopital St Eloi, 80 avenue Augustin Fliche, Montpellier, 34295, France; Institute of Computational Biology, Montpellier, 860, Rue Saint-Priest, Montpellier Cedex 5, 34095, France
- Thérèse Commes
- SeqOne, IRMB, CHRU de Montpellier - Hopital St Eloi, 80 avenue Augustin Fliche, Montpellier, 34295, France; Institute of Computational Biology, Montpellier, 860, Rue Saint-Priest, Montpellier Cedex 5, 34095, France
- Nicolas Philippe
- SeqOne, IRMB, CHRU de Montpellier - Hopital St Eloi, 80 avenue Augustin Fliche, Montpellier, 34295, France; Institute of Computational Biology, Montpellier, 860, Rue Saint-Priest, Montpellier Cedex 5, 34095, France
26
Oikkonen L, Lise S. Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection. Wellcome Open Res 2017; 2:6. [PMID: 28239666 PMCID: PMC5322827 DOI: 10.12688/wellcomeopenres.10501.2] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/14/2023]
Abstract
RNA-seq (transcriptome sequencing) is primarily considered a method of gene expression analysis but it can also be used to detect DNA variants in expressed regions of the genome. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline so it is ideal when there are only limited time or computational resources available.
Affiliation(s)
- Laura Oikkonen
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Stefano Lise
- Centre for Evolution and Cancer, The Institute of Cancer Research, Sutton, UK
27
Oikkonen L, Lise S. Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection. Wellcome Open Res 2017. [PMID: 28239666 DOI: 10.12688/wellcomeopenres.10501.1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022]
Abstract
Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile alternative to whole-genome sequencing. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline so it is ideal when there are only limited time or computational resources available.
Affiliation(s)
- Laura Oikkonen
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Stefano Lise
- Centre for Evolution and Cancer, The Institute of Cancer Research, Sutton, UK