1
Lin Z, Zhang D, Tao Q, Shi D, Haffari G, Wu Q, He M, Ge Z. Medical visual question answering: A survey. Artif Intell Med 2023; 143:102611. [PMID: 37673579 DOI: 10.1016/j.artmed.2023.102611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 05/25/2023] [Accepted: 06/06/2023] [Indexed: 09/08/2023]
Abstract
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges. Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer. Although general-domain VQA has been extensively studied, medical VQA still needs specific investigation and exploration due to its task features. In the first part of this survey, we collect and discuss the publicly available medical VQA datasets to date in terms of data source, data quantity, and task features. In the second part, we review the approaches used in medical VQA tasks. We summarize and discuss their techniques, innovations, and potential improvements. In the last part, we analyze some medical-specific challenges for the field and discuss future research directions. Our goal is to provide comprehensive and helpful information for researchers interested in the medical visual question answering field and encourage them to conduct further research in this field.
Affiliation(s)
- Zhihong Lin
  - Faculty of Engineering, Monash University, Clayton, VIC, 3800, Australia.
- Donghao Zhang
  - eResearch Center, Monash University, Clayton, VIC, 3800, Australia.
- Qingyi Tao
  - NVIDIA AI Technology Center, 038988, Singapore.
- Danli Shi
  - Centre for Eye and Vision Research, The Hong Kong Polytechnic University, Kowloon, TU428, Hong Kong SAR.
- Gholamreza Haffari
  - Faculty of Information Technology, Monash University, Clayton, 3800, VIC, Australia.
- Qi Wu
  - Australian Centre for Robotic Vision, The University of Adelaide, Adelaide, SA 5005, Australia.
- Mingguang He
  - Centre for Eye and Vision Research, The Hong Kong Polytechnic University, Kowloon, TU428, Hong Kong SAR.
- Zongyuan Ge
  - Faculty of Information Technology, Monash University, Clayton, 3800, VIC, Australia.
  - Airdoc Research, Melbourne, VIC, 3000, Australia.
  - Monash-NVIDIA AI Tech Centre, Melbourne, VIC, 3000, Australia.
2
Akbarzadeh Khorshidi H, Hassani-Mahmooei B, Haffari G. An Interpretable Algorithm on Post-injury Health Service Utilization Patterns to Predict Injury Outcomes. J Occup Rehabil 2020; 30:331-342. [PMID: 31620997 DOI: 10.1007/s10926-019-09863-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Purpose: Post-injury health service utilization (HSU) contributes to injury outcomes, but few studies have investigated their relationship. This study aims to group patients injured in transport accidents based on minimal historical information about their HSU so that the groups are meaningfully associated with the outcome of interest. Methods: The data include 20,692 injured patients who had compensation claims over 3 years. We propose a hybrid approach combining unsupervised and supervised machine learning methods. Based on the first week of post-injury data, we identify a clustering of patients that is best associated with total cost to recovery and discover HSU patterns. This allows developing models that accurately predict the outcome of interest using the discovered patterns. Furthermore, we propose to use decision tree classifiers to accurately classify future patients into the discovered clusters using their first-week post-injury information. Results: Our hybrid approach identified eight patient groups. The compactness of the resulting clusters, assessed by the Average Silhouette Width metric, is 0.71, indicating well-defined clusters. The resulting patient groups are highly predictive of injury outcomes. They improve cost predictability more than twofold in comparison with predictors such as gender, age and injury type. These groups also have a substantial association with patients' recovery. The transparency and interpretability of decision trees allow the resulting classification rules to be integrated conveniently into operational processes. Conclusions: This study provides a framework to discover knowledge and useful insights for health service providers and policy makers to control injury outcomes, and consequently to reduce the severity of transport accidents.
Affiliation(s)
- Hadi Akbarzadeh Khorshidi
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia.
  - Institute for Safety Compensation and Recovery Research, Monash University, Melbourne, Australia.
- Behrooz Hassani-Mahmooei
  - Insurance, Work and Health Group, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC, Australia
- Gholamreza Haffari
  - Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
3
Li F, Wang Y, Li C, Marquez-Lago TT, Leier A, Rawlings ND, Haffari G, Revote J, Akutsu T, Chou KC, Purcell AW, Pike RN, Webb GI, Ian Smith A, Lithgow T, Daly RJ, Whisstock JC, Song J. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform 2019; 20:2150-2166. [PMID: 30184176 PMCID: PMC6954447 DOI: 10.1093/bib/bby077] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 06/14/2018] [Revised: 07/26/2018] [Accepted: 08/01/2018] [Indexed: 01/06/2023] Open
Abstract
The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.
Affiliation(s)
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yanan Wang
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland
- Tatiana T Marquez-Lago
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Neil D Rawlings
  - EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gholamreza Haffari
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jerico Revote
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Anthony W Purcell
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Robert N Pike
  - La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- A Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Trevor Lithgow
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia
- Roger J Daly
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- James C Whisstock
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
4
Aryal S, Ting KM, Washio T, Haffari G. A comparative study of data-dependent approaches without learning in measuring similarities of data objects. Data Min Knowl Discov 2019. [DOI: 10.1007/s10618-019-00660-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
5
Khorshidi HA, Haffari G, Aickelin U, Hassani-Mahmooei B. Early Identification of Undesirable Outcomes for Transport Accident Injured Patients Using Semi-Supervised Clustering. Stud Health Technol Inform 2019; 266:1-6. [PMID: 31397293 DOI: 10.3233/shti190764] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
Identifying patient groups with unwanted outcomes at an early stage is crucial to providing the most appropriate level of care. In this study, we aim to find distinctive patterns in the health service use (HSU) of transport accident injured patients within the first week post-injury, focusing on patterns that are associated with the outcome of interest. To recognize these patterns, we propose a multi-objective optimization model that minimizes the k-medians cost function and the regression error simultaneously. Thus, we use a semi-supervised clustering approach to identify patient groups based on HSU patterns and their association with total cost. To solve the optimization problem, we introduce an evolutionary algorithm using stochastic gradient descent and Pareto optimal solutions. As a result, we find the optimal clusters by minimizing both objective functions. The results show that the proposed semi-supervised approach identifies distinct groups of HSUs and contributes to predicting total cost. The experiments also demonstrate the performance of the multi-objective approach in comparison with single-objective approaches.
Affiliation(s)
- Hadi A Khorshidi
  - School of Computing & Information Systems, The University of Melbourne, Australia
- Uwe Aickelin
  - School of Computing & Information Systems, The University of Melbourne, Australia
6
Akbarzadeh Khorshidi H, Aickelin U, Haffari G, Hassani-Mahmooei B. Multi-objective semi-supervised clustering to identify health service patterns for injured patients. Health Inf Sci Syst 2019; 7:18. [PMID: 31523422 DOI: 10.1007/s13755-019-0080-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/30/2019] [Accepted: 08/17/2019] [Indexed: 10/26/2022] Open
Abstract
Purpose: This study develops a pattern recognition method that identifies patterns based on their similarity and their association with the outcome of interest. The practical purpose of developing this method is to group patients who are injured in transport accidents in the early stages post-injury. This grouping is based on distinctive patterns in health service use within the first week post-injury. The groups also provide predictive information about the total cost of the medication process. As a result, the groups of patients who have undesirable outcomes are identified as early as possible based on health service use patterns. Methods: We propose a multi-objective optimization model to group patients. One objective function is the cost function of k-medians clustering, which recognizes similar patterns. The other objective function is the cross-validated root-mean-square error, which examines the association with the total cost. The best grouping is obtained by minimizing both objective functions. As a result, the multi-objective optimization model is a semi-supervised clustering that learns health service use patterns in both unsupervised and supervised ways. We also introduce an evolutionary computation approach that includes stochastic gradient descent and Pareto optimal solutions to find the optimal solution. In addition, we use the decision tree method to reproduce the optimal groups using an interpretable classification model. Results: The results show that the proposed multi-objective semi-supervised clustering identifies distinct groups of health service uses and contributes to predicting the total cost. The performance of the multi-objective model has been examined using two metrics: the average silhouette width and the cross-validation error. The examination shows that the multi-objective model outperforms the single-objective ones. In addition, the interpretable classification model shows that imaging and therapeutic services are critical services in the first week post-injury for grouping injured patients. Conclusion: The proposed multi-objective semi-supervised clustering finds optimal clusters that are not only well separated from each other but also provide informative insights regarding the outcome of interest. It also overcomes two drawbacks of clustering methods: sensitivity to the initial cluster centers and the need to specify the number of clusters.
Affiliation(s)
- Hadi Akbarzadeh Khorshidi
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC 3010 Australia
- Uwe Aickelin
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC 3010 Australia
- Gholamreza Haffari
  - Faculty of Information Technology, Monash University, Melbourne, VIC Australia
- Behrooz Hassani-Mahmooei
  - Insurance, Work and Health Group, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC Australia
7
Rahman MS, Haffari G. Analyzing Tumor Heterogeneity by Incorporating Long-Range Mutational Influences and Multiple Sample Data into Heterogeneity Factorial Hidden Markov Model. J Comput Biol 2019; 26:985-1002. [PMID: 31120348 DOI: 10.1089/cmb.2018.0242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of the clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on the short read data of the sample generated using the next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. Existing methods to infer clones from noisy NGS data do not consider the presence of long-range mutational influences. Therefore, we develop a new model, called extended multiple sample tumor heterogeneity prediction by factorial Hidden Markov model (emHetFHMM), based on factorial hidden Markov models to infer clones and their proportions by capturing the long-range mutational influences. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from the previous work (PyClone, PhyloSub, and HetFHMM) based on both synthetic data and real cancer data from acute myeloid leukemia. Empirical results confirm that emHetFHMM infers clonal composition of a tumor sample more accurately than previous studies.
Affiliation(s)
- Mohammad S Rahman
  - Clayton School of Information Technology, Monash University, Clayton, Australia
- Gholamreza Haffari
  - Clayton School of Information Technology, Monash University, Clayton, Australia
8
Song J, Li F, Leier A, Marquez-Lago TT, Akutsu T, Haffari G, Chou KC, Webb GI, Pike RN, Hancock J. PROSPERous: high-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy. Bioinformatics 2019; 34:684-687. [PMID: 29069280 DOI: 10.1093/bioinformatics/btx670] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Received: 08/23/2017] [Accepted: 10/18/2017] [Indexed: 11/13/2022] Open
Abstract
Summary Proteases are enzymes that specifically cleave the peptide backbone of their target proteins. As an important type of irreversible post-translational modification, protein cleavage underlies many key physiological processes. When dysregulated, proteases' actions are associated with numerous diseases. Many proteases are highly specific, cleaving only those target substrates that present certain particular amino acid sequence patterns. Therefore, tools that successfully identify potential target substrates for proteases may also identify previously unknown, physiologically relevant cleavage sites, thus providing insights into biological processes and guiding hypothesis-driven experiments aimed at verifying protease-substrate interaction. In this work, we present PROSPERous, a tool for rapid in silico prediction of protease-specific cleavage sites in substrate sequences. Our tool is based on logistic regression models and uses different scoring functions and their pairwise combinations to subsequently predict potential cleavage sites. PROSPERous represents a state-of-the-art tool that enables fast, accurate and high-throughput prediction of substrate cleavage sites for 90 proteases. Availability and implementation http://prosperous.erc.monash.edu/. Contact jiangning.song@monash.edu or geoff.webb@monash.edu or r.pike@latrobe.edu.au. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiangning Song
  - Monash Centre for Data Science, Faculty of Information Technology
  - Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Clayton, VIC 3800, Australia
- Fuyi Li
  - Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Tatiana T Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology
- Robert N Pike
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Clayton, VIC 3800, Australia
  - La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
9
Song J, Li F, Takemoto K, Haffari G, Akutsu T, Chou KC, Webb GI. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J Theor Biol 2018; 443:125-137. [DOI: 10.1016/j.jtbi.2018.01.023] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 09/12/2017] [Revised: 01/17/2018] [Accepted: 01/18/2018] [Indexed: 10/18/2022]
10
Abstract
Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on short read data of the sample generated using next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. We develop a new model called HetFHMM, based on factorial hidden Markov models, to infer clones and their proportions from noisy NGS data. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from previous work (PyClone and PhyloSub) based on both synthetic data and real cancer data on acute myeloid leukemia. Empirical results confirm that HetFHMM infers clonal composition of a tumor sample more accurately than previous work.
Affiliation(s)
- Mohammad S Rahman
  - Clayton School of Information Technology, Monash University, Clayton, Australia
- Ann E Nicholson
  - Clayton School of Information Technology, Monash University, Clayton, Australia
- Gholamreza Haffari
  - Clayton School of Information Technology, Monash University, Clayton, Australia
11
Shrestha R, Hodzic E, Sauerwald T, Dao P, Wang K, Yeung J, Anderson S, Vandin F, Haffari G, Collins CC, Sahinalp SC. HIT'nDRIVE: patient-specific multidriver gene prioritization for precision oncology. Genome Res 2017; 27:1573-1588. [PMID: 28768687 PMCID: PMC5580716 DOI: 10.1101/gr.221218.117] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 01/31/2017] [Accepted: 07/06/2017] [Indexed: 12/12/2022]
Abstract
Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: "multihitting time," the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients' survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology.
Affiliation(s)
- Raunak Shrestha
  - Bioinformatics Training Program, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4
  - Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, British Columbia, Canada V6H 3Z6
- Ermin Hodzic
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
- Thomas Sauerwald
  - Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, United Kingdom
- Phuong Dao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Kendric Wang
  - Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, British Columbia, Canada V6H 3Z6
- Jake Yeung
  - Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, British Columbia, Canada V6H 3Z6
- Shawn Anderson
  - Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, British Columbia, Canada V6H 3Z6
- Fabio Vandin
  - Department of Information Engineering, University of Padova, 35131 Padova, Italy
- Gholamreza Haffari
  - Faculty of Information Technology, Monash University, Melbourne 3800, Australia
- Colin C Collins
  - Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, British Columbia, Canada V6H 3Z6
  - Department of Urologic Sciences, University of British Columbia, Vancouver, British Columbia, Canada V5Z 1M9
- S Cenk Sahinalp
  - Laboratory for Advanced Genome Analysis, Vancouver Prostate Centre, Vancouver, British Columbia, Canada V6H 3Z6
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47408, USA
12
Chai RC, McDonald MM, Terry RL, Kovačić N, Down JM, Pettitt JA, Mohanty ST, Shah S, Haffari G, Xu J, Gillespie MT, Rogers MJ, Price JT, Croucher PI, Quinn JMW. Melphalan modifies the bone microenvironment by enhancing osteoclast formation. Oncotarget 2017; 8:68047-68058. [PMID: 28978095 PMCID: PMC5620235 DOI: 10.18632/oncotarget.19152] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/05/2017] [Accepted: 06/02/2017] [Indexed: 11/25/2022] Open
Abstract
Melphalan is a cytotoxic chemotherapy used to treat patients with multiple myeloma (MM). Bone resorption by osteoclasts, by remodeling the bone surface, can reactivate dormant MM cells held in the endosteal niche to promote tumor development. Dormant MM cells can be reactivated after melphalan treatment; however, it is unclear whether melphalan treatment increases osteoclast formation to modify the endosteal niche. Melphalan treatment of mice for 14 days decreased bone volume and the endosteal bone surface, and this was associated with increases in osteoclast numbers. Bone marrow cells (BMC) from melphalan-treated mice formed more osteoclasts than BMCs from vehicle-treated mice, suggesting that osteoclast progenitors were increased. Melphalan also increased osteoclast formation in BMCs and RAW264.7 cells in vitro, which was prevented with the cell stress response (CSR) inhibitor KNK437. Melphalan also increased expression of the osteoclast regulator the microphthalmia-associated transcription factor (MITF), but not nuclear factor of activated T cells 1 (NFATc1). Melphalan increased expression of MITF-dependent cell fusion factors, dendritic cell-specific transmembrane protein (Dc-stamp) and osteoclast-stimulatory transmembrane protein (Oc-stamp) and increased cell fusion. Expression of osteoclast stimulator receptor activator of NFκB ligand (RANKL) was unaffected by melphalan treatment. These data suggest that melphalan stimulates osteoclast formation by increasing osteoclast progenitor recruitment and differentiation in a CSR-dependent manner. Melphalan-induced osteoclast formation is associated with bone loss and reduced endosteal bone surface. As well as affecting bone structure this may contribute to dormant tumor cell activation, which has implications for how melphalan is used to treat patients with MM.
Affiliation(s)
- Ryan C Chai
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
- Michelle M McDonald
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
- Rachael L Terry
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
- Nataša Kovačić
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia; Department of Anatomy, University of Zagreb, School of Medicine, Zagreb, Croatia
- Jenny M Down
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia; Bone Biology Group, Department of Human Metabolism, Medical School, University of Sheffield, Sheffield, United Kingdom
- Jessica A Pettitt
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
- Sindhu T Mohanty
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
- Shruti Shah
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
- Gholamreza Haffari
- Faculty of Information Technology, Monash University, Clayton, Australia
- Jiake Xu
- School of Pathology and Laboratory Medicine, The University of Western Australia, Nedlands, Australia
- Matthew T Gillespie
- Faculty of Medicine and Health Sciences, Monash University, Clayton, Australia
- Michael J Rogers
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
- John T Price
- College of Health and Biomedicine, Victoria University, St Albans, Australia; Australian Institute for Musculoskeletal Science (AIMSS), University of Melbourne, Victoria University and Western Health, St. Albans, Australia
- Peter I Croucher
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia; St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, Australia; School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, Australia
13
Kocbek S, Cavedon L, Martinez D, Bain C, Manus CM, Haffari G, Zukerman I, Verspoor K. Text mining electronic hospital records to automatically classify admissions against disease: Measuring the impact of linking data sources. J Biomed Inform 2016; 64:158-167. [PMID: 27742349 DOI: 10.1016/j.jbi.2016.10.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 06/06/2016] [Revised: 08/20/2016] [Accepted: 10/10/2016] [Indexed: 10/20/2022]
Abstract
OBJECTIVE: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. METHODS: Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the research question regarding the impact of the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. RESULTS: Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. CONCLUSION: Overall, linking data sources significantly improved classification performance for all the diseases examined. However, there is no single approach that suits all scenarios; the choice of the most effective combination of data sources depends on the specific disease to be classified.
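The evaluation steps named in this abstract (sub-sampling an unbalanced dataset, then scoring with Precision, Recall and F-Score) can be sketched in a few lines. This is a minimal illustration and not the authors' system: the abstract does not say which sub-sampling scheme was used, so random undersampling of the majority class is an assumption, and the function names `undersample` and `precision_recall_f1` are hypothetical.

```python
import random


def undersample(records, labels, seed=0):
    """Randomly drop majority-class records until both classes have equal size.

    Assumption: binary labels (1 = positive admission, 0 = negative).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [records[i] for i in kept], [labels[i] for i in kept]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Undersampling would typically be applied to the training portion only, so that the reported metrics still reflect the original class balance.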
Affiliation(s)
- Simon Kocbek
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia; School of Science, RMIT University, Melbourne, Australia; Department of Computing and Information Systems, University of Melbourne, Melbourne, Australia.
- David Martinez
- Department of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Christopher Bain
- Mercy Health, Heidelberg, Australia; Faculty of Information Technology, Monash University, Clayton, Australia
- Chris Mac Manus
- Health Informatics Department, Alfred Hospital, Melbourne, Australia; Now with OzeScribe, Melbourne, Australia
- Gholamreza Haffari
- Faculty of Information Technology, Monash University, Clayton, Australia
- Ingrid Zukerman
- Faculty of Information Technology, Monash University, Clayton, Australia
- Karin Verspoor
- Department of Computing and Information Systems, University of Melbourne, Melbourne, Australia
15
Anglesio MS, Talhouk A, Kalloger SE, Haffari G, Mackenzie R, Cheung M, Senz J, Chow C, Lau S, Intermaggio M, Ramus SJ, Bois AD, Pfisterer J, McAlpine JN, Kommoss F, Gilks B, Kommoss S, Huntsman DG. Abstract B14: Rapid RNA-based histotyping of ovarian carcinomas. Clin Cancer Res 2013. [DOI: 10.1158/1078-0432.ovca13-b14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: Ovarian cancer is a series of distinct diseases typically identified by their histopathological appearance as high-grade serous (HGSC; 70% of cases), low-grade serous (LGSC; 5%), endometrioid (ENOCa; 8%), clear cell (CCC; 12%), and mucinous (MC; 5%) carcinomas. Each type has defining molecular events, gene/protein expression patterns, genetic risk factors, sites of origin, and responses to treatment. Gold standard treatment is surgery followed by platinum-taxane chemotherapy, despite mounting evidence suggesting CCCs, MCs, and LGSCs are largely platinum-taxane resistant. If outcomes are to be improved, it is critical to adopt a type-specific strategy. Retrospective review studies have suggested histotype may be misdiagnosed or omitted in up to 30% of cases. However, pathological diagnosis of histotypes has been greatly refined in recent years and the use of biomarkers as aids is becoming more widespread. Nonetheless, a rapid and fully objective classifier of histotypes will undoubtedly improve diagnostic accuracy, especially in the case of pre-surgical biopsies where small amounts of material present a challenge.
Methods: Over 1000 ovarian carcinoma samples underwent expert gynecopathological review to establish a gold standard diagnosis for the 5 major carcinoma types. RNA was extracted from FFPE tissues and levels of a pre-selected set of >100 genes were quantified using the NanoString GX system. The cohort was split, with ~1/3 set aside for independent validation. Several statistical models were tested to generate a prediction algorithm for histological type, including PAM, Random Forest, Lasso, Recursive Partitioning, and Discriminant Analysis. Feature selection methods and prediction error were examined using cross-validation in the train/test series prior to validation in the independent set.
Results: Preliminary analysis suggests classification of the 5 major histotypes is possible using NanoString-derived RNA expression levels. Accuracy appears to be equivalent to interobserver variation amongst expert gynecopathologists.
Conclusions: The NanoString GX platform provides a stable and reproducible platform on which a robust single-sample histological type classifier can be established. Our algorithm combined with the NanoString platform provides a rapid and cost-effective option that does not require modification to current pathology lab tissue processing protocols. Diagnostic prediction requires little material and is applicable to pre- and post-surgical specimens where an objective measure is desired to confirm diagnosis or aid in especially challenging cases.
Citation Format: Michael S. Anglesio, Aline Talhouk, Steve E. Kalloger, Gholamreza Haffari, Robertson Mackenzie, Martin Cheung, Janine Senz, Christine Chow, Sherman Lau, Maria Intermaggio, Susan J. Ramus, Andreas du Bois, Jacobus Pfisterer, Jessica N. McAlpine, Friedrich Kommoss, Blake Gilks, Stefan Kommoss, David G. Huntsman. Rapid RNA-based histotyping of ovarian carcinomas. [abstract]. In: Proceedings of the AACR Special Conference on Advances in Ovarian Cancer Research: From Concept to Clinic; Sep 18-21, 2013; Miami, FL. Philadelphia (PA): AACR; Clin Cancer Res 2013;19(19 Suppl):Abstract nr B14.
Affiliation(s)
- Aline Talhouk
- University of British Columbia, Vancouver, BC, Canada
- Martin Cheung
- British Columbia Cancer Agency, Vancouver, BC, Canada
- Janine Senz
- University of British Columbia, Vancouver, BC, Canada
- Sherman Lau
- Vancouver General Hospital, Vancouver, Canada
- Blake Gilks
- University of British Columbia, Vancouver, BC, Canada
16
Abstract
It has been shown that the minimum free-energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble-based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble-based approaches is the computational complexity of the partition function and base-pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is O(n^4) and the time complexity is O(n^6), which is prohibitively large. Our goal in this article is to present a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola (2012). The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is O(MFE(n)ℓ) for single RNA and O(MFE(m, n)ℓ) for RNA-RNA interaction in practice, in which MFE is the running time of sparse folding and ℓ ≤ n (ℓ ≤ n+m) is a sequence-dependent parameter.
Affiliation(s)
- Hamidreza Chitsaz
- Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA.
17
Zare H, Haffari G, Gupta A, Brinkman RR. Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis. BMC Genomics 2013; 14 Suppl 1:S14. [PMID: 23369194 PMCID: PMC3549810 DOI: 10.1186/1471-2164-14-s1-s14] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
One challenge in applying bioinformatic tools to clinical or biological data is the high number of features that might be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of training instances, which is often limited by the number of available samples for the study. The Lasso is one of many regularization methods that have been developed to prevent overfitting and improve prediction performance in high-dimensional settings. In this paper, we propose a novel algorithm for feature selection based on the Lasso; our hypothesis is that defining a scoring scheme that measures the "quality" of each feature can provide a more robust feature selection method. Our approach is to generate several samples from the training data by bootstrapping, determine the best relevance-ordering of the features for each sample, and finally combine these relevance-orderings to select highly relevant features. In addition to the theoretical analysis of our feature scoring scheme, we provide empirical evaluations on six real datasets from different fields to confirm the superiority of our method in exploratory data analysis and prediction performance. For example, we applied FeaLect, our feature scoring algorithm, to a lymphoma dataset, and according to a human expert, our method led to selecting more meaningful features than those commonly used in clinics. This case study built a basis for discovering interesting new criteria for lymphoma diagnosis. Furthermore, to facilitate the use of our algorithm in other applications, the source code that implements our algorithm was released as FeaLect, a documented R package in CRAN.
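The general idea of combining Lasso fits over bootstrap samples can be illustrated with a small, dependency-free sketch. It is deliberately simplified and is not the FeaLect scoring scheme: instead of combining relevance-orderings, it scores each feature by the fraction of bootstrap samples in which a single Lasso fit gives it a non-zero weight. The function names, the penalty `lam=0.1`, the number of bootstraps and the synthetic data are all assumptions made for illustration.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the Lasso proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # feature j against the residual that ignores feature j
            z = 0.0    # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


def bootstrap_selection_scores(X, y, lam=0.1, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each feature is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = lasso_fit([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data (hypothetical): only feature 0 carries signal.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(30)]
y = [3.0 * row[0] for row in X]
scores = bootstrap_selection_scores(X, y)
```

Features that are selected in nearly every resample are the stable candidates; features selected only occasionally are more likely to reflect sampling noise.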
Affiliation(s)
- Habil Zare
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
18
Bashashati A, Haffari G, Ding J, Ha G, Lui K, Rosner J, Huntsman DG, Caldas C, Aparicio SA, Shah SP. DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome Biol 2012; 13:R124. [PMID: 23383675 PMCID: PMC4056374 DOI: 10.1186/gb-2012-13-12-r124] [Citation(s) in RCA: 187] [Impact Index Per Article: 15.6] [Received: 08/01/2012] [Revised: 11/19/2012] [Accepted: 12/22/2012] [Indexed: 01/27/2023]
Abstract
Simultaneous interrogation of tumor genomes and transcriptomes is underway in unprecedented global efforts. Yet, despite the essential need to separate driver mutations modulating gene expression networks from transcriptionally inert passenger mutations, robust computational methods to ascertain the impact of individual mutations on transcriptional networks are underdeveloped. We introduce a novel computational framework, DriverNet, to identify likely driver mutations by virtue of their effect on mRNA expression networks. Application to four cancer datasets reveals the prevalence of rare candidate driver mutations associated with disrupted transcriptional networks and a simultaneous modulation of oncogenic and metabolic networks, induced by copy number co-modification of adjacent oncogenic and metabolic drivers. DriverNet is available on Bioconductor or at http://compbio.bccrc.ca/software/drivernet/.
Affiliation(s)
- Ali Bashashati
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Gholamreza Haffari
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Faculty of Information Technology, Monash University, Wellington Road, Clayton, VIC 3800, Australia
- Jiarui Ding
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada
- Gavin Ha
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Bioinformatics Training Program, University of British Columbia, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Kenneth Lui
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Jamie Rosner
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- David G Huntsman
- Department of Pathology and Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, BC, V6T 2B5, Canada
- Centre for Translational and Applied Genomics, BC Cancer Agency, 600 West 10th Avenue, Vancouver, BC, V5Z 4E6, Canada
- Carlos Caldas
- Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Samuel A Aparicio
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, BC, V6T 2B5, Canada
- Sohrab P Shah
- Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, BC, V6T 2B5, Canada
19
Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y, Gräf S, Ha G, Haffari G, Bashashati A, Russell R, McKinney S, Langerød A, Green A, Provenzano E, Wishart G, Pinder S, Watson P, Markowetz F, Murphy L, Ellis I, Purushotham A, Børresen-Dale AL, Brenton JD, Tavaré S, Caldas C, Aparicio S. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-52. [PMID: 22522925 PMCID: PMC3440846 DOI: 10.1038/nature10983] [Citation(s) in RCA: 3887] [Impact Index Per Article: 323.9] [Received: 04/24/2011] [Accepted: 02/22/2012] [Indexed: 12/16/2022]
Abstract
The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.
Affiliation(s)
- Christina Curtis
- Department of Oncology, University of Cambridge, Hills Road, Cambridge CB2 2XZ, UK
20
Zare H, Bashashati A, Kridel R, Aghaeepour N, Haffari G, Connors JM, Gascoyne RD, Gupta A, Brinkman RR, Weng AP. Automated analysis of multidimensional flow cytometry data improves diagnostic accuracy between mantle cell lymphoma and small lymphocytic lymphoma. Am J Clin Pathol 2012; 137:75-85. [PMID: 22180480 DOI: 10.1309/ajcpmmlq67yomgew] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Abstract
Mantle cell lymphoma (MCL) and small lymphocytic lymphoma (SLL) exhibit similar but distinct immunophenotypic profiles. Many cases can be diagnosed readily by flow cytometry (FCM) alone; however, ambiguous cases are frequently encountered and necessitate additional studies, including immunohistochemical staining for cyclin D1 and fluorescence in situ hybridization for IgH-CCND1 rearrangement. To determine if greater diagnostic accuracy could be achieved from FCM data alone, we developed an unbiased, machine-based algorithm to identify features that best distinguish between the 2 diseases. By applying conventional diagnostic criteria to the flow cytometry data, we were able to assign 28 of 44 (64%) MCL and 48 of 70 (69%) SLL cases correctly. In contrast, we were able to assign all 44 (100%) MCL and 68 of 70 (97%) SLL cases correctly using a novel set of criteria, as identified by our automated approach. The most discriminating feature was the CD20/CD23 mean fluorescence intensity ratio, and we found unexpectedly that inclusion of FMC7 expression in the diagnostic algorithm actually reduced its accuracy. This study demonstrates that computational methods can be used on existing clinical FCM data to improve diagnostic accuracy and suggests similar computational approaches could be used to identify novel prognostic markers and perhaps subdivide existing or define new diagnostic entities.
21
Ding J, Bashashati A, Roth A, Oloumi A, Tse K, Zeng T, Haffari G, Hirst M, Marra MA, Condon A, Aparicio S, Shah SP. Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data. Bioinformatics 2011; 28:167-75. [PMID: 22084253 PMCID: PMC3259434 DOI: 10.1093/bioinformatics/btr629] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Indexed: 11/16/2022]
Abstract
Motivation: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge. Results: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth ‘false positive’ predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study. Availability: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca. 
Contact: saparicio@bccrc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
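The cross-validation framework mentioned in this abstract can be sketched as a plain k-fold index split. The abstract does not state the number of folds or how they were drawn, so `k=5`, the shuffling seed and the function name `k_fold_splits` are assumptions; only the count of 3369 candidate positions comes from the text.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


# 3369 candidate somatic SNV positions, as reported in the abstract.
splits = k_fold_splits(3369, k=5)
```

Each classifier would be trained on `train` and scored on `test` for every split, with the separate hold-out data kept out of this loop entirely.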
Affiliation(s)
- Jiarui Ding
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, BC, Canada
22
|
|