1
Zhu J, Wang Y, Chang WY, Malewska A, Napolitano F, Gahan JC, Unni N, Zhao M, Yuan R, Wu F, Yue L, Guo L, Zhao Z, Chen DZ, Hannan R, Zhang S, Xiao G, Mu P, Hanker AB, Strand D, Arteaga CL, Desai N, Wang X, Xie Y, Wang T. Mapping cellular interactions from spatially resolved transcriptomics data. Nat Methods 2024; 21:1830-1842. [PMID: 39227721 DOI: 10.1038/s41592-024-02408-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 08/02/2024] [Indexed: 09/05/2024]
Abstract
Cell-cell communication (CCC) is essential to how life forms and functions. However, accurate, high-throughput mapping of how expression of all genes in one cell affects expression of all genes in another cell has only recently been made possible by the introduction of spatially resolved transcriptomics (SRT) technologies, especially those that achieve single-cell resolution. Nevertheless, substantial challenges remain in analyzing such highly complex data properly. Here, we introduce a multiple-instance learning framework, Spacia, to detect CCCs from data generated by SRTs, by uniquely exploiting their spatial modality. We highlight Spacia's power to overcome fundamental limitations of popular analytical tools for inference of CCCs, including loss of single-cell resolution, restriction to ligand-receptor relationships and prior interaction databases, high false positive rates and, most importantly, the lack of consideration of the multiple-sender-to-one-receiver paradigm. We evaluated the fitness of Spacia for three commercialized single-cell resolution SRT technologies: MERSCOPE/Vizgen, CosMx/NanoString and Xenium/10x. Overall, Spacia represents a notable step in advancing quantitative theories of cellular communications.
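The multiple-sender-to-one-receiver paradigm maps naturally onto multiple-instance learning: each receiver cell can be treated as a bag whose instances are the candidate sender cells around it. The Python sketch below illustrates only that bag-construction step, under stated assumptions. The function name `build_receiver_bags`, the single Euclidean `radius` cutoff and the toy coordinates are hypothetical illustrations, not Spacia's API, and Spacia's actual model (including how gene expression enters it) is not reproduced here.

```python
import numpy as np

def build_receiver_bags(coords, cell_types, receiver_type, sender_type, radius):
    """Group nearby sender cells around each receiver cell into one MIL bag.

    Hypothetical helper for illustration only (not Spacia's API).
    coords:     (n_cells, 2) array of x/y positions.
    cell_types: length-n_cells sequence of cell-type labels.
    radius:     assumed Euclidean distance cutoff for a "nearby" sender.
    Returns {receiver index: array of sender indices}; receivers with
    no sender inside the radius are omitted.
    """
    coords = np.asarray(coords, dtype=float)
    cell_types = np.asarray(cell_types)
    receivers = np.flatnonzero(cell_types == receiver_type)
    senders = np.flatnonzero(cell_types == sender_type)
    bags = {}
    for r in receivers:
        # Distance from this receiver to every candidate sender cell.
        d = np.linalg.norm(coords[senders] - coords[r], axis=1)
        # Exclude the receiver itself in case receiver_type == sender_type.
        nearby = senders[(d <= radius) & (senders != r)]
        if nearby.size > 0:
            bags[int(r)] = nearby
    return bags

# Toy usage: two tumor receivers, two fibroblast senders, one other cell.
coords = [[0, 0], [1, 0], [0, 2], [10, 10], [10, 11]]
types = ["tumor", "fibroblast", "fibroblast", "tumor", "endothelial"]
bags = build_receiver_bags(coords, types, "tumor", "fibroblast", radius=2.5)
# Receiver 0 gets senders [1, 2]; receiver 3 has none within the radius.
```

Each bag then carries one response (the receiver's expression) and several instances (the senders), which is the one-label-many-instances structure that MIL is designed for.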
Affiliation(s)
- James Zhu
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Yunguan Wang
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Division of Pediatric Gastroenterology, Hepatology and Nutrition, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
  - Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Woo Yong Chang
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Alicia Malewska
  - Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Fabiana Napolitano
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jeffrey C Gahan
  - Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Nisha Unni
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Min Zhao
  - Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Rongqing Yuan
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Fangjiang Wu
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Lauren Yue
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Lei Guo
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Zhuo Zhao
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
- Raquibul Hannan
  - Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Siyuan Zhang
  - Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Guanghua Xiao
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ping Mu
  - Department of Molecular Biology, UT Southwestern Medical Center, Dallas, TX, USA
  - Hamon Center for Regenerative Science and Medicine, UT Southwestern Medical Center, Dallas, TX, USA
- Ariella B Hanker
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Douglas Strand
  - Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Carlos L Arteaga
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Neil Desai
  - Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Xinlei Wang
  - Department of Mathematics, University of Texas at Arlington, Arlington, TX, USA
  - Division of Data Science, College of Science, University of Texas at Arlington, Arlington, TX, USA
- Yang Xie
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Tao Wang
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
2
Olawade DB, Teke J, Fapohunda O, Weerasinghe K, Usman SO, Ige AO, Clement David-Olawade A. Leveraging artificial intelligence in vaccine development: A narrative review. J Microbiol Methods 2024; 224:106998. [PMID: 39019262 DOI: 10.1016/j.mimet.2024.106998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/12/2024] [Accepted: 07/12/2024] [Indexed: 07/19/2024]
Abstract
Vaccine development stands as a cornerstone of public health efforts, pivotal in curbing infectious diseases and reducing global morbidity and mortality. However, traditional vaccine development methods are often time-consuming, costly, and inefficient. The advent of artificial intelligence (AI) has ushered in a new era in vaccine design, offering unprecedented opportunities to expedite the process. This narrative review explores the role of AI in vaccine development, focusing on antigen selection, epitope prediction, adjuvant identification, and optimization strategies. AI algorithms, including machine learning and deep learning, leverage genomic data, protein structures, and immune system interactions to predict antigenic epitopes, assess immunogenicity, and prioritize antigens for experimentation. Furthermore, AI-driven approaches facilitate the rational design of immunogens and the identification of novel adjuvant candidates with optimal safety and efficacy profiles. Challenges such as data heterogeneity, model interpretability, and regulatory considerations must be addressed to realize the full potential of AI in vaccine development. Integrating emerging technologies, such as single-cell omics and synthetic biology, promises to enhance vaccine design precision and scalability. This review underscores the transformative impact of AI on vaccine development and highlights the need for interdisciplinary collaborations and regulatory harmonization to accelerate the delivery of safe and effective vaccines against infectious diseases.
Affiliation(s)
- David B Olawade
  - Department of Allied and Public Health, School of Health, Sport and Bioscience, University of East London, London, United Kingdom
  - Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom
- Jennifer Teke
  - Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom
  - Faculty of Medicine, Health and Social Care, Canterbury Christ Church University, United Kingdom
- Kusal Weerasinghe
  - Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom
- Sunday O Usman
  - Department of Systems and Industrial Engineering, University of Arizona, USA
- Abimbola O Ige
  - Department of Chemistry, Faculty of Science, University of Ibadan, Ibadan, Nigeria
3
Park S, Kim J, Wang X, Lim J. Variable Selection in Bayesian Multiple Instance Regression using Shotgun Stochastic Search. Comput Stat Data Anal 2024; 196:107954. [PMID: 38646418 PMCID: PMC11027161 DOI: 10.1016/j.csda.2024.107954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
In multiple instance learning (MIL), a bag represents a sample that has a set of instances, each of which is described by a vector of explanatory variables, but the entire bag has only one label/response. Though many methods for MIL have been developed to date, few have paid attention to the interpretability of models and results. The proposed Bayesian regression model has a two-level hierarchy that transparently shows how explanatory variables and instances contribute to bag responses. Moreover, two selection problems are addressed simultaneously: instance selection, to identify the instances in each bag responsible for the bag response, and variable selection, to search for the important covariates. To explore the joint discrete space of indicator variables created for selecting both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit the MIL context. Also, the proposed model offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. The simulation study shows that the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), and thus predicts responses well. The proposed method is applied to the musk data for prediction of binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods and can identify variables relevant in modeling responses.
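The two-level structure described above (variables explain instances, selected instances explain the bag) can be illustrated with a toy forward predictor. This is a sketch under assumptions: it takes each bag's response to be the mean linear predictor over its selected instances, and it uses hypothetical 0/1 indicator vectors `gamma` (variables) and `deltas` (instances) as placeholders for the selection indicators. The priors, the posterior and the shotgun stochastic search itself are omitted, and the exact instance-to-bag link in the paper may differ.

```python
import numpy as np

def predict_bag_responses(bags, beta, gamma, deltas):
    """Toy two-level multiple instance regression predictor (illustrative only).

    bags:   list of (n_i, p) arrays; rows are the instances of bag i.
    beta:   (p,) coefficient vector.
    gamma:  (p,) 0/1 variable-inclusion indicators (assumed form).
    deltas: list of (n_i,) 0/1 instance-inclusion indicators (assumed form).
    Each bag's prediction is the mean linear predictor over its selected
    instances, using only the selected variables.
    """
    # Zero out coefficients of excluded variables.
    beta_active = np.asarray(beta, float) * np.asarray(gamma, float)
    preds = []
    for X, d in zip(bags, deltas):
        X = np.asarray(X, float)
        d = np.asarray(d, bool)
        if not d.any():
            raise ValueError("each bag needs at least one selected instance")
        # Average the instance-level predictions of the selected instances.
        preds.append(float(np.mean(X[d] @ beta_active)))
    return np.array(preds)

bags = [np.array([[1.0, 5.0], [3.0, -2.0]]),
        np.array([[2.0, 0.0], [4.0, 1.0], [6.0, 2.0]])]
beta = [2.0, 10.0]
gamma = [1, 0]                 # second variable excluded
deltas = [[1, 1], [0, 1, 1]]   # first instance of the second bag excluded
preds = predict_bag_responses(bags, beta, gamma, deltas)
# preds -> [4.0, 10.0]
```

Flipping entries of `gamma` or `deltas` changes which variables and instances drive the bag predictions; a search over such indicator configurations is what the stochastic search explores.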
Affiliation(s)
- Seongoh Park
  - School of Mathematics, Statistics and Data Science, Sungshin Women’s University, Seoul, Korea
  - Data Science Center, Sungshin Women’s University, Seoul, Korea
- Joungyoun Kim
  - Department of Artificial Intelligence, University of Seoul, Seoul, Korea
- Xinlei Wang
  - Center for Data Science Research and Education, College of Science, University of Texas at Arlington, Arlington, TX, USA
  - Department of Mathematics, University of Texas at Arlington, Arlington, TX, USA
- Johan Lim
  - Department of Statistics, Seoul National University, Seoul, 08826, Korea
4
Zhu J, Wang Y, Chang WY, Malewska A, Napolitano F, Gahan JC, Unni N, Zhao M, Yuan R, Wu F, Yue L, Guo L, Zhao Z, Chen DZ, Hannan R, Zhang S, Xiao G, Mu P, Hanker AB, Strand D, Arteaga CL, Desai N, Wang X, Xie Y, Wang T. Mapping Cellular Interactions from Spatially Resolved Transcriptomics Data. bioRxiv 2024:2023.09.18.558298. [PMID: 37781617 PMCID: PMC10541142 DOI: 10.1101/2023.09.18.558298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/03/2023]
Abstract
Cell-cell communication (CCC) is essential to how life forms and functions. However, accurate, high-throughput mapping of how expression of all genes in one cell affects expression of all genes in another cell has only recently been made possible through the introduction of spatially resolved transcriptomics technologies (SRTs), especially those that achieve single-cell resolution. Nevertheless, significant challenges remain in analyzing such highly complex data properly. Here, we introduce a Bayesian multi-instance learning framework, spacia, to detect CCCs from data generated by SRTs, by uniquely exploiting their spatial modality. We highlight spacia's power to overcome fundamental limitations of popular analytical tools for inference of CCCs, including loss of single-cell resolution, restriction to ligand-receptor relationships and prior interaction databases, high false positive rates and, most importantly, the lack of consideration of the multiple-sender-to-one-receiver paradigm. We evaluated the fitness of spacia for all three commercialized single-cell resolution SRT technologies: MERSCOPE/Vizgen, CosMx/Nanostring, and Xenium/10X. Spacia unveiled how endothelial cells, fibroblasts and B cells in the tumor microenvironment contribute to Epithelial-Mesenchymal Transition and lineage plasticity in prostate cancer cells. We deployed spacia in a set of pan-cancer datasets and showed that B cells also participate in PDL1/PD1 signaling in tumors. We demonstrated that a CD8+ T cell/PDL1 effectiveness signature derived from spacia analyses is associated with patient survival and response to immune checkpoint inhibitor treatments in 3,354 patients. We revealed differential spatial interaction patterns between γδ T cells and liver hepatocytes in healthy and cancerous contexts. Overall, spacia represents a notable step in advancing quantitative theories of cellular communications.
Affiliation(s)
- James Zhu
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Yunguan Wang
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Division of Pediatric Gastroenterology, Hepatology and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
  - Department of Pediatrics, University of Cincinnati, OH, 45221, USA
- Woo Yong Chang
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Alicia Malewska
  - Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Fabiana Napolitano
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Jeffrey C. Gahan
  - Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Nisha Unni
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Min Zhao
  - Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Rongqing Yuan
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Fangjiang Wu
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Lauren Yue
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Lei Guo
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Zhuo Zhao
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Danny Z. Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Raquibul Hannan
  - Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Siyuan Zhang
  - Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Guanghua Xiao
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Ping Mu
  - Department of Molecular Biology, UT Southwestern Medical Center, Dallas, TX, 75390, USA
  - Hamon Center for Regenerative Science and Medicine, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Ariella B. Hanker
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Douglas Strand
  - Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Carlos L. Arteaga
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Neil Desai
  - Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Xinlei Wang
  - Department of Mathematics, University of Texas at Arlington, Arlington, TX, 76019, USA
  - Center for Data Science Research and Education, College of Science, University of Texas at Arlington, Arlington, TX, 76019, USA
- Yang Xie
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Tao Wang
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
5
Han Y, Yang Y, Tian Y, Fattah FJ, von Itzstein MS, Hu Y, Zhang M, Kang X, Yang DM, Liu J, Xue Y, Liang C, Raman I, Zhu C, Xiao O, Dowell JE, Homsi J, Rashdan S, Yang S, Gwin ME, Hsiehchen D, Gloria-McCutchen Y, Pan K, Wu F, Gibbons D, Wang X, Yee C, Huang J, Reuben A, Cheng C, Zhang J, Gerber DE, Wang T. pan-MHC and cross-Species Prediction of T Cell Receptor-Antigen Binding. bioRxiv 2023:2023.12.01.569599. [PMID: 38105939 PMCID: PMC10723300 DOI: 10.1101/2023.12.01.569599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Profiling the binding of T cell receptors (TCRs) of T cells to antigenic peptides presented by MHC proteins is one of the most important unsolved problems in modern immunology. Experimental methods to probe TCR-antigen interactions are slow, labor-intensive, costly, and yield moderate throughput. To address this problem, we developed pMTnet-omni, an Artificial Intelligence (AI) system based on hybrid protein sequence and structure information, to predict the pairing of TCRs of αβ T cells with peptide-MHC complexes (pMHCs). pMTnet-omni is capable of handling peptides presented by both class I and II pMHCs, and of handling both human and mouse TCR-pMHC pairs, through information sharing enabled by this hybrid design. pMTnet-omni achieves a high overall Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.888, which surpasses competing tools by a large margin. We showed that pMTnet-omni can distinguish binding affinity of TCRs with similar sequences. Across a range of datasets from various biological contexts, pMTnet-omni characterized the longitudinal evolution and spatial heterogeneity of TCR-pMHC interactions and their functional impact. We successfully developed a biomarker based on pMTnet-omni for predicting immune-related adverse events of immune checkpoint inhibitor (ICI) treatment in a cohort of 57 ICI-treated patients. pMTnet-omni represents a major advance towards developing a clinically usable AI system for TCR-pMHC pairing prediction that can aid the design and implementation of TCR-based immunotherapeutics.
6
Lang F, Sorn P, Schrörs B, Weber D, Kramer S, Sahin U, Löwer M. Multiple instance learning to predict immune checkpoint blockade efficacy using neoantigen candidates. iScience 2023; 26:108014. [PMID: 37965155 PMCID: PMC10641489 DOI: 10.1016/j.isci.2023.108014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Revised: 10/28/2022] [Accepted: 09/18/2023] [Indexed: 11/16/2023]
Abstract
Previous studies showed that the neoantigen candidate load is an imperfect predictor of immune checkpoint blockade (ICB) efficacy. Further studies provided evidence that the response to ICB is also affected by the qualitative properties of a few or even single candidates, limiting the predictive power based on candidate quantity alone. Here, we predict ICB efficacy based on neoantigen candidates and their neoantigen features in the context of the mutation type, using Multiple-Instance Learning via Embedded Instance Selection (MILES). Multiple instance learning is a type of supervised machine learning that classifies labeled bags that are formed by a set of unlabeled instances. MILES outperformed neoantigen candidate load alone for low-abundance fusion genes in renal cell carcinoma. Our findings suggest that MILES is an appropriate method to predict the efficacy of ICB therapy based on neoantigen candidates without requiring direct T cell response information.
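In the original MILES formulation, each bag is embedded into a feature space defined by its similarity to a pool of concept instances (typically every training instance), and a sparse, 1-norm SVM is then fitted on that embedding, so the few concepts with nonzero weight act as the selected instances. The sketch below covers only the embedding step, assuming a Gaussian similarity with a hypothetical bandwidth `sigma`; the SVM stage and the neoantigen features and settings used by Lang et al. are not reproduced.

```python
import numpy as np

def miles_embedding(bags, concepts, sigma):
    """Embed bags by similarity to concept instances (MILES-style sketch).

    bags:     list of (n_i, p) arrays of instances.
    concepts: (K, p) array; in MILES this is the pool of all training instances.
    sigma:    assumed bandwidth of the Gaussian similarity.
    Returns an (n_bags, K) matrix whose entry [i, k] is
    max_j exp(-||x_ij - c_k||^2 / sigma^2), i.e. how close the
    closest instance of bag i comes to concept k.
    """
    concepts = np.asarray(concepts, float)
    rows = []
    for X in bags:
        X = np.asarray(X, float)
        # Squared distances between every instance in the bag and every concept.
        sq = ((X[:, None, :] - concepts[None, :, :]) ** 2).sum(axis=2)
        # Keep, for each concept, the similarity of the bag's closest instance.
        rows.append(np.exp(-sq / sigma**2).max(axis=0))
    return np.vstack(rows)

bags = [np.array([[0.0], [3.0]]), np.array([[1.0]])]
concepts = np.vstack(bags)   # every training instance serves as a concept
E = miles_embedding(bags, concepts, sigma=1.0)
```

After this step each bag is an ordinary fixed-length vector, so any standard classifier can be trained on `E`; MILES specifically uses an L1-penalized SVM so that the learned weights also identify which concepts matter.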
Affiliation(s)
- Franziska Lang
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, 55131 Mainz, Germany
- Patrick Sorn
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, 55131 Mainz, Germany
- Barbara Schrörs
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, 55131 Mainz, Germany
- David Weber
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, 55131 Mainz, Germany
- Stefan Kramer
  - Institute of Computer Science, Johannes Gutenberg University, 55128 Mainz, Germany
- Ugur Sahin
  - BioNTech SE, 55131 Mainz, Germany
  - University Medical Center of the Johannes Gutenberg University, 55131 Mainz, Germany
- Martin Löwer
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, 55131 Mainz, Germany
7
Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences. BMC Bioinformatics 2022; 23:469. [PMID: 36348271 PMCID: PMC9644450 DOI: 10.1186/s12859-022-05012-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/18/2022] [Accepted: 10/26/2022] [Indexed: 11/11/2022]
Abstract
Early detection of cancers has been widely explored because of its paramount importance in biomedical fields. Among the different types of data used to answer this biological question, T cell receptors (TCRs) have recently come into the spotlight owing to the growing appreciation of the roles of the host immune system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences hinders researchers from simply adopting classical statistical/machine learning methods. There have been recent attempts to model this type of data in the context of multiple instance learning (MIL). Despite the novel application of MIL to cancer detection using TCR sequences and the demonstrated adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to enhance the performance in cancer detection and explainability. The sparse attention structure drops out uninformative instances in each bag, achieving both interpretability and better predictive performance in combination with the skip connection. Our experiments show that MINN-SA yields the highest area under the ROC curve scores on average across 10 different types of cancers, compared with existing MIL approaches. Moreover, we observe from the estimated attentions that MINN-SA can identify the TCRs that are specific for tumor antigens in the same T cell repertoire.
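One standard way to obtain attention weights in which uninformative instances receive exactly zero weight is the sparsemax transformation, a Euclidean projection of the attention scores onto the probability simplex. The sketch below uses sparsemax to pool the instance embeddings of one bag. Whether MINN-SA uses sparsemax or a different sparsifying mechanism is an assumption here, the skip connection is omitted, and the fixed scoring vector `w` is a hypothetical stand-in for learned attention parameters.

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex (sparsemax).

    Unlike softmax, low-scoring entries can receive exactly zero weight.
    """
    z = np.asarray(z, float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    # Support size: the largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    support = k[1 + k * z_sorted > cumsum]
    k_max = support[-1]
    # Threshold tau makes the surviving weights sum to 1.
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

def sparse_attention_pool(H, w):
    """Pool instance embeddings H (n, d) into one bag embedding.

    w is a (d,) scoring vector standing in for a learned attention layer.
    Returns (bag_embedding, attention_weights).
    """
    H = np.asarray(H, float)
    a = sparsemax(H @ np.asarray(w, float))
    return a @ H, a

# Three instances (e.g. embedded TCR sequences) in one hypothetical bag.
H = np.array([[3.0, 0.0], [0.0, 2.5], [0.0, 0.0]])
w = np.array([1.0, 1.0])
bag_vec, weights = sparse_attention_pool(H, w)
# weights -> [0.75, 0.25, 0.0]; the third instance is dropped entirely
```

The zero weights are what make the pooling interpretable: only instances with positive attention contribute to the bag embedding, so they can be read off as the candidate informative instances.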
8
Xiong D, Zhang Z, Wang T, Wang X. A comparative study of multiple instance learning methods for cancer detection using T-cell receptor sequences. Comput Struct Biotechnol J 2021; 19:3255-3268. [PMID: 34141144 PMCID: PMC8192570 DOI: 10.1016/j.csbj.2021.05.038] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/28/2021] [Revised: 05/12/2021] [Accepted: 05/20/2021] [Indexed: 11/02/2022]
Abstract
As a branch of machine learning, multiple instance learning (MIL) learns from a collection of labeled bags, each containing a set of instances. The learning process is weakly supervised due to ambiguous instance labels. Since its emergence, MIL has been applied to solve various problems including content-based image retrieval, object tracking/detection, and computer-aided diagnosis. In biomedical research, the use of MIL has been focused on medical image analysis and molecule activity prediction. We review and apply 16 methods to investigate the applicability of MIL to a novel biomedical application, cancer detection using T-cell receptor (TCR) sequences. This important application can be a viable approach for large-scale cancer screening, as TCRs can be easily profiled from a subject's peripheral blood. We consider two feasible data-generating mechanisms, and for the purpose of performance evaluation, we simulate data under each mechanism, where we vary potentially important factors to mimic realistic situations. We also apply the methods to sequencing data of ten cancer types from The Cancer Genome Atlas, as an early proof of concept for distinguishing tumor patients from healthy individuals via TCR sequencing of peripheral blood. We find that, given an appropriate MIL method, satisfactory performance with Area Under the Receiver Operating Characteristic Curve above 80% can be achieved for five of the ten cancers. Based on our numerical results, we make suggestions about selection of a proper method and avoidance of any method with poor performance. We further point out directions of future research and identify a pressing need for new MIL methodologies for improved performance (for some cancer types) and more explainable outcomes.
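Under the classical MIL assumption, a bag is positive if at least one of its instances is positive, which for real-valued instance scores suggests max-pooling them into a bag score. The sketch below pairs that pooling with the rank-based (Mann-Whitney) form of the AUROC used to report performance above. The instance scores and labels are made up for illustration, and the sketch does not reproduce any of the 16 reviewed methods or the two data-generating mechanisms.

```python
import numpy as np

def bag_scores_max(instance_scores_per_bag):
    """Classical MIL assumption: a bag is as positive as its most positive instance."""
    return np.array([float(np.max(s)) for s in instance_scores_per_bag])

def auroc(scores, labels):
    """AUROC as the probability that a random positive bag outscores a random negative one.

    Ties count as 1/2 (the Mann-Whitney formulation).
    """
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative bag")
    # Compare every positive bag against every negative bag.
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Hypothetical per-TCR scores for four subjects (bags); 1 = tumor, 0 = healthy.
instance_scores = [[0.1, 0.9], [0.2, 0.3], [0.7], [0.05, 0.4, 0.6]]
labels = [1, 0, 0, 1]
scores = bag_scores_max(instance_scores)   # [0.9, 0.3, 0.7, 0.6]
auc = auroc(scores, labels)                # 3 of 4 pairs ordered correctly
```

Max-pooling is only one bag-level rule; methods that assume many weakly informative instances instead of one decisive instance typically average or attend over instances rather than taking the maximum.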
Affiliation(s)
- Danyi Xiong
  - Department of Statistical Science, Southern Methodist University, 3225 Daniel Avenue, Dallas 75275, TX, USA
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas 75390, TX, USA
- Ze Zhang
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas 75390, TX, USA
- Tao Wang
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas 75390, TX, USA
- Xinlei Wang
  - Department of Statistical Science, Southern Methodist University, 3225 Daniel Avenue, Dallas 75275, TX, USA