1
Kirchoff KE, Wellnitz J, Hochuli JE, Maxfield T, Popov KI, Gomez S, Tropsha A. Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search. Adv Inf Retr 2024; 14609:34-49. [PMID: 38585224 PMCID: PMC10998712 DOI: 10.1007/978-3-031-56060-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally aware embedding, SmallSA, for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
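The idea of pairing low-dimensional embeddings with a k-d tree can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation and does not use SmallSA: the 4-dimensional "embeddings" are random placeholder values, and the function names (`build_kdtree`, `knn`, `brute_force`) are hypothetical names chosen for illustration. The sketch only shows how a tree search can skip whole branches while returning the same neighbors as a brute-force scan.

```python
import heapq
import random


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])           # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # median split keeps the tree balanced
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(node, query, k, depth=0, heap=None):
    """Collect the k nearest points in a max-heap of (-squared_distance, point)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, left, right = node
    d = sq_dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, point))
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, query, k, depth + 1, heap)
    # Visit the far branch only if the splitting plane is closer than the
    # current k-th best match, or if fewer than k matches were found so far.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, query, k, depth + 1, heap)
    return heap


def brute_force(points, query, k):
    """Reference answer: scan every point and keep the k closest."""
    return sorted(points, key=lambda p: sq_dist(p, query))[:k]


random.seed(0)  # placeholder data, not real chemical embeddings
embeddings = [tuple(random.random() for _ in range(4)) for _ in range(500)]
tree = build_kdtree(embeddings)
query = (0.5, 0.5, 0.5, 0.5)
neighbors = sorted((p for _, p in knn(tree, query, k=5)),
                   key=lambda p: sq_dist(p, query))
```

The branch-skipping test is what becomes ineffective as dimensionality grows, which is consistent with the abstract's emphasis on low-dimensional embeddings.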
Affiliation(s)
- Shawn Gomez
- Department of Pharmacology, UNC Chapel Hill
- Joint Department of Biomedical Engineering at UNC Chapel Hill and NCSU
2
Chen KA, Kirchoff KE, Butler LR, Holloway AD, Kapadia MR, Kuzmiak CM, Downs-Canner SM, Spanheimer PM, Gallagher KK, Gomez SM. ASO Visual Abstract: Analysis of Specimen Mammography with Artificial Intelligence to Predict Margin Status. Ann Surg Oncol 2023; 30:7153. [PMID: 37644247 DOI: 10.1245/s10434-023-14225-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Affiliation(s)
- Kevin A Chen
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kathryn E Kirchoff
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Logan R Butler
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alexa D Holloway
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Muneera R Kapadia
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Cherie M Kuzmiak
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Stephanie M Downs-Canner
- Department of Surgery, Breast Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Phillip M Spanheimer
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kristalyn K Gallagher
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Shawn M Gomez
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
3
Chen KA, Kirchoff KE, Butler LR, Holloway AD, Kapadia MR, Kuzmiak CM, Downs-Canner SM, Spanheimer PM, Gallagher KK, Gomez SM. Analysis of Specimen Mammography with Artificial Intelligence to Predict Margin Status. Ann Surg Oncol 2023; 30:7107-7115. [PMID: 37563337 PMCID: PMC10592216 DOI: 10.1245/s10434-023-14083-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 07/17/2023] [Indexed: 08/12/2023]
Abstract
BACKGROUND: Intraoperative specimen mammography is a valuable tool in breast cancer surgery, providing immediate assessment of margins for a resected tumor. However, the accuracy of specimen mammography in detecting microscopic margin positivity is low. We sought to develop an artificial intelligence model to predict the pathologic margin status of resected breast tumors using specimen mammography.
METHODS: A dataset of specimen mammography images matched with pathologic margin status was collected from our institution from 2017 to 2020. The dataset was randomly split into training, validation, and test sets. Specimen mammography models pretrained on radiologic images were developed and compared with models pretrained on nonmedical images. Model performance was assessed using sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC).
RESULTS: The dataset included 821 images, and 53% had positive margins. For three out of four model architectures tested, models pretrained on radiologic images outperformed nonmedical models. The highest performing model, InceptionV3, showed sensitivity of 84%, specificity of 42%, and AUROC of 0.71. Model performance was better among patients with invasive cancers, less dense breasts, and non-white race.
CONCLUSIONS: This study developed and internally validated artificial intelligence models that predict pathologic margin status for partial mastectomy from specimen mammograms. The models' accuracy compares favorably with published literature on surgeon and radiologist interpretation of specimen mammography. With further development, these models could more precisely guide the extent of resection, potentially improving cosmesis and reducing reoperations.
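The three metrics named in the abstract (sensitivity, specificity, AUROC) can be computed as in the following minimal sketch. The labels and scores are made-up toy values, not data from the study, and the 0.5 decision threshold and function names are assumptions for illustration. AUROC is computed here from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = positive margin, 0 = negative margin (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
threshold = 0.5  # assumed cut-off; moving it trades sensitivity for specificity
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why a model can report high sensitivity alongside low specificity at one operating point.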
Affiliation(s)
- Kevin A Chen
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kathryn E Kirchoff
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Logan R Butler
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alexa D Holloway
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Muneera R Kapadia
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Cherie M Kuzmiak
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Stephanie M Downs-Canner
- Department of Surgery, Breast Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Phillip M Spanheimer
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kristalyn K Gallagher
- Division of Surgical Oncology, Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Shawn M Gomez
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
4
Chen KA, Kirchoff KE, Butler LR, Holloway AD, Kapadia MR, Gallagher KK, Gomez SM. Computer Vision Analysis of Specimen Mammography to Predict Margin Status. medRxiv 2023:2023.03.06.23286864. [PMID: 36945565 PMCID: PMC10029028 DOI: 10.1101/2023.03.06.23286864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
Intra-operative specimen mammography is a valuable tool in breast cancer surgery, providing immediate assessment of margins for a resected tumor. However, the accuracy of specimen mammography in detecting microscopic margin positivity is low. We sought to develop a deep learning-based model to predict the pathologic margin status of resected breast tumors using specimen mammography. A dataset of specimen mammography images matched with pathology reports describing margin status was collected. Models pre-trained on radiologic images were developed and compared with models pre-trained on non-medical images. Model performance was assessed using sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). The dataset included 821 images and 53% had positive margins. For three out of four model architectures tested, models pre-trained on radiologic images outperformed domain-agnostic models. The highest performing model, InceptionV3, showed a sensitivity of 84%, a specificity of 42%, and AUROC of 0.71. These results compare favorably with the published literature on surgeon and radiologist interpretation of specimen mammography. With further development, these models could assist clinicians with identifying positive margins intra-operatively and decrease the rate of positive margins and re-operation in breast-conserving surgery.
Affiliation(s)
- Kevin A Chen
- Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Kathryn E Kirchoff
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Logan R Butler
- Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Alexa D Holloway
- Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Muneera R Kapadia
- Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Shawn M Gomez
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC
5
Kirchoff KE, Gomez SM. EMBER: multi-label prediction of kinase-substrate phosphorylation events through deep learning. Bioinformatics 2022; 38:2119-2126. [PMID: 35157015 PMCID: PMC9004653 DOI: 10.1093/bioinformatics/btac083] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/25/2020] [Revised: 12/09/2021] [Accepted: 02/09/2022] [Indexed: 02/04/2023]
Abstract
MOTIVATION: Kinase-catalyzed phosphorylation of proteins forms the backbone of signal transduction within the cell, enabling the coordination of numerous processes such as the cell cycle, apoptosis, and differentiation. Although on the order of 10^5 phosphorylation events have been described, we know the specific kinase performing these functions for <5% of cases. The ability to predict which kinases initiate specific individual phosphorylation events has the potential to greatly enhance the design of downstream experimental studies, while simultaneously creating a preliminary map of the broader phosphorylation network that controls cellular signaling.
RESULTS: We describe Embedding-based multi-label prediction of phosphorylation events (EMBER), a deep learning method that integrates kinase phylogenetic information and motif-dissimilarity information into a multi-label classification model for the prediction of kinase-motif phosphorylation events. Unlike previous deep learning methods that perform single-label classification, we restate the task of kinase-motif phosphorylation prediction as a multi-label problem, allowing us to train a single unified model rather than a separate model for each of the 134 kinase families. We utilize a Siamese neural network to generate novel vector representations, or an embedding, of peptide motif sequences, and we compare our novel embedding to a previously proposed peptide embedding. Our motif vector representations are used, along with one-hot encoded motif sequences, as input to a classification neural network, and kinase phylogenetic relationships are incorporated into the model via a kinase phylogeny-weighted loss function. Results suggest that this approach holds significant promise for improving the known map of phosphorylation relationships that underlie kinome signaling.
AVAILABILITY AND IMPLEMENTATION: The data and code underlying this article are available in a GitHub repository at https://github.com/gomezlab/EMBER.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
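The difference between single-label and multi-label framing described above can be sketched in a few lines. This is not the EMBER code (see the repository linked in the abstract for that): the three family names are an illustrative subset, the per-label weights are placeholder values standing in for a phylogeny-derived weighting whose actual form is not given here, and the function names are hypothetical. The sketch shows a multi-hot target, where one motif may belong to several kinase families at once, and a per-label weighted binary cross-entropy.

```python
import math

# Illustrative subset only; the abstract refers to 134 kinase families.
KINASE_FAMILIES = ["AKT", "CDK", "PKC"]


def multi_hot(labels, classes):
    """Encode a set of labels as a 0/1 vector with one slot per class."""
    return [1.0 if c in labels else 0.0 for c in classes]


def weighted_bce(targets, probs, weights, eps=1e-12):
    """Mean binary cross-entropy over labels, each term scaled by its weight."""
    total = 0.0
    for t, p, w in zip(targets, probs, weights):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


# One motif phosphorylated by two families: a single multi-hot target
# replaces separate yes/no models per family.
target = multi_hot({"AKT", "PKC"}, KINASE_FAMILIES)
predicted = [0.8, 0.3, 0.6]          # made-up sigmoid outputs
label_weights = [1.0, 0.5, 1.0]      # placeholder weights (assumption)
loss = weighted_bce(target, predicted, label_weights)
```

Because each label contributes its own independently weighted term, a single network can be trained on all families jointly, which is the practical benefit the abstract attributes to the multi-label formulation.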
Affiliation(s)
- Kathryn E Kirchoff
- Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Shawn M Gomez
- Joint Department of Biomedical Engineering, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- North Carolina State University, Raleigh, NC, USA
- Department of Pharmacology, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
6
Shorter JR, Najarian ML, Bell TA, Blanchard M, Ferris MT, Hock P, Kashfeen A, Kirchoff KE, Linnertz CL, Sigmon JS, Miller DR, McMillan L, Pardo-Manuel de Villena F. Whole Genome Sequencing and Progress Toward Full Inbreeding of the Mouse Collaborative Cross Population. G3 (Bethesda) 2019; 9:1303-1311. [PMID: 30858237 PMCID: PMC6505143 DOI: 10.1534/g3.119.400039] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/28/2019] [Accepted: 03/08/2019] [Indexed: 12/20/2022]
Abstract
Two key features of recombinant inbred panels are well-characterized genomes and reproducibility. Here we report on the sequenced genomes of six additional Collaborative Cross (CC) strains and on inbreeding progress of 72 CC strains. We have previously reported on the sequences of 69 CC strains that were publicly available, bringing the total of CC strains with whole genome sequence up to 75. The sequencing of these six CC strains updates the efforts toward inbreeding undertaken by the UNC Systems Genetics Core. The timing reflects our competing mandates to release to the public as many CC strains as possible while achieving an acceptable level of inbreeding. The six new strains have a higher founder contribution from non-domesticus strains than the previously released CC strains. Five of the six strains also have high residual heterozygosity (>14%), which may be related to non-domesticus founder contributions. Finally, we report on updated estimates of residual heterozygosity across the entire CC population using a novel, simple, and cost-effective genotyping platform on three mice from each strain. We observe a reduction in residual heterozygosity across all previously released CC strains. We discuss the optimal use of different genetic resources available for the CC population.
Affiliation(s)
- Timothy A Bell
- Department of Genetics
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599