1
Rando HM, Graim K, Hampikian G, Greene CS. Many direct-to-consumer canine genetic tests can identify the breed of purebred dogs. J Am Vet Med Assoc 2024; 262:1-8. [PMID: 38417257 DOI: 10.2460/javma.23.07.0372] [Received: 10/09/2023] [Accepted: 01/24/2024] [Indexed: 03/01/2024]
Abstract
OBJECTIVE To compare pedigree documentation and genetic test results to evaluate whether user-provided photographs influence the breed ancestry predictions of direct-to-consumer (DTC) genetic tests for dogs. ANIMALS 12 registered purebred pet dogs representing 12 different breeds. METHODS Each dog owner submitted 6 buccal swabs, 1 to each of 6 DTC genetic testing companies. Experimenters registered each sample per manufacturer instructions. For half of the dogs, the registration included a photograph of the DNA donor. For the other half of the dogs, photographs were swapped between dogs. DNA analysis and breed ancestry prediction were conducted by each company. The effect of condition (ie, matching vs shuffled photograph) was evaluated for each company's breed predictions. As a positive control, a convolutional neural network was also used to predict breed based solely on the photograph. RESULTS Results from 5 of the 6 tests always included the dog's registered breed. One test and the convolutional neural network were unlikely to identify the registered breed and frequently returned results that were more similar to the photograph than the DNA. Additionally, differences in the predictions made across all tests underscored the challenge of identifying breed ancestry, even in purebred dogs. CLINICAL RELEVANCE Veterinarians are likely to encounter patients who have conducted DTC genetic testing and may be asked to explain the results of genetic tests they did not order. This systematic comparison of commercially available tests provides context for interpreting results from consumer-grade DTC genetic testing kits.
Affiliation(s)
- Halie M Rando
  - (1) Department of Biomedical Informatics, Anschutz School of Medicine, University of Colorado, Aurora, CO
  - (2) Department of Computer Science, Smith College, Northampton, MA
- Kiley Graim
  - (3) Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL
- Greg Hampikian
  - (4) Department of Biological Sciences, College of Arts and Sciences, Boise State University, Boise, ID
- Casey S Greene
  - (1) Department of Biomedical Informatics, Anschutz School of Medicine, University of Colorado, Aurora, CO
2
Davidson NR, Zhang F, Greene CS. BuDDI: Bulk Deconvolution with Domain Invariance to predict cell-type-specific perturbations from bulk. bioRxiv 2024:2023.07.20.549951. [PMID: 37503097 PMCID: PMC10370205 DOI: 10.1101/2023.07.20.549951] [Indexed: 07/29/2023]
Abstract
While single-cell experiments provide deep cellular resolution within a single sample, some single-cell experiments are inherently more challenging than bulk experiments due to dissociation difficulties, cost, or limited tissue availability. This creates a situation where we have deep cellular profiles of one sample or condition, and bulk profiles across multiple samples and conditions. To bridge this gap, we propose BuDDI (BUlk Deconvolution with Domain Invariance). BuDDI utilizes domain adaptation techniques to effectively integrate available corpora of case-control bulk and reference scRNA-seq observations to infer cell-type-specific perturbation effects. BuDDI achieves this by learning independent latent spaces within a single variational autoencoder (VAE) encompassing at least four sources of variability: 1) cell type proportion, 2) perturbation effect, 3) structured experimental variability, and 4) remaining variability. Since each latent space is encouraged to be independent, we simulate perturbation responses by independently composing each latent space to simulate cell-type-specific perturbation responses. We evaluated BuDDI's performance on simulated and real data with experimental designs of increasing complexity. We first validated that BuDDI could learn domain invariant latent spaces on data with matched samples across each source of variability. Then we validated that BuDDI could accurately predict cell-type-specific perturbation response when no single-cell perturbed profiles were used during training; instead, only bulk samples had both perturbed and non-perturbed observations. Finally, we validated BuDDI on predicting sex-specific differences, an experimental design where it is not possible to have matched samples. In each experiment, BuDDI outperformed all other comparative methods and baselines. 
As more reference atlases are completed, BuDDI provides a path to combine these resources with bulk-profiled treatment or disease signatures to study perturbations, sex differences, or other factors at single-cell resolution.
Affiliation(s)
- Natalie R Davidson
  - Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552), NHGRI of the National Institutes of Health (K99HG012945), NCI of the National Institutes of Health (R01CA237170, R01CA243188, R01CA200854)
- Fan Zhang
  - Department of Medicine Rheumatology, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America; Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Arthritis National Research Foundation Award, the PhRMA foundation, and the University of Colorado Translational Research Scholars Program Award
- Casey S Greene
  - Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552), NCI of the National Institutes of Health (R01CA237170, R01CA243188, R01CA200854)
3
Neff SL, Doing G, Reiter T, Hampton TH, Greene CS, Hogan DA. Pseudomonas aeruginosa transcriptome analysis of metal restriction in ex vivo cystic fibrosis sputum. Microbiol Spectr 2024; 12:e0315723. [PMID: 38385740 DOI: 10.1128/spectrum.03157-23] [Received: 08/24/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024] Open
Abstract
Chronic Pseudomonas aeruginosa lung infections are a feature of cystic fibrosis (CF) that many patients experience even with the advent of highly effective modulator therapies. Identifying factors that impact P. aeruginosa in the CF lung could yield novel strategies to eradicate infection or otherwise improve outcomes. To complement published P. aeruginosa studies using laboratory models or RNA isolated from sputum, we analyzed transcripts of strain PAO1 after incubation in sputum from different CF donors prior to RNA extraction. We compared PAO1 gene expression in this "spike-in" sputum model to that for P. aeruginosa grown in synthetic cystic fibrosis sputum medium to determine key genes, which are among the most differentially expressed or most highly expressed. Using the key genes, gene sets with correlated expression were determined using the gene expression analysis tool eADAGE. Gene sets were used to analyze the activity of specific pathways in P. aeruginosa grown in sputum from different individuals. Gene sets that we found to be more active in sputum showed similar activation in published data that included P. aeruginosa RNA isolated from sputum relative to corresponding in vitro reference cultures. In the ex vivo samples, P. aeruginosa had increased levels of genes related to zinc and iron acquisition which were suppressed by metal amendment of sputum. We also found a significant correlation between expression of the H1-type VI secretion system and CFTR corrector use by the sputum donor. An ex vivo sputum model or synthetic sputum medium formulation that imposes metal restriction may enhance future CF-related studies. IMPORTANCE Identifying the gene expression programs used by Pseudomonas aeruginosa to colonize the lungs of people with cystic fibrosis (CF) will illuminate new therapeutic strategies. To capture these transcriptional programs, we cultured the common P. aeruginosa laboratory strain PAO1 in expectorated sputum from CF patient donors. 
Through bioinformatic analysis, we defined sets of genes that are more transcriptionally active in real CF sputum compared to a synthetic cystic fibrosis sputum medium. Many of the most differentially active gene sets contained genes related to metal acquisition, suggesting that these gene sets play an active role in scavenging for metals in the CF lung environment which may be inadequately represented in some models. Future studies of P. aeruginosa transcript abundance in CF may benefit from the use of an expectorated sputum model or media supplemented with factors that induce metal restriction.
Affiliation(s)
- Samuel L Neff
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Georgia Doing
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Taylor Reiter
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Thomas H Hampton
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Casey S Greene
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Deborah A Hogan
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
4
Crawford J, Chikina M, Greene CS. Optimizer's dilemma: optimization strongly influences model selection in transcriptomic prediction. Bioinform Adv 2024; 4:vbae004. [PMID: 38282973 PMCID: PMC10822580 DOI: 10.1093/bioadv/vbae004] [Received: 07/24/2023] [Revised: 11/09/2023] [Accepted: 01/13/2024] [Indexed: 01/30/2024]
Abstract
Motivation Most models can be fit to data using various optimization approaches. While model choice is frequently reported in machine-learning-based research, optimizers are not often noted. We applied two different implementations of LASSO logistic regression implemented in Python's scikit-learn package, using two different optimization approaches (coordinate descent, implemented in the liblinear library, and stochastic gradient descent, or SGD), to predict mutation status and gene essentiality from gene expression across a variety of pan-cancer driver genes. For varying levels of regularization, we compared performance and model sparsity between optimizers. Results After model selection and tuning, we found that liblinear and SGD tended to perform comparably. liblinear models required more extensive tuning of regularization strength, performing best for high model sparsities (more nonzero coefficients), but did not require selection of a learning rate parameter. SGD models required tuning of the learning rate to perform well, but generally performed more robustly across different model sparsities as regularization strength decreased. Given these tradeoffs, we believe that the choice of optimizers should be clearly reported as a part of the model selection and validation process, to allow readers and reviewers to better understand the context in which results have been generated. Availability and implementation The code used to carry out the analyses in this study is available at https://github.com/greenelab/pancancer-evaluation/tree/master/01_stratified_classification. Performance/regularization strength curves for all genes in the Vogelstein et al. (2013) dataset are available at https://doi.org/10.6084/m9.figshare.22728644.
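The relationship between regularization strength and model sparsity described in this abstract can be illustrated with a small sketch. This is not the authors' code and it does not use scikit-learn, liblinear, or SGD: it is a pure-Python L1-penalized logistic regression fit by full-batch proximal gradient descent, chosen only because it is short and deterministic. The toy data, the function names, and the parameter values (`l1_strength`, `learning_rate`, `n_iter`) are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrinks toward zero and
    # returns exactly 0.0 when |value| <= threshold.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, l1_strength, learning_rate=0.5, n_iter=300):
    # Full-batch proximal gradient descent on the mean logistic loss
    # plus l1_strength * sum(|w|). No intercept, for brevity.
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iter):
        grad = [0.0] * n_features
        for row, label in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, row))) - label
            for j, x in enumerate(row):
                grad[j] += error * x / n_samples
        weights = [
            soft_threshold(w - learning_rate * g, learning_rate * l1_strength)
            for w, g in zip(weights, grad)
        ]
    return weights


def make_toy_data(n_samples=200, n_features=5, seed=0):
    # Hypothetical data: only feature 0 carries signal about the label.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if row[0] + rng.gauss(0, 0.5) > 0 else 0 for row in X]
    return X, y


def count_nonzero(weights):
    return sum(1 for w in weights if w != 0.0)


X, y = make_toy_data()
for strength in (0.0, 0.05, 1.0):
    fitted = fit_l1_logistic(X, y, strength)
    print(f"l1_strength={strength}: {count_nonzero(fitted)} nonzero coefficients")
```

Sweeping `l1_strength` and recording both held-out performance and the number of nonzero coefficients is one way to trace a performance/sparsity curve of the kind the abstract compares between optimizers. A different optimizer applied to the same objective may follow a different curve, which is the reporting concern the authors raise.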
Affiliation(s)
- Jake Crawford
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Maria Chikina
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Casey S Greene
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, United States
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, United States
5
Wiley LK, Shortt JA, Roberts ER, Lowery J, Kudron E, Lin M, Mayer D, Wilson M, Brunetti TM, Chavan S, Phang TL, Pozdeyev N, Lesny J, Wicks SJ, Moore ET, Morgenstern JL, Roff AN, Shalowitz EL, Stewart A, Williams C, Edelmann MN, Hull M, Patton JT, Axell L, Ku L, Lee YM, Jirikowic J, Tanaka A, Todd E, White S, Peterson B, Hearst E, Zane R, Greene CS, Mathias R, Coors M, Taylor M, Ghosh D, Kahn MG, Brooks IM, Aquilante CL, Kao D, Rafaels N, Crooks KR, Hess S, Barnes KC, Gignoux CR. Building a vertically integrated genomic learning health system: The biobank at the Colorado Center for Personalized Medicine. Am J Hum Genet 2024; 111:11-23. [PMID: 38181729 PMCID: PMC10806731 DOI: 10.1016/j.ajhg.2023.12.001] [Received: 11/01/2022] [Revised: 11/30/2023] [Accepted: 12/01/2023] [Indexed: 01/07/2024] Open
Abstract
Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.
Affiliation(s)
- Laura K Wiley
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jonathan A Shortt
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Emily R Roberts
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jan Lowery
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; University of Colorado Cancer Center, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Community and Behavioral Health, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Elizabeth Kudron
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Meng Lin
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- David Mayer
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Melissa Wilson
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Tonya M Brunetti
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sameer Chavan
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Tzu L Phang
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Nikita Pozdeyev
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Joseph Lesny
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Stephen J Wicks
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ethan T Moore
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Joshua L Morgenstern
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Alanna N Roff
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Elise L Shalowitz
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Adrian Stewart
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Cole Williams
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Michelle N Edelmann
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Madelyne Hull
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- J Tacker Patton
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Lisen Axell
  - CU Cancer Center, Hereditary Cancer Clinic, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Lisa Ku
  - CU Cancer Center, Hereditary Cancer Clinic, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Yee Ming Lee
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Clinical Pharmacy, University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Emily Todd
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; UCHealth, Aurora, CO 80045, USA
- Brett Peterson
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Richard Zane
  - UCHealth, Aurora, CO 80045, USA; University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Casey S Greene
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Rasika Mathias
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Marilyn Coors
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Matthew Taylor
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Division of Cardiology, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Debashis Ghosh
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA
- Michael G Kahn
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ian M Brooks
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christina L Aquilante
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Pharmaceutical Sciences, University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- David Kao
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Division of Cardiology, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; CARE Innovation Center, UCHealth, Aurora, CO 80045, USA
- Nicholas Rafaels
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kristy R Crooks
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Pathology, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kathleen C Barnes
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christopher R Gignoux
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
6
Zietz M, Himmelstein DS, Kloster K, Williams C, Nagle MW, Greene CS. The probability of edge existence due to node degree: a baseline for network-based predictions. Gigascience 2024; 13:giae001. [PMID: 38323677 PMCID: PMC10848215 DOI: 10.1093/gigascience/giae001] [Received: 01/05/2023] [Revised: 09/25/2023] [Accepted: 01/02/2024] [Indexed: 02/08/2024] Open
Abstract
Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections using network permutation to generate features that depend only on degree. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Researchers seeking to predict new or missing edges in biological networks should use our permutation approach to obtain a baseline for performance that may be nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
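The idea of a degree-only baseline can be sketched in a few lines of Python. The example below does not call the authors' `xswap` package and makes no claim about its API: it is a minimal, assumed implementation of degree-preserving edge swapping on a small undirected toy network, followed by an empirical estimate of how often a given node pair is connected across permuted networks. The toy edges and the names `xswap_permute`, `edge_prior`, `n_swaps`, and `swaps_per_edge` are hypothetical.

```python
import random
from collections import Counter


def degrees(edges):
    # Count how many edges touch each node.
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def xswap_permute(edges, n_swaps, seed=0):
    # Repeatedly pick two edges (a, b) and (c, d) and propose rewiring
    # them to (a, d) and (c, b). Every node keeps its degree. Proposals
    # that would create a self-loop or a duplicate edge are rejected.
    rng = random.Random(seed)
    edge_list = [tuple(sorted(edge)) for edge in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new_first = tuple(sorted((a, d)))
        new_second = tuple(sorted((c, b)))
        if new_first in edge_set or new_second in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.add(new_first)
        edge_set.add(new_second)
        edge_list[i], edge_list[j] = new_first, new_second
    return edge_list


def edge_prior(edges, source, target, n_permutations=200, swaps_per_edge=10):
    # Fraction of permuted networks in which source and target are linked.
    query = tuple(sorted((source, target)))
    hits = 0
    for seed in range(n_permutations):
        permuted = xswap_permute(edges, swaps_per_edge * len(edges), seed=seed)
        if query in set(permuted):
            hits += 1
    return hits / n_permutations


# Hypothetical toy network with one high-degree node ("A").
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("D", "E"), ("E", "F"), ("F", "G"),
]
print("prior for A-F:", edge_prior(toy_edges, "A", "F"))
print("prior for C-G:", edge_prior(toy_edges, "C", "G"))
```

Because the permuted networks keep every node's degree but scramble the specific connections, a predictor that scores well on them is exploiting degree alone. Comparing performance on the real network with performance on such permutations gives the kind of baseline the abstract recommends.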
Affiliation(s)
- Michael Zietz
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Daniel S Himmelstein
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Related Sciences, Denver, CO 80202, USA
- Kyle Kloster
  - Carbon, Inc., Redwood City, CA 94063, USA
  - Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
- Christopher Williams
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Michael W Nagle
  - Internal Medicine Research Unit, Pfizer Worldwide Research, Development, and Medical, Cambridge, MA 02139, USA
  - Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc., Cambridge, MA 02139, USA
  - Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
7
Davidson NR, Barnard ME, Hippen AA, Campbell A, Johnson CE, Way GP, Dalley BK, Berchuck A, Salas LA, Peres LC, Marks JR, Schildkraut JM, Greene CS, Doherty JA. Molecular subtypes of high-grade serous ovarian cancer across racial groups and gene expression platforms. bioRxiv 2023:2023.11.01.565179. [PMID: 37961178 PMCID: PMC10635053 DOI: 10.1101/2023.11.01.565179] [Indexed: 11/15/2023]
Abstract
Introduction High-grade serous carcinoma (HGSC) gene expression subtypes are associated with differential survival. We characterized HGSC gene expression in Black individuals and considered whether gene expression differences by race may contribute to poorer HGSC survival among Black versus non-Hispanic White individuals. Methods We included newly generated RNA-Seq data from Black and White individuals, and array-based genotyping data from four existing studies of White and Japanese individuals. We assigned subtypes using K-means clustering. Cluster- and dataset-specific gene expression patterns were summarized by moderated t-scores. We compared cluster-specific gene expression patterns across datasets by calculating the correlation between the summarized vectors of moderated t-scores. Following mapping to The Cancer Genome Atlas (TCGA)-derived HGSC subtypes, we used Cox proportional hazards models to estimate subtype-specific survival by dataset. Results Cluster-specific gene expression was similar across gene expression platforms. Comparing the Black study population to the White and Japanese study populations, the immunoreactive subtype was more common (39% versus 23%-28%) and the differentiated subtype less common (7% versus 22%-31%). Patterns of subtype-specific survival were similar between the Black and White populations with RNA-Seq data; compared to mesenchymal cases, the risk of death was similar for proliferative and differentiated cases and suggestively lower for immunoreactive cases (Black population HR=0.79 [0.55, 1.13], White population HR=0.86 [0.62, 1.19]). Conclusions A single, platform-agnostic pipeline can be used to assign HGSC gene expression subtypes. While the observed prevalence of HGSC subtypes varied by race, subtype-specific survival was similar.
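The cross-dataset comparison step in the Methods (correlating cluster-level vectors of moderated t-scores) can be sketched as follows. This is an illustrative assumption about the general shape of such a comparison, not the authors' pipeline: the t-scores are made-up numbers, the gene and cluster labels are placeholders, and the sketch uses Pearson correlation over the genes shared by both datasets, which the abstract does not specify.

```python
import math


def pearson(x, y):
    # Pearson correlation between two equal-length numeric sequences.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def compare_clusters(tscores_a, tscores_b):
    # Each argument maps cluster label -> {gene: t-score}.
    # Only genes measured in every cluster of both datasets are compared.
    gene_sets = [set(scores) for scores in tscores_a.values()]
    gene_sets += [set(scores) for scores in tscores_b.values()]
    shared = sorted(set.intersection(*gene_sets))
    similarities = {}
    for label_a, scores_a in tscores_a.items():
        for label_b, scores_b in tscores_b.items():
            vector_a = [scores_a[gene] for gene in shared]
            vector_b = [scores_b[gene] for gene in shared]
            similarities[(label_a, label_b)] = pearson(vector_a, vector_b)
    return similarities


# Hypothetical t-scores for two clusters in each of two datasets.
dataset_a = {
    "a1": {"g1": 5.0, "g2": -3.0, "g3": 0.5, "g4": -1.0},
    "a2": {"g1": -4.0, "g2": 2.0, "g3": 3.0, "g4": 0.0},
}
dataset_b = {
    "b1": {"g1": 4.0, "g2": -2.5, "g3": 1.0, "g5": 9.0},
    "b2": {"g1": -3.0, "g2": 1.5, "g3": 2.0, "g5": -1.0},
}

for pair, r in sorted(compare_clusters(dataset_a, dataset_b).items()):
    print(pair, round(r, 3))
```

Restricting to shared genes is what lets vectors from different platforms be compared at all, since array and RNA-Seq data rarely cover identical gene sets. High correlation between a cluster in one dataset and a cluster in another suggests the two capture a similar expression pattern.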
Collapse
Affiliation(s)
- Natalie R. Davidson
- Department of Biomedical Informatics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Mollie E. Barnard
- Huntsman Cancer Institute and the Department of Population Health Sciences at the Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT, USA
- Slone Epidemiology Center, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Ariel A. Hippen
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Amy Campbell
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Courtney E. Johnson
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Gregory P. Way
- Department of Biomedical Informatics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Brian K. Dalley
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | - Andrew Berchuck
- Department of Obstetrics and Gynecology, Duke University, Durham, NC
| | - Lucas A. Salas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
| | - Lauren C. Peres
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Jeffrey R. Marks
- Department of Surgery, Duke University School of Medicine, Durham, NC 27710, USA
| | - Joellen M. Schildkraut
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Casey S. Greene
- Department of Biomedical Informatics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jennifer A. Doherty
- Huntsman Cancer Institute and the Department of Population Health Sciences at the Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT, USA
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
| |
|
8
|
Hippen AA, Omran DK, Weber LM, Jung E, Drapkin R, Doherty JA, Hicks SC, Greene CS. Performance of computational algorithms to deconvolve heterogeneous bulk ovarian tumor tissue depends on experimental factors. Genome Biol 2023; 24:239. [PMID: 37864274 PMCID: PMC10588129 DOI: 10.1186/s13059-023-03077-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/18/2023] [Accepted: 09/29/2023] [Indexed: 10/22/2023] Open
Abstract
BACKGROUND Single-cell gene expression profiling provides unique opportunities to understand tumor heterogeneity and the tumor microenvironment. Because of cost and feasibility, profiling bulk tumors remains the primary population-scale analytical strategy. Many algorithms can deconvolve these tumors using single-cell profiles to infer their composition. While experimental choices do not change the true underlying composition of the tumor, they can affect the measurements produced by the assay. RESULTS We generated a dataset of high-grade serous ovarian tumors with paired expression profiles, produced using multiple strategies, to examine the extent to which experimental factors impact the results of downstream tumor deconvolution methods. We find that pooling samples for single-cell sequencing and subsequent demultiplexing has a minimal effect. We identify dissociation-induced differences that affect cell composition, leading to changes that may compromise the assumptions underlying some deconvolution algorithms. We also observe differences across mRNA enrichment methods that introduce additional discrepancies between the two data types. Finally, we find that experimental factors change cell composition estimates and that the impact differs by method. CONCLUSIONS Previous benchmarks of deconvolution methods have largely ignored experimental factors. We find that methods vary in their robustness to experimental factors. We provide recommendations for methods developers seeking to produce the next generation of deconvolution approaches and for scientists designing experiments using deconvolution to study tumor heterogeneity.
Affiliation(s)
- Ariel A Hippen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Dalia K Omran
- Penn Ovarian Cancer Research Center, Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Lukas M Weber
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Euihye Jung
- Penn Ovarian Cancer Research Center, Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Ronny Drapkin
- Penn Ovarian Cancer Research Center, Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | | | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Casey S Greene
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| |
|
9
|
Abdill RJ, Graham SP, Rubinetti V, Albert FW, Greene CS, Davis S, Blekhman R. Integration of 168,000 samples reveals global patterns of the human gut microbiome. bioRxiv 2023:2023.10.11.560955. [PMID: 37873416 PMCID: PMC10592789 DOI: 10.1101/2023.10.11.560955] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/25/2023]
Abstract
Understanding the factors that shape variation in the human microbiome is a major goal of research in biology. While other genomics fields have used large, pre-compiled compendia to extract systematic insights requiring otherwise impractical sample sizes, there has been no comparable resource for the 16S rRNA sequencing data commonly used to quantify microbiome composition. To help close this gap, we have assembled a set of 168,484 publicly available human gut microbiome samples, processed with a single pipeline and combined into the largest unified microbiome dataset to date. We use this resource, which is freely available at microbiomap.org, to shed light on global variation in the human gut microbiome. We find that Firmicutes, particularly Bacilli and Clostridia, are almost universally present in the human gut. At the same time, the relative abundances of the 65 most common microbial genera differ between at least two world regions. We also show that gut microbiomes in undersampled world regions, such as Central and Southern Asia, differ significantly from the more thoroughly characterized microbiomes of Europe and Northern America. Moreover, humans in these overlooked regions likely harbor hundreds of taxa that have not yet been discovered due to this undersampling, highlighting the need for diversity in microbiome studies. We anticipate that this new compendium can serve the community and enable advanced applied and methodological research.
Affiliation(s)
- Richard J. Abdill
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, USA
| | - Samantha P. Graham
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, USA
| | - Vincent Rubinetti
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
| | - Frank W. Albert
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, USA
| | - Casey S. Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
| | - Sean Davis
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
| | - Ran Blekhman
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, USA
| |
|
10
|
Pividori M, Lu S, Li B, Su C, Johnson ME, Wei WQ, Feng Q, Namjou B, Kiryluk K, Kullo IJ, Luo Y, Sullivan BD, Voight BF, Skarke C, Ritchie MD, Grant SFA, Greene CS. Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms. Nat Commun 2023; 14:5562. [PMID: 37689782 PMCID: PMC10492839 DOI: 10.1038/s41467-023-41057-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/05/2021] [Accepted: 08/18/2023] [Indexed: 09/11/2023] Open
Abstract
Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. Transcriptome-wide association studies have helped uncover the role of individual genes in disease-relevant mechanisms. However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression. Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. This representation is based on modules of genes with similar expression patterns across the same conditions. We observe that diseases are significantly associated with gene modules expressed in relevant cell types, and our approach is accurate in predicting known drug-disease pairs and inferring mechanisms of action. Furthermore, using a CRISPR screen to analyze lipid regulation, we find that functionally important players lack associations but are prioritized in trait-associated modules by PhenoPLIER. By incorporating groups of co-expressed genes, PhenoPLIER can contextualize genetic associations and reveal potential targets missed by single-gene strategies.
Affiliation(s)
- Milton Pividori
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
| | - Sumei Lu
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Binglan Li
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Chun Su
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Matthew E Johnson
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Wei-Qi Wei
- Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Qiping Feng
- Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Bahram Namjou
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229, USA
| | - Krzysztof Kiryluk
- Department of Medicine, Division of Nephrology, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, 10032, USA
| | | | - Yuan Luo
- Northwestern University, Chicago, IL, 60611, USA
| | - Blair D Sullivan
- Kahlert School of Computing, University of Utah, Salt Lake City, UT, 84112, USA
| | - Benjamin F Voight
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Carsten Skarke
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Marylyn D Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Struan F A Grant
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Division of Endocrinology and Diabetes, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Casey S Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO, 80045, USA
| |
|
11
|
Neff SL, Doing G, Reiter T, Hampton TH, Greene CS, Hogan DA. Analysis of Pseudomonas aeruginosa transcription in an ex vivo cystic fibrosis sputum model identifies metal restriction as a gene expression stimulus. bioRxiv 2023:2023.08.21.554169. [PMID: 37662412 PMCID: PMC10473638 DOI: 10.1101/2023.08.21.554169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Chronic Pseudomonas aeruginosa lung infections are a distinctive feature of cystic fibrosis (CF) pathology that challenge adults with CF even with the advent of highly effective modulator therapies. Characterizing P. aeruginosa transcription in the CF lung and identifying factors that drive gene expression could yield novel strategies to eradicate infection or otherwise improve outcomes. To complement published P. aeruginosa gene expression studies in laboratory culture models designed to mimic the CF lung environment, we employed an ex vivo sputum model in which laboratory strain PAO1 was incubated in sputum from different CF donors. As part of the analysis, we compared PAO1 gene expression in this "spike-in" sputum model to that for P. aeruginosa grown in artificial sputum medium (ASM). Analyses focused on genes that were differentially expressed between sputum and ASM and genes that were most highly expressed in sputum. We present a new approach that used sets of genes with correlated expression, identified by the gene expression analysis tool eADAGE, to analyze the differential activity of pathways in P. aeruginosa grown in CF sputum from different individuals. A key characteristic of P. aeruginosa grown in expectorated CF sputum was related to zinc and iron acquisition, but this signal varied by donor sputum. In addition, a significant correlation between P. aeruginosa expression of the H1-type VI secretion system and corrector use by the sputum donor was observed. These methods may be broadly useful in looking for variable signals across clinical samples.
Affiliation(s)
- Samuel L. Neff
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| | - Georgia Doing
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| | - Taylor Reiter
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Thomas H. Hampton
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| | - Casey S. Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Deborah A. Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| |
|
12
|
Zhang S, Heil BJ, Mao W, Chikina M, Greene CS, Heller EA. MousiPLIER: A Mouse Pathway-Level Information Extractor Model. bioRxiv 2023:2023.07.31.551386. [PMID: 37577575 PMCID: PMC10418102 DOI: 10.1101/2023.07.31.551386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
High-throughput gene expression profiling is a powerful approach to generate hypotheses on the underlying causes of biological function and disease. Yet this approach is limited in its ability to infer underlying biological pathways and by the burden of testing tens of thousands of individual genes. Machine learning models that incorporate prior biological knowledge are necessary to extract meaningful pathways and generate rational hypotheses from the vast amount of gene expression data generated to date. We adopted an unsupervised machine learning method, Pathway-level information extractor (PLIER), to train the first mouse PLIER model on 190,111 mouse brain RNA-sequencing samples, the greatest amount of training data ever used by PLIER. mousiPLIER converted gene expression data into latent variables that align to known pathway or cell marker gene sets, substantially reducing data dimensionality and improving interpretability. To determine the utility of mousiPLIER, we applied it to a mouse brain aging study of microglia and astrocyte transcriptomic profiling. We found a specific set of latent variables that are significantly associated with aging, including one latent variable (LV41) corresponding to striatal signal. We next performed k-means clustering on the training data to identify studies that respond strongly to LV41, finding that the variable is relevant to striatum and aging across the scientific literature. Finally, we built a web server (http://mousiplier.greenelab.com/) for users to easily explore the learned latent variables. Taken together, this study provides proof of concept that mousiPLIER can uncover meaningful biological processes in mouse transcriptomic studies.
Affiliation(s)
- Shuo Zhang
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Weiguang Mao
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Maria Chikina
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Casey S. Greene
- Department of Pharmacology, University of Colorado School of Medicine, Denver, CO 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO 80045, USA
| | - Elizabeth A. Heller
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
13
|
Shapiro JA, Gaonkar KS, Spielman SJ, Savonen CL, Bethell CJ, Jin R, Rathi KS, Zhu Y, Egolf LE, Farrow BK, Miller DP, Yang Y, Koganti T, Noureen N, Koptyra MP, Duong N, Santi M, Kim J, Robins S, Storm PB, Mack SC, Lilly JV, Xie HM, Jain P, Raman P, Rood BR, Lulla RR, Nazarian J, Kraya AA, Vaksman Z, Heath AP, Kline C, Scolaro L, Viaene AN, Huang X, Way GP, Foltz SM, Zhang B, Poetsch AR, Mueller S, Ennis BM, Prados M, Diskin SJ, Zheng S, Guo Y, Kannan S, Waanders AJ, Margol AS, Kim MC, Hanson D, Van Kuren N, Wong J, Kaufman RS, Coleman N, Blackden C, Cole KA, Mason JL, Madsen PJ, Koschmann CJ, Stewart DR, Wafula E, Brown MA, Resnick AC, Greene CS, Rokita JL, Taroni JN. OpenPBTA: The Open Pediatric Brain Tumor Atlas. Cell Genom 2023; 3:100340. [PMID: 37492101 PMCID: PMC10363844 DOI: 10.1016/j.xgen.2023.100340] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/12/2022] [Revised: 02/28/2023] [Accepted: 05/04/2023] [Indexed: 07/27/2023]
Abstract
Pediatric brain and spinal cancers are collectively the leading disease-related cause of death in children; thus, we urgently need curative therapeutic strategies for these tumors. To accelerate such discoveries, the Children's Brain Tumor Network (CBTN) and Pacific Pediatric Neuro-Oncology Consortium (PNOC) created a systematic process for tumor biobanking, model generation, and sequencing with immediate access to harmonized data. We leverage these data to establish OpenPBTA, an open collaborative project with over 40 scalable analysis modules that genomically characterize 1,074 pediatric brain tumors. Transcriptomic classification reveals universal TP53 dysregulation in mismatch repair-deficient hypermutant high-grade gliomas and TP53 loss as a significant marker for poor overall survival in ependymomas and H3 K28-mutant diffuse midline gliomas. Already being actively applied to other pediatric cancers and PNOC molecular tumor board decision-making, OpenPBTA is an invaluable resource to the pediatric oncology community.
Affiliation(s)
- Joshua A. Shapiro
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
| | - Krutika S. Gaonkar
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Stephanie J. Spielman
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Rowan University, Glassboro, NJ 08028, USA
| | - Candace L. Savonen
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
| | - Chante J. Bethell
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
| | - Run Jin
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Komal S. Rathi
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Yuankun Zhu
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Laura E. Egolf
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Bailey K. Farrow
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Daniel P. Miller
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Yang Yang
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL 60637, USA
| | - Tejaswi Koganti
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Nighat Noureen
- Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
| | - Mateusz P. Koptyra
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Nhat Duong
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Mariarita Santi
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Jung Kim
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
| | - Shannon Robins
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Phillip B. Storm
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Stephen C. Mack
- Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Jena V. Lilly
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Hongbo M. Xie
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Payal Jain
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Pichai Raman
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Brian R. Rood
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
| | - Rishi R. Lulla
- Division of Hematology/Oncology, Hasbro Children’s Hospital, Providence, RI 02903, USA
- Department of Pediatrics, The Warren Alpert School of Brown University, Providence, RI 02912, USA
| | - Javad Nazarian
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
- Department of Pediatrics, University of Zurich, Zurich, Switzerland
| | - Adam A. Kraya
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Zalman Vaksman
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Allison P. Heath
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Cassie Kline
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Laura Scolaro
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Angela N. Viaene
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Xiaoyan Huang
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Gregory P. Way
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Steven M. Foltz
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Bo Zhang
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Anna R. Poetsch
- Biotechnology Center, Technical University Dresden, Dresden, Germany
- National Center for Tumor Diseases, Dresden, Germany
| | - Sabine Mueller
- Department of Neurology, Neurosurgery and Pediatrics, University of California, San Francisco, San Francisco, CA 94115, USA
| | - Brian M. Ennis
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Michael Prados
- University of California, San Francisco, San Francisco, CA 94115, USA
| | - Sharon J. Diskin
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Siyuan Zheng
- Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
| | - Yiran Guo
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Shrivats Kannan
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Angela J. Waanders
- Division of Hematology, Oncology, Neuro-Oncology, and Stem Cell Transplant, Ann & Robert H Lurie Children’s Hospital of Chicago, Chicago, IL 60611, USA
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Ashley S. Margol
- Division of Hematology and Oncology, Children’s Hospital of Los Angeles, Los Angeles, CA 90027, USA
- Department of Pediatrics, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
| | - Meen Chul Kim
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Derek Hanson
- Hackensack Meridian School of Medicine, Nutley, NJ 07110, USA
- Hackensack University Medical Center, Hackensack, NJ 07601, USA
| | - Nicholas Van Kuren
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Jessica Wong
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Rebecca S. Kaufman
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Noel Coleman
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Christopher Blackden
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Kristina A. Cole
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jennifer L. Mason
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Peter J. Madsen
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Carl J. Koschmann
- Department of Pediatrics, University of Michigan Health, Ann Arbor, MI 48105, USA
- Pediatric Hematology Oncology, Mott Children’s Hospital, Ann Arbor, MI 48109, USA
| | - Douglas R. Stewart
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
| | - Eric Wafula
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Miguel A. Brown
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Adam C. Resnick
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Casey S. Greene
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jo Lynne Rokita
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Jaclyn N. Taroni
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
| | - Children’s Brain Tumor Network
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Rowan University, Glassboro, NJ 08028, USA
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL 60637, USA
- Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
- Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
- Division of Hematology/Oncology, Hasbro Children’s Hospital, Providence, RI 02903, USA
- Department of Pediatrics, The Warren Alpert School of Brown University, Providence, RI 02912, USA
- Department of Pediatrics, University of Zurich, Zurich, Switzerland
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Biotechnology Center, Technical University Dresden, Dresden, Germany
- National Center for Tumor Diseases, Dresden, Germany
- Department of Neurology, Neurosurgery and Pediatrics, University of California, San Francisco, San Francisco, CA 94115, USA
- University of California, San Francisco, San Francisco, CA 94115, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Hematology, Oncology, Neuro-Oncology, and Stem Cell Transplant, Ann & Robert H Lurie Children’s Hospital of Chicago, Chicago, IL 60611, USA
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Division of Hematology and Oncology, Children’s Hospital of Los Angeles, Los Angeles, CA 90027, USA
- Department of Pediatrics, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
- Hackensack Meridian School of Medicine, Nutley, NJ 07110, USA
- Hackensack University Medical Center, Hackensack, NJ 07601, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Michigan Health, Ann Arbor, MI 48105, USA
- Pediatric Hematology Oncology, Mott Children’s Hospital, Ann Arbor, MI 48109, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Pacific Pediatric Neuro-Oncology Consortium
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Rowan University, Glassboro, NJ 08028, USA
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL 60637, USA
- Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
- Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
- Division of Hematology/Oncology, Hasbro Children’s Hospital, Providence, RI 02903, USA
- Department of Pediatrics, The Warren Alpert School of Brown University, Providence, RI 02912, USA
- Department of Pediatrics, University of Zurich, Zurich, Switzerland
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Biotechnology Center, Technical University Dresden, Dresden, Germany
- National Center for Tumor Diseases, Dresden, Germany
- Department of Neurology, Neurosurgery and Pediatrics, University of California, San Francisco, San Francisco, CA 94115, USA
- University of California, San Francisco, San Francisco, CA 94115, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Hematology, Oncology, Neuro-Oncology, and Stem Cell Transplant, Ann & Robert H Lurie Children’s Hospital of Chicago, Chicago, IL 60611, USA
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Division of Hematology and Oncology, Children’s Hospital of Los Angeles, Los Angeles, CA 90027, USA
- Department of Pediatrics, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
- Hackensack Meridian School of Medicine, Nutley, NJ 07110, USA
- Hackensack University Medical Center, Hackensack, NJ 07601, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Michigan Health, Ann Arbor, MI 48105, USA
- Pediatric Hematology Oncology, Mott Children’s Hospital, Ann Arbor, MI 48109, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
14
|
Dang MT, Gonzalez MV, Gaonkar KS, Rathi KS, Young P, Arif S, Zhai L, Alam Z, Devalaraja S, To TKJ, Folkert IW, Raman P, Rokita JL, Martinez D, Taroni JN, Shapiro JA, Greene CS, Savonen C, Mafra F, Hakonarson H, Curran T, Haldar M. Macrophages in SHH subgroup medulloblastoma display dynamic heterogeneity that varies with treatment modality. Cell Rep 2023; 42:112600. [PMID: 37235472 PMCID: PMC10592430 DOI: 10.1016/j.celrep.2023.112600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023] Open
|
15
|
Nicholson DN, Alquaddoomi F, Rubinetti V, Greene CS. Changing word meanings in biomedical literature reveal pandemics and new technologies. BioData Min 2023; 16:16. [PMID: 37147665 PMCID: PMC10161184 DOI: 10.1186/s13040-023-00332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/24/2023] [Indexed: 05/07/2023] Open
Abstract
While we often think of words as having a fixed meaning that we use to describe a changing world, words are also dynamic and changing. Scientific research can also be remarkably fast-moving, with new concepts or approaches rapidly gaining mind share. We examined scientific writing, both preprint and pre-publication peer-reviewed text, to identify terms that have changed and examine their use. One particular challenge that we faced was that the shift from closed to open access publishing meant that the size of available corpora changed by over an order of magnitude in the last two decades. We developed an approach to evaluate semantic shift by accounting for both intra- and inter-year variability using multiple integrated models. This analysis revealed thousands of change points in both corpora, including for terms such as 'cas9', 'pandemic', and 'sars'. We found that the consistent change-points between pre-publication peer-reviewed and preprinted text are largely related to the COVID-19 pandemic. We also created a web app for exploration that allows users to investigate individual terms (https://greenelab.github.io/word-lapse/). To our knowledge, our research is the first to examine semantic shift in biomedical preprints and pre-publication peer-reviewed text, and provides a foundation for future work to understand how terms acquire new meanings and how peer review affects this process.
Affiliation(s)
- David N Nicholson
- Genomics and Computational Biology Program, University of Pennsylvania, Philadelphia, PA, USA
| | - Faisal Alquaddoomi
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
| | - Vincent Rubinetti
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
| | - Casey S Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, CO, USA
|
16
|
Abstract
In the 21st century, several emergent viruses have posed a global threat. Each pathogen has emphasized the value of rapid and scalable vaccine development programs. The ongoing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has made the importance of such efforts especially clear. New biotechnological advances in vaccinology allow for recent advances that provide only the nucleic acid building blocks of an antigen, eliminating many safety concerns. During the COVID-19 pandemic, these DNA and RNA vaccines have facilitated the development and deployment of vaccines at an unprecedented pace. This success was attributable at least in part to broader shifts in scientific research relative to prior epidemics: the genome of SARS-CoV-2 was available as early as January 2020, facilitating global efforts in the development of DNA and RNA vaccines within 2 weeks of the international community becoming aware of the new viral threat. Additionally, these technologies that were previously only theoretical are not only safe but also highly efficacious. Although historically a slow process, the rapid development of vaccines during the COVID-19 crisis reveals a major shift in vaccine technologies. Here, we provide historical context for the emergence of these paradigm-shifting vaccines. We describe several DNA and RNA vaccines in terms of their efficacy, safety, and approval status. We also discuss patterns in worldwide distribution. The advances made since early 2020 provide an exceptional illustration of how rapidly vaccine development technology has advanced in the last 2 decades in particular and suggest a new era in vaccines against emerging pathogens.
IMPORTANCE The SARS-CoV-2 pandemic has caused untold damage globally, presenting unusual demands on but also unique opportunities for vaccine development. The development, production, and distribution of vaccines are imperative to saving lives, preventing severe illness, and reducing the economic and social burdens caused by the COVID-19 pandemic. Although vaccine technologies that provide the DNA or RNA sequence of an antigen had never previously been approved for use in humans, they have played a major role in the management of SARS-CoV-2. In this review, we discuss the history of these vaccines and how they have been applied to SARS-CoV-2. Additionally, given that the evolution of new SARS-CoV-2 variants continues to present a significant challenge in 2022, these vaccines remain an important and evolving tool in the biomedical response to the pandemic.
Affiliation(s)
- Halie M. Rando
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
| | - Ronan Lordan
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Likhitha Kolla
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Elizabeth Sell
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Alexandra J. Lee
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Nils Wellhausen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Amruta Naik
- Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Jeremy P. Kamil
- Department of Microbiology and Immunology, Louisiana State University Health Sciences Center Shreveport, Shreveport, Louisiana, USA
| | - COVID-19 Review Consortium
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Department of Microbiology and Immunology, Louisiana State University Health Sciences Center Shreveport, Shreveport, Louisiana, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
| | - Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
|
17
|
Rando HM, Lordan R, Lee AJ, Naik A, Wellhausen N, Sell E, Kolla L, Gitter A, Greene CS. Application of Traditional Vaccine Development Strategies to SARS-CoV-2. mSystems 2023; 8:e0092722. [PMID: 36861991 PMCID: PMC10134813 DOI: 10.1128/msystems.00927-22] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/03/2023] Open
Abstract
Over the past 150 years, vaccines have revolutionized the relationship between people and disease. During the COVID-19 pandemic, technologies such as mRNA vaccines have received attention due to their novelty and successes. However, more traditional vaccine development platforms have also yielded important tools in the worldwide fight against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A variety of approaches have been used to develop COVID-19 vaccines that are now authorized for use in countries around the world. In this review, we highlight strategies that focus on the viral capsid and outwards, rather than on the nucleic acids inside. These approaches fall into two broad categories: whole-virus vaccines and subunit vaccines. Whole-virus vaccines use the virus itself, in either an inactivated or an attenuated state. Subunit vaccines contain instead an isolated, immunogenic component of the virus. Here, we highlight vaccine candidates that apply these approaches against SARS-CoV-2 in different ways. In a companion article (H. M. Rando, R. Lordan, L. Kolla, E. Sell, et al., mSystems 8:e00928-22, 2023, https://doi.org/10.1128/mSystems.00928-22), we review the more recent and novel development of nucleic acid-based vaccine technologies. We further consider the role that these COVID-19 vaccine development programs have played in prophylaxis at the global scale. Well-established vaccine technologies have proved especially important to making vaccines accessible in low- and middle-income countries. Vaccine development programs that use established platforms have been undertaken in a much wider range of countries than those using nucleic acid-based technologies, which have been led by wealthy Western countries. Therefore, these vaccine platforms, though less novel from a biotechnological standpoint, have proven to be extremely important to the management of SARS-CoV-2. 
IMPORTANCE The development, production, and distribution of vaccines is imperative to saving lives, preventing illness, and reducing the economic and social burdens caused by the COVID-19 pandemic. Vaccines that use cutting-edge biotechnology have played an important role in mitigating the effects of SARS-CoV-2. However, more traditional methods of vaccine development that were refined throughout the 20th century have been especially critical to increasing vaccine access worldwide. Effective deployment is necessary to reducing the susceptibility of the world's population, which is especially important in light of emerging variants. In this review, we discuss the safety, immunogenicity, and distribution of vaccines developed using established technologies. In a separate review, we describe the vaccines developed using nucleic acid-based vaccine platforms. From the current literature, it is clear that the well-established vaccine technologies are also highly effective against SARS-CoV-2 and are being used to address the challenges of COVID-19 globally, including in low- and middle-income countries. This worldwide approach is critical for reducing the devastating impact of SARS-CoV-2.
Affiliation(s)
- Halie M. Rando
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
| | - Ronan Lordan
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, USA
| | - Alexandra J. Lee
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Amruta Naik
- Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Nils Wellhausen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Elizabeth Sell
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, USA
| | - Likhitha Kolla
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, USA
| | - COVID-19 Review Consortium
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, USA
- Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
| | - Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
|
18
|
Sanders LM, Scott RT, Yang JH, Qutub AA, Garcia Martin H, Berrios DC, Hastings JJA, Rask J, Mackintosh G, Hoarfrost AL, Chalk S, Kalantari J, Khezeli K, Antonsen EL, Babdor J, Barker R, Baranzini SE, Beheshti A, Delgado-Aparicio GM, Glicksberg BS, Greene CS, Haendel M, Hamid AA, Heller P, Jamieson D, Jarvis KJ, Komarova SV, Komorowski M, Kothiyal P, Mahabal A, Manor U, Mason CE, Matar M, Mias GI, Miller J, Myers JG, Nelson C, Oribello J, Park SM, Parsons-Wingerter P, Prabhu RK, Reynolds RJ, Saravia-Butler A, Saria S, Sawyer A, Singh NK, Snyder M, Soboczenski F, Soman K, Theriot CA, Van Valen D, Venkateswaran K, Warren L, Worthey L, Zitnik M, Costes SV. Biological research and self-driving labs in deep space supported by artificial intelligence. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00618-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
|
19
|
Scott RT, Sanders LM, Antonsen EL, Hastings JJA, Park SM, Mackintosh G, Reynolds RJ, Hoarfrost AL, Sawyer A, Greene CS, Glicksberg BS, Theriot CA, Berrios DC, Miller J, Babdor J, Barker R, Baranzini SE, Beheshti A, Chalk S, Delgado-Aparicio GM, Haendel M, Hamid AA, Heller P, Jamieson D, Jarvis KJ, Kalantari J, Khezeli K, Komarova SV, Komorowski M, Kothiyal P, Mahabal A, Manor U, Garcia Martin H, Mason CE, Matar M, Mias GI, Myers JG, Nelson C, Oribello J, Parsons-Wingerter P, Prabhu RK, Qutub AA, Rask J, Saravia-Butler A, Saria S, Singh NK, Snyder M, Soboczenski F, Soman K, Van Valen D, Venkateswaran K, Warren L, Worthey L, Yang JH, Zitnik M, Costes SV. Biomonitoring and precision health in deep space supported by artificial intelligence. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00617-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/28/2023]
|
20
|
Heil BJ, Crawford J, Greene CS. The effect of non-linear signal in classification problems using gene expression. PLoS Comput Biol 2023; 19:e1010984. [PMID: 36972227 PMCID: PMC10079219 DOI: 10.1371/journal.pcbi.1010984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 04/06/2023] [Accepted: 02/28/2023] [Indexed: 03/29/2023] Open
Abstract
Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
Affiliation(s)
- Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
| | - Jake Crawford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
| | - Casey S. Greene
- Department of Pharmacology, University of Colorado School of Medicine, Colorado, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Colorado, United States of America
|
21
|
Foltz SM, Greene CS, Taroni JN. Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously. Commun Biol 2023; 6:222. [PMID: 36841852 PMCID: PMC9968332 DOI: 10.1038/s42003-023-04588-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/02/2017] [Accepted: 02/13/2023] [Indexed: 02/27/2023] Open
Abstract
Large compendia of gene expression data have proven valuable for the discovery of novel biological relationships. Historically, most available RNA assays were run on microarray, while RNA-seq is now the platform of choice for many new experiments. The data structure and distributions between the platforms differ, making it challenging to combine them directly. Here we perform supervised and unsupervised machine learning evaluations to assess which existing normalization methods are best suited for combining microarray and RNA-seq data. We find that quantile and Training Distribution Matching normalization allow for supervised and unsupervised model training on microarray and RNA-seq data simultaneously. Nonparanormal normalization and z-scores are also appropriate for some applications, including pathway analysis with Pathway-Level Information Extractor (PLIER). We demonstrate that it is possible to perform effective cross-platform normalization using existing methods to combine microarray and RNA-seq data for machine learning applications.
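The quantile normalization named in the abstract above can be illustrated with a minimal sketch. This is generic rank-based quantile normalization, not the authors' implementation: the function names, the toy values, and the choice to build the reference distribution from the microarray samples are all assumptions made for illustration.

```python
from typing import List


def mean_reference(samples: List[List[float]]) -> List[float]:
    """Build a reference distribution: the rank-by-rank mean of the sorted samples."""
    sorted_samples = [sorted(s) for s in samples]
    n_genes = len(sorted_samples[0])
    return [
        sum(s[rank] for s in sorted_samples) / len(sorted_samples)
        for rank in range(n_genes)
    ]


def quantile_normalize(
    samples: List[List[float]], reference: List[float]
) -> List[List[float]]:
    """Replace each value with the reference value that has the same rank."""
    ref_sorted = sorted(reference)
    n_genes = len(ref_sorted)
    normalized = []
    for values in samples:
        if len(values) != n_genes:
            raise ValueError("each sample needs one value per reference gene")
        # Gene indices ordered from lowest to highest expression
        # (ties keep their original order because sorted() is stable).
        order = sorted(range(n_genes), key=lambda i: values[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = ref_sorted[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two microarray samples and one RNA-seq sample, three genes.
microarray = [[2.0, 8.0, 5.0], [3.0, 9.0, 4.0]]
rnaseq = [[100.0, 0.0, 30.0]]

reference = mean_reference(microarray)           # [2.5, 4.5, 8.5]
print(quantile_normalize(rnaseq, reference))     # [[8.5, 2.5, 4.5]]
```

After this step the RNA-seq sample has the same value distribution as the reference while each gene keeps its within-sample rank, which is the property that lets a single model be trained on both platforms.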
Affiliation(s)
- Steven M Foltz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA.
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA.
| | - Jaclyn N Taroni
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA.
| |
|
22
|
Doing G, Lee AJ, Neff SL, Reiter T, Holt JD, Stanton BA, Greene CS, Hogan DA. Computationally Efficient Assembly of Pseudomonas aeruginosa Gene Expression Compendia. mSystems 2023; 8:e0034122. [PMID: 36541761 PMCID: PMC9948711 DOI: 10.1128/msystems.00341-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Accepted: 11/09/2022] [Indexed: 12/24/2022] Open
Abstract
Thousands of Pseudomonas aeruginosa RNA sequencing (RNA-seq) gene expression profiles are publicly available via the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). In this work, the transcriptional profiles from hundreds of studies performed by over 75 research groups were reanalyzed in aggregate to create a powerful tool for hypothesis generation and testing. Raw sequence data were uniformly processed using the Salmon pseudoaligner, and this read mapping method was validated by comparison to a direct alignment method. We developed filtering criteria to exclude samples with aberrant levels of housekeeping gene expression or an unexpected number of genes with no reported values and normalized the filtered compendia using the ratio-of-medians method. The filtering and normalization steps greatly improved gene expression correlations for genes within the same operon or regulon across the 2,333 samples. Since the RNA-seq data were generated using diverse strains, we report the effects of mapping samples to noncognate reference genomes by separately analyzing all samples mapped to cDNA reference genomes for strains PAO1 and PA14, two divergent strains that were used to generate most of the samples. Finally, we developed an algorithm to incorporate new data as they are deposited into the SRA. Our processing and quality control methods provide a scalable framework for taking advantage of the troves of biological information hibernating in the depths of microbial gene expression data and yield useful tools for P. aeruginosa RNA-seq data to be leveraged for diverse research goals. IMPORTANCE Pseudomonas aeruginosa is a causative agent of a wide range of infections, including chronic infections associated with cystic fibrosis. These P. aeruginosa infections are difficult to treat and often have negative outcomes. To aid in the study of this problematic pathogen, we mapped, filtered for quality, and normalized thousands of P. aeruginosa RNA-seq gene expression profiles that were publicly available via the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The resulting compendia facilitate analyses across experiments, strains, and conditions. Ultimately, the workflow that we present could be applied to analyses of other microbial species.
Affiliation(s)
- Georgia Doing
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
| | - Alexandra J. Lee
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Samuel L. Neff
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
| | - Taylor Reiter
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, Colorado, USA
| | - Jacob D. Holt
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
| | - Bruce A. Stanton
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
| | - Casey S. Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, Colorado, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Deborah A. Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
| |
|
23
|
Abstract
In this work we investigate how models with advanced natural language processing capabilities can be used to reduce the time-consuming process of writing and revising scholarly manuscripts. To this end, we integrate large language models into the Manubot publishing ecosystem to suggest revisions for scholarly text. We tested our AI-based revision workflow in three case studies of existing manuscripts, including the present one. Our results suggest that these models can capture the concepts in the scholarly text and produce high-quality revisions that improve clarity. Given the amount of time that researchers put into crafting prose, we anticipate that this advance will revolutionize the type of knowledge work performed by academics.
Affiliation(s)
- Milton Pividori
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA · Funded by The National Human Genome Research Institute, K99 HG011898; The Eunice Kennedy Shriver National Institute of Child Health and Human Development, R01 HD109765
| | - Casey S. Greene
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA · Funded by The Gordon and Betty Moore Foundation, GBMF4552; The National Human Genome Research Institute, R01 HG010067; The Eunice Kennedy Shriver National Institute of Child Health and Human Development, R01 HD109765
| |
|
24
|
Rando HM, Lordan R, Lee AJ, Naik A, Wellhausen N, Sell E, Kolla L, Gitter A, Greene CS. Application of Traditional Vaccine Development Strategies to SARS-CoV-2. ArXiv 2023:2208.08907. [PMID: 36034485 PMCID: PMC9413721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Over the past 150 years, vaccines have revolutionized the relationship between people and disease. During the COVID-19 pandemic, technologies such as mRNA vaccines have received attention due to their novelty and successes. However, more traditional vaccine development platforms have also yielded important tools in the worldwide fight against the SARS-CoV-2 virus. A variety of approaches have been used to develop COVID-19 vaccines that are now authorized for use in countries around the world. In this review, we highlight strategies that focus on the viral capsid and outwards, rather than on the nucleic acids inside. These approaches fall into two broad categories: whole-virus vaccines and subunit vaccines. Whole-virus vaccines use the virus itself, either in an inactivated or attenuated state. Subunit vaccines contain instead an isolated, immunogenic component of the virus. Here, we highlight vaccine candidates that apply these approaches against SARS-CoV-2 in different ways. In a companion manuscript, we review the more recent and novel development of nucleic-acid based vaccine technologies. We further consider the role that these COVID-19 vaccine development programs have played in prophylaxis at the global scale. Well-established vaccine technologies have proved especially important to making vaccines accessible in low- and middle-income countries. Vaccine development programs that use established platforms have been undertaken in a much wider range of countries than those using nucleic-acid-based technologies, which have been led by wealthy Western countries. Therefore, these vaccine platforms, though less novel from a biotechnological standpoint, have proven to be extremely important to the management of SARS-CoV-2.
Affiliation(s)
- Halie M Rando
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America; Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552); the National Human Genome Research Institute (R01 HG010067)
| | - Ronan Lordan
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-5158, USA; Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - Alexandra J Lee
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552)
| | - Amruta Naik
- Children's Hospital of Philadelphia, Philadelphia, PA, United States of America
| | - Nils Wellhausen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Elizabeth Sell
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Likhitha Kolla
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America · Funded by NIH Medical Scientist Training Program T32 GM07170
| | | | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America; Morgridge Institute for Research, Madison, Wisconsin, United States of America · Funded by John W. and Jeanne M. Rowe Center for Research in Virology
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America; Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552); the National Human Genome Research Institute (R01 HG010067)
| |
|
25
|
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how two biomedical entities are related. bioRxiv 2023:2023.01.05.522941. [PMID: 36711546 PMCID: PMC9882000 DOI: 10.1101/2023.01.05.522941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.
Affiliation(s)
- Daniel S. Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Related Sciences
| | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
| | - Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Kyle Kloster
- Carbon, Inc.; Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
| | - Faisal Alquaddoomi
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Dongbo Hu
- Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
| | - David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
| | - Yun Hao
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, USA
| | | | - Michael W. Nagle
- Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, Massachusetts, United States of America; Neurogenomics, Translational Sciences, Neurology Business Group, Eisai Inc, Cambridge, Massachusetts, United States of America
| | - Casey S. Greene
- Correspondence possible via GitHub Issues or Casey S. Greene <>
| |
|
26
|
Heil BJ, Greene CS. The Field-Dependent Nature of PageRank Values in Citation Networks. bioRxiv 2023:2023.01.05.522943. [PMID: 36711900 PMCID: PMC9881996 DOI: 10.1101/2023.01.05.522943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
The value of scientific research can be easier to assess at the collective level than at the level of individual contributions. Several journal-level and article-level metrics aim to measure the importance of journals or individual manuscripts. However, many are citation-based and citation practices vary between fields. To account for these differences, scientists have devised normalization schemes to make metrics more comparable across fields. We use PageRank as an example metric and examine the extent to which field-specific citation norms drive estimated importance differences. In doing so, we recapitulate differences in journal and article PageRanks between fields. We also find that manuscripts shared between fields have different PageRanks depending on which field's citation network the metric is calculated in. We implement a degree-preserving graph shuffling algorithm to generate a null distribution of similar networks and find differences more likely attributed to field-specific preferences than citation norms. Our results suggest that while differences exist between fields' metric distributions, applying metrics in a field-aware manner rather than using normalized global metrics avoids losing important information about article preferences. They also imply that assigning a single importance value to a manuscript may not be a useful construct, as the importance of each manuscript varies by the reader's field.
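The observation that a shared manuscript receives different PageRanks depending on the field's citation network can be illustrated with a small sketch. This is a textbook power-iteration PageRank on two hypothetical toy networks, not the authors' pipeline: the article names, the damping factor of 0.85, and the iteration count are assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(
    citations: Dict[str, List[str]], damping: float = 0.85, iterations: int = 100
) -> Dict[str, float]:
    """Power-iteration PageRank; citations maps each article to the articles it cites."""
    nodes = sorted(set(citations) | {t for targets in citations.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every article receives the same baseline "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = citations.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An article that cites nothing spreads its rank uniformly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical networks: "shared" appears in both fields but is cited only in field A.
field_a = {"a1": ["shared"], "a2": ["shared"], "shared": []}
field_b = {"b1": ["b2"], "b2": ["b3"], "b3": ["b1"], "shared": ["b1"]}

rank_a = pagerank(field_a)
rank_b = pagerank(field_b)
print(rank_a["shared"] > rank_b["shared"])  # True: the same article ranks higher in field A
```

Computing the metric separately per field, as here, keeps that difference visible; a single global score would average it away.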
Affiliation(s)
- Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
| | - Casey S. Greene
- Department of Pharmacology, University of Colorado School of Medicine; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine
| |
|
27
|
Zietz M, Himmelstein DS, Kloster K, Williams C, Nagle MW, Greene CS. The probability of edge existence due to node degree: a baseline for network-based predictions. bioRxiv 2023:2023.01.05.522939. [PMID: 36711569 PMCID: PMC9881952 DOI: 10.1101/2023.01.05.522939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree's predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
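The idea of a permutation-derived edge prior can be sketched in a few lines. The code below is a minimal illustration for a bipartite network and does not use or reproduce the xswap package's API: the function names, the number of swaps per permutation, and the toy gene-disease edges are assumptions. It shuffles edges while preserving every node's degree, then estimates the prior for an edge as the fraction of permuted networks that contain it.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source, target) in a bipartite network


def degree_preserving_shuffle(edges: List[Edge], n_swaps: int, rng: random.Random) -> List[Edge]:
    """Repeatedly swap the targets of two random edges, skipping swaps that duplicate an edge."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        if new_1 in present or new_2 in present:
            continue  # also rejects the no-op cases a == c and b == d
        present.difference_update([edges[i], edges[j]])
        present.update([new_1, new_2])
        edges[i], edges[j] = new_1, new_2
    return edges


def edge_prior(edges: List[Edge], n_permutations: int = 200, seed: int = 0) -> Dict[Edge, float]:
    """Fraction of degree-preserving permutations in which each edge appears."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    current = list(edges)
    for _ in range(n_permutations):
        current = degree_preserving_shuffle(current, n_swaps=10 * len(current), rng=rng)
        counts.update(current)
    return {edge: count / n_permutations for edge, count in counts.items()}


# Hypothetical gene-disease edges; g1 and d1 have the highest degrees.
toy_edges = [("g1", "d1"), ("g1", "d2"), ("g2", "d1"), ("g3", "d3"), ("g3", "d1"), ("g4", "d2")]
prior = edge_prior(toy_edges)
for edge in sorted(prior, key=prior.get, reverse=True):
    print(edge, round(prior[edge], 2))
```

Pairs that never appear in any permutation are absent from the returned dictionary and have an estimated prior of zero. A predictor's scores can then be compared against this prior to see how much of its performance goes beyond degree alone.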
Affiliation(s)
- Michael Zietz
- Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Kyle Kloster
- Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Christopher Williams
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Michael W Nagle
- Internal Medicine Research Unit, Pfizer Worldwide Research, Development, and Medical
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| |
|
28
|
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how biomedical entities are related. Gigascience 2022; 12:giad047. [PMID: 37503959 PMCID: PMC10375517 DOI: 10.1093/gigascience/giad047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 04/14/2023] [Accepted: 06/06/2023] [Indexed: 07/29/2023] Open
Abstract
BACKGROUND Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes, including genes, diseases, drugs, pathways, and anatomical structures, with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
Affiliation(s)
- Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Related Sciences, Denver, CO 80202, USA
| | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Kyle Kloster
- Carbon, Inc., Redwood City, CA 94063, USA
- Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
| | - Benjamin J Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Faisal Alquaddoomi
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Dongbo Hu
- Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia, PA 19104, USA
| | - David N Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Yun Hao
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Blair D Sullivan
- School of Computing, University of Utah, Salt Lake City, UT 84112, USA
| | - Michael W Nagle
- Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, MA 02139, USA
- Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
| |
|
29
|
Nicholson DN, Himmelstein DS, Greene CS. Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts. BioData Min 2022; 15:26. [PMID: 36258252 PMCID: PMC9578183 DOI: 10.1186/s13040-022-00311-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Accepted: 09/17/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Knowledge graphs support biomedical research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via manual curation, which is challenging to scale with an exponentially rising publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to annotate textual data automatically. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This bottleneck makes populating a knowledge graph with multiple nodes and edge types practically infeasible. Thus, we sought to accelerate the label function creation process by evaluating how label functions can be re-used across multiple edge types. RESULTS We obtained entity-tagged abstracts and subsetted these entities to only contain compounds, genes, and disease mentions. We extracted sentences containing co-mentions of certain biomedical entities contained in a previously described knowledge graph, Hetionet v1. We trained a baseline model that used database-only label functions and then used a sampling approach to measure how well adding edge-specific or edge-mismatch label function combinations improved over our baseline. Next, we trained a discriminator model to detect sentences that indicated a biomedical relationship and then estimated the number of edge types that could be recalled and added to Hetionet v1. We found that adding edge-mismatch label functions rarely improved relationship extraction, while control edge-specific label functions did. There were two exceptions to this trend, Compound-binds-Gene and Gene-interacts-Gene, which both indicated physical relationships and showed signs of transferability. 
Across the scenarios tested, discriminative model performance strongly depends on generated annotations. Using the best discriminative model for each edge type, we recalled close to 30% of established edges within Hetionet v1. CONCLUSIONS Our results show that this framework can incorporate novel edges into our source knowledge graph. However, results with label function transfer were mixed. Only label functions describing very similar edge types supported improved performance when transferred. We expect that the continued development of this strategy may provide essential building blocks to populating biomedical knowledge graphs with discoveries, ensuring that these resources include cutting-edge results.
Affiliation(s)
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA USA (GRID grid.25879.31; ISNI 0000 0004 1936 8972)
| | - Daniel S. Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA USA (GRID grid.25879.31; ISNI 0000 0004 1936 8972)
| | - Casey S. Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine and Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, USA (GRID grid.430503.1; ISNI 0000 0001 0703 675X)
| |
|
30
|
Hippen AA, Crawford J, Gardner JR, Greene CS. wenda_gpu: fast domain adaptation for genomic data. Bioinformatics 2022; 38:5129-5130. [PMID: 36193991 PMCID: PMC9665854 DOI: 10.1093/bioinformatics/btac663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Revised: 08/23/2022] [Accepted: 10/03/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Domain adaptation allows for the development of predictive models even in cases with limited sample data. Weighted elastic net domain adaptation specifically leverages features of genomic data to maximize transferability, but the method is too computationally demanding to apply to many genome-sized datasets. RESULTS We developed wenda_gpu, which uses GPyTorch to train models on genomic data within hours on a single GPU-enabled machine. We show that wenda_gpu returns comparable results to the original wenda implementation, and that it predicts cancer mutation status on small sample sizes better than regular elastic net. AVAILABILITY AND IMPLEMENTATION wenda_gpu is available on GitHub at https://github.com/greenelab/wenda_gpu/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ariel A Hippen
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jake Crawford
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jacob R Gardner
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
31
Lee AJ, Mould DL, Crawford J, Hu D, Powers RK, Doing G, Costello JC, Hogan DA, Greene CS. SOPHIE: Generative Neural Networks Separate Common and Specific Transcriptional Responses. Genomics Proteomics Bioinformatics 2022; 20:912-927. [PMID: 36216026 PMCID: PMC10025681 DOI: 10.1016/j.gpb.2022.09.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/22/2022] [Revised: 09/09/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
Genome-wide transcriptome profiling identifies genes that are prone to differential expression (DE) across contexts, as well as genes with changes specific to the experimental manipulation. Distinguishing genes that are specifically changed in a context of interest from common differentially expressed genes (DEGs) allows more efficient prediction of which genes are specific to a given biological process under scrutiny. Currently, common DEGs or pathways can only be identified through the laborious manual curation of experiments, an inordinately time-consuming endeavor. Here we pioneer an approach, Specific cOntext Pattern Highlighting In Expression data (SOPHIE), for distinguishing between common and specific transcriptional patterns using a generative neural network to create a background set of experiments from which a null distribution of gene and pathway changes can be generated. We apply SOPHIE to diverse datasets including those from human, human cancer, and bacterial pathogen Pseudomonas aeruginosa. SOPHIE identifies common DEGs in concordance with previously described, manually and systematically determined common DEGs. Further molecular validation indicates that SOPHIE detects highly specific but low-magnitude biologically relevant transcriptional changes. SOPHIE's measure of specificity can complement log2 fold change values generated from traditional DE analyses. For example, by filtering the set of DEGs, one can identify genes that are specifically relevant to the experimental condition of interest. Consequently, these results can inform future research directions. All scripts used in these analyses are available at https://github.com/greenelab/generic-expression-patterns. Users can access https://github.com/greenelab/sophie to run SOPHIE on their own data.
Affiliation(s)
- Alexandra J Lee
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Dallas L Mould
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Jake Crawford
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Dongbo Hu
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Rani K Powers
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Georgia Doing
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- James C Costello
- Department of Pharmacology, University of Colorado School of Medicine, Denver, CO 80045, USA
- Deborah A Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA; Center for Health AI, University of Colorado School of Medicine, Denver, CO 80045, USA; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO 80045, USA
32
Oh S, Geistlinger L, Ramos M, Blankenberg D, van den Beek M, Taroni JN, Carey VJ, Greene CS, Waldron L, Davis S. GenomicSuperSignature facilitates interpretation of RNA-seq experiments through robust, efficient comparison to public databases. Nat Commun 2022; 13:3695. [PMID: 35760813 PMCID: PMC9237024 DOI: 10.1038/s41467-022-31411-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/26/2021] [Accepted: 06/14/2022] [Indexed: 02/04/2023] Open
Abstract
Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. We present a method for interpreting new transcriptomic datasets through instant comparison to public datasets without high-performance computing requirements. We apply Principal Component Analysis on 536 studies comprising 44,890 human RNA sequencing profiles and aggregate sufficiently similar loading vectors to form Replicable Axes of Variation (RAV). RAVs are annotated with metadata of originating studies and by gene set enrichment analysis. Functionality to associate new datasets with RAVs, extract interpretable annotations, and provide intuitive visualization are implemented as the GenomicSuperSignature R/Bioconductor package. We demonstrate the efficient and coherent database search, robustness to batch effects and heterogeneous training data, and transfer learning capacity of our method using TCGA and rare diseases datasets. GenomicSuperSignature aids in analyzing new gene expression data in the context of existing databases using minimal computing resources.
Affiliation(s)
- Sehyun Oh
- Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY USA
- Ludwig Geistlinger
- Center for Computational Biomedicine, Harvard Medical School, Boston, MA USA
- Marcel Ramos
- Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY USA
- Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH USA
- Marius van den Beek
- The Pennsylvania State University, State College, PA USA
- Jaclyn N. Taroni
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA USA
- Vincent J. Carey
- Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA USA
- Casey S. Greene
- Center for Health AI, University of Colorado Anschutz School of Medicine, Denver, CO USA
- Levi Waldron
- Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY USA
- Sean Davis
- Center for Health AI, University of Colorado Anschutz School of Medicine, Denver, CO USA
33
Crawford J, Christensen BC, Chikina M, Greene CS. Widespread redundancy in -omics profiles of cancer mutation states. Genome Biol 2022; 23:137. [PMID: 35761387 PMCID: PMC9238138 DOI: 10.1186/s13059-022-02705-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2021] [Accepted: 06/14/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND In studies of cellular function in cancer, researchers are increasingly able to choose from many -omics assays as functional readouts. Choosing the correct readout for a given study can be difficult, and which layer of cellular function is most suitable to capture the relevant signal remains unclear. RESULTS We consider prediction of cancer mutation status (presence or absence) from functional -omics data as a representative problem that presents an opportunity to quantify and compare the ability of different -omics readouts to capture signals of dysregulation in cancer. From the TCGA Pan-Cancer Atlas that contains genetic alteration data, we focus on RNA sequencing, DNA methylation arrays, reverse phase protein arrays (RPPA), microRNA, and somatic mutational signatures as -omics readouts. Across a collection of genes recurrently mutated in cancer, RNA sequencing tends to be the most effective predictor of mutation state. We find that one or more other data types for many of the genes are approximately equally effective predictors. Performance is more variable between mutations than that between data types for the same mutation, and there is little difference between the top data types. We also find that combining data types into a single multi-omics model provides little or no improvement in predictive ability over the best individual data type. CONCLUSIONS Based on our results, for the design of studies focused on the functional outcomes of cancer mutations, there are often multiple -omics types that can serve as effective readouts, although gene expression seems to be a reasonable default option.
Affiliation(s)
- Jake Crawford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
- Brock C. Christensen
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Lebanon, NH USA; Department of Molecular and Systems Biology, Geisel School of Medicine, Dartmouth College, Lebanon, NH USA
- Maria Chikina
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA USA
- Casey S. Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO USA; Center for Health AI, University of Colorado School of Medicine, Aurora, CO USA
34
Hippen AA, Crawford J, Gardner JR, Greene CS. Abstract 1222: Efficient domain adaptation for cancer mutation prediction. Cancer Res 2022. [DOI: 10.1158/1538-7445.am2022-1222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Prediction models have been widely used for many purposes in cancer research, including calling mutation status, identifying cancer subtype, and performing prognostic analysis. A fundamental assumption in supervised machine learning is that the data to be classified is derived from the same distribution as the data used to train the classifier. However, challenges in data acquisition often mean few or no labeled examples are available for the distribution of interest. For instance, sample sizes may be insufficient to train on rare cancer types, and technological limitations can hinder label generation, for instance a lack of simultaneous profiling of gene expression and mutation information in single-cell data. For such situations where labeled target data is limited, the field of domain adaptation and transfer learning has established principled ways to develop predictors for the data of interest (target data) using labeled data from a similar but distinct distribution (source data). One recent method, weighted elastic net domain adaptation (wenda), leverages the complex interactions between features (such as genes) to optimize a model’s predictive power on both source and target datasets. It learns the dependency structure between features and prioritizes those that are similar across distributions. This has previously been shown to significantly improve accuracy on predictions from a mismatched distribution, overcoming the limitations of traditional supervised models. Unfortunately, wenda requires training a Gaussian process model for each feature separately, which is computationally expensive and resists parallelization, making it infeasible for researchers to use at genome-scale. We have developed and implemented a modified form of the underlying algorithm, called wenda_gpu, which allows for fast, efficient model training for genome-scale datasets on a single GPU-enabled computer. 
Our implementation exploits both quasi-Newtonian parameter optimization and the computational power of GPUs for significant speedups without sacrificing accuracy. Our implementation is able to tackle training tasks on data at the scale of The Cancer Genome Atlas (TCGA), which was infeasible without our technical advances. We demonstrate the use of wenda_gpu on a range of TCGA-scale prediction tasks, making it possible to build accurate, predictive models that generalize to target datasets where supervised models could not be trained due to the lack of labeled data. We also trained models from gene expression data for cross-cancer type mutation prediction, which outperformed a regular elastic net. We anticipate that wenda_gpu will enable researchers to build accurate predictive models in cases where supervised models were previously not possible due to lack of labeled data, including rare cancers.
Citation Format: Ariel A. Hippen, Jake Crawford, Jacob R. Gardner, Casey S. Greene. Efficient domain adaptation for cancer mutation prediction [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1222.
35
Rando HM, Brueffer C, Lordan R, Dattoli AA, Manheim D, Meyer JG, Mundo AI, Perrin D, Mai D, Wellhausen N, Gitter A, Greene CS. Molecular and Serologic Diagnostic Technologies for SARS-CoV-2. ArXiv 2022:arXiv:2204.12598v2. [PMID: 35547240 PMCID: PMC9094103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 04/28/2022] [Indexed: 01/09/2023]
Abstract
The COVID-19 pandemic has presented many challenges that have spurred biotechnological research to address specific problems. Diagnostics is one area where biotechnology has been critical. Diagnostic tests play a vital role in managing a viral threat by facilitating the detection of infected and/or recovered individuals. From the perspective of what information is provided, these tests fall into two major categories, molecular and serological. Molecular diagnostic techniques assay whether a virus is present in a biological sample, thus making it possible to identify individuals who are currently infected. Additionally, when the immune system is exposed to a virus, it responds by producing antibodies specific to the virus. Serological tests make it possible to identify individuals who have mounted an immune response to a virus of interest and therefore facilitate the identification of individuals who have previously encountered the virus. These two categories of tests provide different perspectives valuable to understanding the spread of SARS-CoV-2. Within these categories, different biotechnological approaches offer specific advantages and disadvantages. Here we review the categories of tests developed for the detection of the SARS-CoV-2 virus or antibodies against SARS-CoV-2 and discuss the role of diagnostics in the COVID-19 pandemic.
Affiliation(s)
- Halie M Rando
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552); the National Human Genome Research Institute (R01 HG010067)
- Ronan Lordan
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-5158, USA; Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Anna Ada Dattoli
- Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Systems Pharmacology & Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- David Manheim
- 1DaySooner, Delaware, United States of America; Risk and Health Communication Research Center, School of Public Health, University of Haifa, Haifa, Israel; Technion, Israel Institute of Technology, Haifa, Israel · Funded by Center for Effective Altruism, Long Term Future Fund
- Jesse G Meyer
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America · Funded by National Institute of General Medical Sciences (R35 GM142502)
- Ariel I Mundo
- Department of Biomedical Engineering, University of Arkansas, Fayetteville, Arkansas, USA
- Dimitri Perrin
- School of Computer Science, Queensland University of Technology, Brisbane, Australia; Centre for Data Science, Queensland University of Technology, Brisbane, Australia
- David Mai
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA; Center for Cellular Immunotherapies, Perelman School of Medicine, and Parker Institute for Cancer Immunotherapy at University of Pennsylvania, Philadelphia, PA, USA
- Nils Wellhausen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America; Morgridge Institute for Research, Madison, Wisconsin, United States of America · Funded by John W. and Jeanne M. Rowe Center for Research in Virology
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552); the National Human Genome Research Institute (R01 HG010067)
36
Lee BD, Gitter A, Greene CS, Raschka S, Maguire F, Titus AJ, Kessler MD, Lee AJ, Chevrette MG, Stewart PA, Britto-Borges T, Cofer EM, Yu KH, Carmona JJ, Fertig EJ, Kalinin AA, Signal B, Lengerich BJ, Triche TJ, Boca SM. Ten quick tips for deep learning in biology. PLoS Comput Biol 2022; 18:e1009803. [PMID: 35324884 PMCID: PMC8946751 DOI: 10.1371/journal.pcbi.1009803] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/05/2022] Open
Affiliation(s)
- Benjamin D. Lee
- In-Q-Tel Labs, Arlington, Virginia, United States of America
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Sebastian Raschka
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Finlay Maguire
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
- Alexander J. Titus
- University of New Hampshire, Manchester, New Hampshire, United States of America
- Bioeconomy.XYZ, Manchester, New Hampshire, United States of America
- Michael D. Kessler
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Alexandra J. Lee
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marc G. Chevrette
- Wisconsin Institute for Discovery and Department of Plant Pathology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Paul Allen Stewart
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
- Thiago Britto-Borges
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Heidelberg, Germany
- Department of Internal Medicine III (Cardiology, Angiology, and Pneumology), University Hospital Heidelberg, Heidelberg, Germany
- Evan M. Cofer
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Pathology, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Juan Jose Carmona
- Philips Healthcare, Cambridge, Massachusetts, United States of America
- Elana J. Fertig
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Biomedical Engineering, Department of Applied Mathematics and Statistics, Convergence Institute, Johns Hopkins University, Baltimore, Maryland, United States of America
- Alexandr A. Kalinin
- Medical Big Data Group, Shenzhen Research Institute of Big Data, Shenzhen, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Brandon Signal
- School of Medicine, College of Health and Medicine, University of Tasmania, Hobart, Australia
- Benjamin J. Lengerich
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Timothy J. Triche
- Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
- Department of Pediatrics, College of Human Medicine, Michigan State University, East Lansing, Michigan, United States of America
- Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- Simina M. Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, District of Columbia, United States of America
- Department of Oncology, Georgetown University Medical Center, Washington, DC, United States of America
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC, United States of America
- Cancer Prevention and Control Program, Lombardi Comprehensive Cancer Center, Washington, DC, United States of America
37
Nicholson DN, Rubinetti V, Hu D, Thielk M, Hunter LE, Greene CS. Examining linguistic shifts between preprints and publications. PLoS Biol 2022; 20:e3001470. [PMID: 35104289 PMCID: PMC8806061 DOI: 10.1371/journal.pbio.3001470] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/12/2021] [Accepted: 11/05/2021] [Indexed: 11/19/2022] Open
Abstract
Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as observe where the preprint would be positioned within a published article landscape.
Affiliation(s)
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Dongbo Hu
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marvin Thielk
- Elsevier, Philadelphia, Pennsylvania, United States of America
- Lawrence E. Hunter
- Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
38
Bobak CA, Muse M, Giffin KA, Williamson DA, Greene CS, Moore JH, Wall DP. Human Intrigue: Meta-analysis approaches for big questions with big data while shaking up the peer review process. Pac Symp Biocomput 2022; 27:156-162. [PMID: 34890145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Scientific innovation has long been heralded as the collaborative effort of many people, groups, and studies to drive forward research. However, the traditional peer review process relies on reviewers acting in a silo to critically judge research. As research becomes more cross-disciplinary, finding reviewers with appropriate expertise to provide feedback on an entire paper is increasingly difficult. We sought to pilot a crowd peer review process that allowed reviewers to interact with one another in the spirit of collaborative science. We focused this session on manuscripts using meta-analysis, to fully embrace the importance of collaborative and open scientific research in the field of biocomputing. Our pilot study found that researchers enjoyed a more collaborative peer review process and felt that the process led to higher quality feedback for submitting authors than traditional review offers.
Affiliation(s)
- Carly A Bobak
- Program in Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, USA
39
Rando HM, Wellhausen N, Ghosh S, Lee AJ, Dattoli AA, Hu F, Byrd JB, Rafizadeh DN, Lordan R, Qi Y, Sun Y, Brueffer C, Field JM, Ben Guebila M, Jadavji NM, Skelly AN, Ramsundar B, Wang J, Goel RR, Park Y, Boca SM, Gitter A, Greene CS. Identification and Development of Therapeutics for COVID-19. mSystems 2021; 6:e0023321. [PMID: 34726496 PMCID: PMC8562484 DOI: 10.1128/msystems.00233-21] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/15/2022] Open
Abstract
After emerging in China in late 2019, the novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread worldwide, and as of mid-2021, it remains a significant threat globally. Only a few coronaviruses are known to infect humans, and only two cause infections similar in severity to SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a species closely related to SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Unlike the current pandemic, previous epidemics were controlled rapidly through public health measures, but the body of research investigating severe acute respiratory syndrome and Middle East respiratory syndrome has proven valuable for identifying approaches to treating and preventing novel coronavirus disease 2019 (COVID-19). Building on this research, the medical and scientific communities have responded rapidly to the COVID-19 crisis and identified many candidate therapeutics. The approaches used to identify candidates fall into four main categories: adaptation of clinical approaches to diseases with related pathologies, adaptation based on virological properties, adaptation based on host response, and data-driven identification (ID) of candidates based on physical properties or on pharmacological compendia. To date, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA), while most remain under investigation. The scale of the COVID-19 crisis offers a rare opportunity to collect data on the effects of candidate therapeutics. This information provides insight not only into the management of coronavirus diseases but also into the relative success of different approaches to identifying candidate therapeutics against an emerging disease. IMPORTANCE The COVID-19 pandemic is a rapidly evolving crisis. With the worldwide scientific community shifting focus onto the SARS-CoV-2 virus and COVID-19, a large number of possible pharmaceutical approaches for treatment and prevention have been proposed. What was known about each of these potential interventions evolved rapidly throughout 2020 and 2021. This fast-paced area of research provides important insight into how the ongoing pandemic can be managed and also demonstrates the power of interdisciplinary collaboration to rapidly understand a virus and match its characteristics with existing or novel pharmaceuticals. As illustrated by the continued threat of viral epidemics during the current millennium, a rapid and strategic response to emerging viral threats can save lives. In this review, we explore how different modes of identifying candidate therapeutics have borne out during COVID-19.
Affiliation(s)
- Halie M. Rando
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Nils Wellhausen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Soumita Ghosh
- Institute of Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Alexandra J. Lee
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Anna Ada Dattoli
- Department of Systems Pharmacology & Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Fengling Hu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- James Brian Byrd
- University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Diane N. Rafizadeh
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ronan Lordan
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA
- Yuchen Sun
- Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA
- Jeffrey M. Field
- Department of Systems Pharmacology & Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Marouen Ben Guebila
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Nafisa M. Jadavji
- Biomedical Science, Midwestern University, Glendale, Arizona, USA
- Department of Neuroscience, Carleton University, Ottawa, Ontario, Canada
- Ashwin N. Skelly
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Immunology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Jinhui Wang
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Rishi Raj Goel
- Institute for Immunology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- YoSon Park
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- COVID-19 Review Consortium
- Consortium members: Vikas Bansal, John P. Barton, Simina M. Boca, Joel D. Boerckel, Christian Brueffer, James Brian Byrd, Stephen Capone, Shikta Das, Anna Ada Dattoli, John J. Dziak, Jeffrey M. Field, Soumita Ghosh, Anthony Gitter, Rishi Raj Goel, Casey S. Greene, Marouen Ben Guebila, Daniel S. Himmelstein, Fengling Hu, Nafisa M. Jadavji, Jeremy P. Kamil, Sergey Knyazev, Likhitha Kolla, Alexandra J. Lee, Ronan Lordan, Tiago Lubiana, Temitayo Lukan, Adam L. MacLean, David Mai, Serghei Mangul, David Manheim, Lucy D’Agostino McGowan, Amruta Naik, YoSon Park, Dimitri Perrin, Yanjun Qi, Diane N. Rafizadeh, Bharath Ramsundar, Halie M. Rando, Sandipan Ray, Michael P. Robson, Vincent Rubinetti, Elizabeth Sell, Lamonica Shinholster, Ashwin N. Skelly, Yuchen Sun, Yusha Sun, Gregory L. Szeto, Ryan Velazquez, Jinhui Wang, Nils Wellhausen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Institute of Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Systems Pharmacology & Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA
- Department of Clinical Sciences, Lund University, Lund, Sweden
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Biomedical Science, Midwestern University, Glendale, Arizona, USA
- Department of Neuroscience, Carleton University, Ottawa, Ontario, Canada
- Institute for Immunology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- The DeepChem Project
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Early Biometrics & Statistical Innovation, Data Science & Artificial Intelligence, R & D, AstraZeneca, Gaithersburg, Maryland, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
- Simina M. Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Early Biometrics & Statistical Innovation, Data Science & Artificial Intelligence, R & D, AstraZeneca, Gaithersburg, Maryland, USA
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
- Casey S. Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Department of Systems Pharmacology & Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
40
Deer RR, Rock MA, Vasilevsky N, Carmody L, Rando H, Anzalone AJ, Basson MD, Bennett TD, Bergquist T, Boudreau EA, Bramante CT, Byrd JB, Callahan TJ, Chan LE, Chu H, Chute CG, Coleman BD, Davis HE, Gagnier J, Greene CS, Hillegass WB, Kavuluru R, Kimble WD, Koraishy FM, Köhler S, Liang C, Liu F, Liu H, Madhira V, Madlock-Brown CR, Matentzoglu N, Mazzotti DR, McMurry JA, McNair DS, Moffitt RA, Monteith TS, Parker AM, Perry MA, Pfaff E, Reese JT, Saltz J, Schuff RA, Solomonides AE, Solway J, Spratt H, Stein GS, Sule AA, Topaloglu U, Vavougios GD, Wang L, Haendel MA, Robinson PN. Characterizing Long COVID: Deep Phenotype of a Complex Condition. EBioMedicine 2021; 74:103722. [PMID: 34839263 PMCID: PMC8613500 DOI: 10.1016/j.ebiom.2021.103722] [Received: 09/10/2021] [Revised: 10/22/2021] [Accepted: 11/15/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 (PASC or "long COVID"), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations. Patient-led studies are of particular importance for understanding the natural history of COVID-19, but integration is hampered because they often use different terms to describe the same symptom or condition. This significant disparity in patient versus clinical characterization motivated the proposed ontological approach to specifying manifestations, which will improve capture and integration of future long COVID studies. The Human Phenotype Ontology (HPO) is a widely used standard for exchange and analysis of phenotypic abnormalities in human disease but has not yet been applied to the analysis of COVID-19. METHODS We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to HPO terms. We present layperson synonyms and definitions that can be used to link patient self-report questionnaires to standard medical terminology. FINDINGS Long COVID clinical manifestations are not assessed consistently across studies, and most manifestations have been reported with a wide range of synonyms by different authors. Across at least 10 cohorts, authors reported 31 unique clinical features corresponding to HPO terms; the most commonly reported feature was Fatigue (median 45.1%) and the least commonly reported was Nausea (median 3.9%), but the reported percentages varied widely between studies. INTERPRETATION Translating long COVID manifestations into computable HPO terms will improve analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared/pooled more effectively. Furthermore, mapping lay terminology to HPO will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, thereby improving the stratification, diagnosis, and treatment of long COVID. FUNDING U24TR002306; UL1TR001439; P30AG024832; GBMF4552; R01HG010067; UL1TR002535; K23HL128909; UL1TR002389; K99GM145411.
Affiliation(s)
- Rachel R Deer
- University of Texas Medical Branch, Galveston, TX, USA.
- Nicole Vasilevsky
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative
- Leigh Carmody
- Monarch Initiative; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Halie Rando
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Alfred J Anzalone
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE, USA
- Marc D Basson
- Department of Surgery, University of North Dakota School of Medicine and Health Sciences
- Tellen D Bennett
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Eilis A Boudreau
- Department of Neurology; Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239
- Carolyn T Bramante
- Departments of Internal Medicine and Pediatrics, University of Minnesota Medical School, Minneapolis, MN 55455
- James Brian Byrd
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109
- Tiffany J Callahan
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lauren E Chan
- Monarch Initiative; College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA
- Haitao Chu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN USA
- Christopher G Chute
- Johns Hopkins University, Schools of Medicine, Public Health, and Nursing, Baltimore, MD, USA
- Ben D Coleman
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
- Joel Gagnier
- Departments of Orthopaedic Surgery & Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Casey S Greene
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- William B Hillegass
- Departments of Data Science and Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Wesley D Kimble
- West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, WV, USA
- Chen Liang
- Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Feifan Liu
- Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, USA
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, MN, USA
- Charisse R Madlock-Brown
- Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, 920 Madison Ave. Suite 518N, Memphis TN 38613
- Nicolas Matentzoglu
- Monarch Initiative; Semanticly Ltd; European Bioinformatics Institute (EMBL-EBI)
- Diego R Mazzotti
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center
- Julie A McMurry
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative
- Douglas S McNair
- Quantitative Sciences, Global Health Div., Gates Foundation, Seattle, WA 98109, USA
- Ann M Parker
- Pulmonary and Critical Care Medicine, Johns Hopkins University, Schools of Medicine, Baltimore, MD, USA
- Mallory A Perry
- Children's Hospital of Philadelphia Research Institute, Philadelphia, PA, USA
- Justin T Reese
- Monarch Initiative; Lawrence Berkeley National Laboratory
- Joel Saltz
- Stony Brook University; Biomedical Informatics
- Anthony E Solomonides
- Outcomes Research Network, Research Institute, NorthShore University HealthSystem, Evanston, IL 60201, USA; Institute for Translational Medicine, University of Chicago, Chicago, IL, USA
- Julian Solway
- Institute for Translational Medicine, University of Chicago, Chicago, IL, USA
- Heidi Spratt
- University of Texas Medical Branch, Galveston, TX, USA
- Gary S Stein
- University of Vermont Larner College of Medicine, Departments of Biochemistry and Surgery, Burlington, Vermont 05405
- George D Vavougios
- Department of Computer Science and Telecommunications, University of Thessaly, Papasiopoulou 2 - 4, P.C.; 131 - Galaneika, Lamia, Greece; Department of Neurology, Athens Naval Hospital 70 Deinokratous Street, P.C. 115 21 Athens, Greece; Department of Respiratory Medicine, Faculty of Medicine, University of Thessaly, Biopolis, P.C. 41500 Larissa, Greece
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, MN, USA
- Melissa A Haendel
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative.
- Peter N Robinson
- Monarch Initiative; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA.
41
Rando HM, MacLean AL, Lee AJ, Lordan R, Ray S, Bansal V, Skelly AN, Sell E, Dziak JJ, Shinholster L, D’Agostino McGowan L, Ben Guebila M, Wellhausen N, Knyazev S, Boca SM, Capone S, Qi Y, Park Y, Mai D, Sun Y, Boerckel JD, Brueffer C, Byrd JB, Kamil JP, Wang J, Velazquez R, Szeto GL, Barton JP, Goel RR, Mangul S, Lubiana T, Gitter A, Greene CS. Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure. mSystems 2021; 6:e0009521. [PMID: 34698547 PMCID: PMC8547481 DOI: 10.1128/msystems.00095-21] [Accepted: 09/27/2021] [Indexed: 02/06/2023]
Abstract
The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease. IMPORTANCE COVID-19 involves a number of organ systems and can present with a wide range of symptoms. From how the virus infects cells to how it spreads between people, the available research suggests that these patterns are very similar to those seen in the closely related viruses SARS-CoV-1 and possibly Middle East respiratory syndrome-related CoV (MERS-CoV). Understanding the pathogenesis of the SARS-CoV-2 virus also contextualizes how the different biological systems affected by COVID-19 connect. Exploring the structure, phylogeny, and pathogenesis of the virus therefore helps to guide interpretation of the broader impacts of the virus on the human body and on human populations. For this reason, an in-depth exploration of viral mechanisms is critical to a robust understanding of SARS-CoV-2 and, potentially, future emergent human CoVs (HCoVs).
Affiliation(s)
- Halie M. Rando
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Adam L. MacLean
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Alexandra J. Lee
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ronan Lordan
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Sandipan Ray
- Department of Biotechnology, Indian Institute of Technology Hyderabad, Sangareddy, Telangana, India
- Vikas Bansal
- Biomedical Data Science and Machine Learning Group, German Center for Neurodegenerative Diseases, Tübingen, Germany
- Ashwin N. Skelly
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Elizabeth Sell
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- John J. Dziak
- Edna Bennett Pierce Prevention Research Center, The Pennsylvania State University, University Park, Pennsylvania, USA
- Lucy D’Agostino McGowan
- Department of Mathematics and Statistics, Wake Forest University, Winston-Salem, North Carolina, USA
- Marouen Ben Guebila
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Nils Wellhausen
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Simina M. Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Stephen Capone
- St. George’s University School of Medicine, St. George’s, Grenada
- Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA
- YoSon Park
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- David Mai
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yuchen Sun
- Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA
- Joel D. Boerckel
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Orthopaedic Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- James Brian Byrd
- University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Jeremy P. Kamil
- Department of Microbiology and Immunology, Louisiana State University Health Sciences Center Shreveport, Shreveport, Louisiana, USA
- Jinhui Wang
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- John P. Barton
- Department of Physics and Astronomy, University of California-Riverside, Riverside, California, USA
- Rishi Raj Goel
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, California, USA
- Tiago Lubiana
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- COVID-19 Review Consortium
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biotechnology, Indian Institute of Technology Hyderabad, Sangareddy, Telangana, India
- Biomedical Data Science and Machine Learning Group, German Center for Neurodegenerative Diseases, Tübingen, Germany
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Edna Bennett Pierce Prevention Research Center, The Pennsylvania State University, University Park, Pennsylvania, USA
- Mercer University, Macon, Georgia, USA
- Department of Mathematics and Statistics, Wake Forest University, Winston-Salem, North Carolina, USA
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Georgia State University, Atlanta, Georgia, USA
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- St. George’s University School of Medicine, St. George’s, Grenada
- Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Orthopaedic Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Clinical Sciences, Lund University, Lund, Sweden
- University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Department of Microbiology and Immunology, Louisiana State University Health Sciences Center Shreveport, Shreveport, Louisiana, USA
- Azimuth1, McLean, Virginia, USA
- Allen Institute for Immunology, Seattle, Washington, USA
- Department of Physics and Astronomy, University of California-Riverside, Riverside, California, USA
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, California, USA
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Morgridge Institute for Research, Madison, Wisconsin, USA
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
42
Abstract
To make machine learning analyses in the life sciences more computationally reproducible, we propose standards based on data, model, and code publication, programming best practices, and workflow automation. By meeting these standards, the community of researchers applying machine learning methods in the life sciences can ensure that their analyses are worthy of trust. This article has been peer reviewed.
Affiliation(s)
- Benjamin J Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Michael M Hoffman
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Florian Markowetz
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Su-In Lee
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Casey S Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA.
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA.
- Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
43
Way GP, Greene CS, Carninci P, Carvalho BS, de Hoon M, Finley SD, Gosline SJC, Lȇ Cao KA, Lee JSH, Marchionni L, Robine N, Sindi SS, Theis FJ, Yang JYH, Carpenter AE, Fertig EJ. A field guide to cultivating computational biology. PLoS Biol 2021; 19:e3001419. [PMID: 34618807 PMCID: PMC8525744 DOI: 10.1371/journal.pbio.3001419] [Revised: 10/19/2021] [Indexed: 11/18/2022]
Abstract
Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.
Affiliation(s)
- Gregory P. Way
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Casey S. Greene
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Piero Carninci
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Human Technopole, Milan, Italy
- Benilton S. Carvalho
- Department of Statistics, Institute of Mathematics, Statistics and Scientific Computing, University of Campinas, Campinas, Brazil
- Michiel de Hoon
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Stacey D. Finley
- Department of Biomedical Engineering, Quantitative and Computational Biology, and Chemical Engineering & Materials Science, University of Southern California, Los Angeles, California, United States of America
- Sara J. C. Gosline
- Pacific Northwest National Laboratory, Seattle, Washington, United States of America
- Kim-Anh Lȇ Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
- Jerry S. H. Lee
- Ellison Institute and Departments of Medicine/Oncology, Chemical Engineering, and Material Sciences, University of Southern California, Los Angeles, California, United States of America
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill-Cornell Medicine, New York, New York, United States of America
| | - Nicolas Robine
- Computational Biology Lab, New York Genome Center, New York, New York, United States of America
| | - Suzanne S. Sindi
- Department of Applied Mathematics, University of California Merced, Merced, California, United States of America
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Center Munich and Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Jean Y. H. Yang
- Charles Perkins Centre and School of Mathematics and Statistics, The University of Sydney, Australia
| | - Anne E. Carpenter
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Elana J. Fertig
- Convergence Institute, Departments of Oncology, Biomedical Engineering, and Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
Collapse
|
44
|
Weber LM, Hippen AA, Hickey PF, Berrett KC, Gertz J, Doherty JA, Greene CS, Hicks SC. Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design. Gigascience 2021; 10:giab062. [PMID: 34553212 PMCID: PMC8458035 DOI: 10.1093/gigascience/giab062] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/03/2021] [Revised: 07/19/2021] [Accepted: 08/26/2021] [Indexed: 11/23/2022]
Abstract
BACKGROUND Pooling cells from multiple biological samples prior to library preparation within the same single-cell RNA sequencing experiment provides several advantages, including lower library preparation costs and reduced unwanted technological variation, such as batch effects. Computational demultiplexing tools based on natural genetic variation between individuals provide a simple approach to demultiplex samples, which does not require complex additional experimental procedures. However, to our knowledge these tools have not been evaluated in cancer, where somatic variants, which could differ between cells from the same sample, may obscure the signal in natural genetic variation. RESULTS Here, we performed in silico benchmark evaluations by combining raw sequencing reads from multiple single-cell samples in high-grade serous ovarian cancer, which has a high copy number burden, and lung adenocarcinoma, which has a high tumor mutational burden. Our results confirm that genetic demultiplexing tools can be effectively deployed on cancer tissue using a pooled experimental design, although high proportions of ambient RNA from cell debris reduce performance. CONCLUSIONS This strategy provides significant cost savings through pooled library preparation. To facilitate similar analyses at the experimental design phase, we provide freely accessible code and a reproducible Snakemake workflow built around the best-performing tools found in our in silico benchmark evaluations, available at https://github.com/lmweber/snp-dmx-cancer.
Affiliation(s)
- Lukas M Weber
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Ariel A Hippen
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Peter F Hickey
  - Advanced Technology & Biology Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Kristofer C Berrett
  - Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
- Jason Gertz
  - Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
- Jennifer Anne Doherty
  - Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
- Casey S Greene
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
45
Rando HM, Boca SM, McGowan LD, Himmelstein DS, Robson MP, Rubinetti V, Velazquez R, Greene CS, Gitter A. An Open-Publishing Response to the COVID-19 Infodemic. ArXiv 2021:arXiv:2109.08633v1. [PMID: 34545336 PMCID: PMC8452106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis combined with the need for social distancing measures present unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.
Affiliation(s)
- Halie M Rando
  - University of Colorado School of Medicine, Center for Health AI, Aurora, CO, USA
  - University of Colorado School of Medicine, Department of Biochemistry and Molecular Genetics, Aurora, CO, USA
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
- Simina M Boca
  - Georgetown University Medical Center, Innovation Center for Biomedical Informatics, Washington, DC, USA
- Daniel S Himmelstein
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
  - Related Sciences
- Michael P Robson
  - Villanova University, Department of Computing Sciences, Villanova, PA, USA
- Vincent Rubinetti
  - University of Colorado School of Medicine, Center for Health AI, Aurora, CO, USA
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
- Casey S Greene
  - University of Colorado School of Medicine, Center for Health AI, Aurora, CO, USA
  - University of Colorado School of Medicine, Department of Biochemistry and Molecular Genetics, Aurora, CO, USA
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
  - Alex's Lemonade Stand Foundation, Childhood Cancer Data Lab, Philadelphia, PA, USA
- Anthony Gitter
  - University of Wisconsin-Madison, Department of Biostatistics and Medical Informatics, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
46
Rando HM, Boca SM, McGowan LD, Himmelstein DS, Robson MP, Rubinetti V, Velazquez R, Greene CS, Gitter A. An Open-Publishing Response to the COVID-19 Infodemic. CEUR Workshop Proc 2021; 2976:29-38. [PMID: 35558551 PMCID: PMC9093051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis combined with the need for social distancing measures present unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.
Affiliation(s)
- Halie M. Rando
  - University of Colorado School of Medicine, Center for Health AI, Aurora, CO, USA
  - University of Colorado School of Medicine, Department of Biochemistry and Molecular Genetics, Aurora, CO, USA
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
- Simina M. Boca
  - Georgetown University Medical Center, Innovation Center for Biomedical Informatics, Washington, DC, USA
- Daniel S. Himmelstein
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
  - Related Sciences
- Michael P. Robson
  - Villanova University, Department of Computing Sciences, Villanova, PA, USA
- Vincent Rubinetti
  - University of Colorado School of Medicine, Center for Health AI, Aurora, CO, USA
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
- Casey S. Greene
  - University of Colorado School of Medicine, Center for Health AI, Aurora, CO, USA
  - University of Colorado School of Medicine, Department of Biochemistry and Molecular Genetics, Aurora, CO, USA
  - University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
  - Alex’s Lemonade Stand Foundation, Childhood Cancer Data Lab, Philadelphia, PA, USA
- Anthony Gitter
  - University of Wisconsin-Madison, Department of Biostatistics and Medical Informatics, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
47
Cao KAL, Abadi AJ, Davis-Marcisak EF, Hsu L, Arora A, Coullomb A, Deshpande A, Feng Y, Jeganathan P, Loth M, Meng C, Mu W, Pancaldi V, Sankaran K, Righelli D, Singh A, Sodicoff JS, Stein-O'Brien GL, Subramanian A, Welch JD, You Y, Argelaguet R, Carey VJ, Dries R, Greene CS, Holmes S, Love MI, Ritchie ME, Yuan GC, Culhane AC, Fertig E. Author Correction: Community-wide hackathons to identify central themes in single-cell multi-omics. Genome Biol 2021; 22:246. [PMID: 34433496 PMCID: PMC8385897 DOI: 10.1186/s13059-021-02468-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Al J Abadi
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Emily F Davis-Marcisak
  - McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Lauren Hsu
  - Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Arshi Arora
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Alexis Coullomb
  - Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
- Atul Deshpande
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Yuzhou Feng
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Melanie Loth
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Chen Meng
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), School of Life Sciences, Technical University of Munich, Munich, Germany
- Wancen Mu
  - Department of Biostatistics, UNC, Chapel Hill, NC, USA
- Vera Pancaldi
  - Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
  - Barcelona Supercomputing Center, Barcelona, Spain
- Kris Sankaran
  - Department of Statistics, University of Wisconsin, Madison, WI, USA
- Dario Righelli
  - Department of Statistical Sciences, University of Padova, Padova, PD, Italy
- Amrit Singh
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
  - PROOF Centre of Excellence, Vancouver, BC, Canada
- Joshua S Sodicoff
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA
- Genevieve L Stein-O'Brien
  - McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
  - Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Joshua D Welch
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Yue You
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Vincent J Carey
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Ruben Dries
  - Department of Hematology and Oncology, Boston Medical Center, Boston, MA, USA
  - Department of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
  - Center for Regenerative Medicine (CReM), Boston University, Boston, MA, USA
- Casey S Greene
  - Center for Health AI and Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA
- Susan Holmes
  - Department of Statistics, Stanford University, Stanford, CA, USA
- Michael I Love
  - Department of Biostatistics, UNC, Chapel Hill, NC, USA
  - Department of Genetics, UNC, Chapel Hill, NC, USA
- Matthew E Ritchie
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, Australia
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Guo-Cheng Yuan
  - Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Aedin C Culhane
  - Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Elana Fertig
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA
48
Lê Cao KA, Abadi AJ, Davis-Marcisak EF, Hsu L, Arora A, Coullomb A, Deshpande A, Feng Y, Jeganathan P, Loth M, Meng C, Mu W, Pancaldi V, Sankaran K, Righelli D, Singh A, Sodicoff JS, Stein-O’Brien GL, Subramanian A, Welch JD, You Y, Argelaguet R, Carey VJ, Dries R, Greene CS, Holmes S, Love MI, Ritchie ME, Yuan GC, Culhane AC, Fertig E. Community-wide hackathons to identify central themes in single-cell multi-omics. Genome Biol 2021; 22:220. [PMID: 34353350 PMCID: PMC8340473 DOI: 10.1186/s13059-021-02433-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
Affiliation(s)
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Al J. Abadi
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Emily F. Davis-Marcisak
  - McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Lauren Hsu
  - Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Genetics, UNC, Chapel Hill, NC, USA
- Arshi Arora
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Alexis Coullomb
  - Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
- Atul Deshpande
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Yuzhou Feng
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Melanie Loth
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Chen Meng
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), School of Life Sciences, Technical University of Munich, Munich, Germany
- Wancen Mu
  - Department of Biostatistics, UNC, Chapel Hill, NC, USA
- Vera Pancaldi
  - Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
  - Barcelona Supercomputing Center, Barcelona, Spain
- Kris Sankaran
  - Department of Statistics, University of Wisconsin, Madison, WI, USA
- Dario Righelli
  - Department of Statistical Sciences, University of Padova, Padova, PD, Italy
- Amrit Singh
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
  - PROOF Centre of Excellence, Vancouver, BC, Canada
- Joshua S. Sodicoff
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA
- Genevieve L. Stein-O’Brien
  - McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
  - Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Joshua D. Welch
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Yue You
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Vincent J. Carey
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Ruben Dries
  - Department of Hematology and Oncology, Boston Medical Center, Boston, MA, USA
  - Department of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
  - Center for Regenerative Medicine (CReM), Boston University, Boston, MA, USA
- Casey S. Greene
  - Center for Health AI and Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA
- Susan Holmes
  - Department of Statistics, Stanford University, Stanford, CA, USA
- Michael I. Love
  - Department of Biostatistics, UNC, Chapel Hill, NC, USA
  - Department of Genetics, UNC, Chapel Hill, NC, USA
- Matthew E. Ritchie
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Guo-Cheng Yuan
  - Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Aedin C. Culhane
  - Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Elana Fertig
  - Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA
49
Hippen AA, Falco MM, Weber LM, Erkan EP, Zhang K, Doherty JA, Vähärautio A, Greene CS, Hicks SC. miQC: An adaptive probabilistic framework for quality control of single-cell RNA-sequencing data. PLoS Comput Biol 2021; 17:e1009290. [PMID: 34428202 PMCID: PMC8415599 DOI: 10.1371/journal.pcbi.1009290] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/06/2021] [Revised: 09/03/2021] [Accepted: 07/20/2021] [Indexed: 12/23/2022]
Abstract
Single-cell RNA-sequencing (scRNA-seq) has made it possible to profile gene expression in tissues at high resolution. An important preprocessing step prior to performing downstream analyses is to identify and remove cells with poor or degraded sample quality using quality control (QC) metrics. Two widely used QC metrics to identify a 'low-quality' cell are (i) if the cell includes a high proportion of reads that map to mitochondrial DNA (mtDNA) encoded genes and (ii) if a small number of genes are detected. Current best practices use these QC metrics independently with either arbitrary, uniform thresholds (e.g. 5%) or biological context-dependent (e.g. species) thresholds, and fail to jointly model these metrics in a data-driven manner. Current practices are often overly stringent and especially untenable on certain types of tissues, such as archived tumor tissues, or tissues associated with mitochondrial function, such as kidney tissue [1]. We propose a data-driven QC metric (miQC) that jointly models both the proportion of reads mapping to mtDNA genes and the number of detected genes with mixture models in a probabilistic framework to predict the low-quality cells in a given dataset. We demonstrate how our QC metric easily adapts to different types of single-cell datasets to remove low-quality cells while preserving high-quality cells that can be used for downstream analyses. Our software package is available at https://bioconductor.org/packages/miQC.
Affiliation(s)
- Ariel A. Hippen
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Matias M. Falco
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Lukas M. Weber
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Erdogan Pekcan Erkan
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Kaiyang Zhang
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jennifer Anne Doherty
  - Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, Utah, United States of America
- Anna Vähärautio
  - Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Casey S. Greene
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Stephanie C. Hicks
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
50
Abstract
Coronavirus disease 2019 (COVID-19) has caused global disruption and a significant loss of life. Existing treatments that can be repurposed as prophylactic and therapeutic agents may reduce the pandemic's devastation. Emerging evidence of potential applications in other therapeutic contexts has led to the investigation of dietary supplements and nutraceuticals for COVID-19. Such products include vitamin C, vitamin D, omega 3 polyunsaturated fatty acids, probiotics, and zinc, all of which are currently under clinical investigation. In this review, we critically appraise the evidence surrounding dietary supplements and nutraceuticals for the prophylaxis and treatment of COVID-19. Overall, further study is required before evidence-based recommendations can be formulated, but nutritional status plays a significant role in patient outcomes, and these products may help alleviate deficiencies. For example, evidence indicates that vitamin D deficiency may be associated with a greater incidence of infection and severity of COVID-19, suggesting that vitamin D supplementation may hold prophylactic or therapeutic value. A growing number of scientific organizations are now considering recommending vitamin D supplementation to those at high risk of COVID-19. Because research in vitamin D and other nutraceuticals and supplements is preliminary, here we evaluate the extent to which these nutraceutical and dietary supplements hold potential in the COVID-19 crisis.
IMPORTANCE Sales of dietary supplements and nutraceuticals have increased during the pandemic due to their perceived "immune-boosting" effects. However, little is known about the efficacy of these dietary supplements and nutraceuticals against the novel coronavirus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) or the disease that it causes, CoV disease 2019 (COVID-19). This review provides a critical overview of the potential prophylactic and therapeutic value of various dietary supplements and nutraceuticals from the evidence available to date. These include vitamin C, vitamin D, and zinc, which are often perceived by the public as treating respiratory infections or supporting immune health. Consumers need to be aware of misinformation and false promises surrounding some supplements, which may be subject to limited regulation by authorities. However, considerably more research is required to determine whether dietary supplements and nutraceuticals exhibit prophylactic and therapeutic value against SARS-CoV-2 infection and COVID-19. This review provides perspective on which nutraceuticals and supplements are involved in biological processes that are relevant to recovery from or prevention of COVID-19.
Affiliation(s)
- Ronan Lordan
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Halie M Rando
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, USA
  - Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA