1
Aberasturi DT, Piegorsch WW, Bedrick EJ, Lussier YA. Accounting for extra-binomial variability with differentially expressed genetic pathway data: a collaborative bioinformatic study. Stat (Int Stat Inst) 2023; 12:e518. [PMID: 37885703 PMCID: PMC10601968 DOI: 10.1002/sta4.518] [Received: 09/22/2022] [Accepted: 10/21/2022] [Indexed: 10/28/2023]
Abstract
We describe a collaborative project involving faculty and students in a university bioinformatics/biostatistics center. The project focuses on the identification of differentially expressed gene sets ("pathways") in subjects expressing a disease state, medical intervention, or other distinguishable condition. The key feature of the endeavor is the data structure presented to the team: a single cohort of subjects with two samples taken from each subject, one for each of two differing conditions, without replication. This particular structure essentially yields a cohort of 2 × 2 contingency tables, where each table compares the differential gene state with the pathway condition. Recognizing that correlations both within and between pathway responses can disrupt standard 2 × 2 table analytics, we develop methods for analyzing this data structure in the presence of complicated intra-table correlations. These methods offer convenient approaches to the problem, using design effect adjustments from sample survey theory and manipulations of the summary 2 × 2 table counts. Monte Carlo simulations show that the methods operate extremely well, validating their use in practice. In the end, the collaborative connections among the team members led to solutions no one of us would have envisioned separately.
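A first-order design-effect adjustment of the kind borrowed from sample survey theory can be sketched as follows. This is a minimal illustration, not the authors' estimator: it assumes the design effect `deff` is already known (the abstract does not say how it is estimated), uses a Pearson chi-square without continuity correction, and uses a hypothetical cell layout (a = DE genes in the pathway, b = non-DE genes in the pathway, c = DE genes outside, d = non-DE genes outside). Because the Pearson statistic scales linearly with the counts, dividing it by `deff` gives the same result as dividing every cell count by `deff`, which shows how such an adjustment can also be expressed as a manipulation of the 2 × 2 counts.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi2_1df_upper_tail(x):
    """P(X > x) for chi-square with 1 df, i.e. P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def deff_adjusted_test(a, b, c, d, deff):
    """Divide the naive statistic by an ASSUMED design effect (deff > 1 means extra variance)."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    naive = pearson_chi2_2x2(a, b, c, d)
    adjusted = naive / deff
    return naive, adjusted, chi2_1df_upper_tail(adjusted)


# Hypothetical counts: 20 of 100 pathway genes are DE; 100 of 1900 other genes are DE.
naive, adjusted, p_value = deff_adjusted_test(20, 80, 100, 1800, deff=4.0)
# The same adjusted statistic, obtained by shrinking every cell count by deff:
shrunk = pearson_chi2_2x2(20 / 4.0, 80 / 4.0, 100 / 4.0, 1800 / 4.0)
print(f"naive={naive:.2f} adjusted={adjusted:.2f} p={p_value:.4f} shrunk={shrunk:.2f}")
```

With deff > 1 the adjusted test is more conservative than the naive one, which is the intended effect when intra-table correlation inflates the variance.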
Affiliation(s)
- Dillon T Aberasturi
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Walter W Piegorsch
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Department of Statistics, School of Public Health, University of Arizona, Tucson, AZ, USA
- Edward J Bedrick
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Department of Statistics, School of Public Health, University of Arizona, Tucson, AZ, USA
- Department of Medicine, School of Medicine, University of Arizona, Tucson, AZ, USA
- Yves A Lussier
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Department of Medicine, School of Medicine, University of Arizona, Tucson, AZ, USA
- Arizona Comprehensive Cancer Center, University of Arizona, Tucson, AZ, USA
2
Aberasturi D, Pouladi N, Zaim SR, Kenost C, Berghout J, Piegorsch WW, Lussier YA. 'Single-subject studies'-derived analyses unveil altered biomechanisms between very small cohorts: implications for rare diseases. Bioinformatics 2021; 37:i67-i75. [PMID: 34252934 PMCID: PMC8336591 DOI: 10.1093/bioinformatics/btab290] [Accepted: 04/26/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Identifying altered transcripts between very small human cohorts is particularly challenging and is compounded by the low accrual rate of human subjects in rare diseases or sub-stratified common disorders. Yet, single-subject studies (S3) can compare paired transcriptome samples drawn from the same patient under two conditions (e.g. treated versus pre-treatment) and suggest patient-specific responsive biomechanisms based on the overrepresentation of functionally defined gene sets. These improve statistical power by: (i) reducing the total features tested and (ii) relaxing the requirement of within-cohort uniformity at the transcript level. We propose Inter-N-of-1, a novel method to identify meaningful differences between very small cohorts by using the effect size of 'single-subject-study'-derived responsive biological mechanisms. RESULTS In each subject, Inter-N-of-1 requires applying the previously published S3-type method N-of-1-pathways MixEnrich to two paired samples (e.g. diseased versus unaffected tissues) to determine patient-specific enriched gene sets, summarized by odds ratios (S3-OR) and S3-variances, using Gene Ontology Biological Processes. To evaluate small cohorts, we calculated the precision and recall of Inter-N-of-1 and of a control method (GLM+EGS) when comparing two cohorts of decreasing sizes (from 20 versus 20 to 2 versus 2) in a comprehensive six-parameter simulation and in a proof-of-concept clinical dataset. In simulations, the median precision and recall of Inter-N-of-1 are > 90% and > 75%, respectively, in cohorts of 3 versus 3 distinct subjects (regardless of the parameter values), whereas conventional methods outperform Inter-N-of-1 at sample sizes of 9 versus 9 and larger. Similar results were obtained in the clinical proof-of-concept dataset. AVAILABILITY AND IMPLEMENTATION R software is available at Lussierlab.net/BSSD.
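As a rough illustration of an odds-ratio effect size with an attached variance (the abstract's S3-OR and S3-variance), the sketch below computes a per-subject log odds ratio with Woolf's variance from a hypothetical 2 × 2 table of pathway membership by DEG status, pools subjects within each cohort by inverse-variance weighting, and compares the two cohorts with a z statistic. The table layout, the 0.5 Haldane-Anscombe correction, and the fixed-effect pooling are all assumptions for illustration; this is not the published Inter-N-of-1 statistic, which is built on MixEnrich output.

```python
import math


def log_or_and_variance(a, b, c, d, correction=0.5):
    """Log odds ratio and Woolf variance for [[a, b], [c, d]] after adding `correction` to each cell."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def inverse_variance_pool(effects):
    """Fixed-effect pooled estimate and its variance from (estimate, variance) pairs."""
    weights = [1 / v for _, v in effects]
    total = sum(weights)
    estimate = sum(w * e for w, (e, _) in zip(weights, effects)) / total
    return estimate, 1 / total


def compare_cohorts(tables_a, tables_b):
    """Difference in pooled log OR between two cohorts, with a two-sided normal p-value."""
    est_a, var_a = inverse_variance_pool([log_or_and_variance(*t) for t in tables_a])
    est_b, var_b = inverse_variance_pool([log_or_and_variance(*t) for t in tables_b])
    diff = est_a - est_b
    z = diff / math.sqrt(var_a + var_b)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return diff, z, p


# Hypothetical tables (a, b, c, d) for one pathway in 3 versus 3 subjects.
cohort_1 = [(12, 18, 150, 2820), (10, 20, 140, 2830), (14, 16, 160, 2810)]
cohort_2 = [(3, 27, 150, 2820), (4, 26, 155, 2815), (2, 28, 145, 2825)]
print(compare_cohorts(cohort_1, cohort_2))
```

Fixed-effect pooling ignores between-subject heterogeneity; a random-effects pool would be the more cautious choice when subjects respond differently.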
Affiliation(s)
- Dillon Aberasturi
- Center for Biomedical Informatics and Biostatistics (CB2), University of Arizona Health Sciences, University of Arizona, Tucson, AZ, USA 85721; Department of Medicine, University of Arizona, Tucson, AZ, USA 85724-5035; Graduate Interdisciplinary Program in Statistics & Data Science, Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA 85721
- Nima Pouladi
- Department of Medicine, University of Arizona, Tucson, AZ, USA 85724-5035; Department of Biomedical Informatics, University of Utah, UT, USA 84108
- Samir Rachid Zaim
- Center for Biomedical Informatics and Biostatistics (CB2), University of Arizona Health Sciences, University of Arizona, Tucson, AZ, USA 85721; Department of Medicine, University of Arizona, Tucson, AZ, USA 85724-5035; Graduate Interdisciplinary Program in Statistics & Data Science, Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA 85721
- Colleen Kenost
- Center for Biomedical Informatics and Biostatistics (CB2), University of Arizona Health Sciences, University of Arizona, Tucson, AZ, USA 85721; Department of Medicine, University of Arizona, Tucson, AZ, USA 85724-5035; Graduate Interdisciplinary Program in Statistics & Data Science, Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA 85721; Department of Biomedical Informatics, University of Utah, UT, USA 84108
- Joanne Berghout
- Center for Biomedical Informatics and Biostatistics (CB2), University of Arizona Health Sciences, University of Arizona, Tucson, AZ, USA 85721; Department of Medicine, University of Arizona, Tucson, AZ, USA 85724-5035; Center for Applied Genetics and Genomic Medicine, University of Arizona, Tucson, AZ, USA 85721
- Walter W Piegorsch
- Center for Biomedical Informatics and Biostatistics (CB2), University of Arizona Health Sciences, University of Arizona, Tucson, AZ, USA 85721; Graduate Interdisciplinary Program in Statistics & Data Science, Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA 85721; Bio5 Institute, University of Arizona, Tucson, AZ, USA 85721
- Yves A Lussier
- Center for Biomedical Informatics and Biostatistics (CB2), University of Arizona Health Sciences, University of Arizona, Tucson, AZ, USA 85721; Department of Medicine, University of Arizona, Tucson, AZ, USA 85724-5035; Graduate Interdisciplinary Program in Statistics & Data Science, Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA 85721; Department of Biomedical Informatics, University of Utah, UT, USA 84108; Center for Applied Genetics and Genomic Medicine, University of Arizona, Tucson, AZ, USA 85721; Bio5 Institute, University of Arizona, Tucson, AZ, USA 85721
3
Li Q, Zaim SR, Aberasturi D, Berghout J, Li H, Vitali F, Kenost C, Zhang HH, Lussier YA. Interpretation of 'Omics dynamics in a single subject using local estimates of dispersion between two transcriptomes. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2020; 2019:582-591. [PMID: 32308852 PMCID: PMC7153139] [Indexed: 06/11/2023]
Abstract
Calculating Differentially Expressed Genes (DEGs) from RNA-sequencing requires replicates to estimate gene-wise variability, a requirement that is at times financially or physiologically infeasible in clinics. Conventional methods (edgeR, NOISeq-sim, DESeq, DEGseq) have been proposed for comparing two conditions without replicates (TCWR), but not evaluated, and they do so by imposing restrictive transcriptome-wide assumptions that limit their inferential opportunities. Under TCWR conditions (e.g., unaffected tissue vs. tumor), differences in transformed expression in the proposed individualized DEG (iDEG) method follow a distribution calculated across a local partition of related transcripts at baseline expression; thereafter, the probability of each DEG is estimated by empirical Bayes with local false discovery rate control using a two-group mixture model. In extensive simulation studies of TCWR methods, iDEG and NOISeq are more accurate at 5% < DEGs < 20% (precision > 90%, recall > 75%, false positive rate < 1%) and 30% < DEGs < 40% (precision = recall ~ 90%), respectively. The proposed iDEG method borrows localized distribution information from the same individual, a strategy that improves the accuracy of comparing transcriptomes in the absence of replicates under low-DEG conditions. http://www.lussiergroup.org/publications/iDEG.
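The two-group mixture idea behind local false discovery rate control can be illustrated with a generic sketch. The local fdr of a statistic z is estimated as pi0 * f0(z) / f(z), with f0 taken as a standard normal (theoretical) null, the null proportion pi0 fixed by assumption rather than estimated, and the mixture density f estimated by a Gaussian kernel density with Silverman's bandwidth. iDEG instead builds its null from a local partition of transcripts at baseline expression, so this should be read as an illustration of the empirical-Bayes step only; `simulate_z` is a hypothetical helper that generates toy data.

```python
import math
import random


def normal_pdf(x, mean=0.0, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def local_fdr(z_values, pi0=0.9, bandwidth=None):
    """Two-group local fdr: pi0 * f0(z) / f(z), with f0 = N(0, 1) and f from a Gaussian KDE."""
    n = len(z_values)
    if bandwidth is None:
        mean = sum(z_values) / n
        sd = math.sqrt(sum((z - mean) ** 2 for z in z_values) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)  # Silverman's rule of thumb

    def mixture_density(x):
        return sum(normal_pdf(x, zi, bandwidth) for zi in z_values) / n

    # Capped at 1 because the ratio estimates a posterior probability of being null.
    return [min(1.0, pi0 * normal_pdf(z) / mixture_density(z)) for z in z_values]


def simulate_z(n_null=450, n_alt=50, shift=4.0, seed=1):
    """Hypothetical toy data: null genes ~ N(0, 1), altered genes ~ N(shift, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n_null)] + [rng.gauss(shift, 1) for _ in range(n_alt)]


z = simulate_z()
lfdr = local_fdr(z)
called = [i for i, value in enumerate(lfdr) if value < 0.2]  # 0.2 is an illustrative cutoff
print(len(called), "genes called at local fdr < 0.2")
```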
Affiliation(s)
- Qike Li
- Center for Biomedical Informatics and Biostatistics (CB2)
- Department of Medicine
- Graduate Interdisciplinary Program in Statistics
- Corresponding authors
- Samir Rachid Zaim
- Center for Biomedical Informatics and Biostatistics (CB2)
- Department of Medicine
- Graduate Interdisciplinary Program in Statistics
- Corresponding authors
- Dillon Aberasturi
- Center for Biomedical Informatics and Biostatistics (CB2)
- Department of Medicine
- Graduate Interdisciplinary Program in Statistics
- Joanne Berghout
- Center for Biomedical Informatics and Biostatistics (CB2)
- Center for Applied Genetics and Genomic Medicine
- Haiquan Li
- Center for Biomedical Informatics and Biostatistics (CB2)
- Department of Medicine
- Graduate Interdisciplinary Program in Statistics
- Francesca Vitali
- Center for Biomedical Informatics and Biostatistics (CB2)
- Department of Medicine
- Colleen Kenost
- Center for Biomedical Informatics and Biostatistics (CB2)
- Department of Medicine
- BIO5 Institute
- Helen Hao Zhang
- Graduate Interdisciplinary Program in Statistics
- Department of Mathematics, The University of Arizona, Tucson, AZ 85721, USA
- Contributed equally
- Yves A Lussier
- Center for Biomedical Informatics and Biostatistics (CB2)
- University of Arizona Cancer Center
- Contributed equally
4
Rachid Zaim S, Kenost C, Berghout J, Vitali F, Zhang HH, Lussier YA. Evaluating single-subject study methods for personal transcriptomic interpretations to advance precision medicine. BMC Med Genomics 2019; 12:96. [PMID: 31296218 PMCID: PMC6624180 DOI: 10.1186/s12920-019-0513-8] [Indexed: 02/04/2023]
Abstract
Background Gene expression profiling has benefited medicine by providing clinically relevant insights at the molecular candidate and systems levels. However, to adopt a more ‘precision’ approach that integrates individual variability, including ‘omics data, into risk assessments, diagnoses, and therapeutic decision making, whole transcriptome expression needs to be interpreted meaningfully for single subjects. We propose an “all-against-one” framework that uses biological replicates in isogenic conditions for testing differentially expressed genes (DEGs) in a single subject (ss) in the absence of an appropriate external reference standard or replicates. To evaluate our proposed “all-against-one” framework, we construct reference standards (RSs) with five conventional replicate-anchored analyses (NOISeq, DEGseq, edgeR, DESeq, DESeq2), and the remaining replicates are treated separately as single-subject sample pairs for ss analyses (without replicates). Results Eight ss methods (NOISeq, DEGseq, edgeR, mixture model, DESeq, DESeq2, iDEG, and ensemble) for identifying genes with differential expression were compared in Yeast (parental line versus snf2 deletion mutant; n = 42/condition) and an MCF7 breast-cancer cell line (baseline versus stimulated with estradiol; n = 7/condition). Receiver operating characteristic (ROC) and precision-recall plots were determined for the eight ss methods against each of the five RSs in both datasets. Consistent with prior analyses of these data, ~50% and ~15% DEGs were obtained in the Yeast and MCF7 datasets, respectively, regardless of the RS method. NOISeq, edgeR, and DESeq were the most concordant for creating an RS. Single-subject versions of NOISeq, DEGseq, and an ensemble learner achieved the best median ROC area under the curve for comparing two transcriptomes without replicates, regardless of the RS method and dataset (> 0.90 in Yeast, > 0.75 in MCF7).
Further, different single-subject methods perform better at different proportions of DEGs. Conclusions The “all-against-one” framework provides an honest evaluation framework for single-subject DEG studies, since these methods are evaluated, by design, against reference standards produced by unrelated DEG methods. The ss-ensemble method was the only one to reliably produce higher accuracies in all conditions tested in this conservative evaluation framework. However, single-subject methods for identifying DEGs from paired samples need improvement, as no method performed with precision > 90% and obtained moderate levels of recall. http://www.lussiergroup.org/publications/EnsembleBiomarker
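The evaluation described above compares each single-subject method's gene calls with a reference standard. A minimal sketch of the two summary metrics follows, assuming calls and reference DEGs are given as sets of gene identifiers and that each method also returns a per-gene score (for example, one minus a p-value). The ROC AUC is computed through its Mann-Whitney interpretation: the probability that a reference DEG outscores a non-DEG, counting ties as one half. Gene names and scores are hypothetical.

```python
def precision_recall(called, reference):
    """Precision and recall of a set of called DEGs against a reference-standard set."""
    true_positives = len(called & reference)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def roc_auc(scores, is_reference_deg):
    """ROC AUC as P(score of a reference DEG > score of a non-DEG), ties counted as 1/2."""
    positives = [s for s, label in zip(scores, is_reference_deg) if label]
    negatives = [s for s, label in zip(scores, is_reference_deg) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one reference DEG and one non-DEG")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


reference = {"g1", "g2", "g4", "g5"}
called = {"g1", "g2", "g3"}
print(precision_recall(called, reference))  # (0.6666666666666666, 0.5)

genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [0.95, 0.90, 0.80, 0.40, 0.30, 0.10]
print(roc_auc(scores, [g in reference for g in genes]))  # 0.75
```

The pairwise loop is quadratic and only suits small examples; at transcriptome scale, a rank-based computation of the same statistic is preferable.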
Affiliation(s)
- Samir Rachid Zaim
- The Center for Biomedical Informatics & Biostatistics of the University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA; The Department of Medicine, College of Medicine Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85724-5035, USA; The Graduate Interdisciplinary Program in Statistics, The University of Arizona, 617 N. Santa Rita Ave, Tucson, AZ, 85721, USA
- Colleen Kenost
- The Center for Biomedical Informatics & Biostatistics of the University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA; The Department of Medicine, College of Medicine Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85724-5035, USA
- Joanne Berghout
- The Center for Biomedical Informatics & Biostatistics of the University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA; The Department of Medicine, College of Medicine Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85724-5035, USA; The Center for Applied Genetic and Genomic Medicine, 1295 N. Martin, Tucson, AZ, 85721, USA
- Francesca Vitali
- The Center for Biomedical Informatics & Biostatistics of the University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA; The Department of Medicine, College of Medicine Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85724-5035, USA
- Helen Hao Zhang
- The Graduate Interdisciplinary Program in Statistics, The University of Arizona, 617 N. Santa Rita Ave, Tucson, AZ, 85721, USA; The Department of Mathematics, College of Sciences, The University of Arizona, 617 N. Santa Rita Ave, Tucson, AZ, 85721, USA
- Yves A Lussier
- The Center for Biomedical Informatics & Biostatistics of the University of Arizona Health Sciences, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA; The Department of Medicine, College of Medicine Tucson, 1501 N. Campbell Ave, Tucson, AZ, 85724-5035, USA; The Graduate Interdisciplinary Program in Statistics, The University of Arizona, 617 N. Santa Rita Ave, Tucson, AZ, 85721, USA; The Center for Applied Genetic and Genomic Medicine, 1295 N. Martin, Tucson, AZ, 85721, USA; The University of Arizona Cancer Center, 3838 N. Campbell Ave, Tucson, AZ, 85719-1454, USA
5
Vitali F, Li Q, Schissler AG, Berghout J, Kenost C, Lussier YA. Developing a 'personalome' for precision medicine: emerging methods that compute interpretable effect sizes from single-subject transcriptomes. Brief Bioinform 2019; 20:789-805. [PMID: 29272327 PMCID: PMC6585155 DOI: 10.1093/bib/bbx149] [Received: 07/15/2017] [Revised: 10/06/2017] [Indexed: 12/13/2022]
Abstract
The development of computational methods capable of analyzing -omics data at the individual level is critical for the success of precision medicine. Although unprecedented opportunities now exist to gather data on an individual's -omics profile ('personalome'), methods for interpreting and extracting meaningful information from single-subject -omics data remain underdeveloped, particularly for quantitative non-sequence measurements, including complete transcriptome or proteome expression and metabolite abundance. Conventional bioinformatics approaches have largely been designed for making population-level inferences about 'average' disease processes; thus, they may not adequately capture and describe individual variability. Novel approaches intended to exploit a variety of -omics data are required for identifying individualized signals for meaningful interpretation. In this review, intended for biomedical researchers, computational biologists and bioinformaticians, we survey emerging computational and translational informatics methods capable of constructing a single subject's 'personalome' for predicting clinical outcomes or therapeutic responses, with an emphasis on methods that provide interpretable readouts. Key points: (i) single-subject analytics of the transcriptome show the greatest development to date, and (ii) the methods were all validated in simulations, cross-validations or independent retrospective data sets. This survey uncovers a growing field that offers numerous opportunities for the development of novel validation methods and opens the door for future studies focusing on the interpretation of comprehensive 'personalomes' through the integration of multiple -omics, providing valuable insights into individual patient outcomes and treatments.
Affiliation(s)
- Qike Li
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
6
Schissler AG, Aberasturi D, Kenost C, Lussier YA. A Single-Subject Method to Detect Pathways Enriched With Alternatively Spliced Genes. Front Genet 2019; 10:414. [PMID: 31143202 PMCID: PMC6521780 DOI: 10.3389/fgene.2019.00414] [Received: 11/14/2018] [Accepted: 04/16/2019] [Indexed: 01/25/2023]
Abstract
RNA-Sequencing data offers an opportunity to enable precision medicine, but most methods rely on gene expression alone. To date, no methodology exists to identify and interpret alternative splicing patterns within pathways for an individual patient. This study develops methodology and conducts computational experiments to test the hypothesis that pathway aggregation of subject-specific alternatively spliced genes (ASGs) can inform upon disease mechanisms and predict survival. We propose the N-of-1-pathways Alternatively Spliced (N1PAS) method that takes an individual patient’s paired-sample RNA-Seq isoform expression data (e.g., tumor vs. non-tumor, before-treatment vs. during-therapy) and pathway annotations as inputs. N1PAS quantifies the degree of alternative splicing via Hellinger distances followed by two-stage clustering to determine pathway enrichment. We provide a clinically relevant “odds ratio” along with statistical significance to quantify pathway enrichment. We validate our method in clinical samples and find that our method selects relevant pathways (p < 0.05 in 4/6 data sets). Extensive Monte Carlo studies show N1PAS powerfully detects pathway enrichment of ASGs while adequately controlling false discovery rates. Importantly, our studies also unveil highly heterogeneous single-subject alternative splicing patterns that cohort-based approaches overlook. Finally, we apply our patient-specific results to predict cancer survival (FDR < 20%) while providing diagnostics in pursuit of translating transcriptome data into clinically actionable information. Software available at https://github.com/grizant/n1pas/tree/master.
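N1PAS quantifies alternative splicing with Hellinger distances. As a minimal sketch, assuming each gene's isoform expression in the two paired samples is first converted to isoform proportions, the Hellinger distance between the two proportion vectors p and q is H = sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))**2)), which lies between 0 (identical isoform usage) and 1 (no shared isoforms). The proportion normalization and the toy isoform counts are assumptions; the two-stage clustering and enrichment steps of N1PAS are not shown.

```python
import math


def to_proportions(isoform_expression):
    """Convert a gene's isoform expression values to proportions summing to 1."""
    total = sum(isoform_expression)
    if total <= 0:
        raise ValueError("gene has no isoform expression")
    return [x / total for x in isoform_expression]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length (range 0..1)."""
    if len(p) != len(q):
        raise ValueError("isoform vectors must have the same length")
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))


def gene_splicing_distance(sample_1, sample_2):
    """Distance between isoform usage in two paired samples (e.g. tumor vs. non-tumor)."""
    return hellinger(to_proportions(sample_1), to_proportions(sample_2))


# Hypothetical isoform counts for two genes in one patient's paired samples.
print(gene_splicing_distance([50, 50, 0], [100, 100, 0]))  # same usage, different depth -> 0.0
print(gene_splicing_distance([90, 10, 0], [10, 10, 80]))   # isoform switch -> large distance
```

Working on proportions rather than raw counts means a gene whose overall expression changes but whose isoform mix does not is not flagged as alternatively spliced.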
Affiliation(s)
- Alfred Grant Schissler
- Department of Mathematics and Statistics, University of Nevada, Reno, Reno, NV, United States; Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, United States
- Dillon Aberasturi
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, United States; Department of Medicine, The University of Arizona, Tucson, AZ, United States; The Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, United States
- Colleen Kenost
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, United States; Department of Medicine, The University of Arizona, Tucson, AZ, United States
- Yves A Lussier
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, United States; Department of Medicine, The University of Arizona, Tucson, AZ, United States; BIO5 Institute, The University of Arizona, Tucson, AZ, United States; Cancer Center, The University of Arizona, Tucson, AZ, United States; University of Arizona Health Sciences, The University of Arizona, Tucson, AZ, United States
7
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib, which targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology, and more specifically network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar
- Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada.
8
Parimbelli E, Marini S, Sacchi L, Bellazzi R. Patient similarity for precision medicine: A systematic review. J Biomed Inform 2018; 83:87-96. [PMID: 29864490 DOI: 10.1016/j.jbi.2018.06.001] [Received: 01/31/2018] [Revised: 05/16/2018] [Accepted: 06/01/2018] [Indexed: 12/19/2022]
Abstract
Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is currently challenging the broad definitions of guideline-defined patient groups. Precision medicine leverages genetic, phenotypic, or psychosocial characteristics to provide precise identification of patient subsets for treatment targeting. Defining a patient similarity measure is thus an essential step to allow stratification of patients into clinically meaningful subgroups. The present review investigates the use of patient similarity as a tool to enable precision medicine. A total of 279 articles were analyzed along four dimensions: data types considered, clinical domains of application, data analysis methods, and translational stage of findings. Cancer-related research employing molecular profiling and standard data analysis techniques such as clustering constitutes the majority of the retrieved studies. Chronic and psychiatric diseases follow as the second most represented clinical domains. Interestingly, almost one quarter of the studies analyzed presented a novel methodology, with the most advanced employing data integration strategies and being portable to different clinical domains. Integration of such techniques into decision support systems constitutes an interesting trend for future research.
Affiliation(s)
- E Parimbelli
- Telfer School of Management, University of Ottawa, Ottawa, Canada; Interdepartmental Centre for Health Technologies, University of Pavia, Italy.
- S Marini
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA; Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- L Sacchi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy; Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- R Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy; Interdepartmental Centre for Health Technologies, University of Pavia, Italy; RCCS ICS Maugeri, Pavia, Italy
9
Zaim SR, Li Q, Schissler AG, Lussier YA. Emergence of pathway-level composite biomarkers from converging gene set signals of heterogeneous transcriptomic responses. Pacific Symposium on Biocomputing 2018; 23:484-495. [PMID: 29218907 PMCID: PMC5730363] [Indexed: 06/07/2023]
Abstract
Recent precision medicine initiatives have led to the expectation of improved clinical decision-making anchored in genomic data science. However, over the last decade, only a handful of new single-gene product biomarkers have been translated to clinical practice (FDA approved) in spite of considerable discovery efforts deployed and a plethora of transcriptomes available in the Gene Expression Omnibus. With this modest outcome of current approaches in mind, we developed a pilot simulation study to demonstrate the untapped benefits of developing disease detection methods for cases where the true signal lies at the pathway level, even if the pathway's gene expression alterations may be heterogeneous across patients. In other words, we relaxed the cross-patient homogeneity assumption from the transcript level (cohort assumptions of deregulated gene expression) to the pathway level (assumptions of deregulated pathway expression). Furthermore, we have expanded previous single-subject (SS) methods into cohort analyses to illustrate the benefit of accounting for an individual's variability in cohort scenarios. We compare SS and cohort-based (CB) techniques under 54 distinct scenarios, each with 1,000 simulations, to demonstrate that the emergence of a pathway-level signal occurs through the summative effect of its altered gene expression, heterogeneous across patients. Studied variables include pathway gene set size, fraction of expressed genes responsive within the gene set, fraction of responsive genes up- vs. down-regulated, and cohort size. We demonstrated that our SS approach was uniquely suited to detect signals in heterogeneous populations in which individuals have varying levels of baseline risks that are simultaneously confounded by patient-specific "genome-by-environment" interactions (G×E). The area under the precision-recall curve of the SS approach far surpassed that of the CB approach (1st quartile, median, 3rd quartile: SS = 0.94, 0.96, 0.99; CB = 0.50, 0.52, 0.65).
We conclude that single-subject pathway detection methods are uniquely suited for consistently detecting pathway dysregulation by the inclusion of a patient's individual variability. http://www.lussiergroup.org/publications/PathwayMarker/.
Affiliation(s)
- Samir Rachid Zaim
- Ctr for Biomed. Informatics & Biostatistics, Dept of Medicine, Grad. Interdisciplinary Prog. in Statist., The University of Arizona, 1657 E. Helen Street, Tucson, AZ, 85721, USA
10
Li Q, Schissler AG, Gardeux V, Achour I, Kenost C, Berghout J, Li H, Zhang HH, Lussier YA. N-of-1-pathways MixEnrich: advancing precision medicine via single-subject analysis in discovering dynamic changes of transcriptomes. BMC Med Genomics 2017; 10:27. [PMID: 28589853 PMCID: PMC5461551 DOI: 10.1186/s12920-017-0263-4] [Indexed: 11/10/2022]
Abstract
Background Transcriptome analytic tools are commonly used across patient cohorts to develop drugs and predict clinical outcomes. However, as precision medicine pursues more accurate and individualized treatment decisions, these methods are not designed to address single-patient transcriptome analyses. We previously developed and validated the N-of-1-pathways framework using two methods, Wilcoxon and Mahalanobis Distance (MD), for personal transcriptome analysis derived from a pair of samples of a single patient. Although both methods uncover concordantly dysregulated pathways, they are not designed to detect dysregulated pathways with up- and down-regulated genes (bidirectional dysregulation) that are ubiquitous in biological systems. Results We developed N-of-1-pathways MixEnrich, a mixture model followed by a gene set enrichment test, to uncover bidirectional and concordantly dysregulated pathways one patient at a time. We assess its accuracy in a comprehensive simulation study and in an RNA-Seq data analysis of head and neck squamous cell carcinomas (HNSCCs). In the presence of bidirectionally dysregulated genes in the pathway or of high background noise, MixEnrich substantially outperforms previous single-subject transcriptome analysis methods, both in the simulation study and the HNSCCs data analysis (ROC curves; higher true positive rates; lower false positive rates). Bidirectionally and concordantly dysregulated pathways uncovered by MixEnrich in each patient largely overlapped with the quasi-gold standard compared to other single-subject and cohort-based transcriptome analyses. Conclusion The greater performance of MixEnrich presents an advantage over previous methods to meet the promise of providing accurate personal transcriptome analysis to support precision medicine at the point of care.
Electronic supplementary material: The online version of this article (doi:10.1186/s12920-017-0263-4) contains supplementary material, which is available to authorized users.
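The abstract describes MixEnrich only as "a mixture model followed by a gene set enrichment test." The Python sketch below (3.8+ for `math.comb`) illustrates that two-stage idea under assumptions of our own that are not taken from the paper: each gene is scored by the absolute log fold change between the patient's two samples, so strongly up- and strongly down-regulated genes land in the same high component; a two-component Gaussian mixture fit by EM separates "dysregulated" from background genes; and each pathway is then tested with a one-sided Fisher exact test on the resulting 2 × 2 table. All function names (`fit_two_component_gmm`, `call_dysregulated`, `fisher_enrichment_p`, `enrich_pathways`) are hypothetical, and the published method may use a different mixture family, calling threshold, or enrichment test.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def _log_normal(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution N(mu, sd^2) evaluated at x."""
    z = (x - mu) / sd
    return -math.log(sd) - LOG_SQRT_2PI - 0.5 * z * z


def _posterior(x: float, w: Sequence[float], mu: Sequence[float],
               sd: Sequence[float], k: int) -> float:
    """Posterior probability that x came from component k (0 or 1), overflow-safe."""
    other = 1 - k
    d = (math.log(w[other]) + _log_normal(x, mu[other], sd[other])) - (
        math.log(w[k]) + _log_normal(x, mu[k], sd[k]))
    if d > 0:  # 1 / (1 + e^d) rewritten to avoid overflow for large d
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def fit_two_component_gmm(x: Sequence[float], n_iter: int = 200, sd_floor: float = 1e-3
                          ) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D two-component Gaussian mixture by EM; returns (weights, means, sds)."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    xs = sorted(x)
    half = len(xs) // 2
    groups = (xs[:half], xs[half:])  # initialise: lower half vs upper half
    mu = [sum(g) / len(g) for g in groups]
    sd = [max(math.sqrt(sum((v - m) ** 2 for v in g) / len(g)), sd_floor)
          for g, m in zip(groups, mu)]
    w = [0.5, 0.5]
    n = len(xs)
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each observation.
        r1 = [_posterior(v, w, mu, sd, 1) for v in x]
        n1 = max(sum(r1), 1e-12)
        n0 = max(n - sum(r1), 1e-12)
        # M-step: weighted means, standard deviations, and mixing weights.
        mu = [sum((1.0 - r) * v for r, v in zip(r1, x)) / n0,
              sum(r * v for r, v in zip(r1, x)) / n1]
        var0 = sum((1.0 - r) * (v - mu[0]) ** 2 for r, v in zip(r1, x)) / n0
        var1 = sum(r * (v - mu[1]) ** 2 for r, v in zip(r1, x)) / n1
        sd = [max(math.sqrt(var0), sd_floor), max(math.sqrt(var1), sd_floor)]
        w = [max(n0 / n, 1e-12), max(n1 / n, 1e-12)]
    return w, mu, sd


def call_dysregulated(log_fc: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Genes whose |log fold change| is assigned to the higher-mean component.

    Taking the absolute value (an assumption of this sketch) lets up- and
    down-regulated genes fall into the same 'dysregulated' component.
    """
    genes = list(log_fc)
    abs_fc = [abs(log_fc[g]) for g in genes]
    w, mu, sd = fit_two_component_gmm(abs_fc)
    high = 0 if mu[0] > mu[1] else 1
    return {g for g, v in zip(genes, abs_fc) if _posterior(v, w, mu, sd, high) > threshold}


def fisher_enrichment_p(a: int, n_path: int, k_dys: int, n_total: int) -> float:
    """One-sided Fisher exact p-value P(X >= a), X ~ Hypergeometric(n_total, k_dys, n_path)."""
    upper = min(k_dys, n_path)
    num = sum(math.comb(k_dys, i) * math.comb(n_total - k_dys, n_path - i)
              for i in range(a, upper + 1))
    return num / math.comb(n_total, n_path)


def enrich_pathways(log_fc: Dict[str, float],
                    pathways: Dict[str, Set[str]]) -> Dict[str, float]:
    """Mixture-model gene calls followed by a per-pathway Fisher exact test."""
    dys = call_dysregulated(log_fc)
    universe = set(log_fc)
    out = {}
    for name, members in pathways.items():
        m = members & universe  # only genes measured in this patient
        out[name] = fisher_enrichment_p(len(m & dys), len(m), len(dys), len(universe))
    return out
```

For example, a pathway whose genes are half strongly up-regulated and half strongly down-regulated has a mean log fold change near zero, yet here it still receives a small p-value because the absolute values place those genes in the high component; this is the bidirectional case the abstract emphasizes.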
Affiliation(s)
- Qike Li
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- A Grant Schissler
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- Vincent Gardeux
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Ikbel Achour
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Colleen Kenost
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Joanne Berghout
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Haiquan Li
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Hao Helen Zhang
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Mathematics, The University of Arizona, Tucson, AZ, 85721, USA
- Yves A Lussier
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- University of Arizona Cancer Center, The University of Arizona, Tucson, AZ, 85721, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL, 60637, USA